2024-09-16 12:25:21,779 INFO [train.py:1266] (0/2) Training started
2024-09-16 12:25:21,782 INFO [train.py:1276] (0/2) Device: cuda:0
2024-09-16 12:25:21,784 INFO [train.py:1307] (0/2) Using dtype=torch.float16
2024-09-16 12:25:21,784 INFO [train.py:1308] (0/2) Use AMP=True
2024-09-16 12:25:21,784 INFO [train.py:1310] (0/2) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'ignore_id': -1, 'label_smoothing': 0.1, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9f6206b565b833d71e19b4411493d04d99f0a308', 'k2-git-date': 'Thu Mar 28 09:46:54 2024', 'lhotse-version': '1.27.0', 'torch-version': '2.2.2+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'cr-ctc', 'icefall-git-sha1': '07d6b123-dirty', 'icefall-git-date': 'Wed Sep 4 19:33:41 2024', 'icefall-path': '/zw/mnt/yaozengwei/workspace/icefall_cr_ctc', 'k2-path': '/root/anaconda3/envs/python3.10/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/envs/python3.10/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'NGK_zengwei'}, 'world_size': 2, 'master_port': 12341, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.1, 'cr_loss_scale': 0.02, 'time_mask_ratio': 2.5, 'cr_loss_masked_scale': 1.0, 'attention_decoder_loss_scale': 0.9, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_bf16': False, 'num_encoder_layers': '2,2,4,5,4,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1536,2048,1536,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,512,768,512,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,320,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'attention_decoder_dim': 512, 'attention_decoder_num_layers': 6, 'attention_decoder_attention_dim': 512, 'attention_decoder_num_heads': 8, 'attention_decoder_feedforward_dim': 2048, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': False, 'use_ctc': True, 'use_attention_decoder': True, 'use_cr_ctc': True, 'full_libri': True, 'mini_libri': False, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1200, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'sos_id': 1, 'eos_id': 1, 'vocab_size': 500, 'dtype': torch.float16, 'use_autocast': True}
2024-09-16 12:25:21,784 INFO [train.py:1312] (0/2) About to create model
2024-09-16 12:25:22,609 INFO [train.py:1316] (0/2) Number of model parameters: 174319650
2024-09-16 12:25:22,609 INFO [train.py:752] (0/2) num_frame_masks: 25.0, max_frames_mask_fraction: 0.375
2024-09-16 12:25:24,846 INFO [train.py:1338] (0/2) Using DDP
2024-09-16 12:25:26,129 INFO [asr_datamodule.py:436] (0/2) About to get the shuffled train-clean-100, train-clean-360 and train-other-500 cuts
2024-09-16 12:25:26,130 INFO [asr_datamodule.py:232] (0/2) Enable MUSAN
2024-09-16 12:25:26,130 INFO [asr_datamodule.py:233] (0/2) About to get Musan cuts
2024-09-16 12:25:27,754 INFO [asr_datamodule.py:279] (0/2) Disable SpecAugment
2024-09-16 12:25:27,754 INFO [asr_datamodule.py:281] (0/2) About to create train dataset
2024-09-16 12:25:27,754 INFO [asr_datamodule.py:308] (0/2) Using DynamicBucketingSampler.
2024-09-16 12:25:28,567 INFO [asr_datamodule.py:325] (0/2) About to create train dataloader
2024-09-16 12:25:28,568 INFO [asr_datamodule.py:453] (0/2) About to get dev-clean cuts
2024-09-16 12:25:28,569 INFO [asr_datamodule.py:460] (0/2) About to get dev-other cuts
2024-09-16 12:25:28,570 INFO [asr_datamodule.py:356] (0/2) About to create dev dataset
2024-09-16 12:25:28,729 INFO [asr_datamodule.py:373] (0/2) About to create dev dataloader
2024-09-16 12:25:28,729 INFO [train.py:1545] (0/2) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
2024-09-16 12:28:13,904 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 46330MB
2024-09-16 12:28:15,779 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 46406MB
2024-09-16 12:28:17,886 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 46728MB
2024-09-16 12:28:19,115 INFO [scaling.py:1024] (0/2) Whitening: name=None, num_groups=1, num_channels=512, metric=119.30 vs. limit=7.5
2024-09-16 12:28:20,062 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 47362MB
2024-09-16 12:28:22,390 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 47362MB
2024-09-16 12:28:23,765 INFO [scaling.py:1024] (0/2) Whitening: name=None, num_groups=1, num_channels=512, metric=197.33 vs. limit=7.5
2024-09-16 12:28:24,625 INFO [train.py:1576] (0/2) Maximum memory allocated so far is 47362MB
2024-09-16 12:28:54,373 INFO [train.py:1198] (0/2) Epoch 1, batch 0, loss[loss=8.249, ctc_loss=4.732, cr_loss=0.5653, attn_decoder_loss=8.627, over 29639.00 frames. ], tot_loss[loss=8.249, ctc_loss=4.732, cr_loss=0.5653, attn_decoder_loss=8.627, over 29639.00 frames. ], batch size: 73, lr: 2.25e-02, grad_scale: 2.0
2024-09-16 12:28:54,374 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 12:29:13,593 INFO [train.py:1230] (0/2) Epoch 1, validation: loss=8.234, ctc_loss=4.87, cr_loss=1.182e-15, attn_decoder_loss=8.607, over 944034.00 frames.
2024-09-16 12:29:13,594 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 47369MB
2024-09-16 12:29:23,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.29 vs. limit=5.0
2024-09-16 12:29:24,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=0.0, ans=7.5
2024-09-16 12:29:31,004 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.329e+03 2.556e+03 3.012e+03 3.068e+03 4.530e+03, threshold=1.205e+04, percent-clipped=0.0
2024-09-16 12:29:40,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=40.0, ans=0.498125
2024-09-16 12:29:43,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=76.71 vs. limit=7.515
2024-09-16 12:29:46,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.50 vs. limit=7.515
2024-09-16 12:29:51,255 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.743e+03 2.098e+03 2.580e+03 3.037e+03 5.426e+03, threshold=1.032e+04, percent-clipped=0.0
2024-09-16 12:29:57,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=21.92 vs. limit=7.53
2024-09-16 12:30:02,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=80.0, ans=5.05
2024-09-16 12:30:10,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=20.74 vs. limit=7.545
2024-09-16 12:30:12,640 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.11 vs. limit=7.59
2024-09-16 12:30:13,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=120.0, ans=0.8958
2024-09-16 12:30:23,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=120.0, ans=0.09730000000000001
2024-09-16 12:30:23,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.28 vs. limit=7.545
2024-09-16 12:30:24,099 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.27 vs. limit=7.59
2024-09-16 12:30:27,831 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.85 vs. limit=7.56
2024-09-16 12:30:28,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.091e+02 1.328e+03 1.895e+03 2.580e+03 5.426e+03, threshold=7.580e+03, percent-clipped=0.0
2024-09-16 12:30:28,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=160.0, ans=0.004
2024-09-16 12:30:33,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.16 vs. limit=7.62
2024-09-16 12:30:38,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=21.88 vs. limit=4.064
2024-09-16 12:30:42,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=160.0, ans=0.8944
2024-09-16 12:30:43,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=160.0, ans=0.4925
2024-09-16 12:30:47,303 INFO [train.py:1198] (0/2) Epoch 1, batch 50, loss[loss=1.73, ctc_loss=1.098, cr_loss=0.1849, attn_decoder_loss=1.796, over 29435.00 frames. ], tot_loss[loss=3.654, ctc_loss=1.997, cr_loss=0.2521, attn_decoder_loss=3.832, over 1268774.28 frames. ], batch size: 70, lr: 2.48e-02, grad_scale: 2.0
2024-09-16 12:30:48,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=79.26 vs. limit=5.1
2024-09-16 12:30:54,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=24.49 vs. limit=5.1
2024-09-16 12:31:01,663 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=15.25 vs. limit=4.08
2024-09-16 12:31:04,259 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=50.89 vs. limit=7.575
2024-09-16 12:31:08,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=240.0, ans=0.0946
2024-09-16 12:31:13,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=28.12 vs. limit=7.68
2024-09-16 12:31:17,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=63.66 vs. limit=5.12
2024-09-16 12:31:29,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=280.0, ans=5.175
2024-09-16 12:31:41,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=4.112
2024-09-16 12:31:43,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=17.16 vs. limit=7.62
2024-09-16 12:31:59,076 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=34.80 vs. limit=5.08
2024-09-16 12:32:13,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=360.0, ans=0.1865
2024-09-16 12:32:22,339 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.399e+02 6.271e+02 8.906e+02 1.633e+03 5.426e+03, threshold=1.781e+03, percent-clipped=0.0
2024-09-16 12:32:22,362 INFO [train.py:1198] (0/2) Epoch 1, batch 100, loss[loss=1.178, ctc_loss=1.143, cr_loss=0.1249, attn_decoder_loss=1.179, over 29550.00 frames. ], tot_loss[loss=2.444, ctc_loss=1.554, cr_loss=0.1861, attn_decoder_loss=2.539, over 2251495.81 frames. ], batch size: 76, lr: 2.70e-02, grad_scale: 4.0
2024-09-16 12:32:25,256 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=17.42 vs. limit=4.16
2024-09-16 12:32:25,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=10.86 vs. limit=5.1
2024-09-16 12:32:50,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=49.22 vs. limit=5.11
2024-09-16 12:32:59,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.98 vs. limit=7.83
2024-09-16 12:32:59,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=96.15 vs. limit=7.665
2024-09-16 12:33:01,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=20.03 vs. limit=4.192
2024-09-16 12:33:16,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=126.39 vs. limit=7.68
2024-09-16 12:33:17,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=480.0, ans=0.0892
2024-09-16 12:33:31,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=9.72 vs. limit=7.695
2024-09-16 12:33:31,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.48 vs. limit=7.695
2024-09-16 12:33:35,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=55.17 vs. limit=7.695
2024-09-16 12:33:56,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=560.0, ans=5.14
2024-09-16 12:33:58,497 INFO [train.py:1198] (0/2) Epoch 1, batch 150, loss[loss=0.9565, ctc_loss=1.064, cr_loss=0.113, attn_decoder_loss=0.942, over 29390.00 frames. ], tot_loss[loss=1.876, ctc_loss=1.394, cr_loss=0.1605, attn_decoder_loss=1.926, over 3046228.56 frames. ], batch size: 70, lr: 2.93e-02, grad_scale: 4.0
2024-09-16 12:33:59,266 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=39.38 vs. limit=5.3
2024-09-16 12:34:01,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.27 vs. limit=5.15
2024-09-16 12:34:01,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.12 vs. limit=5.15
2024-09-16 12:34:09,911 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=39.35 vs. limit=7.725
2024-09-16 12:34:10,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=21.43 vs. limit=7.725
2024-09-16 12:34:30,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=125.14 vs. limit=7.74
2024-09-16 12:34:31,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.60 vs. limit=7.74
2024-09-16 12:34:40,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=73.56 vs. limit=7.755
2024-09-16 12:34:45,758 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=212.97 vs. limit=5.34
2024-09-16 12:34:50,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=19.59 vs. limit=7.755
2024-09-16 12:34:55,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=680.0, ans=7.755
2024-09-16 12:34:56,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720.0, ans=0.2928
2024-09-16 12:34:58,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=176.79 vs. limit=8.04
2024-09-16 12:35:06,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=9.73 vs. limit=5.18
2024-09-16 12:35:07,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=720.0, ans=0.2928
2024-09-16 12:35:11,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720.0, ans=0.2928
2024-09-16 12:35:12,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=720.0, ans=4.288
2024-09-16 12:35:15,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=760.0, ans=0.20725
2024-09-16 12:35:16,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=11.53 vs. limit=5.38
2024-09-16 12:35:35,119 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.064e-01
2024-09-16 12:35:36,604 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.861e+02 2.379e+02 2.686e+02 3.220e+02 5.129e+02, threshold=5.373e+02, percent-clipped=0.0
2024-09-16 12:35:36,627 INFO [train.py:1198] (0/2) Epoch 1, batch 200, loss[loss=1.018, ctc_loss=1.183, cr_loss=0.1261, attn_decoder_loss=0.9971, over 27229.00 frames. ], tot_loss[loss=1.576, ctc_loss=1.315, cr_loss=0.147, attn_decoder_loss=1.602, over 3657771.64 frames. ], batch size: 124, lr: 3.15e-02, grad_scale: 8.0
2024-09-16 12:35:37,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=800.0, ans=0.5
2024-09-16 12:35:38,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800.0, ans=0.292
2024-09-16 12:35:43,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.90 vs. limit=5.4
2024-09-16 12:35:57,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=840.0, ans=0.460625
2024-09-16 12:36:06,111 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=8.13 vs. limit=4.336
2024-09-16 12:36:09,548 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=26.34 vs. limit=7.815
2024-09-16 12:36:16,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=880.0, ans=0.45875
2024-09-16 12:36:30,895 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=17.61 vs. limit=7.83
2024-09-16 12:36:33,093 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=8.19
2024-09-16 12:36:44,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.05 vs. limit=5.23
2024-09-16 12:36:49,582 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=148.19 vs. limit=7.845
2024-09-16 12:36:51,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.87 vs. limit=8.22
2024-09-16 12:36:53,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=960.0, ans=0.455
2024-09-16 12:37:00,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=960.0, ans=0.2904
2024-09-16 12:37:07,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=105.40 vs. limit=7.86
2024-09-16 12:37:11,732 INFO [train.py:1198] (0/2) Epoch 1, batch 250, loss[loss=1.039, ctc_loss=1.224, cr_loss=0.1252, attn_decoder_loss=1.016, over 29327.00 frames. ], tot_loss[loss=1.398, ctc_loss=1.273, cr_loss=0.1413, attn_decoder_loss=1.408, over 4140463.03 frames. ], batch size: 100, lr: 3.38e-02, grad_scale: 8.0
2024-09-16 12:37:12,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1000.0, ans=0.453125
2024-09-16 12:37:29,765 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.12 vs. limit=8.28
2024-09-16 12:37:35,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.46 vs. limit=8.28
2024-09-16 12:37:44,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.19 vs. limit=8.28
2024-09-16 12:37:48,253 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.59 vs. limit=4.416
2024-09-16 12:38:08,857 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.25 vs. limit=5.28
2024-09-16 12:38:18,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1120.0, ans=0.4475
2024-09-16 12:38:40,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=7.935
2024-09-16 12:38:43,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=62.83 vs. limit=5.58
2024-09-16 12:38:48,404 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.367e+02 1.660e+02 1.791e+02 1.982e+02 5.267e+02, threshold=3.582e+02, percent-clipped=0.0
2024-09-16 12:38:48,431 INFO [train.py:1198] (0/2) Epoch 1, batch 300, loss[loss=0.9728, ctc_loss=1.181, cr_loss=0.1256, attn_decoder_loss=0.9469, over 29541.00 frames. ], tot_loss[loss=1.274, ctc_loss=1.242, cr_loss=0.14, attn_decoder_loss=1.275, over 4509793.95 frames. ], batch size: 92, lr: 3.60e-02, grad_scale: 8.0
2024-09-16 12:38:48,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1200.0, ans=0.44375
2024-09-16 12:38:58,915 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=19.97 vs. limit=7.95
2024-09-16 12:38:59,215 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=11.94 vs. limit=7.95
2024-09-16 12:39:08,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.23 vs. limit=8.43
2024-09-16 12:39:09,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1240.0, ans=0.8566
2024-09-16 12:39:12,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=19.65 vs. limit=7.965
2024-09-16 12:39:14,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=8.43
2024-09-16 12:39:19,401 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=48.04 vs. limit=5.62
2024-09-16 12:39:23,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=6.58 vs. limit=4.496
2024-09-16 12:39:28,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=1280.0, ans=0.178
2024-09-16 12:39:29,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=188.46 vs. limit=7.98
2024-09-16 12:39:32,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=109.53 vs. limit=7.98
2024-09-16 12:39:41,356 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.256e+00
2024-09-16 12:39:46,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1320.0, ans=0.438125
2024-09-16 12:39:58,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=149.75 vs. limit=7.995
2024-09-16 12:40:02,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1320.0, ans=0.0703
2024-09-16 12:40:07,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1360.0, ans=0.43625
2024-09-16 12:40:15,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1360.0, ans=0.5
2024-09-16 12:40:21,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1360.0, ans=8.52
2024-09-16 12:40:24,358 INFO [train.py:1198] (0/2) Epoch 1, batch 350, loss[loss=0.8687, ctc_loss=1.051, cr_loss=0.1826, attn_decoder_loss=0.8444, over 29335.00 frames. ], tot_loss[loss=1.187, ctc_loss=1.219, cr_loss=0.1457, attn_decoder_loss=1.18, over 4794808.84 frames. ], batch size: 71, lr: 3.83e-02, grad_scale: 8.0
2024-09-16 12:40:25,625 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.85 vs. limit=5.35
2024-09-16 12:40:36,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=65.95 vs. limit=8.025
2024-09-16 12:40:38,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=8.55
2024-09-16 12:41:00,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=131.36 vs. limit=8.055
2024-09-16 12:41:06,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=8.61
2024-09-16 12:41:11,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=101.87 vs. limit=8.055
2024-09-16 12:41:25,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.72 vs. limit=8.64
2024-09-16 12:41:28,133 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.905e+00
2024-09-16 12:41:36,542 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.41 vs. limit=8.64
2024-09-16 12:41:45,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=8.085
2024-09-16 12:41:45,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=10.85 vs. limit=8.085
2024-09-16 12:41:47,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=8.085
2024-09-16 12:41:57,764 INFO [train.py:1198] (0/2) Epoch 1, batch 400, loss[loss=0.9182, ctc_loss=1.131, cr_loss=0.2054, attn_decoder_loss=0.8899, over 29724.00 frames. ], tot_loss[loss=1.119, ctc_loss=1.197, cr_loss=0.1579, attn_decoder_loss=1.106, over 5024820.18 frames. ], batch size: 82, lr: 4.05e-02, grad_scale: 8.0
2024-09-16 12:41:59,563 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.379e+02 1.617e+02 1.838e+02 2.123e+02 1.289e+03, threshold=3.677e+02, percent-clipped=4.0
2024-09-16 12:42:04,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=8.1
2024-09-16 12:42:13,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=6.26 vs. limit=4.64
2024-09-16 12:42:20,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1640.0, ans=0.8426
2024-09-16 12:42:21,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.24 vs. limit=8.73
2024-09-16 12:42:28,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.54 vs. limit=5.82
2024-09-16 12:42:29,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=1640.0, ans=6.025
2024-09-16 12:42:32,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=8.73
2024-09-16 12:42:35,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=1680.0, ans=0.1555
2024-09-16 12:43:02,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1720.0, ans=0.419375
2024-09-16 12:43:03,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.47 vs. limit=5.0
2024-09-16 12:43:07,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.99 vs. limit=5.86
2024-09-16 12:43:33,845 INFO [train.py:1198] (0/2) Epoch 1, batch 450, loss[loss=0.952, ctc_loss=1.193, cr_loss=0.3137, attn_decoder_loss=0.9183, over 29659.00 frames. ], tot_loss[loss=1.067, ctc_loss=1.177, cr_loss=0.1732, attn_decoder_loss=1.051, over 5186771.85 frames. ], batch size: 83, lr: 4.28e-02, grad_scale: 8.0
2024-09-16 12:43:34,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1800.0, ans=0.0595
2024-09-16 12:43:36,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=8.85
2024-09-16 12:44:00,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=8.879999999999999
2024-09-16 12:44:22,127 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=45.06 vs. limit=8.205
2024-09-16 12:44:35,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=8.22
2024-09-16 12:44:40,490 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=30.37 vs. limit=8.22
2024-09-16 12:44:42,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=8.22
2024-09-16 12:44:44,156 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=42.86 vs. limit=8.22
2024-09-16 12:44:47,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1960.0, ans=0.408125
2024-09-16 12:44:51,517 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=1.88 vs. limit=3.294
2024-09-16 12:44:56,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=27.61 vs. limit=8.235
2024-09-16 12:44:56,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=8.97
2024-09-16 12:45:02,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.07 vs. limit=8.97
2024-09-16 12:45:04,718 INFO [train.py:1198] (0/2) Epoch 1, batch 500, loss[loss=0.9217, ctc_loss=1.132, cr_loss=0.2463, attn_decoder_loss=0.8929, over 29467.00 frames. ], tot_loss[loss=1.022, ctc_loss=1.155, cr_loss=0.1909, attn_decoder_loss=1.003, over 5330622.24 frames.
], batch size: 94, lr: 4.49e-02, grad_scale: 8.0 2024-09-16 12:45:06,534 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.448e+02 1.612e+02 2.007e+02 3.487e+02, threshold=3.225e+02, percent-clipped=0.0 2024-09-16 12:45:12,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=2000.0, ans=0.8300000000000001 2024-09-16 12:45:18,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2000.0, ans=0.125 2024-09-16 12:45:24,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=2.27 vs. limit=4.816 2024-09-16 12:45:27,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2040.0, ans=0.404375 2024-09-16 12:45:27,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2040.0, ans=0.1235 2024-09-16 12:45:44,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.70 vs. limit=6.04 2024-09-16 12:45:46,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. 
limit=8.28 2024-09-16 12:46:05,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2120.0, ans=0.400625 2024-09-16 12:46:05,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2120.0, ans=6.325 2024-09-16 12:46:07,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2120.0, ans=0.235 2024-09-16 12:46:22,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=35.78 vs. limit=8.31 2024-09-16 12:46:25,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.79 vs. limit=6.08 2024-09-16 12:46:31,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=23.54 vs. limit=8.31 2024-09-16 12:46:35,770 INFO [train.py:1198] (0/2) Epoch 1, batch 550, loss[loss=0.9017, ctc_loss=1.088, cr_loss=0.335, attn_decoder_loss=0.8736, over 28862.00 frames. ], tot_loss[loss=0.9882, ctc_loss=1.131, cr_loss=0.2114, attn_decoder_loss=0.9676, over 5424749.00 frames. ], batch size: 104, lr: 4.49e-02, grad_scale: 8.0 2024-09-16 12:46:42,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.09 vs. limit=6.1 2024-09-16 12:47:37,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=2320.0, ans=0.113 2024-09-16 12:47:41,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.46 vs. 
limit=9.24 2024-09-16 12:47:50,643 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.18 vs. limit=4.944 2024-09-16 12:47:52,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.49 vs. limit=5.59 2024-09-16 12:48:01,170 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=9.27 2024-09-16 12:48:12,273 INFO [train.py:1198] (0/2) Epoch 1, batch 600, loss[loss=0.9046, ctc_loss=1.043, cr_loss=0.341, attn_decoder_loss=0.8816, over 29228.00 frames. ], tot_loss[loss=0.9594, ctc_loss=1.105, cr_loss=0.2344, attn_decoder_loss=0.938, over 5510636.73 frames. ], batch size: 100, lr: 4.49e-02, grad_scale: 8.0 2024-09-16 12:48:13,538 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=8.4 2024-09-16 12:48:14,061 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.236e+02 1.461e+02 1.874e+02 1.065e+03, threshold=2.921e+02, percent-clipped=6.0 2024-09-16 12:48:14,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2400.0, ans=0.3875 2024-09-16 12:48:16,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.01 vs. limit=6.2 2024-09-16 12:48:18,680 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=9.3 2024-09-16 12:48:22,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=33.11 vs. 
limit=8.4 2024-09-16 12:48:23,876 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=26.52 vs. limit=8.4 2024-09-16 12:48:28,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2440.0, ans=0.8146 2024-09-16 12:48:39,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=43.27 vs. limit=8.415 2024-09-16 12:48:43,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.95 vs. limit=8.415 2024-09-16 12:48:48,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.90 vs. limit=6.24 2024-09-16 12:48:56,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2480.0, ans=0.2752 2024-09-16 12:49:02,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2480.0, ans=0.2752 2024-09-16 12:49:08,697 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.19 vs. limit=6.26 2024-09-16 12:49:23,987 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=5.768e-01 2024-09-16 12:49:29,663 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=8.46 2024-09-16 12:49:41,294 INFO [train.py:1198] (0/2) Epoch 1, batch 650, loss[loss=0.8576, ctc_loss=0.9698, cr_loss=0.3533, attn_decoder_loss=0.8373, over 29749.00 frames. 
], tot_loss[loss=0.9314, ctc_loss=1.071, cr_loss=0.258, attn_decoder_loss=0.9101, over 5588098.12 frames. ], batch size: 81, lr: 4.49e-02, grad_scale: 8.0 2024-09-16 12:49:47,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.62 vs. limit=9.45 2024-09-16 12:49:47,694 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.69 vs. limit=5.65 2024-09-16 12:49:51,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=8.475 2024-09-16 12:49:54,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2600.0, ans=0.1025 2024-09-16 12:50:00,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=8.49 2024-09-16 12:50:04,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=2640.0, ans=0.8076000000000001 2024-09-16 12:50:13,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2640.0, ans=0.0406 2024-09-16 12:50:17,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.28 vs. limit=5.072 2024-09-16 12:50:22,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=8.504999999999999 2024-09-16 12:50:26,842 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.78 vs. 
limit=5.072 2024-09-16 12:50:30,394 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=8.504999999999999 2024-09-16 12:50:36,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2720.0, ans=0.3725 2024-09-16 12:50:46,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=12.80 vs. limit=8.52 2024-09-16 12:50:49,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2720.0, ans=0.3725 2024-09-16 12:50:51,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=9.57 2024-09-16 12:51:09,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=2760.0, ans=0.2224 2024-09-16 12:51:09,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.89 vs. limit=9.57 2024-09-16 12:51:12,598 INFO [train.py:1198] (0/2) Epoch 1, batch 700, loss[loss=0.7817, ctc_loss=0.8814, cr_loss=0.337, attn_decoder_loss=0.7631, over 29510.00 frames. ], tot_loss[loss=0.9086, ctc_loss=1.043, cr_loss=0.2797, attn_decoder_loss=0.8875, over 5639107.19 frames. ], batch size: 76, lr: 4.49e-02, grad_scale: 8.0 2024-09-16 12:51:14,379 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.893e+01 1.304e+02 1.539e+02 2.330e+02 9.417e+02, threshold=3.077e+02, percent-clipped=6.0 2024-09-16 12:51:17,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.22 vs. 
limit=9.6 2024-09-16 12:51:17,260 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=8.55 2024-09-16 12:51:24,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=6.54 vs. limit=5.12 2024-09-16 12:51:29,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=2840.0, ans=5.136 2024-09-16 12:51:43,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=8.565 2024-09-16 12:51:56,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=8.39 vs. limit=8.58 2024-09-16 12:52:07,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=2920.0, ans=0.36312500000000003 2024-09-16 12:52:15,036 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=8.595 2024-09-16 12:52:16,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=8.595 2024-09-16 12:52:20,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=9.69 2024-09-16 12:52:22,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.59 vs. 
limit=9.69 2024-09-16 12:52:32,230 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=10.05 vs. limit=8.61 2024-09-16 12:52:33,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2960.0, ans=0.36124999999999996 2024-09-16 12:52:34,118 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=11.60 vs. limit=9.72 2024-09-16 12:52:38,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2960.0, ans=0.5 2024-09-16 12:52:43,665 INFO [train.py:1198] (0/2) Epoch 1, batch 750, loss[loss=0.7794, ctc_loss=0.9116, cr_loss=0.3059, attn_decoder_loss=0.7579, over 29731.00 frames. ], tot_loss[loss=0.8792, ctc_loss=1.009, cr_loss=0.2929, attn_decoder_loss=0.8582, over 5677686.06 frames. ], batch size: 82, lr: 4.49e-02, grad_scale: 8.0 2024-09-16 12:52:50,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=8.625 2024-09-16 12:53:08,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3040.0, ans=0.09899494936611666 2024-09-16 12:53:09,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=8.64 2024-09-16 12:53:10,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3040.0, ans=0.2696 2024-09-16 12:53:11,119 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.49 vs. 
limit=8.64 2024-09-16 12:53:16,801 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=9.78 2024-09-16 12:53:23,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=9.81 2024-09-16 12:53:24,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3080.0, ans=0.2692 2024-09-16 12:53:27,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.64 vs. limit=5.232 2024-09-16 12:53:31,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=3080.0, ans=0.07675000000000001 2024-09-16 12:53:32,568 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=8.655 2024-09-16 12:53:44,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=8.67 2024-09-16 12:53:48,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.38 vs. limit=5.78 2024-09-16 12:53:51,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=5.248 2024-09-16 12:54:04,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.40 vs. 
limit=9.870000000000001 2024-09-16 12:54:07,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=8.685 2024-09-16 12:54:14,212 INFO [train.py:1198] (0/2) Epoch 1, batch 800, loss[loss=0.6445, ctc_loss=0.7679, cr_loss=0.3144, attn_decoder_loss=0.6239, over 29588.00 frames. ], tot_loss[loss=0.8448, ctc_loss=0.9749, cr_loss=0.3011, attn_decoder_loss=0.8237, over 5706838.24 frames. ], batch size: 73, lr: 4.49e-02, grad_scale: 16.0 2024-09-16 12:54:15,974 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.656e+02 2.537e+02 3.189e+02 4.432e+02 8.958e+02, threshold=6.378e+02, percent-clipped=52.0 2024-09-16 12:54:39,569 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=64.09 vs. limit=9.93 2024-09-16 12:54:50,401 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=3.492 2024-09-16 12:55:06,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3320.0, ans=0.344375 2024-09-16 12:55:13,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.56 vs. limit=6.66 2024-09-16 12:55:14,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=43.42 vs. limit=8.745000000000001 2024-09-16 12:55:43,350 INFO [train.py:1198] (0/2) Epoch 1, batch 850, loss[loss=0.7151, ctc_loss=0.8631, cr_loss=0.3715, attn_decoder_loss=0.6904, over 29691.00 frames. ], tot_loss[loss=0.8064, ctc_loss=0.9392, cr_loss=0.3067, attn_decoder_loss=0.7848, over 5736874.89 frames. 
], batch size: 89, lr: 4.49e-02, grad_scale: 16.0 2024-09-16 12:55:47,504 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.81 vs. limit=8.775 2024-09-16 12:56:04,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=3440.0, ans=0.7796000000000001 2024-09-16 12:56:07,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3440.0, ans=0.07099999999999998 2024-09-16 12:56:25,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.21 vs. limit=10.11 2024-09-16 12:56:35,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3520.0, ans=8.82 2024-09-16 12:56:56,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3560.0, ans=0.333125 2024-09-16 12:57:04,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3560.0, ans=0.26439999999999997 2024-09-16 12:57:11,501 INFO [train.py:1198] (0/2) Epoch 1, batch 900, loss[loss=0.6111, ctc_loss=0.7578, cr_loss=0.355, attn_decoder_loss=0.5869, over 29601.00 frames. ], tot_loss[loss=0.7675, ctc_loss=0.9037, cr_loss=0.3125, attn_decoder_loss=0.7454, over 5741832.46 frames. 
], batch size: 73, lr: 4.48e-02, grad_scale: 16.0 2024-09-16 12:57:13,149 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.606e+02 2.694e+02 3.422e+02 4.565e+02 1.517e+03, threshold=6.845e+02, percent-clipped=7.0 2024-09-16 12:57:18,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3600.0, ans=0.33125 2024-09-16 12:57:23,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3600.0, ans=0.254 2024-09-16 12:57:44,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=10.26 2024-09-16 12:57:45,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3680.0, ans=0.06199999999999997 2024-09-16 12:57:54,990 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=19.71 vs. limit=8.879999999999999 2024-09-16 12:58:06,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=3720.0, ans=0.2558 2024-09-16 12:58:15,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=8.895 2024-09-16 12:58:31,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3760.0, ans=0.26239999999999997 2024-09-16 12:58:36,751 INFO [train.py:1198] (0/2) Epoch 1, batch 950, loss[loss=0.5455, ctc_loss=0.686, cr_loss=0.3595, attn_decoder_loss=0.5219, over 29515.00 frames. ], tot_loss[loss=0.7297, ctc_loss=0.8691, cr_loss=0.3182, attn_decoder_loss=0.7071, over 5742077.41 frames. 
], batch size: 74, lr: 4.48e-02, grad_scale: 16.0 2024-09-16 12:58:45,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3800.0, ans=0.038125000000000006 2024-09-16 12:58:47,313 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=4.316e+01 2024-09-16 12:58:49,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3800.0, ans=0.767 2024-09-16 12:58:52,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3840.0, ans=0.055999999999999994 2024-09-16 12:59:00,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3840.0, ans=0.32 2024-09-16 12:59:10,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.44 vs. limit=8.955 2024-09-16 12:59:13,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3880.0, ans=10.41 2024-09-16 12:59:25,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.89 vs. limit=8.955 2024-09-16 12:59:52,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3960.0, ans=0.31437499999999996 2024-09-16 12:59:53,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=6.16 vs. 
limit=5.584 2024-09-16 12:59:54,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3960.0, ans=0.2104 2024-09-16 13:00:04,839 INFO [train.py:1198] (0/2) Epoch 1, batch 1000, loss[loss=0.521, ctc_loss=0.6487, cr_loss=0.3394, attn_decoder_loss=0.4993, over 29498.00 frames. ], tot_loss[loss=0.6954, ctc_loss=0.8361, cr_loss=0.3278, attn_decoder_loss=0.6725, over 5736726.55 frames. ], batch size: 77, lr: 4.48e-02, grad_scale: 8.0 2024-09-16 13:00:08,134 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.514e+02 2.266e+02 2.878e+02 3.816e+02 1.272e+03, threshold=5.756e+02, percent-clipped=5.0 2024-09-16 13:00:27,892 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=10.53 2024-09-16 13:00:29,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=10.53 2024-09-16 13:00:51,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4080.0, ans=0.07 2024-09-16 13:00:52,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4080.0, ans=0.0 2024-09-16 13:00:56,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4120.0, ans=0.306875 2024-09-16 13:01:01,758 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=3.618 2024-09-16 13:01:31,759 INFO [train.py:1198] (0/2) Epoch 1, batch 1050, loss[loss=0.5391, ctc_loss=0.6715, cr_loss=0.3807, attn_decoder_loss=0.5159, over 29683.00 frames. 
], tot_loss[loss=0.659, ctc_loss=0.7987, cr_loss=0.3373, attn_decoder_loss=0.636, over 5745240.66 frames. ], batch size: 85, lr: 4.48e-02, grad_scale: 8.0 2024-09-16 13:01:47,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4240.0, ans=0.049 2024-09-16 13:02:13,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4280.0, ans=0.04883333333333333 2024-09-16 13:02:26,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4320.0, ans=0.2568 2024-09-16 13:02:33,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=9.120000000000001 2024-09-16 13:02:36,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4320.0, ans=0.2568 2024-09-16 13:02:55,794 INFO [train.py:1198] (0/2) Epoch 1, batch 1100, loss[loss=0.5469, ctc_loss=0.6524, cr_loss=0.3124, attn_decoder_loss=0.5282, over 29443.00 frames. ], tot_loss[loss=0.627, ctc_loss=0.7634, cr_loss=0.3467, attn_decoder_loss=0.6042, over 5757793.12 frames. 
], batch size: 78, lr: 4.48e-02, grad_scale: 8.0 2024-09-16 13:02:59,006 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.372e+02 1.990e+02 2.415e+02 3.242e+02 8.137e+02, threshold=4.830e+02, percent-clipped=5.0 2024-09-16 13:03:09,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4400.0, ans=0.266 2024-09-16 13:03:38,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4480.0, ans=0.29000000000000004 2024-09-16 13:04:08,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4560.0, ans=0.04949747468305833 2024-09-16 13:04:09,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=10.92 2024-09-16 13:04:12,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4560.0, ans=0.04766666666666667 2024-09-16 13:04:17,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4560.0, ans=0.28625 2024-09-16 13:04:20,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4600.0, ans=0.009869565217391305 2024-09-16 13:04:21,898 INFO [train.py:1198] (0/2) Epoch 1, batch 1150, loss[loss=0.4932, ctc_loss=0.6142, cr_loss=0.3888, attn_decoder_loss=0.4712, over 29480.00 frames. ], tot_loss[loss=0.601, ctc_loss=0.7334, cr_loss=0.3553, attn_decoder_loss=0.5784, over 5755990.57 frames. 
], batch size: 78, lr: 4.47e-02, grad_scale: 8.0 2024-09-16 13:04:37,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4640.0, ans=0.2825 2024-09-16 13:04:59,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=11.01 2024-09-16 13:05:07,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4680.0, ans=0.280625 2024-09-16 13:05:15,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4720.0, ans=0.27875 2024-09-16 13:05:24,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=6.41 vs. limit=5.888 2024-09-16 13:05:47,983 INFO [train.py:1198] (0/2) Epoch 1, batch 1200, loss[loss=0.498, ctc_loss=0.5944, cr_loss=0.3762, attn_decoder_loss=0.479, over 29686.00 frames. ], tot_loss[loss=0.5781, ctc_loss=0.7058, cr_loss=0.3638, attn_decoder_loss=0.5558, over 5748996.29 frames. ], batch size: 85, lr: 4.47e-02, grad_scale: 16.0 2024-09-16 13:05:51,295 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.465e+02 1.904e+02 2.227e+02 2.860e+02 9.470e+02, threshold=4.454e+02, percent-clipped=3.0 2024-09-16 13:06:02,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.98 vs. 
limit=11.1 2024-09-16 13:06:05,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=4840.0, ans=0.2516 2024-09-16 13:06:23,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4880.0, ans=0.0 2024-09-16 13:06:24,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4880.0, ans=0.27125 2024-09-16 13:06:26,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4880.0, ans=0.04633333333333334 2024-09-16 13:06:29,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.20 vs. limit=5.0 2024-09-16 13:06:54,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4960.0, ans=0.046 2024-09-16 13:07:10,530 INFO [train.py:1198] (0/2) Epoch 1, batch 1250, loss[loss=0.4894, ctc_loss=0.5835, cr_loss=0.4242, attn_decoder_loss=0.4695, over 29527.00 frames. ], tot_loss[loss=0.5566, ctc_loss=0.6783, cr_loss=0.3726, attn_decoder_loss=0.5348, over 5776031.90 frames. ], batch size: 92, lr: 4.47e-02, grad_scale: 16.0 2024-09-16 13:07:23,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=5000.0, ans=11.25 2024-09-16 13:07:35,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5040.0, ans=0.26375000000000004 2024-09-16 13:07:44,444 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.96 vs. 
limit=11.31 2024-09-16 13:07:45,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=5080.0, ans=0.045500000000000006 2024-09-16 13:08:07,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5120.0, ans=0.04533333333333334 2024-09-16 13:08:30,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=5160.0, ans=0.009747826086956521 2024-09-16 13:08:34,875 INFO [train.py:1198] (0/2) Epoch 1, batch 1300, loss[loss=0.5028, ctc_loss=0.6052, cr_loss=0.4383, attn_decoder_loss=0.4817, over 28242.00 frames. ], tot_loss[loss=0.5369, ctc_loss=0.6519, cr_loss=0.3785, attn_decoder_loss=0.5157, over 5780082.60 frames. ], batch size: 111, lr: 4.47e-02, grad_scale: 16.0 2024-09-16 13:08:38,069 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.446e+02 1.796e+02 2.066e+02 2.551e+02 7.251e+02, threshold=4.131e+02, percent-clipped=4.0 2024-09-16 13:08:49,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.48 vs. limit=6.3 2024-09-16 13:08:51,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5240.0, ans=0.254375 2024-09-16 13:08:59,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.50 vs. limit=6.3100000000000005 2024-09-16 13:09:48,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.96 vs. 
limit=11.52 2024-09-16 13:09:49,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5360.0, ans=0.0 2024-09-16 13:09:53,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.73 vs. limit=6.34 2024-09-16 13:09:56,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5360.0, ans=0.24875000000000003 2024-09-16 13:09:59,471 INFO [train.py:1198] (0/2) Epoch 1, batch 1350, loss[loss=0.4506, ctc_loss=0.5237, cr_loss=0.4385, attn_decoder_loss=0.4327, over 29769.00 frames. ], tot_loss[loss=0.5194, ctc_loss=0.6272, cr_loss=0.3845, attn_decoder_loss=0.4989, over 5796444.63 frames. ], batch size: 81, lr: 4.46e-02, grad_scale: 16.0 2024-09-16 13:10:23,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5440.0, ans=0.009686956521739131 2024-09-16 13:10:45,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.16 vs. limit=11.61 2024-09-16 13:11:04,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5560.0, ans=0.2834 2024-09-16 13:11:06,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5560.0, ans=0.239375 2024-09-16 13:11:07,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=5560.0, ans=0.025 2024-09-16 13:11:21,354 INFO [train.py:1198] (0/2) Epoch 1, batch 1400, loss[loss=0.4132, ctc_loss=0.4784, cr_loss=0.3803, attn_decoder_loss=0.3975, over 29581.00 frames. 
], tot_loss[loss=0.5043, ctc_loss=0.6047, cr_loss=0.3896, attn_decoder_loss=0.4845, over 5807429.20 frames. ], batch size: 69, lr: 4.46e-02, grad_scale: 16.0 2024-09-16 13:11:24,541 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.351e+02 1.700e+02 1.984e+02 2.487e+02 6.195e+02, threshold=3.968e+02, percent-clipped=5.0 2024-09-16 13:11:24,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=5600.0, ans=0.025 2024-09-16 13:11:35,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.11 vs. limit=11.7 2024-09-16 13:11:38,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=11.73 2024-09-16 13:11:47,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=9.615 2024-09-16 13:12:08,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5680.0, ans=0.23375 2024-09-16 13:12:13,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=5720.0, ans=0.6998 2024-09-16 13:12:17,191 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:12:23,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.87 vs. 
limit=11.79 2024-09-16 13:12:31,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5760.0, ans=0.0 2024-09-16 13:12:38,547 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=11.82 2024-09-16 13:12:44,056 INFO [train.py:1198] (0/2) Epoch 1, batch 1450, loss[loss=0.4921, ctc_loss=0.5856, cr_loss=0.4361, attn_decoder_loss=0.472, over 29419.00 frames. ], tot_loss[loss=0.4929, ctc_loss=0.5873, cr_loss=0.3953, attn_decoder_loss=0.4737, over 5805179.68 frames. ], batch size: 94, lr: 4.46e-02, grad_scale: 16.0 2024-09-16 13:12:52,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=5800.0, ans=0.2 2024-09-16 13:12:54,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5800.0, ans=0.22812500000000002 2024-09-16 13:12:57,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. 
limit=6.45 2024-09-16 13:13:00,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=5840.0, ans=0.0635 2024-09-16 13:13:29,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=5880.0, ans=0.025 2024-09-16 13:13:47,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5960.0, ans=0.22062500000000002 2024-09-16 13:13:53,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5960.0, ans=0.6914 2024-09-16 13:13:56,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=5960.0, ans=0.00957391304347826 2024-09-16 13:14:06,645 INFO [train.py:1198] (0/2) Epoch 1, batch 1500, loss[loss=0.4626, ctc_loss=0.5422, cr_loss=0.4061, attn_decoder_loss=0.4448, over 29638.00 frames. ], tot_loss[loss=0.4816, ctc_loss=0.5695, cr_loss=0.4004, attn_decoder_loss=0.4629, over 5804353.82 frames. ], batch size: 86, lr: 4.46e-02, grad_scale: 16.0 2024-09-16 13:14:07,766 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=12.0 2024-09-16 13:14:09,809 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 1.657e+02 1.840e+02 2.318e+02 6.248e+02, threshold=3.680e+02, percent-clipped=4.0 2024-09-16 13:14:24,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=6040.0, ans=0.04949747468305833 2024-09-16 13:14:36,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.21 vs. 
limit=9.765 2024-09-16 13:14:37,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=6080.0, ans=0.21500000000000002 2024-09-16 13:14:41,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=6080.0, ans=0.21500000000000002 2024-09-16 13:14:54,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=11.98 vs. limit=12.09 2024-09-16 13:15:01,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=3.918 2024-09-16 13:15:04,743 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=12.09 2024-09-16 13:15:12,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=6160.0, ans=0.2924 2024-09-16 13:15:27,847 INFO [train.py:1198] (0/2) Epoch 1, batch 1550, loss[loss=0.4641, ctc_loss=0.5356, cr_loss=0.4579, attn_decoder_loss=0.446, over 29531.00 frames. ], tot_loss[loss=0.4727, ctc_loss=0.5548, cr_loss=0.4039, attn_decoder_loss=0.4546, over 5779811.56 frames. ], batch size: 90, lr: 4.45e-02, grad_scale: 16.0 2024-09-16 13:15:29,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=6200.0, ans=0.20937499999999998 2024-09-16 13:15:33,706 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.71 vs. 
limit=9.825 2024-09-16 13:15:52,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=6240.0, ans=0.04066666666666667 2024-09-16 13:16:03,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=6280.0, ans=0.6802 2024-09-16 13:16:05,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=6280.0, ans=0.0405 2024-09-16 13:16:08,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=6280.0, ans=0.0405 2024-09-16 13:16:24,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=6320.0, ans=0.20375 2024-09-16 13:16:26,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=6320.0, ans=0.0 2024-09-16 13:16:27,049 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.04 vs. limit=3.948 2024-09-16 13:16:28,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.30 vs. limit=9.870000000000001 2024-09-16 13:16:38,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.33 vs. limit=9.885 2024-09-16 13:16:42,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=6360.0, ans=0.20187500000000003 2024-09-16 13:16:50,864 INFO [train.py:1198] (0/2) Epoch 1, batch 1600, loss[loss=0.4429, ctc_loss=0.4985, cr_loss=0.4625, attn_decoder_loss=0.4264, over 29661.00 frames. 
], tot_loss[loss=0.4643, ctc_loss=0.5407, cr_loss=0.4076, attn_decoder_loss=0.4468, over 5760982.73 frames. ], batch size: 85, lr: 4.45e-02, grad_scale: 32.0 2024-09-16 13:16:53,976 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.353e+02 1.789e+02 2.003e+02 2.671e+02 7.111e+02, threshold=4.005e+02, percent-clipped=7.0 2024-09-16 13:16:55,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=6400.0, ans=0.2 2024-09-16 13:16:57,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=6400.0, ans=0.2 2024-09-16 13:16:58,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=6400.0, ans=0.236 2024-09-16 13:17:03,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=6400.0, ans=0.2 2024-09-16 13:17:20,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=6440.0, ans=0.009469565217391304 2024-09-16 13:17:23,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=6480.0, ans=0.025 2024-09-16 13:17:34,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=6480.0, ans=0.19624999999999998 2024-09-16 13:18:08,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=6560.0, ans=0.2344 2024-09-16 13:18:13,087 INFO [train.py:1198] (0/2) Epoch 1, batch 1650, loss[loss=0.4475, ctc_loss=0.5003, cr_loss=0.4588, attn_decoder_loss=0.4315, over 29704.00 frames. ], tot_loss[loss=0.4564, ctc_loss=0.527, cr_loss=0.4103, attn_decoder_loss=0.4395, over 5755977.87 frames. 
], batch size: 89, lr: 4.45e-02, grad_scale: 32.0 2024-09-16 13:18:16,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=6600.0, ans=0.03916666666666667 2024-09-16 13:18:26,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=6600.0, ans=0.669 2024-09-16 13:18:38,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=6640.0, ans=0.025 2024-09-16 13:19:17,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=6760.0, ans=0.038500000000000006 2024-09-16 13:19:20,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=6760.0, ans=0.6634 2024-09-16 13:19:25,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=6760.0, ans=8.379999999999999 2024-09-16 13:19:26,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=6760.0, ans=0.025 2024-09-16 13:19:31,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=6800.0, ans=0.23199999999999998 2024-09-16 13:19:32,657 INFO [train.py:1198] (0/2) Epoch 1, batch 1700, loss[loss=0.3626, ctc_loss=0.3917, cr_loss=0.3866, attn_decoder_loss=0.3508, over 29555.00 frames. ], tot_loss[loss=0.4476, ctc_loss=0.5114, cr_loss=0.4135, attn_decoder_loss=0.4313, over 5778848.91 frames. 
], batch size: 69, lr: 4.44e-02, grad_scale: 16.0 2024-09-16 13:19:37,455 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.203e+02 1.510e+02 1.749e+02 2.059e+02 5.300e+02, threshold=3.498e+02, percent-clipped=2.0 2024-09-16 13:19:52,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=6840.0, ans=0.009382608695652174 2024-09-16 13:19:53,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=6840.0, ans=0.179375 2024-09-16 13:20:04,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=6840.0, ans=0.04949747468305833 2024-09-16 13:20:10,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6880.0, ans=0.23120000000000002 2024-09-16 13:20:36,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=12.690000000000001 2024-09-16 13:20:44,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.35 vs. limit=8.48 2024-09-16 13:20:51,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6960.0, ans=0.2304 2024-09-16 13:20:54,500 INFO [train.py:1198] (0/2) Epoch 1, batch 1750, loss[loss=0.3629, ctc_loss=0.4045, cr_loss=0.3702, attn_decoder_loss=0.3501, over 29332.00 frames. ], tot_loss[loss=0.4402, ctc_loss=0.4982, cr_loss=0.4154, attn_decoder_loss=0.4245, over 5787860.01 frames. ], batch size: 67, lr: 4.44e-02, grad_scale: 16.0 2024-09-16 13:20:58,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.80 vs. 
limit=10.125 2024-09-16 13:21:53,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=7120.0, ans=0.009321739130434784 2024-09-16 13:21:59,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=7160.0, ans=0.2284 2024-09-16 13:22:16,321 INFO [train.py:1198] (0/2) Epoch 1, batch 1800, loss[loss=0.4156, ctc_loss=0.4391, cr_loss=0.4392, attn_decoder_loss=0.4033, over 29697.00 frames. ], tot_loss[loss=0.4347, ctc_loss=0.4877, cr_loss=0.4187, attn_decoder_loss=0.4195, over 5789992.94 frames. ], batch size: 83, lr: 4.44e-02, grad_scale: 16.0 2024-09-16 13:22:17,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.76 vs. limit=12.9 2024-09-16 13:22:21,106 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.247e+02 1.571e+02 1.759e+02 2.049e+02 3.849e+02, threshold=3.518e+02, percent-clipped=1.0 2024-09-16 13:22:25,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=12.9 2024-09-16 13:22:29,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=7200.0, ans=0.03666666666666667 2024-09-16 13:22:29,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=12.9 2024-09-16 13:22:56,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7280.0, ans=0.2272 2024-09-16 13:23:02,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.86 vs. 
limit=12.99 2024-09-16 13:23:04,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.09 vs. limit=10.245000000000001 2024-09-16 13:23:19,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=7360.0, ans=0.15500000000000003 2024-09-16 13:23:29,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.08 vs. limit=13.02 2024-09-16 13:23:35,125 INFO [train.py:1198] (0/2) Epoch 1, batch 1850, loss[loss=0.4266, ctc_loss=0.4562, cr_loss=0.4452, attn_decoder_loss=0.4134, over 29654.00 frames. ], tot_loss[loss=0.4286, ctc_loss=0.4763, cr_loss=0.4205, attn_decoder_loss=0.414, over 5796768.11 frames. ], batch size: 86, lr: 4.43e-02, grad_scale: 16.0 2024-09-16 13:23:38,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=7400.0, ans=0.009260869565217392 2024-09-16 13:23:40,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=7400.0, ans=0.025 2024-09-16 13:23:46,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=7400.0, ans=0.153125 2024-09-16 13:23:52,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=7440.0, ans=0.15125 2024-09-16 13:23:54,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=7440.0, ans=0.09899494936611666 2024-09-16 13:24:13,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=7480.0, ans=0.14937499999999998 2024-09-16 13:24:17,742 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=7480.0, ans=0.14937499999999998 2024-09-16 13:24:19,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=7480.0, ans=0.04949747468305833 2024-09-16 13:24:39,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=7560.0, ans=0.025 2024-09-16 13:24:53,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=7600.0, ans=0.025 2024-09-16 13:24:54,517 INFO [train.py:1198] (0/2) Epoch 1, batch 1900, loss[loss=0.4147, ctc_loss=0.439, cr_loss=0.4387, attn_decoder_loss=0.4022, over 29721.00 frames. ], tot_loss[loss=0.4249, ctc_loss=0.4683, cr_loss=0.4239, attn_decoder_loss=0.4107, over 5804683.10 frames. ], batch size: 89, lr: 4.43e-02, grad_scale: 16.0 2024-09-16 13:24:59,252 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.597e+02 1.785e+02 2.217e+02 4.479e+02, threshold=3.571e+02, percent-clipped=3.0 2024-09-16 13:25:07,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=13.2 2024-09-16 13:25:13,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=7640.0, ans=0.14187499999999997 2024-09-16 13:26:05,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=7760.0, ans=0.13624999999999998 2024-09-16 13:26:05,887 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.75 vs. 
limit=13.32 2024-09-16 13:26:14,696 INFO [train.py:1198] (0/2) Epoch 1, batch 1950, loss[loss=0.4166, ctc_loss=0.4456, cr_loss=0.4555, attn_decoder_loss=0.4033, over 29482.00 frames. ], tot_loss[loss=0.422, ctc_loss=0.4608, cr_loss=0.4279, attn_decoder_loss=0.4081, over 5819109.47 frames. ], batch size: 78, lr: 4.43e-02, grad_scale: 16.0 2024-09-16 13:26:19,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=7800.0, ans=0.627 2024-09-16 13:26:21,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=7800.0, ans=0.009173913043478261 2024-09-16 13:26:31,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.92 vs. limit=10.44 2024-09-16 13:26:54,639 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=13.41 2024-09-16 13:27:02,242 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=10.47 2024-09-16 13:27:22,494 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.00 vs. limit=6.99 2024-09-16 13:27:26,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=7960.0, ans=0.22039999999999998 2024-09-16 13:27:28,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=7960.0, ans=9.975 2024-09-16 13:27:33,411 INFO [train.py:1198] (0/2) Epoch 1, batch 2000, loss[loss=0.3476, ctc_loss=0.3658, cr_loss=0.3484, attn_decoder_loss=0.3378, over 29328.00 frames. 
], tot_loss[loss=0.4192, ctc_loss=0.4547, cr_loss=0.4296, attn_decoder_loss=0.4057, over 5797471.64 frames. ], batch size: 67, lr: 4.42e-02, grad_scale: 32.0 2024-09-16 13:27:38,129 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.451e+02 1.684e+02 2.248e+02 3.741e+02, threshold=3.368e+02, percent-clipped=1.0 2024-09-16 13:27:43,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=8000.0, ans=0.125 2024-09-16 13:28:01,628 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.28 vs. limit=10.515 2024-09-16 13:28:08,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=8080.0, ans=0.125 2024-09-16 13:28:12,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=8080.0, ans=0.6172 2024-09-16 13:28:12,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=8080.0, ans=0.125 2024-09-16 13:28:19,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=8080.0, ans=0.125 2024-09-16 13:28:24,840 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.64 vs. 
limit=9.059999999999999 2024-09-16 13:28:30,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=8120.0, ans=0.125 2024-09-16 13:28:41,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=8160.0, ans=0.125 2024-09-16 13:28:52,986 INFO [train.py:1198] (0/2) Epoch 1, batch 2050, loss[loss=0.368, ctc_loss=0.3769, cr_loss=0.4089, attn_decoder_loss=0.358, over 29435.00 frames. ], tot_loss[loss=0.415, ctc_loss=0.4467, cr_loss=0.4293, attn_decoder_loss=0.402, over 5789589.48 frames. ], batch size: 70, lr: 4.42e-02, grad_scale: 16.0 2024-09-16 13:28:59,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=8200.0, ans=0.125 2024-09-16 13:29:08,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=8240.0, ans=0.125 2024-09-16 13:29:08,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=8240.0, ans=0.125 2024-09-16 13:29:11,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=8240.0, ans=0.125 2024-09-16 13:29:41,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=8320.0, ans=0.125 2024-09-16 13:29:45,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=8320.0, ans=0.025 2024-09-16 13:29:55,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.32 vs. 
limit=9.18 2024-09-16 13:30:01,904 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.51 vs. limit=9.18 2024-09-16 13:30:10,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.89 vs. limit=13.77 2024-09-16 13:30:10,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=8400.0, ans=0.125 2024-09-16 13:30:12,267 INFO [train.py:1198] (0/2) Epoch 1, batch 2100, loss[loss=0.3875, ctc_loss=0.3978, cr_loss=0.4165, attn_decoder_loss=0.3771, over 29795.00 frames. ], tot_loss[loss=0.4109, ctc_loss=0.4391, cr_loss=0.4306, attn_decoder_loss=0.3982, over 5801770.81 frames. ], batch size: 81, lr: 4.42e-02, grad_scale: 16.0 2024-09-16 13:30:16,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.61 vs. limit=13.8 2024-09-16 13:30:18,321 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.222e+02 1.523e+02 1.725e+02 2.064e+02 6.365e+02, threshold=3.449e+02, percent-clipped=2.0 2024-09-16 13:30:35,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=8440.0, ans=0.125 2024-09-16 13:30:38,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=8440.0, ans=0.009034782608695653 2024-09-16 13:30:45,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=10.68 2024-09-16 13:31:16,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.14 vs. 
limit=10.71 2024-09-16 13:31:17,483 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:31:19,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=8560.0, ans=0.125 2024-09-16 13:31:29,553 INFO [train.py:1198] (0/2) Epoch 1, batch 2150, loss[loss=0.394, ctc_loss=0.4058, cr_loss=0.4266, attn_decoder_loss=0.3832, over 29461.00 frames. ], tot_loss[loss=0.4068, ctc_loss=0.4317, cr_loss=0.4312, attn_decoder_loss=0.3944, over 5816398.39 frames. ], batch size: 78, lr: 4.41e-02, grad_scale: 16.0 2024-09-16 13:31:29,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=8600.0, ans=0.009000000000000001 2024-09-16 13:31:34,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=8600.0, ans=0.0 2024-09-16 13:31:42,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=8600.0, ans=0.030833333333333338 2024-09-16 13:31:56,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=8640.0, ans=0.125 2024-09-16 13:32:06,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=8680.0, ans=0.008982608695652174 2024-09-16 13:32:09,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=8680.0, ans=0.125 2024-09-16 13:32:26,807 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.08 vs. 
limit=10.77 2024-09-16 13:32:38,699 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:32:43,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=10.785 2024-09-16 13:32:49,790 INFO [train.py:1198] (0/2) Epoch 1, batch 2200, loss[loss=0.4029, ctc_loss=0.4048, cr_loss=0.4228, attn_decoder_loss=0.3933, over 29615.00 frames. ], tot_loss[loss=0.4045, ctc_loss=0.4266, cr_loss=0.4323, attn_decoder_loss=0.3924, over 5813094.68 frames. ], batch size: 86, lr: 4.41e-02, grad_scale: 16.0 2024-09-16 13:32:55,854 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 1.455e+02 1.695e+02 2.050e+02 4.766e+02, threshold=3.390e+02, percent-clipped=3.0 2024-09-16 13:33:00,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=8800.0, ans=0.008956521739130436 2024-09-16 13:33:16,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=8840.0, ans=0.125 2024-09-16 13:33:17,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=8840.0, ans=0.2116 2024-09-16 13:33:55,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8960.0, ans=0.2104 2024-09-16 13:33:56,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=8960.0, ans=0.5864 2024-09-16 13:33:59,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=8960.0, ans=0.125 2024-09-16 13:34:01,083 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=8960.0, ans=0.5864 2024-09-16 13:34:09,150 INFO [train.py:1198] (0/2) Epoch 1, batch 2250, loss[loss=0.4089, ctc_loss=0.4177, cr_loss=0.4708, attn_decoder_loss=0.3974, over 29707.00 frames. ], tot_loss[loss=0.402, ctc_loss=0.4211, cr_loss=0.4331, attn_decoder_loss=0.3903, over 5811389.82 frames. ], batch size: 82, lr: 4.40e-02, grad_scale: 16.0 2024-09-16 13:34:13,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=9000.0, ans=0.02916666666666667 2024-09-16 13:34:34,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=9040.0, ans=0.5836 2024-09-16 13:34:36,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=9040.0, ans=0.029 2024-09-16 13:34:56,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.84 vs. limit=9.559999999999999 2024-09-16 13:35:23,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=9160.0, ans=0.008878260869565217 2024-09-16 13:35:26,223 INFO [train.py:1198] (0/2) Epoch 1, batch 2300, loss[loss=0.3703, ctc_loss=0.3731, cr_loss=0.4347, attn_decoder_loss=0.3604, over 29298.00 frames. ], tot_loss[loss=0.398, ctc_loss=0.4143, cr_loss=0.4329, attn_decoder_loss=0.3866, over 5798473.97 frames. 
], batch size: 71, lr: 4.40e-02, grad_scale: 16.0 2024-09-16 13:35:26,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=9200.0, ans=0.20800000000000002 2024-09-16 13:35:31,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=9200.0, ans=0.125 2024-09-16 13:35:32,248 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.236e+02 1.494e+02 1.712e+02 1.992e+02 4.170e+02, threshold=3.424e+02, percent-clipped=4.0 2024-09-16 13:35:53,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.92 vs. limit=10.965 2024-09-16 13:36:15,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.88 vs. limit=9.66 2024-09-16 13:36:17,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=9320.0, ans=0.1568 2024-09-16 13:36:23,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=24.39 vs. limit=14.49 2024-09-16 13:36:27,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.92 vs. limit=14.49 2024-09-16 13:36:45,226 INFO [train.py:1198] (0/2) Epoch 1, batch 2350, loss[loss=0.4071, ctc_loss=0.4221, cr_loss=0.4346, attn_decoder_loss=0.3958, over 29701.00 frames. ], tot_loss[loss=0.3958, ctc_loss=0.4096, cr_loss=0.4339, attn_decoder_loss=0.3847, over 5803444.36 frames. 
], batch size: 83, lr: 4.40e-02, grad_scale: 16.0 2024-09-16 13:36:51,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=9400.0, ans=0.027500000000000004 2024-09-16 13:36:53,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=9400.0, ans=0.09899494936611666 2024-09-16 13:37:04,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=14.58 2024-09-16 13:37:06,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9440.0, ans=0.2056 2024-09-16 13:37:25,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=9480.0, ans=0.125 2024-09-16 13:37:52,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.50 vs. limit=14.67 2024-09-16 13:38:02,860 INFO [train.py:1198] (0/2) Epoch 1, batch 2400, loss[loss=0.3511, ctc_loss=0.3475, cr_loss=0.4182, attn_decoder_loss=0.3422, over 29532.00 frames. ], tot_loss[loss=0.3944, ctc_loss=0.406, cr_loss=0.4353, attn_decoder_loss=0.3834, over 5807653.97 frames. ], batch size: 76, lr: 4.39e-02, grad_scale: 32.0 2024-09-16 13:38:08,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9600.0, ans=0.125 2024-09-16 13:38:08,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.34 vs. 
limit=9.8 2024-09-16 13:38:10,878 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.192e+02 1.445e+02 1.624e+02 1.930e+02 3.418e+02, threshold=3.248e+02, percent-clipped=0.0 2024-09-16 13:38:14,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9600.0, ans=0.20400000000000001 2024-09-16 13:38:17,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9600.0, ans=0.125 2024-09-16 13:38:26,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=9640.0, ans=0.026500000000000003 2024-09-16 13:38:28,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=9640.0, ans=10.0 2024-09-16 13:38:35,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.66 vs. 
limit=11.129999999999999 2024-09-16 13:38:39,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=9680.0, ans=0.025 2024-09-16 13:38:39,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=9680.0, ans=0.125 2024-09-16 13:38:48,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=9680.0, ans=0.5612 2024-09-16 13:38:56,389 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:39:04,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=9720.0, ans=11.145 2024-09-16 13:39:11,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=9760.0, ans=0.026000000000000002 2024-09-16 13:39:22,236 INFO [train.py:1198] (0/2) Epoch 1, batch 2450, loss[loss=0.4058, ctc_loss=0.4125, cr_loss=0.4606, attn_decoder_loss=0.3948, over 29700.00 frames. ], tot_loss[loss=0.3951, ctc_loss=0.4055, cr_loss=0.4372, attn_decoder_loss=0.3842, over 5783295.83 frames. ], batch size: 82, lr: 4.39e-02, grad_scale: 16.0 2024-09-16 13:40:04,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9880.0, ans=0.125 2024-09-16 13:40:06,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=11.205 2024-09-16 13:40:06,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.82 vs. 
limit=7.952 2024-09-16 13:40:18,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=9920.0, ans=11.22 2024-09-16 13:40:19,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=9920.0, ans=0.125 2024-09-16 13:40:34,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.71 vs. limit=4.494 2024-09-16 13:40:41,274 INFO [train.py:1198] (0/2) Epoch 1, batch 2500, loss[loss=0.3965, ctc_loss=0.3885, cr_loss=0.4856, attn_decoder_loss=0.3866, over 29609.00 frames. ], tot_loss[loss=0.3923, ctc_loss=0.4002, cr_loss=0.4385, attn_decoder_loss=0.3817, over 5793886.11 frames. ], batch size: 86, lr: 4.38e-02, grad_scale: 16.0 2024-09-16 13:40:43,680 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.17 vs. limit=4.5 2024-09-16 13:40:46,575 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.37 vs. 
limit=11.25 2024-09-16 13:40:47,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=10000.0, ans=0.125 2024-09-16 13:40:48,949 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.428e+02 1.613e+02 1.938e+02 4.379e+02, threshold=3.227e+02, percent-clipped=3.0 2024-09-16 13:41:08,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=10040.0, ans=0.1996 2024-09-16 13:41:15,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=10080.0, ans=0.02 2024-09-16 13:41:28,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=10120.0, ans=0.008669565217391305 2024-09-16 13:41:34,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=10120.0, ans=0.5458000000000001 2024-09-16 13:41:36,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=10120.0, ans=0.008669565217391305 2024-09-16 13:41:37,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=10120.0, ans=0.008669565217391305 2024-09-16 13:42:01,547 INFO [train.py:1198] (0/2) Epoch 1, batch 2550, loss[loss=0.3372, ctc_loss=0.326, cr_loss=0.4025, attn_decoder_loss=0.3295, over 29361.00 frames. ], tot_loss[loss=0.3898, ctc_loss=0.3954, cr_loss=0.4382, attn_decoder_loss=0.3795, over 5796611.75 frames. 
], batch size: 67, lr: 4.38e-02, grad_scale: 16.0 2024-09-16 13:42:17,318 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:42:52,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=11.370000000000001 2024-09-16 13:42:53,707 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.55 vs. limit=15.24 2024-09-16 13:43:05,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=10360.0, ans=0.125 2024-09-16 13:43:07,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=10360.0, ans=0.125 2024-09-16 13:43:08,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=10360.0, ans=0.023500000000000004 2024-09-16 13:43:19,971 INFO [train.py:1198] (0/2) Epoch 1, batch 2600, loss[loss=0.365, ctc_loss=0.3506, cr_loss=0.4333, attn_decoder_loss=0.3569, over 29426.00 frames. ], tot_loss[loss=0.3883, ctc_loss=0.3915, cr_loss=0.4387, attn_decoder_loss=0.3782, over 5794113.80 frames. 
], batch size: 78, lr: 4.37e-02, grad_scale: 16.0 2024-09-16 13:43:21,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=10400.0, ans=0.023333333333333334 2024-09-16 13:43:29,539 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.430e+02 1.543e+02 1.954e+02 3.702e+02, threshold=3.087e+02, percent-clipped=5.0 2024-09-16 13:43:29,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=10400.0, ans=0.125 2024-09-16 13:43:30,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=11.4 2024-09-16 13:43:35,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.54 vs. limit=10.2 2024-09-16 13:43:46,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=10440.0, ans=0.125 2024-09-16 13:43:49,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=10440.0, ans=0.125 2024-09-16 13:44:04,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=10480.0, ans=0.125 2024-09-16 13:44:32,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=10560.0, ans=0.02266666666666667 2024-09-16 13:44:32,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=10560.0, ans=0.02266666666666667 2024-09-16 13:44:38,232 INFO [train.py:1198] (0/2) Epoch 1, batch 2650, loss[loss=0.3933, ctc_loss=0.3927, cr_loss=0.4503, attn_decoder_loss=0.3833, over 29336.00 frames. 
], tot_loss[loss=0.387, ctc_loss=0.3885, cr_loss=0.4392, attn_decoder_loss=0.3771, over 5800734.95 frames. ], batch size: 100, lr: 4.37e-02, grad_scale: 16.0 2024-09-16 13:44:58,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=10640.0, ans=0.5276000000000001 2024-09-16 13:45:07,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=11.49 2024-09-16 13:45:28,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=10720.0, ans=0.125 2024-09-16 13:45:39,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.85 vs. limit=15.57 2024-09-16 13:45:48,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=10760.0, ans=0.021833333333333337 2024-09-16 13:45:49,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=10760.0, ans=0.125 2024-09-16 13:45:57,785 INFO [train.py:1198] (0/2) Epoch 1, batch 2700, loss[loss=0.4015, ctc_loss=0.3935, cr_loss=0.4598, attn_decoder_loss=0.3922, over 29511.00 frames. ], tot_loss[loss=0.3864, ctc_loss=0.3863, cr_loss=0.4408, attn_decoder_loss=0.3766, over 5796894.82 frames. ], batch size: 87, lr: 4.36e-02, grad_scale: 16.0 2024-09-16 13:46:01,781 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.74 vs. 
limit=15.6 2024-09-16 13:46:05,442 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 1.417e+02 1.675e+02 2.035e+02 4.386e+02, threshold=3.351e+02, percent-clipped=4.0 2024-09-16 13:46:13,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=10840.0, ans=0.021500000000000002 2024-09-16 13:46:26,324 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.63 2024-09-16 13:46:28,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=10880.0, ans=0.125 2024-09-16 13:46:34,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=11.58 2024-09-16 13:46:41,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=10880.0, ans=0.125 2024-09-16 13:47:17,504 INFO [train.py:1198] (0/2) Epoch 1, batch 2750, loss[loss=0.3506, ctc_loss=0.3514, cr_loss=0.4193, attn_decoder_loss=0.3412, over 29501.00 frames. ], tot_loss[loss=0.3831, ctc_loss=0.3813, cr_loss=0.4401, attn_decoder_loss=0.3735, over 5796533.77 frames. ], batch size: 75, lr: 4.36e-02, grad_scale: 16.0 2024-09-16 13:47:20,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=11000.0, ans=0.125 2024-09-16 13:47:52,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=11080.0, ans=0.125 2024-09-16 13:47:54,955 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.42 vs. 
limit=15.81 2024-09-16 13:47:55,043 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.98 vs. limit=11.655000000000001 2024-09-16 13:47:57,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=11080.0, ans=0.125 2024-09-16 13:48:13,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=11120.0, ans=0.1888 2024-09-16 13:48:13,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11120.0, ans=0.1888 2024-09-16 13:48:14,698 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:48:18,831 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=11.684999999999999 2024-09-16 13:48:30,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11160.0, ans=0.18839999999999998 2024-09-16 13:48:35,663 INFO [train.py:1198] (0/2) Epoch 1, batch 2800, loss[loss=0.4267, ctc_loss=0.445, cr_loss=0.462, attn_decoder_loss=0.4144, over 20050.00 frames. ], tot_loss[loss=0.3825, ctc_loss=0.3798, cr_loss=0.4405, attn_decoder_loss=0.3731, over 5776526.34 frames. 
], batch size: 209, lr: 4.36e-02, grad_scale: 32.0 2024-09-16 13:48:39,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=11200.0, ans=15.9 2024-09-16 13:48:43,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.389e+02 1.617e+02 2.129e+02 5.220e+02, threshold=3.235e+02, percent-clipped=5.0 2024-09-16 13:49:09,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=11280.0, ans=0.008417391304347826 2024-09-16 13:49:14,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=11280.0, ans=0.125 2024-09-16 13:49:18,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=11280.0, ans=0.0 2024-09-16 13:49:31,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=11320.0, ans=0.008408695652173913 2024-09-16 13:49:54,514 INFO [train.py:1198] (0/2) Epoch 1, batch 2850, loss[loss=0.3485, ctc_loss=0.3314, cr_loss=0.4234, attn_decoder_loss=0.341, over 29514.00 frames. ], tot_loss[loss=0.3819, ctc_loss=0.3781, cr_loss=0.4413, attn_decoder_loss=0.3725, over 5763794.66 frames. 
], batch size: 77, lr: 4.35e-02, grad_scale: 32.0 2024-09-16 13:50:13,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=11440.0, ans=0.49960000000000004 2024-09-16 13:50:16,551 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 13:50:27,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11480.0, ans=0.18519999999999998 2024-09-16 13:50:27,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=11.805 2024-09-16 13:50:28,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=11480.0, ans=0.008373913043478261 2024-09-16 13:50:31,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11480.0, ans=0.18519999999999998 2024-09-16 13:50:36,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=11480.0, ans=0.125 2024-09-16 13:50:47,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.12 vs. limit=7.88 2024-09-16 13:50:49,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.79 vs. limit=11.82 2024-09-16 13:51:13,785 INFO [train.py:1198] (0/2) Epoch 1, batch 2900, loss[loss=0.3713, ctc_loss=0.3634, cr_loss=0.4603, attn_decoder_loss=0.3619, over 29441.00 frames. ], tot_loss[loss=0.3817, ctc_loss=0.3763, cr_loss=0.4428, attn_decoder_loss=0.3725, over 5788267.97 frames. 
], batch size: 79, lr: 4.35e-02, grad_scale: 16.0 2024-09-16 13:51:22,927 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.352e+02 1.492e+02 1.728e+02 4.022e+02, threshold=2.985e+02, percent-clipped=1.0 2024-09-16 13:51:28,398 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=4.746 2024-09-16 13:51:31,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=11640.0, ans=0.4926000000000001 2024-09-16 13:51:44,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=11680.0, ans=0.125 2024-09-16 13:51:58,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=11720.0, ans=0.125 2024-09-16 13:52:08,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.66 vs. limit=16.29 2024-09-16 13:52:11,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=11720.0, ans=0.05 2024-09-16 13:52:17,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=11760.0, ans=0.125 2024-09-16 13:52:17,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11760.0, ans=0.1824 2024-09-16 13:52:30,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.01 vs. 
limit=8.719999999999999 2024-09-16 13:52:30,705 INFO [train.py:1198] (0/2) Epoch 1, batch 2950, loss[loss=0.3615, ctc_loss=0.3459, cr_loss=0.4431, attn_decoder_loss=0.3534, over 29510.00 frames. ], tot_loss[loss=0.3784, ctc_loss=0.3716, cr_loss=0.4404, attn_decoder_loss=0.3694, over 5782927.42 frames. ], batch size: 75, lr: 4.34e-02, grad_scale: 16.0 2024-09-16 13:52:40,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.97 vs. limit=16.35 2024-09-16 13:52:55,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=11840.0, ans=0.01733333333333334 2024-09-16 13:53:04,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=4.782 2024-09-16 13:53:04,256 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=11.955 2024-09-16 13:53:15,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=11920.0, ans=0.4828 2024-09-16 13:53:18,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=11920.0, ans=0.017 2024-09-16 13:53:34,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=11960.0, ans=0.0 2024-09-16 13:53:50,812 INFO [train.py:1198] (0/2) Epoch 1, batch 3000, loss[loss=0.3696, ctc_loss=0.3511, cr_loss=0.4192, attn_decoder_loss=0.3623, over 29753.00 frames. ], tot_loss[loss=0.3774, ctc_loss=0.3694, cr_loss=0.4412, attn_decoder_loss=0.3685, over 5784371.32 frames. 
], batch size: 81, lr: 4.34e-02, grad_scale: 16.0
2024-09-16 13:53:50,813 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 13:54:09,120 INFO [train.py:1230] (0/2) Epoch 1, validation: loss=0.2655, ctc_loss=0.1548, cr_loss=4.113e-15, attn_decoder_loss=0.2778, over 944034.00 frames.
2024-09-16 13:54:09,121 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-16 13:54:09,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=12000.0, ans=0.48000000000000004
2024-09-16 13:54:11,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=12000.0, ans=0.00826086956521739
2024-09-16 13:54:15,825 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-16 13:54:18,437 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.470e+02 1.654e+02 2.017e+02 3.240e+02, threshold=3.308e+02, percent-clipped=3.0
2024-09-16 13:54:31,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=12040.0, ans=0.125
2024-09-16 13:54:38,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=12080.0, ans=0.008243478260869566
2024-09-16 13:54:47,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=12080.0, ans=0.125
2024-09-16 13:54:54,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=12080.0, ans=0.008243478260869566
2024-09-16 13:55:17,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=12160.0, ans=0.125
2024-09-16 13:55:21,584 INFO [scaling.py:1024] (0/2) Whitening:
name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.08 vs. limit=12.059999999999999
2024-09-16 13:55:28,459 INFO [train.py:1198] (0/2) Epoch 1, batch 3050, loss[loss=0.3693, ctc_loss=0.3563, cr_loss=0.4395, attn_decoder_loss=0.361, over 29534.00 frames. ], tot_loss[loss=0.3768, ctc_loss=0.3677, cr_loss=0.4417, attn_decoder_loss=0.368, over 5779890.45 frames. ], batch size: 76, lr: 4.33e-02, grad_scale: 16.0
2024-09-16 13:55:36,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=12200.0, ans=0.008217391304347826
2024-09-16 13:55:45,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=12240.0, ans=0.008208695652173914
2024-09-16 13:56:29,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=12360.0, ans=0.07
2024-09-16 13:56:45,424 INFO [train.py:1198] (0/2) Epoch 1, batch 3100, loss[loss=0.3977, ctc_loss=0.3863, cr_loss=0.4856, attn_decoder_loss=0.3882, over 29297.00 frames. ], tot_loss[loss=0.3753, ctc_loss=0.3649, cr_loss=0.4411, attn_decoder_loss=0.3667, over 5779631.18 frames. ], batch size: 100, lr: 4.33e-02, grad_scale: 16.0
2024-09-16 13:56:54,618 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.036e+02 1.315e+02 1.501e+02 1.811e+02 4.491e+02, threshold=3.002e+02, percent-clipped=4.0
2024-09-16 13:57:09,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.94 vs. limit=11.219999999999999
2024-09-16 13:57:18,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.15 vs.
limit=16.86
2024-09-16 13:57:24,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=12480.0, ans=0.125
2024-09-16 13:57:43,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.65 vs. limit=16.89
2024-09-16 13:57:45,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=12520.0, ans=0.125
2024-09-16 13:57:51,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=12560.0, ans=0.0
2024-09-16 13:57:54,633 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=4.884
2024-09-16 13:58:02,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=12560.0, ans=0.125
2024-09-16 13:58:04,869 INFO [train.py:1198] (0/2) Epoch 1, batch 3150, loss[loss=0.3875, ctc_loss=0.379, cr_loss=0.4516, attn_decoder_loss=0.3784, over 28928.00 frames. ], tot_loss[loss=0.3742, ctc_loss=0.3625, cr_loss=0.4417, attn_decoder_loss=0.3657, over 5784319.00 frames. ], batch size: 104, lr: 4.32e-02, grad_scale: 16.0
2024-09-16 13:58:21,076 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.08 vs. limit=16.98
2024-09-16 13:58:27,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.72 vs.
limit=12.24
2024-09-16 13:58:44,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=12680.0, ans=0.125
2024-09-16 13:58:46,460 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.22 vs. limit=17.009999999999998
2024-09-16 13:59:04,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=12720.0, ans=0.013666666666666667
2024-09-16 13:59:19,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=12760.0, ans=0.05
2024-09-16 13:59:24,639 INFO [train.py:1198] (0/2) Epoch 1, batch 3200, loss[loss=0.3648, ctc_loss=0.3402, cr_loss=0.4376, attn_decoder_loss=0.3578, over 29413.00 frames. ], tot_loss[loss=0.3725, ctc_loss=0.3596, cr_loss=0.4416, attn_decoder_loss=0.3641, over 5794778.73 frames. ], batch size: 79, lr: 4.32e-02, grad_scale: 32.0
2024-09-16 13:59:27,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.39 vs. limit=17.1
2024-09-16 13:59:33,918 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.352e+02 1.572e+02 1.941e+02 4.814e+02, threshold=3.143e+02, percent-clipped=7.0
2024-09-16 13:59:48,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.69 vs.
limit=17.130000000000003
2024-09-16 14:00:05,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=12880.0, ans=0.125
2024-09-16 14:00:11,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=12920.0, ans=0.125
2024-09-16 14:00:14,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=12920.0, ans=0.012833333333333335
2024-09-16 14:00:42,053 INFO [train.py:1198] (0/2) Epoch 1, batch 3250, loss[loss=0.3595, ctc_loss=0.3315, cr_loss=0.436, attn_decoder_loss=0.353, over 29695.00 frames. ], tot_loss[loss=0.3721, ctc_loss=0.3579, cr_loss=0.4427, attn_decoder_loss=0.3638, over 5801284.11 frames. ], batch size: 84, lr: 4.31e-02, grad_scale: 32.0
2024-09-16 14:00:49,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.61 vs. limit=17.25
2024-09-16 14:00:51,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=13000.0, ans=0.125
2024-09-16 14:00:57,037 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.13 vs. limit=17.28
2024-09-16 14:00:57,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=13040.0, ans=0.1696
2024-09-16 14:01:04,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=17.09 vs. limit=12.39
2024-09-16 14:01:48,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.71 vs.
limit=17.369999999999997
2024-09-16 14:01:54,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=12.434999999999999
2024-09-16 14:02:01,650 INFO [train.py:1198] (0/2) Epoch 1, batch 3300, loss[loss=0.3864, ctc_loss=0.3744, cr_loss=0.4468, attn_decoder_loss=0.3779, over 28598.00 frames. ], tot_loss[loss=0.37, ctc_loss=0.3554, cr_loss=0.4409, attn_decoder_loss=0.3618, over 5798317.42 frames. ], batch size: 112, lr: 4.31e-02, grad_scale: 16.0
2024-09-16 14:02:12,360 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.388e+02 1.553e+02 1.864e+02 4.414e+02, threshold=3.106e+02, percent-clipped=4.0
2024-09-16 14:02:36,339 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 14:02:45,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=13280.0, ans=0.125
2024-09-16 14:02:45,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=13280.0, ans=0.43520000000000003
2024-09-16 14:03:20,270 INFO [train.py:1198] (0/2) Epoch 1, batch 3350, loss[loss=0.3901, ctc_loss=0.3705, cr_loss=0.4856, attn_decoder_loss=0.3815, over 28840.00 frames. ], tot_loss[loss=0.37, ctc_loss=0.3547, cr_loss=0.4409, attn_decoder_loss=0.3619, over 5772863.26 frames.
], batch size: 104, lr: 4.30e-02, grad_scale: 16.0
2024-09-16 14:03:25,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=13400.0, ans=0.125
2024-09-16 14:03:36,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=13440.0, ans=0.2
2024-09-16 14:04:07,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=13520.0, ans=0.4268
2024-09-16 14:04:19,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=13520.0, ans=0.025
2024-09-16 14:04:19,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.36 vs. limit=11.76
2024-09-16 14:04:30,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=15.39 vs. limit=12.585
2024-09-16 14:04:35,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.50 vs. limit=12.585
2024-09-16 14:04:38,390 INFO [train.py:1198] (0/2) Epoch 1, batch 3400, loss[loss=0.3278, ctc_loss=0.306, cr_loss=0.4386, attn_decoder_loss=0.3205, over 29312.00 frames. ], tot_loss[loss=0.3685, ctc_loss=0.3522, cr_loss=0.4407, attn_decoder_loss=0.3605, over 5765995.42 frames. ], batch size: 67, lr: 4.29e-02, grad_scale: 16.0
2024-09-16 14:04:49,180 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.397e+02 1.601e+02 1.904e+02 5.092e+02, threshold=3.203e+02, percent-clipped=2.0
2024-09-16 14:04:49,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=49.09 vs.
limit=17.7
2024-09-16 14:05:12,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=12.629999999999999
2024-09-16 14:05:17,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=13680.0, ans=0.007895652173913043
2024-09-16 14:05:24,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=5.058
2024-09-16 14:05:28,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=13720.0, ans=0.009500000000000001
2024-09-16 14:05:29,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.52 vs. limit=12.645
2024-09-16 14:05:34,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=13720.0, ans=0.125
2024-09-16 14:05:36,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13720.0, ans=0.1628
2024-09-16 14:05:42,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=13760.0, ans=0.125
2024-09-16 14:05:51,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13760.0, ans=0.16240000000000002
2024-09-16 14:05:53,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=13760.0, ans=0.125
2024-09-16 14:05:55,384 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.65 vs.
limit=17.82
2024-09-16 14:05:57,557 INFO [train.py:1198] (0/2) Epoch 1, batch 3450, loss[loss=0.3907, ctc_loss=0.3687, cr_loss=0.4695, attn_decoder_loss=0.3827, over 28315.00 frames. ], tot_loss[loss=0.3681, ctc_loss=0.3504, cr_loss=0.4413, attn_decoder_loss=0.3603, over 5773686.90 frames. ], batch size: 111, lr: 4.29e-02, grad_scale: 16.0
2024-09-16 14:06:02,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=13800.0, ans=0.025
2024-09-16 14:06:11,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=13840.0, ans=0.04949747468305833
2024-09-16 14:06:36,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.23 vs. limit=12.705
2024-09-16 14:06:42,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=5.082
2024-09-16 14:06:46,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=13920.0, ans=0.125
2024-09-16 14:07:16,931 INFO [train.py:1198] (0/2) Epoch 1, batch 3500, loss[loss=0.3396, ctc_loss=0.3184, cr_loss=0.4222, attn_decoder_loss=0.3326, over 29322.00 frames. ], tot_loss[loss=0.3667, ctc_loss=0.3482, cr_loss=0.4407, attn_decoder_loss=0.3589, over 5775191.89 frames.
], batch size: 71, lr: 4.28e-02, grad_scale: 16.0
2024-09-16 14:07:23,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14000.0, ans=0.16
2024-09-16 14:07:27,726 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.349e+02 1.530e+02 1.819e+02 5.462e+02, threshold=3.060e+02, percent-clipped=1.0
2024-09-16 14:07:55,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=14080.0, ans=0.4072
2024-09-16 14:08:03,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=14120.0, ans=0.025
2024-09-16 14:08:06,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=14120.0, ans=0.05
2024-09-16 14:08:18,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=14160.0, ans=0.007666666666666669
2024-09-16 14:08:32,831 INFO [train.py:1198] (0/2) Epoch 1, batch 3550, loss[loss=0.3711, ctc_loss=0.3418, cr_loss=0.4395, attn_decoder_loss=0.3646, over 29726.00 frames. ], tot_loss[loss=0.3657, ctc_loss=0.3462, cr_loss=0.4406, attn_decoder_loss=0.3581, over 5783007.99 frames. ], batch size: 89, lr: 4.28e-02, grad_scale: 16.0
2024-09-16 14:08:39,533 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=18.90 vs.
limit=12.825
2024-09-16 14:08:51,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=14240.0, ans=0.125
2024-09-16 14:09:01,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=14280.0, ans=0.125
2024-09-16 14:09:10,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=14280.0, ans=0.125
2024-09-16 14:09:45,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=14360.0, ans=0.007747826086956521
2024-09-16 14:09:49,096 INFO [train.py:1198] (0/2) Epoch 1, batch 3600, loss[loss=0.3578, ctc_loss=0.3388, cr_loss=0.4385, attn_decoder_loss=0.3501, over 29507.00 frames. ], tot_loss[loss=0.3649, ctc_loss=0.3449, cr_loss=0.4413, attn_decoder_loss=0.3573, over 5792492.72 frames. ], batch size: 77, lr: 4.27e-02, grad_scale: 32.0
2024-09-16 14:09:51,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.78 vs. limit=12.9
2024-09-16 14:09:59,804 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.344e+02 1.491e+02 1.790e+02 3.419e+02, threshold=2.982e+02, percent-clipped=2.0
2024-09-16 14:10:09,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=14440.0, ans=0.09899494936611666
2024-09-16 14:10:17,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.79 vs.
limit=12.219999999999999
2024-09-16 14:10:25,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=14480.0, ans=0.4172
2024-09-16 14:10:31,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=14480.0, ans=0.006333333333333337
2024-09-16 14:11:00,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=12.96
2024-09-16 14:11:05,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=14600.0, ans=0.005833333333333336
2024-09-16 14:11:06,519 INFO [train.py:1198] (0/2) Epoch 1, batch 3650, loss[loss=0.4012, ctc_loss=0.3826, cr_loss=0.5069, attn_decoder_loss=0.392, over 29485.00 frames. ], tot_loss[loss=0.363, ctc_loss=0.342, cr_loss=0.4397, attn_decoder_loss=0.3555, over 5794393.98 frames. ], batch size: 90, lr: 4.27e-02, grad_scale: 32.0
2024-09-16 14:11:17,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=12.975
2024-09-16 14:11:40,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=14680.0, ans=0.125
2024-09-16 14:12:03,187 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.144e-01
2024-09-16 14:12:14,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.33 vs.
limit=18.57
2024-09-16 14:12:15,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=14760.0, ans=0.125
2024-09-16 14:12:20,084 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 14:12:24,267 INFO [train.py:1198] (0/2) Epoch 1, batch 3700, loss[loss=0.3749, ctc_loss=0.3458, cr_loss=0.4779, attn_decoder_loss=0.3675, over 29714.00 frames. ], tot_loss[loss=0.3622, ctc_loss=0.3402, cr_loss=0.4402, attn_decoder_loss=0.3548, over 5804349.53 frames. ], batch size: 84, lr: 4.26e-02, grad_scale: 32.0
2024-09-16 14:12:34,992 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.317e+02 1.543e+02 1.858e+02 5.259e+02, threshold=3.086e+02, percent-clipped=2.0
2024-09-16 14:12:38,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=14840.0, ans=0.125
2024-09-16 14:12:51,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=14840.0, ans=0.125
2024-09-16 14:12:56,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=14880.0, ans=0.1512
2024-09-16 14:12:57,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=14880.0, ans=0.004666666666666666
2024-09-16 14:12:58,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.79 vs.
limit=9.952
2024-09-16 14:13:04,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=14880.0, ans=0.3792
2024-09-16 14:13:18,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=14920.0, ans=0.007626086956521739
2024-09-16 14:13:25,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=13.11
2024-09-16 14:13:32,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=14960.0, ans=0.125
2024-09-16 14:13:40,160 INFO [train.py:1198] (0/2) Epoch 1, batch 3750, loss[loss=0.3178, ctc_loss=0.2825, cr_loss=0.3824, attn_decoder_loss=0.3132, over 29319.00 frames. ], tot_loss[loss=0.361, ctc_loss=0.3383, cr_loss=0.4399, attn_decoder_loss=0.3538, over 5806888.05 frames. ], batch size: 67, lr: 4.26e-02, grad_scale: 32.0
2024-09-16 14:13:43,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=15000.0, ans=0.125
2024-09-16 14:13:57,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.44 vs.
limit=13.14
2024-09-16 14:14:03,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15040.0, ans=0.1496
2024-09-16 14:14:06,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=15040.0, ans=0.37360000000000004
2024-09-16 14:14:18,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=15080.0, ans=0.0075913043478260875
2024-09-16 14:14:27,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15120.0, ans=0.14880000000000002
2024-09-16 14:14:54,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.77 vs. limit=18.869999999999997
2024-09-16 14:14:56,695 INFO [train.py:1198] (0/2) Epoch 1, batch 3800, loss[loss=0.3845, ctc_loss=0.3588, cr_loss=0.4869, attn_decoder_loss=0.3766, over 29641.00 frames. ], tot_loss[loss=0.3601, ctc_loss=0.3368, cr_loss=0.4394, attn_decoder_loss=0.353, over 5796913.66 frames. ], batch size: 86, lr: 4.25e-02, grad_scale: 32.0
2024-09-16 14:15:07,357 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.372e+02 1.609e+02 1.860e+02 5.053e+02, threshold=3.218e+02, percent-clipped=1.0
2024-09-16 14:15:08,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.85 vs.
limit=13.2
2024-09-16 14:15:10,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=15240.0, ans=0.07
2024-09-16 14:15:24,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=15240.0, ans=0.003166666666666672
2024-09-16 14:15:26,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=13.23
2024-09-16 14:15:26,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=13.31 vs. limit=12.64
2024-09-16 14:15:35,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=15280.0, ans=0.125
2024-09-16 14:15:36,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=15280.0, ans=0.36519999999999997
2024-09-16 14:15:45,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.04 vs. limit=13.245000000000001
2024-09-16 14:15:47,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=15320.0, ans=0.007539130434782609
2024-09-16 14:16:02,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=19.02
2024-09-16 14:16:12,322 INFO [train.py:1198] (0/2) Epoch 1, batch 3850, loss[loss=0.3808, ctc_loss=0.3531, cr_loss=0.4369, attn_decoder_loss=0.3742, over 29285.00 frames. ], tot_loss[loss=0.3592, ctc_loss=0.3346, cr_loss=0.4395, attn_decoder_loss=0.3521, over 5807244.80 frames.
], batch size: 100, lr: 4.24e-02, grad_scale: 32.0
2024-09-16 14:16:51,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=15480.0, ans=0.04949747468305833
2024-09-16 14:16:58,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=15520.0, ans=0.125
2024-09-16 14:17:22,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=15560.0, ans=10.0
2024-09-16 14:17:22,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.32 vs. limit=13.335
2024-09-16 14:17:31,254 INFO [train.py:1198] (0/2) Epoch 1, batch 3900, loss[loss=0.3489, ctc_loss=0.3162, cr_loss=0.4551, attn_decoder_loss=0.3425, over 29652.00 frames. ], tot_loss[loss=0.3592, ctc_loss=0.334, cr_loss=0.4405, attn_decoder_loss=0.3522, over 5812476.41 frames.
], batch size: 86, lr: 4.24e-02, grad_scale: 32.0
2024-09-16 14:17:34,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=15600.0, ans=0.0016666666666666705
2024-09-16 14:17:41,709 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.343e+02 1.512e+02 1.794e+02 6.576e+02, threshold=3.024e+02, percent-clipped=3.0
2024-09-16 14:17:42,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=15600.0, ans=0.0
2024-09-16 14:17:49,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=15640.0, ans=0.3526
2024-09-16 14:17:53,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=15640.0, ans=0.125
2024-09-16 14:18:04,928 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.87 vs. limit=12.84
2024-09-16 14:18:05,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=15680.0, ans=0.125
2024-09-16 14:18:40,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=15760.0, ans=0.34840000000000004
2024-09-16 14:18:46,500 INFO [train.py:1198] (0/2) Epoch 1, batch 3950, loss[loss=0.3601, ctc_loss=0.3232, cr_loss=0.4529, attn_decoder_loss=0.3541, over 29505.00 frames. ], tot_loss[loss=0.3593, ctc_loss=0.3336, cr_loss=0.4403, attn_decoder_loss=0.3523, over 5832640.08 frames.
], batch size: 97, lr: 4.23e-02, grad_scale: 32.0
2024-09-16 14:18:46,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=15800.0, ans=0.347
2024-09-16 14:18:55,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=15800.0, ans=0.125
2024-09-16 14:18:58,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.93 vs. limit=12.9
2024-09-16 14:19:20,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.47 vs. limit=19.41
2024-09-16 14:20:00,779 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-4000.pt
2024-09-16 14:20:09,346 INFO [train.py:1198] (0/2) Epoch 1, batch 4000, loss[loss=0.341, ctc_loss=0.3168, cr_loss=0.4174, attn_decoder_loss=0.3344, over 29530.00 frames. ], tot_loss[loss=0.3589, ctc_loss=0.333, cr_loss=0.4405, attn_decoder_loss=0.352, over 5810777.84 frames. ], batch size: 74, lr: 4.23e-02, grad_scale: 32.0
2024-09-16 14:20:17,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=16000.0, ans=0.0
2024-09-16 14:20:19,715 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.414e+02 1.598e+02 1.942e+02 7.205e+02, threshold=3.195e+02, percent-clipped=1.0
2024-09-16 14:20:20,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.39 vs. limit=10.4
2024-09-16 14:20:29,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.68 vs.
limit=19.53 2024-09-16 14:20:44,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=10.432 2024-09-16 14:20:53,568 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.92 vs. limit=13.06 2024-09-16 14:21:03,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=16120.0, ans=0.125 2024-09-16 14:21:24,252 INFO [train.py:1198] (0/2) Epoch 1, batch 4050, loss[loss=0.4206, ctc_loss=0.4309, cr_loss=0.4674, attn_decoder_loss=0.4091, over 19824.00 frames. ], tot_loss[loss=0.359, ctc_loss=0.3326, cr_loss=0.4404, attn_decoder_loss=0.3521, over 5793777.63 frames. ], batch size: 210, lr: 4.22e-02, grad_scale: 32.0 2024-09-16 14:22:08,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=16280.0, ans=0.0 2024-09-16 14:22:11,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=16320.0, ans=0.125 2024-09-16 14:22:38,766 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=13.635 2024-09-16 14:22:40,684 INFO [train.py:1198] (0/2) Epoch 1, batch 4100, loss[loss=0.3669, ctc_loss=0.3211, cr_loss=0.4425, attn_decoder_loss=0.3621, over 29491.00 frames. ], tot_loss[loss=0.3588, ctc_loss=0.3321, cr_loss=0.4407, attn_decoder_loss=0.352, over 5789591.17 frames. 
], batch size: 90, lr: 4.22e-02, grad_scale: 32.0 2024-09-16 14:22:51,003 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.063e+02 1.366e+02 1.525e+02 1.800e+02 4.946e+02, threshold=3.051e+02, percent-clipped=3.0 2024-09-16 14:23:03,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.46 vs. limit=13.665 2024-09-16 14:23:15,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=16480.0, ans=0.32320000000000004 2024-09-16 14:23:21,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=16480.0, ans=0.007286956521739131 2024-09-16 14:23:54,727 INFO [train.py:1198] (0/2) Epoch 1, batch 4150, loss[loss=0.351, ctc_loss=0.3155, cr_loss=0.43, attn_decoder_loss=0.3454, over 29502.00 frames. ], tot_loss[loss=0.3578, ctc_loss=0.3302, cr_loss=0.4402, attn_decoder_loss=0.3511, over 5796288.80 frames. ], batch size: 77, lr: 4.21e-02, grad_scale: 32.0 2024-09-16 14:24:03,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten.whitening_limit, batch_count=16600.0, ans=13.725 2024-09-16 14:24:16,319 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. 
limit=13.74 2024-09-16 14:24:31,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=16680.0, ans=0.125 2024-09-16 14:24:40,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=16720.0, ans=0.125 2024-09-16 14:24:42,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16720.0, ans=0.1328 2024-09-16 14:25:09,099 INFO [train.py:1198] (0/2) Epoch 1, batch 4200, loss[loss=0.3854, ctc_loss=0.3617, cr_loss=0.4513, attn_decoder_loss=0.378, over 29503.00 frames. ], tot_loss[loss=0.358, ctc_loss=0.3298, cr_loss=0.4411, attn_decoder_loss=0.3513, over 5798280.64 frames. ], batch size: 90, lr: 4.20e-02, grad_scale: 32.0 2024-09-16 14:25:13,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=16800.0, ans=0.31200000000000006 2024-09-16 14:25:19,663 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.040e+02 1.356e+02 1.563e+02 1.936e+02 3.144e+02, threshold=3.127e+02, percent-clipped=1.0 2024-09-16 14:25:50,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=16880.0, ans=0.125 2024-09-16 14:26:16,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=16960.0, ans=0.125 2024-09-16 14:26:19,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=16960.0, ans=0.0 2024-09-16 14:26:22,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=16960.0, ans=0.125 2024-09-16 14:26:22,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, 
batch_count=16960.0, ans=0.0 2024-09-16 14:26:25,325 INFO [train.py:1198] (0/2) Epoch 1, batch 4250, loss[loss=0.3337, ctc_loss=0.2957, cr_loss=0.4459, attn_decoder_loss=0.328, over 29519.00 frames. ], tot_loss[loss=0.3571, ctc_loss=0.3279, cr_loss=0.4408, attn_decoder_loss=0.3505, over 5803651.38 frames. ], batch size: 74, lr: 4.20e-02, grad_scale: 32.0 2024-09-16 14:26:30,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=10.8 2024-09-16 14:26:34,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=17000.0, ans=0.0 2024-09-16 14:26:37,883 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.71 vs. limit=10.8 2024-09-16 14:27:10,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=17120.0, ans=0.0 2024-09-16 14:27:23,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=17160.0, ans=0.025 2024-09-16 14:27:39,364 INFO [train.py:1198] (0/2) Epoch 1, batch 4300, loss[loss=0.3627, ctc_loss=0.3337, cr_loss=0.4299, attn_decoder_loss=0.3564, over 29512.00 frames. ], tot_loss[loss=0.357, ctc_loss=0.3277, cr_loss=0.4417, attn_decoder_loss=0.3505, over 5791717.70 frames. 
], batch size: 87, lr: 4.19e-02, grad_scale: 32.0 2024-09-16 14:27:49,862 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.364e+02 1.537e+02 1.919e+02 5.209e+02, threshold=3.074e+02, percent-clipped=5.0 2024-09-16 14:28:37,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=17360.0, ans=0.2924 2024-09-16 14:28:39,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=17360.0, ans=0.125 2024-09-16 14:28:53,573 INFO [train.py:1198] (0/2) Epoch 1, batch 4350, loss[loss=0.3797, ctc_loss=0.3505, cr_loss=0.4916, attn_decoder_loss=0.372, over 29506.00 frames. ], tot_loss[loss=0.3608, ctc_loss=0.3311, cr_loss=0.4457, attn_decoder_loss=0.3542, over 5796068.22 frames. ], batch size: 97, lr: 4.19e-02, grad_scale: 32.0 2024-09-16 14:29:01,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=17400.0, ans=0.07 2024-09-16 14:29:10,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=17440.0, ans=0.125 2024-09-16 14:29:15,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=17440.0, ans=0.125 2024-09-16 14:29:35,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=17480.0, ans=0.09899494936611666 2024-09-16 14:29:44,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.32 vs. 
limit=14.07 2024-09-16 14:29:50,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=17520.0, ans=0.28680000000000005 2024-09-16 14:30:05,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.19 vs. limit=13.78 2024-09-16 14:30:09,476 INFO [train.py:1198] (0/2) Epoch 1, batch 4400, loss[loss=0.3567, ctc_loss=0.3341, cr_loss=0.4373, attn_decoder_loss=0.3495, over 27614.00 frames. ], tot_loss[loss=0.3635, ctc_loss=0.3343, cr_loss=0.4481, attn_decoder_loss=0.3568, over 5768013.25 frames. ], batch size: 124, lr: 4.18e-02, grad_scale: 32.0 2024-09-16 14:30:11,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=17600.0, ans=0.0 2024-09-16 14:30:19,703 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.315e+02 1.467e+02 1.766e+02 6.671e+02, threshold=2.933e+02, percent-clipped=1.0 2024-09-16 14:30:37,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=17680.0, ans=0.125 2024-09-16 14:31:02,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=17720.0, ans=0.27980000000000005 2024-09-16 14:31:06,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.83 vs. limit=20.79 2024-09-16 14:31:06,118 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=14.145 2024-09-16 14:31:10,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. 
limit=20.82 2024-09-16 14:31:13,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=17760.0, ans=0.125 2024-09-16 14:31:16,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=17760.0, ans=0.125 2024-09-16 14:31:24,221 INFO [train.py:1198] (0/2) Epoch 1, batch 4450, loss[loss=0.3941, ctc_loss=0.393, cr_loss=0.4677, attn_decoder_loss=0.3838, over 20904.00 frames. ], tot_loss[loss=0.3675, ctc_loss=0.3411, cr_loss=0.4489, attn_decoder_loss=0.3605, over 5581140.11 frames. ], batch size: 209, lr: 4.17e-02, grad_scale: 32.0 2024-09-16 14:31:29,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.44 vs. limit=20.85 2024-09-16 14:31:35,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17800.0, ans=0.12200000000000003 2024-09-16 14:32:34,762 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.81 vs. limit=14.235 2024-09-16 14:32:40,074 INFO [train.py:1198] (0/2) Epoch 1, batch 4500, loss[loss=0.3911, ctc_loss=0.3933, cr_loss=0.4535, attn_decoder_loss=0.3808, over 20691.00 frames. ], tot_loss[loss=0.3723, ctc_loss=0.351, cr_loss=0.4476, attn_decoder_loss=0.3647, over 5237228.89 frames. 
], batch size: 209, lr: 4.17e-02, grad_scale: 32.0 2024-09-16 14:32:40,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=18000.0, ans=0.12000000000000002 2024-09-16 14:32:50,391 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.043e+02 1.290e+02 1.458e+02 1.671e+02 6.229e+02, threshold=2.915e+02, percent-clipped=1.0 2024-09-16 14:33:16,834 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-1.pt 2024-09-16 14:34:13,792 INFO [train.py:1198] (0/2) Epoch 2, batch 0, loss[loss=0.4849, ctc_loss=0.3034, cr_loss=0.4392, attn_decoder_loss=0.4953, over 29614.00 frames. ], tot_loss[loss=0.4849, ctc_loss=0.3034, cr_loss=0.4392, attn_decoder_loss=0.4953, over 29614.00 frames. ], batch size: 73, lr: 4.08e-02, grad_scale: 32.0 2024-09-16 14:34:13,793 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 14:34:17,241 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.3.encoder.layers.4.self_attn_weights, attn_weights_entropy = tensor([3.1274, 2.0603, 2.5104, 2.5775, 2.4812, 1.3811, 1.9125, 2.1626], device='cuda:0') 2024-09-16 14:34:32,032 INFO [train.py:1230] (0/2) Epoch 2, validation: loss=0.3071, ctc_loss=0.1367, cr_loss=4.721e-15, attn_decoder_loss=0.326, over 944034.00 frames. 
2024-09-16 14:34:32,033 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-16 14:35:01,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=18180.0, ans=0.0 2024-09-16 14:35:04,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=18180.0, ans=0.26370000000000005 2024-09-16 14:35:14,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=18180.0, ans=0.125 2024-09-16 14:35:35,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=21.195 2024-09-16 14:35:48,032 INFO [train.py:1198] (0/2) Epoch 2, batch 50, loss[loss=0.322, ctc_loss=0.2946, cr_loss=0.3803, attn_decoder_loss=0.3166, over 29399.00 frames. ], tot_loss[loss=0.3769, ctc_loss=0.341, cr_loss=0.4442, attn_decoder_loss=0.3711, over 1266739.56 frames. 
], batch size: 70, lr: 4.08e-02, grad_scale: 16.0 2024-09-16 14:35:48,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=18300.0, ans=0.0068913043478260865 2024-09-16 14:36:18,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=18380.0, ans=0.125 2024-09-16 14:36:20,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=18380.0, ans=0.006873913043478261 2024-09-16 14:36:25,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=18380.0, ans=0.025 2024-09-16 14:36:42,198 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.396e+02 1.768e+02 2.293e+02 2.873e+03, threshold=3.536e+02, percent-clipped=13.0 2024-09-16 14:36:42,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=18420.0, ans=0.0 2024-09-16 14:37:06,518 INFO [train.py:1198] (0/2) Epoch 2, batch 100, loss[loss=0.3565, ctc_loss=0.3233, cr_loss=0.4547, attn_decoder_loss=0.3501, over 29558.00 frames. ], tot_loss[loss=0.3684, ctc_loss=0.3355, cr_loss=0.4447, attn_decoder_loss=0.3622, over 2250999.15 frames. ], batch size: 76, lr: 4.07e-02, grad_scale: 16.0 2024-09-16 14:37:09,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=18500.0, ans=0.0 2024-09-16 14:37:44,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=14.467500000000001 2024-09-16 14:37:57,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.34 vs. 
limit=21.465 2024-09-16 14:38:24,316 INFO [train.py:1198] (0/2) Epoch 2, batch 150, loss[loss=0.3277, ctc_loss=0.3017, cr_loss=0.4266, attn_decoder_loss=0.3211, over 29411.00 frames. ], tot_loss[loss=0.3599, ctc_loss=0.3269, cr_loss=0.4412, attn_decoder_loss=0.3537, over 3044961.00 frames. ], batch size: 70, lr: 4.06e-02, grad_scale: 16.0 2024-09-16 14:38:25,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.79 vs. limit=14.5125 2024-09-16 14:39:11,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=18820.0, ans=0.125 2024-09-16 14:39:15,767 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.006e+02 1.312e+02 1.456e+02 1.615e+02 4.569e+02, threshold=2.911e+02, percent-clipped=2.0 2024-09-16 14:39:28,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=18860.0, ans=0.125 2024-09-16 14:39:40,021 INFO [train.py:1198] (0/2) Epoch 2, batch 200, loss[loss=0.3698, ctc_loss=0.3424, cr_loss=0.4446, attn_decoder_loss=0.363, over 27174.00 frames. ], tot_loss[loss=0.3552, ctc_loss=0.3212, cr_loss=0.4392, attn_decoder_loss=0.3492, over 3656558.12 frames. 
], batch size: 124, lr: 4.06e-02, grad_scale: 16.0 2024-09-16 14:40:04,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=18940.0, ans=0.125 2024-09-16 14:40:32,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=19020.0, ans=0.025 2024-09-16 14:40:40,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=19020.0, ans=0.125 2024-09-16 14:40:58,052 INFO [train.py:1198] (0/2) Epoch 2, batch 250, loss[loss=0.3889, ctc_loss=0.3663, cr_loss=0.4662, attn_decoder_loss=0.3811, over 29203.00 frames. ], tot_loss[loss=0.3531, ctc_loss=0.3186, cr_loss=0.4396, attn_decoder_loss=0.3471, over 4138741.90 frames. ], batch size: 100, lr: 4.05e-02, grad_scale: 16.0 2024-09-16 14:41:14,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.85 vs. 
limit=21.855 2024-09-16 14:41:21,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=19140.0, ans=0.125 2024-09-16 14:41:22,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=19140.0, ans=0.125 2024-09-16 14:41:23,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=19140.0, ans=0.07 2024-09-16 14:41:36,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=19180.0, ans=0.125 2024-09-16 14:41:42,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=19220.0, ans=0.125 2024-09-16 14:41:48,974 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:41:50,137 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.356e+02 1.504e+02 1.757e+02 3.092e+02, threshold=3.008e+02, percent-clipped=1.0 2024-09-16 14:41:50,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=19220.0, ans=0.125 2024-09-16 14:42:07,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=19260.0, ans=0.09899494936611666 2024-09-16 14:42:16,768 INFO [train.py:1198] (0/2) Epoch 2, batch 300, loss[loss=0.3764, ctc_loss=0.331, cr_loss=0.4424, attn_decoder_loss=0.3716, over 29564.00 frames. ], tot_loss[loss=0.3515, ctc_loss=0.3164, cr_loss=0.4393, attn_decoder_loss=0.3457, over 4509217.47 frames. 
], batch size: 92, lr: 4.05e-02, grad_scale: 16.0 2024-09-16 14:42:18,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=19300.0, ans=0.05 2024-09-16 14:42:20,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19300.0, ans=0.10700000000000001 2024-09-16 14:42:27,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19300.0, ans=0.0 2024-09-16 14:42:35,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=19340.0, ans=0.09899494936611666 2024-09-16 14:42:43,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=19340.0, ans=0.125 2024-09-16 14:42:53,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=19380.0, ans=0.0 2024-09-16 14:42:54,361 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.43 vs. limit=11.751999999999999 2024-09-16 14:43:10,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=11.768 2024-09-16 14:43:28,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.92 vs. limit=9.865 2024-09-16 14:43:33,266 INFO [train.py:1198] (0/2) Epoch 2, batch 350, loss[loss=0.327, ctc_loss=0.2912, cr_loss=0.4193, attn_decoder_loss=0.3217, over 29734.00 frames. ], tot_loss[loss=0.3514, ctc_loss=0.3158, cr_loss=0.4405, attn_decoder_loss=0.3456, over 4795080.78 frames. 
], batch size: 72, lr: 4.04e-02, grad_scale: 16.0 2024-09-16 14:43:42,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=19500.0, ans=0.21750000000000003 2024-09-16 14:43:49,428 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=14.8275 2024-09-16 14:43:51,776 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:44:22,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=19620.0, ans=0.125 2024-09-16 14:44:26,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.410e+02 1.578e+02 1.828e+02 5.190e+02, threshold=3.157e+02, percent-clipped=4.0 2024-09-16 14:44:31,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=19620.0, ans=0.1038 2024-09-16 14:44:31,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19620.0, ans=0.0 2024-09-16 14:44:51,067 INFO [train.py:1198] (0/2) Epoch 2, batch 400, loss[loss=0.3505, ctc_loss=0.3163, cr_loss=0.4393, attn_decoder_loss=0.3445, over 29714.00 frames. ], tot_loss[loss=0.35, ctc_loss=0.314, cr_loss=0.4396, attn_decoder_loss=0.3442, over 5025132.01 frames. 
], batch size: 82, lr: 4.03e-02, grad_scale: 32.0 2024-09-16 14:44:54,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=19700.0, ans=0.04949747468305833 2024-09-16 14:45:03,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=19700.0, ans=0.025 2024-09-16 14:45:14,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=19740.0, ans=0.125 2024-09-16 14:45:28,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=19780.0, ans=0.0 2024-09-16 14:45:38,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=19820.0, ans=0.125 2024-09-16 14:45:43,471 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:45:44,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=19820.0, ans=0.0 2024-09-16 14:46:10,054 INFO [train.py:1198] (0/2) Epoch 2, batch 450, loss[loss=0.3532, ctc_loss=0.3186, cr_loss=0.4573, attn_decoder_loss=0.3469, over 29681.00 frames. ], tot_loss[loss=0.3497, ctc_loss=0.3141, cr_loss=0.4405, attn_decoder_loss=0.3439, over 5185003.38 frames. 
], batch size: 83, lr: 4.03e-02, grad_scale: 32.0 2024-09-16 14:46:12,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=19900.0, ans=9.975 2024-09-16 14:46:14,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19900.0, ans=0.101 2024-09-16 14:46:16,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=7.03 vs. limit=11.96 2024-09-16 14:46:30,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=19940.0, ans=0.125 2024-09-16 14:46:43,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=19980.0, ans=0.125 2024-09-16 14:47:02,432 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.299e+02 1.486e+02 1.745e+02 5.446e+02, threshold=2.972e+02, percent-clipped=3.0 2024-09-16 14:47:07,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.58 vs. limit=15.0 2024-09-16 14:47:11,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=20060.0, ans=0.2 2024-09-16 14:47:20,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20060.0, ans=0.1 2024-09-16 14:47:26,732 INFO [train.py:1198] (0/2) Epoch 2, batch 500, loss[loss=0.3547, ctc_loss=0.3169, cr_loss=0.4446, attn_decoder_loss=0.349, over 29442.00 frames. ], tot_loss[loss=0.3479, ctc_loss=0.312, cr_loss=0.4401, attn_decoder_loss=0.3421, over 5328411.28 frames. 
], batch size: 94, lr: 4.02e-02, grad_scale: 32.0 2024-09-16 14:48:04,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.84 vs. limit=15.0 2024-09-16 14:48:10,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=10.11 vs. limit=10.0 2024-09-16 14:48:16,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20220.0, ans=0.1 2024-09-16 14:48:19,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=20220.0, ans=0.2 2024-09-16 14:48:21,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=20220.0, ans=0.125 2024-09-16 14:48:25,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=20220.0, ans=0.0 2024-09-16 14:48:28,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=20260.0, ans=0.125 2024-09-16 14:48:44,958 INFO [train.py:1198] (0/2) Epoch 2, batch 550, loss[loss=0.3529, ctc_loss=0.3154, cr_loss=0.4583, attn_decoder_loss=0.3469, over 28785.00 frames. ], tot_loss[loss=0.3478, ctc_loss=0.3119, cr_loss=0.4407, attn_decoder_loss=0.342, over 5421229.98 frames. 
], batch size: 104, lr: 4.02e-02, grad_scale: 16.0 2024-09-16 14:49:03,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=20340.0, ans=0.2 2024-09-16 14:49:06,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=20340.0, ans=0.125 2024-09-16 14:49:14,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=20380.0, ans=0.0 2024-09-16 14:49:15,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=20380.0, ans=0.2 2024-09-16 14:49:22,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.86 vs. limit=22.5 2024-09-16 14:49:26,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=20380.0, ans=0.0 2024-09-16 14:49:38,284 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.358e+02 1.600e+02 1.893e+02 5.686e+02, threshold=3.199e+02, percent-clipped=4.0 2024-09-16 14:49:41,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=20420.0, ans=0.025 2024-09-16 14:50:00,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=20460.0, ans=0.125 2024-09-16 14:50:03,475 INFO [train.py:1198] (0/2) Epoch 2, batch 600, loss[loss=0.3606, ctc_loss=0.3191, cr_loss=0.4449, attn_decoder_loss=0.3554, over 29305.00 frames. ], tot_loss[loss=0.348, ctc_loss=0.3116, cr_loss=0.4408, attn_decoder_loss=0.3422, over 5508901.29 frames. 
], batch size: 100, lr: 4.01e-02, grad_scale: 16.0 2024-09-16 14:50:24,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=20540.0, ans=0.125 2024-09-16 14:50:36,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=20580.0, ans=0.125 2024-09-16 14:50:43,781 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.14 vs. limit=15.0 2024-09-16 14:51:19,278 INFO [train.py:1198] (0/2) Epoch 2, batch 650, loss[loss=0.3291, ctc_loss=0.2826, cr_loss=0.4598, attn_decoder_loss=0.324, over 29780.00 frames. ], tot_loss[loss=0.3461, ctc_loss=0.3094, cr_loss=0.4395, attn_decoder_loss=0.3404, over 5585981.94 frames. ], batch size: 81, lr: 4.00e-02, grad_scale: 16.0 2024-09-16 14:51:19,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=20700.0, ans=0.04949747468305833 2024-09-16 14:51:39,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=20740.0, ans=0.2 2024-09-16 14:51:59,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=20780.0, ans=0.125 2024-09-16 14:52:04,390 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 14:52:07,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=20820.0, ans=0.125 2024-09-16 14:52:11,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.10 vs. 
limit=15.0 2024-09-16 14:52:15,201 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.306e+02 1.501e+02 1.738e+02 3.373e+02, threshold=3.002e+02, percent-clipped=2.0 2024-09-16 14:52:21,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=20860.0, ans=0.0 2024-09-16 14:52:38,095 INFO [train.py:1198] (0/2) Epoch 2, batch 700, loss[loss=0.3355, ctc_loss=0.295, cr_loss=0.4435, attn_decoder_loss=0.3301, over 29532.00 frames. ], tot_loss[loss=0.3463, ctc_loss=0.3093, cr_loss=0.4402, attn_decoder_loss=0.3407, over 5637971.83 frames. ], batch size: 76, lr: 4.00e-02, grad_scale: 16.0 2024-09-16 14:52:40,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-09-16 14:52:43,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=20900.0, ans=0.125 2024-09-16 14:52:51,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=20940.0, ans=0.2 2024-09-16 14:53:05,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=15.02 vs. 
limit=15.0 2024-09-16 14:53:07,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=20980.0, ans=0.2 2024-09-16 14:53:08,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=20980.0, ans=0.2 2024-09-16 14:53:26,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=21020.0, ans=0.0 2024-09-16 14:53:26,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=21020.0, ans=0.125 2024-09-16 14:53:56,476 INFO [train.py:1198] (0/2) Epoch 2, batch 750, loss[loss=0.3454, ctc_loss=0.2978, cr_loss=0.4379, attn_decoder_loss=0.3409, over 29706.00 frames. ], tot_loss[loss=0.3458, ctc_loss=0.3088, cr_loss=0.4401, attn_decoder_loss=0.3401, over 5676946.78 frames. ], batch size: 82, lr: 3.99e-02, grad_scale: 16.0 2024-09-16 14:54:23,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=21140.0, ans=0.1 2024-09-16 14:54:49,446 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.006e+02 1.356e+02 1.549e+02 1.774e+02 3.247e+02, threshold=3.098e+02, percent-clipped=2.0 2024-09-16 14:55:08,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2024-09-16 14:55:12,220 INFO [train.py:1198] (0/2) Epoch 2, batch 800, loss[loss=0.2906, ctc_loss=0.2387, cr_loss=0.3861, attn_decoder_loss=0.2878, over 29611.00 frames. ], tot_loss[loss=0.3453, ctc_loss=0.3079, cr_loss=0.4395, attn_decoder_loss=0.3397, over 5707519.61 frames. 
], batch size: 73, lr: 3.98e-02, grad_scale: 32.0 2024-09-16 14:55:27,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=8.0 2024-09-16 14:55:31,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=21340.0, ans=0.006230434782608696 2024-09-16 14:56:21,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.79 vs. limit=15.0 2024-09-16 14:56:28,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=21500.0, ans=0.2 2024-09-16 14:56:30,085 INFO [train.py:1198] (0/2) Epoch 2, batch 850, loss[loss=0.3515, ctc_loss=0.314, cr_loss=0.4679, attn_decoder_loss=0.3453, over 29721.00 frames. ], tot_loss[loss=0.3443, ctc_loss=0.3062, cr_loss=0.4394, attn_decoder_loss=0.3387, over 5736994.44 frames. ], batch size: 89, lr: 3.98e-02, grad_scale: 16.0 2024-09-16 14:56:53,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21540.0, ans=0.1 2024-09-16 14:57:13,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=21580.0, ans=0.125 2024-09-16 14:57:25,356 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.316e+02 1.489e+02 1.639e+02 3.105e+02, threshold=2.978e+02, percent-clipped=1.0 2024-09-16 14:57:46,625 INFO [train.py:1198] (0/2) Epoch 2, batch 900, loss[loss=0.3113, ctc_loss=0.2664, cr_loss=0.3956, attn_decoder_loss=0.3075, over 29571.00 frames. ], tot_loss[loss=0.3443, ctc_loss=0.3062, cr_loss=0.4401, attn_decoder_loss=0.3388, over 5741541.30 frames. 
], batch size: 73, lr: 3.97e-02, grad_scale: 16.0 2024-09-16 14:57:49,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=21700.0, ans=0.0 2024-09-16 14:58:00,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.96 vs. limit=22.5 2024-09-16 14:58:01,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=21700.0, ans=0.006152173913043478 2024-09-16 14:58:26,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=12.0 2024-09-16 14:58:35,397 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0 2024-09-16 14:58:37,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21820.0, ans=0.1 2024-09-16 14:58:43,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=21820.0, ans=0.04949747468305833 2024-09-16 14:58:46,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=21820.0, ans=0.2 2024-09-16 14:58:57,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=21860.0, ans=0.1 2024-09-16 14:59:01,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=21860.0, ans=0.006117391304347826 2024-09-16 14:59:04,589 INFO [train.py:1198] (0/2) Epoch 2, batch 950, loss[loss=0.3196, ctc_loss=0.272, cr_loss=0.4017, attn_decoder_loss=0.316, over 29493.00 frames. ], tot_loss[loss=0.3443, ctc_loss=0.3062, cr_loss=0.4402, attn_decoder_loss=0.3387, over 5744361.87 frames. 
], batch size: 74, lr: 3.97e-02, grad_scale: 16.0 2024-09-16 14:59:06,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=21900.0, ans=0.2 2024-09-16 14:59:11,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.03 vs. limit=15.0 2024-09-16 14:59:29,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=21940.0, ans=0.125 2024-09-16 14:59:32,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=21940.0, ans=0.0060999999999999995 2024-09-16 14:59:43,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21980.0, ans=0.1 2024-09-16 14:59:52,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=22020.0, ans=0.2 2024-09-16 14:59:52,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=22020.0, ans=0.07 2024-09-16 15:00:01,565 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.375e+02 1.582e+02 1.931e+02 4.850e+02, threshold=3.164e+02, percent-clipped=3.0 2024-09-16 15:00:11,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=22060.0, ans=0.05 2024-09-16 15:00:19,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=22060.0, ans=0.125 2024-09-16 15:00:19,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=22060.0, ans=0.2 2024-09-16 15:00:21,340 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=22100.0, ans=0.125 2024-09-16 15:00:22,483 INFO [train.py:1198] (0/2) Epoch 2, batch 1000, loss[loss=0.3296, ctc_loss=0.2825, cr_loss=0.4437, attn_decoder_loss=0.325, over 29500.00 frames. ], tot_loss[loss=0.3456, ctc_loss=0.3077, cr_loss=0.4416, attn_decoder_loss=0.34, over 5736735.60 frames. ], batch size: 77, lr: 3.96e-02, grad_scale: 16.0 2024-09-16 15:00:33,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=22100.0, ans=0.0 2024-09-16 15:00:35,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=22100.0, ans=0.125 2024-09-16 15:00:41,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=22140.0, ans=0.125 2024-09-16 15:00:48,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=22140.0, ans=0.125 2024-09-16 15:01:03,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=22180.0, ans=0.006047826086956522 2024-09-16 15:01:05,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=22180.0, ans=0.125 2024-09-16 15:01:06,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=22220.0, ans=0.07 2024-09-16 15:01:09,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=22220.0, ans=0.125 2024-09-16 15:01:11,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22220.0, ans=0.1 2024-09-16 15:01:15,164 INFO [scaling.py:1024] (0/2) Whitening: 
name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.41 vs. limit=10.0 2024-09-16 15:01:24,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=22260.0, ans=0.125 2024-09-16 15:01:26,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=22260.0, ans=0.125 2024-09-16 15:01:28,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=22260.0, ans=0.07 2024-09-16 15:01:34,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=22260.0, ans=0.125 2024-09-16 15:01:34,780 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2024-09-16 15:01:38,480 INFO [train.py:1198] (0/2) Epoch 2, batch 1050, loss[loss=0.3376, ctc_loss=0.2988, cr_loss=0.4426, attn_decoder_loss=0.332, over 29686.00 frames. ], tot_loss[loss=0.3438, ctc_loss=0.306, cr_loss=0.4399, attn_decoder_loss=0.3383, over 5744760.89 frames. 
], batch size: 85, lr: 3.95e-02, grad_scale: 16.0 2024-09-16 15:02:10,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22380.0, ans=0.1 2024-09-16 15:02:14,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=22380.0, ans=0.125 2024-09-16 15:02:35,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=22420.0, ans=0.125 2024-09-16 15:02:36,118 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.038e+02 1.353e+02 1.564e+02 1.813e+02 2.890e+02, threshold=3.129e+02, percent-clipped=0.0 2024-09-16 15:02:36,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=22420.0, ans=0.025 2024-09-16 15:02:36,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=22420.0, ans=0.125 2024-09-16 15:02:57,505 INFO [train.py:1198] (0/2) Epoch 2, batch 1100, loss[loss=0.3258, ctc_loss=0.2786, cr_loss=0.4294, attn_decoder_loss=0.3215, over 29426.00 frames. ], tot_loss[loss=0.343, ctc_loss=0.3046, cr_loss=0.4404, attn_decoder_loss=0.3375, over 5756796.82 frames. ], batch size: 78, lr: 3.95e-02, grad_scale: 16.0 2024-09-16 15:02:59,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22500.0, ans=0.1 2024-09-16 15:02:59,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. 
limit=6.0 2024-09-16 15:03:00,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=22500.0, ans=0.125 2024-09-16 15:03:12,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0 2024-09-16 15:03:22,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=22540.0, ans=0.005969565217391304 2024-09-16 15:04:06,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=22660.0, ans=0.1 2024-09-16 15:04:10,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.75 vs. limit=10.0 2024-09-16 15:04:15,691 INFO [train.py:1198] (0/2) Epoch 2, batch 1150, loss[loss=0.3363, ctc_loss=0.2976, cr_loss=0.416, attn_decoder_loss=0.3313, over 29460.00 frames. ], tot_loss[loss=0.343, ctc_loss=0.3045, cr_loss=0.4401, attn_decoder_loss=0.3375, over 5755685.04 frames. ], batch size: 78, lr: 3.94e-02, grad_scale: 16.0 2024-09-16 15:04:33,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=22.5 2024-09-16 15:04:42,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. 
limit=15.0 2024-09-16 15:04:44,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22780.0, ans=0.1 2024-09-16 15:04:51,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=22780.0, ans=0.125 2024-09-16 15:04:54,994 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0 2024-09-16 15:05:08,383 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.27 vs. limit=15.0 2024-09-16 15:05:10,469 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.047e+02 1.307e+02 1.503e+02 1.816e+02 4.036e+02, threshold=3.005e+02, percent-clipped=3.0 2024-09-16 15:05:19,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=22860.0, ans=0.125 2024-09-16 15:05:21,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.97 vs. limit=22.5 2024-09-16 15:05:31,663 INFO [train.py:1198] (0/2) Epoch 2, batch 1200, loss[loss=0.3554, ctc_loss=0.313, cr_loss=0.4413, attn_decoder_loss=0.3503, over 29671.00 frames. ], tot_loss[loss=0.3442, ctc_loss=0.3058, cr_loss=0.4419, attn_decoder_loss=0.3387, over 5748685.00 frames. 
], batch size: 85, lr: 3.93e-02, grad_scale: 32.0 2024-09-16 15:05:34,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=22900.0, ans=0.125 2024-09-16 15:06:01,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=22940.0, ans=0.02 2024-09-16 15:06:09,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.45 vs. limit=22.5 2024-09-16 15:06:29,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=15.0 2024-09-16 15:06:50,272 INFO [train.py:1198] (0/2) Epoch 2, batch 1250, loss[loss=0.3632, ctc_loss=0.3214, cr_loss=0.4753, attn_decoder_loss=0.3573, over 29530.00 frames. ], tot_loss[loss=0.3442, ctc_loss=0.3054, cr_loss=0.4429, attn_decoder_loss=0.3386, over 5776466.00 frames. ], batch size: 92, lr: 3.93e-02, grad_scale: 16.0 2024-09-16 15:07:49,030 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.374e+02 1.508e+02 1.823e+02 4.800e+02, threshold=3.017e+02, percent-clipped=3.0 2024-09-16 15:07:56,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2024-09-16 15:08:08,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=25.18 vs. limit=22.5 2024-09-16 15:08:08,747 INFO [train.py:1198] (0/2) Epoch 2, batch 1300, loss[loss=0.3494, ctc_loss=0.3051, cr_loss=0.4641, attn_decoder_loss=0.344, over 28631.00 frames. ], tot_loss[loss=0.3426, ctc_loss=0.3036, cr_loss=0.4415, attn_decoder_loss=0.3371, over 5781427.55 frames. 
], batch size: 112, lr: 3.92e-02, grad_scale: 16.0 2024-09-16 15:08:15,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.33 vs. limit=15.0 2024-09-16 15:08:18,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=23300.0, ans=0.125 2024-09-16 15:08:21,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=23300.0, ans=0.125 2024-09-16 15:08:41,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=23380.0, ans=0.05 2024-09-16 15:08:42,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=23380.0, ans=0.125 2024-09-16 15:08:44,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=12.0 2024-09-16 15:08:48,722 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.965e-02 2024-09-16 15:09:05,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=23420.0, ans=0.025 2024-09-16 15:09:07,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2024-09-16 15:09:14,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=23460.0, ans=0.0057695652173913045 2024-09-16 15:09:25,051 INFO [train.py:1198] (0/2) Epoch 2, batch 1350, loss[loss=0.3504, ctc_loss=0.3221, cr_loss=0.4508, attn_decoder_loss=0.3436, over 29786.00 frames. 
], tot_loss[loss=0.3414, ctc_loss=0.3018, cr_loss=0.4405, attn_decoder_loss=0.3361, over 5797558.38 frames. ], batch size: 81, lr: 3.91e-02, grad_scale: 16.0 2024-09-16 15:10:05,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.97 vs. limit=10.0 2024-09-16 15:10:06,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=23580.0, ans=0.125 2024-09-16 15:10:11,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.43 vs. limit=15.0 2024-09-16 15:10:23,027 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.283e+02 1.428e+02 1.705e+02 2.892e+02, threshold=2.856e+02, percent-clipped=0.0 2024-09-16 15:10:39,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=23660.0, ans=0.125 2024-09-16 15:10:42,652 INFO [train.py:1198] (0/2) Epoch 2, batch 1400, loss[loss=0.2837, ctc_loss=0.2332, cr_loss=0.3523, attn_decoder_loss=0.2815, over 29577.00 frames. ], tot_loss[loss=0.3411, ctc_loss=0.3013, cr_loss=0.4398, attn_decoder_loss=0.3358, over 5807985.58 frames. ], batch size: 69, lr: 3.91e-02, grad_scale: 16.0 2024-09-16 15:10:44,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=23700.0, ans=0.125 2024-09-16 15:10:52,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=23700.0, ans=0.2 2024-09-16 15:11:32,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.43 vs. 
limit=22.5 2024-09-16 15:11:48,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.23 vs. limit=22.5 2024-09-16 15:11:51,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=23860.0, ans=0.125 2024-09-16 15:11:54,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=23860.0, ans=0.2 2024-09-16 15:11:55,103 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.64 vs. limit=22.5 2024-09-16 15:11:59,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=23900.0, ans=0.04949747468305833 2024-09-16 15:12:00,325 INFO [train.py:1198] (0/2) Epoch 2, batch 1450, loss[loss=0.3675, ctc_loss=0.3388, cr_loss=0.4611, attn_decoder_loss=0.3605, over 29451.00 frames. ], tot_loss[loss=0.3423, ctc_loss=0.3024, cr_loss=0.4405, attn_decoder_loss=0.3369, over 5805522.61 frames. ], batch size: 94, lr: 3.90e-02, grad_scale: 16.0 2024-09-16 15:12:11,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=23900.0, ans=0.125 2024-09-16 15:12:15,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=23940.0, ans=0.2 2024-09-16 15:12:16,385 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.44 vs. 
limit=6.0 2024-09-16 15:12:17,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=23940.0, ans=0.125 2024-09-16 15:12:23,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=9.40 vs. limit=10.0 2024-09-16 15:12:37,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=23980.0, ans=0.2 2024-09-16 15:12:44,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=24020.0, ans=0.125 2024-09-16 15:12:56,334 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.348e+02 1.492e+02 1.698e+02 3.722e+02, threshold=2.983e+02, percent-clipped=2.0 2024-09-16 15:12:58,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=24020.0, ans=0.0 2024-09-16 15:13:16,027 INFO [train.py:1198] (0/2) Epoch 2, batch 1500, loss[loss=0.343, ctc_loss=0.3014, cr_loss=0.4264, attn_decoder_loss=0.3381, over 29618.00 frames. ], tot_loss[loss=0.342, ctc_loss=0.3018, cr_loss=0.441, attn_decoder_loss=0.3367, over 5806731.72 frames. 
], batch size: 86, lr: 3.90e-02, grad_scale: 16.0 2024-09-16 15:13:22,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=24100.0, ans=0.2 2024-09-16 15:13:25,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=24100.0, ans=0.07 2024-09-16 15:13:31,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=24140.0, ans=0.025 2024-09-16 15:13:37,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=24140.0, ans=0.0 2024-09-16 15:13:38,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=24140.0, ans=0.125 2024-09-16 15:13:54,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=24180.0, ans=10.0 2024-09-16 15:14:02,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.62 vs. limit=15.0 2024-09-16 15:14:05,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.22 vs. limit=22.5 2024-09-16 15:14:34,957 INFO [train.py:1198] (0/2) Epoch 2, batch 1550, loss[loss=0.3563, ctc_loss=0.3137, cr_loss=0.4844, attn_decoder_loss=0.3502, over 29534.00 frames. ], tot_loss[loss=0.3422, ctc_loss=0.3023, cr_loss=0.442, attn_decoder_loss=0.3368, over 5782564.18 frames. 
], batch size: 90, lr: 3.89e-02, grad_scale: 8.0 2024-09-16 15:14:42,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=24300.0, ans=0.2 2024-09-16 15:14:44,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=24300.0, ans=0.0 2024-09-16 15:14:44,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=24300.0, ans=0.125 2024-09-16 15:14:54,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=24340.0, ans=0.05 2024-09-16 15:14:54,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=24340.0, ans=0.125 2024-09-16 15:14:55,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=24340.0, ans=0.125 2024-09-16 15:15:05,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=24380.0, ans=0.125 2024-09-16 15:15:34,549 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.379e+02 1.577e+02 1.948e+02 4.764e+02, threshold=3.154e+02, percent-clipped=9.0 2024-09-16 15:15:37,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24460.0, ans=0.1 2024-09-16 15:15:51,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.48 vs. limit=22.5 2024-09-16 15:15:52,707 INFO [train.py:1198] (0/2) Epoch 2, batch 1600, loss[loss=0.3432, ctc_loss=0.2902, cr_loss=0.4677, attn_decoder_loss=0.3387, over 29673.00 frames. 
], tot_loss[loss=0.3418, ctc_loss=0.3022, cr_loss=0.4422, attn_decoder_loss=0.3364, over 5763640.70 frames. ], batch size: 85, lr: 3.88e-02, grad_scale: 16.0 2024-09-16 15:15:53,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=24500.0, ans=0.125 2024-09-16 15:15:56,806 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.47 vs. limit=15.0 2024-09-16 15:16:14,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24540.0, ans=0.1 2024-09-16 15:16:23,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24580.0, ans=0.1 2024-09-16 15:16:39,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2024-09-16 15:16:43,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=24620.0, ans=0.0 2024-09-16 15:16:44,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=24620.0, ans=0.2 2024-09-16 15:16:48,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. 
limit=15.0
2024-09-16 15:16:58,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=24660.0, ans=0.125
2024-09-16 15:16:59,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=24660.0, ans=0.125
2024-09-16 15:17:08,438 INFO [train.py:1198] (0/2) Epoch 2, batch 1650, loss[loss=0.3654, ctc_loss=0.3257, cr_loss=0.4712, attn_decoder_loss=0.3594, over 29694.00 frames. ], tot_loss[loss=0.3415, ctc_loss=0.3018, cr_loss=0.442, attn_decoder_loss=0.3361, over 5759161.09 frames. ], batch size: 89, lr: 3.88e-02, grad_scale: 16.0
2024-09-16 15:17:24,484 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-16 15:17:25,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0
2024-09-16 15:17:42,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=24780.0, ans=0.125
2024-09-16 15:17:53,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=24780.0, ans=0.2
2024-09-16 15:17:53,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=24780.0, ans=0.0
2024-09-16 15:17:56,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=24820.0, ans=0.0
2024-09-16 15:18:08,692 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.020e+02 1.312e+02 1.453e+02 1.722e+02 6.388e+02, threshold=2.905e+02, percent-clipped=6.0
2024-09-16 15:18:19,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=24860.0, ans=0.2
2024-09-16 15:18:26,756 INFO [train.py:1198] (0/2) Epoch 2, batch 1700, loss[loss=0.301, ctc_loss=0.2543, cr_loss=0.4282, attn_decoder_loss=0.2967, over 29579.00 frames. ], tot_loss[loss=0.3408, ctc_loss=0.3007, cr_loss=0.4422, attn_decoder_loss=0.3355, over 5780811.13 frames. ], batch size: 69, lr: 3.87e-02, grad_scale: 16.0
2024-09-16 15:18:33,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=24900.0, ans=10.0
2024-09-16 15:19:05,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=24980.0, ans=0.125
2024-09-16 15:19:16,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=25020.0, ans=0.0
2024-09-16 15:19:19,059 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 15:19:28,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=25060.0, ans=0.125
2024-09-16 15:19:40,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=25060.0, ans=0.025
2024-09-16 15:19:44,699 INFO [train.py:1198] (0/2) Epoch 2, batch 1750, loss[loss=0.3122, ctc_loss=0.2731, cr_loss=0.4232, attn_decoder_loss=0.3071, over 29366.00 frames. ], tot_loss[loss=0.3402, ctc_loss=0.2997, cr_loss=0.4418, attn_decoder_loss=0.3348, over 5788857.78 frames. ], batch size: 67, lr: 3.86e-02, grad_scale: 16.0
2024-09-16 15:19:50,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=8.0
2024-09-16 15:20:10,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=25140.0, ans=0.0
2024-09-16 15:20:23,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=25180.0, ans=0.2
2024-09-16 15:20:41,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0
2024-09-16 15:20:42,197 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.354e+02 1.539e+02 1.820e+02 3.547e+02, threshold=3.078e+02, percent-clipped=3.0
2024-09-16 15:20:43,161 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0
2024-09-16 15:20:53,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.81 vs. limit=10.0
2024-09-16 15:20:54,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=25260.0, ans=0.125
2024-09-16 15:21:00,332 INFO [train.py:1198] (0/2) Epoch 2, batch 1800, loss[loss=0.3593, ctc_loss=0.3199, cr_loss=0.4433, attn_decoder_loss=0.3538, over 29700.00 frames. ], tot_loss[loss=0.3403, ctc_loss=0.2999, cr_loss=0.4413, attn_decoder_loss=0.335, over 5791115.41 frames. ], batch size: 83, lr: 3.86e-02, grad_scale: 16.0
2024-09-16 15:21:12,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=15.0
2024-09-16 15:21:17,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=25340.0, ans=0.07
2024-09-16 15:21:32,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=25380.0, ans=0.125
2024-09-16 15:21:33,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=25380.0, ans=0.1
2024-09-16 15:21:44,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=25380.0, ans=0.2
2024-09-16 15:21:50,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25420.0, ans=0.1
2024-09-16 15:21:50,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=25420.0, ans=0.0
2024-09-16 15:21:53,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=25420.0, ans=0.0
2024-09-16 15:21:54,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=25420.0, ans=0.0
2024-09-16 15:22:00,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=25420.0, ans=0.125
2024-09-16 15:22:08,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=25460.0, ans=0.125
2024-09-16 15:22:18,555 INFO [train.py:1198] (0/2) Epoch 2, batch 1850, loss[loss=0.3562, ctc_loss=0.3009, cr_loss=0.4605, attn_decoder_loss=0.3521, over 29614.00 frames. ], tot_loss[loss=0.3398, ctc_loss=0.2991, cr_loss=0.4416, attn_decoder_loss=0.3345, over 5797722.75 frames. ], batch size: 86, lr: 3.85e-02, grad_scale: 16.0
2024-09-16 15:22:30,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=25500.0, ans=0.125
2024-09-16 15:22:56,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.38 vs. limit=15.0
2024-09-16 15:23:19,024 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.341e+02 1.488e+02 1.704e+02 7.229e+02, threshold=2.976e+02, percent-clipped=2.0
2024-09-16 15:23:37,144 INFO [train.py:1198] (0/2) Epoch 2, batch 1900, loss[loss=0.3371, ctc_loss=0.2896, cr_loss=0.4359, attn_decoder_loss=0.3327, over 29712.00 frames. ], tot_loss[loss=0.3405, ctc_loss=0.2993, cr_loss=0.4435, attn_decoder_loss=0.3353, over 5805749.38 frames. ], batch size: 89, lr: 3.85e-02, grad_scale: 16.0
2024-09-16 15:23:53,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=25740.0, ans=0.005273913043478261
2024-09-16 15:23:58,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=25740.0, ans=0.125
2024-09-16 15:24:00,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=25740.0, ans=0.0
2024-09-16 15:24:53,305 INFO [train.py:1198] (0/2) Epoch 2, batch 1950, loss[loss=0.3456, ctc_loss=0.3076, cr_loss=0.4719, attn_decoder_loss=0.3393, over 29451.00 frames. ], tot_loss[loss=0.3415, ctc_loss=0.2998, cr_loss=0.445, attn_decoder_loss=0.3362, over 5819842.44 frames. ], batch size: 78, lr: 3.84e-02, grad_scale: 16.0
2024-09-16 15:25:02,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=25900.0, ans=0.125
2024-09-16 15:25:11,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25940.0, ans=0.1
2024-09-16 15:25:25,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.00 vs. limit=15.0
2024-09-16 15:25:27,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=25980.0, ans=0.04949747468305833
2024-09-16 15:25:35,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=25980.0, ans=0.125
2024-09-16 15:25:41,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=12.0
2024-09-16 15:25:50,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=26020.0, ans=0.1
2024-09-16 15:25:52,896 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.307e+02 1.485e+02 1.949e+02 3.051e+02, threshold=2.970e+02, percent-clipped=1.0
2024-09-16 15:25:57,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=26060.0, ans=0.125
2024-09-16 15:26:10,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=26100.0, ans=0.125
2024-09-16 15:26:11,402 INFO [train.py:1198] (0/2) Epoch 2, batch 2000, loss[loss=0.2864, ctc_loss=0.2374, cr_loss=0.3883, attn_decoder_loss=0.2832, over 29351.00 frames. ], tot_loss[loss=0.3416, ctc_loss=0.3002, cr_loss=0.4441, attn_decoder_loss=0.3364, over 5797484.79 frames. ], batch size: 67, lr: 3.83e-02, grad_scale: 32.0
2024-09-16 15:26:40,203 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0
2024-09-16 15:26:48,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.16 vs. limit=10.0
2024-09-16 15:26:52,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=26180.0, ans=0.025
2024-09-16 15:26:54,052 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=9.60 vs. limit=10.0
2024-09-16 15:27:11,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=26220.0, ans=0.0
2024-09-16 15:27:30,018 INFO [train.py:1198] (0/2) Epoch 2, batch 2050, loss[loss=0.3034, ctc_loss=0.2525, cr_loss=0.4133, attn_decoder_loss=0.2999, over 29400.00 frames. ], tot_loss[loss=0.3403, ctc_loss=0.299, cr_loss=0.4437, attn_decoder_loss=0.3351, over 5790124.50 frames. ], batch size: 70, lr: 3.83e-02, grad_scale: 16.0
2024-09-16 15:27:33,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=26300.0, ans=0.125
2024-09-16 15:27:38,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0
2024-09-16 15:27:58,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=26380.0, ans=0.125
2024-09-16 15:28:03,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=26380.0, ans=0.5
2024-09-16 15:28:28,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26420.0, ans=0.1
2024-09-16 15:28:29,486 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.317e+02 1.483e+02 1.822e+02 5.194e+02, threshold=2.965e+02, percent-clipped=3.0
2024-09-16 15:28:29,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=26460.0, ans=0.005117391304347826
2024-09-16 15:28:41,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=26460.0, ans=0.95
2024-09-16 15:28:46,403 INFO [train.py:1198] (0/2) Epoch 2, batch 2100, loss[loss=0.3264, ctc_loss=0.2695, cr_loss=0.417, attn_decoder_loss=0.3234, over 29758.00 frames. ], tot_loss[loss=0.3387, ctc_loss=0.2966, cr_loss=0.4428, attn_decoder_loss=0.3335, over 5801392.47 frames. ], batch size: 81, lr: 3.82e-02, grad_scale: 16.0
2024-09-16 15:28:51,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=26500.0, ans=0.1
2024-09-16 15:29:10,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=26540.0, ans=0.125
2024-09-16 15:29:33,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=26620.0, ans=0.125
2024-09-16 15:29:33,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0
2024-09-16 15:29:42,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=26620.0, ans=0.025
2024-09-16 15:29:43,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=26620.0, ans=0.0
2024-09-16 15:30:05,024 INFO [train.py:1198] (0/2) Epoch 2, batch 2150, loss[loss=0.317, ctc_loss=0.2622, cr_loss=0.4439, attn_decoder_loss=0.3133, over 29440.00 frames. ], tot_loss[loss=0.3373, ctc_loss=0.2947, cr_loss=0.4421, attn_decoder_loss=0.3322, over 5816277.93 frames. ], batch size: 78, lr: 3.81e-02, grad_scale: 16.0
2024-09-16 15:30:16,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=26700.0, ans=0.125
2024-09-16 15:30:22,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=26740.0, ans=0.1
2024-09-16 15:30:45,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=26780.0, ans=0.1
2024-09-16 15:31:06,845 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.386e+02 1.602e+02 1.803e+02 8.431e+02, threshold=3.204e+02, percent-clipped=4.0
2024-09-16 15:31:23,606 INFO [train.py:1198] (0/2) Epoch 2, batch 2200, loss[loss=0.3431, ctc_loss=0.2882, cr_loss=0.4551, attn_decoder_loss=0.339, over 29631.00 frames. ], tot_loss[loss=0.3375, ctc_loss=0.2946, cr_loss=0.4413, attn_decoder_loss=0.3324, over 5813750.49 frames. ], batch size: 86, lr: 3.81e-02, grad_scale: 16.0
2024-09-16 15:31:45,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=26940.0, ans=0.125
2024-09-16 15:31:56,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0
2024-09-16 15:31:57,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=26980.0, ans=0.0
2024-09-16 15:32:05,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=26980.0, ans=0.125
2024-09-16 15:32:05,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=26980.0, ans=10.0
2024-09-16 15:32:40,168 INFO [train.py:1198] (0/2) Epoch 2, batch 2250, loss[loss=0.3271, ctc_loss=0.279, cr_loss=0.4323, attn_decoder_loss=0.3228, over 29710.00 frames. ], tot_loss[loss=0.3372, ctc_loss=0.2941, cr_loss=0.4414, attn_decoder_loss=0.3321, over 5812786.53 frames. ], batch size: 82, lr: 3.80e-02, grad_scale: 16.0
2024-09-16 15:32:45,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.46 vs. limit=10.0
2024-09-16 15:32:49,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=27100.0, ans=0.0
2024-09-16 15:32:49,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=27100.0, ans=0.125
2024-09-16 15:33:03,583 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.82 vs. limit=22.5
2024-09-16 15:33:11,052 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.82 vs. limit=15.0
2024-09-16 15:33:25,845 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.11 vs. limit=22.5
2024-09-16 15:33:31,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=27220.0, ans=0.125
2024-09-16 15:33:41,482 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.301e+02 1.512e+02 1.808e+02 4.415e+02, threshold=3.025e+02, percent-clipped=2.0
2024-09-16 15:33:58,448 INFO [train.py:1198] (0/2) Epoch 2, batch 2300, loss[loss=0.2851, ctc_loss=0.2408, cr_loss=0.3688, attn_decoder_loss=0.2818, over 29318.00 frames. ], tot_loss[loss=0.3359, ctc_loss=0.2933, cr_loss=0.4403, attn_decoder_loss=0.3309, over 5799903.35 frames. ], batch size: 71, lr: 3.79e-02, grad_scale: 16.0
2024-09-16 15:34:04,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=27300.0, ans=0.95
2024-09-16 15:34:04,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=27300.0, ans=0.2
2024-09-16 15:34:09,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=27300.0, ans=0.125
2024-09-16 15:34:13,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=27340.0, ans=0.2
2024-09-16 15:34:16,048 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0
2024-09-16 15:34:34,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=27380.0, ans=0.125
2024-09-16 15:34:40,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=27380.0, ans=0.004917391304347826
2024-09-16 15:34:53,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=27420.0, ans=0.2
2024-09-16 15:35:16,568 INFO [train.py:1198] (0/2) Epoch 2, batch 2350, loss[loss=0.3655, ctc_loss=0.3238, cr_loss=0.4927, attn_decoder_loss=0.3592, over 29692.00 frames. ], tot_loss[loss=0.3364, ctc_loss=0.2936, cr_loss=0.4415, attn_decoder_loss=0.3313, over 5805027.13 frames. ], batch size: 83, lr: 3.79e-02, grad_scale: 16.0
2024-09-16 15:35:24,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=27500.0, ans=0.07
2024-09-16 15:35:50,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5
2024-09-16 15:35:54,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=27580.0, ans=0.125
2024-09-16 15:35:56,375 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.259e-03
2024-09-16 15:36:00,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=27620.0, ans=0.0
2024-09-16 15:36:11,543 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 15:36:15,850 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.428e+02 1.608e+02 2.014e+02 4.831e+02, threshold=3.217e+02, percent-clipped=8.0
2024-09-16 15:36:32,932 INFO [train.py:1198] (0/2) Epoch 2, batch 2400, loss[loss=0.3103, ctc_loss=0.2626, cr_loss=0.4015, attn_decoder_loss=0.3067, over 29540.00 frames. ], tot_loss[loss=0.3367, ctc_loss=0.2941, cr_loss=0.4423, attn_decoder_loss=0.3316, over 5808782.19 frames. ], batch size: 76, lr: 3.78e-02, grad_scale: 32.0
2024-09-16 15:36:48,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=27740.0, ans=0.1
2024-09-16 15:36:52,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=12.0
2024-09-16 15:36:59,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.39 vs. limit=10.0
2024-09-16 15:37:31,005 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0
2024-09-16 15:37:48,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27860.0, ans=0.1
2024-09-16 15:37:51,267 INFO [train.py:1198] (0/2) Epoch 2, batch 2450, loss[loss=0.3279, ctc_loss=0.2829, cr_loss=0.4432, attn_decoder_loss=0.3231, over 29718.00 frames. ], tot_loss[loss=0.3379, ctc_loss=0.2952, cr_loss=0.4441, attn_decoder_loss=0.3327, over 5785726.79 frames. ], batch size: 82, lr: 3.78e-02, grad_scale: 16.0
2024-09-16 15:37:57,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=27900.0, ans=0.2
2024-09-16 15:38:03,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=27900.0, ans=0.004804347826086957
2024-09-16 15:38:08,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27940.0, ans=0.1
2024-09-16 15:38:08,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=27940.0, ans=0.1
2024-09-16 15:38:19,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=27940.0, ans=0.125
2024-09-16 15:38:39,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=28020.0, ans=0.004778260869565217
2024-09-16 15:38:54,382 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.358e+02 1.541e+02 1.889e+02 3.653e+02, threshold=3.082e+02, percent-clipped=2.0
2024-09-16 15:39:06,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.80 vs. limit=15.0
2024-09-16 15:39:08,575 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 15:39:09,707 INFO [train.py:1198] (0/2) Epoch 2, batch 2500, loss[loss=0.3528, ctc_loss=0.2964, cr_loss=0.4517, attn_decoder_loss=0.349, over 29653.00 frames. ], tot_loss[loss=0.3376, ctc_loss=0.2947, cr_loss=0.4439, attn_decoder_loss=0.3326, over 5795965.45 frames. ], batch size: 86, lr: 3.77e-02, grad_scale: 16.0
2024-09-16 15:39:26,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=28140.0, ans=0.004752173913043479
2024-09-16 15:39:38,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=28180.0, ans=0.125
2024-09-16 15:39:41,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28180.0, ans=0.1
2024-09-16 15:39:44,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=28180.0, ans=0.035
2024-09-16 15:40:04,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=28220.0, ans=0.125
2024-09-16 15:40:25,989 INFO [train.py:1198] (0/2) Epoch 2, batch 2550, loss[loss=0.2871, ctc_loss=0.2372, cr_loss=0.3774, attn_decoder_loss=0.2843, over 29340.00 frames. ], tot_loss[loss=0.3378, ctc_loss=0.2951, cr_loss=0.4448, attn_decoder_loss=0.3327, over 5798714.25 frames. ], batch size: 67, lr: 3.76e-02, grad_scale: 16.0
2024-09-16 15:40:39,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=28340.0, ans=0.2
2024-09-16 15:40:42,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=28340.0, ans=0.2
2024-09-16 15:40:45,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=28340.0, ans=0.125
2024-09-16 15:41:01,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=28380.0, ans=0.125
2024-09-16 15:41:02,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0
2024-09-16 15:41:23,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0
2024-09-16 15:41:24,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=28420.0, ans=0.004691304347826087
2024-09-16 15:41:28,634 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.048e+02 1.393e+02 1.535e+02 1.794e+02 3.607e+02, threshold=3.070e+02, percent-clipped=1.0
2024-09-16 15:41:30,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=28460.0, ans=0.125
2024-09-16 15:41:35,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=28460.0, ans=0.05
2024-09-16 15:41:38,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=28460.0, ans=0.0
2024-09-16 15:41:44,116 INFO [train.py:1198] (0/2) Epoch 2, batch 2600, loss[loss=0.3421, ctc_loss=0.2968, cr_loss=0.4605, attn_decoder_loss=0.3369, over 29432.00 frames. ], tot_loss[loss=0.338, ctc_loss=0.295, cr_loss=0.4446, attn_decoder_loss=0.3329, over 5794738.27 frames. ], batch size: 78, lr: 3.76e-02, grad_scale: 16.0
2024-09-16 15:41:53,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28500.0, ans=0.1
2024-09-16 15:42:10,931 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.46 vs. limit=15.0
2024-09-16 15:42:11,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=28540.0, ans=0.0
2024-09-16 15:42:25,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=28580.0, ans=0.025
2024-09-16 15:42:41,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=28620.0, ans=0.0
2024-09-16 15:42:41,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.63 vs. limit=15.0
2024-09-16 15:42:53,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=28660.0, ans=0.125
2024-09-16 15:43:00,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=28700.0, ans=0.004630434782608696
2024-09-16 15:43:01,956 INFO [train.py:1198] (0/2) Epoch 2, batch 2650, loss[loss=0.36, ctc_loss=0.3239, cr_loss=0.493, attn_decoder_loss=0.353, over 29228.00 frames. ], tot_loss[loss=0.3377, ctc_loss=0.2944, cr_loss=0.4447, attn_decoder_loss=0.3326, over 5801157.00 frames. ], batch size: 100, lr: 3.75e-02, grad_scale: 16.0
2024-09-16 15:43:02,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=28700.0, ans=0.125
2024-09-16 15:43:10,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=28700.0, ans=12.0
2024-09-16 15:43:11,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=28700.0, ans=0.125
2024-09-16 15:43:19,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=28740.0, ans=0.125
2024-09-16 15:43:20,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0
2024-09-16 15:43:26,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=28740.0, ans=0.125
2024-09-16 15:44:03,360 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.330e+02 1.517e+02 1.799e+02 4.153e+02, threshold=3.035e+02, percent-clipped=1.0
2024-09-16 15:44:18,681 INFO [train.py:1198] (0/2) Epoch 2, batch 2700, loss[loss=0.3408, ctc_loss=0.2838, cr_loss=0.4782, attn_decoder_loss=0.3365, over 29540.00 frames. ], tot_loss[loss=0.3382, ctc_loss=0.2949, cr_loss=0.4462, attn_decoder_loss=0.3331, over 5796578.94 frames. ], batch size: 87, lr: 3.74e-02, grad_scale: 16.0
2024-09-16 15:44:22,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=28900.0, ans=0.2
2024-09-16 15:44:34,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=28940.0, ans=0.004578260869565217
2024-09-16 15:44:37,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=28940.0, ans=0.07
2024-09-16 15:44:37,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0
2024-09-16 15:44:43,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=28940.0, ans=0.2
2024-09-16 15:45:00,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=28980.0, ans=0.0
2024-09-16 15:45:01,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=28980.0, ans=0.125
2024-09-16 15:45:27,819 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 15:45:36,948 INFO [train.py:1198] (0/2) Epoch 2, batch 2750, loss[loss=0.3227, ctc_loss=0.2824, cr_loss=0.4449, attn_decoder_loss=0.3172, over 29516.00 frames. ], tot_loss[loss=0.3367, ctc_loss=0.2933, cr_loss=0.4446, attn_decoder_loss=0.3317, over 5793992.87 frames. ], batch size: 75, lr: 3.74e-02, grad_scale: 8.0
2024-09-16 15:45:43,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=29100.0, ans=0.2
2024-09-16 15:45:47,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=29100.0, ans=0.125
2024-09-16 15:45:54,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=29140.0, ans=0.125
2024-09-16 15:46:08,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.60 vs. limit=12.0
2024-09-16 15:46:09,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=29180.0, ans=0.025
2024-09-16 15:46:11,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0
2024-09-16 15:46:18,617 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-16 15:46:23,806 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.18 vs. limit=22.5
2024-09-16 15:46:30,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=29220.0, ans=0.1
2024-09-16 15:46:40,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=29260.0, ans=0.125
2024-09-16 15:46:41,282 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.322e+02 1.540e+02 1.938e+02 5.454e+02, threshold=3.080e+02, percent-clipped=6.0
2024-09-16 15:46:55,039 INFO [train.py:1198] (0/2) Epoch 2, batch 2800, loss[loss=0.4122, ctc_loss=0.4138, cr_loss=0.4561, attn_decoder_loss=0.4019, over 20155.00 frames. ], tot_loss[loss=0.3368, ctc_loss=0.2934, cr_loss=0.4439, attn_decoder_loss=0.3317, over 5774490.12 frames. ], batch size: 211, lr: 3.73e-02, grad_scale: 16.0
2024-09-16 15:47:15,092 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-16 15:47:36,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.40 vs. limit=15.0
2024-09-16 15:47:43,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=29420.0, ans=0.125
2024-09-16 15:47:43,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=29420.0, ans=0.0
2024-09-16 15:48:00,836 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.04 vs. limit=15.0
2024-09-16 15:48:10,149 INFO [train.py:1198] (0/2) Epoch 2, batch 2850, loss[loss=0.3235, ctc_loss=0.2818, cr_loss=0.4218, attn_decoder_loss=0.3188, over 29521.00 frames. ], tot_loss[loss=0.3375, ctc_loss=0.2943, cr_loss=0.445, attn_decoder_loss=0.3325, over 5760796.79 frames. ], batch size: 77, lr: 3.73e-02, grad_scale: 16.0
2024-09-16 15:48:13,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=29500.0, ans=0.125
2024-09-16 15:48:36,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=15.0
2024-09-16 15:48:42,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=29580.0, ans=0.125
2024-09-16 15:48:42,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=29580.0, ans=0.004439130434782609
2024-09-16 15:48:45,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=29580.0, ans=0.125
2024-09-16 15:49:15,202 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.408e+02 1.587e+02 1.885e+02 4.187e+02, threshold=3.175e+02, percent-clipped=5.0
2024-09-16 15:49:20,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=29660.0, ans=0.004421739130434782
2024-09-16 15:49:20,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=29660.0, ans=0.1
2024-09-16 15:49:21,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=29660.0, ans=0.035
2024-09-16 15:49:22,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=29660.0, ans=0.125
2024-09-16 15:49:28,801 INFO [train.py:1198] (0/2) Epoch 2, batch 2900, loss[loss=0.3339, ctc_loss=0.2826, cr_loss=0.4583, attn_decoder_loss=0.3294, over 29398.00 frames.
], tot_loss[loss=0.3384, ctc_loss=0.2945, cr_loss=0.4468, attn_decoder_loss=0.3333, over 5786684.12 frames. ], batch size: 79, lr: 3.72e-02, grad_scale: 16.0 2024-09-16 15:49:31,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2024-09-16 15:49:46,862 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.86 vs. limit=10.0 2024-09-16 15:50:02,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=29780.0, ans=0.125 2024-09-16 15:50:05,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=29780.0, ans=0.2 2024-09-16 15:50:06,485 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=15.0 2024-09-16 15:50:13,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=29780.0, ans=0.0 2024-09-16 15:50:15,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.58 vs. limit=22.5 2024-09-16 15:50:23,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29820.0, ans=0.1 2024-09-16 15:50:44,093 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.42 vs. limit=15.0 2024-09-16 15:50:46,443 INFO [train.py:1198] (0/2) Epoch 2, batch 2950, loss[loss=0.3103, ctc_loss=0.2623, cr_loss=0.4438, attn_decoder_loss=0.3058, over 29529.00 frames. 
], tot_loss[loss=0.3356, ctc_loss=0.2917, cr_loss=0.4442, attn_decoder_loss=0.3306, over 5782798.37 frames. ], batch size: 75, lr: 3.71e-02, grad_scale: 16.0 2024-09-16 15:51:03,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=29940.0, ans=0.2 2024-09-16 15:51:18,614 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:51:19,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=29980.0, ans=0.125 2024-09-16 15:51:23,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=29980.0, ans=0.07 2024-09-16 15:51:25,972 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:51:31,044 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.61 vs. limit=10.0 2024-09-16 15:51:35,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=30020.0, ans=0.05 2024-09-16 15:51:48,468 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.374e+02 1.533e+02 1.890e+02 8.560e+02, threshold=3.066e+02, percent-clipped=4.0 2024-09-16 15:52:02,437 INFO [train.py:1198] (0/2) Epoch 2, batch 3000, loss[loss=0.3302, ctc_loss=0.274, cr_loss=0.4408, attn_decoder_loss=0.3266, over 29750.00 frames. ], tot_loss[loss=0.3353, ctc_loss=0.2917, cr_loss=0.4437, attn_decoder_loss=0.3303, over 5784118.58 frames. 
], batch size: 81, lr: 3.71e-02, grad_scale: 16.0 2024-09-16 15:52:02,438 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 15:52:20,640 INFO [train.py:1230] (0/2) Epoch 2, validation: loss=0.2432, ctc_loss=0.1092, cr_loss=4.796e-15, attn_decoder_loss=0.2581, over 944034.00 frames. 2024-09-16 15:52:20,640 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-16 15:52:25,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=30100.0, ans=0.0 2024-09-16 15:52:27,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=30100.0, ans=0.09899494936611666 2024-09-16 15:52:34,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=30140.0, ans=0.0 2024-09-16 15:52:36,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=30140.0, ans=0.2 2024-09-16 15:52:39,895 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.79 vs. limit=15.0 2024-09-16 15:52:57,548 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.044e-02 2024-09-16 15:53:14,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=30220.0, ans=0.2 2024-09-16 15:53:17,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=30220.0, ans=0.2 2024-09-16 15:53:17,524 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 15:53:26,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.86 vs. 
limit=15.0 2024-09-16 15:53:33,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=30260.0, ans=0.125 2024-09-16 15:53:42,110 INFO [train.py:1198] (0/2) Epoch 2, batch 3050, loss[loss=0.3015, ctc_loss=0.2461, cr_loss=0.3951, attn_decoder_loss=0.2989, over 29525.00 frames. ], tot_loss[loss=0.3364, ctc_loss=0.2928, cr_loss=0.4451, attn_decoder_loss=0.3313, over 5778212.43 frames. ], batch size: 76, lr: 3.70e-02, grad_scale: 16.0 2024-09-16 15:53:43,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=30300.0, ans=0.125 2024-09-16 15:53:53,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2024-09-16 15:54:14,680 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.73 vs. limit=22.5 2024-09-16 15:54:16,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.03 vs. limit=15.0 2024-09-16 15:54:16,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=15.0 2024-09-16 15:54:28,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=30420.0, ans=0.125 2024-09-16 15:54:34,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. 
limit=15.0 2024-09-16 15:54:41,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=30460.0, ans=0.2 2024-09-16 15:54:44,413 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.365e+02 1.556e+02 1.852e+02 9.980e+02, threshold=3.113e+02, percent-clipped=5.0 2024-09-16 15:54:47,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=30460.0, ans=0.0042478260869565215 2024-09-16 15:54:47,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=30460.0, ans=0.0042478260869565215 2024-09-16 15:54:54,488 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5 2024-09-16 15:54:56,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=30500.0, ans=0.004239130434782609 2024-09-16 15:54:57,987 INFO [train.py:1198] (0/2) Epoch 2, batch 3100, loss[loss=0.3783, ctc_loss=0.3545, cr_loss=0.4712, attn_decoder_loss=0.3704, over 29258.00 frames. ], tot_loss[loss=0.3357, ctc_loss=0.292, cr_loss=0.4445, attn_decoder_loss=0.3307, over 5777605.01 frames. 
], batch size: 100, lr: 3.69e-02, grad_scale: 16.0 2024-09-16 15:55:08,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=30500.0, ans=0.125 2024-09-16 15:55:10,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=30500.0, ans=0.125 2024-09-16 15:55:25,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=30540.0, ans=0.04949747468305833 2024-09-16 15:55:30,488 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=24.03 vs. limit=22.5 2024-09-16 15:55:36,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-09-16 15:55:43,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=30620.0, ans=0.125 2024-09-16 15:55:53,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-09-16 15:56:02,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=30660.0, ans=0.125 2024-09-16 15:56:05,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=30660.0, ans=0.1 2024-09-16 15:56:10,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30660.0, ans=0.1 2024-09-16 15:56:14,194 INFO [train.py:1198] (0/2) Epoch 2, batch 3150, loss[loss=0.365, ctc_loss=0.3211, cr_loss=0.4689, attn_decoder_loss=0.3594, over 28974.00 frames. 
], tot_loss[loss=0.3353, ctc_loss=0.2917, cr_loss=0.4445, attn_decoder_loss=0.3303, over 5783729.60 frames. ], batch size: 104, lr: 3.69e-02, grad_scale: 16.0 2024-09-16 15:56:41,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=30740.0, ans=0.125 2024-09-16 15:57:18,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=15.0 2024-09-16 15:57:20,644 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.339e+02 1.514e+02 1.730e+02 4.890e+02, threshold=3.027e+02, percent-clipped=3.0 2024-09-16 15:57:28,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=30860.0, ans=0.025 2024-09-16 15:57:34,169 INFO [train.py:1198] (0/2) Epoch 2, batch 3200, loss[loss=0.3477, ctc_loss=0.3105, cr_loss=0.5123, attn_decoder_loss=0.3405, over 29413.00 frames. ], tot_loss[loss=0.3344, ctc_loss=0.2901, cr_loss=0.4447, attn_decoder_loss=0.3294, over 5793171.15 frames. 
], batch size: 79, lr: 3.68e-02, grad_scale: 32.0 2024-09-16 15:57:43,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30900.0, ans=0.1 2024-09-16 15:57:48,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=30940.0, ans=0.125 2024-09-16 15:57:54,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=30940.0, ans=0.5 2024-09-16 15:57:55,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=30940.0, ans=0.004143478260869566 2024-09-16 15:58:17,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=30980.0, ans=0.125 2024-09-16 15:58:19,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=31020.0, ans=0.0 2024-09-16 15:58:29,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=31020.0, ans=0.125 2024-09-16 15:58:36,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.63 vs. limit=22.5 2024-09-16 15:58:38,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=31060.0, ans=0.025 2024-09-16 15:58:50,040 INFO [train.py:1198] (0/2) Epoch 2, batch 3250, loss[loss=0.3339, ctc_loss=0.2805, cr_loss=0.4203, attn_decoder_loss=0.3305, over 29701.00 frames. ], tot_loss[loss=0.3345, ctc_loss=0.2898, cr_loss=0.4448, attn_decoder_loss=0.3296, over 5799976.37 frames. 
], batch size: 84, lr: 3.68e-02, grad_scale: 16.0 2024-09-16 15:59:02,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=31100.0, ans=0.125 2024-09-16 15:59:24,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=31180.0, ans=0.125 2024-09-16 15:59:47,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.49 vs. limit=22.5 2024-09-16 15:59:53,724 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.355e+02 1.599e+02 1.863e+02 1.090e+03, threshold=3.197e+02, percent-clipped=6.0 2024-09-16 15:59:58,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=31260.0, ans=0.0 2024-09-16 16:00:03,735 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.39 vs. limit=10.0 2024-09-16 16:00:04,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=31300.0, ans=0.09899494936611666 2024-09-16 16:00:06,053 INFO [train.py:1198] (0/2) Epoch 2, batch 3300, loss[loss=0.3465, ctc_loss=0.3002, cr_loss=0.4704, attn_decoder_loss=0.3412, over 28204.00 frames. ], tot_loss[loss=0.3332, ctc_loss=0.2888, cr_loss=0.4435, attn_decoder_loss=0.3282, over 5797286.09 frames. 
], batch size: 111, lr: 3.67e-02, grad_scale: 16.0 2024-09-16 16:00:15,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=31300.0, ans=0.0 2024-09-16 16:00:22,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten.whitening_limit, batch_count=31340.0, ans=15.0 2024-09-16 16:00:41,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=31380.0, ans=0.125 2024-09-16 16:00:42,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2024-09-16 16:00:56,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31420.0, ans=0.1 2024-09-16 16:01:01,982 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.068e-02 2024-09-16 16:01:11,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=31460.0, ans=0.0 2024-09-16 16:01:25,569 INFO [train.py:1198] (0/2) Epoch 2, batch 3350, loss[loss=0.3568, ctc_loss=0.3173, cr_loss=0.4685, attn_decoder_loss=0.3508, over 28806.00 frames. ], tot_loss[loss=0.3347, ctc_loss=0.2907, cr_loss=0.4443, attn_decoder_loss=0.3297, over 5774403.99 frames. ], batch size: 104, lr: 3.66e-02, grad_scale: 16.0 2024-09-16 16:02:03,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=31580.0, ans=0.125 2024-09-16 16:02:05,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. 
limit=6.0 2024-09-16 16:02:18,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=31620.0, ans=0.1 2024-09-16 16:02:29,255 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.443e+02 1.604e+02 1.937e+02 5.792e+02, threshold=3.209e+02, percent-clipped=1.0 2024-09-16 16:02:29,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=31660.0, ans=0.00398695652173913 2024-09-16 16:02:41,560 INFO [train.py:1198] (0/2) Epoch 2, batch 3400, loss[loss=0.298, ctc_loss=0.2441, cr_loss=0.4402, attn_decoder_loss=0.2942, over 29356.00 frames. ], tot_loss[loss=0.3342, ctc_loss=0.2903, cr_loss=0.4445, attn_decoder_loss=0.3292, over 5766377.80 frames. ], batch size: 67, lr: 3.66e-02, grad_scale: 16.0 2024-09-16 16:02:47,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=31700.0, ans=0.125 2024-09-16 16:02:58,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=31740.0, ans=0.125 2024-09-16 16:03:06,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31740.0, ans=0.1 2024-09-16 16:03:39,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=31820.0, ans=0.0 2024-09-16 16:03:54,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=31860.0, ans=0.2 2024-09-16 16:03:57,317 INFO [train.py:1198] (0/2) Epoch 2, batch 3450, loss[loss=0.3335, ctc_loss=0.2848, cr_loss=0.4398, attn_decoder_loss=0.3291, over 28234.00 frames. ], tot_loss[loss=0.3344, ctc_loss=0.2898, cr_loss=0.4449, attn_decoder_loss=0.3295, over 5774552.36 frames. 
], batch size: 111, lr: 3.65e-02, grad_scale: 16.0 2024-09-16 16:04:08,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=31900.0, ans=0.125 2024-09-16 16:04:08,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=31900.0, ans=0.0 2024-09-16 16:04:14,142 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:04:36,149 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-8000.pt 2024-09-16 16:04:55,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=32020.0, ans=0.125 2024-09-16 16:05:09,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.17 vs. limit=22.5 2024-09-16 16:05:12,126 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.023e+02 1.335e+02 1.514e+02 1.734e+02 4.417e+02, threshold=3.028e+02, percent-clipped=1.0 2024-09-16 16:05:14,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=32060.0, ans=0.125 2024-09-16 16:05:24,257 INFO [train.py:1198] (0/2) Epoch 2, batch 3500, loss[loss=0.3122, ctc_loss=0.2701, cr_loss=0.42, attn_decoder_loss=0.3075, over 29314.00 frames. ], tot_loss[loss=0.3338, ctc_loss=0.2896, cr_loss=0.4445, attn_decoder_loss=0.3289, over 5775977.11 frames. 
], batch size: 71, lr: 3.65e-02, grad_scale: 16.0 2024-09-16 16:05:42,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=32140.0, ans=0.0 2024-09-16 16:05:54,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=32180.0, ans=0.09899494936611666 2024-09-16 16:06:01,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=32180.0, ans=0.125 2024-09-16 16:06:06,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=32180.0, ans=0.125 2024-09-16 16:06:09,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=32220.0, ans=0.125 2024-09-16 16:06:15,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=32220.0, ans=0.003865217391304348 2024-09-16 16:06:20,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.93 vs. limit=15.0 2024-09-16 16:06:25,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=32260.0, ans=0.05 2024-09-16 16:06:38,935 INFO [train.py:1198] (0/2) Epoch 2, batch 3550, loss[loss=0.321, ctc_loss=0.261, cr_loss=0.4234, attn_decoder_loss=0.3183, over 29707.00 frames. ], tot_loss[loss=0.3331, ctc_loss=0.2883, cr_loss=0.444, attn_decoder_loss=0.3283, over 5782431.52 frames. 
], batch size: 89, lr: 3.64e-02, grad_scale: 16.0 2024-09-16 16:07:01,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=32340.0, ans=0.125 2024-09-16 16:07:03,316 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2024-09-16 16:07:07,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=32380.0, ans=0.0 2024-09-16 16:07:26,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=32420.0, ans=0.09899494936611666 2024-09-16 16:07:26,932 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:07:41,333 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.383e+02 1.528e+02 1.788e+02 3.393e+02, threshold=3.056e+02, percent-clipped=1.0 2024-09-16 16:07:44,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.37 vs. limit=15.0 2024-09-16 16:07:53,261 INFO [train.py:1198] (0/2) Epoch 2, batch 3600, loss[loss=0.3095, ctc_loss=0.2552, cr_loss=0.4371, attn_decoder_loss=0.3058, over 29530.00 frames. ], tot_loss[loss=0.3329, ctc_loss=0.2877, cr_loss=0.4446, attn_decoder_loss=0.3281, over 5791223.75 frames. ], batch size: 77, lr: 3.63e-02, grad_scale: 32.0 2024-09-16 16:07:54,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.60 vs. 
limit=15.0 2024-09-16 16:08:06,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=32540.0, ans=0.125 2024-09-16 16:08:09,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=32540.0, ans=0.125 2024-09-16 16:08:19,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.07 vs. limit=15.0 2024-09-16 16:08:31,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0 2024-09-16 16:08:48,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=32620.0, ans=0.003778260869565218 2024-09-16 16:09:07,512 INFO [train.py:1198] (0/2) Epoch 2, batch 3650, loss[loss=0.3481, ctc_loss=0.2968, cr_loss=0.4608, attn_decoder_loss=0.3435, over 29488.00 frames. ], tot_loss[loss=0.332, ctc_loss=0.2865, cr_loss=0.4446, attn_decoder_loss=0.3272, over 5792980.17 frames. 
], batch size: 90, lr: 3.63e-02, grad_scale: 16.0
2024-09-16 16:09:21,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=32740.0, ans=0.125
2024-09-16 16:09:56,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=32820.0, ans=0.0
2024-09-16 16:10:02,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32820.0, ans=0.1
2024-09-16 16:10:12,078 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.348e+02 1.495e+02 1.801e+02 3.465e+02, threshold=2.990e+02, percent-clipped=2.0
2024-09-16 16:10:16,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=32860.0, ans=0.2
2024-09-16 16:10:23,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=32900.0, ans=0.2
2024-09-16 16:10:24,619 INFO [train.py:1198] (0/2) Epoch 2, batch 3700, loss[loss=0.3598, ctc_loss=0.3209, cr_loss=0.49, attn_decoder_loss=0.3532, over 29702.00 frames. ], tot_loss[loss=0.3322, ctc_loss=0.2868, cr_loss=0.4449, attn_decoder_loss=0.3273, over 5803487.21 frames. ], batch size: 84, lr: 3.62e-02, grad_scale: 16.0
2024-09-16 16:10:40,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=32940.0, ans=0.125
2024-09-16 16:10:43,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=32940.0, ans=0.125
2024-09-16 16:10:43,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=32940.0, ans=0.125
2024-09-16 16:11:11,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=33020.0, ans=10.0
2024-09-16 16:11:26,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.76 vs. limit=15.0
2024-09-16 16:11:32,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=33060.0, ans=0.1
2024-09-16 16:11:32,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=33060.0, ans=0.2
2024-09-16 16:11:41,225 INFO [train.py:1198] (0/2) Epoch 2, batch 3750, loss[loss=0.2848, ctc_loss=0.2478, cr_loss=0.3819, attn_decoder_loss=0.2804, over 29349.00 frames. ], tot_loss[loss=0.3316, ctc_loss=0.286, cr_loss=0.4442, attn_decoder_loss=0.3267, over 5808139.59 frames. ], batch size: 67, lr: 3.62e-02, grad_scale: 16.0
2024-09-16 16:11:44,680 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.602e-03
2024-09-16 16:11:52,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=33100.0, ans=0.0
2024-09-16 16:11:52,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=33100.0, ans=0.2
2024-09-16 16:11:59,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=33140.0, ans=0.125
2024-09-16 16:11:59,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=33140.0, ans=0.125
2024-09-16 16:12:18,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=33180.0, ans=0.003656521739130435
2024-09-16 16:12:24,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=33220.0, ans=0.0036478260869565226
2024-09-16 16:12:33,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=33220.0, ans=0.125
2024-09-16 16:12:42,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=33260.0, ans=0.125
2024-09-16 16:12:44,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=33260.0, ans=0.125
2024-09-16 16:12:45,648 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.433e+02 1.658e+02 2.025e+02 1.075e+03, threshold=3.317e+02, percent-clipped=10.0
2024-09-16 16:12:55,952 INFO [train.py:1198] (0/2) Epoch 2, batch 3800, loss[loss=0.33, ctc_loss=0.2706, cr_loss=0.4507, attn_decoder_loss=0.3266, over 29626.00 frames. ], tot_loss[loss=0.3315, ctc_loss=0.286, cr_loss=0.4443, attn_decoder_loss=0.3267, over 5799408.14 frames. ], batch size: 86, lr: 3.61e-02, grad_scale: 16.0
2024-09-16 16:13:00,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=33300.0, ans=0.125
2024-09-16 16:13:08,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=33300.0, ans=0.07
2024-09-16 16:13:08,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=33300.0, ans=0.125
2024-09-16 16:13:14,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=33340.0, ans=0.0
2024-09-16 16:13:20,919 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.00 vs. limit=15.0
2024-09-16 16:13:25,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=24.96 vs. limit=22.5
2024-09-16 16:13:27,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=33380.0, ans=0.00361304347826087
2024-09-16 16:13:48,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=33420.0, ans=0.0
2024-09-16 16:14:10,551 INFO [train.py:1198] (0/2) Epoch 2, batch 3850, loss[loss=0.3489, ctc_loss=0.2984, cr_loss=0.4427, attn_decoder_loss=0.3447, over 29203.00 frames. ], tot_loss[loss=0.3316, ctc_loss=0.286, cr_loss=0.4444, attn_decoder_loss=0.3268, over 5813559.35 frames. ], batch size: 100, lr: 3.60e-02, grad_scale: 16.0
2024-09-16 16:14:18,307 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-16 16:14:21,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33500.0, ans=0.1
2024-09-16 16:14:22,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=33500.0, ans=0.0
2024-09-16 16:14:24,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0
2024-09-16 16:14:35,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=33540.0, ans=0.0
2024-09-16 16:14:37,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=33540.0, ans=0.125
2024-09-16 16:15:04,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=33620.0, ans=0.95
2024-09-16 16:15:05,176 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=9.27 vs. limit=10.0
2024-09-16 16:15:14,727 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.336e+02 1.556e+02 1.830e+02 4.264e+02, threshold=3.112e+02, percent-clipped=3.0
2024-09-16 16:15:25,317 INFO [train.py:1198] (0/2) Epoch 2, batch 3900, loss[loss=0.3764, ctc_loss=0.3482, cr_loss=0.4995, attn_decoder_loss=0.3685, over 29637.00 frames. ], tot_loss[loss=0.3321, ctc_loss=0.2865, cr_loss=0.4464, attn_decoder_loss=0.3273, over 5817847.41 frames. ], batch size: 86, lr: 3.60e-02, grad_scale: 16.0
2024-09-16 16:15:57,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=9.37 vs. limit=10.0
2024-09-16 16:16:20,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.31 vs. limit=22.5
2024-09-16 16:16:42,720 INFO [train.py:1198] (0/2) Epoch 2, batch 3950, loss[loss=0.3441, ctc_loss=0.2985, cr_loss=0.442, attn_decoder_loss=0.3394, over 29451.00 frames. ], tot_loss[loss=0.3317, ctc_loss=0.2858, cr_loss=0.4465, attn_decoder_loss=0.3269, over 5837192.31 frames. ], batch size: 97, lr: 3.59e-02, grad_scale: 16.0
2024-09-16 16:16:52,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.03 vs. limit=22.5
2024-09-16 16:16:54,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=33900.0, ans=0.125
2024-09-16 16:17:05,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=33940.0, ans=0.025
2024-09-16 16:17:11,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=33980.0, ans=0.125
2024-09-16 16:17:28,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=34020.0, ans=0.1
2024-09-16 16:17:38,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=12.0
2024-09-16 16:17:40,203 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=15.0
2024-09-16 16:17:44,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.99 vs. limit=22.5
2024-09-16 16:17:46,404 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.360e+02 1.544e+02 1.951e+02 4.705e+02, threshold=3.088e+02, percent-clipped=4.0
2024-09-16 16:17:57,006 INFO [train.py:1198] (0/2) Epoch 2, batch 4000, loss[loss=0.3295, ctc_loss=0.2892, cr_loss=0.4097, attn_decoder_loss=0.3249, over 29493.00 frames. ], tot_loss[loss=0.3317, ctc_loss=0.286, cr_loss=0.4456, attn_decoder_loss=0.3269, over 5814567.55 frames. ], batch size: 74, lr: 3.59e-02, grad_scale: 32.0
2024-09-16 16:18:08,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34100.0, ans=0.1
2024-09-16 16:18:19,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=34140.0, ans=0.125
2024-09-16 16:18:28,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=34180.0, ans=0.2
2024-09-16 16:18:28,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. limit=6.0
2024-09-16 16:18:38,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=34180.0, ans=0.1
2024-09-16 16:18:39,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=34180.0, ans=0.125
2024-09-16 16:18:44,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=34220.0, ans=0.2
2024-09-16 16:18:58,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=34260.0, ans=0.125
2024-09-16 16:19:11,332 INFO [train.py:1198] (0/2) Epoch 2, batch 4050, loss[loss=0.3931, ctc_loss=0.3838, cr_loss=0.4493, attn_decoder_loss=0.3842, over 20658.00 frames. ], tot_loss[loss=0.3321, ctc_loss=0.2865, cr_loss=0.4456, attn_decoder_loss=0.3273, over 5799121.65 frames. ], batch size: 210, lr: 3.58e-02, grad_scale: 16.0
2024-09-16 16:19:37,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0
2024-09-16 16:19:43,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=34380.0, ans=0.05
2024-09-16 16:19:51,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=34380.0, ans=0.125
2024-09-16 16:19:56,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.92 vs. limit=15.0
2024-09-16 16:19:58,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=34420.0, ans=0.2
2024-09-16 16:20:04,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=34420.0, ans=0.125
2024-09-16 16:20:06,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=6.47 vs. limit=12.0
2024-09-16 16:20:16,503 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.473e+02 1.673e+02 1.934e+02 5.199e+02, threshold=3.345e+02, percent-clipped=3.0
2024-09-16 16:20:19,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=34460.0, ans=0.125
2024-09-16 16:20:26,777 INFO [train.py:1198] (0/2) Epoch 2, batch 4100, loss[loss=0.339, ctc_loss=0.2921, cr_loss=0.451, attn_decoder_loss=0.3342, over 29510.00 frames. ], tot_loss[loss=0.3319, ctc_loss=0.2859, cr_loss=0.4463, attn_decoder_loss=0.3271, over 5792730.46 frames. ], batch size: 90, lr: 3.57e-02, grad_scale: 16.0
2024-09-16 16:20:28,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=34500.0, ans=0.125
2024-09-16 16:20:51,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=34540.0, ans=0.025
2024-09-16 16:21:21,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=15.28 vs. limit=15.0
2024-09-16 16:21:42,416 INFO [train.py:1198] (0/2) Epoch 2, batch 4150, loss[loss=0.3178, ctc_loss=0.2734, cr_loss=0.4773, attn_decoder_loss=0.3121, over 29507.00 frames. ], tot_loss[loss=0.3312, ctc_loss=0.2851, cr_loss=0.4465, attn_decoder_loss=0.3264, over 5797992.66 frames. ], batch size: 77, lr: 3.57e-02, grad_scale: 8.0
2024-09-16 16:21:44,752 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.44 vs. limit=22.5
2024-09-16 16:22:07,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=34740.0, ans=0.003317391304347826
2024-09-16 16:22:22,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=34780.0, ans=0.0033086956521739133
2024-09-16 16:22:24,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=34780.0, ans=0.125
2024-09-16 16:22:31,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=34820.0, ans=0.025
2024-09-16 16:22:48,752 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.357e+02 1.525e+02 1.720e+02 3.077e+02, threshold=3.049e+02, percent-clipped=0.0
2024-09-16 16:22:50,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.36 vs. limit=22.5
2024-09-16 16:22:56,137 INFO [train.py:1198] (0/2) Epoch 2, batch 4200, loss[loss=0.3626, ctc_loss=0.3153, cr_loss=0.4578, attn_decoder_loss=0.3577, over 29539.00 frames. ], tot_loss[loss=0.3314, ctc_loss=0.2849, cr_loss=0.446, attn_decoder_loss=0.3266, over 5800199.16 frames. ], batch size: 90, lr: 3.56e-02, grad_scale: 8.0
2024-09-16 16:23:18,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=34940.0, ans=0.125
2024-09-16 16:23:44,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=12.0
2024-09-16 16:23:45,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=35020.0, ans=0.2
2024-09-16 16:23:49,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35020.0, ans=0.1
2024-09-16 16:23:55,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=35060.0, ans=0.2
2024-09-16 16:23:56,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35060.0, ans=0.1
2024-09-16 16:24:01,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=35060.0, ans=0.2
2024-09-16 16:24:09,857 INFO [train.py:1198] (0/2) Epoch 2, batch 4250, loss[loss=0.2973, ctc_loss=0.2433, cr_loss=0.4014, attn_decoder_loss=0.2943, over 29496.00 frames. ], tot_loss[loss=0.3312, ctc_loss=0.2843, cr_loss=0.4453, attn_decoder_loss=0.3265, over 5805294.78 frames. ], batch size: 74, lr: 3.56e-02, grad_scale: 8.0
2024-09-16 16:24:31,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=35140.0, ans=0.2
2024-09-16 16:24:37,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0
2024-09-16 16:24:40,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35180.0, ans=0.1
2024-09-16 16:24:50,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=35180.0, ans=0.04949747468305833
2024-09-16 16:25:18,427 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.431e+02 1.619e+02 1.873e+02 2.888e+02, threshold=3.237e+02, percent-clipped=0.0
2024-09-16 16:25:22,271 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0
2024-09-16 16:25:25,684 INFO [train.py:1198] (0/2) Epoch 2, batch 4300, loss[loss=0.3552, ctc_loss=0.3011, cr_loss=0.4552, attn_decoder_loss=0.3511, over 29517.00 frames. ], tot_loss[loss=0.3313, ctc_loss=0.2843, cr_loss=0.4456, attn_decoder_loss=0.3266, over 5794834.34 frames. ], batch size: 87, lr: 3.55e-02, grad_scale: 8.0
2024-09-16 16:26:03,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=35380.0, ans=0.0
2024-09-16 16:26:21,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=35420.0, ans=0.125
2024-09-16 16:26:40,300 INFO [train.py:1198] (0/2) Epoch 2, batch 4350, loss[loss=0.33, ctc_loss=0.2854, cr_loss=0.4345, attn_decoder_loss=0.3253, over 29488.00 frames. ], tot_loss[loss=0.335, ctc_loss=0.2879, cr_loss=0.4504, attn_decoder_loss=0.3302, over 5796773.34 frames. ], batch size: 97, lr: 3.54e-02, grad_scale: 8.0
2024-09-16 16:27:00,545 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0
2024-09-16 16:27:11,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=35580.0, ans=0.125
2024-09-16 16:27:16,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=35580.0, ans=0.125
2024-09-16 16:27:48,086 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.185e+02 1.425e+02 1.627e+02 1.817e+02 2.716e+02, threshold=3.254e+02, percent-clipped=0.0
2024-09-16 16:27:48,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=35660.0, ans=0.0
2024-09-16 16:27:55,406 INFO [train.py:1198] (0/2) Epoch 2, batch 4400, loss[loss=0.3466, ctc_loss=0.2974, cr_loss=0.478, attn_decoder_loss=0.3414, over 27412.00 frames. ], tot_loss[loss=0.338, ctc_loss=0.2915, cr_loss=0.4528, attn_decoder_loss=0.3331, over 5766800.93 frames. ], batch size: 124, lr: 3.54e-02, grad_scale: 16.0
2024-09-16 16:28:00,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=35700.0, ans=0.0
2024-09-16 16:28:07,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=35700.0, ans=0.1
2024-09-16 16:28:08,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=35740.0, ans=0.1
2024-09-16 16:28:19,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.46 vs. limit=22.5
2024-09-16 16:28:36,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=35780.0, ans=0.125
2024-09-16 16:29:09,905 INFO [train.py:1198] (0/2) Epoch 2, batch 4450, loss[loss=0.3684, ctc_loss=0.3553, cr_loss=0.4578, attn_decoder_loss=0.3597, over 20974.00 frames. ], tot_loss[loss=0.3419, ctc_loss=0.2985, cr_loss=0.4547, attn_decoder_loss=0.3366, over 5574518.81 frames. ], batch size: 209, lr: 3.53e-02, grad_scale: 16.0
2024-09-16 16:29:23,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=35940.0, ans=0.0
2024-09-16 16:29:27,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=35940.0, ans=0.0
2024-09-16 16:29:34,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=35940.0, ans=0.125
2024-09-16 16:29:35,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=35940.0, ans=0.125
2024-09-16 16:29:47,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=35980.0, ans=0.2
2024-09-16 16:29:51,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=35980.0, ans=0.125
2024-09-16 16:29:58,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=36020.0, ans=0.125
2024-09-16 16:29:59,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=36020.0, ans=0.125
2024-09-16 16:30:00,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=36020.0, ans=0.125
2024-09-16 16:30:13,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36060.0, ans=0.1
2024-09-16 16:30:15,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=36060.0, ans=0.125
2024-09-16 16:30:16,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=16.41 vs. limit=15.0
2024-09-16 16:30:18,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.305e+02 1.463e+02 1.734e+02 4.707e+02, threshold=2.926e+02, percent-clipped=2.0
2024-09-16 16:30:26,375 INFO [train.py:1198] (0/2) Epoch 2, batch 4500, loss[loss=0.3663, ctc_loss=0.3515, cr_loss=0.4654, attn_decoder_loss=0.3576, over 20339.00 frames. ], tot_loss[loss=0.3467, ctc_loss=0.3083, cr_loss=0.4544, attn_decoder_loss=0.3408, over 5232175.73 frames. ], batch size: 210, lr: 3.53e-02, grad_scale: 16.0
2024-09-16 16:30:42,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=36140.0, ans=0.0030130434782608692
2024-09-16 16:30:43,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=36140.0, ans=0.125
2024-09-16 16:31:03,468 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-2.pt
2024-09-16 16:31:56,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=36200.0, ans=0.125
2024-09-16 16:31:58,151 INFO [train.py:1198] (0/2) Epoch 3, batch 0, loss[loss=0.415, ctc_loss=0.2741, cr_loss=0.4216, attn_decoder_loss=0.4213, over 29623.00 frames. ], tot_loss[loss=0.415, ctc_loss=0.2741, cr_loss=0.4216, attn_decoder_loss=0.4213, over 29623.00 frames. ], batch size: 73, lr: 3.35e-02, grad_scale: 8.0
2024-09-16 16:31:58,152 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 16:32:16,468 INFO [train.py:1230] (0/2) Epoch 3, validation: loss=0.2699, ctc_loss=0.1122, cr_loss=5.059e-15, attn_decoder_loss=0.2874, over 944034.00 frames.
2024-09-16 16:32:16,469 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-16 16:32:17,921 WARNING [optim.py:503] (0/2) Scaling gradients by 0.08523014932870865, model_norm_threshold=292.6158752441406
2024-09-16 16:32:18,132 WARNING [optim.py:575] (0/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.1.norm_self_attn.weight with proportion 0.29, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.447e+06, grad_sumsq=2.900e+09, orig_rms_sq=1.188e-03
2024-09-16 16:32:25,520 WARNING [optim.py:503] (0/2) Scaling gradients by 0.08528286218643188, model_norm_threshold=292.6158752441406
2024-09-16 16:32:25,720 WARNING [optim.py:575] (0/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.0.norm_self_attn.weight with proportion 0.56, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.615e+06, grad_sumsq=1.664e+09, orig_rms_sq=3.977e-03
2024-09-16 16:32:27,307 WARNING [optim.py:503] (0/2) Scaling gradients by 0.07857576757669449, model_norm_threshold=292.6158752441406
2024-09-16 16:32:27,524 WARNING [optim.py:575] (0/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.0.norm_self_attn.weight with proportion 0.54, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.424e+06, grad_sumsq=1.867e+09, orig_rms_sq=3.977e-03
2024-09-16 16:32:32,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=36240.0, ans=0.0
2024-09-16 16:32:35,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=36240.0, ans=0.125
2024-09-16 16:32:38,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=36240.0, ans=0.2
2024-09-16 16:32:54,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=36280.0, ans=0.125
2024-09-16 16:33:05,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=36320.0, ans=22.5
2024-09-16 16:33:21,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36360.0, ans=0.1
2024-09-16 16:33:23,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=36360.0, ans=0.1
2024-09-16 16:33:24,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=36360.0, ans=0.0029652173913043475
2024-09-16 16:33:35,006 INFO [train.py:1198] (0/2) Epoch 3, batch 50, loss[loss=0.2906, ctc_loss=0.2464, cr_loss=0.3934, attn_decoder_loss=0.2867, over 29432.00 frames. ], tot_loss[loss=0.3464, ctc_loss=0.2964, cr_loss=0.4507, attn_decoder_loss=0.3419, over 1266798.90 frames. ], batch size: 70, lr: 3.34e-02, grad_scale: 8.0
2024-09-16 16:34:08,225 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.045e+02 1.388e+02 1.745e+02 2.275e+02 3.724e+03, threshold=3.490e+02, percent-clipped=16.0
2024-09-16 16:34:11,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36480.0, ans=0.1
2024-09-16 16:34:17,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=36480.0, ans=0.125
2024-09-16 16:34:19,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=36520.0, ans=0.125
2024-09-16 16:34:50,241 INFO [train.py:1198] (0/2) Epoch 3, batch 100, loss[loss=0.3199, ctc_loss=0.2682, cr_loss=0.4433, attn_decoder_loss=0.3157, over 29570.00 frames. ], tot_loss[loss=0.3406, ctc_loss=0.2925, cr_loss=0.4509, attn_decoder_loss=0.336, over 2251483.92 frames. ], batch size: 76, lr: 3.34e-02, grad_scale: 8.0
2024-09-16 16:35:00,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.98 vs. limit=22.5
2024-09-16 16:35:04,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=36600.0, ans=0.00291304347826087
2024-09-16 16:35:22,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=36680.0, ans=0.025
2024-09-16 16:35:31,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=36680.0, ans=0.125
2024-09-16 16:35:40,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=36720.0, ans=0.125
2024-09-16 16:36:06,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=36800.0, ans=0.125
2024-09-16 16:36:07,252 INFO [train.py:1198] (0/2) Epoch 3, batch 150, loss[loss=0.2997, ctc_loss=0.2507, cr_loss=0.4149, attn_decoder_loss=0.2959, over 29450.00 frames. ], tot_loss[loss=0.3336, ctc_loss=0.2852, cr_loss=0.4457, attn_decoder_loss=0.3291, over 3047565.08 frames. ], batch size: 70, lr: 3.33e-02, grad_scale: 8.0
2024-09-16 16:36:07,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=36800.0, ans=0.125
2024-09-16 16:36:24,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36840.0, ans=0.1
2024-09-16 16:36:29,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=36840.0, ans=0.125
2024-09-16 16:36:38,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0
2024-09-16 16:36:42,366 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.061e+02 1.372e+02 1.536e+02 1.787e+02 3.735e+02, threshold=3.071e+02, percent-clipped=1.0
2024-09-16 16:36:51,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=36880.0, ans=0.125
2024-09-16 16:37:05,762 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=22.5
2024-09-16 16:37:06,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=36920.0, ans=0.025
2024-09-16 16:37:10,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=36960.0, ans=0.015
2024-09-16 16:37:12,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=36960.0, ans=0.125
2024-09-16 16:37:14,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=36960.0, ans=0.125
2024-09-16 16:37:24,140 INFO [train.py:1198] (0/2) Epoch 3, batch 200, loss[loss=0.3383, ctc_loss=0.2915, cr_loss=0.4829, attn_decoder_loss=0.3327, over 27561.00 frames. ], tot_loss[loss=0.3304, ctc_loss=0.2822, cr_loss=0.445, attn_decoder_loss=0.3259, over 3658302.33 frames. ], batch size: 125, lr: 3.33e-02, grad_scale: 8.0
2024-09-16 16:37:36,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=37000.0, ans=0.2
2024-09-16 16:37:37,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.60 vs. limit=15.0
2024-09-16 16:37:44,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=37040.0, ans=0.0
2024-09-16 16:38:18,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=37120.0, ans=0.2
2024-09-16 16:38:39,573 INFO [train.py:1198] (0/2) Epoch 3, batch 250, loss[loss=0.3455, ctc_loss=0.2922, cr_loss=0.4846, attn_decoder_loss=0.3406, over 29292.00 frames. ], tot_loss[loss=0.3288, ctc_loss=0.2797, cr_loss=0.4449, attn_decoder_loss=0.3244, over 4141526.98 frames. ], batch size: 100, lr: 3.32e-02, grad_scale: 8.0
2024-09-16 16:39:03,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=37240.0, ans=0.125
2024-09-16 16:39:15,193 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.348e+02 1.507e+02 1.717e+02 3.533e+02, threshold=3.014e+02, percent-clipped=1.0
2024-09-16 16:39:35,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=37320.0, ans=0.125
2024-09-16 16:39:35,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=37320.0, ans=0.0
2024-09-16 16:39:46,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.18 vs. limit=22.5
2024-09-16 16:39:51,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=37360.0, ans=0.125
2024-09-16 16:39:57,440 INFO [train.py:1198] (0/2) Epoch 3, batch 300, loss[loss=0.3551, ctc_loss=0.3099, cr_loss=0.4436, attn_decoder_loss=0.3503, over 29505.00 frames. ], tot_loss[loss=0.3276, ctc_loss=0.2783, cr_loss=0.4439, attn_decoder_loss=0.3233, over 4509543.62 frames. ], batch size: 92, lr: 3.32e-02, grad_scale: 8.0
2024-09-16 16:40:08,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=37400.0, ans=0.05
2024-09-16 16:40:15,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=37440.0, ans=0.125
2024-09-16 16:40:49,163 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.13 vs. limit=6.0
2024-09-16 16:40:53,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=37520.0, ans=0.125
2024-09-16 16:41:16,159 INFO [train.py:1198] (0/2) Epoch 3, batch 350, loss[loss=0.2853, ctc_loss=0.2372, cr_loss=0.4172, attn_decoder_loss=0.2814, over 29322.00 frames. ], tot_loss[loss=0.3276, ctc_loss=0.2783, cr_loss=0.4447, attn_decoder_loss=0.3232, over 4794516.51 frames. ], batch size: 71, lr: 3.31e-02, grad_scale: 8.0
2024-09-16 16:41:49,293 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.316e+02 1.494e+02 1.817e+02 5.633e+02, threshold=2.988e+02, percent-clipped=5.0
2024-09-16 16:41:57,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=37680.0, ans=0.1
2024-09-16 16:42:08,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=37720.0, ans=0.0
2024-09-16 16:42:11,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=37720.0, ans=0.0
2024-09-16 16:42:26,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=37760.0, ans=0.125
2024-09-16 16:42:32,048 INFO [train.py:1198] (0/2) Epoch 3, batch 400, loss[loss=0.3178, ctc_loss=0.2661, cr_loss=0.4033, attn_decoder_loss=0.3146, over 29712.00 frames. ], tot_loss[loss=0.3265, ctc_loss=0.277, cr_loss=0.4441, attn_decoder_loss=0.3222, over 5024392.84 frames. ], batch size: 82, lr: 3.31e-02, grad_scale: 16.0
2024-09-16 16:42:35,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37800.0, ans=0.1
2024-09-16 16:42:35,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=37800.0, ans=0.07
2024-09-16 16:42:46,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=37800.0, ans=0.125
2024-09-16 16:43:10,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs.
limit=5.0 2024-09-16 16:43:12,831 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 16:43:21,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=37920.0, ans=0.1 2024-09-16 16:43:40,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=37960.0, ans=0.125 2024-09-16 16:43:42,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=37960.0, ans=15.0 2024-09-16 16:43:50,666 INFO [train.py:1198] (0/2) Epoch 3, batch 450, loss[loss=0.3348, ctc_loss=0.2757, cr_loss=0.4439, attn_decoder_loss=0.3315, over 29692.00 frames. ], tot_loss[loss=0.3262, ctc_loss=0.2763, cr_loss=0.4435, attn_decoder_loss=0.3219, over 5187422.14 frames. ], batch size: 83, lr: 3.30e-02, grad_scale: 8.0 2024-09-16 16:43:50,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38000.0, ans=0.1 2024-09-16 16:44:25,550 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.316e+02 1.463e+02 1.797e+02 4.950e+02, threshold=2.926e+02, percent-clipped=3.0 2024-09-16 16:44:39,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=38120.0, ans=0.1 2024-09-16 16:44:43,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=38120.0, ans=0.125 2024-09-16 16:44:43,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=38120.0, ans=0.125 2024-09-16 16:44:54,170 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=38160.0, ans=0.125 2024-09-16 16:45:09,110 INFO [train.py:1198] (0/2) Epoch 3, batch 500, loss[loss=0.3355, ctc_loss=0.276, cr_loss=0.4812, attn_decoder_loss=0.3314, over 29438.00 frames. ], tot_loss[loss=0.3247, ctc_loss=0.2745, cr_loss=0.4426, attn_decoder_loss=0.3204, over 5331144.63 frames. ], batch size: 94, lr: 3.30e-02, grad_scale: 8.0 2024-09-16 16:45:09,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38200.0, ans=0.1 2024-09-16 16:45:12,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=38200.0, ans=0.0 2024-09-16 16:45:12,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=38200.0, ans=0.1 2024-09-16 16:45:20,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.80 vs. limit=15.0 2024-09-16 16:45:39,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=38280.0, ans=0.0025478260869565214 2024-09-16 16:45:47,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=38280.0, ans=0.0 2024-09-16 16:46:20,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38360.0, ans=0.1 2024-09-16 16:46:25,200 INFO [train.py:1198] (0/2) Epoch 3, batch 550, loss[loss=0.36, ctc_loss=0.3186, cr_loss=0.4946, attn_decoder_loss=0.3536, over 28831.00 frames. ], tot_loss[loss=0.3253, ctc_loss=0.2753, cr_loss=0.4438, attn_decoder_loss=0.321, over 5424297.70 frames. 
], batch size: 104, lr: 3.29e-02, grad_scale: 8.0 2024-09-16 16:46:32,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-09-16 16:47:02,146 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.886e+01 1.383e+02 1.615e+02 1.876e+02 3.927e+02, threshold=3.230e+02, percent-clipped=4.0 2024-09-16 16:47:11,189 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. limit=5.0 2024-09-16 16:47:35,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.51 vs. limit=15.0 2024-09-16 16:47:43,876 INFO [train.py:1198] (0/2) Epoch 3, batch 600, loss[loss=0.348, ctc_loss=0.2966, cr_loss=0.474, attn_decoder_loss=0.3432, over 29216.00 frames. ], tot_loss[loss=0.325, ctc_loss=0.2745, cr_loss=0.444, attn_decoder_loss=0.3208, over 5510927.49 frames. 
], batch size: 100, lr: 3.28e-02, grad_scale: 8.0 2024-09-16 16:47:44,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=38600.0, ans=0.0 2024-09-16 16:47:45,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38600.0, ans=0.1 2024-09-16 16:47:47,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=38600.0, ans=0.125 2024-09-16 16:48:06,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=38640.0, ans=0.125 2024-09-16 16:48:11,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=38640.0, ans=0.125 2024-09-16 16:48:22,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=38680.0, ans=0.125 2024-09-16 16:48:47,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=38760.0, ans=0.0024434782608695653 2024-09-16 16:48:49,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=38760.0, ans=0.0 2024-09-16 16:49:02,384 INFO [train.py:1198] (0/2) Epoch 3, batch 650, loss[loss=0.2999, ctc_loss=0.2398, cr_loss=0.4348, attn_decoder_loss=0.2969, over 29769.00 frames. ], tot_loss[loss=0.3233, ctc_loss=0.2725, cr_loss=0.4424, attn_decoder_loss=0.3191, over 5587769.55 frames. ], batch size: 81, lr: 3.28e-02, grad_scale: 8.0 2024-09-16 16:49:07,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. 
limit=6.0 2024-09-16 16:49:22,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=38840.0, ans=0.125 2024-09-16 16:49:28,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=38840.0, ans=0.07 2024-09-16 16:49:34,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=38880.0, ans=0.0 2024-09-16 16:49:37,337 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.937e+01 1.311e+02 1.474e+02 1.676e+02 3.343e+02, threshold=2.947e+02, percent-clipped=2.0 2024-09-16 16:49:48,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38920.0, ans=0.1 2024-09-16 16:50:09,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=38960.0, ans=0.125 2024-09-16 16:50:18,170 INFO [train.py:1198] (0/2) Epoch 3, batch 700, loss[loss=0.3072, ctc_loss=0.2469, cr_loss=0.4296, attn_decoder_loss=0.3044, over 29554.00 frames. ], tot_loss[loss=0.3239, ctc_loss=0.2731, cr_loss=0.4433, attn_decoder_loss=0.3197, over 5638374.08 frames. ], batch size: 76, lr: 3.27e-02, grad_scale: 8.0 2024-09-16 16:50:23,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=39000.0, ans=0.125 2024-09-16 16:50:57,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.85 vs. 
limit=22.5 2024-09-16 16:51:13,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=39120.0, ans=0.0 2024-09-16 16:51:16,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=39120.0, ans=0.04949747468305833 2024-09-16 16:51:25,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=39160.0, ans=0.0023565217391304343 2024-09-16 16:51:36,717 INFO [train.py:1198] (0/2) Epoch 3, batch 750, loss[loss=0.3255, ctc_loss=0.2701, cr_loss=0.4233, attn_decoder_loss=0.3222, over 29693.00 frames. ], tot_loss[loss=0.3235, ctc_loss=0.2725, cr_loss=0.4432, attn_decoder_loss=0.3193, over 5676685.11 frames. ], batch size: 82, lr: 3.27e-02, grad_scale: 8.0 2024-09-16 16:51:58,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=39240.0, ans=0.0 2024-09-16 16:52:02,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.10 vs. 
limit=15.0 2024-09-16 16:52:11,489 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.529e+01 1.527e+02 1.781e+02 2.064e+02 4.131e+02, threshold=3.563e+02, percent-clipped=5.0 2024-09-16 16:52:14,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=39280.0, ans=0.125 2024-09-16 16:52:25,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=39320.0, ans=0.0 2024-09-16 16:52:39,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=39360.0, ans=0.1 2024-09-16 16:52:47,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=39360.0, ans=0.0 2024-09-16 16:52:52,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.04 vs. limit=15.0 2024-09-16 16:52:53,969 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.20 vs. limit=22.5 2024-09-16 16:52:54,789 INFO [train.py:1198] (0/2) Epoch 3, batch 800, loss[loss=0.302, ctc_loss=0.2622, cr_loss=0.3992, attn_decoder_loss=0.2975, over 29604.00 frames. ], tot_loss[loss=0.3234, ctc_loss=0.2726, cr_loss=0.4433, attn_decoder_loss=0.3192, over 5706178.31 frames. ], batch size: 73, lr: 3.26e-02, grad_scale: 16.0 2024-09-16 16:52:55,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=39400.0, ans=0.125 2024-09-16 16:53:12,381 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.41 vs. 
limit=22.5 2024-09-16 16:53:13,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=39440.0, ans=0.125 2024-09-16 16:53:26,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=39480.0, ans=0.1 2024-09-16 16:53:54,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=39560.0, ans=0.125 2024-09-16 16:53:58,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=39560.0, ans=0.0 2024-09-16 16:54:06,827 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0 2024-09-16 16:54:10,346 INFO [train.py:1198] (0/2) Epoch 3, batch 850, loss[loss=0.3422, ctc_loss=0.2871, cr_loss=0.4518, attn_decoder_loss=0.3383, over 29717.00 frames. ], tot_loss[loss=0.3229, ctc_loss=0.2716, cr_loss=0.4425, attn_decoder_loss=0.3187, over 5736278.91 frames. 
], batch size: 89, lr: 3.26e-02, grad_scale: 8.0 2024-09-16 16:54:30,817 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.579e-02 2024-09-16 16:54:46,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=39680.0, ans=0.125 2024-09-16 16:54:48,812 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.307e+02 1.480e+02 1.661e+02 7.090e+02, threshold=2.960e+02, percent-clipped=1.0 2024-09-16 16:55:04,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=39720.0, ans=0.2 2024-09-16 16:55:16,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=39760.0, ans=0.125 2024-09-16 16:55:28,604 INFO [train.py:1198] (0/2) Epoch 3, batch 900, loss[loss=0.2956, ctc_loss=0.2409, cr_loss=0.3858, attn_decoder_loss=0.2931, over 29581.00 frames. ], tot_loss[loss=0.3236, ctc_loss=0.2725, cr_loss=0.4427, attn_decoder_loss=0.3195, over 5741950.50 frames. ], batch size: 73, lr: 3.25e-02, grad_scale: 8.0 2024-09-16 16:55:34,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. 
limit=15.0 2024-09-16 16:55:36,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=39800.0, ans=0.125 2024-09-16 16:55:44,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=39840.0, ans=0.002208695652173912 2024-09-16 16:55:59,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=39880.0, ans=10.0 2024-09-16 16:56:06,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=39880.0, ans=0.1 2024-09-16 16:56:08,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=39880.0, ans=0.125 2024-09-16 16:56:08,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=17.44 vs. limit=15.0 2024-09-16 16:56:39,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=39960.0, ans=0.09899494936611666 2024-09-16 16:56:42,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=39960.0, ans=0.2 2024-09-16 16:56:44,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.67 vs. limit=5.0 2024-09-16 16:56:46,825 INFO [train.py:1198] (0/2) Epoch 3, batch 950, loss[loss=0.3012, ctc_loss=0.241, cr_loss=0.4434, attn_decoder_loss=0.298, over 29516.00 frames. ], tot_loss[loss=0.3242, ctc_loss=0.2734, cr_loss=0.4444, attn_decoder_loss=0.32, over 5743325.33 frames. 
], batch size: 74, lr: 3.25e-02, grad_scale: 8.0 2024-09-16 16:57:14,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=40040.0, ans=0.125 2024-09-16 16:57:22,765 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.433e+02 1.639e+02 1.993e+02 1.138e+03, threshold=3.278e+02, percent-clipped=4.0 2024-09-16 16:57:24,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=40080.0, ans=0.0 2024-09-16 16:57:29,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=40080.0, ans=0.125 2024-09-16 16:57:53,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=40160.0, ans=0.0021391304347826087 2024-09-16 16:58:01,740 INFO [train.py:1198] (0/2) Epoch 3, batch 1000, loss[loss=0.3102, ctc_loss=0.2548, cr_loss=0.4326, attn_decoder_loss=0.3067, over 29484.00 frames. ], tot_loss[loss=0.3244, ctc_loss=0.2734, cr_loss=0.4439, attn_decoder_loss=0.3202, over 5736393.60 frames. 
], batch size: 77, lr: 3.24e-02, grad_scale: 8.0 2024-09-16 16:58:57,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=40320.0, ans=0.035 2024-09-16 16:59:00,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=40320.0, ans=0.1 2024-09-16 16:59:03,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=40360.0, ans=0.2 2024-09-16 16:59:06,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=40360.0, ans=0.0 2024-09-16 16:59:07,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=40360.0, ans=0.125 2024-09-16 16:59:16,298 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.44 vs. limit=12.0 2024-09-16 16:59:19,706 INFO [train.py:1198] (0/2) Epoch 3, batch 1050, loss[loss=0.3449, ctc_loss=0.2885, cr_loss=0.4748, attn_decoder_loss=0.3406, over 29675.00 frames. ], tot_loss[loss=0.3234, ctc_loss=0.2719, cr_loss=0.4432, attn_decoder_loss=0.3192, over 5745429.03 frames. ], batch size: 85, lr: 3.24e-02, grad_scale: 8.0 2024-09-16 16:59:32,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=40400.0, ans=0.125 2024-09-16 16:59:32,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=40400.0, ans=0.2 2024-09-16 16:59:36,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.84 vs. 
limit=22.5 2024-09-16 16:59:43,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=40440.0, ans=0.002078260869565217 2024-09-16 16:59:55,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=40480.0, ans=0.0020695652173913035 2024-09-16 16:59:56,491 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.303e+02 1.463e+02 1.706e+02 2.902e+02, threshold=2.927e+02, percent-clipped=0.0 2024-09-16 17:00:26,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=40560.0, ans=0.0 2024-09-16 17:00:32,686 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.05 vs. limit=15.0 2024-09-16 17:00:37,928 INFO [train.py:1198] (0/2) Epoch 3, batch 1100, loss[loss=0.3061, ctc_loss=0.2579, cr_loss=0.4131, attn_decoder_loss=0.3023, over 29480.00 frames. ], tot_loss[loss=0.3226, ctc_loss=0.2708, cr_loss=0.4425, attn_decoder_loss=0.3185, over 5757614.45 frames. 
], batch size: 78, lr: 3.23e-02, grad_scale: 8.0 2024-09-16 17:00:48,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=40600.0, ans=0.125 2024-09-16 17:00:57,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=40640.0, ans=0.125 2024-09-16 17:01:26,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=40720.0, ans=0.125 2024-09-16 17:01:27,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=40720.0, ans=0.1 2024-09-16 17:01:35,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=40720.0, ans=0.2 2024-09-16 17:01:47,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=40760.0, ans=0.125 2024-09-16 17:01:52,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=40800.0, ans=0.1 2024-09-16 17:01:53,973 INFO [train.py:1198] (0/2) Epoch 3, batch 1150, loss[loss=0.3057, ctc_loss=0.251, cr_loss=0.419, attn_decoder_loss=0.3024, over 29443.00 frames. ], tot_loss[loss=0.3226, ctc_loss=0.2709, cr_loss=0.443, attn_decoder_loss=0.3185, over 5757516.27 frames. 
], batch size: 78, lr: 3.23e-02, grad_scale: 8.0 2024-09-16 17:02:32,381 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.405e+02 1.623e+02 1.892e+02 4.412e+02, threshold=3.246e+02, percent-clipped=6.0 2024-09-16 17:02:35,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=40880.0, ans=0.0 2024-09-16 17:03:06,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=40960.0, ans=0.0 2024-09-16 17:03:10,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=41000.0, ans=0.2 2024-09-16 17:03:11,767 INFO [train.py:1198] (0/2) Epoch 3, batch 1200, loss[loss=0.3291, ctc_loss=0.2683, cr_loss=0.4853, attn_decoder_loss=0.3251, over 29683.00 frames. ], tot_loss[loss=0.3238, ctc_loss=0.2719, cr_loss=0.4442, attn_decoder_loss=0.3196, over 5749116.76 frames. ], batch size: 85, lr: 3.22e-02, grad_scale: 16.0 2024-09-16 17:03:12,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=41000.0, ans=0.125 2024-09-16 17:03:15,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=41000.0, ans=0.001956521739130435 2024-09-16 17:03:17,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=14.38 vs. limit=15.0 2024-09-16 17:03:38,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.01 vs. 
limit=6.0 2024-09-16 17:03:42,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=41080.0, ans=0.125 2024-09-16 17:03:47,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=41080.0, ans=0.0019391304347826082 2024-09-16 17:03:51,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=41080.0, ans=0.0019391304347826082 2024-09-16 17:03:57,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=41120.0, ans=0.0 2024-09-16 17:04:12,002 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:04:15,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0 2024-09-16 17:04:29,438 INFO [train.py:1198] (0/2) Epoch 3, batch 1250, loss[loss=0.3536, ctc_loss=0.3, cr_loss=0.467, attn_decoder_loss=0.3492, over 29502.00 frames. ], tot_loss[loss=0.3241, ctc_loss=0.272, cr_loss=0.4456, attn_decoder_loss=0.32, over 5776007.63 frames. 
], batch size: 92, lr: 3.22e-02, grad_scale: 8.0 2024-09-16 17:04:37,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=41200.0, ans=0.125 2024-09-16 17:05:00,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=41280.0, ans=0.125 2024-09-16 17:05:07,513 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.013e+02 1.378e+02 1.544e+02 1.840e+02 6.927e+02, threshold=3.087e+02, percent-clipped=1.0 2024-09-16 17:05:41,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=41360.0, ans=0.125 2024-09-16 17:05:43,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.48 vs. limit=15.0 2024-09-16 17:05:47,461 INFO [train.py:1198] (0/2) Epoch 3, batch 1300, loss[loss=0.3339, ctc_loss=0.2771, cr_loss=0.4796, attn_decoder_loss=0.3295, over 28357.00 frames. ], tot_loss[loss=0.323, ctc_loss=0.2707, cr_loss=0.4448, attn_decoder_loss=0.3189, over 5782019.99 frames. 
], batch size: 111, lr: 3.21e-02, grad_scale: 8.0 2024-09-16 17:05:55,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=41400.0, ans=0.125 2024-09-16 17:06:07,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=41440.0, ans=0.0018608695652173914 2024-09-16 17:06:07,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=41440.0, ans=0.0018608695652173914 2024-09-16 17:06:19,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=41480.0, ans=0.125 2024-09-16 17:06:43,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41520.0, ans=0.1 2024-09-16 17:06:45,615 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:06:59,210 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:07:03,980 INFO [train.py:1198] (0/2) Epoch 3, batch 1350, loss[loss=0.3322, ctc_loss=0.2736, cr_loss=0.4601, attn_decoder_loss=0.3284, over 29764.00 frames. ], tot_loss[loss=0.3224, ctc_loss=0.2695, cr_loss=0.4447, attn_decoder_loss=0.3185, over 5796934.07 frames. 
], batch size: 81, lr: 3.21e-02, grad_scale: 8.0 2024-09-16 17:07:28,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=41640.0, ans=0.0 2024-09-16 17:07:38,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=41680.0, ans=0.035 2024-09-16 17:07:41,129 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.317e+02 1.447e+02 1.601e+02 2.528e+02, threshold=2.895e+02, percent-clipped=1.0 2024-09-16 17:07:42,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41680.0, ans=0.1 2024-09-16 17:07:52,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=41720.0, ans=0.125 2024-09-16 17:07:53,103 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.33 vs. limit=15.0 2024-09-16 17:08:07,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=41760.0, ans=0.125 2024-09-16 17:08:09,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.82 vs. limit=15.0 2024-09-16 17:08:21,363 INFO [train.py:1198] (0/2) Epoch 3, batch 1400, loss[loss=0.299, ctc_loss=0.2544, cr_loss=0.4225, attn_decoder_loss=0.2945, over 29598.00 frames. ], tot_loss[loss=0.3221, ctc_loss=0.2691, cr_loss=0.4448, attn_decoder_loss=0.3181, over 5807979.63 frames. ], batch size: 69, lr: 3.20e-02, grad_scale: 8.0 2024-09-16 17:08:25,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. 
limit=15.0 2024-09-16 17:08:35,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=41840.0, ans=0.0017739130434782611 2024-09-16 17:08:40,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.77 vs. limit=22.5 2024-09-16 17:08:58,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.10 vs. limit=5.0 2024-09-16 17:09:07,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=41920.0, ans=0.125 2024-09-16 17:09:12,027 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2024-09-16 17:09:16,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=41920.0, ans=0.5 2024-09-16 17:09:17,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=41920.0, ans=0.125 2024-09-16 17:09:34,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=41960.0, ans=0.125 2024-09-16 17:09:37,198 INFO [train.py:1198] (0/2) Epoch 3, batch 1450, loss[loss=0.3504, ctc_loss=0.2988, cr_loss=0.4749, attn_decoder_loss=0.3456, over 29416.00 frames. ], tot_loss[loss=0.3223, ctc_loss=0.2693, cr_loss=0.4452, attn_decoder_loss=0.3183, over 5803599.93 frames. 
], batch size: 94, lr: 3.20e-02, grad_scale: 8.0 2024-09-16 17:09:44,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=42000.0, ans=0.025 2024-09-16 17:10:17,156 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.984e+01 1.371e+02 1.551e+02 1.946e+02 4.633e+02, threshold=3.101e+02, percent-clipped=3.0 2024-09-16 17:10:17,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=42080.0, ans=0.125 2024-09-16 17:10:29,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=42120.0, ans=0.125 2024-09-16 17:10:42,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=42160.0, ans=0.07 2024-09-16 17:10:55,033 INFO [train.py:1198] (0/2) Epoch 3, batch 1500, loss[loss=0.3321, ctc_loss=0.2658, cr_loss=0.444, attn_decoder_loss=0.3296, over 29635.00 frames. ], tot_loss[loss=0.3226, ctc_loss=0.2693, cr_loss=0.445, attn_decoder_loss=0.3186, over 5803000.33 frames. ], batch size: 86, lr: 3.19e-02, grad_scale: 8.0 2024-09-16 17:10:58,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=42200.0, ans=0.125 2024-09-16 17:11:12,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. 
limit=6.0 2024-09-16 17:11:15,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=42240.0, ans=0.125 2024-09-16 17:11:33,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=42280.0, ans=0.125 2024-09-16 17:11:36,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42280.0, ans=0.1 2024-09-16 17:11:38,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=42280.0, ans=0.0 2024-09-16 17:12:04,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=42360.0, ans=0.125 2024-09-16 17:12:06,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=42360.0, ans=0.125 2024-09-16 17:12:13,828 INFO [train.py:1198] (0/2) Epoch 3, batch 1550, loss[loss=0.3408, ctc_loss=0.2845, cr_loss=0.4885, attn_decoder_loss=0.3362, over 29499.00 frames. ], tot_loss[loss=0.3224, ctc_loss=0.2695, cr_loss=0.4446, attn_decoder_loss=0.3184, over 5780344.23 frames. ], batch size: 90, lr: 3.19e-02, grad_scale: 8.0 2024-09-16 17:12:17,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=42400.0, ans=0.0016521739130434792 2024-09-16 17:12:29,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.42 vs. 
limit=15.0 2024-09-16 17:12:51,103 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.385e+02 1.541e+02 1.743e+02 3.737e+02, threshold=3.082e+02, percent-clipped=1.0 2024-09-16 17:12:51,574 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:13:11,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=42520.0, ans=0.0 2024-09-16 17:13:26,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=42560.0, ans=0.025 2024-09-16 17:13:30,979 INFO [train.py:1198] (0/2) Epoch 3, batch 1600, loss[loss=0.3247, ctc_loss=0.2604, cr_loss=0.4697, attn_decoder_loss=0.3215, over 29653.00 frames. ], tot_loss[loss=0.3226, ctc_loss=0.2701, cr_loss=0.4448, attn_decoder_loss=0.3186, over 5763622.68 frames. ], batch size: 85, lr: 3.18e-02, grad_scale: 16.0 2024-09-16 17:13:32,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=42600.0, ans=0.1 2024-09-16 17:13:43,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=42600.0, ans=0.125 2024-09-16 17:13:44,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42640.0, ans=0.1 2024-09-16 17:13:57,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.86 vs. 
limit=15.0 2024-09-16 17:14:09,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=42680.0, ans=0.0 2024-09-16 17:14:24,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=42720.0, ans=0.0 2024-09-16 17:14:46,581 INFO [train.py:1198] (0/2) Epoch 3, batch 1650, loss[loss=0.3324, ctc_loss=0.2801, cr_loss=0.4633, attn_decoder_loss=0.3279, over 29701.00 frames. ], tot_loss[loss=0.3228, ctc_loss=0.2702, cr_loss=0.4453, attn_decoder_loss=0.3187, over 5757930.31 frames. ], batch size: 89, lr: 3.18e-02, grad_scale: 8.0 2024-09-16 17:15:05,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=42840.0, ans=0.1 2024-09-16 17:15:19,822 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.91 vs. limit=12.0 2024-09-16 17:15:26,273 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.391e+02 1.585e+02 1.858e+02 6.012e+02, threshold=3.169e+02, percent-clipped=6.0 2024-09-16 17:15:37,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=42920.0, ans=0.0 2024-09-16 17:16:01,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=42960.0, ans=0.125 2024-09-16 17:16:03,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=43000.0, ans=0.2 2024-09-16 17:16:04,587 INFO [train.py:1198] (0/2) Epoch 3, batch 1700, loss[loss=0.268, ctc_loss=0.2135, cr_loss=0.4073, attn_decoder_loss=0.265, over 29629.00 frames. ], tot_loss[loss=0.3218, ctc_loss=0.2689, cr_loss=0.4441, attn_decoder_loss=0.3178, over 5779016.01 frames. 
], batch size: 69, lr: 3.17e-02, grad_scale: 8.0 2024-09-16 17:16:06,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=43000.0, ans=0.2 2024-09-16 17:16:13,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=43000.0, ans=0.125 2024-09-16 17:16:39,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=43080.0, ans=0.1 2024-09-16 17:16:51,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=43120.0, ans=0.0 2024-09-16 17:16:52,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=43120.0, ans=0.09899494936611666 2024-09-16 17:17:16,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=43160.0, ans=0.1 2024-09-16 17:17:22,495 INFO [train.py:1198] (0/2) Epoch 3, batch 1750, loss[loss=0.2839, ctc_loss=0.2304, cr_loss=0.405, attn_decoder_loss=0.2808, over 29376.00 frames. ], tot_loss[loss=0.3214, ctc_loss=0.2686, cr_loss=0.4444, attn_decoder_loss=0.3174, over 5787389.07 frames. ], batch size: 67, lr: 3.17e-02, grad_scale: 8.0 2024-09-16 17:17:30,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43200.0, ans=0.1 2024-09-16 17:17:32,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.52 vs. 
limit=15.0 2024-09-16 17:17:40,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=43240.0, ans=0.125 2024-09-16 17:18:01,942 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.347e+02 1.520e+02 1.785e+02 2.603e+02, threshold=3.040e+02, percent-clipped=0.0 2024-09-16 17:18:38,185 INFO [train.py:1198] (0/2) Epoch 3, batch 1800, loss[loss=0.3449, ctc_loss=0.2904, cr_loss=0.4578, attn_decoder_loss=0.3407, over 29693.00 frames. ], tot_loss[loss=0.3215, ctc_loss=0.2687, cr_loss=0.4445, attn_decoder_loss=0.3175, over 5790928.27 frames. ], batch size: 83, lr: 3.16e-02, grad_scale: 8.0 2024-09-16 17:18:59,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=43440.0, ans=0.2 2024-09-16 17:19:10,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43480.0, ans=0.1 2024-09-16 17:19:24,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.73 vs. limit=12.0 2024-09-16 17:19:51,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=43560.0, ans=0.1 2024-09-16 17:19:54,249 INFO [train.py:1198] (0/2) Epoch 3, batch 1850, loss[loss=0.3404, ctc_loss=0.29, cr_loss=0.4613, attn_decoder_loss=0.3357, over 29625.00 frames. ], tot_loss[loss=0.3209, ctc_loss=0.2674, cr_loss=0.4445, attn_decoder_loss=0.3169, over 5796546.44 frames. 
], batch size: 86, lr: 3.16e-02, grad_scale: 8.0 2024-09-16 17:20:03,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=43600.0, ans=0.125 2024-09-16 17:20:13,066 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:20:23,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=43640.0, ans=0.125 2024-09-16 17:20:32,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=43680.0, ans=0.0 2024-09-16 17:20:35,360 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.038e+02 1.308e+02 1.428e+02 1.692e+02 5.194e+02, threshold=2.856e+02, percent-clipped=3.0 2024-09-16 17:20:45,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2024-09-16 17:21:13,416 INFO [train.py:1198] (0/2) Epoch 3, batch 1900, loss[loss=0.3273, ctc_loss=0.2761, cr_loss=0.4456, attn_decoder_loss=0.3231, over 29694.00 frames. ], tot_loss[loss=0.3217, ctc_loss=0.2683, cr_loss=0.4452, attn_decoder_loss=0.3177, over 5805286.20 frames. 
], batch size: 89, lr: 3.15e-02, grad_scale: 8.0 2024-09-16 17:21:16,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43800.0, ans=0.1 2024-09-16 17:21:22,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=43800.0, ans=0.125 2024-09-16 17:21:41,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=43840.0, ans=0.2 2024-09-16 17:21:45,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=43880.0, ans=0.1 2024-09-16 17:21:58,659 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.14 vs. limit=22.5 2024-09-16 17:21:59,942 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. 
limit=6.0 2024-09-16 17:22:08,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=43920.0, ans=0.125 2024-09-16 17:22:08,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=43920.0, ans=0.125 2024-09-16 17:22:17,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=43960.0, ans=0.2 2024-09-16 17:22:19,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=43960.0, ans=0.00131304347826087 2024-09-16 17:22:19,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=43960.0, ans=0.0 2024-09-16 17:22:22,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=43960.0, ans=0.0 2024-09-16 17:22:29,847 INFO [train.py:1198] (0/2) Epoch 3, batch 1950, loss[loss=0.3175, ctc_loss=0.2679, cr_loss=0.4755, attn_decoder_loss=0.3124, over 29441.00 frames. ], tot_loss[loss=0.3227, ctc_loss=0.2686, cr_loss=0.4474, attn_decoder_loss=0.3187, over 5820468.18 frames. 
], batch size: 78, lr: 3.15e-02, grad_scale: 8.0 2024-09-16 17:23:02,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=44080.0, ans=0.2 2024-09-16 17:23:09,312 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.320e+02 1.491e+02 1.683e+02 2.702e+02, threshold=2.982e+02, percent-clipped=0.0 2024-09-16 17:23:30,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=44160.0, ans=0.0 2024-09-16 17:23:44,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=44200.0, ans=0.1 2024-09-16 17:23:45,537 INFO [train.py:1198] (0/2) Epoch 3, batch 2000, loss[loss=0.2844, ctc_loss=0.2384, cr_loss=0.395, attn_decoder_loss=0.2807, over 29322.00 frames. ], tot_loss[loss=0.3234, ctc_loss=0.2694, cr_loss=0.4483, attn_decoder_loss=0.3195, over 5798124.92 frames. ], batch size: 67, lr: 3.14e-02, grad_scale: 16.0 2024-09-16 17:24:07,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=44240.0, ans=0.125 2024-09-16 17:24:23,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=44280.0, ans=0.0 2024-09-16 17:24:30,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=44280.0, ans=0.125 2024-09-16 17:24:31,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.72 vs. limit=22.5 2024-09-16 17:25:05,794 INFO [train.py:1198] (0/2) Epoch 3, batch 2050, loss[loss=0.2958, ctc_loss=0.2461, cr_loss=0.4008, attn_decoder_loss=0.2925, over 29459.00 frames. 
], tot_loss[loss=0.3223, ctc_loss=0.2686, cr_loss=0.4461, attn_decoder_loss=0.3183, over 5791300.63 frames. ], batch size: 70, lr: 3.14e-02, grad_scale: 8.0 2024-09-16 17:25:30,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44440.0, ans=0.1 2024-09-16 17:25:45,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=44480.0, ans=0.0 2024-09-16 17:25:46,771 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.399e+02 1.535e+02 1.932e+02 1.271e+03, threshold=3.069e+02, percent-clipped=4.0 2024-09-16 17:26:01,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.18 vs. limit=10.0 2024-09-16 17:26:11,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=44560.0, ans=0.125 2024-09-16 17:26:21,863 INFO [train.py:1198] (0/2) Epoch 3, batch 2100, loss[loss=0.3147, ctc_loss=0.2481, cr_loss=0.4452, attn_decoder_loss=0.3122, over 29779.00 frames. ], tot_loss[loss=0.3205, ctc_loss=0.2664, cr_loss=0.4444, attn_decoder_loss=0.3166, over 5802433.30 frames. ], batch size: 81, lr: 3.13e-02, grad_scale: 8.0 2024-09-16 17:26:26,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=44600.0, ans=0.125 2024-09-16 17:26:28,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.28 vs. 
limit=22.5 2024-09-16 17:26:32,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=44600.0, ans=0.125 2024-09-16 17:26:53,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=44680.0, ans=0.0 2024-09-16 17:26:55,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=44680.0, ans=0.0 2024-09-16 17:27:10,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=44720.0, ans=0.125 2024-09-16 17:27:13,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=44720.0, ans=0.125 2024-09-16 17:27:13,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=44720.0, ans=0.001147826086956523 2024-09-16 17:27:15,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=44720.0, ans=0.09899494936611666 2024-09-16 17:27:15,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.03 vs. 
limit=15.0 2024-09-16 17:27:16,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=44720.0, ans=0.125 2024-09-16 17:27:16,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=44720.0, ans=0.0 2024-09-16 17:27:22,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=44760.0, ans=0.1 2024-09-16 17:27:25,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=44760.0, ans=0.2 2024-09-16 17:27:38,156 INFO [train.py:1198] (0/2) Epoch 3, batch 2150, loss[loss=0.3168, ctc_loss=0.2567, cr_loss=0.4232, attn_decoder_loss=0.3141, over 29450.00 frames. ], tot_loss[loss=0.3197, ctc_loss=0.2653, cr_loss=0.444, attn_decoder_loss=0.3159, over 5816586.83 frames. ], batch size: 78, lr: 3.13e-02, grad_scale: 8.0 2024-09-16 17:27:41,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=44800.0, ans=0.125 2024-09-16 17:27:48,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.01 vs. 
limit=15.0 2024-09-16 17:27:58,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=44840.0, ans=0.1 2024-09-16 17:28:15,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=44880.0, ans=0.07 2024-09-16 17:28:17,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=44880.0, ans=0.1 2024-09-16 17:28:20,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.031e+02 1.285e+02 1.430e+02 1.712e+02 4.702e+02, threshold=2.859e+02, percent-clipped=3.0 2024-09-16 17:28:25,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=44920.0, ans=0.125 2024-09-16 17:28:46,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=44960.0, ans=0.125 2024-09-16 17:28:56,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=45000.0, ans=0.125 2024-09-16 17:28:57,954 INFO [train.py:1198] (0/2) Epoch 3, batch 2200, loss[loss=0.3176, ctc_loss=0.261, cr_loss=0.4705, attn_decoder_loss=0.3134, over 29626.00 frames. ], tot_loss[loss=0.3196, ctc_loss=0.2653, cr_loss=0.4446, attn_decoder_loss=0.3158, over 5812326.93 frames. 
], batch size: 86, lr: 3.12e-02, grad_scale: 8.0 2024-09-16 17:28:58,243 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:29:10,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=45000.0, ans=0.09899494936611666 2024-09-16 17:29:12,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=45040.0, ans=0.2 2024-09-16 17:29:31,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=45080.0, ans=0.125 2024-09-16 17:30:00,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=45160.0, ans=0.07 2024-09-16 17:30:08,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=45160.0, ans=0.025 2024-09-16 17:30:13,751 INFO [train.py:1198] (0/2) Epoch 3, batch 2250, loss[loss=0.3329, ctc_loss=0.2825, cr_loss=0.4666, attn_decoder_loss=0.3282, over 29683.00 frames. ], tot_loss[loss=0.3195, ctc_loss=0.2649, cr_loss=0.4445, attn_decoder_loss=0.3157, over 5812690.09 frames. 
], batch size: 82, lr: 3.12e-02, grad_scale: 8.0 2024-09-16 17:30:18,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=45200.0, ans=0.1 2024-09-16 17:30:38,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=45240.0, ans=0.125 2024-09-16 17:30:54,405 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.411e+02 1.554e+02 1.919e+02 3.789e+02, threshold=3.108e+02, percent-clipped=3.0 2024-09-16 17:31:06,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=45320.0, ans=0.0 2024-09-16 17:31:08,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=45320.0, ans=0.125 2024-09-16 17:31:11,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=45320.0, ans=0.2 2024-09-16 17:31:27,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=45400.0, ans=0.2 2024-09-16 17:31:29,225 INFO [train.py:1198] (0/2) Epoch 3, batch 2300, loss[loss=0.2872, ctc_loss=0.229, cr_loss=0.4102, attn_decoder_loss=0.2846, over 29760.00 frames. ], tot_loss[loss=0.3181, ctc_loss=0.2638, cr_loss=0.4426, attn_decoder_loss=0.3143, over 5802029.12 frames. ], batch size: 72, lr: 3.11e-02, grad_scale: 8.0 2024-09-16 17:31:37,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=45400.0, ans=0.125 2024-09-16 17:31:40,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. 
limit=6.0 2024-09-16 17:31:45,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=45440.0, ans=0.125 2024-09-16 17:32:00,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=45480.0, ans=0.0009826086956521742 2024-09-16 17:32:12,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=45480.0, ans=0.0 2024-09-16 17:32:21,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=45520.0, ans=0.0009739130434782608 2024-09-16 17:32:25,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.88 vs. limit=15.0 2024-09-16 17:32:49,801 INFO [train.py:1198] (0/2) Epoch 3, batch 2350, loss[loss=0.3212, ctc_loss=0.2628, cr_loss=0.4403, attn_decoder_loss=0.3179, over 29681.00 frames. ], tot_loss[loss=0.3181, ctc_loss=0.2636, cr_loss=0.4427, attn_decoder_loss=0.3143, over 5806892.70 frames. ], batch size: 83, lr: 3.11e-02, grad_scale: 8.0 2024-09-16 17:32:55,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5 2024-09-16 17:33:01,363 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.58 vs. 
limit=10.0 2024-09-16 17:33:05,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=45640.0, ans=0.025 2024-09-16 17:33:14,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=45640.0, ans=0.0 2024-09-16 17:33:23,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=45680.0, ans=0.125 2024-09-16 17:33:30,847 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.361e+02 1.552e+02 1.880e+02 4.928e+02, threshold=3.104e+02, percent-clipped=4.0 2024-09-16 17:33:32,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=45680.0, ans=0.0 2024-09-16 17:33:32,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=45680.0, ans=0.000939130434782609 2024-09-16 17:34:06,304 INFO [train.py:1198] (0/2) Epoch 3, batch 2400, loss[loss=0.3066, ctc_loss=0.2475, cr_loss=0.4204, attn_decoder_loss=0.3038, over 29526.00 frames. ], tot_loss[loss=0.3187, ctc_loss=0.264, cr_loss=0.443, attn_decoder_loss=0.3149, over 5810364.30 frames. 
], batch size: 76, lr: 3.10e-02, grad_scale: 16.0 2024-09-16 17:34:09,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=45800.0, ans=0.2 2024-09-16 17:34:12,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=45800.0, ans=0.125 2024-09-16 17:34:21,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=45840.0, ans=0.125 2024-09-16 17:34:52,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=45920.0, ans=0.0 2024-09-16 17:34:56,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=45920.0, ans=0.2 2024-09-16 17:34:58,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=45920.0, ans=0.0 2024-09-16 17:35:10,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=45960.0, ans=0.2 2024-09-16 17:35:22,220 INFO [train.py:1198] (0/2) Epoch 3, batch 2450, loss[loss=0.3338, ctc_loss=0.2822, cr_loss=0.4493, attn_decoder_loss=0.3295, over 29714.00 frames. ], tot_loss[loss=0.3201, ctc_loss=0.2659, cr_loss=0.4441, attn_decoder_loss=0.3163, over 5786454.51 frames. 
], batch size: 82, lr: 3.10e-02, grad_scale: 8.0 2024-09-16 17:35:22,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=46000.0, ans=0.2 2024-09-16 17:35:26,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=46000.0, ans=0.125 2024-09-16 17:35:35,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=46040.0, ans=0.125 2024-09-16 17:36:05,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=46080.0, ans=0.2 2024-09-16 17:36:06,480 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.425e+02 1.645e+02 1.863e+02 7.632e+02, threshold=3.291e+02, percent-clipped=3.0 2024-09-16 17:36:08,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=46120.0, ans=0.5 2024-09-16 17:36:41,942 INFO [train.py:1198] (0/2) Epoch 3, batch 2500, loss[loss=0.3302, ctc_loss=0.2672, cr_loss=0.4715, attn_decoder_loss=0.3267, over 29641.00 frames. ], tot_loss[loss=0.3197, ctc_loss=0.2651, cr_loss=0.4441, attn_decoder_loss=0.3159, over 5796356.77 frames. 
], batch size: 86, lr: 3.09e-02, grad_scale: 8.0 2024-09-16 17:36:55,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=46240.0, ans=0.1 2024-09-16 17:37:09,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=46240.0, ans=0.125 2024-09-16 17:37:47,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=46360.0, ans=0.125 2024-09-16 17:37:54,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0 2024-09-16 17:37:58,400 INFO [train.py:1198] (0/2) Epoch 3, batch 2550, loss[loss=0.2905, ctc_loss=0.2329, cr_loss=0.4199, attn_decoder_loss=0.2875, over 29316.00 frames. ], tot_loss[loss=0.3191, ctc_loss=0.2639, cr_loss=0.4441, attn_decoder_loss=0.3154, over 5799956.47 frames. ], batch size: 67, lr: 3.09e-02, grad_scale: 8.0 2024-09-16 17:37:58,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=46400.0, ans=0.0 2024-09-16 17:38:40,772 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.310e+02 1.464e+02 1.728e+02 3.657e+02, threshold=2.928e+02, percent-clipped=2.0 2024-09-16 17:38:48,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=46520.0, ans=0.1 2024-09-16 17:39:14,460 INFO [train.py:1198] (0/2) Epoch 3, batch 2600, loss[loss=0.3002, ctc_loss=0.2346, cr_loss=0.4261, attn_decoder_loss=0.298, over 29457.00 frames. ], tot_loss[loss=0.3192, ctc_loss=0.2638, cr_loss=0.4443, attn_decoder_loss=0.3155, over 5796988.92 frames. 
], batch size: 78, lr: 3.08e-02, grad_scale: 8.0 2024-09-16 17:39:16,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=46600.0, ans=0.2 2024-09-16 17:39:26,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2024-09-16 17:39:48,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=46680.0, ans=0.125 2024-09-16 17:39:49,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=46680.0, ans=0.2 2024-09-16 17:39:50,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.35 vs. limit=10.0 2024-09-16 17:40:17,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=46760.0, ans=0.0007043478260869568 2024-09-16 17:40:20,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=46760.0, ans=0.125 2024-09-16 17:40:20,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=46760.0, ans=0.0007043478260869568 2024-09-16 17:40:23,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=46760.0, ans=0.0 2024-09-16 17:40:30,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=46760.0, ans=0.1 2024-09-16 17:40:32,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=46800.0, ans=0.125 2024-09-16 17:40:33,422 INFO [train.py:1198] (0/2) Epoch 3, batch 2650, 
loss[loss=0.3341, ctc_loss=0.2799, cr_loss=0.4581, attn_decoder_loss=0.33, over 29284.00 frames. ], tot_loss[loss=0.3198, ctc_loss=0.2641, cr_loss=0.445, attn_decoder_loss=0.316, over 5801625.70 frames. ], batch size: 100, lr: 3.08e-02, grad_scale: 8.0 2024-09-16 17:40:37,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.10 vs. limit=15.0 2024-09-16 17:41:12,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=46880.0, ans=0.125 2024-09-16 17:41:14,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=46880.0, ans=0.125 2024-09-16 17:41:15,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.367e+02 1.536e+02 1.778e+02 3.177e+02, threshold=3.072e+02, percent-clipped=2.0 2024-09-16 17:41:17,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=46920.0, ans=0.000669565217391305 2024-09-16 17:41:31,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=46920.0, ans=0.125 2024-09-16 17:41:38,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=46960.0, ans=0.125 2024-09-16 17:41:49,015 INFO [train.py:1198] (0/2) Epoch 3, batch 2700, loss[loss=0.3261, ctc_loss=0.2675, cr_loss=0.4583, attn_decoder_loss=0.3224, over 29507.00 frames. ], tot_loss[loss=0.3201, ctc_loss=0.2647, cr_loss=0.4451, attn_decoder_loss=0.3164, over 5797086.19 frames. 
], batch size: 87, lr: 3.08e-02, grad_scale: 8.0 2024-09-16 17:42:31,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=47080.0, ans=0.1 2024-09-16 17:42:40,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.03 vs. limit=15.0 2024-09-16 17:42:54,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=47160.0, ans=0.0 2024-09-16 17:43:00,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=47160.0, ans=0.025 2024-09-16 17:43:05,710 INFO [train.py:1198] (0/2) Epoch 3, batch 2750, loss[loss=0.3011, ctc_loss=0.2478, cr_loss=0.4171, attn_decoder_loss=0.2977, over 29503.00 frames. ], tot_loss[loss=0.3193, ctc_loss=0.2641, cr_loss=0.4442, attn_decoder_loss=0.3156, over 5796053.74 frames. ], batch size: 75, lr: 3.07e-02, grad_scale: 8.0 2024-09-16 17:43:16,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47200.0, ans=0.1 2024-09-16 17:43:50,014 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.345e+02 1.531e+02 1.898e+02 4.354e+02, threshold=3.062e+02, percent-clipped=3.0 2024-09-16 17:43:56,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=47320.0, ans=0.125 2024-09-16 17:43:59,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47320.0, ans=0.1 2024-09-16 17:44:22,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=47360.0, ans=0.1 2024-09-16 17:44:25,561 INFO [train.py:1198] (0/2) Epoch 3, batch 2800, 
loss[loss=0.3731, ctc_loss=0.3543, cr_loss=0.443, attn_decoder_loss=0.3653, over 20214.00 frames. ], tot_loss[loss=0.3189, ctc_loss=0.2637, cr_loss=0.4439, attn_decoder_loss=0.3152, over 5776016.68 frames. ], batch size: 211, lr: 3.07e-02, grad_scale: 16.0 2024-09-16 17:44:33,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=47400.0, ans=0.0005652173913043481 2024-09-16 17:44:37,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47400.0, ans=0.1 2024-09-16 17:44:38,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=47400.0, ans=0.05 2024-09-16 17:44:51,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=47440.0, ans=0.0005565217391304347 2024-09-16 17:44:51,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.71 vs. limit=15.0 2024-09-16 17:45:05,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=47480.0, ans=0.0005478260869565214 2024-09-16 17:45:25,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47560.0, ans=0.1 2024-09-16 17:45:37,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2024-09-16 17:45:40,750 INFO [train.py:1198] (0/2) Epoch 3, batch 2850, loss[loss=0.3128, ctc_loss=0.2555, cr_loss=0.4513, attn_decoder_loss=0.3091, over 29522.00 frames. ], tot_loss[loss=0.3203, ctc_loss=0.2654, cr_loss=0.4455, attn_decoder_loss=0.3165, over 5762291.85 frames. 
], batch size: 77, lr: 3.06e-02, grad_scale: 8.0 2024-09-16 17:46:05,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=47640.0, ans=0.125 2024-09-16 17:46:06,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2024-09-16 17:46:25,030 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.383e+02 1.687e+02 2.154e+02 5.154e+02, threshold=3.374e+02, percent-clipped=7.0 2024-09-16 17:46:25,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47720.0, ans=0.1 2024-09-16 17:46:53,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=47760.0, ans=0.0004869565217391295 2024-09-16 17:46:56,743 INFO [train.py:1198] (0/2) Epoch 3, batch 2900, loss[loss=0.3049, ctc_loss=0.2399, cr_loss=0.4128, attn_decoder_loss=0.3029, over 29408.00 frames. ], tot_loss[loss=0.3211, ctc_loss=0.2653, cr_loss=0.4463, attn_decoder_loss=0.3173, over 5788316.25 frames. 
], batch size: 79, lr: 3.06e-02, grad_scale: 8.0 2024-09-16 17:47:03,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=47800.0, ans=0.125 2024-09-16 17:47:03,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=47800.0, ans=0.125 2024-09-16 17:47:29,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=47880.0, ans=0.125 2024-09-16 17:47:31,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=47880.0, ans=0.2 2024-09-16 17:47:34,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=47880.0, ans=0.125 2024-09-16 17:47:40,893 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-09-16 17:48:15,954 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-12000.pt 2024-09-16 17:48:24,250 INFO [train.py:1198] (0/2) Epoch 3, batch 2950, loss[loss=0.2902, ctc_loss=0.2205, cr_loss=0.3956, attn_decoder_loss=0.2891, over 29509.00 frames. ], tot_loss[loss=0.3189, ctc_loss=0.2632, cr_loss=0.4441, attn_decoder_loss=0.3153, over 5782537.12 frames. ], batch size: 75, lr: 3.05e-02, grad_scale: 8.0 2024-09-16 17:48:44,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=48040.0, ans=0.035 2024-09-16 17:48:50,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. 
limit=6.0 2024-09-16 17:48:53,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=48080.0, ans=0.125 2024-09-16 17:49:08,105 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.336e+02 1.504e+02 1.810e+02 3.679e+02, threshold=3.009e+02, percent-clipped=1.0 2024-09-16 17:49:23,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=48160.0, ans=0.1 2024-09-16 17:49:40,405 INFO [train.py:1198] (0/2) Epoch 3, batch 3000, loss[loss=0.3272, ctc_loss=0.275, cr_loss=0.4518, attn_decoder_loss=0.323, over 29748.00 frames. ], tot_loss[loss=0.3184, ctc_loss=0.2629, cr_loss=0.4443, attn_decoder_loss=0.3147, over 5782926.65 frames. ], batch size: 81, lr: 3.05e-02, grad_scale: 8.0 2024-09-16 17:49:40,406 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 17:49:58,748 INFO [train.py:1230] (0/2) Epoch 3, validation: loss=0.2335, ctc_loss=0.0936, cr_loss=4.436e-15, attn_decoder_loss=0.2491, over 944034.00 frames. 
2024-09-16 17:49:58,749 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-16 17:50:15,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=48240.0, ans=0.0003826086956521726 2024-09-16 17:50:46,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=48320.0, ans=0.1 2024-09-16 17:50:52,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=48320.0, ans=0.0 2024-09-16 17:50:56,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=48320.0, ans=0.125 2024-09-16 17:51:05,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. limit=6.0 2024-09-16 17:51:08,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=48360.0, ans=0.0 2024-09-16 17:51:16,894 INFO [train.py:1198] (0/2) Epoch 3, batch 3050, loss[loss=0.3111, ctc_loss=0.2508, cr_loss=0.4676, attn_decoder_loss=0.3074, over 29536.00 frames. ], tot_loss[loss=0.3198, ctc_loss=0.2643, cr_loss=0.4461, attn_decoder_loss=0.316, over 5776114.62 frames. ], batch size: 76, lr: 3.04e-02, grad_scale: 4.0 2024-09-16 17:51:24,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=48400.0, ans=0.0003478260869565226 2024-09-16 17:51:33,421 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.72 vs. 
limit=12.0 2024-09-16 17:51:35,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=48440.0, ans=0.0003391304347826075 2024-09-16 17:51:39,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=48440.0, ans=0.125 2024-09-16 17:51:47,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=48480.0, ans=0.125 2024-09-16 17:51:53,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=48480.0, ans=0.125 2024-09-16 17:51:56,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=48480.0, ans=0.125 2024-09-16 17:52:04,208 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.201e+02 1.405e+02 1.578e+02 1.940e+02 5.924e+02, threshold=3.157e+02, percent-clipped=5.0 2024-09-16 17:52:33,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=48600.0, ans=0.025 2024-09-16 17:52:34,211 INFO [train.py:1198] (0/2) Epoch 3, batch 3100, loss[loss=0.3238, ctc_loss=0.2694, cr_loss=0.4509, attn_decoder_loss=0.3198, over 29232.00 frames. ], tot_loss[loss=0.3189, ctc_loss=0.2634, cr_loss=0.4447, attn_decoder_loss=0.3152, over 5775200.84 frames. ], batch size: 100, lr: 3.04e-02, grad_scale: 8.0 2024-09-16 17:52:39,778 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.64 vs. 
limit=12.0 2024-09-16 17:52:45,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=48600.0, ans=0.05 2024-09-16 17:52:59,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=48640.0, ans=0.125 2024-09-16 17:53:13,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=48680.0, ans=0.125 2024-09-16 17:53:30,959 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0 2024-09-16 17:53:36,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=48760.0, ans=0.125 2024-09-16 17:53:49,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=48800.0, ans=0.00026086956521739237 2024-09-16 17:53:50,244 INFO [train.py:1198] (0/2) Epoch 3, batch 3150, loss[loss=0.3363, ctc_loss=0.278, cr_loss=0.4799, attn_decoder_loss=0.3321, over 28903.00 frames. ], tot_loss[loss=0.3188, ctc_loss=0.2631, cr_loss=0.4451, attn_decoder_loss=0.3151, over 5782384.22 frames. ], batch size: 104, lr: 3.03e-02, grad_scale: 8.0 2024-09-16 17:54:35,637 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.334e+02 1.533e+02 1.776e+02 7.773e+02, threshold=3.065e+02, percent-clipped=4.0 2024-09-16 17:55:00,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=48960.0, ans=0.025 2024-09-16 17:55:07,983 INFO [train.py:1198] (0/2) Epoch 3, batch 3200, loss[loss=0.313, ctc_loss=0.2542, cr_loss=0.451, attn_decoder_loss=0.3095, over 29406.00 frames. 
], tot_loss[loss=0.3174, ctc_loss=0.2612, cr_loss=0.4436, attn_decoder_loss=0.3138, over 5792687.84 frames. ], batch size: 79, lr: 3.03e-02, grad_scale: 16.0 2024-09-16 17:55:12,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=22.5 2024-09-16 17:55:20,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=49000.0, ans=0.125 2024-09-16 17:55:35,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.84 vs. limit=15.0 2024-09-16 17:55:37,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=49040.0, ans=0.00020869565217391216 2024-09-16 17:55:39,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=49080.0, ans=0.125 2024-09-16 17:55:57,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=49120.0, ans=0.125 2024-09-16 17:56:13,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2024-09-16 17:56:19,253 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=12.0 2024-09-16 17:56:26,022 INFO [train.py:1198] (0/2) Epoch 3, batch 3250, loss[loss=0.331, ctc_loss=0.275, cr_loss=0.4423, attn_decoder_loss=0.3274, over 29703.00 frames. ], tot_loss[loss=0.3182, ctc_loss=0.2619, cr_loss=0.4453, attn_decoder_loss=0.3146, over 5798822.30 frames. 
], batch size: 84, lr: 3.03e-02, grad_scale: 8.0 2024-09-16 17:56:53,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=49240.0, ans=0.125 2024-09-16 17:57:03,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=49280.0, ans=0.1 2024-09-16 17:57:06,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=49280.0, ans=0.125 2024-09-16 17:57:12,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=12.0 2024-09-16 17:57:12,497 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.316e+02 1.449e+02 1.854e+02 6.916e+02, threshold=2.898e+02, percent-clipped=2.0 2024-09-16 17:57:20,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=49320.0, ans=0.125 2024-09-16 17:57:38,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=49360.0, ans=0.125 2024-09-16 17:57:41,277 INFO [train.py:1198] (0/2) Epoch 3, batch 3300, loss[loss=0.3229, ctc_loss=0.2596, cr_loss=0.4433, attn_decoder_loss=0.3201, over 28218.00 frames. ], tot_loss[loss=0.3164, ctc_loss=0.26, cr_loss=0.4433, attn_decoder_loss=0.3128, over 5796063.55 frames. 
], batch size: 111, lr: 3.02e-02, grad_scale: 8.0 2024-09-16 17:57:57,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=49440.0, ans=0.125 2024-09-16 17:58:01,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49440.0, ans=0.1 2024-09-16 17:58:11,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.22 vs. limit=22.5 2024-09-16 17:58:16,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=49480.0, ans=0.0 2024-09-16 17:58:33,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=49520.0, ans=0.00010434782608695695 2024-09-16 17:58:34,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=49520.0, ans=0.0 2024-09-16 17:58:40,897 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:58:42,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=49560.0, ans=0.125 2024-09-16 17:58:59,515 INFO [train.py:1198] (0/2) Epoch 3, batch 3350, loss[loss=0.332, ctc_loss=0.2781, cr_loss=0.4762, attn_decoder_loss=0.3275, over 28848.00 frames. ], tot_loss[loss=0.3172, ctc_loss=0.2611, cr_loss=0.4443, attn_decoder_loss=0.3135, over 5771656.67 frames. 
], batch size: 104, lr: 3.02e-02, grad_scale: 8.0 2024-09-16 17:58:59,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=49600.0, ans=0.09899494936611666 2024-09-16 17:59:02,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=49600.0, ans=0.125 2024-09-16 17:59:09,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=49600.0, ans=0.125 2024-09-16 17:59:17,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=49640.0, ans=10.0 2024-09-16 17:59:21,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=49640.0, ans=0.025 2024-09-16 17:59:48,722 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.330e+02 1.460e+02 1.779e+02 4.186e+02, threshold=2.920e+02, percent-clipped=7.0 2024-09-16 18:00:11,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=49760.0, ans=0.125 2024-09-16 18:00:17,599 INFO [train.py:1198] (0/2) Epoch 3, batch 3400, loss[loss=0.2795, ctc_loss=0.2308, cr_loss=0.3804, attn_decoder_loss=0.2765, over 29294.00 frames. ], tot_loss[loss=0.3171, ctc_loss=0.2613, cr_loss=0.4446, attn_decoder_loss=0.3134, over 5763242.57 frames. 
], batch size: 67, lr: 3.01e-02, grad_scale: 8.0 2024-09-16 18:00:26,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=49800.0, ans=0.5 2024-09-16 18:00:28,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49800.0, ans=0.1 2024-09-16 18:00:36,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.94 vs. limit=15.0 2024-09-16 18:00:40,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=49840.0, ans=0.125 2024-09-16 18:00:48,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=49880.0, ans=0.125 2024-09-16 18:01:00,883 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.99 vs. limit=6.0 2024-09-16 18:01:06,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=49920.0, ans=1.7391304347826736e-05 2024-09-16 18:01:10,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=49920.0, ans=0.1 2024-09-16 18:01:33,057 INFO [train.py:1198] (0/2) Epoch 3, batch 3450, loss[loss=0.3349, ctc_loss=0.2822, cr_loss=0.4925, attn_decoder_loss=0.3298, over 28149.00 frames. ], tot_loss[loss=0.3173, ctc_loss=0.2613, cr_loss=0.4447, attn_decoder_loss=0.3136, over 5772163.54 frames. 
], batch size: 111, lr: 3.01e-02, grad_scale: 8.0 2024-09-16 18:01:48,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=50040.0, ans=0.125 2024-09-16 18:02:19,802 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.389e+02 1.591e+02 1.812e+02 6.127e+02, threshold=3.183e+02, percent-clipped=1.0 2024-09-16 18:02:23,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=12.0 2024-09-16 18:02:30,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=50120.0, ans=0.125 2024-09-16 18:02:40,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=6.02 vs. limit=12.0 2024-09-16 18:02:50,661 INFO [train.py:1198] (0/2) Epoch 3, batch 3500, loss[loss=0.2893, ctc_loss=0.2389, cr_loss=0.4108, attn_decoder_loss=0.2858, over 29338.00 frames. ], tot_loss[loss=0.3165, ctc_loss=0.2607, cr_loss=0.4441, attn_decoder_loss=0.3128, over 5773820.44 frames. 
], batch size: 71, lr: 3.00e-02, grad_scale: 8.0 2024-09-16 18:02:59,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=50200.0, ans=0.1 2024-09-16 18:03:09,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=50240.0, ans=0.125 2024-09-16 18:03:14,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50240.0, ans=0.1 2024-09-16 18:03:24,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=50280.0, ans=0.125 2024-09-16 18:03:43,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=50320.0, ans=0.0 2024-09-16 18:03:45,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=50320.0, ans=0.125 2024-09-16 18:04:07,558 INFO [train.py:1198] (0/2) Epoch 3, batch 3550, loss[loss=0.3245, ctc_loss=0.2646, cr_loss=0.4458, attn_decoder_loss=0.3213, over 29722.00 frames. ], tot_loss[loss=0.3163, ctc_loss=0.2602, cr_loss=0.4449, attn_decoder_loss=0.3126, over 5781062.76 frames. 
], batch size: 89, lr: 3.00e-02, grad_scale: 4.0 2024-09-16 18:04:26,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=50440.0, ans=0.2 2024-09-16 18:04:43,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=50480.0, ans=0.125 2024-09-16 18:04:55,143 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.391e+02 1.610e+02 2.091e+02 4.528e+02, threshold=3.220e+02, percent-clipped=5.0 2024-09-16 18:05:19,395 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2024-09-16 18:05:20,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.01 vs. limit=10.0 2024-09-16 18:05:21,622 INFO [train.py:1198] (0/2) Epoch 3, batch 3600, loss[loss=0.3018, ctc_loss=0.2434, cr_loss=0.4045, attn_decoder_loss=0.2993, over 29483.00 frames. ], tot_loss[loss=0.3164, ctc_loss=0.26, cr_loss=0.4448, attn_decoder_loss=0.3128, over 5791060.05 frames. ], batch size: 77, lr: 2.99e-02, grad_scale: 8.0 2024-09-16 18:05:44,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=50640.0, ans=0.0 2024-09-16 18:05:54,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50680.0, ans=0.1 2024-09-16 18:06:17,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.77 vs. limit=22.5 2024-09-16 18:06:20,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. 
limit=15.0
2024-09-16 18:06:27,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=50760.0, ans=0.0
2024-09-16 18:06:35,940 INFO [train.py:1198] (0/2) Epoch 3, batch 3650, loss[loss=0.3398, ctc_loss=0.2864, cr_loss=0.4632, attn_decoder_loss=0.3354, over 29514.00 frames. ], tot_loss[loss=0.3156, ctc_loss=0.2591, cr_loss=0.4437, attn_decoder_loss=0.312, over 5793138.36 frames. ], batch size: 90, lr: 2.99e-02, grad_scale: 4.0
2024-09-16 18:06:54,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=50840.0, ans=0.125
2024-09-16 18:07:01,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=50840.0, ans=0.125
2024-09-16 18:07:03,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=50840.0, ans=0.09899494936611666
2024-09-16 18:07:07,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=50880.0, ans=0.125
2024-09-16 18:07:25,480 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.018e+02 1.262e+02 1.447e+02 1.690e+02 1.332e+03, threshold=2.894e+02, percent-clipped=3.0
2024-09-16 18:07:31,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=50920.0, ans=0.125
2024-09-16 18:07:36,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=50960.0, ans=0.125
2024-09-16 18:07:39,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=50960.0, ans=0.2
2024-09-16 18:07:45,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=50960.0, ans=0.0
2024-09-16 18:07:50,880 INFO [train.py:1198] (0/2) Epoch 3, batch 3700, loss[loss=0.3445, ctc_loss=0.2953, cr_loss=0.4958, attn_decoder_loss=0.3389, over 29705.00 frames. ], tot_loss[loss=0.3161, ctc_loss=0.2592, cr_loss=0.445, attn_decoder_loss=0.3125, over 5803511.77 frames. ], batch size: 84, lr: 2.99e-02, grad_scale: 8.0
2024-09-16 18:08:03,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=51000.0, ans=0.125
2024-09-16 18:08:12,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=51040.0, ans=0.2
2024-09-16 18:08:22,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=51080.0, ans=0.125
2024-09-16 18:08:37,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=51120.0, ans=0.125
2024-09-16 18:08:46,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=51120.0, ans=0.04949747468305833
2024-09-16 18:08:52,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=51160.0, ans=0.125
2024-09-16 18:09:09,219 INFO [train.py:1198] (0/2) Epoch 3, batch 3750, loss[loss=0.274, ctc_loss=0.2208, cr_loss=0.3941, attn_decoder_loss=0.2712, over 29336.00 frames. ], tot_loss[loss=0.3155, ctc_loss=0.2585, cr_loss=0.4441, attn_decoder_loss=0.3119, over 5807393.49 frames. ], batch size: 67, lr: 2.98e-02, grad_scale: 8.0
2024-09-16 18:09:11,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=51200.0, ans=0.0
2024-09-16 18:09:15,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=51200.0, ans=0.125
2024-09-16 18:09:26,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=51240.0, ans=0.0
2024-09-16 18:09:40,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=51280.0, ans=0.125
2024-09-16 18:09:40,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=51280.0, ans=0.0
2024-09-16 18:09:50,144 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.88 vs. limit=6.0
2024-09-16 18:09:52,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=51320.0, ans=0.025
2024-09-16 18:09:58,526 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.044e+02 1.287e+02 1.522e+02 1.821e+02 1.090e+03, threshold=3.043e+02, percent-clipped=9.0
2024-09-16 18:09:58,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=51320.0, ans=0.0
2024-09-16 18:10:16,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=51360.0, ans=0.0
2024-09-16 18:10:21,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=51360.0, ans=0.125
2024-09-16 18:10:21,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.70 vs. limit=15.0
2024-09-16 18:10:23,884 INFO [train.py:1198] (0/2) Epoch 3, batch 3800, loss[loss=0.3125, ctc_loss=0.2468, cr_loss=0.4355, attn_decoder_loss=0.3101, over 29632.00 frames. ], tot_loss[loss=0.3152, ctc_loss=0.2583, cr_loss=0.4438, attn_decoder_loss=0.3117, over 5798364.09 frames. ], batch size: 86, lr: 2.98e-02, grad_scale: 8.0
2024-09-16 18:10:27,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=51400.0, ans=0.0
2024-09-16 18:10:27,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=51400.0, ans=0.125
2024-09-16 18:10:28,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=51400.0, ans=0.0
2024-09-16 18:10:33,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.04 vs. limit=22.5
2024-09-16 18:10:41,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=51440.0, ans=0.125
2024-09-16 18:11:06,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0
2024-09-16 18:11:18,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.84 vs. limit=22.5
2024-09-16 18:11:38,187 INFO [train.py:1198] (0/2) Epoch 3, batch 3850, loss[loss=0.3356, ctc_loss=0.278, cr_loss=0.4499, attn_decoder_loss=0.332, over 29205.00 frames. ], tot_loss[loss=0.315, ctc_loss=0.2573, cr_loss=0.4433, attn_decoder_loss=0.3116, over 5811381.97 frames. ], batch size: 100, lr: 2.97e-02, grad_scale: 8.0
2024-09-16 18:11:38,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51600.0, ans=0.1
2024-09-16 18:11:40,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=51600.0, ans=0.125
2024-09-16 18:11:46,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5
2024-09-16 18:11:48,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=51600.0, ans=0.2
2024-09-16 18:11:53,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=51640.0, ans=0.125
2024-09-16 18:12:03,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=51640.0, ans=0.125
2024-09-16 18:12:18,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=51680.0, ans=0.025
2024-09-16 18:12:20,128 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 18:12:27,164 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.321e+02 1.509e+02 1.752e+02 3.872e+02, threshold=3.018e+02, percent-clipped=1.0
2024-09-16 18:12:36,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=51760.0, ans=0.02
2024-09-16 18:12:46,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=51760.0, ans=0.0
2024-09-16 18:12:48,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=51760.0, ans=0.2
2024-09-16 18:12:48,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=6.03 vs. limit=12.0
2024-09-16 18:12:52,660 INFO [train.py:1198] (0/2) Epoch 3, batch 3900, loss[loss=0.3295, ctc_loss=0.2584, cr_loss=0.4622, attn_decoder_loss=0.3271, over 29630.00 frames. ], tot_loss[loss=0.3157, ctc_loss=0.2578, cr_loss=0.4438, attn_decoder_loss=0.3123, over 5815890.39 frames. ], batch size: 86, lr: 2.97e-02, grad_scale: 8.0
2024-09-16 18:12:55,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=51800.0, ans=0.125
2024-09-16 18:13:00,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=51800.0, ans=0.0
2024-09-16 18:13:06,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.75 vs. limit=22.5
2024-09-16 18:13:07,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=51840.0, ans=0.07
2024-09-16 18:13:17,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=51840.0, ans=0.5
2024-09-16 18:13:24,549 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0
2024-09-16 18:13:35,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=51920.0, ans=0.125
2024-09-16 18:13:41,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=51920.0, ans=0.125
2024-09-16 18:13:54,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=51960.0, ans=10.0
2024-09-16 18:13:54,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=51960.0, ans=0.125
2024-09-16 18:14:06,784 INFO [train.py:1198] (0/2) Epoch 3, batch 3950, loss[loss=0.3363, ctc_loss=0.2787, cr_loss=0.4589, attn_decoder_loss=0.3326, over 29505.00 frames. ], tot_loss[loss=0.3149, ctc_loss=0.2565, cr_loss=0.4432, attn_decoder_loss=0.3116, over 5834997.99 frames. ], batch size: 97, lr: 2.96e-02, grad_scale: 8.0
2024-09-16 18:14:17,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0
2024-09-16 18:14:32,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=52040.0, ans=0.125
2024-09-16 18:14:45,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=52080.0, ans=0.0
2024-09-16 18:14:51,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.22 vs. limit=22.5
2024-09-16 18:14:53,979 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 18:14:58,187 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.359e+02 1.491e+02 1.794e+02 3.719e+02, threshold=2.982e+02, percent-clipped=2.0
2024-09-16 18:15:22,983 INFO [train.py:1198] (0/2) Epoch 3, batch 4000, loss[loss=0.3007, ctc_loss=0.247, cr_loss=0.4411, attn_decoder_loss=0.2969, over 29511.00 frames. ], tot_loss[loss=0.3153, ctc_loss=0.2575, cr_loss=0.4444, attn_decoder_loss=0.3118, over 5811136.35 frames. ], batch size: 74, lr: 2.96e-02, grad_scale: 16.0
2024-09-16 18:15:48,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52240.0, ans=0.1
2024-09-16 18:16:06,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=52320.0, ans=0.125
2024-09-16 18:16:19,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.46 vs. limit=10.0
2024-09-16 18:16:36,968 INFO [train.py:1198] (0/2) Epoch 3, batch 4050, loss[loss=0.3511, ctc_loss=0.3282, cr_loss=0.4782, attn_decoder_loss=0.343, over 20420.00 frames. ], tot_loss[loss=0.3148, ctc_loss=0.2569, cr_loss=0.444, attn_decoder_loss=0.3114, over 5795250.29 frames. ], batch size: 209, lr: 2.96e-02, grad_scale: 4.0
2024-09-16 18:16:37,809 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.99 vs. limit=15.0
2024-09-16 18:16:47,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0
2024-09-16 18:16:57,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=52440.0, ans=0.0
2024-09-16 18:16:58,247 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.42 vs. limit=15.0
2024-09-16 18:17:09,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=52480.0, ans=0.2
2024-09-16 18:17:17,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=52480.0, ans=0.125
2024-09-16 18:17:22,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=52520.0, ans=0.0
2024-09-16 18:17:28,049 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.348e+02 1.567e+02 1.841e+02 9.373e+02, threshold=3.134e+02, percent-clipped=5.0
2024-09-16 18:17:31,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=52520.0, ans=0.0
2024-09-16 18:17:34,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52560.0, ans=0.1
2024-09-16 18:17:45,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=52560.0, ans=0.0
2024-09-16 18:17:50,222 INFO [train.py:1198] (0/2) Epoch 3, batch 4100, loss[loss=0.3338, ctc_loss=0.274, cr_loss=0.4589, attn_decoder_loss=0.3303, over 29510.00 frames. ], tot_loss[loss=0.3149, ctc_loss=0.2572, cr_loss=0.4437, attn_decoder_loss=0.3114, over 5791158.72 frames. ], batch size: 90, lr: 2.95e-02, grad_scale: 8.0
2024-09-16 18:17:59,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=52600.0, ans=0.125
2024-09-16 18:17:59,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=52600.0, ans=0.125
2024-09-16 18:18:05,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=52640.0, ans=0.125
2024-09-16 18:18:10,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=22.5
2024-09-16 18:18:18,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=52680.0, ans=0.125
2024-09-16 18:18:22,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=52680.0, ans=0.125
2024-09-16 18:18:27,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=52680.0, ans=0.1
2024-09-16 18:18:37,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=52720.0, ans=0.125
2024-09-16 18:19:06,556 INFO [train.py:1198] (0/2) Epoch 3, batch 4150, loss[loss=0.304, ctc_loss=0.2463, cr_loss=0.4581, attn_decoder_loss=0.3002, over 29508.00 frames. ], tot_loss[loss=0.3143, ctc_loss=0.2567, cr_loss=0.4432, attn_decoder_loss=0.3109, over 5796541.51 frames. ], batch size: 77, lr: 2.95e-02, grad_scale: 4.0
2024-09-16 18:19:09,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=52800.0, ans=0.125
2024-09-16 18:19:19,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.56 vs. limit=15.0
2024-09-16 18:19:24,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=52840.0, ans=0.125
2024-09-16 18:19:42,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=52880.0, ans=0.125
2024-09-16 18:19:55,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=52920.0, ans=0.125
2024-09-16 18:19:59,721 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.278e+02 1.455e+02 1.672e+02 3.435e+02, threshold=2.910e+02, percent-clipped=1.0
2024-09-16 18:20:13,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=52960.0, ans=0.125
2024-09-16 18:20:16,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=52960.0, ans=0.125
2024-09-16 18:20:20,386 INFO [train.py:1198] (0/2) Epoch 3, batch 4200, loss[loss=0.3248, ctc_loss=0.265, cr_loss=0.4751, attn_decoder_loss=0.3209, over 29480.00 frames. ], tot_loss[loss=0.3148, ctc_loss=0.257, cr_loss=0.4442, attn_decoder_loss=0.3113, over 5798866.17 frames. ], batch size: 90, lr: 2.94e-02, grad_scale: 8.0
2024-09-16 18:20:23,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=53000.0, ans=0.025
2024-09-16 18:20:27,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.14 vs. limit=10.0
2024-09-16 18:20:35,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=53040.0, ans=0.0
2024-09-16 18:20:37,543 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.82 vs. limit=22.5
2024-09-16 18:20:39,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=53040.0, ans=0.2
2024-09-16 18:20:50,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=53080.0, ans=0.125
2024-09-16 18:20:50,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=53080.0, ans=0.025
2024-09-16 18:20:51,746 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.554e-03
2024-09-16 18:21:07,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=53120.0, ans=0.1
2024-09-16 18:21:08,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0
2024-09-16 18:21:08,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0
2024-09-16 18:21:13,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0
2024-09-16 18:21:20,319 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5
2024-09-16 18:21:21,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=53160.0, ans=0.0
2024-09-16 18:21:25,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=53160.0, ans=0.025
2024-09-16 18:21:34,071 INFO [train.py:1198] (0/2) Epoch 3, batch 4250, loss[loss=0.2944, ctc_loss=0.2298, cr_loss=0.4204, attn_decoder_loss=0.2923, over 29521.00 frames. ], tot_loss[loss=0.3149, ctc_loss=0.2569, cr_loss=0.4442, attn_decoder_loss=0.3115, over 5804715.29 frames. ], batch size: 74, lr: 2.94e-02, grad_scale: 4.0
2024-09-16 18:21:35,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=53200.0, ans=0.0
2024-09-16 18:21:35,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=53200.0, ans=0.2
2024-09-16 18:21:57,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=53240.0, ans=0.09899494936611666
2024-09-16 18:22:29,124 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.928e+01 1.354e+02 1.567e+02 1.958e+02 1.183e+03, threshold=3.135e+02, percent-clipped=4.0
2024-09-16 18:22:32,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=53360.0, ans=0.07
2024-09-16 18:22:38,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=53360.0, ans=0.125
2024-09-16 18:22:49,063 INFO [train.py:1198] (0/2) Epoch 3, batch 4300, loss[loss=0.326, ctc_loss=0.265, cr_loss=0.4558, attn_decoder_loss=0.3226, over 29535.00 frames. ], tot_loss[loss=0.3151, ctc_loss=0.257, cr_loss=0.4432, attn_decoder_loss=0.3117, over 5794553.15 frames. ], batch size: 87, lr: 2.93e-02, grad_scale: 8.0
2024-09-16 18:22:49,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=53400.0, ans=0.2
2024-09-16 18:22:57,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0
2024-09-16 18:23:15,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=15.0
2024-09-16 18:24:03,903 INFO [train.py:1198] (0/2) Epoch 3, batch 4350, loss[loss=0.3419, ctc_loss=0.2782, cr_loss=0.4758, attn_decoder_loss=0.3384, over 29489.00 frames. ], tot_loss[loss=0.3189, ctc_loss=0.2605, cr_loss=0.4481, attn_decoder_loss=0.3155, over 5797962.73 frames. ], batch size: 97, lr: 2.93e-02, grad_scale: 4.0
2024-09-16 18:24:13,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.26 vs. limit=6.0
2024-09-16 18:24:14,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=53600.0, ans=0.125
2024-09-16 18:24:33,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=53680.0, ans=0.0
2024-09-16 18:24:36,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=53680.0, ans=0.05
2024-09-16 18:24:58,908 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=12.0
2024-09-16 18:24:59,250 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.313e+02 1.497e+02 1.843e+02 5.151e+02, threshold=2.995e+02, percent-clipped=3.0
2024-09-16 18:25:07,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=53760.0, ans=0.2
2024-09-16 18:25:13,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=53760.0, ans=0.125
2024-09-16 18:25:13,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=53760.0, ans=10.0
2024-09-16 18:25:14,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=53760.0, ans=0.125
2024-09-16 18:25:16,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=53800.0, ans=0.0
2024-09-16 18:25:17,595 INFO [train.py:1198] (0/2) Epoch 3, batch 4400, loss[loss=0.3237, ctc_loss=0.2698, cr_loss=0.443, attn_decoder_loss=0.3198, over 27290.00 frames. ], tot_loss[loss=0.3221, ctc_loss=0.2642, cr_loss=0.4515, attn_decoder_loss=0.3185, over 5769184.62 frames. ], batch size: 124, lr: 2.93e-02, grad_scale: 8.0
2024-09-16 18:25:17,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=53800.0, ans=0.0
2024-09-16 18:25:20,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=53800.0, ans=0.125
2024-09-16 18:25:20,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=53800.0, ans=0.0
2024-09-16 18:25:46,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.54 vs. limit=15.0
2024-09-16 18:26:07,253 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.44 vs. limit=15.0
2024-09-16 18:26:08,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=53920.0, ans=0.125
2024-09-16 18:26:20,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53960.0, ans=0.1
2024-09-16 18:26:31,927 INFO [train.py:1198] (0/2) Epoch 3, batch 4450, loss[loss=0.3666, ctc_loss=0.3487, cr_loss=0.4888, attn_decoder_loss=0.3577, over 19400.00 frames. ], tot_loss[loss=0.326, ctc_loss=0.2711, cr_loss=0.4537, attn_decoder_loss=0.322, over 5580234.61 frames. ], batch size: 210, lr: 2.92e-02, grad_scale: 8.0
2024-09-16 18:26:36,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=54000.0, ans=0.125
2024-09-16 18:26:53,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=54040.0, ans=0.125
2024-09-16 18:27:29,246 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.938e+01 1.292e+02 1.431e+02 1.663e+02 2.911e+02, threshold=2.863e+02, percent-clipped=0.0
2024-09-16 18:27:29,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=54120.0, ans=0.2
2024-09-16 18:27:35,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=54160.0, ans=0.0
2024-09-16 18:27:47,157 INFO [train.py:1198] (0/2) Epoch 3, batch 4500, loss[loss=0.3341, ctc_loss=0.2913, cr_loss=0.4512, attn_decoder_loss=0.3288, over 20270.00 frames. ], tot_loss[loss=0.331, ctc_loss=0.2811, cr_loss=0.4549, attn_decoder_loss=0.3265, over 5235745.44 frames. ], batch size: 209, lr: 2.92e-02, grad_scale: 8.0
2024-09-16 18:27:54,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=54200.0, ans=0.0
2024-09-16 18:28:20,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.77 vs. limit=22.5
2024-09-16 18:28:24,120 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-3.pt
2024-09-16 18:29:13,306 INFO [train.py:1198] (0/2) Epoch 4, batch 0, loss[loss=0.405, ctc_loss=0.257, cr_loss=0.4284, attn_decoder_loss=0.4119, over 29605.00 frames. ], tot_loss[loss=0.405, ctc_loss=0.257, cr_loss=0.4284, attn_decoder_loss=0.4119, over 29605.00 frames. ], batch size: 73, lr: 2.73e-02, grad_scale: 4.0
2024-09-16 18:29:13,307 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 18:29:31,681 INFO [train.py:1230] (0/2) Epoch 4, validation: loss=0.259, ctc_loss=0.0933, cr_loss=4.939e-15, attn_decoder_loss=0.2774, over 944034.00 frames.
2024-09-16 18:29:31,681 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-16 18:29:44,452 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-16 18:29:47,067 WARNING [optim.py:503] (0/2) Scaling gradients by 0.06680610030889511, model_norm_threshold=286.2942810058594
2024-09-16 18:29:47,270 WARNING [optim.py:575] (0/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.1.self_attn.linear_k.weight with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.084e+06, grad_sumsq=4.710e+06, orig_rms_sq=1.079e+00
2024-09-16 18:29:49,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=22.5
2024-09-16 18:29:52,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=54340.0, ans=0.07
2024-09-16 18:29:55,776 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.44 vs. limit=22.5
2024-09-16 18:30:06,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=54380.0, ans=0.5
2024-09-16 18:30:07,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=54380.0, ans=0.125
2024-09-16 18:30:18,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=54420.0, ans=0.125
2024-09-16 18:30:39,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=29.39 vs. limit=22.5
2024-09-16 18:30:46,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=54460.0, ans=0.0
2024-09-16 18:30:48,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=54500.0, ans=0.125
2024-09-16 18:30:51,642 INFO [train.py:1198] (0/2) Epoch 4, batch 50, loss[loss=0.2711, ctc_loss=0.2144, cr_loss=0.3672, attn_decoder_loss=0.2692, over 29437.00 frames. ], tot_loss[loss=0.3241, ctc_loss=0.264, cr_loss=0.4419, attn_decoder_loss=0.321, over 1268623.09 frames. ], batch size: 70, lr: 2.72e-02, grad_scale: 2.0
2024-09-16 18:30:54,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=54500.0, ans=0.0
2024-09-16 18:31:00,268 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.84 vs. limit=22.5
2024-09-16 18:31:03,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=54500.0, ans=0.125
2024-09-16 18:31:15,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.005e+02 1.251e+02 1.386e+02 1.651e+02 4.285e+03, threshold=2.772e+02, percent-clipped=8.0
2024-09-16 18:31:32,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.11 vs. limit=15.0
2024-09-16 18:31:35,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=54620.0, ans=0.025
2024-09-16 18:31:39,493 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=15.0
2024-09-16 18:32:03,518 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.65 vs. limit=22.5
2024-09-16 18:32:07,176 INFO [train.py:1198] (0/2) Epoch 4, batch 100, loss[loss=0.2959, ctc_loss=0.2323, cr_loss=0.4533, attn_decoder_loss=0.2929, over 29529.00 frames. ], tot_loss[loss=0.3212, ctc_loss=0.2621, cr_loss=0.4462, attn_decoder_loss=0.3179, over 2252251.77 frames. ], batch size: 76, lr: 2.72e-02, grad_scale: 4.0
2024-09-16 18:32:42,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=54780.0, ans=0.1
2024-09-16 18:32:51,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=54820.0, ans=0.125
2024-09-16 18:32:54,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.05 vs. limit=22.5
2024-09-16 18:32:55,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=54820.0, ans=0.0
2024-09-16 18:33:07,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=54860.0, ans=0.0
2024-09-16 18:33:16,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=54860.0, ans=22.5
2024-09-16 18:33:18,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=54860.0, ans=0.125
2024-09-16 18:33:23,764 INFO [train.py:1198] (0/2) Epoch 4, batch 150, loss[loss=0.2715, ctc_loss=0.2114, cr_loss=0.4041, attn_decoder_loss=0.2692, over 29394.00 frames. ], tot_loss[loss=0.3163, ctc_loss=0.2573, cr_loss=0.4447, attn_decoder_loss=0.313, over 3046641.74 frames. ], batch size: 70, lr: 2.72e-02, grad_scale: 4.0
2024-09-16 18:33:48,188 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.258e+02 1.425e+02 1.595e+02 3.260e+02, threshold=2.849e+02, percent-clipped=3.0
2024-09-16 18:33:57,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=54980.0, ans=0.0
2024-09-16 18:33:57,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=54980.0, ans=0.125
2024-09-16 18:34:38,957 INFO [train.py:1198] (0/2) Epoch 4, batch 200, loss[loss=0.3171, ctc_loss=0.2476, cr_loss=0.4176, attn_decoder_loss=0.3155, over 27558.00 frames. ], tot_loss[loss=0.3143, ctc_loss=0.2552, cr_loss=0.4444, attn_decoder_loss=0.311, over 3658414.73 frames. ], batch size: 125, lr: 2.71e-02, grad_scale: 8.0
2024-09-16 18:35:21,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55180.0, ans=0.1
2024-09-16 18:35:22,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=55180.0, ans=0.2
2024-09-16 18:35:30,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=55220.0, ans=0.2
2024-09-16 18:35:56,995 INFO [train.py:1198] (0/2) Epoch 4, batch 250, loss[loss=0.332, ctc_loss=0.2707, cr_loss=0.4745, attn_decoder_loss=0.3282, over 29208.00 frames. ], tot_loss[loss=0.313, ctc_loss=0.2533, cr_loss=0.4442, attn_decoder_loss=0.3098, over 4140526.30 frames. ], batch size: 100, lr: 2.71e-02, grad_scale: 4.0
2024-09-16 18:36:12,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5
2024-09-16 18:36:14,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.21 vs. limit=15.0
2024-09-16 18:36:22,544 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.428e+01 1.364e+02 1.529e+02 1.729e+02 3.264e+02, threshold=3.057e+02, percent-clipped=1.0
2024-09-16 18:36:32,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=55380.0, ans=0.125
2024-09-16 18:36:32,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=55380.0, ans=0.125
2024-09-16 18:36:32,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.36 vs.
limit=22.5 2024-09-16 18:36:56,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=55460.0, ans=0.0 2024-09-16 18:37:14,468 INFO [train.py:1198] (0/2) Epoch 4, batch 300, loss[loss=0.3172, ctc_loss=0.2495, cr_loss=0.423, attn_decoder_loss=0.3153, over 29539.00 frames. ], tot_loss[loss=0.312, ctc_loss=0.2522, cr_loss=0.4427, attn_decoder_loss=0.3088, over 4508804.27 frames. ], batch size: 92, lr: 2.70e-02, grad_scale: 8.0 2024-09-16 18:37:43,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=55580.0, ans=0.1 2024-09-16 18:37:44,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=55580.0, ans=0.0 2024-09-16 18:38:29,981 INFO [train.py:1198] (0/2) Epoch 4, batch 350, loss[loss=0.2697, ctc_loss=0.2005, cr_loss=0.3718, attn_decoder_loss=0.2692, over 29301.00 frames. ], tot_loss[loss=0.312, ctc_loss=0.2515, cr_loss=0.4423, attn_decoder_loss=0.3089, over 4794406.45 frames. 
], batch size: 71, lr: 2.70e-02, grad_scale: 8.0 2024-09-16 18:38:34,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=55700.0, ans=0.025 2024-09-16 18:38:59,298 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.063e+02 1.338e+02 1.528e+02 1.849e+02 4.816e+02, threshold=3.056e+02, percent-clipped=1.0 2024-09-16 18:39:04,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=55780.0, ans=0.125 2024-09-16 18:39:07,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=55780.0, ans=0.0 2024-09-16 18:39:14,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=55780.0, ans=0.125 2024-09-16 18:39:15,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.11 vs. limit=15.0 2024-09-16 18:39:20,342 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2024-09-16 18:39:32,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=55860.0, ans=0.0 2024-09-16 18:39:46,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=55900.0, ans=0.125 2024-09-16 18:39:47,900 INFO [train.py:1198] (0/2) Epoch 4, batch 400, loss[loss=0.3261, ctc_loss=0.2746, cr_loss=0.4762, attn_decoder_loss=0.3212, over 29699.00 frames. ], tot_loss[loss=0.3113, ctc_loss=0.2505, cr_loss=0.4421, attn_decoder_loss=0.3082, over 5023211.26 frames. 
], batch size: 82, lr: 2.70e-02, grad_scale: 8.0 2024-09-16 18:39:55,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=55900.0, ans=0.125 2024-09-16 18:40:08,596 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2024-09-16 18:40:41,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=56020.0, ans=0.125 2024-09-16 18:40:50,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56060.0, ans=0.1 2024-09-16 18:40:51,097 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.99 vs. limit=22.5 2024-09-16 18:40:52,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.29 vs. limit=15.0 2024-09-16 18:41:05,934 INFO [train.py:1198] (0/2) Epoch 4, batch 450, loss[loss=0.3207, ctc_loss=0.2448, cr_loss=0.455, attn_decoder_loss=0.319, over 29686.00 frames. ], tot_loss[loss=0.3114, ctc_loss=0.2505, cr_loss=0.4424, attn_decoder_loss=0.3083, over 5187296.63 frames. 
], batch size: 83, lr: 2.69e-02, grad_scale: 8.0 2024-09-16 18:41:27,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=56140.0, ans=0.125 2024-09-16 18:41:34,591 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.042e+02 1.288e+02 1.422e+02 1.644e+02 6.882e+02, threshold=2.845e+02, percent-clipped=3.0 2024-09-16 18:41:42,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=56180.0, ans=0.125 2024-09-16 18:41:58,546 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-09-16 18:42:12,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=56260.0, ans=0.0 2024-09-16 18:42:14,845 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.92 vs. limit=22.5 2024-09-16 18:42:18,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=56260.0, ans=0.2 2024-09-16 18:42:21,534 INFO [train.py:1198] (0/2) Epoch 4, batch 500, loss[loss=0.3274, ctc_loss=0.2744, cr_loss=0.4524, attn_decoder_loss=0.3232, over 29414.00 frames. ], tot_loss[loss=0.3099, ctc_loss=0.2485, cr_loss=0.442, attn_decoder_loss=0.3069, over 5330441.85 frames. ], batch size: 94, lr: 2.69e-02, grad_scale: 8.0 2024-09-16 18:42:57,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=56380.0, ans=0.125 2024-09-16 18:43:14,027 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.15 vs. 
limit=12.0 2024-09-16 18:43:24,933 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0 2024-09-16 18:43:30,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=56460.0, ans=0.125 2024-09-16 18:43:38,981 INFO [train.py:1198] (0/2) Epoch 4, batch 550, loss[loss=0.3282, ctc_loss=0.2704, cr_loss=0.4623, attn_decoder_loss=0.3243, over 28828.00 frames. ], tot_loss[loss=0.3102, ctc_loss=0.2491, cr_loss=0.4413, attn_decoder_loss=0.3072, over 5422115.34 frames. ], batch size: 104, lr: 2.69e-02, grad_scale: 8.0 2024-09-16 18:43:39,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=56500.0, ans=0.125 2024-09-16 18:43:48,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=56500.0, ans=0.1 2024-09-16 18:44:00,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=56540.0, ans=0.0 2024-09-16 18:44:02,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=56540.0, ans=0.125 2024-09-16 18:44:09,214 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.307e+02 1.429e+02 1.661e+02 4.927e+02, threshold=2.859e+02, percent-clipped=1.0 2024-09-16 18:44:18,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=56580.0, ans=0.0 2024-09-16 18:44:20,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.10 vs. 
limit=22.5 2024-09-16 18:44:36,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=56620.0, ans=0.2 2024-09-16 18:44:56,988 INFO [train.py:1198] (0/2) Epoch 4, batch 600, loss[loss=0.3214, ctc_loss=0.2617, cr_loss=0.4465, attn_decoder_loss=0.3182, over 29248.00 frames. ], tot_loss[loss=0.3097, ctc_loss=0.2482, cr_loss=0.4411, attn_decoder_loss=0.3068, over 5509426.60 frames. ], batch size: 100, lr: 2.68e-02, grad_scale: 8.0 2024-09-16 18:45:13,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=56740.0, ans=0.125 2024-09-16 18:45:22,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=56740.0, ans=0.0 2024-09-16 18:45:29,042 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:45:30,998 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.17 vs. limit=22.5 2024-09-16 18:45:47,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2024-09-16 18:46:07,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.37 vs. limit=22.5 2024-09-16 18:46:12,467 INFO [train.py:1198] (0/2) Epoch 4, batch 650, loss[loss=0.3139, ctc_loss=0.2466, cr_loss=0.4264, attn_decoder_loss=0.3119, over 29766.00 frames. ], tot_loss[loss=0.3087, ctc_loss=0.247, cr_loss=0.44, attn_decoder_loss=0.3058, over 5586235.69 frames. 
], batch size: 81, lr: 2.68e-02, grad_scale: 4.0 2024-09-16 18:46:19,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=56900.0, ans=22.5 2024-09-16 18:46:21,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=56900.0, ans=0.125 2024-09-16 18:46:27,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=56940.0, ans=0.95 2024-09-16 18:46:39,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=56940.0, ans=0.125 2024-09-16 18:46:46,221 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.273e+02 1.380e+02 1.624e+02 3.709e+02, threshold=2.760e+02, percent-clipped=3.0 2024-09-16 18:46:59,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.97 vs. limit=15.0 2024-09-16 18:47:30,026 INFO [train.py:1198] (0/2) Epoch 4, batch 700, loss[loss=0.3116, ctc_loss=0.2546, cr_loss=0.4484, attn_decoder_loss=0.308, over 29533.00 frames. ], tot_loss[loss=0.3091, ctc_loss=0.2471, cr_loss=0.4402, attn_decoder_loss=0.3062, over 5636603.97 frames. 
], batch size: 76, lr: 2.67e-02, grad_scale: 8.0 2024-09-16 18:47:33,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=57100.0, ans=0.0 2024-09-16 18:47:43,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=57140.0, ans=0.125 2024-09-16 18:47:48,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=57140.0, ans=0.125 2024-09-16 18:47:54,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=57140.0, ans=0.125 2024-09-16 18:48:07,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-09-16 18:48:12,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=57180.0, ans=0.0 2024-09-16 18:48:37,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=57260.0, ans=0.1 2024-09-16 18:48:37,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.71 vs. limit=22.5 2024-09-16 18:48:41,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=57260.0, ans=0.035 2024-09-16 18:48:44,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=57300.0, ans=0.125 2024-09-16 18:48:46,079 INFO [train.py:1198] (0/2) Epoch 4, batch 750, loss[loss=0.3207, ctc_loss=0.2563, cr_loss=0.4624, attn_decoder_loss=0.3176, over 29720.00 frames. 
], tot_loss[loss=0.3087, ctc_loss=0.2467, cr_loss=0.4404, attn_decoder_loss=0.3058, over 5675170.34 frames. ], batch size: 82, lr: 2.67e-02, grad_scale: 4.0 2024-09-16 18:48:47,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=57300.0, ans=0.1 2024-09-16 18:49:03,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=57340.0, ans=0.2 2024-09-16 18:49:21,187 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.371e+02 1.558e+02 1.817e+02 5.424e+02, threshold=3.116e+02, percent-clipped=2.0 2024-09-16 18:49:26,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=57380.0, ans=0.1 2024-09-16 18:49:26,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=57380.0, ans=0.125 2024-09-16 18:49:26,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=16.96 vs. limit=15.0 2024-09-16 18:49:40,519 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.70 vs. limit=6.0 2024-09-16 18:50:03,606 INFO [train.py:1198] (0/2) Epoch 4, batch 800, loss[loss=0.2852, ctc_loss=0.2201, cr_loss=0.3984, attn_decoder_loss=0.2836, over 29617.00 frames. ], tot_loss[loss=0.3085, ctc_loss=0.2465, cr_loss=0.4405, attn_decoder_loss=0.3056, over 5706877.11 frames. 
], batch size: 73, lr: 2.67e-02, grad_scale: 8.0 2024-09-16 18:50:17,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=57540.0, ans=0.0 2024-09-16 18:50:28,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.89 vs. limit=15.0 2024-09-16 18:50:38,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=57580.0, ans=0.0 2024-09-16 18:50:52,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=57620.0, ans=0.125 2024-09-16 18:50:52,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=57620.0, ans=0.125 2024-09-16 18:50:52,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=57620.0, ans=0.125 2024-09-16 18:50:54,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=57620.0, ans=0.0 2024-09-16 18:51:17,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0 2024-09-16 18:51:20,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2024-09-16 18:51:20,850 INFO [train.py:1198] (0/2) Epoch 4, batch 850, loss[loss=0.3162, ctc_loss=0.2455, cr_loss=0.4736, attn_decoder_loss=0.3135, over 29716.00 frames. ], tot_loss[loss=0.3081, ctc_loss=0.246, cr_loss=0.4403, attn_decoder_loss=0.3052, over 5735570.27 frames. 
], batch size: 89, lr: 2.66e-02, grad_scale: 4.0 2024-09-16 18:51:24,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=57700.0, ans=0.125 2024-09-16 18:51:33,034 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:51:34,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=57740.0, ans=0.125 2024-09-16 18:51:34,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=57740.0, ans=0.125 2024-09-16 18:51:45,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2024-09-16 18:51:55,368 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.339e+02 1.546e+02 1.753e+02 3.025e+02, threshold=3.091e+02, percent-clipped=0.0 2024-09-16 18:52:06,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=57820.0, ans=0.0 2024-09-16 18:52:16,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=57820.0, ans=0.0 2024-09-16 18:52:17,320 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0 2024-09-16 18:52:36,426 INFO [train.py:1198] (0/2) Epoch 4, batch 900, loss[loss=0.2862, ctc_loss=0.2267, cr_loss=0.4371, attn_decoder_loss=0.2831, over 29592.00 frames. ], tot_loss[loss=0.3084, ctc_loss=0.2465, cr_loss=0.4405, attn_decoder_loss=0.3055, over 5740903.19 frames. 
], batch size: 73, lr: 2.66e-02, grad_scale: 8.0 2024-09-16 18:52:46,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=57900.0, ans=0.09899494936611666 2024-09-16 18:52:47,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=57900.0, ans=0.0 2024-09-16 18:52:53,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_ff3.min_abs, batch_count=57940.0, ans=0.2 2024-09-16 18:53:05,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=57940.0, ans=0.035 2024-09-16 18:53:05,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=57940.0, ans=0.125 2024-09-16 18:53:19,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=57980.0, ans=0.125 2024-09-16 18:53:25,280 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:53:37,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=58060.0, ans=0.07 2024-09-16 18:53:40,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=58060.0, ans=0.1 2024-09-16 18:53:53,365 INFO [train.py:1198] (0/2) Epoch 4, batch 950, loss[loss=0.289, ctc_loss=0.2254, cr_loss=0.4179, attn_decoder_loss=0.2867, over 29514.00 frames. ], tot_loss[loss=0.3086, ctc_loss=0.2464, cr_loss=0.441, attn_decoder_loss=0.3057, over 5741121.46 frames. 
], batch size: 74, lr: 2.66e-02, grad_scale: 4.0 2024-09-16 18:54:29,624 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.318e+02 1.459e+02 1.683e+02 8.183e+02, threshold=2.918e+02, percent-clipped=3.0 2024-09-16 18:54:30,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=58180.0, ans=0.0 2024-09-16 18:54:32,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.51 vs. limit=22.5 2024-09-16 18:54:42,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=58220.0, ans=0.125 2024-09-16 18:54:44,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=58220.0, ans=0.125 2024-09-16 18:55:02,191 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-16 18:55:07,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=58260.0, ans=0.0 2024-09-16 18:55:10,844 INFO [train.py:1198] (0/2) Epoch 4, batch 1000, loss[loss=0.2891, ctc_loss=0.2206, cr_loss=0.4, attn_decoder_loss=0.2878, over 29502.00 frames. ], tot_loss[loss=0.3091, ctc_loss=0.2471, cr_loss=0.441, attn_decoder_loss=0.3061, over 5734894.02 frames. ], batch size: 77, lr: 2.65e-02, grad_scale: 8.0 2024-09-16 18:55:25,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.05 vs. 
limit=15.0 2024-09-16 18:55:42,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=58380.0, ans=0.2 2024-09-16 18:55:47,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=58380.0, ans=0.035 2024-09-16 18:55:50,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=58380.0, ans=0.125 2024-09-16 18:56:15,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=58460.0, ans=0.2 2024-09-16 18:56:24,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.71 vs. limit=22.5 2024-09-16 18:56:28,235 INFO [train.py:1198] (0/2) Epoch 4, batch 1050, loss[loss=0.3211, ctc_loss=0.2513, cr_loss=0.4449, attn_decoder_loss=0.3189, over 29673.00 frames. ], tot_loss[loss=0.3081, ctc_loss=0.246, cr_loss=0.4403, attn_decoder_loss=0.3052, over 5743924.67 frames. ], batch size: 85, lr: 2.65e-02, grad_scale: 4.0 2024-09-16 18:56:31,968 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.47 vs. 
limit=22.5 2024-09-16 18:56:57,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=58580.0, ans=0.1 2024-09-16 18:57:03,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=58580.0, ans=0.125 2024-09-16 18:57:06,150 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.263e+02 1.458e+02 1.745e+02 4.654e+02, threshold=2.917e+02, percent-clipped=3.0 2024-09-16 18:57:07,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=15.15 vs. limit=15.0 2024-09-16 18:57:21,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=58620.0, ans=0.025 2024-09-16 18:57:34,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=58660.0, ans=6.0 2024-09-16 18:57:35,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.02 vs. limit=15.0 2024-09-16 18:57:43,659 INFO [train.py:1198] (0/2) Epoch 4, batch 1100, loss[loss=0.3121, ctc_loss=0.248, cr_loss=0.4885, attn_decoder_loss=0.3084, over 29451.00 frames. ], tot_loss[loss=0.308, ctc_loss=0.2455, cr_loss=0.4399, attn_decoder_loss=0.3051, over 5756257.48 frames. ], batch size: 78, lr: 2.65e-02, grad_scale: 8.0 2024-09-16 18:58:11,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=58740.0, ans=0.125 2024-09-16 18:58:43,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=26.50 vs. 
limit=22.5
2024-09-16 18:59:01,087 INFO [train.py:1198] (0/2) Epoch 4, batch 1150, loss[loss=0.3106, ctc_loss=0.2493, cr_loss=0.4515, attn_decoder_loss=0.3074, over 29464.00 frames. ], tot_loss[loss=0.3078, ctc_loss=0.2453, cr_loss=0.4391, attn_decoder_loss=0.305, over 5754475.91 frames. ], batch size: 78, lr: 2.64e-02, grad_scale: 4.0
2024-09-16 18:59:11,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=12.0
2024-09-16 18:59:15,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=58940.0, ans=0.125
2024-09-16 18:59:40,711 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.556e+01 1.271e+02 1.479e+02 1.697e+02 4.647e+02, threshold=2.959e+02, percent-clipped=3.0
2024-09-16 18:59:47,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0
2024-09-16 18:59:59,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=59020.0, ans=0.0
2024-09-16 19:00:13,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.12 vs. limit=22.5
2024-09-16 19:00:18,996 INFO [train.py:1198] (0/2) Epoch 4, batch 1200, loss[loss=0.3072, ctc_loss=0.2447, cr_loss=0.4319, attn_decoder_loss=0.3046, over 29674.00 frames. ], tot_loss[loss=0.3087, ctc_loss=0.2467, cr_loss=0.4408, attn_decoder_loss=0.3058, over 5748355.51 frames. ], batch size: 85, lr: 2.64e-02, grad_scale: 8.0
2024-09-16 19:00:35,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=59140.0, ans=0.025
2024-09-16 19:01:11,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=59220.0, ans=0.125
2024-09-16 19:01:14,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=59220.0, ans=0.05
2024-09-16 19:01:14,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.74 vs. limit=15.0
2024-09-16 19:01:17,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=59220.0, ans=0.125
2024-09-16 19:01:29,726 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0
2024-09-16 19:01:34,925 INFO [train.py:1198] (0/2) Epoch 4, batch 1250, loss[loss=0.3246, ctc_loss=0.2698, cr_loss=0.4631, attn_decoder_loss=0.3203, over 29529.00 frames. ], tot_loss[loss=0.3094, ctc_loss=0.2473, cr_loss=0.4422, attn_decoder_loss=0.3065, over 5774882.58 frames. ], batch size: 92, lr: 2.63e-02, grad_scale: 4.0
2024-09-16 19:01:43,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.72 vs. limit=15.0
2024-09-16 19:01:51,346 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. limit=6.0
2024-09-16 19:02:02,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.45 vs. limit=22.5
2024-09-16 19:02:05,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=59380.0, ans=0.025
2024-09-16 19:02:07,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=59380.0, ans=0.1
2024-09-16 19:02:08,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=59380.0, ans=0.125
2024-09-16 19:02:08,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=59380.0, ans=0.0
2024-09-16 19:02:15,813 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.296e+02 1.466e+02 1.683e+02 4.153e+02, threshold=2.932e+02, percent-clipped=2.0
2024-09-16 19:02:34,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=59420.0, ans=0.125
2024-09-16 19:02:41,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=59460.0, ans=0.1
2024-09-16 19:02:52,644 INFO [train.py:1198] (0/2) Epoch 4, batch 1300, loss[loss=0.3324, ctc_loss=0.2689, cr_loss=0.4503, attn_decoder_loss=0.3295, over 28299.00 frames. ], tot_loss[loss=0.3085, ctc_loss=0.2462, cr_loss=0.4415, attn_decoder_loss=0.3056, over 5778853.67 frames. ], batch size: 111, lr: 2.63e-02, grad_scale: 8.0
2024-09-16 19:03:11,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.03 vs. limit=15.0
2024-09-16 19:03:16,405 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=15.0
2024-09-16 19:03:27,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=59580.0, ans=0.125
2024-09-16 19:03:36,127 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.07 vs. limit=12.0
2024-09-16 19:03:43,538 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0
2024-09-16 19:03:50,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=59620.0, ans=0.125
2024-09-16 19:03:52,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=59660.0, ans=0.025
2024-09-16 19:04:08,496 INFO [train.py:1198] (0/2) Epoch 4, batch 1350, loss[loss=0.3083, ctc_loss=0.2494, cr_loss=0.4523, attn_decoder_loss=0.3048, over 29763.00 frames. ], tot_loss[loss=0.3078, ctc_loss=0.245, cr_loss=0.4408, attn_decoder_loss=0.305, over 5797288.38 frames. ], batch size: 81, lr: 2.63e-02, grad_scale: 4.0
2024-09-16 19:04:11,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=59700.0, ans=0.125
2024-09-16 19:04:18,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=59700.0, ans=0.2
2024-09-16 19:04:52,260 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.260e+02 1.419e+02 1.691e+02 3.213e+02, threshold=2.838e+02, percent-clipped=1.0
2024-09-16 19:05:25,810 INFO [train.py:1198] (0/2) Epoch 4, batch 1400, loss[loss=0.2593, ctc_loss=0.192, cr_loss=0.3979, attn_decoder_loss=0.258, over 29561.00 frames. ], tot_loss[loss=0.3076, ctc_loss=0.2446, cr_loss=0.4406, attn_decoder_loss=0.3048, over 5808062.96 frames. ], batch size: 69, lr: 2.62e-02, grad_scale: 8.0
2024-09-16 19:05:27,934 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0
2024-09-16 19:05:35,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=59900.0, ans=0.0
2024-09-16 19:05:38,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=22.5
2024-09-16 19:05:50,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=59940.0, ans=0.125
2024-09-16 19:06:15,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0
2024-09-16 19:06:19,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.35 vs. limit=15.0
2024-09-16 19:06:28,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=60060.0, ans=0.125
2024-09-16 19:06:43,769 INFO [train.py:1198] (0/2) Epoch 4, batch 1450, loss[loss=0.344, ctc_loss=0.2783, cr_loss=0.482, attn_decoder_loss=0.3406, over 29442.00 frames. ], tot_loss[loss=0.308, ctc_loss=0.2449, cr_loss=0.4404, attn_decoder_loss=0.3052, over 5805055.22 frames. ], batch size: 94, lr: 2.62e-02, grad_scale: 4.0
2024-09-16 19:06:45,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=60100.0, ans=0.125
2024-09-16 19:07:02,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=60140.0, ans=0.125
2024-09-16 19:07:06,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=60140.0, ans=0.1
2024-09-16 19:07:27,552 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.279e+02 1.464e+02 1.663e+02 3.366e+02, threshold=2.927e+02, percent-clipped=3.0
2024-09-16 19:07:27,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=60220.0, ans=0.0
2024-09-16 19:07:38,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=60220.0, ans=0.125
2024-09-16 19:07:59,067 INFO [train.py:1198] (0/2) Epoch 4, batch 1500, loss[loss=0.32, ctc_loss=0.2592, cr_loss=0.4873, attn_decoder_loss=0.3159, over 29643.00 frames. ], tot_loss[loss=0.3084, ctc_loss=0.2453, cr_loss=0.4412, attn_decoder_loss=0.3056, over 5806845.02 frames. ], batch size: 86, lr: 2.62e-02, grad_scale: 8.0
2024-09-16 19:07:59,972 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5
2024-09-16 19:09:02,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=60460.0, ans=0.0
2024-09-16 19:09:14,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=60460.0, ans=0.025
2024-09-16 19:09:16,973 INFO [train.py:1198] (0/2) Epoch 4, batch 1550, loss[loss=0.317, ctc_loss=0.2494, cr_loss=0.4572, attn_decoder_loss=0.3143, over 29520.00 frames. ], tot_loss[loss=0.3087, ctc_loss=0.2456, cr_loss=0.4411, attn_decoder_loss=0.3059, over 5781230.96 frames. ], batch size: 90, lr: 2.61e-02, grad_scale: 4.0
2024-09-16 19:09:27,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=60500.0, ans=0.0
2024-09-16 19:09:30,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=60540.0, ans=0.09899494936611666
2024-09-16 19:09:40,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0
2024-09-16 19:09:53,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=60580.0, ans=0.0
2024-09-16 19:10:01,805 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.038e+02 1.301e+02 1.510e+02 1.822e+02 6.597e+02, threshold=3.020e+02, percent-clipped=6.0
2024-09-16 19:10:09,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=60620.0, ans=15.0
2024-09-16 19:10:09,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=60620.0, ans=0.0
2024-09-16 19:10:11,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=60620.0, ans=0.125
2024-09-16 19:10:25,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=60660.0, ans=0.125
2024-09-16 19:10:31,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=60660.0, ans=0.0
2024-09-16 19:10:34,147 INFO [train.py:1198] (0/2) Epoch 4, batch 1600, loss[loss=0.3333, ctc_loss=0.2657, cr_loss=0.4773, attn_decoder_loss=0.3302, over 29679.00 frames. ], tot_loss[loss=0.3086, ctc_loss=0.2459, cr_loss=0.4411, attn_decoder_loss=0.3058, over 5762112.39 frames. ], batch size: 85, lr: 2.61e-02, grad_scale: 8.0
2024-09-16 19:10:38,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=60700.0, ans=0.125
2024-09-16 19:10:51,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=60740.0, ans=0.125
2024-09-16 19:10:57,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60740.0, ans=0.1
2024-09-16 19:11:18,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=60820.0, ans=0.125
2024-09-16 19:11:39,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=60860.0, ans=0.07
2024-09-16 19:11:47,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=60860.0, ans=0.0
2024-09-16 19:11:50,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=60900.0, ans=0.125
2024-09-16 19:11:52,009 INFO [train.py:1198] (0/2) Epoch 4, batch 1650, loss[loss=0.3295, ctc_loss=0.2696, cr_loss=0.4885, attn_decoder_loss=0.3253, over 29715.00 frames. ], tot_loss[loss=0.3084, ctc_loss=0.2458, cr_loss=0.4414, attn_decoder_loss=0.3056, over 5757864.34 frames. ], batch size: 89, lr: 2.61e-02, grad_scale: 4.0
2024-09-16 19:11:58,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=60900.0, ans=0.09899494936611666
2024-09-16 19:12:04,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=60900.0, ans=0.125
2024-09-16 19:12:38,885 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.008e+02 1.275e+02 1.417e+02 1.655e+02 4.421e+02, threshold=2.835e+02, percent-clipped=2.0
2024-09-16 19:12:40,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=61020.0, ans=0.125
2024-09-16 19:12:42,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=61020.0, ans=0.2
2024-09-16 19:12:43,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=61020.0, ans=0.125
2024-09-16 19:12:47,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=9.02 vs. limit=10.0
2024-09-16 19:13:03,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=12.0
2024-09-16 19:13:07,426 INFO [train.py:1198] (0/2) Epoch 4, batch 1700, loss[loss=0.2719, ctc_loss=0.2149, cr_loss=0.3984, attn_decoder_loss=0.2693, over 29570.00 frames. ], tot_loss[loss=0.3076, ctc_loss=0.2446, cr_loss=0.4409, attn_decoder_loss=0.3048, over 5779875.67 frames. ], batch size: 69, lr: 2.60e-02, grad_scale: 8.0
2024-09-16 19:13:54,837 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0
2024-09-16 19:14:06,686 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-16 19:14:22,896 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.23 vs. limit=15.0
2024-09-16 19:14:25,000 INFO [train.py:1198] (0/2) Epoch 4, batch 1750, loss[loss=0.2648, ctc_loss=0.1964, cr_loss=0.3785, attn_decoder_loss=0.2639, over 29384.00 frames. ], tot_loss[loss=0.3064, ctc_loss=0.2429, cr_loss=0.4403, attn_decoder_loss=0.3037, over 5787906.87 frames. ], batch size: 67, lr: 2.60e-02, grad_scale: 8.0
2024-09-16 19:14:28,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=61300.0, ans=0.2
2024-09-16 19:14:31,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=61300.0, ans=0.125
2024-09-16 19:14:54,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=12.0
2024-09-16 19:15:09,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=61420.0, ans=0.125
2024-09-16 19:15:11,792 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.273e+01 1.237e+02 1.382e+02 1.538e+02 2.452e+02, threshold=2.764e+02, percent-clipped=0.0
2024-09-16 19:15:12,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.84 vs. limit=22.5
2024-09-16 19:15:18,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.40 vs. limit=15.0
2024-09-16 19:15:42,011 INFO [train.py:1198] (0/2) Epoch 4, batch 1800, loss[loss=0.3017, ctc_loss=0.2314, cr_loss=0.4596, attn_decoder_loss=0.2993, over 29695.00 frames. ], tot_loss[loss=0.3066, ctc_loss=0.2429, cr_loss=0.4406, attn_decoder_loss=0.3038, over 5790381.29 frames. ], batch size: 83, lr: 2.60e-02, grad_scale: 8.0
2024-09-16 19:15:52,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=61500.0, ans=0.2
2024-09-16 19:16:35,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=61620.0, ans=0.0
2024-09-16 19:16:57,519 INFO [train.py:1198] (0/2) Epoch 4, batch 1850, loss[loss=0.3188, ctc_loss=0.2434, cr_loss=0.4509, attn_decoder_loss=0.3172, over 29640.00 frames. ], tot_loss[loss=0.3061, ctc_loss=0.2417, cr_loss=0.4401, attn_decoder_loss=0.3034, over 5795634.92 frames. ], batch size: 86, lr: 2.59e-02, grad_scale: 4.0
2024-09-16 19:17:08,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=61700.0, ans=0.125
2024-09-16 19:17:46,910 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.284e+02 1.452e+02 1.621e+02 3.527e+02, threshold=2.905e+02, percent-clipped=2.0
2024-09-16 19:17:48,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=61820.0, ans=0.2
2024-09-16 19:18:08,543 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.48 vs. limit=6.0
2024-09-16 19:18:12,181 INFO [train.py:1198] (0/2) Epoch 4, batch 1900, loss[loss=0.3061, ctc_loss=0.2287, cr_loss=0.4407, attn_decoder_loss=0.3049, over 29724.00 frames. ], tot_loss[loss=0.3066, ctc_loss=0.2423, cr_loss=0.4405, attn_decoder_loss=0.304, over 5803900.21 frames. ], batch size: 89, lr: 2.59e-02, grad_scale: 8.0
2024-09-16 19:18:35,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=61940.0, ans=0.125
2024-09-16 19:18:41,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=61940.0, ans=0.025
2024-09-16 19:18:53,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=61980.0, ans=0.125
2024-09-16 19:18:56,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=61980.0, ans=0.07
2024-09-16 19:19:22,587 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=6.54 vs. limit=12.0
2024-09-16 19:19:31,263 INFO [train.py:1198] (0/2) Epoch 4, batch 1950, loss[loss=0.2946, ctc_loss=0.2265, cr_loss=0.4166, attn_decoder_loss=0.2929, over 29486.00 frames. ], tot_loss[loss=0.3071, ctc_loss=0.2423, cr_loss=0.4412, attn_decoder_loss=0.3045, over 5818577.06 frames. ], batch size: 78, lr: 2.59e-02, grad_scale: 4.0
2024-09-16 19:19:46,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=62140.0, ans=0.125
2024-09-16 19:19:57,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=62140.0, ans=0.125
2024-09-16 19:20:15,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=62220.0, ans=0.125
2024-09-16 19:20:22,231 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.025e+02 1.204e+02 1.396e+02 1.540e+02 6.321e+02, threshold=2.792e+02, percent-clipped=2.0
2024-09-16 19:20:32,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=62260.0, ans=0.025
2024-09-16 19:20:42,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=62260.0, ans=0.0
2024-09-16 19:20:46,383 INFO [train.py:1198] (0/2) Epoch 4, batch 2000, loss[loss=0.2779, ctc_loss=0.2141, cr_loss=0.412, attn_decoder_loss=0.2758, over 29354.00 frames. ], tot_loss[loss=0.3081, ctc_loss=0.2431, cr_loss=0.4422, attn_decoder_loss=0.3055, over 5796278.74 frames. ], batch size: 67, lr: 2.58e-02, grad_scale: 8.0
2024-09-16 19:21:07,346 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. limit=6.0
2024-09-16 19:22:02,156 INFO [train.py:1198] (0/2) Epoch 4, batch 2050, loss[loss=0.2888, ctc_loss=0.233, cr_loss=0.4361, attn_decoder_loss=0.2853, over 29446.00 frames. ], tot_loss[loss=0.3072, ctc_loss=0.243, cr_loss=0.4416, attn_decoder_loss=0.3045, over 5788948.16 frames. ], batch size: 70, lr: 2.58e-02, grad_scale: 4.0
2024-09-16 19:22:28,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=62540.0, ans=0.125
2024-09-16 19:22:45,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=62580.0, ans=0.0
2024-09-16 19:22:49,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=62620.0, ans=0.125
2024-09-16 19:22:54,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=62620.0, ans=0.05
2024-09-16 19:22:57,031 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.877e+01 1.306e+02 1.501e+02 1.885e+02 4.145e+02, threshold=3.002e+02, percent-clipped=3.0
2024-09-16 19:23:00,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=62620.0, ans=0.2
2024-09-16 19:23:06,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=62660.0, ans=0.125
2024-09-16 19:23:20,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=62700.0, ans=0.125
2024-09-16 19:23:21,653 INFO [train.py:1198] (0/2) Epoch 4, batch 2100, loss[loss=0.2946, ctc_loss=0.2202, cr_loss=0.4119, attn_decoder_loss=0.2937, over 29758.00 frames. ], tot_loss[loss=0.3061, ctc_loss=0.2416, cr_loss=0.4407, attn_decoder_loss=0.3034, over 5801376.85 frames. ], batch size: 81, lr: 2.58e-02, grad_scale: 8.0
2024-09-16 19:23:21,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=62700.0, ans=0.0
2024-09-16 19:23:26,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=62700.0, ans=0.0
2024-09-16 19:23:47,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=62740.0, ans=0.125
2024-09-16 19:24:03,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=62780.0, ans=0.0
2024-09-16 19:24:35,565 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-16 19:24:36,635 INFO [train.py:1198] (0/2) Epoch 4, batch 2150, loss[loss=0.3018, ctc_loss=0.2328, cr_loss=0.4189, attn_decoder_loss=0.3002, over 29436.00 frames. ], tot_loss[loss=0.3052, ctc_loss=0.2402, cr_loss=0.4392, attn_decoder_loss=0.3026, over 5816321.68 frames. ], batch size: 78, lr: 2.57e-02, grad_scale: 4.0
2024-09-16 19:24:49,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=62900.0, ans=0.0
2024-09-16 19:24:55,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=62940.0, ans=0.0
2024-09-16 19:25:21,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.38 vs. limit=12.0
2024-09-16 19:25:22,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=63020.0, ans=0.125
2024-09-16 19:25:31,032 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.004e+02 1.239e+02 1.413e+02 1.658e+02 2.671e+02, threshold=2.826e+02, percent-clipped=0.0
2024-09-16 19:25:41,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0
2024-09-16 19:25:41,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=63060.0, ans=0.0
2024-09-16 19:25:52,108 INFO [train.py:1198] (0/2) Epoch 4, batch 2200, loss[loss=0.305, ctc_loss=0.2352, cr_loss=0.408, attn_decoder_loss=0.3037, over 29632.00 frames. ], tot_loss[loss=0.3055, ctc_loss=0.2406, cr_loss=0.4404, attn_decoder_loss=0.3029, over 5812180.55 frames. ], batch size: 86, lr: 2.57e-02, grad_scale: 8.0
2024-09-16 19:26:01,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=63100.0, ans=0.0
2024-09-16 19:26:12,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.70 vs. limit=22.5
2024-09-16 19:26:18,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=63140.0, ans=0.0
2024-09-16 19:27:07,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=63260.0, ans=0.125
2024-09-16 19:27:09,948 INFO [train.py:1198] (0/2) Epoch 4, batch 2250, loss[loss=0.3105, ctc_loss=0.2427, cr_loss=0.4446, attn_decoder_loss=0.3082, over 29718.00 frames. ], tot_loss[loss=0.305, ctc_loss=0.2399, cr_loss=0.4401, attn_decoder_loss=0.3025, over 5810640.56 frames. ], batch size: 82, lr: 2.57e-02, grad_scale: 4.0
2024-09-16 19:27:26,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=63340.0, ans=0.09899494936611666
2024-09-16 19:27:35,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63340.0, ans=0.1
2024-09-16 19:27:42,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=63380.0, ans=0.0
2024-09-16 19:27:49,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=63380.0, ans=0.125
2024-09-16 19:27:55,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=63420.0, ans=0.0
2024-09-16 19:27:57,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63420.0, ans=0.1
2024-09-16 19:27:57,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=63420.0, ans=0.0
2024-09-16 19:28:07,579 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.265e+02 1.418e+02 1.691e+02 4.004e+02, threshold=2.836e+02, percent-clipped=3.0
2024-09-16 19:28:09,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=63420.0, ans=0.0
2024-09-16 19:28:22,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=63460.0, ans=0.025
2024-09-16 19:28:27,223 INFO [train.py:1198] (0/2) Epoch 4, batch 2300, loss[loss=0.2824, ctc_loss=0.2207, cr_loss=0.4317, attn_decoder_loss=0.2797, over 29298.00 frames. ], tot_loss[loss=0.3044, ctc_loss=0.2401, cr_loss=0.4391, attn_decoder_loss=0.3018, over 5798704.26 frames. ], batch size: 71, lr: 2.56e-02, grad_scale: 8.0
2024-09-16 19:28:30,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=63500.0, ans=0.0
2024-09-16 19:28:45,339 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 19:28:57,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=63580.0, ans=0.1
2024-09-16 19:29:05,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=63580.0, ans=0.0
2024-09-16 19:29:29,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=63660.0, ans=0.5
2024-09-16 19:29:32,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=63660.0, ans=15.0
2024-09-16 19:29:42,589 INFO [train.py:1198] (0/2) Epoch 4, batch 2350, loss[loss=0.312, ctc_loss=0.2468, cr_loss=0.4707, attn_decoder_loss=0.3088, over 29695.00 frames. ], tot_loss[loss=0.3044, ctc_loss=0.2398, cr_loss=0.4397, attn_decoder_loss=0.3018, over 5803691.53 frames. ], batch size: 83, lr: 2.56e-02, grad_scale: 4.0
2024-09-16 19:30:02,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.08 vs. limit=15.0
2024-09-16 19:30:03,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=63740.0, ans=0.125
2024-09-16 19:30:18,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0
2024-09-16 19:30:41,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0
2024-09-16 19:30:41,598 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.383e+02 1.538e+02 1.780e+02 4.486e+02, threshold=3.076e+02, percent-clipped=4.0
2024-09-16 19:30:59,759 INFO [train.py:1198] (0/2) Epoch 4, batch 2400, loss[loss=0.3021, ctc_loss=0.24, cr_loss=0.4599, attn_decoder_loss=0.2988, over 29556.00 frames. ], tot_loss[loss=0.3058, ctc_loss=0.2414, cr_loss=0.4418, attn_decoder_loss=0.3032, over 5808114.00 frames. ], batch size: 76, lr: 2.56e-02, grad_scale: 8.0
2024-09-16 19:31:35,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=63980.0, ans=0.1
2024-09-16 19:31:38,510 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-16000.pt
2024-09-16 19:31:53,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=64020.0, ans=0.2
2024-09-16 19:31:54,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=64020.0, ans=0.1
2024-09-16 19:32:11,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=64060.0, ans=0.125
2024-09-16 19:32:22,175 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 19:32:24,847 INFO [train.py:1198] (0/2) Epoch 4, batch 2450, loss[loss=0.307, ctc_loss=0.2469, cr_loss=0.4735, attn_decoder_loss=0.3032, over 29697.00 frames. ], tot_loss[loss=0.3071, ctc_loss=0.2432, cr_loss=0.4434, attn_decoder_loss=0.3043, over 5784456.98 frames. ], batch size: 82, lr: 2.55e-02, grad_scale: 4.0
2024-09-16 19:32:28,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64100.0, ans=0.1
2024-09-16 19:32:31,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.81 vs. limit=15.0
2024-09-16 19:32:47,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=64140.0, ans=0.0
2024-09-16 19:32:52,729 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.22 vs. limit=15.0
2024-09-16 19:32:58,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=64180.0, ans=0.2
2024-09-16 19:33:03,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0
2024-09-16 19:33:16,181 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-16 19:33:20,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=64220.0, ans=0.0
2024-09-16 19:33:23,219 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.817e+01 1.239e+02 1.387e+02 1.580e+02 7.191e+02, threshold=2.774e+02, percent-clipped=3.0
2024-09-16 19:33:32,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=64260.0, ans=0.0
2024-09-16 19:33:39,962 INFO [train.py:1198] (0/2) Epoch 4, batch 2500, loss[loss=0.3196, ctc_loss=0.2548, cr_loss=0.4545, attn_decoder_loss=0.3167, over 29640.00 frames. ], tot_loss[loss=0.3068, ctc_loss=0.2428, cr_loss=0.4426, attn_decoder_loss=0.3041, over 5795630.81 frames. ], batch size: 86, lr: 2.55e-02, grad_scale: 8.0
2024-09-16 19:34:21,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=64380.0, ans=0.0
2024-09-16 19:34:41,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=64460.0, ans=0.5
2024-09-16 19:34:59,443 INFO [train.py:1198] (0/2) Epoch 4, batch 2550, loss[loss=0.2785, ctc_loss=0.2146, cr_loss=0.4309, attn_decoder_loss=0.2761, over 29351.00 frames. ], tot_loss[loss=0.3065, ctc_loss=0.2419, cr_loss=0.4419, attn_decoder_loss=0.3038, over 5799193.85 frames. ], batch size: 67, lr: 2.55e-02, grad_scale: 4.0
2024-09-16 19:35:20,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=64540.0, ans=0.125
2024-09-16 19:35:39,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.39 vs. limit=15.0
2024-09-16 19:35:49,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=64620.0, ans=0.125
2024-09-16 19:35:51,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=64620.0, ans=15.0
2024-09-16 19:35:52,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=64620.0, ans=0.025
2024-09-16 19:36:00,136 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.618e+01 1.258e+02 1.410e+02 1.550e+02 4.677e+02, threshold=2.819e+02, percent-clipped=4.0
2024-09-16 19:36:04,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.65 vs. limit=15.0
2024-09-16 19:36:11,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=64660.0, ans=0.0
2024-09-16 19:36:15,328 INFO [train.py:1198] (0/2) Epoch 4, batch 2600, loss[loss=0.2949, ctc_loss=0.2242, cr_loss=0.4356, attn_decoder_loss=0.2931, over 29462.00 frames. ], tot_loss[loss=0.3071, ctc_loss=0.2424, cr_loss=0.442, attn_decoder_loss=0.3044, over 5795731.83 frames.
], batch size: 78, lr: 2.54e-02, grad_scale: 8.0 2024-09-16 19:36:17,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=64700.0, ans=0.125 2024-09-16 19:36:24,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=64700.0, ans=0.125 2024-09-16 19:36:36,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=64740.0, ans=0.125 2024-09-16 19:37:01,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=64820.0, ans=0.2 2024-09-16 19:37:07,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.29 vs. limit=22.5 2024-09-16 19:37:09,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=64820.0, ans=0.035 2024-09-16 19:37:21,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=64860.0, ans=0.04949747468305833 2024-09-16 19:37:23,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=64860.0, ans=0.125 2024-09-16 19:37:29,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=64900.0, ans=0.125 2024-09-16 19:37:30,536 INFO [train.py:1198] (0/2) Epoch 4, batch 2650, loss[loss=0.3277, ctc_loss=0.2652, cr_loss=0.4558, attn_decoder_loss=0.3245, over 29264.00 frames. ], tot_loss[loss=0.3074, ctc_loss=0.2428, cr_loss=0.4426, attn_decoder_loss=0.3047, over 5801405.70 frames. 
], batch size: 100, lr: 2.54e-02, grad_scale: 4.0 2024-09-16 19:37:48,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.72 vs. limit=15.0 2024-09-16 19:37:48,233 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=22.5 2024-09-16 19:37:51,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=64940.0, ans=0.035 2024-09-16 19:38:23,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=65020.0, ans=0.0 2024-09-16 19:38:28,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=65020.0, ans=0.125 2024-09-16 19:38:34,070 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.250e+02 1.369e+02 1.564e+02 3.210e+02, threshold=2.738e+02, percent-clipped=1.0 2024-09-16 19:38:49,706 INFO [train.py:1198] (0/2) Epoch 4, batch 2700, loss[loss=0.3223, ctc_loss=0.2522, cr_loss=0.4699, attn_decoder_loss=0.3196, over 29501.00 frames. ], tot_loss[loss=0.3069, ctc_loss=0.242, cr_loss=0.442, attn_decoder_loss=0.3042, over 5795658.74 frames. ], batch size: 87, lr: 2.54e-02, grad_scale: 8.0 2024-09-16 19:39:17,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=65140.0, ans=0.125 2024-09-16 19:39:40,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.32 vs. 
limit=15.0 2024-09-16 19:39:45,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=65220.0, ans=0.07 2024-09-16 19:39:49,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=65260.0, ans=0.2 2024-09-16 19:40:05,380 INFO [train.py:1198] (0/2) Epoch 4, batch 2750, loss[loss=0.2874, ctc_loss=0.2274, cr_loss=0.4296, attn_decoder_loss=0.2845, over 29508.00 frames. ], tot_loss[loss=0.3051, ctc_loss=0.2404, cr_loss=0.4401, attn_decoder_loss=0.3025, over 5795231.62 frames. ], batch size: 75, lr: 2.53e-02, grad_scale: 4.0 2024-09-16 19:40:16,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=65300.0, ans=0.0 2024-09-16 19:40:29,579 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:40:47,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=65380.0, ans=0.125 2024-09-16 19:40:54,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=65420.0, ans=0.125 2024-09-16 19:41:08,342 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.842e+01 1.245e+02 1.440e+02 1.752e+02 4.612e+02, threshold=2.880e+02, percent-clipped=7.0 2024-09-16 19:41:18,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.59 vs. limit=15.0 2024-09-16 19:41:20,408 INFO [train.py:1198] (0/2) Epoch 4, batch 2800, loss[loss=0.3609, ctc_loss=0.3362, cr_loss=0.5159, attn_decoder_loss=0.3522, over 19835.00 frames. ], tot_loss[loss=0.3054, ctc_loss=0.2408, cr_loss=0.4408, attn_decoder_loss=0.3028, over 5777039.89 frames. 
], batch size: 210, lr: 2.53e-02, grad_scale: 8.0 2024-09-16 19:41:28,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=65500.0, ans=0.2 2024-09-16 19:41:31,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=65500.0, ans=0.025 2024-09-16 19:41:41,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0 2024-09-16 19:42:08,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=65620.0, ans=0.125 2024-09-16 19:42:16,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=65620.0, ans=0.2 2024-09-16 19:42:24,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.25 vs. limit=15.0 2024-09-16 19:42:40,271 INFO [train.py:1198] (0/2) Epoch 4, batch 2850, loss[loss=0.3007, ctc_loss=0.2321, cr_loss=0.412, attn_decoder_loss=0.2992, over 29499.00 frames. ], tot_loss[loss=0.3062, ctc_loss=0.2416, cr_loss=0.4408, attn_decoder_loss=0.3036, over 5762769.83 frames. ], batch size: 77, lr: 2.53e-02, grad_scale: 4.0 2024-09-16 19:42:45,552 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.37 vs. 
limit=15.0 2024-09-16 19:42:51,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=65700.0, ans=0.2 2024-09-16 19:42:57,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=65740.0, ans=0.025 2024-09-16 19:43:00,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=65740.0, ans=0.125 2024-09-16 19:43:09,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=65780.0, ans=0.1 2024-09-16 19:43:32,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.24 vs. limit=15.0 2024-09-16 19:43:41,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=65860.0, ans=0.125 2024-09-16 19:43:44,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=65860.0, ans=0.2 2024-09-16 19:43:45,309 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.371e+02 1.544e+02 1.863e+02 5.214e+02, threshold=3.089e+02, percent-clipped=4.0 2024-09-16 19:43:55,810 INFO [train.py:1198] (0/2) Epoch 4, batch 2900, loss[loss=0.2962, ctc_loss=0.2267, cr_loss=0.4387, attn_decoder_loss=0.2942, over 29419.00 frames. ], tot_loss[loss=0.3067, ctc_loss=0.2412, cr_loss=0.4415, attn_decoder_loss=0.3042, over 5787646.73 frames. ], batch size: 79, lr: 2.52e-02, grad_scale: 8.0 2024-09-16 19:43:59,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.56 vs. 
limit=10.0 2024-09-16 19:44:00,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=65900.0, ans=0.125 2024-09-16 19:44:28,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=65980.0, ans=0.1 2024-09-16 19:44:35,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0 2024-09-16 19:44:46,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=66020.0, ans=15.0 2024-09-16 19:44:58,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.19 vs. limit=15.0 2024-09-16 19:45:00,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=66060.0, ans=0.025 2024-09-16 19:45:10,942 INFO [train.py:1198] (0/2) Epoch 4, batch 2950, loss[loss=0.292, ctc_loss=0.2388, cr_loss=0.4408, attn_decoder_loss=0.2882, over 29511.00 frames. ], tot_loss[loss=0.3052, ctc_loss=0.24, cr_loss=0.4407, attn_decoder_loss=0.3027, over 5782396.69 frames. 
], batch size: 75, lr: 2.52e-02, grad_scale: 4.0 2024-09-16 19:45:31,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=66140.0, ans=0.125 2024-09-16 19:46:05,050 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:46:11,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=66220.0, ans=0.125 2024-09-16 19:46:12,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=66260.0, ans=0.125 2024-09-16 19:46:18,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=66260.0, ans=0.09899494936611666 2024-09-16 19:46:19,748 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.755e+01 1.228e+02 1.356e+02 1.566e+02 3.773e+02, threshold=2.713e+02, percent-clipped=2.0 2024-09-16 19:46:30,794 INFO [train.py:1198] (0/2) Epoch 4, batch 3000, loss[loss=0.3149, ctc_loss=0.2539, cr_loss=0.4443, attn_decoder_loss=0.3117, over 29761.00 frames. ], tot_loss[loss=0.305, ctc_loss=0.2397, cr_loss=0.4403, attn_decoder_loss=0.3025, over 5782947.82 frames. ], batch size: 81, lr: 2.52e-02, grad_scale: 8.0 2024-09-16 19:46:30,795 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 19:46:49,052 INFO [train.py:1230] (0/2) Epoch 4, validation: loss=0.2264, ctc_loss=0.07857, cr_loss=4.376e-15, attn_decoder_loss=0.2428, over 944034.00 frames. 2024-09-16 19:46:49,053 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-16 19:46:51,836 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.17 vs. 
limit=15.0 2024-09-16 19:46:59,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.55 vs. limit=15.0 2024-09-16 19:47:00,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=66300.0, ans=0.025 2024-09-16 19:47:00,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.39 vs. limit=6.0 2024-09-16 19:47:27,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=66380.0, ans=0.125 2024-09-16 19:47:29,501 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:47:31,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2024-09-16 19:47:32,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66380.0, ans=0.1 2024-09-16 19:47:52,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=66460.0, ans=0.1 2024-09-16 19:47:58,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=66460.0, ans=0.125 2024-09-16 19:48:05,606 INFO [train.py:1198] (0/2) Epoch 4, batch 3050, loss[loss=0.2892, ctc_loss=0.2233, cr_loss=0.432, attn_decoder_loss=0.287, over 29552.00 frames. ], tot_loss[loss=0.3056, ctc_loss=0.24, cr_loss=0.4405, attn_decoder_loss=0.3031, over 5776345.78 frames. 
], batch size: 76, lr: 2.51e-02, grad_scale: 4.0 2024-09-16 19:48:12,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=66500.0, ans=0.125 2024-09-16 19:48:21,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=66540.0, ans=0.125 2024-09-16 19:48:48,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=66580.0, ans=0.125 2024-09-16 19:49:06,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=66660.0, ans=0.125 2024-09-16 19:49:13,371 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.917e+01 1.239e+02 1.360e+02 1.654e+02 2.744e+02, threshold=2.720e+02, percent-clipped=1.0 2024-09-16 19:49:15,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=66660.0, ans=0.0 2024-09-16 19:49:20,805 INFO [train.py:1198] (0/2) Epoch 4, batch 3100, loss[loss=0.3314, ctc_loss=0.266, cr_loss=0.4964, attn_decoder_loss=0.3276, over 29312.00 frames. ], tot_loss[loss=0.3053, ctc_loss=0.2397, cr_loss=0.4412, attn_decoder_loss=0.3027, over 5776757.75 frames. ], batch size: 100, lr: 2.51e-02, grad_scale: 8.0 2024-09-16 19:49:44,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66740.0, ans=0.1 2024-09-16 19:49:48,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=66740.0, ans=0.0 2024-09-16 19:49:53,373 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.41 vs. 
limit=12.0 2024-09-16 19:50:02,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.35 vs. limit=10.0 2024-09-16 19:50:19,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=66820.0, ans=15.0 2024-09-16 19:50:32,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=66860.0, ans=0.025 2024-09-16 19:50:37,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=66860.0, ans=0.125 2024-09-16 19:50:40,199 INFO [train.py:1198] (0/2) Epoch 4, batch 3150, loss[loss=0.3301, ctc_loss=0.2623, cr_loss=0.4783, attn_decoder_loss=0.3271, over 28786.00 frames. ], tot_loss[loss=0.3052, ctc_loss=0.2395, cr_loss=0.4411, attn_decoder_loss=0.3027, over 5784244.59 frames. ], batch size: 104, lr: 2.51e-02, grad_scale: 4.0 2024-09-16 19:50:43,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=66900.0, ans=0.1 2024-09-16 19:50:57,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=66940.0, ans=0.0 2024-09-16 19:51:01,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=66940.0, ans=0.025 2024-09-16 19:51:01,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=66940.0, ans=0.125 2024-09-16 19:51:13,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=66980.0, ans=0.0 2024-09-16 19:51:25,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, 
batch_count=67020.0, ans=0.07 2024-09-16 19:51:49,435 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.339e+01 1.205e+02 1.438e+02 1.646e+02 4.024e+02, threshold=2.876e+02, percent-clipped=3.0 2024-09-16 19:51:50,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.57 vs. limit=15.0 2024-09-16 19:51:55,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2024-09-16 19:51:55,485 INFO [train.py:1198] (0/2) Epoch 4, batch 3200, loss[loss=0.3059, ctc_loss=0.2364, cr_loss=0.4557, attn_decoder_loss=0.3035, over 29411.00 frames. ], tot_loss[loss=0.3043, ctc_loss=0.2385, cr_loss=0.4406, attn_decoder_loss=0.3018, over 5793365.39 frames. ], batch size: 79, lr: 2.51e-02, grad_scale: 8.0 2024-09-16 19:52:01,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=67100.0, ans=0.125 2024-09-16 19:52:01,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=67100.0, ans=0.05 2024-09-16 19:52:23,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=67140.0, ans=0.04949747468305833 2024-09-16 19:52:24,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=67180.0, ans=0.125 2024-09-16 19:52:49,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=67220.0, ans=0.5 2024-09-16 19:53:11,544 INFO [train.py:1198] (0/2) Epoch 4, batch 3250, loss[loss=0.3028, ctc_loss=0.2379, cr_loss=0.4428, attn_decoder_loss=0.3002, over 29699.00 frames. 
], tot_loss[loss=0.3042, ctc_loss=0.2382, cr_loss=0.44, attn_decoder_loss=0.3018, over 5800455.14 frames. ], batch size: 84, lr: 2.50e-02, grad_scale: 4.0 2024-09-16 19:53:14,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=67300.0, ans=0.0 2024-09-16 19:53:36,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=67340.0, ans=0.125 2024-09-16 19:54:15,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=67460.0, ans=0.0 2024-09-16 19:54:22,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.16 vs. limit=15.0 2024-09-16 19:54:26,052 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.284e+02 1.425e+02 1.663e+02 2.668e+02, threshold=2.850e+02, percent-clipped=0.0 2024-09-16 19:54:29,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=67500.0, ans=0.125 2024-09-16 19:54:30,779 INFO [train.py:1198] (0/2) Epoch 4, batch 3300, loss[loss=0.3368, ctc_loss=0.2708, cr_loss=0.4808, attn_decoder_loss=0.3334, over 28153.00 frames. ], tot_loss[loss=0.3028, ctc_loss=0.237, cr_loss=0.4383, attn_decoder_loss=0.3004, over 5796478.89 frames. 
], batch size: 111, lr: 2.50e-02, grad_scale: 8.0 2024-09-16 19:54:40,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=67500.0, ans=10.0 2024-09-16 19:55:01,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=67580.0, ans=0.1 2024-09-16 19:55:11,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2024-09-16 19:55:27,405 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.29 vs. limit=15.0 2024-09-16 19:55:46,073 INFO [train.py:1198] (0/2) Epoch 4, batch 3350, loss[loss=0.3149, ctc_loss=0.2442, cr_loss=0.421, attn_decoder_loss=0.3134, over 28928.00 frames. ], tot_loss[loss=0.3038, ctc_loss=0.238, cr_loss=0.4389, attn_decoder_loss=0.3013, over 5772632.21 frames. ], batch size: 104, lr: 2.50e-02, grad_scale: 4.0 2024-09-16 19:55:49,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=67700.0, ans=0.2 2024-09-16 19:55:59,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.74 vs. 
limit=15.0 2024-09-16 19:56:10,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67740.0, ans=0.1 2024-09-16 19:56:10,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=67740.0, ans=0.0 2024-09-16 19:56:13,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=67740.0, ans=0.125 2024-09-16 19:56:16,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=67780.0, ans=0.0 2024-09-16 19:56:17,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=67780.0, ans=10.0 2024-09-16 19:56:22,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=67780.0, ans=0.0 2024-09-16 19:56:42,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=67820.0, ans=0.0 2024-09-16 19:56:45,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=67860.0, ans=0.125 2024-09-16 19:56:54,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=67860.0, ans=0.125 2024-09-16 19:56:55,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.73 vs. 
limit=15.0 2024-09-16 19:56:58,641 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.005e+02 1.186e+02 1.341e+02 1.622e+02 4.699e+02, threshold=2.682e+02, percent-clipped=3.0 2024-09-16 19:57:01,673 INFO [train.py:1198] (0/2) Epoch 4, batch 3400, loss[loss=0.2758, ctc_loss=0.2176, cr_loss=0.3942, attn_decoder_loss=0.2735, over 29325.00 frames. ], tot_loss[loss=0.3037, ctc_loss=0.2383, cr_loss=0.4391, attn_decoder_loss=0.3012, over 5765698.17 frames. ], batch size: 67, lr: 2.49e-02, grad_scale: 8.0 2024-09-16 19:57:20,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=12.0 2024-09-16 19:58:08,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=68060.0, ans=0.0 2024-09-16 19:58:21,365 INFO [train.py:1198] (0/2) Epoch 4, batch 3450, loss[loss=0.3142, ctc_loss=0.2439, cr_loss=0.4488, attn_decoder_loss=0.312, over 28285.00 frames. ], tot_loss[loss=0.3036, ctc_loss=0.2378, cr_loss=0.4396, attn_decoder_loss=0.3011, over 5774405.61 frames. ], batch size: 111, lr: 2.49e-02, grad_scale: 4.0 2024-09-16 19:58:32,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=68100.0, ans=0.07 2024-09-16 19:58:35,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.64 vs. 
limit=22.5 2024-09-16 19:58:45,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=68140.0, ans=0.05 2024-09-16 19:59:32,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=68260.0, ans=0.125 2024-09-16 19:59:34,953 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.730e+01 1.161e+02 1.261e+02 1.469e+02 4.535e+02, threshold=2.521e+02, percent-clipped=3.0 2024-09-16 19:59:36,488 INFO [train.py:1198] (0/2) Epoch 4, batch 3500, loss[loss=0.264, ctc_loss=0.1955, cr_loss=0.3531, attn_decoder_loss=0.2638, over 29730.00 frames. ], tot_loss[loss=0.3024, ctc_loss=0.2366, cr_loss=0.4379, attn_decoder_loss=0.2999, over 5777282.41 frames. ], batch size: 72, lr: 2.49e-02, grad_scale: 8.0 2024-09-16 20:00:19,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=68420.0, ans=0.025 2024-09-16 20:00:24,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=68420.0, ans=0.0 2024-09-16 20:00:33,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. limit=15.0 2024-09-16 20:00:44,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=68460.0, ans=12.0 2024-09-16 20:00:50,568 INFO [train.py:1198] (0/2) Epoch 4, batch 3550, loss[loss=0.3144, ctc_loss=0.2413, cr_loss=0.4716, attn_decoder_loss=0.3121, over 29711.00 frames. ], tot_loss[loss=0.3023, ctc_loss=0.2364, cr_loss=0.4378, attn_decoder_loss=0.2999, over 5782695.47 frames. 
], batch size: 89, lr: 2.48e-02, grad_scale: 4.0 2024-09-16 20:01:05,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=68540.0, ans=0.1 2024-09-16 20:01:06,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0 2024-09-16 20:01:13,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-09-16 20:01:34,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5 2024-09-16 20:01:34,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=15.0 2024-09-16 20:01:42,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=68620.0, ans=0.125 2024-09-16 20:01:51,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=68660.0, ans=0.2 2024-09-16 20:01:52,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=68660.0, ans=0.0 2024-09-16 20:02:00,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68660.0, ans=0.1 2024-09-16 20:02:01,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=68660.0, ans=0.07 2024-09-16 20:02:01,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=68660.0, ans=0.2 2024-09-16 20:02:03,073 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=68700.0, ans=0.02 2024-09-16 20:02:04,418 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.002e+02 1.275e+02 1.372e+02 1.558e+02 6.376e+02, threshold=2.743e+02, percent-clipped=5.0 2024-09-16 20:02:04,445 INFO [train.py:1198] (0/2) Epoch 4, batch 3600, loss[loss=0.2921, ctc_loss=0.2195, cr_loss=0.4281, attn_decoder_loss=0.2906, over 29511.00 frames. ], tot_loss[loss=0.3026, ctc_loss=0.2364, cr_loss=0.4389, attn_decoder_loss=0.3002, over 5792285.99 frames. ], batch size: 77, lr: 2.48e-02, grad_scale: 8.0 2024-09-16 20:02:12,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0 2024-09-16 20:02:43,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=68780.0, ans=0.125 2024-09-16 20:02:56,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=68820.0, ans=0.125 2024-09-16 20:02:56,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=68820.0, ans=0.025 2024-09-16 20:02:59,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.60 vs. limit=22.5 2024-09-16 20:03:04,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=68820.0, ans=0.125 2024-09-16 20:03:17,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=68860.0, ans=0.125 2024-09-16 20:03:22,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.43 vs. 
limit=15.0 2024-09-16 20:03:23,378 INFO [train.py:1198] (0/2) Epoch 4, batch 3650, loss[loss=0.3255, ctc_loss=0.2536, cr_loss=0.4594, attn_decoder_loss=0.3233, over 29512.00 frames. ], tot_loss[loss=0.3019, ctc_loss=0.2355, cr_loss=0.438, attn_decoder_loss=0.2996, over 5794387.69 frames. ], batch size: 90, lr: 2.48e-02, grad_scale: 4.0 2024-09-16 20:03:26,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=68900.0, ans=0.2 2024-09-16 20:04:08,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=69020.0, ans=0.125 2024-09-16 20:04:14,751 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.00 vs. limit=15.0 2024-09-16 20:04:24,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=69060.0, ans=0.2 2024-09-16 20:04:25,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=69060.0, ans=0.125 2024-09-16 20:04:36,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=69100.0, ans=0.2 2024-09-16 20:04:37,636 INFO [train.py:1198] (0/2) Epoch 4, batch 3700, loss[loss=0.3252, ctc_loss=0.2521, cr_loss=0.4647, attn_decoder_loss=0.3229, over 29701.00 frames. ], tot_loss[loss=0.3022, ctc_loss=0.2361, cr_loss=0.4387, attn_decoder_loss=0.2998, over 5804578.68 frames. 
], batch size: 84, lr: 2.47e-02, grad_scale: 8.0 2024-09-16 20:04:37,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=69100.0, ans=0.125 2024-09-16 20:04:39,094 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.031e+02 1.266e+02 1.378e+02 1.578e+02 2.388e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-16 20:04:51,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=69140.0, ans=0.125 2024-09-16 20:05:05,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=69180.0, ans=0.0 2024-09-16 20:05:28,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=69220.0, ans=0.125 2024-09-16 20:05:35,788 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=12.0 2024-09-16 20:05:42,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=69260.0, ans=0.0 2024-09-16 20:05:44,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=69260.0, ans=0.125 2024-09-16 20:05:47,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=69260.0, ans=0.1 2024-09-16 20:05:48,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69260.0, ans=0.1 2024-09-16 20:05:50,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=69300.0, ans=0.07 2024-09-16 20:05:51,403 INFO [train.py:1198] (0/2) Epoch 4, batch 3750, loss[loss=0.2733, ctc_loss=0.2143, cr_loss=0.4082, attn_decoder_loss=0.2708, over 
29351.00 frames. ], tot_loss[loss=0.3019, ctc_loss=0.2356, cr_loss=0.4382, attn_decoder_loss=0.2995, over 5808907.57 frames. ], batch size: 67, lr: 2.47e-02, grad_scale: 4.0 2024-09-16 20:05:59,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.51 vs. limit=22.5 2024-09-16 20:06:02,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=69300.0, ans=0.95 2024-09-16 20:06:15,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=69340.0, ans=0.125 2024-09-16 20:06:16,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=69340.0, ans=0.125 2024-09-16 20:06:20,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.05 vs. 
limit=22.5 2024-09-16 20:06:30,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=69380.0, ans=0.2 2024-09-16 20:06:36,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=69420.0, ans=0.0 2024-09-16 20:06:49,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=69460.0, ans=0.0 2024-09-16 20:06:54,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=69460.0, ans=0.125 2024-09-16 20:06:55,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=69460.0, ans=0.125 2024-09-16 20:07:04,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=69500.0, ans=0.0 2024-09-16 20:07:05,694 INFO [train.py:1198] (0/2) Epoch 4, batch 3800, loss[loss=0.3235, ctc_loss=0.2534, cr_loss=0.4673, attn_decoder_loss=0.3209, over 29631.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.2357, cr_loss=0.4378, attn_decoder_loss=0.2993, over 5799624.18 frames. ], batch size: 86, lr: 2.47e-02, grad_scale: 8.0 2024-09-16 20:07:08,684 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.061e+02 1.301e+02 1.423e+02 1.744e+02 6.965e+02, threshold=2.846e+02, percent-clipped=5.0 2024-09-16 20:07:10,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=69500.0, ans=0.025 2024-09-16 20:07:14,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.37 vs. 
limit=12.0 2024-09-16 20:07:22,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=69540.0, ans=0.0 2024-09-16 20:07:23,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=12.0 2024-09-16 20:08:14,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=69660.0, ans=0.2 2024-09-16 20:08:20,414 INFO [train.py:1198] (0/2) Epoch 4, batch 3850, loss[loss=0.31, ctc_loss=0.2409, cr_loss=0.4424, attn_decoder_loss=0.3078, over 29249.00 frames. ], tot_loss[loss=0.3015, ctc_loss=0.2352, cr_loss=0.4377, attn_decoder_loss=0.2991, over 5813150.67 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 4.0 2024-09-16 20:08:41,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69740.0, ans=0.1 2024-09-16 20:08:50,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=69740.0, ans=0.125 2024-09-16 20:09:13,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2024-09-16 20:09:27,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=69860.0, ans=0.125 2024-09-16 20:09:32,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.92 vs. limit=15.0 2024-09-16 20:09:37,512 INFO [train.py:1198] (0/2) Epoch 4, batch 3900, loss[loss=0.3073, ctc_loss=0.2323, cr_loss=0.4315, attn_decoder_loss=0.3061, over 29626.00 frames. ], tot_loss[loss=0.3019, ctc_loss=0.2352, cr_loss=0.439, attn_decoder_loss=0.2995, over 5817436.61 frames. 
], batch size: 86, lr: 2.46e-02, grad_scale: 8.0 2024-09-16 20:09:41,920 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.044e+02 1.221e+02 1.343e+02 1.520e+02 2.719e+02, threshold=2.686e+02, percent-clipped=0.0 2024-09-16 20:09:52,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=69940.0, ans=0.1 2024-09-16 20:09:55,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=69940.0, ans=0.125 2024-09-16 20:10:06,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.86 vs. limit=22.5 2024-09-16 20:10:07,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=69980.0, ans=0.125 2024-09-16 20:10:13,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69980.0, ans=0.125 2024-09-16 20:10:18,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=69980.0, ans=0.125 2024-09-16 20:10:23,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70020.0, ans=0.1 2024-09-16 20:10:51,473 INFO [train.py:1198] (0/2) Epoch 4, batch 3950, loss[loss=0.3095, ctc_loss=0.2454, cr_loss=0.4451, attn_decoder_loss=0.3068, over 29495.00 frames. ], tot_loss[loss=0.3021, ctc_loss=0.2349, cr_loss=0.439, attn_decoder_loss=0.2998, over 5836725.79 frames. 
], batch size: 97, lr: 2.46e-02, grad_scale: 4.0 2024-09-16 20:11:15,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=70140.0, ans=0.0 2024-09-16 20:12:05,013 INFO [train.py:1198] (0/2) Epoch 4, batch 4000, loss[loss=0.2775, ctc_loss=0.2083, cr_loss=0.391, attn_decoder_loss=0.2765, over 29519.00 frames. ], tot_loss[loss=0.3024, ctc_loss=0.2356, cr_loss=0.4396, attn_decoder_loss=0.3, over 5813578.41 frames. ], batch size: 74, lr: 2.46e-02, grad_scale: 8.0 2024-09-16 20:12:12,304 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.309e+02 1.435e+02 1.653e+02 3.484e+02, threshold=2.870e+02, percent-clipped=1.0 2024-09-16 20:12:24,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=70340.0, ans=0.125 2024-09-16 20:12:38,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.17 vs. limit=22.5 2024-09-16 20:12:39,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=70380.0, ans=0.2 2024-09-16 20:12:51,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=70420.0, ans=0.125 2024-09-16 20:13:20,981 INFO [train.py:1198] (0/2) Epoch 4, batch 4050, loss[loss=0.3387, ctc_loss=0.2978, cr_loss=0.4322, attn_decoder_loss=0.3336, over 20236.00 frames. ], tot_loss[loss=0.3024, ctc_loss=0.2358, cr_loss=0.4395, attn_decoder_loss=0.3, over 5797307.39 frames. 
], batch size: 210, lr: 2.45e-02, grad_scale: 4.0 2024-09-16 20:13:22,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=70500.0, ans=0.025 2024-09-16 20:13:35,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=70540.0, ans=0.125 2024-09-16 20:13:47,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2024-09-16 20:13:57,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2024-09-16 20:13:59,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=70580.0, ans=0.125 2024-09-16 20:14:02,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=70580.0, ans=0.125 2024-09-16 20:14:16,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=70620.0, ans=0.125 2024-09-16 20:14:35,708 INFO [train.py:1198] (0/2) Epoch 4, batch 4100, loss[loss=0.3259, ctc_loss=0.2577, cr_loss=0.4745, attn_decoder_loss=0.3229, over 29523.00 frames. ], tot_loss[loss=0.3021, ctc_loss=0.2355, cr_loss=0.4387, attn_decoder_loss=0.2997, over 5793031.28 frames. 
], batch size: 90, lr: 2.45e-02, grad_scale: 8.0 2024-09-16 20:14:40,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=70700.0, ans=0.125 2024-09-16 20:14:42,854 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.753e+01 1.273e+02 1.617e+02 1.999e+02 3.514e+02, threshold=3.235e+02, percent-clipped=2.0 2024-09-16 20:14:45,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=70700.0, ans=0.125 2024-09-16 20:14:53,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=70740.0, ans=0.125 2024-09-16 20:14:56,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=70740.0, ans=0.1 2024-09-16 20:14:56,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=70740.0, ans=0.0 2024-09-16 20:14:59,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=70740.0, ans=0.2 2024-09-16 20:14:59,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=70740.0, ans=0.125 2024-09-16 20:15:18,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. 
limit=15.0 2024-09-16 20:15:24,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=70820.0, ans=0.2 2024-09-16 20:15:32,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=70860.0, ans=0.125 2024-09-16 20:15:46,726 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.29 vs. limit=22.5 2024-09-16 20:15:47,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=70900.0, ans=0.0 2024-09-16 20:15:48,991 INFO [train.py:1198] (0/2) Epoch 4, batch 4150, loss[loss=0.2917, ctc_loss=0.2223, cr_loss=0.4434, attn_decoder_loss=0.2896, over 29498.00 frames. ], tot_loss[loss=0.3015, ctc_loss=0.2349, cr_loss=0.4387, attn_decoder_loss=0.2991, over 5798794.48 frames. ], batch size: 77, lr: 2.45e-02, grad_scale: 4.0 2024-09-16 20:16:17,023 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:16:22,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=70980.0, ans=0.0 2024-09-16 20:16:23,208 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=15.0 2024-09-16 20:17:02,282 INFO [train.py:1198] (0/2) Epoch 4, batch 4200, loss[loss=0.3331, ctc_loss=0.2659, cr_loss=0.5061, attn_decoder_loss=0.3293, over 29475.00 frames. ], tot_loss[loss=0.3022, ctc_loss=0.2356, cr_loss=0.4397, attn_decoder_loss=0.2999, over 5800202.84 frames. 
], batch size: 90, lr: 2.44e-02, grad_scale: 8.0 2024-09-16 20:17:12,670 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.228e+02 1.369e+02 1.579e+02 3.524e+02, threshold=2.737e+02, percent-clipped=1.0 2024-09-16 20:17:14,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=71100.0, ans=0.125 2024-09-16 20:17:37,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=71180.0, ans=0.125 2024-09-16 20:17:37,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=71180.0, ans=0.025 2024-09-16 20:17:44,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71180.0, ans=0.1 2024-09-16 20:17:52,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.47 vs. limit=10.0 2024-09-16 20:18:05,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=71260.0, ans=0.125 2024-09-16 20:18:14,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71260.0, ans=0.1 2024-09-16 20:18:17,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=71300.0, ans=0.125 2024-09-16 20:18:18,324 INFO [train.py:1198] (0/2) Epoch 4, batch 4250, loss[loss=0.2736, ctc_loss=0.2027, cr_loss=0.4045, attn_decoder_loss=0.2724, over 29476.00 frames. ], tot_loss[loss=0.3024, ctc_loss=0.2355, cr_loss=0.4403, attn_decoder_loss=0.3001, over 5806161.72 frames. 
], batch size: 74, lr: 2.44e-02, grad_scale: 4.0 2024-09-16 20:18:34,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=71340.0, ans=0.125 2024-09-16 20:18:39,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=12.0 2024-09-16 20:18:40,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=71340.0, ans=0.125 2024-09-16 20:18:44,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=71340.0, ans=0.0 2024-09-16 20:18:52,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=71380.0, ans=0.125 2024-09-16 20:19:01,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=71420.0, ans=0.0 2024-09-16 20:19:06,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2024-09-16 20:19:15,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=71460.0, ans=0.125 2024-09-16 20:19:17,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=71460.0, ans=0.125 2024-09-16 20:19:31,897 INFO [train.py:1198] (0/2) Epoch 4, batch 4300, loss[loss=0.3072, ctc_loss=0.2285, cr_loss=0.4363, attn_decoder_loss=0.3062, over 29522.00 frames. ], tot_loss[loss=0.3025, ctc_loss=0.2353, cr_loss=0.4402, attn_decoder_loss=0.3002, over 5795291.55 frames. 
], batch size: 87, lr: 2.44e-02, grad_scale: 8.0 2024-09-16 20:19:43,711 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.834e+01 1.271e+02 1.418e+02 1.620e+02 3.004e+02, threshold=2.836e+02, percent-clipped=2.0 2024-09-16 20:20:10,768 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:20:15,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=71620.0, ans=0.04949747468305833 2024-09-16 20:20:47,050 INFO [train.py:1198] (0/2) Epoch 4, batch 4350, loss[loss=0.319, ctc_loss=0.2552, cr_loss=0.4449, attn_decoder_loss=0.3162, over 29504.00 frames. ], tot_loss[loss=0.3061, ctc_loss=0.2386, cr_loss=0.4449, attn_decoder_loss=0.3037, over 5796999.32 frames. ], batch size: 97, lr: 2.44e-02, grad_scale: 4.0 2024-09-16 20:21:02,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=71740.0, ans=0.125 2024-09-16 20:21:10,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=71740.0, ans=0.125 2024-09-16 20:21:18,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=71780.0, ans=0.1 2024-09-16 20:21:43,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71820.0, ans=0.1 2024-09-16 20:21:49,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=71860.0, ans=0.07 2024-09-16 20:22:00,933 INFO [train.py:1198] (0/2) Epoch 4, batch 4400, loss[loss=0.3134, ctc_loss=0.2468, cr_loss=0.4459, attn_decoder_loss=0.3108, over 27052.00 frames. ], tot_loss[loss=0.3094, ctc_loss=0.2423, cr_loss=0.4487, attn_decoder_loss=0.3069, over 5768442.04 frames. 
], batch size: 124, lr: 2.43e-02, grad_scale: 8.0 2024-09-16 20:22:04,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=71900.0, ans=0.0 2024-09-16 20:22:14,013 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.841e+01 1.227e+02 1.349e+02 1.608e+02 3.095e+02, threshold=2.698e+02, percent-clipped=2.0 2024-09-16 20:22:24,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=71940.0, ans=0.125 2024-09-16 20:22:24,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=71940.0, ans=0.125 2024-09-16 20:22:32,574 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=18.21 vs. limit=15.0 2024-09-16 20:22:51,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=72020.0, ans=0.2 2024-09-16 20:22:53,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=72020.0, ans=0.0 2024-09-16 20:22:59,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=72060.0, ans=0.125 2024-09-16 20:23:03,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=72060.0, ans=0.5 2024-09-16 20:23:16,018 INFO [train.py:1198] (0/2) Epoch 4, batch 4450, loss[loss=0.3401, ctc_loss=0.2984, cr_loss=0.4585, attn_decoder_loss=0.3346, over 20209.00 frames. ], tot_loss[loss=0.3137, ctc_loss=0.2495, cr_loss=0.4516, attn_decoder_loss=0.3108, over 5582465.30 frames. 
], batch size: 211, lr: 2.43e-02, grad_scale: 4.0 2024-09-16 20:23:21,504 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.38 vs. limit=15.0 2024-09-16 20:23:30,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=72140.0, ans=0.2 2024-09-16 20:23:37,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=72140.0, ans=0.125 2024-09-16 20:24:11,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=72220.0, ans=0.2 2024-09-16 20:24:23,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=72260.0, ans=0.125 2024-09-16 20:24:23,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=72260.0, ans=0.125 2024-09-16 20:24:31,033 INFO [train.py:1198] (0/2) Epoch 4, batch 4500, loss[loss=0.3459, ctc_loss=0.311, cr_loss=0.4643, attn_decoder_loss=0.3395, over 19924.00 frames. ], tot_loss[loss=0.3181, ctc_loss=0.2586, cr_loss=0.4523, attn_decoder_loss=0.3147, over 5240234.20 frames. ], batch size: 209, lr: 2.43e-02, grad_scale: 8.0 2024-09-16 20:24:31,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.60 vs. 
limit=10.0 2024-09-16 20:24:46,110 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.245e+02 1.357e+02 1.541e+02 2.817e+02, threshold=2.714e+02, percent-clipped=1.0 2024-09-16 20:25:07,878 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-4.pt 2024-09-16 20:26:05,472 INFO [train.py:1198] (0/2) Epoch 5, batch 0, loss[loss=0.36, ctc_loss=0.2251, cr_loss=0.4235, attn_decoder_loss=0.3656, over 29603.00 frames. ], tot_loss[loss=0.36, ctc_loss=0.2251, cr_loss=0.4235, attn_decoder_loss=0.3656, over 29603.00 frames. ], batch size: 73, lr: 2.26e-02, grad_scale: 4.0 2024-09-16 20:26:05,473 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 20:26:24,507 INFO [train.py:1230] (0/2) Epoch 5, validation: loss=0.2407, ctc_loss=0.07934, cr_loss=4.486e-15, attn_decoder_loss=0.2587, over 944034.00 frames. 2024-09-16 20:26:24,507 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-16 20:26:25,963 WARNING [optim.py:503] (0/2) Scaling gradients by 0.06828752905130386, model_norm_threshold=271.39923095703125 2024-09-16 20:26:26,172 WARNING [optim.py:575] (0/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.embed.weight with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.372e+06, grad_sumsq=1.717e+06, orig_rms_sq=2.546e+00 2024-09-16 20:26:30,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=72400.0, ans=0.0 2024-09-16 20:26:33,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.58 vs. 
limit=12.0 2024-09-16 20:26:38,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=72440.0, ans=0.125 2024-09-16 20:26:50,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=72440.0, ans=0.05 2024-09-16 20:26:51,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=72440.0, ans=0.2 2024-09-16 20:27:33,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2024-09-16 20:27:40,505 INFO [train.py:1198] (0/2) Epoch 5, batch 50, loss[loss=0.2756, ctc_loss=0.2171, cr_loss=0.3776, attn_decoder_loss=0.2737, over 29389.00 frames. ], tot_loss[loss=0.3114, ctc_loss=0.2444, cr_loss=0.4431, attn_decoder_loss=0.309, over 1267501.49 frames. ], batch size: 70, lr: 2.26e-02, grad_scale: 4.0 2024-09-16 20:27:44,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.33 vs. limit=15.0 2024-09-16 20:27:46,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=72600.0, ans=0.125 2024-09-16 20:28:00,176 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. 
limit=6.0 2024-09-16 20:28:14,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=72680.0, ans=0.125 2024-09-16 20:28:19,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=72680.0, ans=0.125 2024-09-16 20:28:26,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=72720.0, ans=0.125 2024-09-16 20:28:37,067 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.048e+02 1.241e+02 1.473e+02 1.722e+02 3.974e+03, threshold=2.946e+02, percent-clipped=9.0 2024-09-16 20:28:38,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=72720.0, ans=0.125 2024-09-16 20:28:58,586 INFO [train.py:1198] (0/2) Epoch 5, batch 100, loss[loss=0.2852, ctc_loss=0.2172, cr_loss=0.4136, attn_decoder_loss=0.2836, over 29548.00 frames. ], tot_loss[loss=0.3094, ctc_loss=0.242, cr_loss=0.445, attn_decoder_loss=0.307, over 2251938.46 frames. ], batch size: 76, lr: 2.25e-02, grad_scale: 8.0 2024-09-16 20:29:07,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=72800.0, ans=0.125 2024-09-16 20:29:09,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=72800.0, ans=0.0 2024-09-16 20:29:22,984 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2024-09-16 20:29:27,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.45 vs. 
limit=12.0 2024-09-16 20:29:28,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=72880.0, ans=0.0 2024-09-16 20:29:35,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=22.5 2024-09-16 20:29:49,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=72920.0, ans=0.2 2024-09-16 20:30:01,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=72960.0, ans=0.1 2024-09-16 20:30:14,763 INFO [train.py:1198] (0/2) Epoch 5, batch 150, loss[loss=0.2597, ctc_loss=0.1866, cr_loss=0.39, attn_decoder_loss=0.2592, over 29440.00 frames. ], tot_loss[loss=0.3033, ctc_loss=0.2352, cr_loss=0.439, attn_decoder_loss=0.3011, over 3046597.02 frames. ], batch size: 70, lr: 2.25e-02, grad_scale: 4.0 2024-09-16 20:30:15,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=73000.0, ans=0.2 2024-09-16 20:30:25,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=73000.0, ans=0.2 2024-09-16 20:30:50,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs. 
limit=6.0 2024-09-16 20:31:01,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=73120.0, ans=0.2 2024-09-16 20:31:09,879 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.201e+01 1.170e+02 1.302e+02 1.516e+02 3.725e+02, threshold=2.604e+02, percent-clipped=3.0 2024-09-16 20:31:13,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=73160.0, ans=0.125 2024-09-16 20:31:26,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=73160.0, ans=0.125 2024-09-16 20:31:29,486 INFO [train.py:1198] (0/2) Epoch 5, batch 200, loss[loss=0.3297, ctc_loss=0.2631, cr_loss=0.4635, attn_decoder_loss=0.3268, over 27661.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.2337, cr_loss=0.4383, attn_decoder_loss=0.2995, over 3658303.73 frames. ], batch size: 125, lr: 2.25e-02, grad_scale: 8.0 2024-09-16 20:32:20,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=73320.0, ans=0.025 2024-09-16 20:32:24,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=73320.0, ans=0.125 2024-09-16 20:32:29,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73320.0, ans=0.1 2024-09-16 20:32:38,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=73360.0, ans=0.125 2024-09-16 20:32:45,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=73400.0, ans=0.0 2024-09-16 20:32:46,884 INFO [train.py:1198] (0/2) Epoch 5, batch 250, loss[loss=0.3133, ctc_loss=0.2491, cr_loss=0.4666, 
attn_decoder_loss=0.3101, over 29182.00 frames. ], tot_loss[loss=0.3009, ctc_loss=0.2326, cr_loss=0.4386, attn_decoder_loss=0.2987, over 4141098.65 frames. ], batch size: 100, lr: 2.25e-02, grad_scale: 4.0 2024-09-16 20:33:16,519 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. limit=10.0 2024-09-16 20:33:18,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=73480.0, ans=0.125 2024-09-16 20:33:44,206 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.445e+01 1.169e+02 1.336e+02 1.491e+02 2.357e+02, threshold=2.672e+02, percent-clipped=0.0 2024-09-16 20:34:04,727 INFO [train.py:1198] (0/2) Epoch 5, batch 300, loss[loss=0.3048, ctc_loss=0.225, cr_loss=0.4492, attn_decoder_loss=0.3037, over 29511.00 frames. ], tot_loss[loss=0.2994, ctc_loss=0.2305, cr_loss=0.4359, attn_decoder_loss=0.2973, over 4509059.38 frames. ], batch size: 92, lr: 2.24e-02, grad_scale: 8.0 2024-09-16 20:34:24,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=73640.0, ans=0.125 2024-09-16 20:34:26,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=73640.0, ans=0.0 2024-09-16 20:34:29,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=73640.0, ans=0.0 2024-09-16 20:34:35,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=73680.0, ans=0.125 2024-09-16 20:34:46,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.25 vs. 
limit=22.5 2024-09-16 20:34:56,396 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.25 vs. limit=15.0 2024-09-16 20:35:19,695 INFO [train.py:1198] (0/2) Epoch 5, batch 350, loss[loss=0.274, ctc_loss=0.2039, cr_loss=0.3937, attn_decoder_loss=0.273, over 29303.00 frames. ], tot_loss[loss=0.2993, ctc_loss=0.23, cr_loss=0.4368, attn_decoder_loss=0.2973, over 4793841.99 frames. ], batch size: 71, lr: 2.24e-02, grad_scale: 4.0 2024-09-16 20:35:20,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=73800.0, ans=0.2 2024-09-16 20:35:27,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=73800.0, ans=0.125 2024-09-16 20:35:43,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73840.0, ans=0.1 2024-09-16 20:35:52,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=73880.0, ans=0.1 2024-09-16 20:35:59,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73880.0, ans=0.1 2024-09-16 20:36:01,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=73880.0, ans=0.0 2024-09-16 20:36:20,664 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.525e+01 1.174e+02 1.354e+02 1.521e+02 2.144e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-16 20:36:30,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73960.0, ans=0.1 2024-09-16 20:36:37,087 INFO [train.py:1198] (0/2) Epoch 5, batch 400, loss[loss=0.3037, ctc_loss=0.2353, 
cr_loss=0.4362, attn_decoder_loss=0.3016, over 29725.00 frames. ], tot_loss[loss=0.2987, ctc_loss=0.2294, cr_loss=0.4355, attn_decoder_loss=0.2967, over 5024252.52 frames. ], batch size: 82, lr: 2.24e-02, grad_scale: 8.0 2024-09-16 20:37:15,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=74080.0, ans=0.125 2024-09-16 20:37:22,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=74120.0, ans=0.125 2024-09-16 20:37:23,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=74120.0, ans=0.025 2024-09-16 20:37:40,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.36 vs. limit=15.0 2024-09-16 20:37:55,367 INFO [train.py:1198] (0/2) Epoch 5, batch 450, loss[loss=0.2979, ctc_loss=0.2177, cr_loss=0.439, attn_decoder_loss=0.297, over 29701.00 frames. ], tot_loss[loss=0.2984, ctc_loss=0.2291, cr_loss=0.4351, attn_decoder_loss=0.2964, over 5188388.82 frames. ], batch size: 83, lr: 2.24e-02, grad_scale: 4.0 2024-09-16 20:37:57,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=74200.0, ans=0.0 2024-09-16 20:38:09,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=74240.0, ans=0.125 2024-09-16 20:38:36,070 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2024-09-16 20:38:39,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.86 vs. 
limit=15.0 2024-09-16 20:38:56,441 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.507e+01 1.148e+02 1.317e+02 1.480e+02 2.097e+02, threshold=2.634e+02, percent-clipped=0.0 2024-09-16 20:38:56,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74360.0, ans=0.1 2024-09-16 20:39:04,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=74360.0, ans=0.125 2024-09-16 20:39:11,822 INFO [train.py:1198] (0/2) Epoch 5, batch 500, loss[loss=0.3175, ctc_loss=0.2443, cr_loss=0.4736, attn_decoder_loss=0.3151, over 29436.00 frames. ], tot_loss[loss=0.2975, ctc_loss=0.2281, cr_loss=0.4349, attn_decoder_loss=0.2955, over 5330729.26 frames. ], batch size: 94, lr: 2.23e-02, grad_scale: 8.0 2024-09-16 20:39:15,737 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.96 vs. limit=22.5 2024-09-16 20:39:18,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=74400.0, ans=0.125 2024-09-16 20:39:59,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=74520.0, ans=0.0 2024-09-16 20:39:59,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=74520.0, ans=0.125 2024-09-16 20:40:11,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=16.34 vs. limit=15.0 2024-09-16 20:40:29,206 INFO [train.py:1198] (0/2) Epoch 5, batch 550, loss[loss=0.3232, ctc_loss=0.2606, cr_loss=0.4924, attn_decoder_loss=0.3193, over 28860.00 frames. 
], tot_loss[loss=0.298, ctc_loss=0.2286, cr_loss=0.4357, attn_decoder_loss=0.296, over 5423826.30 frames. ], batch size: 104, lr: 2.23e-02, grad_scale: 2.0 2024-09-16 20:40:31,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.99 vs. limit=22.5 2024-09-16 20:40:52,942 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2024-09-16 20:40:56,878 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:40:58,991 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2024-09-16 20:41:10,389 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:41:20,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=74720.0, ans=0.125 2024-09-16 20:41:32,710 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.970e+01 1.190e+02 1.363e+02 1.590e+02 5.102e+02, threshold=2.726e+02, percent-clipped=4.0 2024-09-16 20:41:44,819 INFO [train.py:1198] (0/2) Epoch 5, batch 600, loss[loss=0.3291, ctc_loss=0.2634, cr_loss=0.475, attn_decoder_loss=0.3258, over 29319.00 frames. ], tot_loss[loss=0.2981, ctc_loss=0.2286, cr_loss=0.436, attn_decoder_loss=0.2961, over 5510557.17 frames. 
], batch size: 100, lr: 2.23e-02, grad_scale: 4.0 2024-09-16 20:41:50,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=74800.0, ans=0.125 2024-09-16 20:41:54,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=74800.0, ans=0.125 2024-09-16 20:42:02,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=74840.0, ans=0.0 2024-09-16 20:42:06,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=74840.0, ans=0.125 2024-09-16 20:42:11,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.54 vs. limit=15.0 2024-09-16 20:42:27,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=74880.0, ans=0.0 2024-09-16 20:42:57,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=74960.0, ans=0.025 2024-09-16 20:43:00,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2024-09-16 20:43:02,185 INFO [train.py:1198] (0/2) Epoch 5, batch 650, loss[loss=0.3022, ctc_loss=0.2348, cr_loss=0.433, attn_decoder_loss=0.3001, over 29753.00 frames. ], tot_loss[loss=0.2969, ctc_loss=0.2269, cr_loss=0.435, attn_decoder_loss=0.295, over 5587604.57 frames. 
], batch size: 81, lr: 2.23e-02, grad_scale: 4.0 2024-09-16 20:43:02,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=75000.0, ans=0.1 2024-09-16 20:43:05,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=75000.0, ans=0.2 2024-09-16 20:43:07,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=75000.0, ans=0.025 2024-09-16 20:43:15,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.74 vs. limit=15.0 2024-09-16 20:43:19,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=75040.0, ans=0.125 2024-09-16 20:43:26,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2024-09-16 20:43:31,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=75040.0, ans=0.125 2024-09-16 20:43:33,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=75080.0, ans=0.2 2024-09-16 20:43:44,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=75080.0, ans=0.125 2024-09-16 20:43:47,881 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.12 vs. limit=15.0 2024-09-16 20:43:56,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.58 vs. 
limit=15.0 2024-09-16 20:44:07,671 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.399e+01 1.145e+02 1.260e+02 1.468e+02 2.396e+02, threshold=2.520e+02, percent-clipped=0.0 2024-09-16 20:44:20,243 INFO [train.py:1198] (0/2) Epoch 5, batch 700, loss[loss=0.2613, ctc_loss=0.1816, cr_loss=0.3775, attn_decoder_loss=0.2618, over 29514.00 frames. ], tot_loss[loss=0.2972, ctc_loss=0.2268, cr_loss=0.4352, attn_decoder_loss=0.2954, over 5639490.16 frames. ], batch size: 76, lr: 2.22e-02, grad_scale: 8.0 2024-09-16 20:44:21,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.91 vs. limit=15.0 2024-09-16 20:44:25,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2024-09-16 20:44:40,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=75240.0, ans=0.125 2024-09-16 20:44:47,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=75240.0, ans=0.0 2024-09-16 20:45:22,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=75360.0, ans=0.125 2024-09-16 20:45:35,699 INFO [train.py:1198] (0/2) Epoch 5, batch 750, loss[loss=0.3036, ctc_loss=0.226, cr_loss=0.4137, attn_decoder_loss=0.3031, over 29707.00 frames. ], tot_loss[loss=0.2967, ctc_loss=0.2264, cr_loss=0.4352, attn_decoder_loss=0.2949, over 5677474.16 frames. 
], batch size: 82, lr: 2.22e-02, grad_scale: 4.0 2024-09-16 20:45:45,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=75400.0, ans=0.5 2024-09-16 20:46:01,043 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0 2024-09-16 20:46:12,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=75480.0, ans=0.0 2024-09-16 20:46:19,787 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=8.0 2024-09-16 20:46:37,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.03 vs. limit=15.0 2024-09-16 20:46:42,564 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.394e+01 1.181e+02 1.291e+02 1.489e+02 2.242e+02, threshold=2.582e+02, percent-clipped=0.0 2024-09-16 20:46:42,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=75560.0, ans=0.125 2024-09-16 20:46:49,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.17 vs. limit=10.0 2024-09-16 20:46:49,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.26 vs. limit=10.0 2024-09-16 20:46:53,301 INFO [train.py:1198] (0/2) Epoch 5, batch 800, loss[loss=0.268, ctc_loss=0.1995, cr_loss=0.3974, attn_decoder_loss=0.2668, over 29558.00 frames. ], tot_loss[loss=0.2968, ctc_loss=0.2263, cr_loss=0.4348, attn_decoder_loss=0.295, over 5707171.99 frames. 
], batch size: 73, lr: 2.22e-02, grad_scale: 8.0 2024-09-16 20:46:58,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.50 vs. limit=15.0 2024-09-16 20:47:24,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2024-09-16 20:47:39,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=75720.0, ans=0.1 2024-09-16 20:48:10,235 INFO [train.py:1198] (0/2) Epoch 5, batch 850, loss[loss=0.3071, ctc_loss=0.2251, cr_loss=0.463, attn_decoder_loss=0.3059, over 29724.00 frames. ], tot_loss[loss=0.2962, ctc_loss=0.2254, cr_loss=0.4337, attn_decoder_loss=0.2944, over 5734826.92 frames. ], batch size: 89, lr: 2.22e-02, grad_scale: 4.0 2024-09-16 20:48:13,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=75800.0, ans=0.0 2024-09-16 20:48:14,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=75800.0, ans=0.125 2024-09-16 20:48:29,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=75840.0, ans=0.04949747468305833 2024-09-16 20:48:32,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=75840.0, ans=0.2 2024-09-16 20:49:01,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=75920.0, ans=0.0 2024-09-16 20:49:06,955 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. 
limit=6.0 2024-09-16 20:49:16,710 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.020e+02 1.208e+02 1.339e+02 1.559e+02 5.118e+02, threshold=2.679e+02, percent-clipped=4.0 2024-09-16 20:49:26,156 INFO [train.py:1198] (0/2) Epoch 5, batch 900, loss[loss=0.2647, ctc_loss=0.1894, cr_loss=0.3993, attn_decoder_loss=0.2642, over 29604.00 frames. ], tot_loss[loss=0.2968, ctc_loss=0.2262, cr_loss=0.4336, attn_decoder_loss=0.2951, over 5737745.23 frames. ], batch size: 73, lr: 2.21e-02, grad_scale: 8.0 2024-09-16 20:50:02,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=76080.0, ans=0.125 2024-09-16 20:50:15,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=76120.0, ans=0.125 2024-09-16 20:50:16,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=76120.0, ans=0.125 2024-09-16 20:50:32,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=76160.0, ans=0.015 2024-09-16 20:50:39,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=76160.0, ans=0.0 2024-09-16 20:50:43,189 INFO [train.py:1198] (0/2) Epoch 5, batch 950, loss[loss=0.2705, ctc_loss=0.1946, cr_loss=0.4079, attn_decoder_loss=0.2699, over 29513.00 frames. ], tot_loss[loss=0.2973, ctc_loss=0.2266, cr_loss=0.4345, attn_decoder_loss=0.2955, over 5740328.23 frames. 
], batch size: 74, lr: 2.21e-02, grad_scale: 4.0 2024-09-16 20:51:01,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=76240.0, ans=0.0 2024-09-16 20:51:02,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=76240.0, ans=0.125 2024-09-16 20:51:22,211 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.52 vs. limit=15.0 2024-09-16 20:51:47,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=76360.0, ans=0.125 2024-09-16 20:51:48,758 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:51:52,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2024-09-16 20:51:52,956 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.976e+01 1.191e+02 1.361e+02 1.638e+02 5.772e+02, threshold=2.722e+02, percent-clipped=5.0 2024-09-16 20:52:00,423 INFO [train.py:1198] (0/2) Epoch 5, batch 1000, loss[loss=0.2927, ctc_loss=0.2326, cr_loss=0.4673, attn_decoder_loss=0.289, over 29506.00 frames. ], tot_loss[loss=0.298, ctc_loss=0.2278, cr_loss=0.4355, attn_decoder_loss=0.2961, over 5735086.50 frames. 
], batch size: 77, lr: 2.21e-02, grad_scale: 8.0 2024-09-16 20:52:18,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=76440.0, ans=0.0 2024-09-16 20:52:50,577 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:52:56,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=76520.0, ans=0.125 2024-09-16 20:53:15,774 INFO [train.py:1198] (0/2) Epoch 5, batch 1050, loss[loss=0.3105, ctc_loss=0.2346, cr_loss=0.4399, attn_decoder_loss=0.3091, over 29691.00 frames. ], tot_loss[loss=0.2972, ctc_loss=0.2268, cr_loss=0.4352, attn_decoder_loss=0.2954, over 5742993.06 frames. ], batch size: 85, lr: 2.21e-02, grad_scale: 4.0 2024-09-16 20:53:25,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=76600.0, ans=0.1 2024-09-16 20:53:42,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=76640.0, ans=0.125 2024-09-16 20:53:51,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=76680.0, ans=0.2 2024-09-16 20:53:59,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=76680.0, ans=0.125 2024-09-16 20:54:22,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.07 vs. 
limit=15.0 2024-09-16 20:54:23,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=76760.0, ans=0.125 2024-09-16 20:54:27,134 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.620e+01 1.158e+02 1.276e+02 1.580e+02 2.597e+02, threshold=2.552e+02, percent-clipped=0.0 2024-09-16 20:54:33,692 INFO [train.py:1198] (0/2) Epoch 5, batch 1100, loss[loss=0.3017, ctc_loss=0.2303, cr_loss=0.4439, attn_decoder_loss=0.2997, over 29463.00 frames. ], tot_loss[loss=0.2972, ctc_loss=0.2265, cr_loss=0.4347, attn_decoder_loss=0.2954, over 5755162.66 frames. ], batch size: 78, lr: 2.20e-02, grad_scale: 8.0 2024-09-16 20:54:50,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=76840.0, ans=0.5 2024-09-16 20:55:02,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=12.0 2024-09-16 20:55:04,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=76880.0, ans=0.125 2024-09-16 20:55:07,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff2.min_abs, batch_count=76880.0, ans=0.1 2024-09-16 20:55:11,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=76880.0, ans=0.125 2024-09-16 20:55:13,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.83 vs. 
limit=15.0 2024-09-16 20:55:27,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=76920.0, ans=0.0 2024-09-16 20:55:36,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=76960.0, ans=0.09899494936611666 2024-09-16 20:55:41,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=10.70 vs. limit=15.0 2024-09-16 20:55:42,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=76960.0, ans=0.125 2024-09-16 20:55:51,261 INFO [train.py:1198] (0/2) Epoch 5, batch 1150, loss[loss=0.2733, ctc_loss=0.1975, cr_loss=0.4112, attn_decoder_loss=0.2726, over 29435.00 frames. ], tot_loss[loss=0.2969, ctc_loss=0.2261, cr_loss=0.4351, attn_decoder_loss=0.2951, over 5754822.06 frames. ], batch size: 78, lr: 2.20e-02, grad_scale: 4.0 2024-09-16 20:56:14,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=77040.0, ans=0.125 2024-09-16 20:56:22,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=77080.0, ans=10.0 2024-09-16 20:56:22,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=77080.0, ans=0.125 2024-09-16 20:56:23,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=77080.0, ans=0.125 2024-09-16 20:56:23,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=77080.0, ans=0.05 2024-09-16 20:56:26,470 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=77080.0, ans=0.125 2024-09-16 20:57:02,552 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.643e+01 1.177e+02 1.305e+02 1.494e+02 2.713e+02, threshold=2.610e+02, percent-clipped=1.0 2024-09-16 20:57:04,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=27.37 vs. limit=22.5 2024-09-16 20:57:06,992 INFO [train.py:1198] (0/2) Epoch 5, batch 1200, loss[loss=0.2983, ctc_loss=0.2311, cr_loss=0.4384, attn_decoder_loss=0.296, over 29676.00 frames. ], tot_loss[loss=0.298, ctc_loss=0.2275, cr_loss=0.4367, attn_decoder_loss=0.2961, over 5747308.57 frames. ], batch size: 85, lr: 2.20e-02, grad_scale: 8.0 2024-09-16 20:57:13,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=77200.0, ans=0.0 2024-09-16 20:57:16,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2024-09-16 20:57:36,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=77240.0, ans=0.0 2024-09-16 20:57:47,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=77280.0, ans=0.2 2024-09-16 20:57:52,406 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.30 vs. 
limit=15.0 2024-09-16 20:58:00,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=77320.0, ans=0.125 2024-09-16 20:58:24,359 INFO [train.py:1198] (0/2) Epoch 5, batch 1250, loss[loss=0.3145, ctc_loss=0.2437, cr_loss=0.4705, attn_decoder_loss=0.3119, over 29542.00 frames. ], tot_loss[loss=0.2978, ctc_loss=0.2268, cr_loss=0.4366, attn_decoder_loss=0.296, over 5774005.75 frames. ], batch size: 92, lr: 2.20e-02, grad_scale: 4.0 2024-09-16 20:58:27,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=77400.0, ans=0.125 2024-09-16 20:59:01,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=77480.0, ans=0.125 2024-09-16 20:59:16,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=77520.0, ans=0.125 2024-09-16 20:59:21,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=77520.0, ans=0.0 2024-09-16 20:59:28,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=77560.0, ans=0.2 2024-09-16 20:59:31,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=77560.0, ans=0.125 2024-09-16 20:59:34,677 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:59:38,726 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.175e+02 1.290e+02 1.503e+02 2.372e+02, threshold=2.579e+02, percent-clipped=0.0 2024-09-16 20:59:42,133 INFO [train.py:1198] (0/2) Epoch 5, batch 1300, loss[loss=0.3205, ctc_loss=0.2454, cr_loss=0.4524, attn_decoder_loss=0.3188, over 28363.00 frames. 
], tot_loss[loss=0.2972, ctc_loss=0.2265, cr_loss=0.4366, attn_decoder_loss=0.2954, over 5780005.18 frames. ], batch size: 111, lr: 2.19e-02, grad_scale: 8.0 2024-09-16 20:59:47,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=77600.0, ans=0.025 2024-09-16 21:00:26,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=77720.0, ans=22.5 2024-09-16 21:00:30,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.64 vs. limit=12.0 2024-09-16 21:00:35,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.01 vs. limit=15.0 2024-09-16 21:00:37,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=77720.0, ans=0.09899494936611666 2024-09-16 21:00:57,181 INFO [train.py:1198] (0/2) Epoch 5, batch 1350, loss[loss=0.3037, ctc_loss=0.2265, cr_loss=0.4312, attn_decoder_loss=0.3027, over 29759.00 frames. ], tot_loss[loss=0.2967, ctc_loss=0.2255, cr_loss=0.4357, attn_decoder_loss=0.2949, over 5797560.95 frames. ], batch size: 81, lr: 2.19e-02, grad_scale: 4.0 2024-09-16 21:00:57,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=77800.0, ans=0.125 2024-09-16 21:01:04,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=77800.0, ans=0.125 2024-09-16 21:01:08,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.08 vs. 
limit=22.5 2024-09-16 21:01:47,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=77920.0, ans=0.0 2024-09-16 21:02:07,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=77960.0, ans=0.09899494936611666 2024-09-16 21:02:12,850 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.348e+01 1.157e+02 1.281e+02 1.429e+02 2.166e+02, threshold=2.563e+02, percent-clipped=0.0 2024-09-16 21:02:14,437 INFO [train.py:1198] (0/2) Epoch 5, batch 1400, loss[loss=0.2609, ctc_loss=0.1969, cr_loss=0.3994, attn_decoder_loss=0.2591, over 29580.00 frames. ], tot_loss[loss=0.296, ctc_loss=0.2245, cr_loss=0.4344, attn_decoder_loss=0.2943, over 5808507.55 frames. ], batch size: 69, lr: 2.19e-02, grad_scale: 8.0 2024-09-16 21:02:25,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=78000.0, ans=0.125 2024-09-16 21:02:41,651 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:02:57,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=78080.0, ans=0.125 2024-09-16 21:03:00,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=78120.0, ans=0.0 2024-09-16 21:03:23,137 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:03:30,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=78200.0, ans=0.0 2024-09-16 21:03:31,880 INFO [train.py:1198] (0/2) Epoch 5, batch 1450, loss[loss=0.3127, ctc_loss=0.233, cr_loss=0.4682, attn_decoder_loss=0.3111, over 29417.00 frames. 
], tot_loss[loss=0.2972, ctc_loss=0.226, cr_loss=0.4366, attn_decoder_loss=0.2954, over 5803977.94 frames. ], batch size: 94, lr: 2.19e-02, grad_scale: 4.0 2024-09-16 21:03:32,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-09-16 21:03:38,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=78200.0, ans=0.125 2024-09-16 21:03:48,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=78240.0, ans=0.125 2024-09-16 21:03:55,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-09-16 21:04:39,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=78360.0, ans=0.125 2024-09-16 21:04:39,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=78360.0, ans=0.125 2024-09-16 21:04:47,756 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.240e+02 1.382e+02 1.635e+02 6.361e+02, threshold=2.763e+02, percent-clipped=6.0 2024-09-16 21:04:47,778 INFO [train.py:1198] (0/2) Epoch 5, batch 1500, loss[loss=0.327, ctc_loss=0.2575, cr_loss=0.4849, attn_decoder_loss=0.3239, over 29623.00 frames. ], tot_loss[loss=0.2975, ctc_loss=0.2258, cr_loss=0.436, attn_decoder_loss=0.2958, over 5803510.91 frames. 
], batch size: 86, lr: 2.18e-02, grad_scale: 8.0 2024-09-16 21:04:49,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=78400.0, ans=0.0 2024-09-16 21:05:03,827 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2024-09-16 21:05:21,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=78480.0, ans=0.07 2024-09-16 21:05:22,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=78480.0, ans=0.125 2024-09-16 21:05:34,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=78520.0, ans=0.2 2024-09-16 21:05:46,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=78520.0, ans=0.125 2024-09-16 21:05:47,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=78520.0, ans=0.0 2024-09-16 21:06:05,492 INFO [train.py:1198] (0/2) Epoch 5, batch 1550, loss[loss=0.3236, ctc_loss=0.2546, cr_loss=0.4816, attn_decoder_loss=0.3205, over 29500.00 frames. ], tot_loss[loss=0.2973, ctc_loss=0.2257, cr_loss=0.4354, attn_decoder_loss=0.2955, over 5779019.23 frames. 
], batch size: 90, lr: 2.18e-02, grad_scale: 4.0 2024-09-16 21:06:08,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=78600.0, ans=0.07 2024-09-16 21:06:13,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=78600.0, ans=0.04949747468305833 2024-09-16 21:06:38,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-09-16 21:06:46,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=78680.0, ans=0.1 2024-09-16 21:06:47,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=78680.0, ans=0.125 2024-09-16 21:06:48,263 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0 2024-09-16 21:06:58,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=78720.0, ans=0.125 2024-09-16 21:07:15,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=78760.0, ans=0.125 2024-09-16 21:07:22,403 INFO [train.py:1198] (0/2) Epoch 5, batch 1600, loss[loss=0.2908, ctc_loss=0.2159, cr_loss=0.4255, attn_decoder_loss=0.2897, over 29666.00 frames. ], tot_loss[loss=0.297, ctc_loss=0.2257, cr_loss=0.4345, attn_decoder_loss=0.2952, over 5761145.43 frames. 
], batch size: 85, lr: 2.18e-02, grad_scale: 8.0 2024-09-16 21:07:23,871 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.848e+01 1.266e+02 1.474e+02 1.762e+02 4.006e+02, threshold=2.948e+02, percent-clipped=2.0 2024-09-16 21:07:28,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=78800.0, ans=0.125 2024-09-16 21:07:28,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=78800.0, ans=0.125 2024-09-16 21:07:42,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-09-16 21:07:50,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=78840.0, ans=0.125 2024-09-16 21:08:37,894 INFO [train.py:1198] (0/2) Epoch 5, batch 1650, loss[loss=0.3068, ctc_loss=0.2261, cr_loss=0.4277, attn_decoder_loss=0.3062, over 29721.00 frames. ], tot_loss[loss=0.2967, ctc_loss=0.2257, cr_loss=0.4342, attn_decoder_loss=0.295, over 5755215.23 frames. ], batch size: 89, lr: 2.18e-02, grad_scale: 4.0 2024-09-16 21:08:50,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=79000.0, ans=0.125 2024-09-16 21:08:56,342 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:09:30,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=79120.0, ans=15.0 2024-09-16 21:09:55,749 INFO [train.py:1198] (0/2) Epoch 5, batch 1700, loss[loss=0.2725, ctc_loss=0.2092, cr_loss=0.396, attn_decoder_loss=0.2708, over 29581.00 frames. 
], tot_loss[loss=0.296, ctc_loss=0.2247, cr_loss=0.4332, attn_decoder_loss=0.2943, over 5778410.91 frames. ], batch size: 69, lr: 2.17e-02, grad_scale: 8.0 2024-09-16 21:10:00,289 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.635e+01 1.159e+02 1.263e+02 1.450e+02 2.662e+02, threshold=2.527e+02, percent-clipped=0.0 2024-09-16 21:10:02,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=79200.0, ans=0.0 2024-09-16 21:10:10,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2024-09-16 21:10:11,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=79240.0, ans=0.125 2024-09-16 21:10:15,906 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:11:07,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=79360.0, ans=0.025 2024-09-16 21:11:12,998 INFO [train.py:1198] (0/2) Epoch 5, batch 1750, loss[loss=0.2599, ctc_loss=0.1877, cr_loss=0.4078, attn_decoder_loss=0.2588, over 29317.00 frames. ], tot_loss[loss=0.2952, ctc_loss=0.2236, cr_loss=0.4328, attn_decoder_loss=0.2935, over 5786438.33 frames. ], batch size: 67, lr: 2.17e-02, grad_scale: 4.0 2024-09-16 21:11:21,705 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. 
limit=15.0 2024-09-16 21:11:25,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=79400.0, ans=0.0 2024-09-16 21:11:40,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=79440.0, ans=0.125 2024-09-16 21:11:45,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=79480.0, ans=0.125 2024-09-16 21:11:54,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=79480.0, ans=0.0 2024-09-16 21:12:28,474 INFO [train.py:1198] (0/2) Epoch 5, batch 1800, loss[loss=0.2906, ctc_loss=0.2098, cr_loss=0.4493, attn_decoder_loss=0.2896, over 29690.00 frames. ], tot_loss[loss=0.2953, ctc_loss=0.2236, cr_loss=0.4335, attn_decoder_loss=0.2936, over 5790513.24 frames. ], batch size: 83, lr: 2.17e-02, grad_scale: 8.0 2024-09-16 21:12:33,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=79600.0, ans=0.125 2024-09-16 21:12:34,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.637e+01 1.099e+02 1.224e+02 1.443e+02 2.616e+02, threshold=2.449e+02, percent-clipped=2.0 2024-09-16 21:12:45,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=79640.0, ans=0.0 2024-09-16 21:12:49,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=79640.0, ans=0.125 2024-09-16 21:12:50,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.36 vs. 
limit=12.0 2024-09-16 21:13:05,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=79680.0, ans=0.2 2024-09-16 21:13:08,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=79680.0, ans=0.125 2024-09-16 21:13:21,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=79720.0, ans=0.125 2024-09-16 21:13:30,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.72 vs. limit=8.0 2024-09-16 21:13:39,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0 2024-09-16 21:13:44,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79800.0, ans=0.1 2024-09-16 21:13:46,155 INFO [train.py:1198] (0/2) Epoch 5, batch 1850, loss[loss=0.3008, ctc_loss=0.2244, cr_loss=0.4033, attn_decoder_loss=0.3004, over 29617.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.2228, cr_loss=0.4327, attn_decoder_loss=0.2932, over 5797810.63 frames. ], batch size: 86, lr: 2.17e-02, grad_scale: 4.0 2024-09-16 21:13:55,935 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.97 vs. limit=6.0 2024-09-16 21:14:04,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.97 vs. 
limit=22.5 2024-09-16 21:14:05,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=79840.0, ans=0.04949747468305833 2024-09-16 21:14:13,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=79840.0, ans=0.125 2024-09-16 21:14:25,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=79880.0, ans=0.125 2024-09-16 21:14:44,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2024-09-16 21:14:52,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=79960.0, ans=0.125 2024-09-16 21:14:54,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=79960.0, ans=0.0 2024-09-16 21:15:01,855 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-20000.pt 2024-09-16 21:15:10,196 INFO [train.py:1198] (0/2) Epoch 5, batch 1900, loss[loss=0.3072, ctc_loss=0.2247, cr_loss=0.4381, attn_decoder_loss=0.3066, over 29673.00 frames. ], tot_loss[loss=0.2961, ctc_loss=0.2242, cr_loss=0.435, attn_decoder_loss=0.2944, over 5805376.54 frames. 
], batch size: 89, lr: 2.16e-02, grad_scale: 8.0 2024-09-16 21:15:17,676 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.933e+01 1.145e+02 1.241e+02 1.387e+02 2.102e+02, threshold=2.481e+02, percent-clipped=0.0 2024-09-16 21:15:18,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=80000.0, ans=0.125 2024-09-16 21:15:19,929 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2024-09-16 21:15:28,894 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:15:36,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=80040.0, ans=0.125 2024-09-16 21:15:36,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=80040.0, ans=0.025 2024-09-16 21:15:39,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=80080.0, ans=0.2 2024-09-16 21:15:51,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80080.0, ans=0.1 2024-09-16 21:15:59,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=80120.0, ans=0.125 2024-09-16 21:16:03,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=80120.0, ans=0.0 2024-09-16 21:16:07,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.39 vs. 
limit=15.0 2024-09-16 21:16:26,183 INFO [train.py:1198] (0/2) Epoch 5, batch 1950, loss[loss=0.2947, ctc_loss=0.2309, cr_loss=0.4213, attn_decoder_loss=0.2924, over 29454.00 frames. ], tot_loss[loss=0.2974, ctc_loss=0.2253, cr_loss=0.4373, attn_decoder_loss=0.2957, over 5820839.45 frames. ], batch size: 78, lr: 2.16e-02, grad_scale: 4.0 2024-09-16 21:16:31,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.01 vs. limit=10.0 2024-09-16 21:16:32,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=80200.0, ans=0.95 2024-09-16 21:16:40,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=80240.0, ans=0.125 2024-09-16 21:16:58,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=80280.0, ans=0.125 2024-09-16 21:17:19,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=80320.0, ans=10.0 2024-09-16 21:17:22,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=80320.0, ans=0.125 2024-09-16 21:17:28,866 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:17:30,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=80360.0, ans=0.0 2024-09-16 21:17:39,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=80360.0, ans=0.125 2024-09-16 21:17:43,609 INFO [train.py:1198] (0/2) Epoch 5, batch 2000, loss[loss=0.2614, ctc_loss=0.1883, cr_loss=0.4055, 
attn_decoder_loss=0.2605, over 29360.00 frames. ], tot_loss[loss=0.2978, ctc_loss=0.226, cr_loss=0.4376, attn_decoder_loss=0.2961, over 5800116.05 frames. ], batch size: 67, lr: 2.16e-02, grad_scale: 8.0 2024-09-16 21:17:52,706 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.796e+01 1.236e+02 1.402e+02 1.608e+02 2.421e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-16 21:18:19,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5 2024-09-16 21:18:26,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=15.0 2024-09-16 21:18:38,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=80520.0, ans=0.0 2024-09-16 21:18:50,170 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=12.0 2024-09-16 21:18:56,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=80560.0, ans=0.125 2024-09-16 21:19:01,084 INFO [train.py:1198] (0/2) Epoch 5, batch 2050, loss[loss=0.2642, ctc_loss=0.1991, cr_loss=0.3997, attn_decoder_loss=0.2625, over 29427.00 frames. ], tot_loss[loss=0.2964, ctc_loss=0.225, cr_loss=0.4361, attn_decoder_loss=0.2947, over 5791050.79 frames. 
], batch size: 70, lr: 2.16e-02, grad_scale: 4.0 2024-09-16 21:19:01,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=80600.0, ans=0.125 2024-09-16 21:19:31,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=80680.0, ans=0.0 2024-09-16 21:19:38,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.20 vs. limit=22.5 2024-09-16 21:19:47,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5 2024-09-16 21:20:17,211 INFO [train.py:1198] (0/2) Epoch 5, batch 2100, loss[loss=0.3001, ctc_loss=0.2248, cr_loss=0.4456, attn_decoder_loss=0.2985, over 29760.00 frames. ], tot_loss[loss=0.296, ctc_loss=0.2242, cr_loss=0.4353, attn_decoder_loss=0.2944, over 5802766.02 frames. ], batch size: 81, lr: 2.15e-02, grad_scale: 8.0 2024-09-16 21:20:18,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=80800.0, ans=0.125 2024-09-16 21:20:20,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=80800.0, ans=0.1 2024-09-16 21:20:22,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=12.55 vs. 
limit=15.0 2024-09-16 21:20:24,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=80800.0, ans=0.125 2024-09-16 21:20:26,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=80800.0, ans=0.0 2024-09-16 21:20:27,415 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.761e+01 1.220e+02 1.373e+02 1.548e+02 8.609e+02, threshold=2.746e+02, percent-clipped=3.0 2024-09-16 21:21:08,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=80920.0, ans=0.125 2024-09-16 21:21:15,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=80960.0, ans=0.125 2024-09-16 21:21:19,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=12.0 2024-09-16 21:21:31,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.41 vs. limit=22.5 2024-09-16 21:21:34,125 INFO [train.py:1198] (0/2) Epoch 5, batch 2150, loss[loss=0.2946, ctc_loss=0.2335, cr_loss=0.4647, attn_decoder_loss=0.2911, over 29448.00 frames. ], tot_loss[loss=0.295, ctc_loss=0.2227, cr_loss=0.4345, attn_decoder_loss=0.2933, over 5816769.33 frames. 
], batch size: 78, lr: 2.15e-02, grad_scale: 4.0 2024-09-16 21:21:34,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=81000.0, ans=0.125 2024-09-16 21:21:41,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=81000.0, ans=0.025 2024-09-16 21:21:57,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=81040.0, ans=0.125 2024-09-16 21:21:59,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2024-09-16 21:22:02,543 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.19 vs. limit=6.0 2024-09-16 21:22:03,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81040.0, ans=0.1 2024-09-16 21:22:21,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81120.0, ans=0.1 2024-09-16 21:22:51,825 INFO [train.py:1198] (0/2) Epoch 5, batch 2200, loss[loss=0.2988, ctc_loss=0.2244, cr_loss=0.428, attn_decoder_loss=0.2976, over 29633.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.2227, cr_loss=0.4344, attn_decoder_loss=0.2931, over 5813003.44 frames. 
], batch size: 86, lr: 2.15e-02, grad_scale: 8.0 2024-09-16 21:23:02,278 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.674e+01 1.183e+02 1.300e+02 1.517e+02 2.352e+02, threshold=2.600e+02, percent-clipped=0.0 2024-09-16 21:23:12,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=81240.0, ans=0.125 2024-09-16 21:23:44,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=81320.0, ans=0.0 2024-09-16 21:23:44,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=81320.0, ans=0.0 2024-09-16 21:23:52,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=81360.0, ans=0.2 2024-09-16 21:24:02,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.02 vs. limit=22.5 2024-09-16 21:24:07,126 INFO [train.py:1198] (0/2) Epoch 5, batch 2250, loss[loss=0.3039, ctc_loss=0.2287, cr_loss=0.4323, attn_decoder_loss=0.3026, over 29738.00 frames. ], tot_loss[loss=0.2947, ctc_loss=0.2225, cr_loss=0.4344, attn_decoder_loss=0.293, over 5811258.53 frames. ], batch size: 82, lr: 2.15e-02, grad_scale: 4.0 2024-09-16 21:24:27,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.97 vs. limit=22.5 2024-09-16 21:24:36,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.60 vs. 
limit=15.0
2024-09-16 21:24:38,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=81480.0, ans=0.0
2024-09-16 21:24:43,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=81480.0, ans=0.0
2024-09-16 21:25:23,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=81600.0, ans=0.125
2024-09-16 21:25:23,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=81600.0, ans=0.07
2024-09-16 21:25:25,073 INFO [train.py:1198] (0/2) Epoch 5, batch 2300, loss[loss=0.2613, ctc_loss=0.1896, cr_loss=0.3921, attn_decoder_loss=0.2606, over 29712.00 frames. ], tot_loss[loss=0.2939, ctc_loss=0.2214, cr_loss=0.4326, attn_decoder_loss=0.2923, over 5799344.04 frames. ], batch size: 72, lr: 2.15e-02, grad_scale: 8.0
2024-09-16 21:25:38,312 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.849e+01 1.191e+02 1.337e+02 1.602e+02 2.823e+02, threshold=2.675e+02, percent-clipped=3.0
2024-09-16 21:26:15,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=81720.0, ans=0.125
2024-09-16 21:26:25,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=81760.0, ans=0.07
2024-09-16 21:26:33,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=81760.0, ans=0.125
2024-09-16 21:26:38,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=81760.0, ans=0.2
2024-09-16 21:26:42,305 INFO [train.py:1198] (0/2) Epoch 5, batch 2350, loss[loss=0.3043, ctc_loss=0.2254, cr_loss=0.4429, attn_decoder_loss=0.3032, over 29702.00 frames. ], tot_loss[loss=0.2939, ctc_loss=0.2214, cr_loss=0.4324, attn_decoder_loss=0.2923, over 5804360.97 frames. ], batch size: 83, lr: 2.14e-02, grad_scale: 4.0
2024-09-16 21:26:51,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=81800.0, ans=0.0
2024-09-16 21:27:11,153 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:27:31,324 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0
2024-09-16 21:27:38,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=81920.0, ans=0.125
2024-09-16 21:27:44,636 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:27:58,047 INFO [train.py:1198] (0/2) Epoch 5, batch 2400, loss[loss=0.2922, ctc_loss=0.2211, cr_loss=0.4334, attn_decoder_loss=0.2905, over 29522.00 frames. ], tot_loss[loss=0.2946, ctc_loss=0.2221, cr_loss=0.4337, attn_decoder_loss=0.293, over 5806924.88 frames. ], batch size: 76, lr: 2.14e-02, grad_scale: 8.0
2024-09-16 21:28:12,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82040.0, ans=0.1
2024-09-16 21:28:13,106 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.326e+01 1.225e+02 1.360e+02 1.581e+02 2.424e+02, threshold=2.721e+02, percent-clipped=0.0
2024-09-16 21:28:27,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=82080.0, ans=0.125
2024-09-16 21:28:47,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0
2024-09-16 21:28:52,532 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=8.0
2024-09-16 21:29:14,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=82200.0, ans=10.0
2024-09-16 21:29:15,780 INFO [train.py:1198] (0/2) Epoch 5, batch 2450, loss[loss=0.3028, ctc_loss=0.2221, cr_loss=0.4488, attn_decoder_loss=0.3018, over 29699.00 frames. ], tot_loss[loss=0.2958, ctc_loss=0.2235, cr_loss=0.4346, attn_decoder_loss=0.2942, over 5783629.11 frames. ], batch size: 82, lr: 2.14e-02, grad_scale: 4.0
2024-09-16 21:30:01,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0
2024-09-16 21:30:24,103 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.76 vs. limit=15.0
2024-09-16 21:30:33,951 INFO [train.py:1198] (0/2) Epoch 5, batch 2500, loss[loss=0.3028, ctc_loss=0.2177, cr_loss=0.4498, attn_decoder_loss=0.3023, over 29625.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2231, cr_loss=0.4348, attn_decoder_loss=0.2941, over 5793403.88 frames. ], batch size: 86, lr: 2.14e-02, grad_scale: 8.0
2024-09-16 21:30:50,576 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.347e+01 1.183e+02 1.324e+02 1.493e+02 3.213e+02, threshold=2.647e+02, percent-clipped=2.0
2024-09-16 21:31:03,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.79 vs. limit=12.0
2024-09-16 21:31:15,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=82480.0, ans=0.125
2024-09-16 21:31:19,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0
2024-09-16 21:31:26,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0
2024-09-16 21:31:31,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82520.0, ans=0.1
2024-09-16 21:31:49,243 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=12.0
2024-09-16 21:31:49,667 INFO [train.py:1198] (0/2) Epoch 5, batch 2550, loss[loss=0.2563, ctc_loss=0.1842, cr_loss=0.3872, attn_decoder_loss=0.2557, over 29335.00 frames. ], tot_loss[loss=0.2954, ctc_loss=0.2226, cr_loss=0.4339, attn_decoder_loss=0.2938, over 5797732.99 frames. ], batch size: 67, lr: 2.13e-02, grad_scale: 4.0
2024-09-16 21:31:51,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=82600.0, ans=0.125
2024-09-16 21:32:04,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=82640.0, ans=0.0
2024-09-16 21:32:06,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=82640.0, ans=0.125
2024-09-16 21:32:32,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=82680.0, ans=15.0
2024-09-16 21:32:34,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82720.0, ans=0.1
2024-09-16 21:32:45,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82720.0, ans=0.1
2024-09-16 21:32:48,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=82760.0, ans=0.0
2024-09-16 21:33:02,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=82760.0, ans=0.2
2024-09-16 21:33:04,972 INFO [train.py:1198] (0/2) Epoch 5, batch 2600, loss[loss=0.2778, ctc_loss=0.2062, cr_loss=0.4515, attn_decoder_loss=0.2757, over 29437.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2229, cr_loss=0.4352, attn_decoder_loss=0.2941, over 5793953.49 frames. ], batch size: 78, lr: 2.13e-02, grad_scale: 8.0
2024-09-16 21:33:25,236 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.991e+01 1.177e+02 1.349e+02 1.549e+02 3.059e+02, threshold=2.698e+02, percent-clipped=1.0
2024-09-16 21:33:33,907 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0
2024-09-16 21:33:37,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=82880.0, ans=0.015
2024-09-16 21:33:45,904 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.22 vs. limit=22.5
2024-09-16 21:33:56,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=82920.0, ans=0.05
2024-09-16 21:34:02,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=82920.0, ans=0.5
2024-09-16 21:34:03,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=82920.0, ans=0.2
2024-09-16 21:34:22,384 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.75 vs. limit=10.0
2024-09-16 21:34:24,551 INFO [train.py:1198] (0/2) Epoch 5, batch 2650, loss[loss=0.3197, ctc_loss=0.2515, cr_loss=0.4735, attn_decoder_loss=0.3168, over 29224.00 frames. ], tot_loss[loss=0.2962, ctc_loss=0.2234, cr_loss=0.4363, attn_decoder_loss=0.2946, over 5799687.67 frames. ], batch size: 100, lr: 2.13e-02, grad_scale: 4.0
2024-09-16 21:34:25,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.98 vs. limit=22.5
2024-09-16 21:34:56,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=83080.0, ans=0.07
2024-09-16 21:35:09,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=83120.0, ans=0.05
2024-09-16 21:35:22,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=6.0
2024-09-16 21:35:24,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=22.5
2024-09-16 21:35:28,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=83160.0, ans=0.1
2024-09-16 21:35:31,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.55 vs. limit=15.0
2024-09-16 21:35:39,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=83200.0, ans=0.0
2024-09-16 21:35:40,224 INFO [train.py:1198] (0/2) Epoch 5, batch 2700, loss[loss=0.3106, ctc_loss=0.2316, cr_loss=0.4461, attn_decoder_loss=0.3094, over 29495.00 frames. ], tot_loss[loss=0.2961, ctc_loss=0.2232, cr_loss=0.4364, attn_decoder_loss=0.2945, over 5795819.39 frames. ], batch size: 87, lr: 2.13e-02, grad_scale: 8.0
2024-09-16 21:35:55,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=83240.0, ans=0.0
2024-09-16 21:35:58,611 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:35:59,713 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.832e+01 1.218e+02 1.347e+02 1.527e+02 8.149e+02, threshold=2.695e+02, percent-clipped=3.0
2024-09-16 21:36:33,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=83320.0, ans=0.0
2024-09-16 21:36:36,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=83320.0, ans=0.125
2024-09-16 21:36:41,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=83360.0, ans=0.125
2024-09-16 21:36:41,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.14 vs. limit=10.0
2024-09-16 21:36:56,081 INFO [train.py:1198] (0/2) Epoch 5, batch 2750, loss[loss=0.2842, ctc_loss=0.2145, cr_loss=0.4354, attn_decoder_loss=0.2823, over 29523.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.2222, cr_loss=0.4356, attn_decoder_loss=0.2932, over 5793569.30 frames. ], batch size: 75, lr: 2.12e-02, grad_scale: 4.0
2024-09-16 21:37:46,046 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0
2024-09-16 21:38:15,626 INFO [train.py:1198] (0/2) Epoch 5, batch 2800, loss[loss=0.347, ctc_loss=0.3086, cr_loss=0.456, attn_decoder_loss=0.3411, over 20284.00 frames. ], tot_loss[loss=0.2951, ctc_loss=0.2226, cr_loss=0.4352, attn_decoder_loss=0.2935, over 5775675.79 frames. ], batch size: 210, lr: 2.12e-02, grad_scale: 8.0
2024-09-16 21:38:24,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=83600.0, ans=0.125
2024-09-16 21:38:25,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.56 vs. limit=15.0
2024-09-16 21:38:29,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=83640.0, ans=0.125
2024-09-16 21:38:36,672 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.006e+02 1.137e+02 1.290e+02 1.487e+02 2.968e+02, threshold=2.580e+02, percent-clipped=1.0
2024-09-16 21:38:37,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.02 vs. limit=12.0
2024-09-16 21:38:43,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=83640.0, ans=0.1
2024-09-16 21:38:44,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=83680.0, ans=0.125
2024-09-16 21:38:49,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=83680.0, ans=0.0
2024-09-16 21:38:52,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=83680.0, ans=0.0
2024-09-16 21:39:03,103 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.59 vs. limit=22.5
2024-09-16 21:39:16,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.76 vs. limit=22.5
2024-09-16 21:39:23,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=83760.0, ans=0.125
2024-09-16 21:39:30,816 INFO [train.py:1198] (0/2) Epoch 5, batch 2850, loss[loss=0.3021, ctc_loss=0.2321, cr_loss=0.4524, attn_decoder_loss=0.2998, over 29502.00 frames. ], tot_loss[loss=0.2958, ctc_loss=0.2232, cr_loss=0.4361, attn_decoder_loss=0.2941, over 5760697.28 frames. ], batch size: 77, lr: 2.12e-02, grad_scale: 4.0
2024-09-16 21:39:41,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=83800.0, ans=0.1
2024-09-16 21:39:41,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=83800.0, ans=0.0
2024-09-16 21:39:44,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0
2024-09-16 21:39:50,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=83840.0, ans=0.0
2024-09-16 21:39:57,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.whiten.whitening_limit, batch_count=83840.0, ans=12.0
2024-09-16 21:40:12,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=83880.0, ans=0.0
2024-09-16 21:40:25,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=83920.0, ans=0.125
2024-09-16 21:40:25,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=83920.0, ans=10.0
2024-09-16 21:40:47,090 INFO [train.py:1198] (0/2) Epoch 5, batch 2900, loss[loss=0.2966, ctc_loss=0.2265, cr_loss=0.4517, attn_decoder_loss=0.2944, over 29423.00 frames. ], tot_loss[loss=0.2968, ctc_loss=0.2238, cr_loss=0.438, attn_decoder_loss=0.2952, over 5785972.92 frames. ], batch size: 79, lr: 2.12e-02, grad_scale: 8.0
2024-09-16 21:40:58,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.87 vs. limit=10.0
2024-09-16 21:41:04,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=84040.0, ans=0.125
2024-09-16 21:41:13,838 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.341e+01 1.106e+02 1.208e+02 1.366e+02 2.377e+02, threshold=2.415e+02, percent-clipped=0.0
2024-09-16 21:41:14,809 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.95 vs. limit=15.0
2024-09-16 21:41:18,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=84040.0, ans=0.0
2024-09-16 21:41:47,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84120.0, ans=0.1
2024-09-16 21:41:48,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=84120.0, ans=0.2
2024-09-16 21:41:57,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=84160.0, ans=0.05
2024-09-16 21:41:57,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=84160.0, ans=0.95
2024-09-16 21:42:02,435 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:42:06,522 INFO [train.py:1198] (0/2) Epoch 5, batch 2950, loss[loss=0.2741, ctc_loss=0.2, cr_loss=0.4005, attn_decoder_loss=0.2734, over 29525.00 frames. ], tot_loss[loss=0.295, ctc_loss=0.2217, cr_loss=0.4356, attn_decoder_loss=0.2934, over 5783578.13 frames. ], batch size: 75, lr: 2.12e-02, grad_scale: 4.0
2024-09-16 21:42:09,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=84200.0, ans=0.05
2024-09-16 21:42:17,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.79 vs. limit=22.5
2024-09-16 21:43:15,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=10.56 vs. limit=15.0
2024-09-16 21:43:22,234 INFO [train.py:1198] (0/2) Epoch 5, batch 3000, loss[loss=0.2913, ctc_loss=0.2118, cr_loss=0.4394, attn_decoder_loss=0.2904, over 29738.00 frames. ], tot_loss[loss=0.2954, ctc_loss=0.2222, cr_loss=0.4359, attn_decoder_loss=0.2939, over 5785380.54 frames. ], batch size: 81, lr: 2.11e-02, grad_scale: 8.0
2024-09-16 21:43:22,235 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-16 21:43:40,546 INFO [train.py:1230] (0/2) Epoch 5, validation: loss=0.2221, ctc_loss=0.06863, cr_loss=4.342e-15, attn_decoder_loss=0.2392, over 944034.00 frames.
2024-09-16 21:43:40,547 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-16 21:43:43,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=84400.0, ans=0.125
2024-09-16 21:43:46,158 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5
2024-09-16 21:43:54,424 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:44:04,652 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.812e+01 1.181e+02 1.340e+02 1.602e+02 4.120e+02, threshold=2.680e+02, percent-clipped=4.0
2024-09-16 21:44:06,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=84440.0, ans=0.125
2024-09-16 21:44:08,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.40 vs. limit=15.0
2024-09-16 21:44:09,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=84480.0, ans=0.05
2024-09-16 21:44:13,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=84480.0, ans=0.1
2024-09-16 21:44:27,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=84520.0, ans=0.0
2024-09-16 21:44:41,508 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.18 vs. limit=15.0
2024-09-16 21:44:46,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=84560.0, ans=0.125
2024-09-16 21:44:59,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=84600.0, ans=0.125
2024-09-16 21:45:00,285 INFO [train.py:1198] (0/2) Epoch 5, batch 3050, loss[loss=0.2795, ctc_loss=0.2143, cr_loss=0.4489, attn_decoder_loss=0.2768, over 29533.00 frames. ], tot_loss[loss=0.2961, ctc_loss=0.2233, cr_loss=0.4371, attn_decoder_loss=0.2945, over 5779246.80 frames. ], batch size: 76, lr: 2.11e-02, grad_scale: 4.0
2024-09-16 21:45:08,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=84600.0, ans=0.025
2024-09-16 21:45:30,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=84680.0, ans=0.0
2024-09-16 21:45:39,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=84680.0, ans=0.025
2024-09-16 21:45:50,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=84720.0, ans=0.05
2024-09-16 21:46:15,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=84800.0, ans=0.95
2024-09-16 21:46:16,184 INFO [train.py:1198] (0/2) Epoch 5, batch 3100, loss[loss=0.3234, ctc_loss=0.2576, cr_loss=0.46, attn_decoder_loss=0.3205, over 29271.00 frames. ], tot_loss[loss=0.2957, ctc_loss=0.2232, cr_loss=0.4364, attn_decoder_loss=0.294, over 5778065.90 frames. ], batch size: 100, lr: 2.11e-02, grad_scale: 8.0
2024-09-16 21:46:41,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.874e+01 1.199e+02 1.306e+02 1.594e+02 3.534e+02, threshold=2.612e+02, percent-clipped=1.0
2024-09-16 21:46:43,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=84840.0, ans=0.2
2024-09-16 21:46:49,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=84880.0, ans=0.1
2024-09-16 21:47:12,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=84920.0, ans=0.125
2024-09-16 21:47:22,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=84960.0, ans=0.125
2024-09-16 21:47:25,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=84960.0, ans=0.1
2024-09-16 21:47:27,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=84960.0, ans=0.2
2024-09-16 21:47:31,537 INFO [train.py:1198] (0/2) Epoch 5, batch 3150, loss[loss=0.315, ctc_loss=0.2365, cr_loss=0.4687, attn_decoder_loss=0.3133, over 29024.00 frames. ], tot_loss[loss=0.2952, ctc_loss=0.2224, cr_loss=0.4361, attn_decoder_loss=0.2936, over 5783872.33 frames. ], batch size: 105, lr: 2.11e-02, grad_scale: 4.0
2024-09-16 21:47:54,438 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:48:18,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0
2024-09-16 21:48:21,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=85120.0, ans=0.0
2024-09-16 21:48:32,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.51 vs. limit=15.0
2024-09-16 21:48:50,911 INFO [train.py:1198] (0/2) Epoch 5, batch 3200, loss[loss=0.2928, ctc_loss=0.2184, cr_loss=0.4496, attn_decoder_loss=0.2911, over 29423.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.222, cr_loss=0.4358, attn_decoder_loss=0.2932, over 5793936.77 frames. ], batch size: 79, lr: 2.10e-02, grad_scale: 8.0
2024-09-16 21:49:04,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=85240.0, ans=0.125
2024-09-16 21:49:12,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=85240.0, ans=0.1
2024-09-16 21:49:18,340 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.158e+01 1.087e+02 1.227e+02 1.343e+02 2.511e+02, threshold=2.453e+02, percent-clipped=0.0
2024-09-16 21:49:23,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=85280.0, ans=0.2
2024-09-16 21:49:25,602 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=9.37 vs. limit=10.0
2024-09-16 21:49:28,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=85280.0, ans=0.0
2024-09-16 21:49:44,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=85320.0, ans=0.0
2024-09-16 21:49:47,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=85320.0, ans=0.125
2024-09-16 21:50:07,120 INFO [train.py:1198] (0/2) Epoch 5, batch 3250, loss[loss=0.3096, ctc_loss=0.2386, cr_loss=0.4452, attn_decoder_loss=0.3076, over 29729.00 frames. ], tot_loss[loss=0.2948, ctc_loss=0.2219, cr_loss=0.4357, attn_decoder_loss=0.2932, over 5800809.10 frames. ], batch size: 84, lr: 2.10e-02, grad_scale: 4.0
2024-09-16 21:50:25,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=85440.0, ans=0.1
2024-09-16 21:50:28,471 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:51:22,833 INFO [train.py:1198] (0/2) Epoch 5, batch 3300, loss[loss=0.3157, ctc_loss=0.2445, cr_loss=0.4474, attn_decoder_loss=0.3136, over 28228.00 frames. ], tot_loss[loss=0.2935, ctc_loss=0.2204, cr_loss=0.4341, attn_decoder_loss=0.292, over 5796769.19 frames. ], batch size: 111, lr: 2.10e-02, grad_scale: 8.0
2024-09-16 21:51:30,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=85600.0, ans=0.0
2024-09-16 21:51:40,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=85640.0, ans=0.125
2024-09-16 21:51:51,607 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.589e+01 1.170e+02 1.337e+02 1.496e+02 4.068e+02, threshold=2.673e+02, percent-clipped=4.0
2024-09-16 21:52:31,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=85760.0, ans=0.125
2024-09-16 21:52:42,504 INFO [train.py:1198] (0/2) Epoch 5, batch 3350, loss[loss=0.3052, ctc_loss=0.2301, cr_loss=0.4511, attn_decoder_loss=0.3035, over 28907.00 frames. ], tot_loss[loss=0.2943, ctc_loss=0.2215, cr_loss=0.4343, attn_decoder_loss=0.2927, over 5773077.88 frames. ], batch size: 104, lr: 2.10e-02, grad_scale: 4.0
2024-09-16 21:52:49,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0
2024-09-16 21:52:59,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=85840.0, ans=0.1
2024-09-16 21:53:01,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=85840.0, ans=0.1
2024-09-16 21:53:07,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=85840.0, ans=0.125
2024-09-16 21:53:08,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5
2024-09-16 21:53:18,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.58 vs. limit=15.0
2024-09-16 21:53:31,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=85920.0, ans=0.1
2024-09-16 21:53:54,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.40 vs. limit=15.0
2024-09-16 21:53:58,050 INFO [train.py:1198] (0/2) Epoch 5, batch 3400, loss[loss=0.2583, ctc_loss=0.1876, cr_loss=0.3957, attn_decoder_loss=0.2574, over 29342.00 frames. ], tot_loss[loss=0.2942, ctc_loss=0.2215, cr_loss=0.4345, attn_decoder_loss=0.2927, over 5767258.06 frames. ], batch size: 67, lr: 2.10e-02, grad_scale: 4.0
2024-09-16 21:54:07,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=86000.0, ans=0.025
2024-09-16 21:54:14,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=86040.0, ans=0.025
2024-09-16 21:54:16,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=86040.0, ans=0.125
2024-09-16 21:54:17,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=86040.0, ans=0.0
2024-09-16 21:54:18,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.07 vs. limit=15.0
2024-09-16 21:54:28,087 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.899e+01 1.163e+02 1.316e+02 1.513e+02 4.040e+02, threshold=2.631e+02, percent-clipped=2.0
2024-09-16 21:54:29,046 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.77 vs. limit=10.0
2024-09-16 21:54:34,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=86080.0, ans=0.125
2024-09-16 21:54:43,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=86120.0, ans=0.0
2024-09-16 21:54:52,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=86120.0, ans=0.0
2024-09-16 21:55:04,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=86160.0, ans=0.2
2024-09-16 21:55:07,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=86160.0, ans=0.125
2024-09-16 21:55:13,323 INFO [train.py:1198] (0/2) Epoch 5, batch 3450, loss[loss=0.3109, ctc_loss=0.2328, cr_loss=0.4524, attn_decoder_loss=0.3095, over 28217.00 frames. ], tot_loss[loss=0.294, ctc_loss=0.221, cr_loss=0.4344, attn_decoder_loss=0.2925, over 5774742.92 frames. ], batch size: 111, lr: 2.09e-02, grad_scale: 4.0
2024-09-16 21:55:14,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=86200.0, ans=0.125
2024-09-16 21:56:17,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=86360.0, ans=0.1
2024-09-16 21:56:19,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.45 vs. limit=10.0
2024-09-16 21:56:33,266 INFO [train.py:1198] (0/2) Epoch 5, batch 3500, loss[loss=0.2873, ctc_loss=0.2209, cr_loss=0.4289, attn_decoder_loss=0.2852, over 29327.00 frames. ], tot_loss[loss=0.2934, ctc_loss=0.2207, cr_loss=0.4341, attn_decoder_loss=0.2918, over 5775966.29 frames. ], batch size: 71, lr: 2.09e-02, grad_scale: 8.0
2024-09-16 21:56:44,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=86400.0, ans=0.1
2024-09-16 21:56:51,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=86440.0, ans=0.0
2024-09-16 21:56:53,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=86440.0, ans=0.0
2024-09-16 21:57:01,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=86480.0, ans=0.0
2024-09-16 21:57:02,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0
2024-09-16 21:57:04,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.011e+02 1.247e+02 1.357e+02 1.561e+02 2.944e+02, threshold=2.714e+02, percent-clipped=1.0
2024-09-16 21:57:14,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.46 vs. limit=22.5
2024-09-16 21:57:19,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=86520.0, ans=0.125
2024-09-16 21:57:28,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=86520.0, ans=0.125
2024-09-16 21:57:46,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=86600.0, ans=0.0
2024-09-16 21:57:48,009 INFO [train.py:1198] (0/2) Epoch 5, batch 3550, loss[loss=0.3126, ctc_loss=0.2443, cr_loss=0.4362, attn_decoder_loss=0.3105, over 29695.00 frames. ], tot_loss[loss=0.2934, ctc_loss=0.2205, cr_loss=0.4343, attn_decoder_loss=0.2918, over 5781857.09 frames. ], batch size: 89, lr: 2.09e-02, grad_scale: 4.0
2024-09-16 21:58:10,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=86640.0, ans=0.0
2024-09-16 21:58:15,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=86640.0, ans=0.025
2024-09-16 21:58:21,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=86680.0, ans=0.125
2024-09-16 21:58:57,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86760.0, ans=0.1
2024-09-16 21:59:00,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=15.0
2024-09-16 21:59:02,142 INFO [train.py:1198] (0/2) Epoch 5, batch 3600, loss[loss=0.2711, ctc_loss=0.1974, cr_loss=0.4118, attn_decoder_loss=0.2701, over 29489.00 frames. ], tot_loss[loss=0.2934, ctc_loss=0.2201, cr_loss=0.4339, attn_decoder_loss=0.2919, over 5791421.83 frames.
], batch size: 77, lr: 2.09e-02, grad_scale: 8.0 2024-09-16 21:59:11,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=86800.0, ans=0.125 2024-09-16 21:59:21,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=86840.0, ans=0.05 2024-09-16 21:59:23,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=86840.0, ans=0.0 2024-09-16 21:59:24,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=86840.0, ans=0.125 2024-09-16 21:59:24,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=86840.0, ans=0.09899494936611666 2024-09-16 21:59:34,636 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.246e+01 1.105e+02 1.213e+02 1.386e+02 4.333e+02, threshold=2.426e+02, percent-clipped=4.0 2024-09-16 21:59:36,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=86880.0, ans=0.125 2024-09-16 22:00:16,129 INFO [train.py:1198] (0/2) Epoch 5, batch 3650, loss[loss=0.317, ctc_loss=0.2411, cr_loss=0.4715, attn_decoder_loss=0.3149, over 29514.00 frames. ], tot_loss[loss=0.2928, ctc_loss=0.2196, cr_loss=0.4331, attn_decoder_loss=0.2913, over 5793465.93 frames. 
], batch size: 90, lr: 2.08e-02, grad_scale: 4.0 2024-09-16 22:00:17,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=87000.0, ans=0.5 2024-09-16 22:00:19,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=87000.0, ans=0.0 2024-09-16 22:00:37,656 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.60 vs. limit=10.0 2024-09-16 22:00:38,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=87040.0, ans=0.125 2024-09-16 22:00:53,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=87080.0, ans=0.125 2024-09-16 22:00:58,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-09-16 22:01:19,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=87160.0, ans=0.0 2024-09-16 22:01:31,068 INFO [train.py:1198] (0/2) Epoch 5, batch 3700, loss[loss=0.3121, ctc_loss=0.2418, cr_loss=0.4866, attn_decoder_loss=0.3091, over 29699.00 frames. ], tot_loss[loss=0.2929, ctc_loss=0.2191, cr_loss=0.4328, attn_decoder_loss=0.2915, over 5804433.85 frames. 
], batch size: 84, lr: 2.08e-02, grad_scale: 8.0 2024-09-16 22:01:40,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=87200.0, ans=0.07 2024-09-16 22:01:43,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=87200.0, ans=0.2 2024-09-16 22:01:43,862 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5 2024-09-16 22:01:44,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=87240.0, ans=0.07 2024-09-16 22:01:59,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=87280.0, ans=0.05 2024-09-16 22:02:01,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.24 vs. 
limit=15.0 2024-09-16 22:02:05,212 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.891e+01 1.136e+02 1.234e+02 1.353e+02 4.194e+02, threshold=2.467e+02, percent-clipped=4.0 2024-09-16 22:02:05,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=87280.0, ans=0.04949747468305833 2024-09-16 22:02:14,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=87320.0, ans=0.125 2024-09-16 22:02:24,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=87320.0, ans=0.125 2024-09-16 22:02:26,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=87320.0, ans=0.125 2024-09-16 22:02:47,096 INFO [train.py:1198] (0/2) Epoch 5, batch 3750, loss[loss=0.2524, ctc_loss=0.1835, cr_loss=0.3876, attn_decoder_loss=0.2515, over 29348.00 frames. ], tot_loss[loss=0.2923, ctc_loss=0.2186, cr_loss=0.4328, attn_decoder_loss=0.2909, over 5807766.19 frames. ], batch size: 67, lr: 2.08e-02, grad_scale: 4.0 2024-09-16 22:02:53,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=87400.0, ans=0.125 2024-09-16 22:02:56,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=87400.0, ans=0.125 2024-09-16 22:02:59,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=87400.0, ans=0.0 2024-09-16 22:03:06,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=15.35 vs. 
limit=15.0 2024-09-16 22:03:06,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.59 vs. limit=15.0 2024-09-16 22:03:49,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=87560.0, ans=0.0 2024-09-16 22:03:57,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=87560.0, ans=0.025 2024-09-16 22:04:02,947 INFO [train.py:1198] (0/2) Epoch 5, batch 3800, loss[loss=0.3029, ctc_loss=0.2253, cr_loss=0.4539, attn_decoder_loss=0.3015, over 29630.00 frames. ], tot_loss[loss=0.2923, ctc_loss=0.219, cr_loss=0.4329, attn_decoder_loss=0.2908, over 5798270.30 frames. ], batch size: 86, lr: 2.08e-02, grad_scale: 4.0 2024-09-16 22:04:07,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87600.0, ans=0.1 2024-09-16 22:04:38,689 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.009e+02 1.217e+02 1.354e+02 1.572e+02 4.220e+02, threshold=2.708e+02, percent-clipped=3.0 2024-09-16 22:05:02,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=87760.0, ans=0.125 2024-09-16 22:05:06,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87760.0, ans=0.1 2024-09-16 22:05:17,042 INFO [train.py:1198] (0/2) Epoch 5, batch 3850, loss[loss=0.3217, ctc_loss=0.2476, cr_loss=0.4838, attn_decoder_loss=0.3192, over 29273.00 frames. ], tot_loss[loss=0.292, ctc_loss=0.2187, cr_loss=0.4329, attn_decoder_loss=0.2906, over 5813039.80 frames. 
], batch size: 100, lr: 2.08e-02, grad_scale: 4.0 2024-09-16 22:05:20,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=87800.0, ans=0.125 2024-09-16 22:05:26,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.0 2024-09-16 22:05:50,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=87880.0, ans=0.0 2024-09-16 22:06:03,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=87920.0, ans=0.0 2024-09-16 22:06:19,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87960.0, ans=0.1 2024-09-16 22:06:21,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.02 vs. limit=22.5 2024-09-16 22:06:28,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=87960.0, ans=0.125 2024-09-16 22:06:31,877 INFO [train.py:1198] (0/2) Epoch 5, batch 3900, loss[loss=0.2965, ctc_loss=0.2251, cr_loss=0.4382, attn_decoder_loss=0.2947, over 29638.00 frames. ], tot_loss[loss=0.2928, ctc_loss=0.2192, cr_loss=0.4339, attn_decoder_loss=0.2913, over 5817462.64 frames. ], batch size: 86, lr: 2.07e-02, grad_scale: 8.0 2024-09-16 22:06:34,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.77 vs. limit=15.0 2024-09-16 22:06:57,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.73 vs. 
limit=22.5 2024-09-16 22:06:58,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=88040.0, ans=0.125 2024-09-16 22:07:08,815 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.846e+01 1.141e+02 1.281e+02 1.435e+02 2.843e+02, threshold=2.562e+02, percent-clipped=1.0 2024-09-16 22:07:17,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=88120.0, ans=0.125 2024-09-16 22:07:47,171 INFO [train.py:1198] (0/2) Epoch 5, batch 3950, loss[loss=0.3155, ctc_loss=0.2384, cr_loss=0.4818, attn_decoder_loss=0.3133, over 29475.00 frames. ], tot_loss[loss=0.2926, ctc_loss=0.2184, cr_loss=0.433, attn_decoder_loss=0.2913, over 5836656.11 frames. ], batch size: 97, lr: 2.07e-02, grad_scale: 4.0 2024-09-16 22:08:12,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2024-09-16 22:08:21,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=88280.0, ans=0.0 2024-09-16 22:08:24,327 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:08:31,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=88320.0, ans=0.0 2024-09-16 22:08:43,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=88320.0, ans=0.2 2024-09-16 22:08:51,296 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.35 vs. 
limit=22.5 2024-09-16 22:08:52,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=88360.0, ans=0.125 2024-09-16 22:09:01,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=88400.0, ans=0.2 2024-09-16 22:09:02,282 INFO [train.py:1198] (0/2) Epoch 5, batch 4000, loss[loss=0.2603, ctc_loss=0.1848, cr_loss=0.3943, attn_decoder_loss=0.2599, over 29515.00 frames. ], tot_loss[loss=0.293, ctc_loss=0.2192, cr_loss=0.4338, attn_decoder_loss=0.2916, over 5812944.76 frames. ], batch size: 74, lr: 2.07e-02, grad_scale: 8.0 2024-09-16 22:09:17,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=88440.0, ans=12.0 2024-09-16 22:09:40,467 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.385e+01 1.172e+02 1.271e+02 1.397e+02 4.120e+02, threshold=2.542e+02, percent-clipped=3.0 2024-09-16 22:09:51,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=88520.0, ans=0.1 2024-09-16 22:10:09,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.69 vs. limit=15.0 2024-09-16 22:10:16,078 INFO [train.py:1198] (0/2) Epoch 5, batch 4050, loss[loss=0.3295, ctc_loss=0.287, cr_loss=0.4395, attn_decoder_loss=0.3245, over 20929.00 frames. ], tot_loss[loss=0.2929, ctc_loss=0.2193, cr_loss=0.4333, attn_decoder_loss=0.2914, over 5797448.39 frames. 
], batch size: 210, lr: 2.07e-02, grad_scale: 4.0 2024-09-16 22:10:23,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=88600.0, ans=0.125 2024-09-16 22:10:31,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=88640.0, ans=0.125 2024-09-16 22:10:45,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=88680.0, ans=0.125 2024-09-16 22:10:48,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=88680.0, ans=0.125 2024-09-16 22:11:16,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=88760.0, ans=0.125 2024-09-16 22:11:16,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=88760.0, ans=0.125 2024-09-16 22:11:25,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=88760.0, ans=0.09899494936611666 2024-09-16 22:11:30,111 INFO [train.py:1198] (0/2) Epoch 5, batch 4100, loss[loss=0.3119, ctc_loss=0.2347, cr_loss=0.4778, attn_decoder_loss=0.3098, over 29492.00 frames. ], tot_loss[loss=0.2931, ctc_loss=0.2199, cr_loss=0.4341, attn_decoder_loss=0.2915, over 5792626.77 frames. 
], batch size: 90, lr: 2.07e-02, grad_scale: 8.0 2024-09-16 22:11:33,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=88800.0, ans=0.125 2024-09-16 22:11:37,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=88800.0, ans=0.1 2024-09-16 22:11:39,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=88800.0, ans=0.0 2024-09-16 22:11:48,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2024-09-16 22:11:52,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=88840.0, ans=0.2 2024-09-16 22:11:57,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=88840.0, ans=0.0 2024-09-16 22:12:01,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=88880.0, ans=0.025 2024-09-16 22:12:11,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.506e+01 1.179e+02 1.301e+02 1.533e+02 3.400e+02, threshold=2.603e+02, percent-clipped=2.0 2024-09-16 22:12:15,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=88920.0, ans=0.025 2024-09-16 22:12:32,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=88960.0, ans=0.1 2024-09-16 22:12:42,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=88960.0, ans=0.0 2024-09-16 22:12:45,306 INFO [train.py:1198] (0/2) Epoch 5, batch 4150, loss[loss=0.2938, ctc_loss=0.2149, 
cr_loss=0.4184, attn_decoder_loss=0.2933, over 29508.00 frames. ], tot_loss[loss=0.2921, ctc_loss=0.2188, cr_loss=0.4327, attn_decoder_loss=0.2906, over 5798571.41 frames. ], batch size: 77, lr: 2.06e-02, grad_scale: 4.0 2024-09-16 22:12:59,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=89040.0, ans=0.0 2024-09-16 22:13:08,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=89040.0, ans=0.0 2024-09-16 22:13:09,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.21 vs. limit=15.0 2024-09-16 22:13:15,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=89080.0, ans=0.125 2024-09-16 22:13:17,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=89080.0, ans=0.0 2024-09-16 22:13:26,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=89080.0, ans=0.035 2024-09-16 22:13:59,806 INFO [train.py:1198] (0/2) Epoch 5, batch 4200, loss[loss=0.3275, ctc_loss=0.2623, cr_loss=0.4982, attn_decoder_loss=0.3237, over 29518.00 frames. ], tot_loss[loss=0.2925, ctc_loss=0.2189, cr_loss=0.4328, attn_decoder_loss=0.291, over 5800459.03 frames. 
], batch size: 90, lr: 2.06e-02, grad_scale: 8.0 2024-09-16 22:14:00,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=89200.0, ans=0.05 2024-09-16 22:14:01,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=89200.0, ans=0.125 2024-09-16 22:14:02,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=89200.0, ans=0.125 2024-09-16 22:14:13,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=89240.0, ans=0.2 2024-09-16 22:14:14,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=89240.0, ans=0.05 2024-09-16 22:14:41,006 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.772e+01 1.118e+02 1.246e+02 1.404e+02 2.463e+02, threshold=2.492e+02, percent-clipped=0.0 2024-09-16 22:15:08,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89360.0, ans=0.1 2024-09-16 22:15:12,895 INFO [train.py:1198] (0/2) Epoch 5, batch 4250, loss[loss=0.2561, ctc_loss=0.1859, cr_loss=0.3748, attn_decoder_loss=0.2555, over 29523.00 frames. ], tot_loss[loss=0.2923, ctc_loss=0.2182, cr_loss=0.4327, attn_decoder_loss=0.2909, over 5806937.85 frames. ], batch size: 74, lr: 2.06e-02, grad_scale: 4.0 2024-09-16 22:15:19,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=89400.0, ans=0.2 2024-09-16 22:15:21,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.61 vs. 
limit=15.0 2024-09-16 22:15:45,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=89480.0, ans=0.125 2024-09-16 22:15:45,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=89480.0, ans=0.2 2024-09-16 22:15:53,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0 2024-09-16 22:15:56,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=89520.0, ans=0.125 2024-09-16 22:16:16,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=89560.0, ans=0.125 2024-09-16 22:16:27,391 INFO [train.py:1198] (0/2) Epoch 5, batch 4300, loss[loss=0.29, ctc_loss=0.2098, cr_loss=0.3975, attn_decoder_loss=0.2901, over 29523.00 frames. ], tot_loss[loss=0.2927, ctc_loss=0.2186, cr_loss=0.4331, attn_decoder_loss=0.2913, over 5796351.44 frames. ], batch size: 87, lr: 2.06e-02, grad_scale: 8.0 2024-09-16 22:16:27,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=89600.0, ans=0.125 2024-09-16 22:16:32,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=89600.0, ans=0.2 2024-09-16 22:16:38,381 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.05 vs. 
limit=10.0 2024-09-16 22:16:52,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=89640.0, ans=0.125 2024-09-16 22:17:01,752 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.56 vs. limit=15.0 2024-09-16 22:17:08,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=89680.0, ans=0.125 2024-09-16 22:17:11,160 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.609e+01 1.163e+02 1.276e+02 1.524e+02 3.260e+02, threshold=2.552e+02, percent-clipped=3.0 2024-09-16 22:17:14,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89720.0, ans=0.1 2024-09-16 22:17:30,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=89760.0, ans=0.125 2024-09-16 22:17:42,370 INFO [train.py:1198] (0/2) Epoch 5, batch 4350, loss[loss=0.3087, ctc_loss=0.236, cr_loss=0.4377, attn_decoder_loss=0.3071, over 29467.00 frames. ], tot_loss[loss=0.2965, ctc_loss=0.222, cr_loss=0.4384, attn_decoder_loss=0.295, over 5798072.32 frames. ], batch size: 97, lr: 2.06e-02, grad_scale: 4.0 2024-09-16 22:17:46,444 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.05 vs. 
limit=22.5 2024-09-16 22:18:02,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=89840.0, ans=0.125 2024-09-16 22:18:15,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=89880.0, ans=0.125 2024-09-16 22:18:37,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=89920.0, ans=0.025 2024-09-16 22:18:37,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=89920.0, ans=0.2 2024-09-16 22:18:56,342 INFO [train.py:1198] (0/2) Epoch 5, batch 4400, loss[loss=0.3149, ctc_loss=0.2453, cr_loss=0.4474, attn_decoder_loss=0.3127, over 27220.00 frames. ], tot_loss[loss=0.2988, ctc_loss=0.2242, cr_loss=0.4402, attn_decoder_loss=0.2973, over 5768003.78 frames. ], batch size: 124, lr: 2.05e-02, grad_scale: 8.0 2024-09-16 22:18:56,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=90000.0, ans=10.0 2024-09-16 22:19:29,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=90080.0, ans=0.0 2024-09-16 22:19:40,703 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.790e+01 1.097e+02 1.213e+02 1.417e+02 2.444e+02, threshold=2.426e+02, percent-clipped=0.0 2024-09-16 22:19:56,809 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.85 vs. limit=10.0 2024-09-16 22:20:10,762 INFO [train.py:1198] (0/2) Epoch 5, batch 4450, loss[loss=0.3262, ctc_loss=0.2807, cr_loss=0.4255, attn_decoder_loss=0.3218, over 19398.00 frames. ], tot_loss[loss=0.3027, ctc_loss=0.2308, cr_loss=0.4434, attn_decoder_loss=0.3008, over 5576475.78 frames. 
], batch size: 209, lr: 2.05e-02, grad_scale: 4.0 2024-09-16 22:20:34,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=90240.0, ans=0.125 2024-09-16 22:20:35,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=90240.0, ans=0.0 2024-09-16 22:20:43,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=22.5 2024-09-16 22:21:10,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=90360.0, ans=0.125 2024-09-16 22:21:26,520 INFO [train.py:1198] (0/2) Epoch 5, batch 4500, loss[loss=0.3242, ctc_loss=0.2748, cr_loss=0.4516, attn_decoder_loss=0.3196, over 19957.00 frames. ], tot_loss[loss=0.3071, ctc_loss=0.2398, cr_loss=0.4451, attn_decoder_loss=0.3047, over 5236382.48 frames. ], batch size: 210, lr: 2.05e-02, grad_scale: 8.0 2024-09-16 22:21:28,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=90400.0, ans=0.125 2024-09-16 22:21:31,837 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.82 vs. limit=15.0 2024-09-16 22:21:33,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=22.5 2024-09-16 22:22:03,467 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-5.pt 2024-09-16 22:22:53,646 INFO [train.py:1198] (0/2) Epoch 6, batch 0, loss[loss=0.3298, ctc_loss=0.2105, cr_loss=0.4328, attn_decoder_loss=0.3334, over 29623.00 frames. 
], tot_loss[loss=0.3298, ctc_loss=0.2105, cr_loss=0.4328, attn_decoder_loss=0.3334, over 29623.00 frames. ], batch size: 73, lr: 1.91e-02, grad_scale: 4.0 2024-09-16 22:22:53,647 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 22:22:59,667 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.5120, 4.9777, 4.6828, 4.9199], device='cuda:0') 2024-09-16 22:23:11,939 INFO [train.py:1230] (0/2) Epoch 6, validation: loss=0.2379, ctc_loss=0.06988, cr_loss=4.72e-15, attn_decoder_loss=0.2566, over 944034.00 frames. 2024-09-16 22:23:11,939 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-16 22:23:13,423 WARNING [optim.py:503] (0/2) Scaling gradients by 0.0589279979467392, model_norm_threshold=242.58145141601562 2024-09-16 22:23:13,643 WARNING [optim.py:575] (0/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.1.self_attn.linear_k.weight with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.469e+06, grad_sumsq=5.019e+05, orig_rms_sq=8.904e+00 2024-09-16 22:23:21,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=90500.0, ans=0.125 2024-09-16 22:23:22,772 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.192e+02 1.351e+02 1.731e+02 4.117e+03, threshold=2.703e+02, percent-clipped=9.0 2024-09-16 22:23:23,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=90500.0, ans=0.125 2024-09-16 22:23:29,674 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.83 vs. 
limit=12.0 2024-09-16 22:23:35,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90540.0, ans=0.1 2024-09-16 22:23:35,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=90540.0, ans=0.125 2024-09-16 22:23:40,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=90540.0, ans=0.0 2024-09-16 22:23:52,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=90580.0, ans=0.125 2024-09-16 22:23:59,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=90620.0, ans=0.2 2024-09-16 22:24:03,602 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2024-09-16 22:24:17,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=90660.0, ans=0.1 2024-09-16 22:24:17,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=90660.0, ans=0.125 2024-09-16 22:24:28,161 INFO [train.py:1198] (0/2) Epoch 6, batch 50, loss[loss=0.2572, ctc_loss=0.1852, cr_loss=0.3854, attn_decoder_loss=0.2567, over 29449.00 frames. ], tot_loss[loss=0.2984, ctc_loss=0.2244, cr_loss=0.4364, attn_decoder_loss=0.297, over 1267546.53 frames. 
], batch size: 70, lr: 1.91e-02, grad_scale: 4.0 2024-09-16 22:24:29,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=90700.0, ans=0.125 2024-09-16 22:24:30,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=90700.0, ans=0.1 2024-09-16 22:24:36,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=90700.0, ans=0.2 2024-09-16 22:24:40,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=90700.0, ans=0.025 2024-09-16 22:24:43,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=90740.0, ans=0.125 2024-09-16 22:24:50,570 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.27 vs. limit=15.0 2024-09-16 22:24:51,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=90740.0, ans=0.2 2024-09-16 22:25:23,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=90820.0, ans=0.125 2024-09-16 22:25:45,648 INFO [train.py:1198] (0/2) Epoch 6, batch 100, loss[loss=0.2812, ctc_loss=0.2166, cr_loss=0.4336, attn_decoder_loss=0.2788, over 29517.00 frames. ], tot_loss[loss=0.2979, ctc_loss=0.223, cr_loss=0.4385, attn_decoder_loss=0.2965, over 2251293.29 frames. 
], batch size: 76, lr: 1.91e-02, grad_scale: 8.0 2024-09-16 22:25:57,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.583e+01 1.187e+02 1.367e+02 1.634e+02 6.216e+02, threshold=2.735e+02, percent-clipped=2.0 2024-09-16 22:26:00,704 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:26:24,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=90980.0, ans=0.0 2024-09-16 22:26:26,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=90980.0, ans=0.0 2024-09-16 22:26:51,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=91060.0, ans=0.125 2024-09-16 22:27:01,989 INFO [train.py:1198] (0/2) Epoch 6, batch 150, loss[loss=0.2533, ctc_loss=0.1777, cr_loss=0.3954, attn_decoder_loss=0.253, over 29461.00 frames. ], tot_loss[loss=0.2946, ctc_loss=0.2201, cr_loss=0.436, attn_decoder_loss=0.2932, over 3046386.01 frames. 
], batch size: 70, lr: 1.91e-02, grad_scale: 4.0 2024-09-16 22:27:20,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91140.0, ans=0.1 2024-09-16 22:27:22,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff2.min_abs, batch_count=91140.0, ans=0.1 2024-09-16 22:27:25,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=91140.0, ans=0.2 2024-09-16 22:27:28,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91140.0, ans=0.1 2024-09-16 22:27:38,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=91180.0, ans=0.07 2024-09-16 22:27:50,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=91220.0, ans=0.5 2024-09-16 22:27:55,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=91220.0, ans=0.0 2024-09-16 22:27:55,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91220.0, ans=0.1 2024-09-16 22:28:00,575 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0 2024-09-16 22:28:16,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=91300.0, ans=0.0 2024-09-16 22:28:17,517 INFO [train.py:1198] (0/2) Epoch 6, batch 200, loss[loss=0.3061, ctc_loss=0.2364, cr_loss=0.4242, attn_decoder_loss=0.3044, over 27098.00 frames. ], tot_loss[loss=0.2925, ctc_loss=0.2176, cr_loss=0.4332, attn_decoder_loss=0.2913, over 3659059.59 frames. 
], batch size: 124, lr: 1.90e-02, grad_scale: 8.0 2024-09-16 22:28:21,286 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.52 vs. limit=6.0 2024-09-16 22:28:29,580 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.192e+01 1.064e+02 1.171e+02 1.354e+02 3.116e+02, threshold=2.342e+02, percent-clipped=1.0 2024-09-16 22:28:57,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=91380.0, ans=0.125 2024-09-16 22:29:09,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=91420.0, ans=0.07 2024-09-16 22:29:27,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=91460.0, ans=0.125 2024-09-16 22:29:35,028 INFO [train.py:1198] (0/2) Epoch 6, batch 250, loss[loss=0.3082, ctc_loss=0.2392, cr_loss=0.4386, attn_decoder_loss=0.3061, over 29244.00 frames. ], tot_loss[loss=0.2918, ctc_loss=0.2162, cr_loss=0.4324, attn_decoder_loss=0.2906, over 4141090.76 frames. ], batch size: 100, lr: 1.90e-02, grad_scale: 4.0 2024-09-16 22:29:41,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91500.0, ans=0.1 2024-09-16 22:30:39,355 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:30:49,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91660.0, ans=0.1 2024-09-16 22:30:52,394 INFO [train.py:1198] (0/2) Epoch 6, batch 300, loss[loss=0.3067, ctc_loss=0.2385, cr_loss=0.4631, attn_decoder_loss=0.304, over 29540.00 frames. 
], tot_loss[loss=0.2907, ctc_loss=0.215, cr_loss=0.4321, attn_decoder_loss=0.2895, over 4509657.22 frames. ], batch size: 92, lr: 1.90e-02, grad_scale: 8.0 2024-09-16 22:31:07,398 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.038e+01 1.116e+02 1.244e+02 1.492e+02 2.099e+02, threshold=2.488e+02, percent-clipped=0.0 2024-09-16 22:31:09,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=91740.0, ans=0.125 2024-09-16 22:31:13,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=91740.0, ans=0.125 2024-09-16 22:31:27,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.32 vs. limit=10.0 2024-09-16 22:31:33,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=91780.0, ans=0.0 2024-09-16 22:31:46,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=91820.0, ans=0.0 2024-09-16 22:32:07,878 INFO [train.py:1198] (0/2) Epoch 6, batch 350, loss[loss=0.2523, ctc_loss=0.1761, cr_loss=0.3735, attn_decoder_loss=0.2525, over 29309.00 frames. ], tot_loss[loss=0.291, ctc_loss=0.215, cr_loss=0.4327, attn_decoder_loss=0.2898, over 4794283.29 frames. ], batch size: 71, lr: 1.90e-02, grad_scale: 4.0 2024-09-16 22:32:32,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.23 vs. 
limit=15.0 2024-09-16 22:32:35,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=91940.0, ans=0.125 2024-09-16 22:32:37,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=11.32 vs. limit=15.0 2024-09-16 22:32:52,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=91980.0, ans=0.125 2024-09-16 22:33:17,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=92060.0, ans=0.0 2024-09-16 22:33:19,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=92060.0, ans=0.0 2024-09-16 22:33:23,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92060.0, ans=0.1 2024-09-16 22:33:26,405 INFO [train.py:1198] (0/2) Epoch 6, batch 400, loss[loss=0.2894, ctc_loss=0.2176, cr_loss=0.4517, attn_decoder_loss=0.2873, over 29707.00 frames. ], tot_loss[loss=0.2901, ctc_loss=0.2138, cr_loss=0.4316, attn_decoder_loss=0.289, over 5024448.54 frames. 
], batch size: 82, lr: 1.90e-02, grad_scale: 8.0 2024-09-16 22:33:38,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=92100.0, ans=0.0 2024-09-16 22:33:43,123 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.703e+01 1.115e+02 1.264e+02 1.415e+02 3.594e+02, threshold=2.527e+02, percent-clipped=2.0 2024-09-16 22:33:46,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=92140.0, ans=10.0 2024-09-16 22:34:01,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=92180.0, ans=0.2 2024-09-16 22:34:30,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=92260.0, ans=0.125 2024-09-16 22:34:45,093 INFO [train.py:1198] (0/2) Epoch 6, batch 450, loss[loss=0.301, ctc_loss=0.2198, cr_loss=0.4621, attn_decoder_loss=0.2998, over 29693.00 frames. ], tot_loss[loss=0.29, ctc_loss=0.214, cr_loss=0.432, attn_decoder_loss=0.2889, over 5184053.35 frames. ], batch size: 83, lr: 1.89e-02, grad_scale: 4.0 2024-09-16 22:34:49,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=92300.0, ans=0.125 2024-09-16 22:35:11,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.70 vs. 
limit=15.0 2024-09-16 22:35:12,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=92340.0, ans=0.0 2024-09-16 22:35:12,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=92340.0, ans=0.05 2024-09-16 22:35:40,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=92420.0, ans=0.125 2024-09-16 22:35:40,534 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:35:41,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92420.0, ans=0.1 2024-09-16 22:35:50,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=92460.0, ans=0.125 2024-09-16 22:35:56,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.40 vs. limit=22.5 2024-09-16 22:36:01,408 INFO [train.py:1198] (0/2) Epoch 6, batch 500, loss[loss=0.3241, ctc_loss=0.2505, cr_loss=0.4861, attn_decoder_loss=0.3215, over 29445.00 frames. ], tot_loss[loss=0.2891, ctc_loss=0.213, cr_loss=0.4307, attn_decoder_loss=0.288, over 5327380.28 frames. 
], batch size: 94, lr: 1.89e-02, grad_scale: 8.0 2024-09-16 22:36:18,344 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.047e+01 1.094e+02 1.193e+02 1.318e+02 2.724e+02, threshold=2.387e+02, percent-clipped=2.0 2024-09-16 22:36:40,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=92580.0, ans=0.125 2024-09-16 22:36:50,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.63 vs. limit=15.0 2024-09-16 22:36:55,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=15.0 2024-09-16 22:36:59,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.90 vs. limit=10.0 2024-09-16 22:37:02,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=92620.0, ans=0.125 2024-09-16 22:37:07,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.47 vs. limit=15.0 2024-09-16 22:37:20,293 INFO [train.py:1198] (0/2) Epoch 6, batch 550, loss[loss=0.2957, ctc_loss=0.2234, cr_loss=0.4255, attn_decoder_loss=0.2943, over 28881.00 frames. ], tot_loss[loss=0.2894, ctc_loss=0.2133, cr_loss=0.4314, attn_decoder_loss=0.2883, over 5421017.69 frames. ], batch size: 104, lr: 1.89e-02, grad_scale: 4.0 2024-09-16 22:38:03,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=92780.0, ans=0.125 2024-09-16 22:38:39,400 INFO [train.py:1198] (0/2) Epoch 6, batch 600, loss[loss=0.2999, ctc_loss=0.2273, cr_loss=0.4301, attn_decoder_loss=0.2984, over 29249.00 frames. 
], tot_loss[loss=0.2896, ctc_loss=0.2133, cr_loss=0.4314, attn_decoder_loss=0.2885, over 5508726.32 frames. ], batch size: 100, lr: 1.89e-02, grad_scale: 8.0 2024-09-16 22:38:44,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=92900.0, ans=0.125 2024-09-16 22:38:48,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2024-09-16 22:38:49,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2024-09-16 22:38:53,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=92940.0, ans=0.1 2024-09-16 22:38:59,011 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.669e+01 1.124e+02 1.276e+02 1.446e+02 7.170e+02, threshold=2.552e+02, percent-clipped=2.0 2024-09-16 22:39:06,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=92940.0, ans=0.025 2024-09-16 22:39:13,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=11.64 vs. 
limit=15.0 2024-09-16 22:39:22,162 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:39:25,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=93020.0, ans=0.07 2024-09-16 22:39:46,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=93060.0, ans=0.125 2024-09-16 22:39:48,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93060.0, ans=0.1 2024-09-16 22:39:48,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93060.0, ans=0.1 2024-09-16 22:39:55,421 INFO [train.py:1198] (0/2) Epoch 6, batch 650, loss[loss=0.2855, ctc_loss=0.2123, cr_loss=0.4081, attn_decoder_loss=0.2845, over 29770.00 frames. ], tot_loss[loss=0.2884, ctc_loss=0.2119, cr_loss=0.4301, attn_decoder_loss=0.2873, over 5586050.77 frames. ], batch size: 81, lr: 1.89e-02, grad_scale: 4.0 2024-09-16 22:39:57,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=93100.0, ans=0.0 2024-09-16 22:40:06,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=93100.0, ans=0.125 2024-09-16 22:40:14,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=93140.0, ans=0.125 2024-09-16 22:40:46,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.34 vs. 
limit=22.5 2024-09-16 22:40:48,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=93220.0, ans=0.125 2024-09-16 22:40:54,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=93220.0, ans=0.0 2024-09-16 22:40:55,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=93220.0, ans=0.125 2024-09-16 22:41:05,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=93260.0, ans=0.0 2024-09-16 22:41:13,970 INFO [train.py:1198] (0/2) Epoch 6, batch 700, loss[loss=0.2842, ctc_loss=0.2146, cr_loss=0.4334, attn_decoder_loss=0.2823, over 29533.00 frames. ], tot_loss[loss=0.2897, ctc_loss=0.2131, cr_loss=0.4316, attn_decoder_loss=0.2886, over 5636807.70 frames. ], batch size: 76, lr: 1.89e-02, grad_scale: 8.0 2024-09-16 22:41:25,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.35 vs. 
limit=12.0 2024-09-16 22:41:27,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=93340.0, ans=0.0 2024-09-16 22:41:35,157 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.774e+01 1.081e+02 1.183e+02 1.296e+02 3.770e+02, threshold=2.365e+02, percent-clipped=2.0 2024-09-16 22:41:46,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=93380.0, ans=0.0 2024-09-16 22:41:53,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=93380.0, ans=0.125 2024-09-16 22:42:10,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=93420.0, ans=0.0 2024-09-16 22:42:22,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=93460.0, ans=0.0 2024-09-16 22:42:32,786 INFO [train.py:1198] (0/2) Epoch 6, batch 750, loss[loss=0.2889, ctc_loss=0.2072, cr_loss=0.433, attn_decoder_loss=0.2883, over 29709.00 frames. ], tot_loss[loss=0.2891, ctc_loss=0.2127, cr_loss=0.4319, attn_decoder_loss=0.2879, over 5676630.50 frames. ], batch size: 82, lr: 1.88e-02, grad_scale: 4.0 2024-09-16 22:42:37,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=93500.0, ans=0.2 2024-09-16 22:42:52,111 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.16 vs. limit=15.0 2024-09-16 22:43:17,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. 
limit=15.0 2024-09-16 22:43:49,718 INFO [train.py:1198] (0/2) Epoch 6, batch 800, loss[loss=0.271, ctc_loss=0.2022, cr_loss=0.4283, attn_decoder_loss=0.2691, over 29584.00 frames. ], tot_loss[loss=0.289, ctc_loss=0.2124, cr_loss=0.432, attn_decoder_loss=0.2879, over 5707342.02 frames. ], batch size: 73, lr: 1.88e-02, grad_scale: 8.0 2024-09-16 22:43:58,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.50 vs. limit=12.0 2024-09-16 22:44:12,705 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.997e+01 1.068e+02 1.156e+02 1.307e+02 3.410e+02, threshold=2.312e+02, percent-clipped=1.0 2024-09-16 22:44:21,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.89 vs. limit=15.0 2024-09-16 22:44:30,156 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2024-09-16 22:44:35,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=93820.0, ans=0.0 2024-09-16 22:44:52,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=93860.0, ans=0.0 2024-09-16 22:45:04,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2024-09-16 22:45:08,096 INFO [train.py:1198] (0/2) Epoch 6, batch 850, loss[loss=0.3093, ctc_loss=0.2226, cr_loss=0.4236, attn_decoder_loss=0.3095, over 29681.00 frames. ], tot_loss[loss=0.2882, ctc_loss=0.2114, cr_loss=0.4305, attn_decoder_loss=0.2872, over 5736315.16 frames. 
], batch size: 89, lr: 1.88e-02, grad_scale: 4.0 2024-09-16 22:45:17,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=93900.0, ans=0.1 2024-09-16 22:45:20,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=15.0 2024-09-16 22:45:21,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=93940.0, ans=0.07 2024-09-16 22:45:37,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=93980.0, ans=0.125 2024-09-16 22:45:44,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=93980.0, ans=0.1 2024-09-16 22:45:47,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=93980.0, ans=0.0 2024-09-16 22:46:03,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=94020.0, ans=0.0 2024-09-16 22:46:04,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94020.0, ans=0.1 2024-09-16 22:46:06,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=94020.0, ans=0.025 2024-09-16 22:46:22,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=94060.0, ans=0.2 2024-09-16 22:46:24,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=94060.0, ans=0.2 2024-09-16 22:46:27,094 INFO [train.py:1198] (0/2) Epoch 6, batch 900, loss[loss=0.2642, ctc_loss=0.1927, cr_loss=0.3979, 
attn_decoder_loss=0.2633, over 29558.00 frames. ], tot_loss[loss=0.2885, ctc_loss=0.2118, cr_loss=0.4311, attn_decoder_loss=0.2875, over 5740180.13 frames. ], batch size: 73, lr: 1.88e-02, grad_scale: 8.0 2024-09-16 22:46:27,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=94100.0, ans=0.125 2024-09-16 22:46:34,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94100.0, ans=0.1 2024-09-16 22:46:37,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=94100.0, ans=0.025 2024-09-16 22:46:39,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=94100.0, ans=0.125 2024-09-16 22:46:49,681 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.705e+01 1.096e+02 1.207e+02 1.371e+02 3.827e+02, threshold=2.414e+02, percent-clipped=1.0 2024-09-16 22:46:59,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=94180.0, ans=0.125 2024-09-16 22:47:10,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=94180.0, ans=15.0 2024-09-16 22:47:17,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=94220.0, ans=0.125 2024-09-16 22:47:29,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=94260.0, ans=0.2 2024-09-16 22:47:29,794 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:47:35,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, 
batch_count=94260.0, ans=0.0 2024-09-16 22:47:40,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=94260.0, ans=0.07 2024-09-16 22:47:42,987 INFO [train.py:1198] (0/2) Epoch 6, batch 950, loss[loss=0.2816, ctc_loss=0.2128, cr_loss=0.4482, attn_decoder_loss=0.2793, over 29480.00 frames. ], tot_loss[loss=0.2888, ctc_loss=0.2122, cr_loss=0.4307, attn_decoder_loss=0.2877, over 5742990.31 frames. ], batch size: 74, lr: 1.88e-02, grad_scale: 4.0 2024-09-16 22:47:52,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.58 vs. limit=15.0 2024-09-16 22:48:03,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2024-09-16 22:48:12,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=94380.0, ans=0.05 2024-09-16 22:48:28,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=94420.0, ans=0.2 2024-09-16 22:48:32,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=94420.0, ans=0.1 2024-09-16 22:49:01,838 INFO [train.py:1198] (0/2) Epoch 6, batch 1000, loss[loss=0.2725, ctc_loss=0.1865, cr_loss=0.3981, attn_decoder_loss=0.2732, over 29494.00 frames. ], tot_loss[loss=0.2896, ctc_loss=0.213, cr_loss=0.4315, attn_decoder_loss=0.2885, over 5736723.68 frames. 
], batch size: 77, lr: 1.87e-02, grad_scale: 8.0 2024-09-16 22:49:09,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=94500.0, ans=0.125 2024-09-16 22:49:12,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=94500.0, ans=0.09899494936611666 2024-09-16 22:49:26,357 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.001e+01 1.144e+02 1.278e+02 1.441e+02 2.268e+02, threshold=2.556e+02, percent-clipped=0.0 2024-09-16 22:49:44,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=5.47 vs. limit=12.0 2024-09-16 22:50:08,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=94660.0, ans=0.2 2024-09-16 22:50:14,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94660.0, ans=0.1 2024-09-16 22:50:14,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=94660.0, ans=0.125 2024-09-16 22:50:17,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=94660.0, ans=0.0 2024-09-16 22:50:19,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=94700.0, ans=0.0 2024-09-16 22:50:19,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=94700.0, ans=0.125 2024-09-16 22:50:20,455 INFO [train.py:1198] (0/2) Epoch 6, batch 1050, loss[loss=0.2963, ctc_loss=0.2165, cr_loss=0.4432, attn_decoder_loss=0.2954, over 29687.00 frames. 
], tot_loss[loss=0.2884, ctc_loss=0.2117, cr_loss=0.4307, attn_decoder_loss=0.2873, over 5745743.46 frames. ], batch size: 85, lr: 1.87e-02, grad_scale: 4.0 2024-09-16 22:50:22,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=94700.0, ans=0.0 2024-09-16 22:50:57,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=94780.0, ans=0.2 2024-09-16 22:51:27,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=94860.0, ans=0.125 2024-09-16 22:51:36,711 INFO [train.py:1198] (0/2) Epoch 6, batch 1100, loss[loss=0.2786, ctc_loss=0.1972, cr_loss=0.4147, attn_decoder_loss=0.2784, over 29423.00 frames. ], tot_loss[loss=0.288, ctc_loss=0.2114, cr_loss=0.4298, attn_decoder_loss=0.287, over 5757555.52 frames. ], batch size: 78, lr: 1.87e-02, grad_scale: 8.0 2024-09-16 22:51:55,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. 
limit=6.0 2024-09-16 22:52:02,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=94940.0, ans=0.0 2024-09-16 22:52:03,769 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.796e+01 1.080e+02 1.185e+02 1.359e+02 3.091e+02, threshold=2.369e+02, percent-clipped=1.0 2024-09-16 22:52:29,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=95020.0, ans=0.1 2024-09-16 22:52:45,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=95060.0, ans=0.125 2024-09-16 22:52:51,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=95100.0, ans=0.2 2024-09-16 22:52:54,938 INFO [train.py:1198] (0/2) Epoch 6, batch 1150, loss[loss=0.2924, ctc_loss=0.2074, cr_loss=0.4412, attn_decoder_loss=0.292, over 29485.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.211, cr_loss=0.4289, attn_decoder_loss=0.2869, over 5757667.42 frames. ], batch size: 78, lr: 1.87e-02, grad_scale: 4.0 2024-09-16 22:53:11,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.08 vs. limit=15.0 2024-09-16 22:53:16,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=95140.0, ans=0.125 2024-09-16 22:53:37,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-09-16 22:53:40,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. 
limit=12.0 2024-09-16 22:53:48,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=95220.0, ans=0.125 2024-09-16 22:53:55,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.64 vs. limit=22.5 2024-09-16 22:54:14,541 INFO [train.py:1198] (0/2) Epoch 6, batch 1200, loss[loss=0.3048, ctc_loss=0.2245, cr_loss=0.4642, attn_decoder_loss=0.3034, over 29665.00 frames. ], tot_loss[loss=0.2886, ctc_loss=0.2118, cr_loss=0.4295, attn_decoder_loss=0.2876, over 5749432.52 frames. ], batch size: 85, lr: 1.87e-02, grad_scale: 8.0 2024-09-16 22:54:14,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=95300.0, ans=0.5 2024-09-16 22:54:39,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95340.0, ans=0.1 2024-09-16 22:54:43,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.331e+01 1.110e+02 1.224e+02 1.490e+02 4.215e+02, threshold=2.447e+02, percent-clipped=3.0 2024-09-16 22:55:05,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=95420.0, ans=0.125 2024-09-16 22:55:30,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=95500.0, ans=0.125 2024-09-16 22:55:31,311 INFO [train.py:1198] (0/2) Epoch 6, batch 1250, loss[loss=0.2996, ctc_loss=0.2228, cr_loss=0.4511, attn_decoder_loss=0.2981, over 29520.00 frames. ], tot_loss[loss=0.2893, ctc_loss=0.2122, cr_loss=0.4313, attn_decoder_loss=0.2883, over 5777315.75 frames. 
], batch size: 92, lr: 1.87e-02, grad_scale: 4.0 2024-09-16 22:55:38,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0 2024-09-16 22:56:22,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=95620.0, ans=0.125 2024-09-16 22:56:26,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=95620.0, ans=0.0 2024-09-16 22:56:28,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=95620.0, ans=0.0 2024-09-16 22:56:33,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=95660.0, ans=0.025 2024-09-16 22:56:47,926 INFO [train.py:1198] (0/2) Epoch 6, batch 1300, loss[loss=0.3005, ctc_loss=0.2184, cr_loss=0.4297, attn_decoder_loss=0.3001, over 28204.00 frames. ], tot_loss[loss=0.2884, ctc_loss=0.2115, cr_loss=0.4305, attn_decoder_loss=0.2874, over 5782046.21 frames. ], batch size: 111, lr: 1.86e-02, grad_scale: 8.0 2024-09-16 22:56:56,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=95700.0, ans=0.95 2024-09-16 22:57:09,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.32 vs. 
limit=15.0 2024-09-16 22:57:13,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=95740.0, ans=0.0 2024-09-16 22:57:20,351 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.424e+01 1.067e+02 1.141e+02 1.259e+02 1.965e+02, threshold=2.283e+02, percent-clipped=0.0 2024-09-16 22:57:21,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.27 vs. limit=10.0 2024-09-16 22:57:40,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=12.0 2024-09-16 22:58:01,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=95860.0, ans=0.0 2024-09-16 22:58:09,241 INFO [train.py:1198] (0/2) Epoch 6, batch 1350, loss[loss=0.274, ctc_loss=0.1891, cr_loss=0.4178, attn_decoder_loss=0.2741, over 29751.00 frames. ], tot_loss[loss=0.2878, ctc_loss=0.2104, cr_loss=0.4296, attn_decoder_loss=0.2868, over 5797928.28 frames. ], batch size: 81, lr: 1.86e-02, grad_scale: 4.0 2024-09-16 22:58:41,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=10.97 vs. 
limit=15.0 2024-09-16 22:58:46,133 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-24000.pt 2024-09-16 22:59:20,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=96060.0, ans=0.125 2024-09-16 22:59:24,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=96060.0, ans=0.0 2024-09-16 22:59:25,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=96060.0, ans=0.1 2024-09-16 22:59:33,027 INFO [train.py:1198] (0/2) Epoch 6, batch 1400, loss[loss=0.2663, ctc_loss=0.1939, cr_loss=0.3981, attn_decoder_loss=0.2655, over 29572.00 frames. ], tot_loss[loss=0.2874, ctc_loss=0.2099, cr_loss=0.4288, attn_decoder_loss=0.2865, over 5809564.32 frames. ], batch size: 69, lr: 1.86e-02, grad_scale: 8.0 2024-09-16 22:59:40,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=96100.0, ans=0.07 2024-09-16 22:59:54,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=96140.0, ans=0.125 2024-09-16 23:00:05,175 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.329e+01 1.115e+02 1.239e+02 1.357e+02 3.096e+02, threshold=2.478e+02, percent-clipped=1.0 2024-09-16 23:00:08,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=96180.0, ans=0.025 2024-09-16 23:00:21,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=96220.0, ans=0.025 2024-09-16 23:00:40,774 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=96260.0, ans=0.125 2024-09-16 23:00:43,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=96260.0, ans=0.2 2024-09-16 23:00:48,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=96300.0, ans=0.2 2024-09-16 23:00:49,733 INFO [train.py:1198] (0/2) Epoch 6, batch 1450, loss[loss=0.3074, ctc_loss=0.2303, cr_loss=0.4587, attn_decoder_loss=0.3058, over 29430.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2103, cr_loss=0.4291, attn_decoder_loss=0.287, over 5804583.60 frames. ], batch size: 94, lr: 1.86e-02, grad_scale: 4.0 2024-09-16 23:00:54,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=96300.0, ans=0.125 2024-09-16 23:00:55,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=22.5 2024-09-16 23:01:04,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.18 vs. 
limit=15.0 2024-09-16 23:01:23,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=96380.0, ans=0.125 2024-09-16 23:01:28,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=96380.0, ans=0.125 2024-09-16 23:01:51,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=96460.0, ans=0.0 2024-09-16 23:02:07,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=96460.0, ans=0.125 2024-09-16 23:02:10,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.64 vs. limit=5.0 2024-09-16 23:02:10,328 INFO [train.py:1198] (0/2) Epoch 6, batch 1500, loss[loss=0.2947, ctc_loss=0.2121, cr_loss=0.4467, attn_decoder_loss=0.294, over 29640.00 frames. ], tot_loss[loss=0.2883, ctc_loss=0.2105, cr_loss=0.4299, attn_decoder_loss=0.2874, over 5805757.32 frames. ], batch size: 86, lr: 1.86e-02, grad_scale: 8.0 2024-09-16 23:02:44,675 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.157e+01 1.117e+02 1.199e+02 1.410e+02 2.285e+02, threshold=2.399e+02, percent-clipped=0.0 2024-09-16 23:03:02,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=96620.0, ans=0.125 2024-09-16 23:03:27,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=96700.0, ans=0.125 2024-09-16 23:03:28,275 INFO [train.py:1198] (0/2) Epoch 6, batch 1550, loss[loss=0.3103, ctc_loss=0.2401, cr_loss=0.4475, attn_decoder_loss=0.3081, over 29509.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2106, cr_loss=0.4292, attn_decoder_loss=0.2872, over 5781479.98 frames. 
], batch size: 90, lr: 1.85e-02, grad_scale: 4.0 2024-09-16 23:03:35,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-09-16 23:03:36,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=96700.0, ans=0.125 2024-09-16 23:03:37,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96700.0, ans=0.1 2024-09-16 23:03:45,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.64 vs. limit=10.0 2024-09-16 23:03:55,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.83 vs. limit=22.5 2024-09-16 23:04:00,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=96780.0, ans=0.2 2024-09-16 23:04:03,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0 2024-09-16 23:04:13,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=96820.0, ans=0.025 2024-09-16 23:04:17,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=96820.0, ans=0.125 2024-09-16 23:04:45,350 INFO [train.py:1198] (0/2) Epoch 6, batch 1600, loss[loss=0.2907, ctc_loss=0.2021, cr_loss=0.4403, attn_decoder_loss=0.2908, over 29663.00 frames. ], tot_loss[loss=0.288, ctc_loss=0.2108, cr_loss=0.4299, attn_decoder_loss=0.287, over 5763810.48 frames. 
], batch size: 85, lr: 1.85e-02, grad_scale: 8.0 2024-09-16 23:04:49,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.29 vs. limit=15.0 2024-09-16 23:04:51,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=96900.0, ans=0.125 2024-09-16 23:04:51,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=96900.0, ans=0.125 2024-09-16 23:05:11,513 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2024-09-16 23:05:22,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.34 vs. limit=10.0 2024-09-16 23:05:22,804 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.461e+01 1.097e+02 1.251e+02 1.445e+02 2.140e+02, threshold=2.501e+02, percent-clipped=0.0 2024-09-16 23:05:36,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=97020.0, ans=0.125 2024-09-16 23:05:54,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=97060.0, ans=0.125 2024-09-16 23:05:59,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=97060.0, ans=15.0 2024-09-16 23:06:04,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=97060.0, ans=0.0 2024-09-16 23:06:06,650 INFO [train.py:1198] (0/2) Epoch 6, batch 1650, loss[loss=0.2911, ctc_loss=0.2054, cr_loss=0.4353, attn_decoder_loss=0.2909, over 29735.00 frames. 
], tot_loss[loss=0.288, ctc_loss=0.2109, cr_loss=0.43, attn_decoder_loss=0.287, over 5758371.47 frames. ], batch size: 89, lr: 1.85e-02, grad_scale: 4.0 2024-09-16 23:06:31,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=97140.0, ans=0.025 2024-09-16 23:06:37,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=97180.0, ans=0.125 2024-09-16 23:06:47,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=97180.0, ans=0.125 2024-09-16 23:06:52,382 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2024-09-16 23:07:19,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=97260.0, ans=0.2 2024-09-16 23:07:23,290 INFO [train.py:1198] (0/2) Epoch 6, batch 1700, loss[loss=0.2613, ctc_loss=0.1912, cr_loss=0.4063, attn_decoder_loss=0.2601, over 29575.00 frames. ], tot_loss[loss=0.2875, ctc_loss=0.2101, cr_loss=0.4296, attn_decoder_loss=0.2866, over 5779274.26 frames. ], batch size: 69, lr: 1.85e-02, grad_scale: 8.0 2024-09-16 23:07:30,383 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. 
limit=15.0 2024-09-16 23:07:49,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=97340.0, ans=0.125 2024-09-16 23:07:51,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=97340.0, ans=0.125 2024-09-16 23:08:00,041 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.175e+01 1.040e+02 1.164e+02 1.267e+02 1.903e+02, threshold=2.329e+02, percent-clipped=0.0 2024-09-16 23:08:27,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=97460.0, ans=0.125 2024-09-16 23:08:29,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=97460.0, ans=0.125 2024-09-16 23:08:39,834 INFO [train.py:1198] (0/2) Epoch 6, batch 1750, loss[loss=0.2434, ctc_loss=0.1699, cr_loss=0.371, attn_decoder_loss=0.2433, over 29335.00 frames. ], tot_loss[loss=0.2873, ctc_loss=0.2098, cr_loss=0.4297, attn_decoder_loss=0.2864, over 5788221.78 frames. ], batch size: 67, lr: 1.85e-02, grad_scale: 4.0 2024-09-16 23:08:51,269 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-09-16 23:08:52,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=97500.0, ans=0.125 2024-09-16 23:08:52,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. 
limit=6.0 2024-09-16 23:08:59,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=97540.0, ans=0.2 2024-09-16 23:09:02,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.65 vs. limit=15.0 2024-09-16 23:09:32,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2024-09-16 23:09:47,197 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=6.47 vs. limit=12.0 2024-09-16 23:09:57,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=97660.0, ans=0.0 2024-09-16 23:10:01,593 INFO [train.py:1198] (0/2) Epoch 6, batch 1800, loss[loss=0.3136, ctc_loss=0.2344, cr_loss=0.472, attn_decoder_loss=0.312, over 29681.00 frames. ], tot_loss[loss=0.2876, ctc_loss=0.2101, cr_loss=0.4299, attn_decoder_loss=0.2867, over 5791212.52 frames. ], batch size: 83, lr: 1.85e-02, grad_scale: 8.0 2024-09-16 23:10:17,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=97740.0, ans=0.025 2024-09-16 23:10:39,658 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.002e+01 1.085e+02 1.174e+02 1.306e+02 4.568e+02, threshold=2.348e+02, percent-clipped=1.0 2024-09-16 23:10:46,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=97820.0, ans=0.125 2024-09-16 23:10:46,765 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.10 vs. 
limit=15.0 2024-09-16 23:10:47,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=97820.0, ans=0.125 2024-09-16 23:11:04,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=12.0 2024-09-16 23:11:12,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=97860.0, ans=0.1 2024-09-16 23:11:18,205 INFO [train.py:1198] (0/2) Epoch 6, batch 1850, loss[loss=0.299, ctc_loss=0.22, cr_loss=0.4457, attn_decoder_loss=0.2978, over 29651.00 frames. ], tot_loss[loss=0.2871, ctc_loss=0.2094, cr_loss=0.4287, attn_decoder_loss=0.2862, over 5796678.80 frames. ], batch size: 86, lr: 1.84e-02, grad_scale: 4.0 2024-09-16 23:11:23,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=97900.0, ans=0.0 2024-09-16 23:11:24,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=97900.0, ans=0.0 2024-09-16 23:11:26,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. 
limit=6.0 2024-09-16 23:11:32,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=97940.0, ans=0.1 2024-09-16 23:11:50,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=97980.0, ans=0.125 2024-09-16 23:12:10,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=98020.0, ans=0.125 2024-09-16 23:12:22,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=98060.0, ans=0.0 2024-09-16 23:12:33,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=98100.0, ans=0.2 2024-09-16 23:12:34,394 INFO [train.py:1198] (0/2) Epoch 6, batch 1900, loss[loss=0.2933, ctc_loss=0.2112, cr_loss=0.4306, attn_decoder_loss=0.2929, over 29704.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2099, cr_loss=0.4297, attn_decoder_loss=0.287, over 5804275.63 frames. ], batch size: 89, lr: 1.84e-02, grad_scale: 8.0 2024-09-16 23:12:49,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=98140.0, ans=0.2 2024-09-16 23:13:00,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=98140.0, ans=0.0 2024-09-16 23:13:08,161 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.83 vs. 
limit=10.0 2024-09-16 23:13:10,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=98180.0, ans=0.125 2024-09-16 23:13:16,230 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.164e+01 1.098e+02 1.206e+02 1.393e+02 1.994e+02, threshold=2.412e+02, percent-clipped=0.0 2024-09-16 23:13:17,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2024-09-16 23:13:24,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=98220.0, ans=0.0 2024-09-16 23:13:49,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=98260.0, ans=0.0 2024-09-16 23:13:55,787 INFO [train.py:1198] (0/2) Epoch 6, batch 1950, loss[loss=0.2948, ctc_loss=0.2181, cr_loss=0.4641, attn_decoder_loss=0.293, over 29465.00 frames. ], tot_loss[loss=0.2895, ctc_loss=0.2113, cr_loss=0.4323, attn_decoder_loss=0.2886, over 5819075.19 frames. 
], batch size: 78, lr: 1.84e-02, grad_scale: 4.0 2024-09-16 23:14:02,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=98300.0, ans=0.0 2024-09-16 23:14:13,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=98340.0, ans=0.2 2024-09-16 23:14:22,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=98340.0, ans=0.0 2024-09-16 23:14:50,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=98420.0, ans=0.125 2024-09-16 23:14:52,684 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.00 vs. limit=22.5 2024-09-16 23:14:52,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=98420.0, ans=15.0 2024-09-16 23:15:13,562 INFO [train.py:1198] (0/2) Epoch 6, batch 2000, loss[loss=0.2615, ctc_loss=0.19, cr_loss=0.4243, attn_decoder_loss=0.2601, over 29327.00 frames. ], tot_loss[loss=0.2898, ctc_loss=0.2115, cr_loss=0.432, attn_decoder_loss=0.2888, over 5797138.94 frames. ], batch size: 67, lr: 1.84e-02, grad_scale: 8.0 2024-09-16 23:15:17,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.49 vs. 
limit=10.0 2024-09-16 23:15:20,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98500.0, ans=0.1 2024-09-16 23:15:55,053 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.337e+01 1.180e+02 1.301e+02 1.522e+02 2.715e+02, threshold=2.602e+02, percent-clipped=3.0 2024-09-16 23:15:58,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=98620.0, ans=0.0 2024-09-16 23:16:03,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98620.0, ans=0.1 2024-09-16 23:16:13,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=98660.0, ans=0.125 2024-09-16 23:16:29,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=98700.0, ans=0.125 2024-09-16 23:16:30,526 INFO [train.py:1198] (0/2) Epoch 6, batch 2050, loss[loss=0.2679, ctc_loss=0.1925, cr_loss=0.4249, attn_decoder_loss=0.2668, over 29417.00 frames. ], tot_loss[loss=0.2885, ctc_loss=0.2106, cr_loss=0.4306, attn_decoder_loss=0.2876, over 5789478.20 frames. 
], batch size: 70, lr: 1.84e-02, grad_scale: 4.0 2024-09-16 23:16:46,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=98740.0, ans=0.125 2024-09-16 23:17:04,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=98780.0, ans=22.5 2024-09-16 23:17:15,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=98780.0, ans=0.2 2024-09-16 23:17:36,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=98860.0, ans=0.2 2024-09-16 23:17:38,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=98860.0, ans=10.0 2024-09-16 23:17:43,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=98860.0, ans=0.2 2024-09-16 23:17:43,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.82 vs. limit=22.5 2024-09-16 23:17:52,105 INFO [train.py:1198] (0/2) Epoch 6, batch 2100, loss[loss=0.2955, ctc_loss=0.2177, cr_loss=0.4486, attn_decoder_loss=0.2942, over 29745.00 frames. ], tot_loss[loss=0.2872, ctc_loss=0.2091, cr_loss=0.4294, attn_decoder_loss=0.2864, over 5801284.40 frames. 
], batch size: 81, lr: 1.84e-02, grad_scale: 8.0 2024-09-16 23:18:01,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=98900.0, ans=0.125 2024-09-16 23:18:02,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=98900.0, ans=0.125 2024-09-16 23:18:06,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2024-09-16 23:18:09,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=98940.0, ans=0.1 2024-09-16 23:18:19,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=98940.0, ans=0.025 2024-09-16 23:18:28,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=98980.0, ans=0.0 2024-09-16 23:18:31,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.03 vs. limit=22.5 2024-09-16 23:18:34,551 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.717e+01 1.049e+02 1.121e+02 1.246e+02 2.037e+02, threshold=2.242e+02, percent-clipped=0.0 2024-09-16 23:18:39,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2024-09-16 23:19:08,355 INFO [train.py:1198] (0/2) Epoch 6, batch 2150, loss[loss=0.2854, ctc_loss=0.2104, cr_loss=0.4161, attn_decoder_loss=0.2844, over 29453.00 frames. ], tot_loss[loss=0.2862, ctc_loss=0.2076, cr_loss=0.4275, attn_decoder_loss=0.2854, over 5815283.94 frames. 
], batch size: 78, lr: 1.83e-02, grad_scale: 4.0 2024-09-16 23:19:14,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=99100.0, ans=0.125 2024-09-16 23:20:08,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.95 vs. limit=10.0 2024-09-16 23:20:09,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=99260.0, ans=0.0 2024-09-16 23:20:12,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=99260.0, ans=0.125 2024-09-16 23:20:25,666 INFO [train.py:1198] (0/2) Epoch 6, batch 2200, loss[loss=0.3036, ctc_loss=0.2273, cr_loss=0.4463, attn_decoder_loss=0.3022, over 29619.00 frames. ], tot_loss[loss=0.2865, ctc_loss=0.208, cr_loss=0.4276, attn_decoder_loss=0.2857, over 5811673.09 frames. ], batch size: 86, lr: 1.83e-02, grad_scale: 8.0 2024-09-16 23:20:40,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.16 vs. 
limit=22.5 2024-09-16 23:20:52,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=99340.0, ans=0.07 2024-09-16 23:21:12,201 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.295e+01 1.080e+02 1.191e+02 1.298e+02 2.659e+02, threshold=2.382e+02, percent-clipped=1.0 2024-09-16 23:21:14,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=99420.0, ans=0.2 2024-09-16 23:21:14,123 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:21:14,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=99420.0, ans=0.1 2024-09-16 23:21:45,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=99500.0, ans=0.0 2024-09-16 23:21:46,865 INFO [train.py:1198] (0/2) Epoch 6, batch 2250, loss[loss=0.303, ctc_loss=0.2261, cr_loss=0.4892, attn_decoder_loss=0.3006, over 29690.00 frames. ], tot_loss[loss=0.2864, ctc_loss=0.2077, cr_loss=0.4276, attn_decoder_loss=0.2856, over 5811280.92 frames. 
], batch size: 82, lr: 1.83e-02, grad_scale: 4.0 2024-09-16 23:21:56,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=99500.0, ans=0.2 2024-09-16 23:21:59,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=99500.0, ans=0.125 2024-09-16 23:22:10,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=99540.0, ans=10.0 2024-09-16 23:22:11,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=99540.0, ans=0.025 2024-09-16 23:22:17,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=99580.0, ans=0.125 2024-09-16 23:22:32,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99620.0, ans=0.1 2024-09-16 23:22:34,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=99620.0, ans=0.0 2024-09-16 23:22:35,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=99620.0, ans=0.1 2024-09-16 23:22:38,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=99620.0, ans=0.2 2024-09-16 23:23:02,874 INFO [train.py:1198] (0/2) Epoch 6, batch 2300, loss[loss=0.2578, ctc_loss=0.1833, cr_loss=0.4123, attn_decoder_loss=0.2569, over 29319.00 frames. ], tot_loss[loss=0.2861, ctc_loss=0.2081, cr_loss=0.4273, attn_decoder_loss=0.2853, over 5799364.24 frames. 
], batch size: 71, lr: 1.83e-02, grad_scale: 8.0 2024-09-16 23:23:04,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99700.0, ans=0.1 2024-09-16 23:23:30,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=99740.0, ans=0.125 2024-09-16 23:23:30,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=99740.0, ans=0.1 2024-09-16 23:23:49,098 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.439e+01 1.127e+02 1.220e+02 1.323e+02 2.863e+02, threshold=2.441e+02, percent-clipped=2.0 2024-09-16 23:23:52,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=99820.0, ans=0.125 2024-09-16 23:24:19,889 INFO [train.py:1198] (0/2) Epoch 6, batch 2350, loss[loss=0.2907, ctc_loss=0.2148, cr_loss=0.4519, attn_decoder_loss=0.2891, over 29686.00 frames. ], tot_loss[loss=0.2857, ctc_loss=0.2075, cr_loss=0.4272, attn_decoder_loss=0.2849, over 5806094.23 frames. ], batch size: 83, lr: 1.83e-02, grad_scale: 4.0 2024-09-16 23:24:27,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=99900.0, ans=0.125 2024-09-16 23:24:45,393 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0 2024-09-16 23:24:55,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=99980.0, ans=0.0 2024-09-16 23:25:04,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.39 vs. 
limit=15.0 2024-09-16 23:25:41,910 INFO [train.py:1198] (0/2) Epoch 6, batch 2400, loss[loss=0.2697, ctc_loss=0.1859, cr_loss=0.3685, attn_decoder_loss=0.2708, over 29533.00 frames. ], tot_loss[loss=0.2863, ctc_loss=0.208, cr_loss=0.4276, attn_decoder_loss=0.2855, over 5808869.06 frames. ], batch size: 76, lr: 1.83e-02, grad_scale: 8.0 2024-09-16 23:25:43,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=100100.0, ans=0.125 2024-09-16 23:25:53,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=100100.0, ans=0.2 2024-09-16 23:26:29,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.345e+01 1.104e+02 1.208e+02 1.363e+02 5.197e+02, threshold=2.416e+02, percent-clipped=3.0 2024-09-16 23:26:30,838 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0 2024-09-16 23:26:31,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=100220.0, ans=0.125 2024-09-16 23:26:31,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100220.0, ans=0.1 2024-09-16 23:26:58,937 INFO [train.py:1198] (0/2) Epoch 6, batch 2450, loss[loss=0.283, ctc_loss=0.2008, cr_loss=0.4125, attn_decoder_loss=0.2829, over 29716.00 frames. ], tot_loss[loss=0.2874, ctc_loss=0.2091, cr_loss=0.429, attn_decoder_loss=0.2866, over 5784807.03 frames. 
], batch size: 82, lr: 1.82e-02, grad_scale: 4.0 2024-09-16 23:27:09,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=100300.0, ans=0.125 2024-09-16 23:27:18,261 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.38 vs. limit=22.5 2024-09-16 23:27:31,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=100380.0, ans=0.0 2024-09-16 23:28:02,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=100460.0, ans=0.1 2024-09-16 23:28:11,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=100460.0, ans=0.025 2024-09-16 23:28:15,744 INFO [train.py:1198] (0/2) Epoch 6, batch 2500, loss[loss=0.2846, ctc_loss=0.1976, cr_loss=0.4109, attn_decoder_loss=0.2851, over 29629.00 frames. ], tot_loss[loss=0.287, ctc_loss=0.2085, cr_loss=0.4282, attn_decoder_loss=0.2862, over 5795519.57 frames. ], batch size: 86, lr: 1.82e-02, grad_scale: 8.0 2024-09-16 23:28:20,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.57 vs. 
limit=15.0 2024-09-16 23:28:20,707 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:28:29,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=100540.0, ans=0.0 2024-09-16 23:28:36,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=100540.0, ans=0.125 2024-09-16 23:28:43,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=100540.0, ans=0.125 2024-09-16 23:28:49,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=100580.0, ans=0.125 2024-09-16 23:29:04,964 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.690e+01 1.098e+02 1.228e+02 1.415e+02 3.536e+02, threshold=2.457e+02, percent-clipped=1.0 2024-09-16 23:29:10,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=100620.0, ans=0.125 2024-09-16 23:29:36,830 INFO [train.py:1198] (0/2) Epoch 6, batch 2550, loss[loss=0.2524, ctc_loss=0.177, cr_loss=0.3775, attn_decoder_loss=0.2523, over 29330.00 frames. ], tot_loss[loss=0.2871, ctc_loss=0.2087, cr_loss=0.4292, attn_decoder_loss=0.2863, over 5796892.36 frames. 
], batch size: 67, lr: 1.82e-02, grad_scale: 4.0 2024-09-16 23:29:37,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=100700.0, ans=0.1 2024-09-16 23:29:41,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=100700.0, ans=0.1 2024-09-16 23:29:44,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=100700.0, ans=0.2 2024-09-16 23:29:47,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=100700.0, ans=0.0 2024-09-16 23:30:15,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=100780.0, ans=0.0 2024-09-16 23:30:43,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=100860.0, ans=0.2 2024-09-16 23:30:50,061 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:30:54,115 INFO [train.py:1198] (0/2) Epoch 6, batch 2600, loss[loss=0.2802, ctc_loss=0.2029, cr_loss=0.432, attn_decoder_loss=0.2792, over 29453.00 frames. ], tot_loss[loss=0.2875, ctc_loss=0.2092, cr_loss=0.4292, attn_decoder_loss=0.2866, over 5792815.59 frames. 
], batch size: 78, lr: 1.82e-02, grad_scale: 8.0 2024-09-16 23:31:00,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=100900.0, ans=10.0 2024-09-16 23:31:15,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=100940.0, ans=0.04949747468305833 2024-09-16 23:31:17,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=100940.0, ans=0.0 2024-09-16 23:31:44,360 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.431e+01 1.047e+02 1.098e+02 1.263e+02 2.416e+02, threshold=2.197e+02, percent-clipped=0.0 2024-09-16 23:31:45,446 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.66 vs. limit=15.0 2024-09-16 23:31:49,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=101020.0, ans=0.07 2024-09-16 23:31:52,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.72 vs. limit=10.0 2024-09-16 23:32:01,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=101060.0, ans=0.125 2024-09-16 23:32:08,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101100.0, ans=0.1 2024-09-16 23:32:10,133 INFO [train.py:1198] (0/2) Epoch 6, batch 2650, loss[loss=0.3016, ctc_loss=0.2213, cr_loss=0.4523, attn_decoder_loss=0.3005, over 29316.00 frames. ], tot_loss[loss=0.2873, ctc_loss=0.2088, cr_loss=0.4294, attn_decoder_loss=0.2865, over 5799544.86 frames. 
], batch size: 100, lr: 1.82e-02, grad_scale: 4.0 2024-09-16 23:32:10,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=101100.0, ans=0.1 2024-09-16 23:32:19,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=101100.0, ans=0.1 2024-09-16 23:32:19,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=101100.0, ans=0.125 2024-09-16 23:33:04,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=101220.0, ans=0.125 2024-09-16 23:33:31,357 INFO [train.py:1198] (0/2) Epoch 6, batch 2700, loss[loss=0.2947, ctc_loss=0.2115, cr_loss=0.4618, attn_decoder_loss=0.2937, over 29499.00 frames. ], tot_loss[loss=0.2881, ctc_loss=0.2094, cr_loss=0.4305, attn_decoder_loss=0.2872, over 5796551.04 frames. ], batch size: 87, lr: 1.82e-02, grad_scale: 8.0 2024-09-16 23:33:48,544 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:33:56,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=101340.0, ans=0.125 2024-09-16 23:33:56,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=101340.0, ans=0.125 2024-09-16 23:34:04,463 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=17.54 vs. 
limit=15.0 2024-09-16 23:34:23,728 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.471e+01 1.102e+02 1.222e+02 1.380e+02 2.898e+02, threshold=2.443e+02, percent-clipped=1.0 2024-09-16 23:34:33,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=101460.0, ans=0.125 2024-09-16 23:34:48,556 INFO [train.py:1198] (0/2) Epoch 6, batch 2750, loss[loss=0.2821, ctc_loss=0.2065, cr_loss=0.4325, attn_decoder_loss=0.2809, over 29524.00 frames. ], tot_loss[loss=0.2864, ctc_loss=0.2079, cr_loss=0.4282, attn_decoder_loss=0.2856, over 5795026.51 frames. ], batch size: 75, lr: 1.81e-02, grad_scale: 4.0 2024-09-16 23:35:46,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=101620.0, ans=0.125 2024-09-16 23:35:57,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=101660.0, ans=0.0 2024-09-16 23:35:57,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=101660.0, ans=15.0 2024-09-16 23:36:06,176 INFO [train.py:1198] (0/2) Epoch 6, batch 2800, loss[loss=0.3237, ctc_loss=0.2719, cr_loss=0.4323, attn_decoder_loss=0.3199, over 20423.00 frames. ], tot_loss[loss=0.2867, ctc_loss=0.2081, cr_loss=0.4274, attn_decoder_loss=0.2859, over 5775867.51 frames. 
], batch size: 210, lr: 1.81e-02, grad_scale: 8.0 2024-09-16 23:36:14,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=101700.0, ans=0.125 2024-09-16 23:36:15,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=101700.0, ans=0.0 2024-09-16 23:36:46,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=101780.0, ans=0.125 2024-09-16 23:36:48,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=101780.0, ans=0.125 2024-09-16 23:37:04,392 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.339e+01 1.139e+02 1.318e+02 1.529e+02 2.693e+02, threshold=2.635e+02, percent-clipped=4.0 2024-09-16 23:37:13,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=101860.0, ans=0.125 2024-09-16 23:37:14,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.19 vs. limit=22.5 2024-09-16 23:37:27,426 INFO [train.py:1198] (0/2) Epoch 6, batch 2850, loss[loss=0.2858, ctc_loss=0.2152, cr_loss=0.4785, attn_decoder_loss=0.283, over 29460.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2098, cr_loss=0.4297, attn_decoder_loss=0.287, over 5762817.06 frames. ], batch size: 77, lr: 1.81e-02, grad_scale: 4.0 2024-09-16 23:37:33,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=101900.0, ans=0.0 2024-09-16 23:37:36,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.49 vs. 
limit=15.0 2024-09-16 23:37:39,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0 2024-09-16 23:38:02,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=101980.0, ans=0.125 2024-09-16 23:38:09,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=101980.0, ans=0.2 2024-09-16 23:38:43,937 INFO [train.py:1198] (0/2) Epoch 6, batch 2900, loss[loss=0.2901, ctc_loss=0.2105, cr_loss=0.4487, attn_decoder_loss=0.289, over 29400.00 frames. ], tot_loss[loss=0.2886, ctc_loss=0.2097, cr_loss=0.4311, attn_decoder_loss=0.2878, over 5787880.05 frames. ], batch size: 79, lr: 1.81e-02, grad_scale: 8.0 2024-09-16 23:39:24,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=102180.0, ans=0.125 2024-09-16 23:39:30,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=102220.0, ans=0.2 2024-09-16 23:39:35,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.15 vs. 
limit=15.0 2024-09-16 23:39:39,378 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.808e+01 1.155e+02 1.262e+02 1.445e+02 2.631e+02, threshold=2.524e+02, percent-clipped=0.0 2024-09-16 23:39:54,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=102260.0, ans=0.125 2024-09-16 23:39:56,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102260.0, ans=0.1 2024-09-16 23:40:00,662 INFO [train.py:1198] (0/2) Epoch 6, batch 2950, loss[loss=0.278, ctc_loss=0.2008, cr_loss=0.4208, attn_decoder_loss=0.2772, over 29511.00 frames. ], tot_loss[loss=0.2871, ctc_loss=0.2084, cr_loss=0.4298, attn_decoder_loss=0.2863, over 5783037.81 frames. ], batch size: 75, lr: 1.81e-02, grad_scale: 4.0 2024-09-16 23:40:08,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102300.0, ans=0.1 2024-09-16 23:40:08,899 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:40:11,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=102300.0, ans=0.125 2024-09-16 23:40:18,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=102340.0, ans=0.0 2024-09-16 23:40:24,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=102340.0, ans=0.125 2024-09-16 23:40:24,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.37 vs. 
limit=22.5 2024-09-16 23:40:40,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=102380.0, ans=0.0 2024-09-16 23:40:44,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=102380.0, ans=0.125 2024-09-16 23:41:13,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=102460.0, ans=0.0 2024-09-16 23:41:19,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=102460.0, ans=0.125 2024-09-16 23:41:23,646 INFO [train.py:1198] (0/2) Epoch 6, batch 3000, loss[loss=0.291, ctc_loss=0.2034, cr_loss=0.4196, attn_decoder_loss=0.2915, over 29763.00 frames. ], tot_loss[loss=0.287, ctc_loss=0.2082, cr_loss=0.4292, attn_decoder_loss=0.2862, over 5783716.85 frames. ], batch size: 81, lr: 1.81e-02, grad_scale: 8.0 2024-09-16 23:41:23,647 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-16 23:41:42,098 INFO [train.py:1230] (0/2) Epoch 6, validation: loss=0.2192, ctc_loss=0.0625, cr_loss=4.383e-15, attn_decoder_loss=0.2366, over 944034.00 frames. 2024-09-16 23:41:42,098 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-16 23:42:03,140 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.74 vs. limit=15.0 2024-09-16 23:42:06,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2024-09-16 23:42:19,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.11 vs. 
limit=15.0 2024-09-16 23:42:25,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=102580.0, ans=0.125 2024-09-16 23:42:28,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=102620.0, ans=0.125 2024-09-16 23:42:38,833 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.071e+01 1.057e+02 1.164e+02 1.320e+02 2.426e+02, threshold=2.327e+02, percent-clipped=0.0 2024-09-16 23:42:49,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=102660.0, ans=0.125 2024-09-16 23:42:58,905 INFO [train.py:1198] (0/2) Epoch 6, batch 3050, loss[loss=0.2855, ctc_loss=0.2163, cr_loss=0.4426, attn_decoder_loss=0.2834, over 29528.00 frames. ], tot_loss[loss=0.2877, ctc_loss=0.2089, cr_loss=0.4305, attn_decoder_loss=0.2868, over 5777961.75 frames. ], batch size: 76, lr: 1.80e-02, grad_scale: 4.0 2024-09-16 23:43:13,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=102740.0, ans=0.0 2024-09-16 23:43:26,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=102740.0, ans=0.0 2024-09-16 23:43:31,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=102780.0, ans=0.125 2024-09-16 23:43:42,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=102780.0, ans=0.2 2024-09-16 23:43:46,957 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:43:49,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=102820.0, ans=0.125 2024-09-16 23:43:54,764 INFO [scaling.py:214] (0/2) 
ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=102820.0, ans=15.0 2024-09-16 23:43:57,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=102820.0, ans=0.125 2024-09-16 23:44:07,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=102860.0, ans=0.0 2024-09-16 23:44:10,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=102860.0, ans=0.0 2024-09-16 23:44:15,200 INFO [train.py:1198] (0/2) Epoch 6, batch 3100, loss[loss=0.2988, ctc_loss=0.2205, cr_loss=0.4452, attn_decoder_loss=0.2976, over 29219.00 frames. ], tot_loss[loss=0.2872, ctc_loss=0.2087, cr_loss=0.4295, attn_decoder_loss=0.2864, over 5778153.59 frames. ], batch size: 100, lr: 1.80e-02, grad_scale: 8.0 2024-09-16 23:44:24,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=102900.0, ans=0.0 2024-09-16 23:44:28,517 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-09-16 23:44:29,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=102940.0, ans=0.1 2024-09-16 23:44:36,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=102940.0, ans=0.0 2024-09-16 23:44:45,398 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. 
limit=15.0 2024-09-16 23:44:52,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=102980.0, ans=0.1 2024-09-16 23:45:10,984 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.08 vs. limit=8.0 2024-09-16 23:45:13,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=103020.0, ans=0.0 2024-09-16 23:45:17,446 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.176e+01 1.082e+02 1.229e+02 1.361e+02 4.744e+02, threshold=2.458e+02, percent-clipped=3.0 2024-09-16 23:45:19,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=103060.0, ans=0.125 2024-09-16 23:45:20,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103060.0, ans=0.1 2024-09-16 23:45:28,310 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:45:35,733 INFO [train.py:1198] (0/2) Epoch 6, batch 3150, loss[loss=0.3256, ctc_loss=0.2467, cr_loss=0.499, attn_decoder_loss=0.3233, over 28850.00 frames. ], tot_loss[loss=0.2872, ctc_loss=0.2086, cr_loss=0.4295, attn_decoder_loss=0.2864, over 5783817.36 frames. ], batch size: 104, lr: 1.80e-02, grad_scale: 4.0 2024-09-16 23:45:39,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=103100.0, ans=0.125 2024-09-16 23:46:00,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=103140.0, ans=0.125 2024-09-16 23:46:52,917 INFO [train.py:1198] (0/2) Epoch 6, batch 3200, loss[loss=0.2821, ctc_loss=0.2029, cr_loss=0.4413, attn_decoder_loss=0.2811, over 29404.00 frames. 
], tot_loss[loss=0.2863, ctc_loss=0.2075, cr_loss=0.4286, attn_decoder_loss=0.2856, over 5793494.78 frames. ], batch size: 79, lr: 1.80e-02, grad_scale: 8.0 2024-09-16 23:47:07,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=103340.0, ans=0.0 2024-09-16 23:47:16,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=103340.0, ans=0.1 2024-09-16 23:47:16,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=103340.0, ans=0.0 2024-09-16 23:47:38,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=15.0 2024-09-16 23:47:49,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=103420.0, ans=0.125 2024-09-16 23:47:52,864 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.859e+01 1.056e+02 1.155e+02 1.311e+02 1.883e+02, threshold=2.309e+02, percent-clipped=0.0 2024-09-16 23:48:02,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103460.0, ans=0.1 2024-09-16 23:48:08,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=103500.0, ans=0.125 2024-09-16 23:48:09,762 INFO [train.py:1198] (0/2) Epoch 6, batch 3250, loss[loss=0.298, ctc_loss=0.2154, cr_loss=0.4321, attn_decoder_loss=0.2975, over 29715.00 frames. ], tot_loss[loss=0.2864, ctc_loss=0.2073, cr_loss=0.4285, attn_decoder_loss=0.2857, over 5801390.62 frames. 
], batch size: 84, lr: 1.80e-02, grad_scale: 4.0 2024-09-16 23:48:11,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=103500.0, ans=0.125 2024-09-16 23:48:19,935 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. limit=6.0 2024-09-16 23:48:25,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=103540.0, ans=0.025 2024-09-16 23:48:41,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.56 vs. limit=22.5 2024-09-16 23:48:46,707 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=11.63 vs. limit=15.0 2024-09-16 23:48:56,319 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=22.5 2024-09-16 23:49:04,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=12.0 2024-09-16 23:49:30,806 INFO [train.py:1198] (0/2) Epoch 6, batch 3300, loss[loss=0.3009, ctc_loss=0.2204, cr_loss=0.458, attn_decoder_loss=0.2996, over 28199.00 frames. ], tot_loss[loss=0.2853, ctc_loss=0.2068, cr_loss=0.4271, attn_decoder_loss=0.2846, over 5799277.53 frames. ], batch size: 111, lr: 1.80e-02, grad_scale: 8.0 2024-09-16 23:49:32,046 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=22.5 2024-09-16 23:49:32,103 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.53 vs. 
limit=15.0 2024-09-16 23:49:39,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=12.0 2024-09-16 23:49:46,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=103740.0, ans=0.07 2024-09-16 23:49:50,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0 2024-09-16 23:50:04,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=103780.0, ans=0.0 2024-09-16 23:50:10,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.49 vs. limit=10.0 2024-09-16 23:50:20,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.10 vs. limit=15.0 2024-09-16 23:50:32,019 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.684e+01 1.121e+02 1.244e+02 1.460e+02 3.755e+02, threshold=2.488e+02, percent-clipped=2.0 2024-09-16 23:50:33,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103860.0, ans=0.1 2024-09-16 23:50:41,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=103860.0, ans=0.125 2024-09-16 23:50:46,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=103900.0, ans=0.2 2024-09-16 23:50:47,242 INFO [train.py:1198] (0/2) Epoch 6, batch 3350, loss[loss=0.311, ctc_loss=0.2343, cr_loss=0.4343, attn_decoder_loss=0.3098, over 28923.00 frames. 
], tot_loss[loss=0.2866, ctc_loss=0.2083, cr_loss=0.4282, attn_decoder_loss=0.2857, over 5775231.94 frames. ], batch size: 104, lr: 1.79e-02, grad_scale: 4.0 2024-09-16 23:51:26,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=103980.0, ans=0.0 2024-09-16 23:51:26,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103980.0, ans=0.1 2024-09-16 23:51:44,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=104020.0, ans=0.2 2024-09-16 23:52:00,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=104060.0, ans=0.0 2024-09-16 23:52:04,357 INFO [train.py:1198] (0/2) Epoch 6, batch 3400, loss[loss=0.2442, ctc_loss=0.1754, cr_loss=0.3692, attn_decoder_loss=0.2437, over 29357.00 frames. ], tot_loss[loss=0.2862, ctc_loss=0.208, cr_loss=0.4274, attn_decoder_loss=0.2854, over 5767427.46 frames. 
], batch size: 67, lr: 1.79e-02, grad_scale: 8.0 2024-09-16 23:52:04,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=104100.0, ans=0.125 2024-09-16 23:52:15,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=104100.0, ans=0.1 2024-09-16 23:52:54,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=104220.0, ans=0.09899494936611666 2024-09-16 23:52:54,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104220.0, ans=0.1 2024-09-16 23:53:03,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=104220.0, ans=0.0 2024-09-16 23:53:06,910 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:53:11,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=104260.0, ans=0.025 2024-09-16 23:53:12,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.970e+01 1.077e+02 1.207e+02 1.405e+02 5.237e+02, threshold=2.415e+02, percent-clipped=2.0 2024-09-16 23:53:21,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=104260.0, ans=0.125 2024-09-16 23:53:24,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2024-09-16 23:53:26,203 INFO [train.py:1198] (0/2) Epoch 6, batch 3450, loss[loss=0.2767, ctc_loss=0.1863, cr_loss=0.3676, attn_decoder_loss=0.2785, over 28267.00 frames. ], tot_loss[loss=0.2864, ctc_loss=0.2078, cr_loss=0.4282, attn_decoder_loss=0.2856, over 5773715.88 frames. 
], batch size: 111, lr: 1.79e-02, grad_scale: 4.0 2024-09-16 23:53:28,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=104300.0, ans=0.0 2024-09-16 23:53:45,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=104340.0, ans=0.1 2024-09-16 23:53:57,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=104380.0, ans=0.0 2024-09-16 23:54:06,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=104380.0, ans=0.125 2024-09-16 23:54:28,718 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.87 vs. limit=22.5 2024-09-16 23:54:31,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=104460.0, ans=0.0 2024-09-16 23:54:35,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=104460.0, ans=0.125 2024-09-16 23:54:41,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=104500.0, ans=0.0 2024-09-16 23:54:43,085 INFO [train.py:1198] (0/2) Epoch 6, batch 3500, loss[loss=0.2549, ctc_loss=0.1729, cr_loss=0.3595, attn_decoder_loss=0.256, over 29334.00 frames. ], tot_loss[loss=0.2854, ctc_loss=0.2067, cr_loss=0.4272, attn_decoder_loss=0.2846, over 5775929.94 frames. 
], batch size: 71, lr: 1.79e-02, grad_scale: 8.0 2024-09-16 23:55:00,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=104540.0, ans=0.025 2024-09-16 23:55:01,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=104540.0, ans=0.125 2024-09-16 23:55:18,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=104580.0, ans=0.1 2024-09-16 23:55:19,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=104580.0, ans=0.125 2024-09-16 23:55:22,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=104580.0, ans=0.0 2024-09-16 23:55:27,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.19 vs. limit=6.0 2024-09-16 23:55:33,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=104620.0, ans=0.0 2024-09-16 23:55:41,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=104620.0, ans=0.2 2024-09-16 23:55:46,678 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.951e+01 1.039e+02 1.144e+02 1.274e+02 4.432e+02, threshold=2.289e+02, percent-clipped=1.0 2024-09-16 23:55:47,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=104660.0, ans=0.125 2024-09-16 23:55:58,730 INFO [train.py:1198] (0/2) Epoch 6, batch 3550, loss[loss=0.2854, ctc_loss=0.1878, cr_loss=0.4313, attn_decoder_loss=0.2866, over 29683.00 frames. 
], tot_loss[loss=0.2853, ctc_loss=0.2063, cr_loss=0.4273, attn_decoder_loss=0.2846, over 5782620.16 frames. ], batch size: 89, lr: 1.79e-02, grad_scale: 4.0 2024-09-16 23:55:59,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=104700.0, ans=0.0 2024-09-16 23:56:54,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=104820.0, ans=0.125 2024-09-16 23:57:14,765 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.69 vs. limit=15.0 2024-09-16 23:57:16,979 INFO [train.py:1198] (0/2) Epoch 6, batch 3600, loss[loss=0.2771, ctc_loss=0.197, cr_loss=0.4183, attn_decoder_loss=0.2767, over 29494.00 frames. ], tot_loss[loss=0.2858, ctc_loss=0.2068, cr_loss=0.4287, attn_decoder_loss=0.2851, over 5792158.60 frames. ], batch size: 77, lr: 1.79e-02, grad_scale: 8.0 2024-09-16 23:57:39,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=104940.0, ans=0.125 2024-09-16 23:57:43,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.02 vs. 
limit=22.5 2024-09-16 23:57:47,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=104980.0, ans=0.2 2024-09-16 23:58:05,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=105020.0, ans=0.125 2024-09-16 23:58:23,463 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.018e+01 1.117e+02 1.191e+02 1.328e+02 4.381e+02, threshold=2.382e+02, percent-clipped=2.0 2024-09-16 23:58:26,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=105060.0, ans=0.125 2024-09-16 23:58:26,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=105060.0, ans=0.125 2024-09-16 23:58:32,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=105100.0, ans=0.0 2024-09-16 23:58:34,004 INFO [train.py:1198] (0/2) Epoch 6, batch 3650, loss[loss=0.3052, ctc_loss=0.2165, cr_loss=0.4671, attn_decoder_loss=0.3047, over 29495.00 frames. ], tot_loss[loss=0.2851, ctc_loss=0.2062, cr_loss=0.428, attn_decoder_loss=0.2844, over 5792965.58 frames. ], batch size: 90, lr: 1.79e-02, grad_scale: 4.0 2024-09-16 23:58:35,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=105100.0, ans=0.0 2024-09-16 23:58:49,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.20 vs. 
limit=12.0 2024-09-16 23:58:50,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=105140.0, ans=0.2 2024-09-16 23:59:13,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=105180.0, ans=0.125 2024-09-16 23:59:21,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=105220.0, ans=0.125 2024-09-16 23:59:41,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.56 vs. limit=10.0 2024-09-16 23:59:44,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.02 vs. limit=15.0 2024-09-16 23:59:49,423 INFO [train.py:1198] (0/2) Epoch 6, batch 3700, loss[loss=0.2965, ctc_loss=0.2131, cr_loss=0.4317, attn_decoder_loss=0.2962, over 29712.00 frames. ], tot_loss[loss=0.2849, ctc_loss=0.2055, cr_loss=0.4278, attn_decoder_loss=0.2842, over 5803201.80 frames. 
], batch size: 84, lr: 1.78e-02, grad_scale: 8.0 2024-09-16 23:59:57,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=105300.0, ans=0.0 2024-09-17 00:00:00,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105300.0, ans=0.1 2024-09-17 00:00:12,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=105340.0, ans=0.125 2024-09-17 00:00:12,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105340.0, ans=0.1 2024-09-17 00:00:17,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=105340.0, ans=0.0 2024-09-17 00:00:20,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=105380.0, ans=0.025 2024-09-17 00:00:27,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=105380.0, ans=0.0 2024-09-17 00:00:31,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=105380.0, ans=0.1 2024-09-17 00:00:44,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.38 vs. 
limit=22.5 2024-09-17 00:00:51,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=105460.0, ans=0.125 2024-09-17 00:00:55,740 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.176e+01 1.057e+02 1.159e+02 1.295e+02 2.172e+02, threshold=2.318e+02, percent-clipped=0.0 2024-09-17 00:00:56,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=105460.0, ans=0.125 2024-09-17 00:01:04,780 INFO [train.py:1198] (0/2) Epoch 6, batch 3750, loss[loss=0.256, ctc_loss=0.1806, cr_loss=0.3977, attn_decoder_loss=0.2555, over 29351.00 frames. ], tot_loss[loss=0.2845, ctc_loss=0.2052, cr_loss=0.4273, attn_decoder_loss=0.2838, over 5807084.24 frames. ], batch size: 67, lr: 1.78e-02, grad_scale: 4.0 2024-09-17 00:01:05,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=105500.0, ans=0.04949747468305833 2024-09-17 00:01:33,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=105580.0, ans=0.0 2024-09-17 00:02:08,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=105660.0, ans=0.0 2024-09-17 00:02:14,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=105660.0, ans=0.125 2024-09-17 00:02:20,335 INFO [train.py:1198] (0/2) Epoch 6, batch 3800, loss[loss=0.29, ctc_loss=0.2084, cr_loss=0.4468, attn_decoder_loss=0.2892, over 29624.00 frames. ], tot_loss[loss=0.2841, ctc_loss=0.2048, cr_loss=0.4268, attn_decoder_loss=0.2834, over 5797951.66 frames. 
], batch size: 86, lr: 1.78e-02, grad_scale: 8.0 2024-09-17 00:02:24,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=105700.0, ans=0.125 2024-09-17 00:02:43,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=105740.0, ans=0.125 2024-09-17 00:03:01,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=105780.0, ans=0.0 2024-09-17 00:03:04,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=105820.0, ans=0.125 2024-09-17 00:03:19,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=105820.0, ans=0.0 2024-09-17 00:03:27,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=105860.0, ans=0.0 2024-09-17 00:03:29,921 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.809e+01 1.097e+02 1.194e+02 1.336e+02 2.111e+02, threshold=2.388e+02, percent-clipped=0.0 2024-09-17 00:03:33,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2024-09-17 00:03:37,376 INFO [train.py:1198] (0/2) Epoch 6, batch 3850, loss[loss=0.3076, ctc_loss=0.2325, cr_loss=0.4549, attn_decoder_loss=0.3059, over 29242.00 frames. ], tot_loss[loss=0.2842, ctc_loss=0.2048, cr_loss=0.4273, attn_decoder_loss=0.2836, over 5810079.93 frames. 
], batch size: 100, lr: 1.78e-02, grad_scale: 4.0 2024-09-17 00:03:51,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=105940.0, ans=0.125 2024-09-17 00:04:20,472 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=15.0 2024-09-17 00:04:20,807 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.27 vs. limit=12.0 2024-09-17 00:04:36,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=106020.0, ans=0.125 2024-09-17 00:04:39,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=106060.0, ans=0.0 2024-09-17 00:04:48,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=106060.0, ans=0.05 2024-09-17 00:04:54,331 INFO [train.py:1198] (0/2) Epoch 6, batch 3900, loss[loss=0.2856, ctc_loss=0.1985, cr_loss=0.4158, attn_decoder_loss=0.286, over 29649.00 frames. ], tot_loss[loss=0.2842, ctc_loss=0.2042, cr_loss=0.427, attn_decoder_loss=0.2836, over 5814743.86 frames. ], batch size: 86, lr: 1.78e-02, grad_scale: 8.0 2024-09-17 00:05:05,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. 
limit=15.0 2024-09-17 00:05:16,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=106140.0, ans=0.0 2024-09-17 00:05:35,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=106180.0, ans=0.5 2024-09-17 00:05:36,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=106180.0, ans=0.125 2024-09-17 00:05:41,925 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.44 vs. limit=15.0 2024-09-17 00:05:52,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=106260.0, ans=0.125 2024-09-17 00:06:03,099 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.054e+01 1.064e+02 1.152e+02 1.217e+02 1.852e+02, threshold=2.304e+02, percent-clipped=0.0 2024-09-17 00:06:03,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=106260.0, ans=0.125 2024-09-17 00:06:09,212 INFO [train.py:1198] (0/2) Epoch 6, batch 3950, loss[loss=0.2983, ctc_loss=0.2211, cr_loss=0.4533, attn_decoder_loss=0.2968, over 29481.00 frames. ], tot_loss[loss=0.2841, ctc_loss=0.2037, cr_loss=0.4271, attn_decoder_loss=0.2835, over 5834774.02 frames. ], batch size: 97, lr: 1.78e-02, grad_scale: 4.0 2024-09-17 00:06:17,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=106300.0, ans=0.2 2024-09-17 00:07:24,790 INFO [train.py:1198] (0/2) Epoch 6, batch 4000, loss[loss=0.2708, ctc_loss=0.1907, cr_loss=0.4163, attn_decoder_loss=0.2704, over 29526.00 frames. ], tot_loss[loss=0.2844, ctc_loss=0.2044, cr_loss=0.4272, attn_decoder_loss=0.2838, over 5813306.35 frames. 
], batch size: 74, lr: 1.77e-02, grad_scale: 8.0 2024-09-17 00:07:39,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=106540.0, ans=0.125 2024-09-17 00:07:44,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=106540.0, ans=0.125 2024-09-17 00:08:31,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=106660.0, ans=0.0 2024-09-17 00:08:36,817 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.741e+01 1.103e+02 1.193e+02 1.340e+02 9.903e+02, threshold=2.386e+02, percent-clipped=2.0 2024-09-17 00:08:41,362 INFO [train.py:1198] (0/2) Epoch 6, batch 4050, loss[loss=0.332, ctc_loss=0.2825, cr_loss=0.4598, attn_decoder_loss=0.3273, over 19771.00 frames. ], tot_loss[loss=0.2843, ctc_loss=0.2046, cr_loss=0.4267, attn_decoder_loss=0.2836, over 5795907.38 frames. ], batch size: 210, lr: 1.77e-02, grad_scale: 4.0 2024-09-17 00:08:44,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106700.0, ans=0.1 2024-09-17 00:08:44,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=106700.0, ans=0.125 2024-09-17 00:09:36,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=106820.0, ans=0.025 2024-09-17 00:09:44,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=22.5 2024-09-17 00:09:56,862 INFO [train.py:1198] (0/2) Epoch 6, batch 4100, loss[loss=0.2999, ctc_loss=0.223, cr_loss=0.4533, attn_decoder_loss=0.2984, over 29507.00 frames. 
], tot_loss[loss=0.2847, ctc_loss=0.2051, cr_loss=0.4272, attn_decoder_loss=0.2841, over 5791723.14 frames. ], batch size: 90, lr: 1.77e-02, grad_scale: 8.0 2024-09-17 00:10:19,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=106940.0, ans=0.05 2024-09-17 00:10:26,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=106980.0, ans=0.125 2024-09-17 00:10:51,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=107020.0, ans=0.125 2024-09-17 00:10:53,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=107020.0, ans=0.5 2024-09-17 00:10:54,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=107060.0, ans=0.125 2024-09-17 00:10:58,260 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. limit=6.0 2024-09-17 00:11:07,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.979e+01 1.121e+02 1.241e+02 1.471e+02 3.510e+02, threshold=2.481e+02, percent-clipped=3.0 2024-09-17 00:11:10,806 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=8.0 2024-09-17 00:11:11,062 INFO [train.py:1198] (0/2) Epoch 6, batch 4150, loss[loss=0.26, ctc_loss=0.1756, cr_loss=0.3865, attn_decoder_loss=0.2608, over 29515.00 frames. ], tot_loss[loss=0.2845, ctc_loss=0.2049, cr_loss=0.4271, attn_decoder_loss=0.2839, over 5797480.19 frames. 
], batch size: 77, lr: 1.77e-02, grad_scale: 4.0 2024-09-17 00:11:15,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=107100.0, ans=0.0 2024-09-17 00:11:23,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=107100.0, ans=0.025 2024-09-17 00:11:52,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2024-09-17 00:11:55,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=107220.0, ans=0.07 2024-09-17 00:11:57,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=12.0 2024-09-17 00:11:59,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=107220.0, ans=0.125 2024-09-17 00:12:05,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=107220.0, ans=0.2 2024-09-17 00:12:16,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=107260.0, ans=0.125 2024-09-17 00:12:24,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.97 vs. limit=12.0 2024-09-17 00:12:27,411 INFO [train.py:1198] (0/2) Epoch 6, batch 4200, loss[loss=0.3074, ctc_loss=0.2257, cr_loss=0.4743, attn_decoder_loss=0.3059, over 29498.00 frames. ], tot_loss[loss=0.2852, ctc_loss=0.2056, cr_loss=0.4281, attn_decoder_loss=0.2845, over 5800685.23 frames. 
], batch size: 90, lr: 1.77e-02, grad_scale: 8.0 2024-09-17 00:12:27,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=107300.0, ans=0.125 2024-09-17 00:12:48,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=107340.0, ans=0.125 2024-09-17 00:12:53,670 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.67 vs. limit=10.0 2024-09-17 00:13:02,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=107380.0, ans=0.125 2024-09-17 00:13:06,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107380.0, ans=0.1 2024-09-17 00:13:41,773 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.922e+01 1.074e+02 1.154e+02 1.259e+02 3.870e+02, threshold=2.307e+02, percent-clipped=1.0 2024-09-17 00:13:43,277 INFO [train.py:1198] (0/2) Epoch 6, batch 4250, loss[loss=0.2607, ctc_loss=0.1784, cr_loss=0.3774, attn_decoder_loss=0.2615, over 29520.00 frames. ], tot_loss[loss=0.2852, ctc_loss=0.2053, cr_loss=0.4279, attn_decoder_loss=0.2846, over 5807253.24 frames. ], batch size: 74, lr: 1.77e-02, grad_scale: 4.0 2024-09-17 00:13:45,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=107500.0, ans=0.125 2024-09-17 00:14:10,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.33 vs. 
limit=15.0 2024-09-17 00:14:17,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=107580.0, ans=0.2 2024-09-17 00:14:26,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=107620.0, ans=0.0 2024-09-17 00:14:28,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=107620.0, ans=0.125 2024-09-17 00:14:32,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=107620.0, ans=0.125 2024-09-17 00:14:42,998 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:14:53,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=107660.0, ans=0.125 2024-09-17 00:14:53,580 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2024-09-17 00:14:57,398 INFO [train.py:1198] (0/2) Epoch 6, batch 4300, loss[loss=0.2875, ctc_loss=0.1993, cr_loss=0.4196, attn_decoder_loss=0.2879, over 29523.00 frames. ], tot_loss[loss=0.2853, ctc_loss=0.205, cr_loss=0.428, attn_decoder_loss=0.2848, over 5796515.43 frames. ], batch size: 87, lr: 1.77e-02, grad_scale: 8.0 2024-09-17 00:15:02,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.74 vs. limit=22.5 2024-09-17 00:15:28,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.83 vs. 
limit=12.0 2024-09-17 00:15:35,942 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0 2024-09-17 00:15:47,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2024-09-17 00:15:56,444 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2024-09-17 00:16:13,577 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.817e+01 1.068e+02 1.179e+02 1.314e+02 6.167e+02, threshold=2.359e+02, percent-clipped=2.0 2024-09-17 00:16:13,599 INFO [train.py:1198] (0/2) Epoch 6, batch 4350, loss[loss=0.2886, ctc_loss=0.2074, cr_loss=0.4218, attn_decoder_loss=0.2882, over 29480.00 frames. ], tot_loss[loss=0.289, ctc_loss=0.2086, cr_loss=0.4335, attn_decoder_loss=0.2883, over 5798126.64 frames. ], batch size: 97, lr: 1.76e-02, grad_scale: 4.0 2024-09-17 00:16:41,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=107940.0, ans=0.125 2024-09-17 00:16:56,862 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.32 vs. limit=10.0 2024-09-17 00:17:08,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.84 vs. 
limit=6.0
2024-09-17 00:17:13,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=108060.0, ans=0.1
2024-09-17 00:17:15,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=108060.0, ans=0.0
2024-09-17 00:17:19,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=108060.0, ans=12.0
2024-09-17 00:17:19,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=108060.0, ans=0.125
2024-09-17 00:17:22,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.09 vs. limit=22.5
2024-09-17 00:17:25,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0
2024-09-17 00:17:28,660 INFO [train.py:1198] (0/2) Epoch 6, batch 4400, loss[loss=0.2933, ctc_loss=0.2243, cr_loss=0.4556, attn_decoder_loss=0.2908, over 27084.00 frames. ], tot_loss[loss=0.2916, ctc_loss=0.2111, cr_loss=0.4368, attn_decoder_loss=0.2908, over 5767529.78 frames. ], batch size: 124, lr: 1.76e-02, grad_scale: 8.0
2024-09-17 00:17:44,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0
2024-09-17 00:18:01,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=108180.0, ans=0.025
2024-09-17 00:18:20,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=108220.0, ans=10.0
2024-09-17 00:18:25,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=108220.0, ans=0.0
2024-09-17 00:18:38,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=108260.0, ans=0.1
2024-09-17 00:18:43,509 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 00:18:44,520 INFO [train.py:1198] (0/2) Epoch 6, batch 4450, loss[loss=0.3186, ctc_loss=0.2646, cr_loss=0.4353, attn_decoder_loss=0.3149, over 20364.00 frames. ], tot_loss[loss=0.2956, ctc_loss=0.2177, cr_loss=0.4399, attn_decoder_loss=0.2945, over 5581725.90 frames. ], batch size: 209, lr: 1.76e-02, grad_scale: 4.0
2024-09-17 00:18:46,020 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.821e+01 1.110e+02 1.171e+02 1.331e+02 5.376e+02, threshold=2.342e+02, percent-clipped=1.0
2024-09-17 00:18:47,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=108300.0, ans=0.2
2024-09-17 00:18:53,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=108300.0, ans=0.0
2024-09-17 00:19:00,233 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0
2024-09-17 00:19:13,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=108380.0, ans=0.0
2024-09-17 00:19:18,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=108380.0, ans=0.125
2024-09-17 00:19:27,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=12.0
2024-09-17 00:19:30,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=108420.0, ans=0.2
2024-09-17 00:20:00,686 INFO [train.py:1198] (0/2) Epoch 6, batch 4500, loss[loss=0.3152, ctc_loss=0.2611, cr_loss=0.4315, attn_decoder_loss=0.3116, over 20853.00 frames. ], tot_loss[loss=0.3003, ctc_loss=0.2264, cr_loss=0.4419, attn_decoder_loss=0.2987, over 5243340.79 frames. ], batch size: 209, lr: 1.76e-02, grad_scale: 8.0
2024-09-17 00:20:13,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=108500.0, ans=0.07
2024-09-17 00:20:19,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=108540.0, ans=0.125
2024-09-17 00:20:38,670 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-6.pt
2024-09-17 00:21:33,137 WARNING [optim.py:503] (0/2) Scaling gradients by 0.06814046949148178, model_norm_threshold=234.16368103027344
2024-09-17 00:21:33,345 WARNING [optim.py:575] (0/2) Parameter dominating tot_sumsq module.attention_decoder.decoder.layers.0.norm_self_attn.weight with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.188e+06, grad_sumsq=4.711e+10, orig_rms_sq=6.766e-05
2024-09-17 00:21:33,373 INFO [train.py:1198] (0/2) Epoch 7, batch 0, loss[loss=0.3002, ctc_loss=0.1949, cr_loss=0.4463, attn_decoder_loss=0.3019, over 29600.00 frames. ], tot_loss[loss=0.3002, ctc_loss=0.1949, cr_loss=0.4463, attn_decoder_loss=0.3019, over 29600.00 frames. ], batch size: 73, lr: 1.65e-02, grad_scale: 8.0
2024-09-17 00:21:33,374 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 00:21:51,800 INFO [train.py:1230] (0/2) Epoch 7, validation: loss=0.2253, ctc_loss=0.06341, cr_loss=4.598e-15, attn_decoder_loss=0.2433, over 944034.00 frames.
2024-09-17 00:21:51,800 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-17 00:21:53,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=108600.0, ans=0.125
2024-09-17 00:22:19,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=108640.0, ans=0.125
2024-09-17 00:22:20,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=15.0
2024-09-17 00:22:25,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=108680.0, ans=0.2
2024-09-17 00:22:26,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.79 vs. limit=15.0
2024-09-17 00:22:36,025 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.794e+01 1.158e+02 1.328e+02 1.536e+02 3.436e+03, threshold=2.655e+02, percent-clipped=8.0
2024-09-17 00:23:11,666 INFO [train.py:1198] (0/2) Epoch 7, batch 50, loss[loss=0.2562, ctc_loss=0.1762, cr_loss=0.3789, attn_decoder_loss=0.2567, over 29457.00 frames. ], tot_loss[loss=0.2905, ctc_loss=0.2127, cr_loss=0.4354, attn_decoder_loss=0.2895, over 1268121.79 frames. ], batch size: 70, lr: 1.65e-02, grad_scale: 4.0
2024-09-17 00:23:49,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=108880.0, ans=0.0
2024-09-17 00:23:51,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=108880.0, ans=0.125
2024-09-17 00:23:53,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0
2024-09-17 00:24:21,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=108960.0, ans=0.0
2024-09-17 00:24:25,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
2024-09-17 00:24:27,109 INFO [train.py:1198] (0/2) Epoch 7, batch 100, loss[loss=0.2768, ctc_loss=0.1989, cr_loss=0.4219, attn_decoder_loss=0.2761, over 29539.00 frames. ], tot_loss[loss=0.2904, ctc_loss=0.2107, cr_loss=0.4346, attn_decoder_loss=0.2896, over 2251137.67 frames. ], batch size: 76, lr: 1.65e-02, grad_scale: 8.0
2024-09-17 00:24:28,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=109000.0, ans=0.125
2024-09-17 00:24:40,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=109040.0, ans=0.0
2024-09-17 00:24:54,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=109040.0, ans=0.2
2024-09-17 00:25:08,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0
2024-09-17 00:25:08,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0
2024-09-17 00:25:10,522 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.921e+01 1.059e+02 1.169e+02 1.320e+02 2.276e+02, threshold=2.339e+02, percent-clipped=0.0
2024-09-17 00:25:15,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=109120.0, ans=0.125
2024-09-17 00:25:19,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109120.0, ans=0.1
2024-09-17 00:25:19,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=109120.0, ans=0.025
2024-09-17 00:25:43,878 INFO [train.py:1198] (0/2) Epoch 7, batch 150, loss[loss=0.2547, ctc_loss=0.1783, cr_loss=0.373, attn_decoder_loss=0.2549, over 29430.00 frames. ], tot_loss[loss=0.2869, ctc_loss=0.2069, cr_loss=0.4318, attn_decoder_loss=0.2862, over 3046228.76 frames. ], batch size: 70, lr: 1.64e-02, grad_scale: 4.0
2024-09-17 00:25:57,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=109240.0, ans=0.125
2024-09-17 00:26:31,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=109320.0, ans=0.125
2024-09-17 00:26:40,397 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=15.0
2024-09-17 00:26:44,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=109360.0, ans=0.0
2024-09-17 00:26:46,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.30 vs. limit=10.0
2024-09-17 00:26:47,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=109360.0, ans=0.0
2024-09-17 00:26:48,011 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs. limit=6.0
2024-09-17 00:27:00,800 INFO [train.py:1198] (0/2) Epoch 7, batch 200, loss[loss=0.3047, ctc_loss=0.2284, cr_loss=0.4459, attn_decoder_loss=0.3032, over 27326.00 frames. ], tot_loss[loss=0.2851, ctc_loss=0.2047, cr_loss=0.4289, attn_decoder_loss=0.2845, over 3658227.32 frames. ], batch size: 124, lr: 1.64e-02, grad_scale: 8.0
2024-09-17 00:27:13,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=109400.0, ans=0.2
2024-09-17 00:27:35,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=109480.0, ans=0.0
2024-09-17 00:27:46,007 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.796e+01 1.025e+02 1.125e+02 1.234e+02 4.171e+02, threshold=2.251e+02, percent-clipped=1.0
2024-09-17 00:27:47,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=109520.0, ans=0.0
2024-09-17 00:28:17,035 INFO [train.py:1198] (0/2) Epoch 7, batch 250, loss[loss=0.3025, ctc_loss=0.2156, cr_loss=0.4548, attn_decoder_loss=0.302, over 29225.00 frames. ], tot_loss[loss=0.2847, ctc_loss=0.2038, cr_loss=0.4287, attn_decoder_loss=0.2841, over 4140978.43 frames. ], batch size: 100, lr: 1.64e-02, grad_scale: 4.0
2024-09-17 00:28:41,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109640.0, ans=0.1
2024-09-17 00:28:52,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=109680.0, ans=0.125
2024-09-17 00:29:03,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=109720.0, ans=0.125
2024-09-17 00:29:04,541 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 00:29:09,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=109720.0, ans=0.125
2024-09-17 00:29:10,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=109720.0, ans=0.0
2024-09-17 00:29:24,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=109760.0, ans=0.0
2024-09-17 00:29:33,125 INFO [train.py:1198] (0/2) Epoch 7, batch 300, loss[loss=0.3021, ctc_loss=0.2258, cr_loss=0.4469, attn_decoder_loss=0.3006, over 29550.00 frames. ], tot_loss[loss=0.2834, ctc_loss=0.2018, cr_loss=0.4265, attn_decoder_loss=0.2829, over 4508648.90 frames. ], batch size: 92, lr: 1.64e-02, grad_scale: 8.0
2024-09-17 00:29:39,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109800.0, ans=0.1
2024-09-17 00:29:42,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=109800.0, ans=0.07
2024-09-17 00:29:47,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0
2024-09-17 00:29:51,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=109840.0, ans=0.125
2024-09-17 00:30:00,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.65 vs. limit=22.5
2024-09-17 00:30:03,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=109840.0, ans=0.2
2024-09-17 00:30:04,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=109880.0, ans=0.125
2024-09-17 00:30:16,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=109880.0, ans=0.125
2024-09-17 00:30:18,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=109880.0, ans=0.125
2024-09-17 00:30:25,689 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.782e+01 1.034e+02 1.140e+02 1.272e+02 2.553e+02, threshold=2.279e+02, percent-clipped=1.0
2024-09-17 00:30:30,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=109920.0, ans=0.125
2024-09-17 00:30:33,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=109920.0, ans=0.125
2024-09-17 00:30:53,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110000.0, ans=0.1
2024-09-17 00:30:54,838 INFO [train.py:1198] (0/2) Epoch 7, batch 350, loss[loss=0.262, ctc_loss=0.1816, cr_loss=0.4106, attn_decoder_loss=0.2619, over 29282.00 frames. ], tot_loss[loss=0.284, ctc_loss=0.2026, cr_loss=0.4275, attn_decoder_loss=0.2836, over 4793032.63 frames. ], batch size: 71, lr: 1.64e-02, grad_scale: 4.0
2024-09-17 00:31:18,173 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.70 vs. limit=22.5
2024-09-17 00:31:52,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=110120.0, ans=0.125
2024-09-17 00:32:03,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=110160.0, ans=0.0
2024-09-17 00:32:08,211 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.89 vs. limit=15.0
2024-09-17 00:32:10,451 INFO [train.py:1198] (0/2) Epoch 7, batch 400, loss[loss=0.2874, ctc_loss=0.2024, cr_loss=0.4255, attn_decoder_loss=0.2873, over 29699.00 frames. ], tot_loss[loss=0.2839, ctc_loss=0.2025, cr_loss=0.427, attn_decoder_loss=0.2834, over 5023004.19 frames. ], batch size: 82, lr: 1.64e-02, grad_scale: 8.0
2024-09-17 00:32:13,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=110200.0, ans=0.025
2024-09-17 00:32:13,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=110200.0, ans=0.0
2024-09-17 00:32:18,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.94 vs. limit=10.0
2024-09-17 00:32:19,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=110200.0, ans=0.07
2024-09-17 00:32:24,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=110240.0, ans=0.1
2024-09-17 00:32:35,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=110240.0, ans=0.1
2024-09-17 00:32:58,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=110320.0, ans=0.0
2024-09-17 00:32:59,546 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.594e+01 1.062e+02 1.160e+02 1.275e+02 1.904e+02, threshold=2.320e+02, percent-clipped=0.0
2024-09-17 00:33:19,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=110360.0, ans=0.07
2024-09-17 00:33:24,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=110360.0, ans=0.125
2024-09-17 00:33:27,321 INFO [train.py:1198] (0/2) Epoch 7, batch 450, loss[loss=0.2988, ctc_loss=0.2247, cr_loss=0.4417, attn_decoder_loss=0.2973, over 29690.00 frames. ], tot_loss[loss=0.2841, ctc_loss=0.2028, cr_loss=0.4276, attn_decoder_loss=0.2836, over 5187596.60 frames. ], batch size: 83, lr: 1.64e-02, grad_scale: 4.0
2024-09-17 00:33:50,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0
2024-09-17 00:34:29,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=110520.0, ans=0.1
2024-09-17 00:34:32,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=110560.0, ans=0.0
2024-09-17 00:34:48,899 INFO [train.py:1198] (0/2) Epoch 7, batch 500, loss[loss=0.2922, ctc_loss=0.2059, cr_loss=0.438, attn_decoder_loss=0.2921, over 29448.00 frames. ], tot_loss[loss=0.2827, ctc_loss=0.2014, cr_loss=0.4263, attn_decoder_loss=0.2823, over 5329787.43 frames. ], batch size: 94, lr: 1.63e-02, grad_scale: 8.0
2024-09-17 00:34:52,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=110600.0, ans=22.5
2024-09-17 00:35:04,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=110640.0, ans=0.09899494936611666
2024-09-17 00:35:23,555 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0
2024-09-17 00:35:39,102 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.537e+01 1.048e+02 1.174e+02 1.330e+02 3.263e+02, threshold=2.347e+02, percent-clipped=4.0
2024-09-17 00:35:40,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.37 vs. limit=15.0
2024-09-17 00:35:41,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.80 vs. limit=15.0
2024-09-17 00:35:47,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=110720.0, ans=0.125
2024-09-17 00:35:56,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=110760.0, ans=0.0
2024-09-17 00:35:59,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=110760.0, ans=0.0
2024-09-17 00:36:03,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=110800.0, ans=0.125
2024-09-17 00:36:04,947 INFO [train.py:1198] (0/2) Epoch 7, batch 550, loss[loss=0.2937, ctc_loss=0.2095, cr_loss=0.4412, attn_decoder_loss=0.2932, over 28974.00 frames. ], tot_loss[loss=0.2822, ctc_loss=0.201, cr_loss=0.4247, attn_decoder_loss=0.2818, over 5423528.31 frames. ], batch size: 104, lr: 1.63e-02, grad_scale: 4.0
2024-09-17 00:36:06,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=110800.0, ans=0.0
2024-09-17 00:36:13,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0
2024-09-17 00:36:23,666 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 00:36:26,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=110840.0, ans=0.0
2024-09-17 00:36:59,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=110920.0, ans=0.0
2024-09-17 00:37:05,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=110960.0, ans=0.125
2024-09-17 00:37:12,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=110960.0, ans=0.025
2024-09-17 00:37:15,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=110960.0, ans=0.1
2024-09-17 00:37:21,446 INFO [train.py:1198] (0/2) Epoch 7, batch 600, loss[loss=0.3014, ctc_loss=0.2204, cr_loss=0.4715, attn_decoder_loss=0.2999, over 29229.00 frames. ], tot_loss[loss=0.2821, ctc_loss=0.2004, cr_loss=0.4246, attn_decoder_loss=0.2818, over 5509576.41 frames. ], batch size: 100, lr: 1.63e-02, grad_scale: 8.0
2024-09-17 00:37:21,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=111000.0, ans=0.125
2024-09-17 00:37:30,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=111000.0, ans=0.05
2024-09-17 00:37:33,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=111000.0, ans=6.0
2024-09-17 00:37:41,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=111040.0, ans=0.2
2024-09-17 00:37:55,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111080.0, ans=0.1
2024-09-17 00:37:56,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0
2024-09-17 00:38:02,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.75 vs. limit=15.0
2024-09-17 00:38:09,574 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=12.0
2024-09-17 00:38:14,465 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.423e+01 1.086e+02 1.156e+02 1.256e+02 2.672e+02, threshold=2.312e+02, percent-clipped=2.0
2024-09-17 00:38:19,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=111120.0, ans=0.025
2024-09-17 00:38:29,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=111160.0, ans=0.0
2024-09-17 00:38:41,954 INFO [train.py:1198] (0/2) Epoch 7, batch 650, loss[loss=0.2815, ctc_loss=0.1963, cr_loss=0.4084, attn_decoder_loss=0.2818, over 29748.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.1988, cr_loss=0.4234, attn_decoder_loss=0.2805, over 5586722.30 frames. ], batch size: 81, lr: 1.63e-02, grad_scale: 4.0
2024-09-17 00:38:45,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=111200.0, ans=0.125
2024-09-17 00:38:59,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.80 vs. limit=15.0
2024-09-17 00:39:27,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.05 vs. limit=22.5
2024-09-17 00:39:35,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=111320.0, ans=0.2
2024-09-17 00:39:46,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=111360.0, ans=0.125
2024-09-17 00:39:58,143 INFO [train.py:1198] (0/2) Epoch 7, batch 700, loss[loss=0.2737, ctc_loss=0.1955, cr_loss=0.4414, attn_decoder_loss=0.2726, over 29541.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.1997, cr_loss=0.4252, attn_decoder_loss=0.2812, over 5637640.88 frames. ], batch size: 76, lr: 1.63e-02, grad_scale: 8.0
2024-09-17 00:39:58,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=111400.0, ans=0.125
2024-09-17 00:40:00,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.14 vs. limit=15.0
2024-09-17 00:40:24,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=111440.0, ans=0.125
2024-09-17 00:40:32,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=111480.0, ans=0.0
2024-09-17 00:40:51,486 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.298e+01 1.046e+02 1.126e+02 1.229e+02 1.906e+02, threshold=2.253e+02, percent-clipped=0.0
2024-09-17 00:40:51,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=111520.0, ans=0.025
2024-09-17 00:40:55,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.53 vs. limit=15.0
2024-09-17 00:41:08,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=111560.0, ans=0.125
2024-09-17 00:41:14,392 INFO [train.py:1198] (0/2) Epoch 7, batch 750, loss[loss=0.289, ctc_loss=0.2029, cr_loss=0.4487, attn_decoder_loss=0.2886, over 29720.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.1989, cr_loss=0.4243, attn_decoder_loss=0.2804, over 5677105.83 frames. ], batch size: 82, lr: 1.63e-02, grad_scale: 4.0
2024-09-17 00:41:23,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=111600.0, ans=0.0
2024-09-17 00:41:30,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0
2024-09-17 00:41:32,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=111640.0, ans=0.125
2024-09-17 00:41:35,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=111640.0, ans=0.125
2024-09-17 00:42:29,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111760.0, ans=0.1
2024-09-17 00:42:29,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=111760.0, ans=0.09899494936611666
2024-09-17 00:42:35,173 INFO [train.py:1198] (0/2) Epoch 7, batch 800, loss[loss=0.2507, ctc_loss=0.1639, cr_loss=0.3699, attn_decoder_loss=0.2522, over 29599.00 frames. ], tot_loss[loss=0.2807, ctc_loss=0.1989, cr_loss=0.4242, attn_decoder_loss=0.2803, over 5707851.40 frames. ], batch size: 73, lr: 1.63e-02, grad_scale: 8.0
2024-09-17 00:42:36,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=111800.0, ans=15.0
2024-09-17 00:42:56,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=111840.0, ans=0.125
2024-09-17 00:43:02,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=111840.0, ans=0.125
2024-09-17 00:43:05,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=111880.0, ans=0.0
2024-09-17 00:43:13,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=111880.0, ans=0.2
2024-09-17 00:43:16,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=111880.0, ans=0.125
2024-09-17 00:43:24,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=111920.0, ans=0.025
2024-09-17 00:43:29,885 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.814e+01 1.067e+02 1.173e+02 1.326e+02 3.037e+02, threshold=2.345e+02, percent-clipped=2.0
2024-09-17 00:43:31,800 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 00:43:37,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=111960.0, ans=0.125
2024-09-17 00:43:50,001 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-28000.pt
2024-09-17 00:43:58,057 INFO [train.py:1198] (0/2) Epoch 7, batch 850, loss[loss=0.3004, ctc_loss=0.2204, cr_loss=0.4198, attn_decoder_loss=0.3, over 29683.00 frames. ], tot_loss[loss=0.2801, ctc_loss=0.1983, cr_loss=0.4222, attn_decoder_loss=0.2798, over 5737440.71 frames. ], batch size: 89, lr: 1.62e-02, grad_scale: 4.0
2024-09-17 00:43:58,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=112000.0, ans=0.95
2024-09-17 00:44:11,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112040.0, ans=0.1
2024-09-17 00:44:47,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.79 vs. limit=6.0
2024-09-17 00:45:08,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=112160.0, ans=0.025
2024-09-17 00:45:14,255 INFO [train.py:1198] (0/2) Epoch 7, batch 900, loss[loss=0.2677, ctc_loss=0.1861, cr_loss=0.3989, attn_decoder_loss=0.2679, over 29619.00 frames. ], tot_loss[loss=0.2809, ctc_loss=0.1992, cr_loss=0.4229, attn_decoder_loss=0.2805, over 5741716.61 frames. ], batch size: 73, lr: 1.62e-02, grad_scale: 8.0
2024-09-17 00:46:01,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0
2024-09-17 00:46:02,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=112320.0, ans=0.0
2024-09-17 00:46:09,011 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.42 vs. limit=15.0
2024-09-17 00:46:13,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=112320.0, ans=0.2
2024-09-17 00:46:14,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.034e+01 1.106e+02 1.225e+02 1.378e+02 5.810e+02, threshold=2.450e+02, percent-clipped=7.0
2024-09-17 00:46:27,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=112360.0, ans=0.125
2024-09-17 00:46:31,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=112360.0, ans=0.0
2024-09-17 00:46:34,342 INFO [train.py:1198] (0/2) Epoch 7, batch 950, loss[loss=0.2576, ctc_loss=0.1698, cr_loss=0.3765, attn_decoder_loss=0.259, over 29514.00 frames. ], tot_loss[loss=0.2813, ctc_loss=0.1996, cr_loss=0.4232, attn_decoder_loss=0.2809, over 5743763.00 frames. ], batch size: 74, lr: 1.62e-02, grad_scale: 4.0
2024-09-17 00:46:39,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=112400.0, ans=0.125
2024-09-17 00:46:43,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=112400.0, ans=0.125
2024-09-17 00:46:45,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=112400.0, ans=0.025
2024-09-17 00:47:01,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=112440.0, ans=0.125
2024-09-17 00:47:10,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.10 vs. limit=10.0
2024-09-17 00:47:14,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.50 vs. limit=15.0
2024-09-17 00:47:30,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=112520.0, ans=0.125
2024-09-17 00:47:48,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.23 vs. limit=10.0
2024-09-17 00:47:50,432 INFO [train.py:1198] (0/2) Epoch 7, batch 1000, loss[loss=0.2716, ctc_loss=0.191, cr_loss=0.4203, attn_decoder_loss=0.2712, over 29504.00 frames. ], tot_loss[loss=0.2822, ctc_loss=0.2005, cr_loss=0.4246, attn_decoder_loss=0.2819, over 5738099.41 frames. ], batch size: 77, lr: 1.62e-02, grad_scale: 8.0
2024-09-17 00:47:55,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112600.0, ans=0.1
2024-09-17 00:48:00,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.39 vs. limit=12.0
2024-09-17 00:48:36,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=112720.0, ans=0.125
2024-09-17 00:48:46,755 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.415e+01 1.043e+02 1.127e+02 1.327e+02 3.931e+02, threshold=2.254e+02, percent-clipped=2.0
2024-09-17 00:48:59,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=112760.0, ans=0.0
2024-09-17 00:49:06,945 INFO [train.py:1198] (0/2) Epoch 7, batch 1050, loss[loss=0.2815, ctc_loss=0.2017, cr_loss=0.4309, attn_decoder_loss=0.2808, over 29684.00 frames. ], tot_loss[loss=0.2811, ctc_loss=0.1991, cr_loss=0.4229, attn_decoder_loss=0.2808, over 5746101.25 frames. ], batch size: 85, lr: 1.62e-02, grad_scale: 8.0
2024-09-17 00:49:21,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0
2024-09-17 00:49:31,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=112840.0, ans=10.0
2024-09-17 00:49:33,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112840.0, ans=0.1
2024-09-17 00:49:40,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=112880.0, ans=0.025
2024-09-17 00:49:50,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=112880.0, ans=0.0
2024-09-17 00:49:53,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=22.5
2024-09-17 00:49:56,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=112920.0, ans=0.125
2024-09-17 00:50:21,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=112960.0, ans=0.125
2024-09-17 00:50:26,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=11.42 vs. limit=15.0
2024-09-17 00:50:28,828 INFO [train.py:1198] (0/2) Epoch 7, batch 1100, loss[loss=0.2692, ctc_loss=0.1817, cr_loss=0.3963, attn_decoder_loss=0.2701, over 29459.00 frames.
], tot_loss[loss=0.2806, ctc_loss=0.1982, cr_loss=0.4221, attn_decoder_loss=0.2804, over 5758094.20 frames. ], batch size: 78, lr: 1.62e-02, grad_scale: 8.0 2024-09-17 00:50:39,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=113000.0, ans=0.025 2024-09-17 00:50:46,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=113040.0, ans=0.0 2024-09-17 00:50:46,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.32 vs. limit=22.5 2024-09-17 00:50:55,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=113040.0, ans=0.125 2024-09-17 00:51:03,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=7.49 vs. limit=12.0 2024-09-17 00:51:16,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.31 vs. limit=15.0 2024-09-17 00:51:21,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=113120.0, ans=0.0 2024-09-17 00:51:26,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=113120.0, ans=0.0 2024-09-17 00:51:28,961 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.665e+01 1.036e+02 1.119e+02 1.238e+02 1.913e+02, threshold=2.238e+02, percent-clipped=0.0 2024-09-17 00:51:30,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.09 vs. 
limit=15.0 2024-09-17 00:51:36,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=113160.0, ans=0.125 2024-09-17 00:51:37,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=113160.0, ans=0.1 2024-09-17 00:51:45,873 INFO [train.py:1198] (0/2) Epoch 7, batch 1150, loss[loss=0.2734, ctc_loss=0.1908, cr_loss=0.4116, attn_decoder_loss=0.2734, over 29455.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.1982, cr_loss=0.4218, attn_decoder_loss=0.2803, over 5756505.84 frames. ], batch size: 78, lr: 1.62e-02, grad_scale: 4.0 2024-09-17 00:52:06,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=113240.0, ans=0.125 2024-09-17 00:52:17,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=113280.0, ans=0.0 2024-09-17 00:52:21,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=113280.0, ans=0.2 2024-09-17 00:52:24,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=113280.0, ans=0.125 2024-09-17 00:52:29,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=113280.0, ans=0.125 2024-09-17 00:52:37,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=113320.0, ans=0.2 2024-09-17 00:52:43,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.86 vs. 
limit=15.0 2024-09-17 00:52:52,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=113360.0, ans=0.2 2024-09-17 00:53:00,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=113360.0, ans=0.04949747468305833 2024-09-17 00:53:02,763 INFO [train.py:1198] (0/2) Epoch 7, batch 1200, loss[loss=0.2884, ctc_loss=0.2007, cr_loss=0.4373, attn_decoder_loss=0.2884, over 29675.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.1993, cr_loss=0.4237, attn_decoder_loss=0.2814, over 5749159.39 frames. ], batch size: 85, lr: 1.62e-02, grad_scale: 8.0 2024-09-17 00:53:13,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=113400.0, ans=0.0 2024-09-17 00:53:13,876 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:54:08,721 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.382e+01 1.039e+02 1.128e+02 1.242e+02 2.195e+02, threshold=2.256e+02, percent-clipped=0.0 2024-09-17 00:54:10,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=113560.0, ans=0.1 2024-09-17 00:54:12,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=113560.0, ans=0.2 2024-09-17 00:54:13,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=113560.0, ans=0.5 2024-09-17 00:54:15,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=113560.0, ans=0.125 2024-09-17 00:54:16,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=113560.0, ans=0.025 2024-09-17 00:54:24,445 INFO [train.py:1198] 
(0/2) Epoch 7, batch 1250, loss[loss=0.2892, ctc_loss=0.1942, cr_loss=0.4155, attn_decoder_loss=0.2905, over 29530.00 frames. ], tot_loss[loss=0.282, ctc_loss=0.1995, cr_loss=0.4242, attn_decoder_loss=0.2817, over 5776920.20 frames. ], batch size: 92, lr: 1.61e-02, grad_scale: 4.0 2024-09-17 00:54:29,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=113600.0, ans=0.2 2024-09-17 00:54:30,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=113600.0, ans=0.2 2024-09-17 00:54:56,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=113680.0, ans=0.125 2024-09-17 00:54:58,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=113680.0, ans=0.125 2024-09-17 00:55:03,059 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:55:24,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=113760.0, ans=0.0 2024-09-17 00:55:27,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=113760.0, ans=0.0 2024-09-17 00:55:28,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=113760.0, ans=0.125 2024-09-17 00:55:34,156 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2024-09-17 00:55:34,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.81 vs. 
limit=15.0 2024-09-17 00:55:40,885 INFO [train.py:1198] (0/2) Epoch 7, batch 1300, loss[loss=0.2962, ctc_loss=0.2073, cr_loss=0.4232, attn_decoder_loss=0.2966, over 28325.00 frames. ], tot_loss[loss=0.2816, ctc_loss=0.1992, cr_loss=0.4239, attn_decoder_loss=0.2813, over 5780499.23 frames. ], batch size: 111, lr: 1.61e-02, grad_scale: 8.0 2024-09-17 00:55:53,446 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:55:56,962 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.99 vs. limit=15.0 2024-09-17 00:55:59,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=113840.0, ans=0.0 2024-09-17 00:56:12,163 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-09-17 00:56:40,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.20 vs. limit=8.0 2024-09-17 00:56:43,502 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.471e+01 1.026e+02 1.124e+02 1.254e+02 2.028e+02, threshold=2.249e+02, percent-clipped=0.0 2024-09-17 00:56:57,147 INFO [train.py:1198] (0/2) Epoch 7, batch 1350, loss[loss=0.2909, ctc_loss=0.2142, cr_loss=0.441, attn_decoder_loss=0.2896, over 29760.00 frames. ], tot_loss[loss=0.2811, ctc_loss=0.1985, cr_loss=0.4235, attn_decoder_loss=0.2809, over 5798373.07 frames. 
], batch size: 81, lr: 1.61e-02, grad_scale: 4.0 2024-09-17 00:57:01,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=114000.0, ans=0.0 2024-09-17 00:57:24,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=114040.0, ans=0.125 2024-09-17 00:57:33,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=114080.0, ans=0.0 2024-09-17 00:57:34,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.34 vs. limit=10.0 2024-09-17 00:57:41,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=114120.0, ans=0.2 2024-09-17 00:57:44,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=114120.0, ans=0.0 2024-09-17 00:58:18,159 INFO [train.py:1198] (0/2) Epoch 7, batch 1400, loss[loss=0.25, ctc_loss=0.1761, cr_loss=0.3876, attn_decoder_loss=0.2496, over 29566.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.1983, cr_loss=0.4236, attn_decoder_loss=0.2806, over 5808805.25 frames. 
], batch size: 69, lr: 1.61e-02, grad_scale: 8.0 2024-09-17 00:58:20,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=114200.0, ans=0.125 2024-09-17 00:58:21,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=114200.0, ans=0.0 2024-09-17 00:58:33,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=114240.0, ans=0.125 2024-09-17 00:58:35,435 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2024-09-17 00:58:47,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=114280.0, ans=0.0 2024-09-17 00:58:57,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=114280.0, ans=0.125 2024-09-17 00:58:59,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=114280.0, ans=0.125 2024-09-17 00:59:01,494 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.92 vs. limit=15.0 2024-09-17 00:59:15,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.52 vs. 
limit=22.5 2024-09-17 00:59:21,791 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.456e+01 9.951e+01 1.071e+02 1.173e+02 2.370e+02, threshold=2.143e+02, percent-clipped=1.0 2024-09-17 00:59:22,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=114360.0, ans=0.125 2024-09-17 00:59:23,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=114360.0, ans=0.0 2024-09-17 00:59:28,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=114360.0, ans=0.0 2024-09-17 00:59:34,522 INFO [train.py:1198] (0/2) Epoch 7, batch 1450, loss[loss=0.2967, ctc_loss=0.2084, cr_loss=0.4681, attn_decoder_loss=0.2961, over 29420.00 frames. ], tot_loss[loss=0.2812, ctc_loss=0.1984, cr_loss=0.4238, attn_decoder_loss=0.281, over 5804287.07 frames. ], batch size: 94, lr: 1.61e-02, grad_scale: 4.0 2024-09-17 00:59:34,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=114400.0, ans=0.0 2024-09-17 00:59:36,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=114400.0, ans=0.025 2024-09-17 00:59:37,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=114400.0, ans=0.0 2024-09-17 01:00:17,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=114480.0, ans=0.125 2024-09-17 01:00:21,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=114520.0, ans=0.2 2024-09-17 01:00:25,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=114520.0, ans=0.2 2024-09-17 
01:00:35,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=114560.0, ans=0.1 2024-09-17 01:00:50,573 INFO [train.py:1198] (0/2) Epoch 7, batch 1500, loss[loss=0.286, ctc_loss=0.1926, cr_loss=0.4132, attn_decoder_loss=0.2872, over 29642.00 frames. ], tot_loss[loss=0.2814, ctc_loss=0.1981, cr_loss=0.4235, attn_decoder_loss=0.2813, over 5805154.06 frames. ], batch size: 86, lr: 1.61e-02, grad_scale: 8.0 2024-09-17 01:00:56,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=114600.0, ans=0.1 2024-09-17 01:01:09,258 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:01:35,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=114720.0, ans=0.125 2024-09-17 01:01:58,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.51 vs. limit=15.0 2024-09-17 01:01:58,844 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.036e+01 1.083e+02 1.173e+02 1.293e+02 2.517e+02, threshold=2.346e+02, percent-clipped=1.0 2024-09-17 01:02:02,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=114760.0, ans=0.0 2024-09-17 01:02:11,566 INFO [train.py:1198] (0/2) Epoch 7, batch 1550, loss[loss=0.2991, ctc_loss=0.204, cr_loss=0.4337, attn_decoder_loss=0.3, over 29475.00 frames. ], tot_loss[loss=0.2816, ctc_loss=0.1988, cr_loss=0.4239, attn_decoder_loss=0.2814, over 5782047.09 frames. 
], batch size: 90, lr: 1.61e-02, grad_scale: 4.0 2024-09-17 01:02:28,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=114840.0, ans=0.125 2024-09-17 01:02:31,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=114840.0, ans=0.1 2024-09-17 01:02:45,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2024-09-17 01:02:58,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=114920.0, ans=0.05 2024-09-17 01:03:00,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=114920.0, ans=0.125 2024-09-17 01:03:05,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=114920.0, ans=0.0 2024-09-17 01:03:10,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.26 vs. limit=10.0 2024-09-17 01:03:26,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=115000.0, ans=0.125 2024-09-17 01:03:27,526 INFO [train.py:1198] (0/2) Epoch 7, batch 1600, loss[loss=0.2878, ctc_loss=0.2003, cr_loss=0.4401, attn_decoder_loss=0.2878, over 29666.00 frames. ], tot_loss[loss=0.2814, ctc_loss=0.1989, cr_loss=0.4228, attn_decoder_loss=0.2811, over 5764087.36 frames. 
], batch size: 85, lr: 1.60e-02, grad_scale: 8.0 2024-09-17 01:03:29,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=115000.0, ans=0.125 2024-09-17 01:03:58,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=115080.0, ans=0.125 2024-09-17 01:04:31,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=115160.0, ans=0.125 2024-09-17 01:04:33,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=115160.0, ans=0.125 2024-09-17 01:04:34,563 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.309e+01 1.047e+02 1.116e+02 1.262e+02 4.085e+02, threshold=2.232e+02, percent-clipped=3.0 2024-09-17 01:04:43,957 INFO [train.py:1198] (0/2) Epoch 7, batch 1650, loss[loss=0.3057, ctc_loss=0.2193, cr_loss=0.4288, attn_decoder_loss=0.3058, over 29712.00 frames. ], tot_loss[loss=0.2812, ctc_loss=0.1987, cr_loss=0.4228, attn_decoder_loss=0.281, over 5757037.64 frames. 
], batch size: 89, lr: 1.60e-02, grad_scale: 4.0 2024-09-17 01:05:05,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=115240.0, ans=0.04949747468305833 2024-09-17 01:05:08,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=115240.0, ans=0.0 2024-09-17 01:05:08,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=115240.0, ans=0.1 2024-09-17 01:05:53,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=115360.0, ans=0.125 2024-09-17 01:05:57,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=115360.0, ans=0.125 2024-09-17 01:05:57,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=115360.0, ans=0.0 2024-09-17 01:06:04,530 INFO [train.py:1198] (0/2) Epoch 7, batch 1700, loss[loss=0.2522, ctc_loss=0.1741, cr_loss=0.3712, attn_decoder_loss=0.2526, over 29610.00 frames. ], tot_loss[loss=0.2807, ctc_loss=0.1976, cr_loss=0.4221, attn_decoder_loss=0.2806, over 5779471.46 frames. ], batch size: 69, lr: 1.60e-02, grad_scale: 8.0 2024-09-17 01:06:12,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115400.0, ans=0.1 2024-09-17 01:06:33,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. 
limit=6.0 2024-09-17 01:06:55,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=115520.0, ans=0.2 2024-09-17 01:06:57,905 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-09-17 01:07:13,847 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.613e+01 9.967e+01 1.096e+02 1.177e+02 1.822e+02, threshold=2.192e+02, percent-clipped=0.0 2024-09-17 01:07:14,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=115560.0, ans=0.125 2024-09-17 01:07:17,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=115560.0, ans=0.1 2024-09-17 01:07:21,576 INFO [train.py:1198] (0/2) Epoch 7, batch 1750, loss[loss=0.2431, ctc_loss=0.1699, cr_loss=0.3876, attn_decoder_loss=0.2427, over 29372.00 frames. ], tot_loss[loss=0.2806, ctc_loss=0.1977, cr_loss=0.4225, attn_decoder_loss=0.2804, over 5788374.68 frames. 
], batch size: 67, lr: 1.60e-02, grad_scale: 4.0 2024-09-17 01:07:32,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=115600.0, ans=0.0 2024-09-17 01:07:50,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115640.0, ans=0.1 2024-09-17 01:07:50,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=115640.0, ans=0.0 2024-09-17 01:07:51,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=115680.0, ans=0.125 2024-09-17 01:07:54,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=115680.0, ans=0.04949747468305833 2024-09-17 01:07:55,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5 2024-09-17 01:08:14,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=115720.0, ans=0.2 2024-09-17 01:08:14,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.14 vs. limit=15.0 2024-09-17 01:08:21,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=115760.0, ans=0.04949747468305833 2024-09-17 01:08:32,396 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:08:38,043 INFO [train.py:1198] (0/2) Epoch 7, batch 1800, loss[loss=0.2958, ctc_loss=0.2143, cr_loss=0.4562, attn_decoder_loss=0.2948, over 29684.00 frames. 
], tot_loss[loss=0.2807, ctc_loss=0.1978, cr_loss=0.4234, attn_decoder_loss=0.2805, over 5791502.68 frames. ], batch size: 83, lr: 1.60e-02, grad_scale: 8.0 2024-09-17 01:08:42,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0 2024-09-17 01:08:49,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=22.5 2024-09-17 01:09:01,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=115840.0, ans=0.0 2024-09-17 01:09:04,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=115840.0, ans=0.125 2024-09-17 01:09:22,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.86 vs. limit=10.0 2024-09-17 01:09:50,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.679e+01 1.033e+02 1.126e+02 1.316e+02 3.290e+02, threshold=2.253e+02, percent-clipped=1.0 2024-09-17 01:09:57,303 INFO [train.py:1198] (0/2) Epoch 7, batch 1850, loss[loss=0.3015, ctc_loss=0.2104, cr_loss=0.4517, attn_decoder_loss=0.3016, over 29651.00 frames. ], tot_loss[loss=0.2804, ctc_loss=0.1972, cr_loss=0.4231, attn_decoder_loss=0.2803, over 5797441.12 frames. ], batch size: 86, lr: 1.60e-02, grad_scale: 4.0 2024-09-17 01:10:30,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=22.5 2024-09-17 01:11:15,080 INFO [train.py:1198] (0/2) Epoch 7, batch 1900, loss[loss=0.279, ctc_loss=0.1902, cr_loss=0.3955, attn_decoder_loss=0.2801, over 29705.00 frames. 
], tot_loss[loss=0.2808, ctc_loss=0.1973, cr_loss=0.4234, attn_decoder_loss=0.2806, over 5804703.35 frames. ], batch size: 89, lr: 1.60e-02, grad_scale: 8.0
2024-09-17 01:11:21,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=116200.0, ans=0.0
2024-09-17 01:11:24,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=116200.0, ans=0.125
2024-09-17 01:11:24,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=116200.0, ans=0.025
2024-09-17 01:11:54,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=12.0
2024-09-17 01:12:13,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=116320.0, ans=0.125
2024-09-17 01:12:16,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=116360.0, ans=0.0
2024-09-17 01:12:26,914 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.068e+01 1.016e+02 1.078e+02 1.162e+02 1.899e+02, threshold=2.156e+02, percent-clipped=0.0
2024-09-17 01:12:28,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=116360.0, ans=0.0
2024-09-17 01:12:31,512 INFO [train.py:1198] (0/2) Epoch 7, batch 1950, loss[loss=0.2803, ctc_loss=0.1942, cr_loss=0.4475, attn_decoder_loss=0.2799, over 29468.00 frames. ], tot_loss[loss=0.2817, ctc_loss=0.1975, cr_loss=0.4248, attn_decoder_loss=0.2816, over 5819381.58 frames. ], batch size: 78, lr: 1.60e-02, grad_scale: 4.0
2024-09-17 01:12:37,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=116400.0, ans=0.0
2024-09-17 01:12:38,102 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 01:13:07,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0
2024-09-17 01:13:24,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.48 vs. limit=12.0
2024-09-17 01:13:36,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0
2024-09-17 01:13:46,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=116560.0, ans=0.0
2024-09-17 01:13:50,415 INFO [train.py:1198] (0/2) Epoch 7, batch 2000, loss[loss=0.2605, ctc_loss=0.1898, cr_loss=0.394, attn_decoder_loss=0.2596, over 29380.00 frames. ], tot_loss[loss=0.2825, ctc_loss=0.1986, cr_loss=0.4254, attn_decoder_loss=0.2823, over 5796750.44 frames. ], batch size: 67, lr: 1.59e-02, grad_scale: 8.0
2024-09-17 01:13:50,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=116600.0, ans=0.0
2024-09-17 01:14:10,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=116640.0, ans=0.0
2024-09-17 01:14:12,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=116640.0, ans=15.0
2024-09-17 01:14:13,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=116640.0, ans=0.125
2024-09-17 01:14:15,393 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.66 vs. limit=15.0
2024-09-17 01:14:16,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=116640.0, ans=0.125
2024-09-17 01:14:18,487 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.41 vs. limit=15.0
2024-09-17 01:14:37,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=116720.0, ans=0.0
2024-09-17 01:14:45,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=116720.0, ans=0.125
2024-09-17 01:15:06,313 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.446e+01 1.080e+02 1.203e+02 1.389e+02 2.597e+02, threshold=2.406e+02, percent-clipped=3.0
2024-09-17 01:15:09,745 INFO [train.py:1198] (0/2) Epoch 7, batch 2050, loss[loss=0.2407, ctc_loss=0.1633, cr_loss=0.3679, attn_decoder_loss=0.2411, over 29444.00 frames. ], tot_loss[loss=0.2814, ctc_loss=0.1981, cr_loss=0.4243, attn_decoder_loss=0.2812, over 5789329.29 frames. ], batch size: 70, lr: 1.59e-02, grad_scale: 4.0
2024-09-17 01:15:17,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=116800.0, ans=0.125
2024-09-17 01:15:26,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=116840.0, ans=0.0
2024-09-17 01:15:29,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.11 vs. limit=6.0
2024-09-17 01:15:33,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116840.0, ans=0.1
2024-09-17 01:15:37,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=116840.0, ans=0.2
2024-09-17 01:15:46,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=116880.0, ans=0.125
2024-09-17 01:15:51,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=116880.0, ans=0.125
2024-09-17 01:15:52,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=116880.0, ans=0.125
2024-09-17 01:16:25,958 INFO [train.py:1198] (0/2) Epoch 7, batch 2100, loss[loss=0.262, ctc_loss=0.1776, cr_loss=0.3922, attn_decoder_loss=0.2626, over 29753.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.1975, cr_loss=0.4236, attn_decoder_loss=0.2806, over 5800802.09 frames. ], batch size: 81, lr: 1.59e-02, grad_scale: 8.0
2024-09-17 01:17:05,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.75 vs. limit=22.5
2024-09-17 01:17:16,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=117120.0, ans=10.0
2024-09-17 01:17:21,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=117120.0, ans=0.125
2024-09-17 01:17:36,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.25 vs. limit=10.0
2024-09-17 01:17:38,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=117160.0, ans=0.0
2024-09-17 01:17:42,586 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.423e+01 1.035e+02 1.132e+02 1.239e+02 1.917e+02, threshold=2.264e+02, percent-clipped=0.0
2024-09-17 01:17:44,157 INFO [train.py:1198] (0/2) Epoch 7, batch 2150, loss[loss=0.2903, ctc_loss=0.2099, cr_loss=0.4395, attn_decoder_loss=0.2894, over 29473.00 frames. ], tot_loss[loss=0.2798, ctc_loss=0.1963, cr_loss=0.4221, attn_decoder_loss=0.2797, over 5816145.34 frames. ], batch size: 78, lr: 1.59e-02, grad_scale: 4.0
2024-09-17 01:17:44,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=117200.0, ans=0.125
2024-09-17 01:17:44,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=117200.0, ans=0.0
2024-09-17 01:17:58,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=117200.0, ans=0.2
2024-09-17 01:18:09,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=117240.0, ans=0.125
2024-09-17 01:18:26,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=117280.0, ans=0.125
2024-09-17 01:18:32,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=117320.0, ans=0.125
2024-09-17 01:18:39,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117320.0, ans=0.1
2024-09-17 01:18:50,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=117360.0, ans=0.125
2024-09-17 01:18:58,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=117360.0, ans=0.125
2024-09-17 01:19:02,629 INFO [train.py:1198] (0/2) Epoch 7, batch 2200, loss[loss=0.2938, ctc_loss=0.2011, cr_loss=0.4095, attn_decoder_loss=0.295, over 29620.00 frames. ], tot_loss[loss=0.2803, ctc_loss=0.1967, cr_loss=0.4234, attn_decoder_loss=0.2802, over 5812435.94 frames. ], batch size: 86, lr: 1.59e-02, grad_scale: 8.0
2024-09-17 01:19:21,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=117440.0, ans=0.125
2024-09-17 01:19:25,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=117440.0, ans=0.125
2024-09-17 01:19:55,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=117520.0, ans=0.125
2024-09-17 01:20:04,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=117560.0, ans=0.07
2024-09-17 01:20:19,563 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.574e+01 1.056e+02 1.108e+02 1.251e+02 3.146e+02, threshold=2.216e+02, percent-clipped=2.0
2024-09-17 01:20:19,585 INFO [train.py:1198] (0/2) Epoch 7, batch 2250, loss[loss=0.2731, ctc_loss=0.1887, cr_loss=0.422, attn_decoder_loss=0.2731, over 29684.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.1972, cr_loss=0.4234, attn_decoder_loss=0.2803, over 5812790.30 frames. ], batch size: 82, lr: 1.59e-02, grad_scale: 4.0
2024-09-17 01:20:31,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.37 vs. limit=15.0
2024-09-17 01:21:00,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=117680.0, ans=0.125
2024-09-17 01:21:02,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=117680.0, ans=0.2
2024-09-17 01:21:18,519 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 01:21:37,949 INFO [train.py:1198] (0/2) Epoch 7, batch 2300, loss[loss=0.2558, ctc_loss=0.1763, cr_loss=0.4009, attn_decoder_loss=0.2557, over 29302.00 frames. ], tot_loss[loss=0.2792, ctc_loss=0.1962, cr_loss=0.4216, attn_decoder_loss=0.279, over 5798291.71 frames. ], batch size: 71, lr: 1.59e-02, grad_scale: 8.0
2024-09-17 01:21:51,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=117840.0, ans=0.125
2024-09-17 01:22:01,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=117840.0, ans=0.95
2024-09-17 01:22:08,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=117880.0, ans=0.09899494936611666
2024-09-17 01:22:56,189 INFO [train.py:1198] (0/2) Epoch 7, batch 2350, loss[loss=0.2986, ctc_loss=0.2109, cr_loss=0.4571, attn_decoder_loss=0.2982, over 29663.00 frames. ], tot_loss[loss=0.279, ctc_loss=0.1956, cr_loss=0.4215, attn_decoder_loss=0.2789, over 5802718.07 frames. ], batch size: 83, lr: 1.59e-02, grad_scale: 4.0
2024-09-17 01:22:57,667 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.765e+01 1.037e+02 1.131e+02 1.224e+02 2.356e+02, threshold=2.262e+02, percent-clipped=1.0
2024-09-17 01:22:58,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=118000.0, ans=0.0
2024-09-17 01:23:08,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=118000.0, ans=0.0
2024-09-17 01:23:14,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=118040.0, ans=0.125
2024-09-17 01:23:55,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=118160.0, ans=0.125
2024-09-17 01:24:01,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0
2024-09-17 01:24:06,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118160.0, ans=0.1
2024-09-17 01:24:09,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=118160.0, ans=0.2
2024-09-17 01:24:12,081 INFO [train.py:1198] (0/2) Epoch 7, batch 2400, loss[loss=0.2724, ctc_loss=0.1912, cr_loss=0.3997, attn_decoder_loss=0.2725, over 29540.00 frames. ], tot_loss[loss=0.2796, ctc_loss=0.1962, cr_loss=0.4217, attn_decoder_loss=0.2794, over 5806036.74 frames. ], batch size: 76, lr: 1.58e-02, grad_scale: 8.0
2024-09-17 01:24:44,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=118280.0, ans=0.5
2024-09-17 01:24:47,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=118280.0, ans=0.0
2024-09-17 01:24:52,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=118280.0, ans=0.07
2024-09-17 01:25:09,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118320.0, ans=0.1
2024-09-17 01:25:32,283 INFO [train.py:1198] (0/2) Epoch 7, batch 2450, loss[loss=0.2882, ctc_loss=0.1979, cr_loss=0.4435, attn_decoder_loss=0.2884, over 29684.00 frames. ], tot_loss[loss=0.2807, ctc_loss=0.1971, cr_loss=0.4231, attn_decoder_loss=0.2806, over 5783888.64 frames. ], batch size: 82, lr: 1.58e-02, grad_scale: 4.0
2024-09-17 01:25:32,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=118400.0, ans=0.0
2024-09-17 01:25:35,213 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.912e+01 1.061e+02 1.128e+02 1.247e+02 1.833e+02, threshold=2.256e+02, percent-clipped=0.0
2024-09-17 01:25:46,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=118440.0, ans=0.125
2024-09-17 01:26:03,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=118480.0, ans=0.025
2024-09-17 01:26:34,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=118560.0, ans=0.0
2024-09-17 01:26:45,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.00 vs. limit=10.0
2024-09-17 01:26:50,582 INFO [train.py:1198] (0/2) Epoch 7, batch 2500, loss[loss=0.2944, ctc_loss=0.2075, cr_loss=0.4253, attn_decoder_loss=0.2946, over 29627.00 frames. ], tot_loss[loss=0.2807, ctc_loss=0.1971, cr_loss=0.4234, attn_decoder_loss=0.2805, over 5794289.33 frames. ], batch size: 86, lr: 1.58e-02, grad_scale: 8.0
2024-09-17 01:26:50,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=118600.0, ans=0.125
2024-09-17 01:27:03,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=118600.0, ans=0.0
2024-09-17 01:27:07,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=118640.0, ans=0.125
2024-09-17 01:27:15,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=118640.0, ans=0.0
2024-09-17 01:27:16,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=118640.0, ans=0.1
2024-09-17 01:27:40,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=118720.0, ans=15.0
2024-09-17 01:28:07,516 INFO [train.py:1198] (0/2) Epoch 7, batch 2550, loss[loss=0.2437, ctc_loss=0.1668, cr_loss=0.3977, attn_decoder_loss=0.2434, over 29331.00 frames. ], tot_loss[loss=0.2803, ctc_loss=0.1963, cr_loss=0.4231, attn_decoder_loss=0.2802, over 5798434.84 frames. ], batch size: 67, lr: 1.58e-02, grad_scale: 4.0
2024-09-17 01:28:11,980 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.793e+01 1.007e+02 1.102e+02 1.293e+02 3.039e+02, threshold=2.204e+02, percent-clipped=2.0
2024-09-17 01:28:14,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0
2024-09-17 01:28:33,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5
2024-09-17 01:29:12,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=118960.0, ans=0.2
2024-09-17 01:29:25,799 INFO [train.py:1198] (0/2) Epoch 7, batch 2600, loss[loss=0.2548, ctc_loss=0.1663, cr_loss=0.3816, attn_decoder_loss=0.2561, over 29448.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.1964, cr_loss=0.4231, attn_decoder_loss=0.2804, over 5794817.31 frames. ], batch size: 78, lr: 1.58e-02, grad_scale: 8.0
2024-09-17 01:29:29,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=119000.0, ans=0.125
2024-09-17 01:30:08,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=119080.0, ans=0.125
2024-09-17 01:30:11,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119120.0, ans=0.1
2024-09-17 01:30:19,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=15.0
2024-09-17 01:30:22,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=119120.0, ans=0.125
2024-09-17 01:30:36,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.25 vs. limit=22.5
2024-09-17 01:30:43,410 INFO [train.py:1198] (0/2) Epoch 7, batch 2650, loss[loss=0.3062, ctc_loss=0.2187, cr_loss=0.4495, attn_decoder_loss=0.3059, over 29213.00 frames. ], tot_loss[loss=0.2809, ctc_loss=0.1969, cr_loss=0.4231, attn_decoder_loss=0.2808, over 5801658.86 frames. ], batch size: 100, lr: 1.58e-02, grad_scale: 4.0
2024-09-17 01:30:45,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=119200.0, ans=0.0
2024-09-17 01:30:49,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.862e+01 1.038e+02 1.128e+02 1.278e+02 2.890e+02, threshold=2.256e+02, percent-clipped=2.0
2024-09-17 01:31:00,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119240.0, ans=0.1
2024-09-17 01:31:12,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119280.0, ans=0.1
2024-09-17 01:31:16,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.52 vs. limit=22.5
2024-09-17 01:31:17,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=119280.0, ans=0.125
2024-09-17 01:31:23,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=119280.0, ans=0.125
2024-09-17 01:31:29,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.91 vs. limit=15.0
2024-09-17 01:31:59,342 INFO [train.py:1198] (0/2) Epoch 7, batch 2700, loss[loss=0.2789, ctc_loss=0.1894, cr_loss=0.4025, attn_decoder_loss=0.2799, over 29505.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.1969, cr_loss=0.4228, attn_decoder_loss=0.2808, over 5797319.90 frames. ], batch size: 87, lr: 1.58e-02, grad_scale: 8.0
2024-09-17 01:32:25,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=119440.0, ans=0.0
2024-09-17 01:32:45,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=119520.0, ans=0.025
2024-09-17 01:32:55,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=119520.0, ans=0.1
2024-09-17 01:33:01,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=119560.0, ans=0.125
2024-09-17 01:33:02,802 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.24 vs. limit=15.0
2024-09-17 01:33:14,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=119560.0, ans=0.125
2024-09-17 01:33:18,438 INFO [train.py:1198] (0/2) Epoch 7, batch 2750, loss[loss=0.2656, ctc_loss=0.1771, cr_loss=0.4278, attn_decoder_loss=0.266, over 29550.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1957, cr_loss=0.421, attn_decoder_loss=0.2795, over 5794627.61 frames. ], batch size: 75, lr: 1.58e-02, grad_scale: 4.0
2024-09-17 01:33:20,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=119600.0, ans=0.125
2024-09-17 01:33:26,048 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.208e+01 9.926e+01 1.072e+02 1.182e+02 2.176e+02, threshold=2.145e+02, percent-clipped=0.0
2024-09-17 01:33:31,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0
2024-09-17 01:33:50,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=119680.0, ans=0.0
2024-09-17 01:33:51,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=119680.0, ans=0.125
2024-09-17 01:33:58,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=119680.0, ans=0.125
2024-09-17 01:34:28,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=119760.0, ans=0.0
2024-09-17 01:34:37,079 INFO [train.py:1198] (0/2) Epoch 7, batch 2800, loss[loss=0.3224, ctc_loss=0.2657, cr_loss=0.4801, attn_decoder_loss=0.318, over 20296.00 frames. ], tot_loss[loss=0.2802, ctc_loss=0.1967, cr_loss=0.4224, attn_decoder_loss=0.2801, over 5775822.53 frames. ], batch size: 210, lr: 1.57e-02, grad_scale: 8.0
2024-09-17 01:34:43,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=11.27 vs. limit=15.0
2024-09-17 01:34:50,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=119800.0, ans=15.0
2024-09-17 01:35:29,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=119920.0, ans=0.0
2024-09-17 01:35:30,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=119920.0, ans=0.2
2024-09-17 01:35:32,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=119920.0, ans=0.2
2024-09-17 01:35:33,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=119920.0, ans=0.0
2024-09-17 01:35:37,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=119960.0, ans=0.0
2024-09-17 01:35:53,909 INFO [train.py:1198] (0/2) Epoch 7, batch 2850, loss[loss=0.274, ctc_loss=0.1904, cr_loss=0.4272, attn_decoder_loss=0.2738, over 29495.00 frames. ], tot_loss[loss=0.2807, ctc_loss=0.1973, cr_loss=0.4228, attn_decoder_loss=0.2806, over 5759917.53 frames. ], batch size: 77, lr: 1.57e-02, grad_scale: 4.0
2024-09-17 01:36:03,110 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.909e+01 1.092e+02 1.177e+02 1.435e+02 2.490e+02, threshold=2.355e+02, percent-clipped=3.0
2024-09-17 01:36:05,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=120000.0, ans=0.125
2024-09-17 01:36:20,305 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 01:36:21,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=120040.0, ans=0.125
2024-09-17 01:36:23,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=120080.0, ans=0.2
2024-09-17 01:36:23,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=120080.0, ans=0.0
2024-09-17 01:36:45,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=120120.0, ans=0.125
2024-09-17 01:36:53,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=120120.0, ans=0.125
2024-09-17 01:37:00,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=120160.0, ans=0.125
2024-09-17 01:37:08,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=120160.0, ans=0.0
2024-09-17 01:37:11,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=120200.0, ans=0.0
2024-09-17 01:37:13,102 INFO [train.py:1198] (0/2) Epoch 7, batch 2900, loss[loss=0.2726, ctc_loss=0.1922, cr_loss=0.4375, attn_decoder_loss=0.2718, over 29432.00 frames. ], tot_loss[loss=0.2815, ctc_loss=0.1978, cr_loss=0.4244, attn_decoder_loss=0.2814, over 5785476.73 frames. ], batch size: 79, lr: 1.57e-02, grad_scale: 8.0
2024-09-17 01:37:19,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=120200.0, ans=0.0
2024-09-17 01:37:19,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=120200.0, ans=0.1
2024-09-17 01:37:42,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=120240.0, ans=0.0
2024-09-17 01:37:43,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0
2024-09-17 01:37:49,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=120280.0, ans=0.1
2024-09-17 01:37:52,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=120280.0, ans=0.125
2024-09-17 01:37:58,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=120280.0, ans=0.95
2024-09-17 01:37:59,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=120320.0, ans=0.025
2024-09-17 01:38:19,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0
2024-09-17 01:38:31,304 INFO [train.py:1198] (0/2) Epoch 7, batch 2950, loss[loss=0.2545, ctc_loss=0.1708, cr_loss=0.3869, attn_decoder_loss=0.2552, over 29506.00 frames. ], tot_loss[loss=0.2802, ctc_loss=0.1968, cr_loss=0.4232, attn_decoder_loss=0.2801, over 5781915.48 frames. ], batch size: 75, lr: 1.57e-02, grad_scale: 4.0
2024-09-17 01:38:37,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=120400.0, ans=0.07
2024-09-17 01:38:41,939 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.289e+01 1.032e+02 1.125e+02 1.263e+02 2.681e+02, threshold=2.250e+02, percent-clipped=2.0
2024-09-17 01:38:42,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=120400.0, ans=0.2
2024-09-17 01:38:53,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=120440.0, ans=0.125
2024-09-17 01:39:02,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=120480.0, ans=0.125
2024-09-17 01:39:10,360 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.00 vs. limit=15.0
2024-09-17 01:39:34,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=120560.0, ans=0.125
2024-09-17 01:39:47,816 INFO [train.py:1198] (0/2) Epoch 7, batch 3000, loss[loss=0.2657, ctc_loss=0.1773, cr_loss=0.3842, attn_decoder_loss=0.2669, over 29770.00 frames. ], tot_loss[loss=0.28, ctc_loss=0.1967, cr_loss=0.4232, attn_decoder_loss=0.2798, over 5782997.53 frames. ], batch size: 81, lr: 1.57e-02, grad_scale: 8.0
2024-09-17 01:39:47,817 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 01:40:06,235 INFO [train.py:1230] (0/2) Epoch 7, validation: loss=0.2168, ctc_loss=0.05873, cr_loss=4.524e-15, attn_decoder_loss=0.2344, over 944034.00 frames.
2024-09-17 01:40:06,236 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-17 01:40:11,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=120600.0, ans=0.125
2024-09-17 01:40:20,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=120640.0, ans=0.125
2024-09-17 01:40:32,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=22.5
2024-09-17 01:40:33,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=120640.0, ans=0.0
2024-09-17 01:40:59,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=120720.0, ans=0.0
2024-09-17 01:41:25,955 INFO [train.py:1198] (0/2) Epoch 7, batch 3050, loss[loss=0.276, ctc_loss=0.1973, cr_loss=0.4495, attn_decoder_loss=0.2747, over 29540.00 frames. ], tot_loss[loss=0.2813, ctc_loss=0.1979, cr_loss=0.4253, attn_decoder_loss=0.2811, over 5777187.77 frames. ], batch size: 76, lr: 1.57e-02, grad_scale: 4.0
2024-09-17 01:41:40,283 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.612e+01 1.070e+02 1.194e+02 1.343e+02 6.918e+02, threshold=2.387e+02, percent-clipped=4.0
2024-09-17 01:41:41,549 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=15.0
2024-09-17 01:41:51,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=120840.0, ans=0.125
2024-09-17 01:41:57,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=120880.0, ans=0.125
2024-09-17 01:42:15,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=120920.0, ans=0.125
2024-09-17 01:42:44,149 INFO [train.py:1198] (0/2) Epoch 7, batch 3100, loss[loss=0.3005, ctc_loss=0.2132, cr_loss=0.4504, attn_decoder_loss=0.3001, over 29147.00 frames. ], tot_loss[loss=0.2812, ctc_loss=0.1979, cr_loss=0.4246, attn_decoder_loss=0.281, over 5776684.40 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 8.0
2024-09-17 01:42:51,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=121000.0, ans=0.125
2024-09-17 01:42:56,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=121000.0, ans=0.125
2024-09-17 01:43:45,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=121160.0, ans=0.125
2024-09-17 01:43:47,078 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 01:43:48,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=121160.0, ans=0.0
2024-09-17 01:44:00,605 INFO [train.py:1198] (0/2) Epoch 7, batch 3150, loss[loss=0.2994, ctc_loss=0.214, cr_loss=0.4526, attn_decoder_loss=0.2989, over 28912.00 frames. ], tot_loss[loss=0.2808, ctc_loss=0.1976, cr_loss=0.4242, attn_decoder_loss=0.2806, over 5783086.86 frames. ], batch size: 104, lr: 1.57e-02, grad_scale: 4.0
2024-09-17 01:44:14,324 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.541e+01 1.004e+02 1.097e+02 1.266e+02 2.300e+02, threshold=2.194e+02, percent-clipped=0.0
2024-09-17 01:44:29,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=121240.0, ans=0.125
2024-09-17 01:44:38,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=121280.0, ans=0.0
2024-09-17 01:44:41,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=121280.0, ans=0.125
2024-09-17 01:44:43,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=121280.0, ans=0.0
2024-09-17 01:44:44,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=121280.0, ans=0.05
2024-09-17 01:44:52,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=121320.0, ans=0.125
2024-09-17 01:45:01,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=121320.0, ans=0.125
2024-09-17 01:45:10,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=121360.0, ans=0.0
2024-09-17 01:45:15,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=121360.0, ans=0.125
2024-09-17 01:45:19,449 INFO [train.py:1198] (0/2) Epoch 7, batch 3200, loss[loss=0.2731, ctc_loss=0.1936, cr_loss=0.426, attn_decoder_loss=0.2725, over 29395.00 frames. ], tot_loss[loss=0.2799, ctc_loss=0.1964, cr_loss=0.4236, attn_decoder_loss=0.2798, over 5792658.06 frames. ], batch size: 79, lr: 1.56e-02, grad_scale: 8.0
2024-09-17 01:45:43,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=121440.0, ans=0.125
2024-09-17 01:45:52,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0
2024-09-17 01:45:57,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=121480.0, ans=0.0
2024-09-17 01:46:05,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0
2024-09-17 01:46:12,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0
2024-09-17 01:46:25,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=121560.0, ans=15.0
2024-09-17 01:46:27,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=121560.0, ans=0.09899494936611666
2024-09-17 01:46:38,563 INFO [train.py:1198] (0/2) Epoch 7, batch 3250, loss[loss=0.2819, ctc_loss=0.1924, cr_loss=0.4183, attn_decoder_loss=0.2826, over 29689.00 frames. ], tot_loss[loss=0.2799, ctc_loss=0.1961, cr_loss=0.4238, attn_decoder_loss=0.2798, over 5799438.73
], batch size: 84, lr: 1.56e-02, grad_scale: 8.0 2024-09-17 01:46:46,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=121600.0, ans=0.125 2024-09-17 01:46:47,167 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5 2024-09-17 01:46:53,779 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.273e+01 1.020e+02 1.119e+02 1.210e+02 1.676e+02, threshold=2.238e+02, percent-clipped=0.0 2024-09-17 01:47:19,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=121680.0, ans=0.2 2024-09-17 01:47:21,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=121680.0, ans=0.025 2024-09-17 01:47:21,908 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.27 vs. limit=22.5 2024-09-17 01:47:23,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=121720.0, ans=0.125 2024-09-17 01:47:30,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=121720.0, ans=0.0 2024-09-17 01:47:32,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=121720.0, ans=0.125 2024-09-17 01:47:35,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=121720.0, ans=0.0 2024-09-17 01:47:46,216 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.15 vs. 
limit=10.0 2024-09-17 01:47:54,941 INFO [train.py:1198] (0/2) Epoch 7, batch 3300, loss[loss=0.2882, ctc_loss=0.204, cr_loss=0.4115, attn_decoder_loss=0.2884, over 28245.00 frames. ], tot_loss[loss=0.2786, ctc_loss=0.1951, cr_loss=0.4219, attn_decoder_loss=0.2785, over 5795903.75 frames. ], batch size: 111, lr: 1.56e-02, grad_scale: 8.0 2024-09-17 01:48:28,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=121880.0, ans=0.1 2024-09-17 01:48:30,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=121880.0, ans=0.0 2024-09-17 01:48:34,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=121880.0, ans=0.125 2024-09-17 01:49:13,988 INFO [train.py:1198] (0/2) Epoch 7, batch 3350, loss[loss=0.2785, ctc_loss=0.1842, cr_loss=0.3851, attn_decoder_loss=0.2805, over 28928.00 frames. ], tot_loss[loss=0.2802, ctc_loss=0.197, cr_loss=0.4241, attn_decoder_loss=0.28, over 5774451.82 frames. ], batch size: 104, lr: 1.56e-02, grad_scale: 4.0 2024-09-17 01:49:19,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.33 vs. 
limit=15.0 2024-09-17 01:49:20,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=122000.0, ans=0.0 2024-09-17 01:49:20,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=122000.0, ans=0.125 2024-09-17 01:49:32,807 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.051e+01 1.075e+02 1.159e+02 1.381e+02 2.720e+02, threshold=2.319e+02, percent-clipped=3.0 2024-09-17 01:49:33,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=15.0 2024-09-17 01:50:03,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=122120.0, ans=0.1 2024-09-17 01:50:10,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.40 vs. limit=10.0 2024-09-17 01:50:14,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=122120.0, ans=0.125 2024-09-17 01:50:28,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=122160.0, ans=0.2 2024-09-17 01:50:32,406 INFO [train.py:1198] (0/2) Epoch 7, batch 3400, loss[loss=0.2445, ctc_loss=0.1584, cr_loss=0.3683, attn_decoder_loss=0.2459, over 29375.00 frames. ], tot_loss[loss=0.28, ctc_loss=0.1967, cr_loss=0.4235, attn_decoder_loss=0.2798, over 5767530.04 frames. 
], batch size: 67, lr: 1.56e-02, grad_scale: 8.0 2024-09-17 01:50:38,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=122200.0, ans=0.125 2024-09-17 01:51:33,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=122360.0, ans=0.2 2024-09-17 01:51:35,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=122360.0, ans=0.125 2024-09-17 01:51:45,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=122360.0, ans=0.09899494936611666 2024-09-17 01:51:48,684 INFO [train.py:1198] (0/2) Epoch 7, batch 3450, loss[loss=0.2896, ctc_loss=0.2005, cr_loss=0.419, attn_decoder_loss=0.2902, over 28475.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.197, cr_loss=0.4245, attn_decoder_loss=0.2804, over 5775822.42 frames. ], batch size: 112, lr: 1.56e-02, grad_scale: 4.0 2024-09-17 01:51:54,327 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.20 vs. 
limit=15.0 2024-09-17 01:52:09,033 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.695e+01 1.040e+02 1.098e+02 1.235e+02 2.393e+02, threshold=2.195e+02, percent-clipped=1.0 2024-09-17 01:52:29,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=122480.0, ans=0.025 2024-09-17 01:52:35,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=122520.0, ans=0.0 2024-09-17 01:52:38,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=122520.0, ans=0.125 2024-09-17 01:52:47,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=122520.0, ans=0.125 2024-09-17 01:52:56,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=122560.0, ans=0.125 2024-09-17 01:53:07,003 INFO [train.py:1198] (0/2) Epoch 7, batch 3500, loss[loss=0.2562, ctc_loss=0.1727, cr_loss=0.3782, attn_decoder_loss=0.2571, over 29304.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1958, cr_loss=0.4228, attn_decoder_loss=0.2795, over 5777549.52 frames. 
], batch size: 71, lr: 1.56e-02, grad_scale: 8.0 2024-09-17 01:53:13,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=122600.0, ans=0.07 2024-09-17 01:53:19,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=122600.0, ans=0.0 2024-09-17 01:53:53,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=122720.0, ans=0.025 2024-09-17 01:53:56,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=122720.0, ans=0.0 2024-09-17 01:53:57,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122720.0, ans=0.1 2024-09-17 01:54:23,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=122800.0, ans=0.125 2024-09-17 01:54:24,518 INFO [train.py:1198] (0/2) Epoch 7, batch 3550, loss[loss=0.288, ctc_loss=0.2002, cr_loss=0.4234, attn_decoder_loss=0.2884, over 29694.00 frames. ], tot_loss[loss=0.2793, ctc_loss=0.1951, cr_loss=0.4225, attn_decoder_loss=0.2792, over 5783676.84 frames. 
], batch size: 89, lr: 1.56e-02, grad_scale: 4.0 2024-09-17 01:54:36,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=122800.0, ans=0.2 2024-09-17 01:54:43,845 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.552e+01 9.824e+01 1.101e+02 1.214e+02 1.774e+02, threshold=2.203e+02, percent-clipped=0.0 2024-09-17 01:54:53,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=122880.0, ans=0.125 2024-09-17 01:55:09,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=122920.0, ans=0.125 2024-09-17 01:55:11,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=122920.0, ans=0.1 2024-09-17 01:55:20,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=122920.0, ans=0.5 2024-09-17 01:55:39,586 INFO [train.py:1198] (0/2) Epoch 7, batch 3600, loss[loss=0.2726, ctc_loss=0.1845, cr_loss=0.4073, attn_decoder_loss=0.2733, over 29486.00 frames. ], tot_loss[loss=0.2791, ctc_loss=0.1948, cr_loss=0.4219, attn_decoder_loss=0.2791, over 5792408.76 frames. 
], batch size: 77, lr: 1.55e-02, grad_scale: 8.0 2024-09-17 01:55:51,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=123000.0, ans=0.2 2024-09-17 01:56:05,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123040.0, ans=0.1 2024-09-17 01:56:13,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=123080.0, ans=0.0 2024-09-17 01:56:31,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=123120.0, ans=0.0 2024-09-17 01:56:43,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123160.0, ans=0.1 2024-09-17 01:56:55,559 INFO [train.py:1198] (0/2) Epoch 7, batch 3650, loss[loss=0.3016, ctc_loss=0.2149, cr_loss=0.4619, attn_decoder_loss=0.301, over 29491.00 frames. ], tot_loss[loss=0.2784, ctc_loss=0.194, cr_loss=0.4211, attn_decoder_loss=0.2784, over 5793010.91 frames. ], batch size: 90, lr: 1.55e-02, grad_scale: 4.0 2024-09-17 01:56:58,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=123200.0, ans=0.125 2024-09-17 01:57:12,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=123240.0, ans=0.1 2024-09-17 01:57:14,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. 
limit=6.0 2024-09-17 01:57:16,628 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.280e+01 1.041e+02 1.137e+02 1.251e+02 2.329e+02, threshold=2.273e+02, percent-clipped=0.0 2024-09-17 01:57:33,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=123280.0, ans=0.0 2024-09-17 01:58:13,141 INFO [train.py:1198] (0/2) Epoch 7, batch 3700, loss[loss=0.2866, ctc_loss=0.1983, cr_loss=0.4287, attn_decoder_loss=0.2868, over 29711.00 frames. ], tot_loss[loss=0.2788, ctc_loss=0.1942, cr_loss=0.422, attn_decoder_loss=0.2789, over 5803762.14 frames. ], batch size: 84, lr: 1.55e-02, grad_scale: 8.0 2024-09-17 01:58:32,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=123440.0, ans=0.025 2024-09-17 01:58:36,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=12.0 2024-09-17 01:58:49,983 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=6.00 vs. limit=12.0 2024-09-17 01:58:53,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=123480.0, ans=0.2 2024-09-17 01:59:13,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=123560.0, ans=0.025 2024-09-17 01:59:28,054 INFO [train.py:1198] (0/2) Epoch 7, batch 3750, loss[loss=0.2488, ctc_loss=0.1625, cr_loss=0.3664, attn_decoder_loss=0.2503, over 29355.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.194, cr_loss=0.4222, attn_decoder_loss=0.2785, over 5808121.86 frames. 
], batch size: 67, lr: 1.55e-02, grad_scale: 4.0 2024-09-17 01:59:45,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123640.0, ans=0.1 2024-09-17 01:59:50,731 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.690e+01 1.049e+02 1.152e+02 1.342e+02 3.942e+02, threshold=2.304e+02, percent-clipped=2.0 2024-09-17 01:59:55,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0 2024-09-17 01:59:56,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=123680.0, ans=0.0 2024-09-17 01:59:57,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2024-09-17 02:00:16,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=123720.0, ans=0.125 2024-09-17 02:00:42,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=123760.0, ans=0.1 2024-09-17 02:00:45,074 INFO [train.py:1198] (0/2) Epoch 7, batch 3800, loss[loss=0.2854, ctc_loss=0.1891, cr_loss=0.4356, attn_decoder_loss=0.2864, over 29629.00 frames. ], tot_loss[loss=0.2782, ctc_loss=0.1941, cr_loss=0.4218, attn_decoder_loss=0.2781, over 5798328.29 frames. 
], batch size: 86, lr: 1.55e-02, grad_scale: 8.0 2024-09-17 02:00:46,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=123800.0, ans=0.125 2024-09-17 02:00:57,357 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:01:06,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=123840.0, ans=0.125 2024-09-17 02:01:11,569 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.76 vs. limit=6.0 2024-09-17 02:01:12,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=123840.0, ans=0.0 2024-09-17 02:01:32,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=123920.0, ans=0.0 2024-09-17 02:01:40,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=123920.0, ans=0.0 2024-09-17 02:01:49,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=123960.0, ans=0.125 2024-09-17 02:02:00,444 INFO [train.py:1198] (0/2) Epoch 7, batch 3850, loss[loss=0.3026, ctc_loss=0.2168, cr_loss=0.4539, attn_decoder_loss=0.3021, over 29214.00 frames. ], tot_loss[loss=0.2782, ctc_loss=0.194, cr_loss=0.4216, attn_decoder_loss=0.2782, over 5811737.44 frames. 
], batch size: 100, lr: 1.55e-02, grad_scale: 4.0 2024-09-17 02:02:24,324 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.352e+01 1.024e+02 1.121e+02 1.176e+02 2.647e+02, threshold=2.243e+02, percent-clipped=2.0 2024-09-17 02:02:25,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=12.0 2024-09-17 02:02:33,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=124080.0, ans=0.035 2024-09-17 02:02:39,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=124080.0, ans=0.125 2024-09-17 02:02:41,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=124080.0, ans=0.125 2024-09-17 02:03:15,233 INFO [train.py:1198] (0/2) Epoch 7, batch 3900, loss[loss=0.3031, ctc_loss=0.2199, cr_loss=0.4761, attn_decoder_loss=0.3018, over 29631.00 frames. ], tot_loss[loss=0.2786, ctc_loss=0.1941, cr_loss=0.4216, attn_decoder_loss=0.2786, over 5816266.71 frames. ], batch size: 86, lr: 1.55e-02, grad_scale: 8.0 2024-09-17 02:04:04,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.66 vs. 
limit=15.0 2024-09-17 02:04:12,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=124320.0, ans=0.125 2024-09-17 02:04:12,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=124320.0, ans=0.125 2024-09-17 02:04:18,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=124360.0, ans=0.05 2024-09-17 02:04:20,934 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=22.5 2024-09-17 02:04:31,847 INFO [train.py:1198] (0/2) Epoch 7, batch 3950, loss[loss=0.289, ctc_loss=0.1947, cr_loss=0.411, attn_decoder_loss=0.2904, over 29489.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.1936, cr_loss=0.4216, attn_decoder_loss=0.2786, over 5835806.47 frames. ], batch size: 97, lr: 1.55e-02, grad_scale: 4.0 2024-09-17 02:04:32,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=22.5 2024-09-17 02:04:57,233 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.925e+01 1.017e+02 1.080e+02 1.236e+02 3.410e+02, threshold=2.160e+02, percent-clipped=1.0 2024-09-17 02:05:02,744 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.95 vs. 
limit=15.0 2024-09-17 02:05:15,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=124520.0, ans=0.125 2024-09-17 02:05:46,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=124600.0, ans=0.125 2024-09-17 02:05:47,556 INFO [train.py:1198] (0/2) Epoch 7, batch 4000, loss[loss=0.2624, ctc_loss=0.1782, cr_loss=0.4027, attn_decoder_loss=0.2628, over 29500.00 frames. ], tot_loss[loss=0.2787, ctc_loss=0.1942, cr_loss=0.4213, attn_decoder_loss=0.2788, over 5812064.47 frames. ], batch size: 74, lr: 1.55e-02, grad_scale: 8.0 2024-09-17 02:05:56,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=124600.0, ans=0.125 2024-09-17 02:06:06,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=124640.0, ans=0.125 2024-09-17 02:06:16,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.16 vs. limit=10.0 2024-09-17 02:06:44,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.59 vs. 
limit=22.5 2024-09-17 02:06:45,176 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:06:47,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=124760.0, ans=0.1 2024-09-17 02:06:50,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=124760.0, ans=0.125 2024-09-17 02:06:59,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=124760.0, ans=0.1 2024-09-17 02:07:02,364 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.53 vs. limit=22.5 2024-09-17 02:07:02,864 INFO [train.py:1198] (0/2) Epoch 7, batch 4050, loss[loss=0.3218, ctc_loss=0.2749, cr_loss=0.4309, attn_decoder_loss=0.3174, over 20009.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.1942, cr_loss=0.4211, attn_decoder_loss=0.2786, over 5796146.67 frames. ], batch size: 209, lr: 1.54e-02, grad_scale: 4.0 2024-09-17 02:07:05,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.98 vs. 
limit=15.0 2024-09-17 02:07:29,478 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.830e+01 1.037e+02 1.133e+02 1.279e+02 3.685e+02, threshold=2.266e+02, percent-clipped=2.0 2024-09-17 02:07:35,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=124880.0, ans=0.09899494936611666 2024-09-17 02:07:38,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=124880.0, ans=0.125 2024-09-17 02:07:53,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=124920.0, ans=0.0 2024-09-17 02:08:05,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=124960.0, ans=0.125 2024-09-17 02:08:05,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=124960.0, ans=0.125 2024-09-17 02:08:18,296 INFO [train.py:1198] (0/2) Epoch 7, batch 4100, loss[loss=0.2996, ctc_loss=0.2221, cr_loss=0.4893, attn_decoder_loss=0.2973, over 29507.00 frames. ], tot_loss[loss=0.2791, ctc_loss=0.1948, cr_loss=0.4223, attn_decoder_loss=0.2791, over 5791299.91 frames. 
], batch size: 90, lr: 1.54e-02, grad_scale: 8.0 2024-09-17 02:08:21,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=125000.0, ans=10.0 2024-09-17 02:08:28,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=125000.0, ans=0.1 2024-09-17 02:09:01,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=125120.0, ans=0.0 2024-09-17 02:09:17,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.89 vs. limit=6.0 2024-09-17 02:09:21,428 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=22.5 2024-09-17 02:09:26,968 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2024-09-17 02:09:32,348 INFO [train.py:1198] (0/2) Epoch 7, batch 4150, loss[loss=0.2726, ctc_loss=0.1932, cr_loss=0.4336, attn_decoder_loss=0.2718, over 29496.00 frames. ], tot_loss[loss=0.2787, ctc_loss=0.1946, cr_loss=0.4219, attn_decoder_loss=0.2787, over 5797734.40 frames. ], batch size: 77, lr: 1.54e-02, grad_scale: 4.0 2024-09-17 02:09:39,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.38 vs. 
limit=15.0 2024-09-17 02:09:46,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=125240.0, ans=0.125 2024-09-17 02:09:49,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=125240.0, ans=0.025 2024-09-17 02:09:52,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=125240.0, ans=0.07 2024-09-17 02:10:01,820 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.306e+01 1.018e+02 1.090e+02 1.211e+02 2.746e+02, threshold=2.181e+02, percent-clipped=3.0 2024-09-17 02:10:04,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=125280.0, ans=0.2 2024-09-17 02:10:06,991 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.06 vs. limit=10.0 2024-09-17 02:10:47,445 INFO [train.py:1198] (0/2) Epoch 7, batch 4200, loss[loss=0.2911, ctc_loss=0.2045, cr_loss=0.4386, attn_decoder_loss=0.291, over 29510.00 frames. ], tot_loss[loss=0.279, ctc_loss=0.1945, cr_loss=0.4222, attn_decoder_loss=0.279, over 5800281.02 frames. 
], batch size: 90, lr: 1.54e-02, grad_scale: 8.0
2024-09-17 02:10:49,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=125400.0, ans=0.0
2024-09-17 02:11:11,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=125440.0, ans=0.07
2024-09-17 02:11:27,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=125480.0, ans=0.125
2024-09-17 02:11:38,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=125520.0, ans=0.2
2024-09-17 02:11:41,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=125520.0, ans=0.125
2024-09-17 02:11:43,962 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:11:43,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=125520.0, ans=0.125
2024-09-17 02:12:02,720 INFO [train.py:1198] (0/2) Epoch 7, batch 4250, loss[loss=0.2433, ctc_loss=0.1456, cr_loss=0.3557, attn_decoder_loss=0.2463, over 29504.00 frames. ], tot_loss[loss=0.2791, ctc_loss=0.1942, cr_loss=0.4227, attn_decoder_loss=0.2792, over 5806515.20 frames. ], batch size: 74, lr: 1.54e-02, grad_scale: 4.0
2024-09-17 02:12:16,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=125640.0, ans=0.125
2024-09-17 02:12:22,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=125640.0, ans=0.125
2024-09-17 02:12:30,969 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:12:31,955 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.603e+01 1.048e+02 1.150e+02 1.288e+02 2.522e+02, threshold=2.299e+02, percent-clipped=2.0
2024-09-17 02:12:47,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=125720.0, ans=0.125
2024-09-17 02:12:59,281 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:13:16,849 INFO [train.py:1198] (0/2) Epoch 7, batch 4300, loss[loss=0.2865, ctc_loss=0.2049, cr_loss=0.4451, attn_decoder_loss=0.2857, over 29541.00 frames. ], tot_loss[loss=0.2793, ctc_loss=0.1944, cr_loss=0.4224, attn_decoder_loss=0.2793, over 5795686.56 frames. ], batch size: 87, lr: 1.54e-02, grad_scale: 8.0
2024-09-17 02:13:17,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=125800.0, ans=0.05
2024-09-17 02:13:29,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=125800.0, ans=0.1
2024-09-17 02:13:29,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=125800.0, ans=0.07
2024-09-17 02:13:52,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=125880.0, ans=0.025
2024-09-17 02:13:55,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=125880.0, ans=0.0
2024-09-17 02:13:58,372 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:14:04,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=125920.0, ans=0.1
2024-09-17 02:14:13,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=125920.0, ans=0.125
2024-09-17 02:14:20,831 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:14:31,812 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.39 vs. limit=10.0
2024-09-17 02:14:32,532 INFO [train.py:1198] (0/2) Epoch 7, batch 4350, loss[loss=0.2934, ctc_loss=0.2077, cr_loss=0.4622, attn_decoder_loss=0.2926, over 29464.00 frames. ], tot_loss[loss=0.2826, ctc_loss=0.1974, cr_loss=0.4269, attn_decoder_loss=0.2826, over 5798633.64 frames. ], batch size: 97, lr: 1.54e-02, grad_scale: 4.0
2024-09-17 02:14:39,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.34 vs. limit=15.0
2024-09-17 02:14:42,211 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0
2024-09-17 02:14:55,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=126040.0, ans=0.125
2024-09-17 02:15:04,446 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.068e+01 1.042e+02 1.125e+02 1.257e+02 6.277e+02, threshold=2.251e+02, percent-clipped=2.0
2024-09-17 02:15:10,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0
2024-09-17 02:15:21,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5
2024-09-17 02:15:32,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=126160.0, ans=0.125
2024-09-17 02:15:34,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=126160.0, ans=0.125
2024-09-17 02:15:39,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0
2024-09-17 02:15:44,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=126160.0, ans=0.02
2024-09-17 02:15:47,176 INFO [train.py:1198] (0/2) Epoch 7, batch 4400, loss[loss=0.3023, ctc_loss=0.2295, cr_loss=0.4685, attn_decoder_loss=0.3, over 27388.00 frames. ], tot_loss[loss=0.2854, ctc_loss=0.2003, cr_loss=0.43, attn_decoder_loss=0.2853, over 5769376.10 frames. ], batch size: 124, lr: 1.54e-02, grad_scale: 8.0
2024-09-17 02:15:47,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=126200.0, ans=0.125
2024-09-17 02:15:54,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=126200.0, ans=0.5
2024-09-17 02:15:56,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=126200.0, ans=0.125
2024-09-17 02:15:57,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=126200.0, ans=0.0
2024-09-17 02:16:22,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0
2024-09-17 02:16:24,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=126280.0, ans=0.1
2024-09-17 02:16:33,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=126320.0, ans=0.2
2024-09-17 02:16:38,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=126320.0, ans=0.125
2024-09-17 02:16:40,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=126320.0, ans=0.2
2024-09-17 02:16:56,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5
2024-09-17 02:17:03,047 INFO [train.py:1198] (0/2) Epoch 7, batch 4450, loss[loss=0.3192, ctc_loss=0.2661, cr_loss=0.4728, attn_decoder_loss=0.3146, over 19930.00 frames. ], tot_loss[loss=0.2886, ctc_loss=0.2058, cr_loss=0.4328, attn_decoder_loss=0.2882, over 5581612.91 frames. ], batch size: 210, lr: 1.53e-02, grad_scale: 4.0
2024-09-17 02:17:29,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=126440.0, ans=0.1
2024-09-17 02:17:36,225 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.504e+01 1.059e+02 1.182e+02 1.268e+02 2.368e+02, threshold=2.364e+02, percent-clipped=1.0
2024-09-17 02:17:47,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=126520.0, ans=0.1
2024-09-17 02:18:05,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=126560.0, ans=0.125
2024-09-17 02:18:19,606 INFO [train.py:1198] (0/2) Epoch 7, batch 4500, loss[loss=0.3093, ctc_loss=0.2505, cr_loss=0.4358, attn_decoder_loss=0.3061, over 19347.00 frames. ], tot_loss[loss=0.2928, ctc_loss=0.2138, cr_loss=0.4345, attn_decoder_loss=0.2919, over 5238966.95 frames. ], batch size: 209, lr: 1.53e-02, grad_scale: 8.0
2024-09-17 02:18:20,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.13 vs. limit=10.0
2024-09-17 02:18:22,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.08 vs. limit=10.0
2024-09-17 02:18:23,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=126600.0, ans=0.125
2024-09-17 02:18:41,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=126640.0, ans=15.0
2024-09-17 02:18:53,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.37 vs. limit=22.5
2024-09-17 02:18:56,935 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-7.pt
2024-09-17 02:19:46,523 INFO [train.py:1198] (0/2) Epoch 8, batch 0, loss[loss=0.2652, ctc_loss=0.1734, cr_loss=0.3856, attn_decoder_loss=0.2668, over 29588.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1734, cr_loss=0.3856, attn_decoder_loss=0.2668, over 29588.00 frames. ], batch size: 73, lr: 1.44e-02, grad_scale: 8.0
2024-09-17 02:19:46,524 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 02:20:02,854 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.1746, 6.0801, 5.6649, 5.9205], device='cuda:0')
2024-09-17 02:20:03,699 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9685, 3.5046, 3.7976, 3.8010], device='cuda:0')
2024-09-17 02:20:04,922 INFO [train.py:1230] (0/2) Epoch 8, validation: loss=0.2208, ctc_loss=0.05894, cr_loss=4.762e-15, attn_decoder_loss=0.2387, over 944034.00 frames.
2024-09-17 02:20:04,923 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-17 02:20:44,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=126780.0, ans=0.025
2024-09-17 02:21:19,342 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.753e+01 1.155e+02 1.254e+02 1.387e+02 1.225e+03, threshold=2.508e+02, percent-clipped=2.0
2024-09-17 02:21:20,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=22.5
2024-09-17 02:21:20,943 INFO [train.py:1198] (0/2) Epoch 8, batch 50, loss[loss=0.2492, ctc_loss=0.1705, cr_loss=0.4143, attn_decoder_loss=0.2487, over 29449.00 frames. ], tot_loss[loss=0.2813, ctc_loss=0.1992, cr_loss=0.4258, attn_decoder_loss=0.281, over 1268320.88 frames. ], batch size: 70, lr: 1.44e-02, grad_scale: 4.0
2024-09-17 02:21:24,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=126900.0, ans=0.125
2024-09-17 02:22:21,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.56 vs. limit=15.0
2024-09-17 02:22:30,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=127060.0, ans=0.0
2024-09-17 02:22:41,947 INFO [train.py:1198] (0/2) Epoch 8, batch 100, loss[loss=0.2669, ctc_loss=0.1831, cr_loss=0.4115, attn_decoder_loss=0.267, over 29541.00 frames. ], tot_loss[loss=0.2827, ctc_loss=0.1983, cr_loss=0.4267, attn_decoder_loss=0.2826, over 2251626.63 frames. ], batch size: 76, lr: 1.44e-02, grad_scale: 8.0
2024-09-17 02:22:42,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=127100.0, ans=0.1
2024-09-17 02:23:14,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.35 vs. limit=12.0
2024-09-17 02:23:18,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=127180.0, ans=0.2
2024-09-17 02:23:18,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0
2024-09-17 02:23:34,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=127220.0, ans=0.125
2024-09-17 02:23:37,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.80 vs. limit=15.0
2024-09-17 02:23:39,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=127220.0, ans=0.0
2024-09-17 02:23:39,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=127220.0, ans=0.125
2024-09-17 02:23:48,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=127260.0, ans=0.125
2024-09-17 02:23:55,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=127300.0, ans=0.0
2024-09-17 02:23:56,928 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.517e+01 1.081e+02 1.202e+02 1.454e+02 2.807e+02, threshold=2.403e+02, percent-clipped=1.0
2024-09-17 02:23:56,954 INFO [train.py:1198] (0/2) Epoch 8, batch 150, loss[loss=0.2498, ctc_loss=0.1653, cr_loss=0.3858, attn_decoder_loss=0.2506, over 29449.00 frames. ], tot_loss[loss=0.2796, ctc_loss=0.195, cr_loss=0.4235, attn_decoder_loss=0.2796, over 3047043.81 frames. ], batch size: 70, lr: 1.44e-02, grad_scale: 4.0
2024-09-17 02:24:35,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=127380.0, ans=0.125
2024-09-17 02:24:45,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=127420.0, ans=0.95
2024-09-17 02:24:54,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=127420.0, ans=0.025
2024-09-17 02:24:57,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=127460.0, ans=0.1
2024-09-17 02:25:12,582 INFO [train.py:1198] (0/2) Epoch 8, batch 200, loss[loss=0.2874, ctc_loss=0.2039, cr_loss=0.4189, attn_decoder_loss=0.2874, over 27500.00 frames. ], tot_loss[loss=0.2777, ctc_loss=0.1927, cr_loss=0.4212, attn_decoder_loss=0.2778, over 3660159.46 frames. ], batch size: 124, lr: 1.44e-02, grad_scale: 8.0
2024-09-17 02:25:23,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=127500.0, ans=0.0
2024-09-17 02:25:31,490 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.85 vs. limit=10.0
2024-09-17 02:25:34,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=127540.0, ans=0.125
2024-09-17 02:25:57,143 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:26:19,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=127660.0, ans=0.95
2024-09-17 02:26:25,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=127660.0, ans=0.09899494936611666
2024-09-17 02:26:27,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0
2024-09-17 02:26:33,476 INFO [train.py:1198] (0/2) Epoch 8, batch 250, loss[loss=0.29, ctc_loss=0.1985, cr_loss=0.4372, attn_decoder_loss=0.2905, over 29198.00 frames. ], tot_loss[loss=0.277, ctc_loss=0.1915, cr_loss=0.4198, attn_decoder_loss=0.2771, over 4142718.39 frames. ], batch size: 100, lr: 1.44e-02, grad_scale: 4.0
2024-09-17 02:26:33,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=127700.0, ans=0.2
2024-09-17 02:26:34,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.173e+01 9.771e+01 1.014e+02 1.103e+02 1.585e+02, threshold=2.028e+02, percent-clipped=0.0
2024-09-17 02:26:47,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=127740.0, ans=0.0
2024-09-17 02:27:00,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=127740.0, ans=0.125
2024-09-17 02:27:03,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=127780.0, ans=0.125
2024-09-17 02:27:03,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=127780.0, ans=0.1
2024-09-17 02:27:12,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0
2024-09-17 02:27:17,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=127820.0, ans=0.1
2024-09-17 02:27:48,966 INFO [train.py:1198] (0/2) Epoch 8, batch 300, loss[loss=0.2965, ctc_loss=0.2105, cr_loss=0.4632, attn_decoder_loss=0.2958, over 29534.00 frames. ], tot_loss[loss=0.277, ctc_loss=0.1915, cr_loss=0.4207, attn_decoder_loss=0.2771, over 4511923.77 frames. ], batch size: 92, lr: 1.44e-02, grad_scale: 8.0
2024-09-17 02:27:50,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=127900.0, ans=0.1
2024-09-17 02:27:58,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=127900.0, ans=0.0
2024-09-17 02:28:14,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=127940.0, ans=0.125
2024-09-17 02:28:22,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=127980.0, ans=0.125
2024-09-17 02:28:22,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=127980.0, ans=0.125
2024-09-17 02:28:25,487 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-32000.pt
2024-09-17 02:29:12,042 INFO [train.py:1198] (0/2) Epoch 8, batch 350, loss[loss=0.2351, ctc_loss=0.1481, cr_loss=0.3507, attn_decoder_loss=0.237, over 29325.00 frames. ], tot_loss[loss=0.2768, ctc_loss=0.1907, cr_loss=0.4201, attn_decoder_loss=0.277, over 4797536.39 frames. ], batch size: 71, lr: 1.44e-02, grad_scale: 4.0
2024-09-17 02:29:13,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=128100.0, ans=0.2
2024-09-17 02:29:14,916 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.409e+01 1.011e+02 1.095e+02 1.201e+02 2.476e+02, threshold=2.189e+02, percent-clipped=3.0
2024-09-17 02:29:19,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=128100.0, ans=0.0
2024-09-17 02:29:38,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=128140.0, ans=0.09899494936611666
2024-09-17 02:29:38,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=128140.0, ans=10.0
2024-09-17 02:29:59,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=128220.0, ans=0.125
2024-09-17 02:30:27,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=128260.0, ans=0.125
2024-09-17 02:30:29,954 INFO [train.py:1198] (0/2) Epoch 8, batch 400, loss[loss=0.275, ctc_loss=0.1879, cr_loss=0.4191, attn_decoder_loss=0.2753, over 29712.00 frames. ], tot_loss[loss=0.2761, ctc_loss=0.1901, cr_loss=0.4199, attn_decoder_loss=0.2764, over 5026705.29 frames. ], batch size: 82, lr: 1.44e-02, grad_scale: 8.0
2024-09-17 02:30:54,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=128340.0, ans=0.0
2024-09-17 02:31:07,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.39 vs. limit=10.0
2024-09-17 02:31:15,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=128380.0, ans=0.125
2024-09-17 02:31:27,813 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:31:39,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=128460.0, ans=0.0
2024-09-17 02:31:44,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=128460.0, ans=0.125
2024-09-17 02:31:48,774 INFO [train.py:1198] (0/2) Epoch 8, batch 450, loss[loss=0.2764, ctc_loss=0.1863, cr_loss=0.4255, attn_decoder_loss=0.277, over 29685.00 frames. ], tot_loss[loss=0.276, ctc_loss=0.1898, cr_loss=0.419, attn_decoder_loss=0.2763, over 5188338.28 frames. ], batch size: 83, lr: 1.43e-02, grad_scale: 4.0
2024-09-17 02:31:53,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.548e+01 1.003e+02 1.077e+02 1.187e+02 3.906e+02, threshold=2.154e+02, percent-clipped=1.0
2024-09-17 02:32:11,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=128540.0, ans=0.5
2024-09-17 02:32:11,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=128540.0, ans=0.2
2024-09-17 02:33:05,016 INFO [train.py:1198] (0/2) Epoch 8, batch 500, loss[loss=0.2981, ctc_loss=0.2159, cr_loss=0.4642, attn_decoder_loss=0.297, over 29382.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1892, cr_loss=0.4179, attn_decoder_loss=0.2756, over 5330994.94 frames. ], batch size: 94, lr: 1.43e-02, grad_scale: 8.0
2024-09-17 02:33:14,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=128700.0, ans=0.0
2024-09-17 02:33:19,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=128740.0, ans=0.1
2024-09-17 02:33:26,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=128740.0, ans=0.025
2024-09-17 02:33:39,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.16 vs. limit=10.0
2024-09-17 02:33:56,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=128820.0, ans=0.125
2024-09-17 02:34:06,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=128860.0, ans=0.125
2024-09-17 02:34:23,770 INFO [train.py:1198] (0/2) Epoch 8, batch 550, loss[loss=0.3028, ctc_loss=0.2194, cr_loss=0.4613, attn_decoder_loss=0.3018, over 28775.00 frames. ], tot_loss[loss=0.2757, ctc_loss=0.1898, cr_loss=0.4189, attn_decoder_loss=0.2759, over 5423105.24 frames. ], batch size: 104, lr: 1.43e-02, grad_scale: 4.0
2024-09-17 02:34:32,980 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.612e+01 1.023e+02 1.117e+02 1.226e+02 1.997e+02, threshold=2.234e+02, percent-clipped=0.0
2024-09-17 02:34:47,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=128940.0, ans=0.125
2024-09-17 02:34:51,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=128940.0, ans=0.125
2024-09-17 02:35:02,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=128980.0, ans=0.2
2024-09-17 02:35:11,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=129020.0, ans=0.07
2024-09-17 02:35:19,376 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:35:20,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=129020.0, ans=0.125
2024-09-17 02:35:41,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.62 vs. limit=15.0
2024-09-17 02:35:43,599 INFO [train.py:1198] (0/2) Epoch 8, batch 600, loss[loss=0.3038, ctc_loss=0.2186, cr_loss=0.4544, attn_decoder_loss=0.3032, over 29259.00 frames. ], tot_loss[loss=0.2761, ctc_loss=0.1901, cr_loss=0.4201, attn_decoder_loss=0.2763, over 5510160.71 frames. ], batch size: 100, lr: 1.43e-02, grad_scale: 8.0
2024-09-17 02:35:43,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=129100.0, ans=0.2
2024-09-17 02:36:02,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129140.0, ans=0.1
2024-09-17 02:36:37,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.68 vs. limit=15.0
2024-09-17 02:36:38,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129220.0, ans=0.1
2024-09-17 02:36:53,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=129260.0, ans=0.07
2024-09-17 02:36:55,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=129260.0, ans=0.125
2024-09-17 02:36:59,411 INFO [train.py:1198] (0/2) Epoch 8, batch 650, loss[loss=0.282, ctc_loss=0.1989, cr_loss=0.4537, attn_decoder_loss=0.2812, over 29771.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1891, cr_loss=0.4189, attn_decoder_loss=0.2757, over 5587171.48 frames. ], batch size: 81, lr: 1.43e-02, grad_scale: 8.0
2024-09-17 02:37:04,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=129300.0, ans=0.0
2024-09-17 02:37:05,489 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.505e+01 9.950e+01 1.082e+02 1.181e+02 2.497e+02, threshold=2.164e+02, percent-clipped=2.0
2024-09-17 02:37:23,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0
2024-09-17 02:37:38,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129380.0, ans=0.1
2024-09-17 02:37:50,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=129420.0, ans=0.015
2024-09-17 02:38:09,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.18 vs. limit=15.0
2024-09-17 02:38:15,949 INFO [train.py:1198] (0/2) Epoch 8, batch 700, loss[loss=0.2735, ctc_loss=0.191, cr_loss=0.4037, attn_decoder_loss=0.2737, over 29523.00 frames. ], tot_loss[loss=0.276, ctc_loss=0.1893, cr_loss=0.4187, attn_decoder_loss=0.2763, over 5637076.73 frames. ], batch size: 76, lr: 1.43e-02, grad_scale: 8.0
2024-09-17 02:38:16,252 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:38:24,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=129500.0, ans=0.09899494936611666
2024-09-17 02:38:48,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=129540.0, ans=0.2
2024-09-17 02:38:51,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=129580.0, ans=0.025
2024-09-17 02:38:59,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=129580.0, ans=0.125
2024-09-17 02:39:06,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=12.0
2024-09-17 02:39:07,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=129620.0, ans=0.0
2024-09-17 02:39:18,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=129620.0, ans=0.1
2024-09-17 02:39:21,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.34 vs. limit=22.5
2024-09-17 02:39:37,333 INFO [train.py:1198] (0/2) Epoch 8, batch 750, loss[loss=0.2868, ctc_loss=0.1985, cr_loss=0.4734, attn_decoder_loss=0.2861, over 29702.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1887, cr_loss=0.418, attn_decoder_loss=0.2755, over 5674976.64 frames. ], batch size: 82, lr: 1.43e-02, grad_scale: 4.0
2024-09-17 02:39:42,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=129700.0, ans=0.125
2024-09-17 02:39:46,299 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.613e+01 1.021e+02 1.093e+02 1.208e+02 3.929e+02, threshold=2.185e+02, percent-clipped=1.0
2024-09-17 02:39:46,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=129700.0, ans=0.2
2024-09-17 02:40:22,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=129820.0, ans=15.0
2024-09-17 02:40:40,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. limit=6.0
2024-09-17 02:40:41,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=129860.0, ans=0.0
2024-09-17 02:40:53,428 INFO [train.py:1198] (0/2) Epoch 8, batch 800, loss[loss=0.2555, ctc_loss=0.1737, cr_loss=0.3997, attn_decoder_loss=0.2557, over 29623.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1889, cr_loss=0.4184, attn_decoder_loss=0.2756, over 5704886.57 frames. ], batch size: 73, lr: 1.43e-02, grad_scale: 8.0
2024-09-17 02:40:59,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=129900.0, ans=0.09899494936611666
2024-09-17 02:41:30,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129980.0, ans=0.1
2024-09-17 02:41:36,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=129980.0, ans=0.125
2024-09-17 02:41:44,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0
2024-09-17 02:41:45,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=130020.0, ans=0.125
2024-09-17 02:41:51,062 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=15.0
2024-09-17 02:41:53,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0
2024-09-17 02:42:09,538 INFO [train.py:1198] (0/2) Epoch 8, batch 850, loss[loss=0.288, ctc_loss=0.1991, cr_loss=0.435, attn_decoder_loss=0.2882, over 29714.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1888, cr_loss=0.4186, attn_decoder_loss=0.2753, over 5735151.27 frames. ], batch size: 89, lr: 1.43e-02, grad_scale: 4.0
2024-09-17 02:42:20,112 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.352e+01 1.021e+02 1.113e+02 1.293e+02 2.449e+02, threshold=2.226e+02, percent-clipped=1.0
2024-09-17 02:42:36,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=130140.0, ans=0.0
2024-09-17 02:42:56,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=130180.0, ans=0.125
2024-09-17 02:42:58,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=130180.0, ans=15.0
2024-09-17 02:43:17,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=130260.0, ans=0.07
2024-09-17 02:43:21,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=130260.0, ans=0.1
2024-09-17 02:43:23,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=130260.0, ans=0.125
2024-09-17 02:43:31,942 INFO [train.py:1198] (0/2) Epoch 8, batch 900, loss[loss=0.2557, ctc_loss=0.1636, cr_loss=0.3702, attn_decoder_loss=0.2577, over 29599.00 frames. ], tot_loss[loss=0.2755, ctc_loss=0.1893, cr_loss=0.4186, attn_decoder_loss=0.2758, over 5741411.95 frames. ], batch size: 73, lr: 1.43e-02, grad_scale: 8.0
2024-09-17 02:43:37,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.59 vs. limit=10.0
2024-09-17 02:43:38,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=130300.0, ans=0.125
2024-09-17 02:43:44,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.12 vs. limit=22.5
2024-09-17 02:43:49,448 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=12.0
2024-09-17 02:44:02,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=130380.0, ans=0.125
2024-09-17 02:44:23,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.49 vs. limit=22.5
2024-09-17 02:44:29,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0
2024-09-17 02:44:42,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.15 vs. limit=15.0
2024-09-17 02:44:45,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=130460.0, ans=0.125
2024-09-17 02:44:48,517 INFO [train.py:1198] (0/2) Epoch 8, batch 950, loss[loss=0.2507, ctc_loss=0.1557, cr_loss=0.3787, attn_decoder_loss=0.2529, over 29504.00 frames. ], tot_loss[loss=0.2759, ctc_loss=0.1896, cr_loss=0.4187, attn_decoder_loss=0.2762, over 5742862.70 frames.
], batch size: 74, lr: 1.42e-02, grad_scale: 4.0 2024-09-17 02:44:48,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=130500.0, ans=0.1 2024-09-17 02:44:50,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=130500.0, ans=0.1 2024-09-17 02:44:57,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=130500.0, ans=0.2 2024-09-17 02:45:00,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.707e+01 1.021e+02 1.105e+02 1.238e+02 2.320e+02, threshold=2.209e+02, percent-clipped=1.0 2024-09-17 02:45:10,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=130540.0, ans=0.5 2024-09-17 02:45:27,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=130580.0, ans=0.1 2024-09-17 02:45:40,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130620.0, ans=0.1 2024-09-17 02:45:56,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=130660.0, ans=0.025 2024-09-17 02:46:00,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=130660.0, ans=0.04949747468305833 2024-09-17 02:46:04,900 INFO [train.py:1198] (0/2) Epoch 8, batch 1000, loss[loss=0.266, ctc_loss=0.1707, cr_loss=0.3987, attn_decoder_loss=0.2677, over 29494.00 frames. ], tot_loss[loss=0.2771, ctc_loss=0.1906, cr_loss=0.4202, attn_decoder_loss=0.2773, over 5738993.11 frames. 
], batch size: 77, lr: 1.42e-02, grad_scale: 8.0 2024-09-17 02:46:36,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.71 vs. limit=22.5 2024-09-17 02:46:57,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=130820.0, ans=0.125 2024-09-17 02:47:03,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=130820.0, ans=0.0 2024-09-17 02:47:11,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2024-09-17 02:47:18,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=130860.0, ans=0.125 2024-09-17 02:47:26,001 INFO [train.py:1198] (0/2) Epoch 8, batch 1050, loss[loss=0.277, ctc_loss=0.1852, cr_loss=0.4304, attn_decoder_loss=0.2777, over 29683.00 frames. ], tot_loss[loss=0.276, ctc_loss=0.1894, cr_loss=0.4191, attn_decoder_loss=0.2763, over 5746874.67 frames. ], batch size: 85, lr: 1.42e-02, grad_scale: 4.0 2024-09-17 02:47:29,962 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.10 vs. 
limit=15.0 2024-09-17 02:47:32,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130900.0, ans=0.1 2024-09-17 02:47:39,738 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.558e+01 1.020e+02 1.112e+02 1.252e+02 2.111e+02, threshold=2.224e+02, percent-clipped=0.0 2024-09-17 02:48:00,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=130980.0, ans=0.1 2024-09-17 02:48:05,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-09-17 02:48:06,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=130980.0, ans=0.0 2024-09-17 02:48:12,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=131020.0, ans=0.125 2024-09-17 02:48:17,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=131020.0, ans=0.125 2024-09-17 02:48:32,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=131060.0, ans=0.2 2024-09-17 02:48:36,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=131060.0, ans=0.125 2024-09-17 02:48:42,658 INFO [train.py:1198] (0/2) Epoch 8, batch 1100, loss[loss=0.2631, ctc_loss=0.1727, cr_loss=0.4076, attn_decoder_loss=0.2641, over 29454.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1887, cr_loss=0.4184, attn_decoder_loss=0.2758, over 5757778.83 frames. 
], batch size: 78, lr: 1.42e-02, grad_scale: 8.0 2024-09-17 02:49:02,751 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:49:16,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=131180.0, ans=0.125 2024-09-17 02:49:33,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=131220.0, ans=0.0 2024-09-17 02:49:35,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131220.0, ans=0.1 2024-09-17 02:49:40,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.87 vs. limit=15.0 2024-09-17 02:49:59,485 INFO [train.py:1198] (0/2) Epoch 8, batch 1150, loss[loss=0.2722, ctc_loss=0.1933, cr_loss=0.3954, attn_decoder_loss=0.2722, over 29432.00 frames. ], tot_loss[loss=0.2755, ctc_loss=0.1891, cr_loss=0.4187, attn_decoder_loss=0.2758, over 5755642.94 frames. ], batch size: 78, lr: 1.42e-02, grad_scale: 4.0 2024-09-17 02:50:16,924 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.392e+01 9.941e+01 1.085e+02 1.238e+02 2.659e+02, threshold=2.171e+02, percent-clipped=2.0 2024-09-17 02:50:36,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=131380.0, ans=0.125 2024-09-17 02:50:42,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=131380.0, ans=0.125 2024-09-17 02:50:56,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.21 vs. 
limit=15.0 2024-09-17 02:51:00,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=131420.0, ans=0.125 2024-09-17 02:51:15,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=131460.0, ans=0.95 2024-09-17 02:51:19,981 INFO [train.py:1198] (0/2) Epoch 8, batch 1200, loss[loss=0.2933, ctc_loss=0.2062, cr_loss=0.4807, attn_decoder_loss=0.2923, over 29671.00 frames. ], tot_loss[loss=0.2765, ctc_loss=0.1897, cr_loss=0.4191, attn_decoder_loss=0.2768, over 5746749.16 frames. ], batch size: 85, lr: 1.42e-02, grad_scale: 8.0 2024-09-17 02:51:29,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=131500.0, ans=0.125 2024-09-17 02:51:32,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=131500.0, ans=0.125 2024-09-17 02:51:32,585 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:51:36,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=131540.0, ans=0.0 2024-09-17 02:51:43,348 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:51:52,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=131580.0, ans=0.2 2024-09-17 02:52:16,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=131620.0, ans=0.0 2024-09-17 02:52:19,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=131660.0, ans=0.1 2024-09-17 02:52:36,147 INFO [train.py:1198] (0/2) Epoch 8, 
batch 1250, loss[loss=0.3051, ctc_loss=0.2138, cr_loss=0.4786, attn_decoder_loss=0.3046, over 29509.00 frames. ], tot_loss[loss=0.2772, ctc_loss=0.1903, cr_loss=0.4202, attn_decoder_loss=0.2775, over 5774357.42 frames. ], batch size: 92, lr: 1.42e-02, grad_scale: 8.0 2024-09-17 02:52:52,813 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.535e+01 1.024e+02 1.090e+02 1.251e+02 7.392e+02, threshold=2.180e+02, percent-clipped=1.0 2024-09-17 02:52:59,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=131740.0, ans=0.125 2024-09-17 02:53:03,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=131740.0, ans=0.125 2024-09-17 02:53:03,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=131740.0, ans=0.5 2024-09-17 02:53:06,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=131780.0, ans=0.0 2024-09-17 02:53:29,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=131820.0, ans=0.09899494936611666 2024-09-17 02:53:40,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=131860.0, ans=0.125 2024-09-17 02:53:44,271 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=15.0 2024-09-17 02:53:52,342 INFO [train.py:1198] (0/2) Epoch 8, batch 1300, loss[loss=0.2872, ctc_loss=0.2091, cr_loss=0.4312, attn_decoder_loss=0.2862, over 28238.00 frames. ], tot_loss[loss=0.276, ctc_loss=0.1892, cr_loss=0.4186, attn_decoder_loss=0.2764, over 5778884.41 frames. 
], batch size: 111, lr: 1.42e-02, grad_scale: 8.0 2024-09-17 02:54:02,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=131900.0, ans=0.0 2024-09-17 02:54:04,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.05 vs. limit=15.0 2024-09-17 02:54:05,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=131900.0, ans=0.125 2024-09-17 02:54:10,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=131940.0, ans=0.125 2024-09-17 02:54:23,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131940.0, ans=0.1 2024-09-17 02:55:13,328 INFO [train.py:1198] (0/2) Epoch 8, batch 1350, loss[loss=0.2673, ctc_loss=0.18, cr_loss=0.3892, attn_decoder_loss=0.2684, over 29755.00 frames. ], tot_loss[loss=0.2751, ctc_loss=0.1877, cr_loss=0.4171, attn_decoder_loss=0.2755, over 5795601.86 frames. 
], batch size: 81, lr: 1.42e-02, grad_scale: 8.0 2024-09-17 02:55:29,761 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.512e+01 9.976e+01 1.075e+02 1.151e+02 1.437e+02, threshold=2.149e+02, percent-clipped=0.0 2024-09-17 02:55:58,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=132220.0, ans=0.0 2024-09-17 02:56:00,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=132220.0, ans=0.025 2024-09-17 02:56:03,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=132220.0, ans=0.07 2024-09-17 02:56:03,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=15.0 2024-09-17 02:56:06,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=132220.0, ans=0.125 2024-09-17 02:56:11,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.46 vs. limit=15.0 2024-09-17 02:56:11,596 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.74 vs. 
limit=22.5 2024-09-17 02:56:15,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=132260.0, ans=0.125 2024-09-17 02:56:20,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=132260.0, ans=0.125 2024-09-17 02:56:25,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=132260.0, ans=0.015 2024-09-17 02:56:28,857 INFO [train.py:1198] (0/2) Epoch 8, batch 1400, loss[loss=0.2375, ctc_loss=0.1557, cr_loss=0.3574, attn_decoder_loss=0.2387, over 29558.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1873, cr_loss=0.4176, attn_decoder_loss=0.2751, over 5806028.42 frames. ], batch size: 69, lr: 1.42e-02, grad_scale: 8.0 2024-09-17 02:56:44,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=132340.0, ans=0.125 2024-09-17 02:56:57,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=132380.0, ans=0.0 2024-09-17 02:57:24,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=132420.0, ans=0.2 2024-09-17 02:57:24,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=132420.0, ans=0.1 2024-09-17 02:57:44,593 INFO [train.py:1198] (0/2) Epoch 8, batch 1450, loss[loss=0.2955, ctc_loss=0.198, cr_loss=0.4433, attn_decoder_loss=0.2965, over 29463.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1881, cr_loss=0.4177, attn_decoder_loss=0.2758, over 5802626.51 frames. 
], batch size: 94, lr: 1.41e-02, grad_scale: 4.0 2024-09-17 02:58:06,575 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.324e+01 1.032e+02 1.089e+02 1.206e+02 2.427e+02, threshold=2.178e+02, percent-clipped=3.0 2024-09-17 02:58:15,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=132580.0, ans=0.125 2024-09-17 02:58:28,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=132580.0, ans=0.0 2024-09-17 02:58:35,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=132620.0, ans=0.125 2024-09-17 02:58:40,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=132620.0, ans=0.0 2024-09-17 02:58:53,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=132660.0, ans=0.125 2024-09-17 02:58:58,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=132660.0, ans=0.04949747468305833 2024-09-17 02:59:04,180 INFO [train.py:1198] (0/2) Epoch 8, batch 1500, loss[loss=0.2909, ctc_loss=0.2045, cr_loss=0.4456, attn_decoder_loss=0.2906, over 29632.00 frames. ], tot_loss[loss=0.2756, ctc_loss=0.1881, cr_loss=0.4184, attn_decoder_loss=0.276, over 5803327.74 frames. 
], batch size: 86, lr: 1.41e-02, grad_scale: 8.0 2024-09-17 02:59:10,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=132700.0, ans=0.5 2024-09-17 02:59:16,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=132700.0, ans=0.025 2024-09-17 02:59:27,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=132740.0, ans=0.0 2024-09-17 02:59:44,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=132780.0, ans=0.125 2024-09-17 02:59:56,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=132820.0, ans=0.025 2024-09-17 02:59:57,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.06 vs. limit=15.0 2024-09-17 02:59:59,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=132820.0, ans=0.125 2024-09-17 03:00:15,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.20 vs. limit=15.0 2024-09-17 03:00:20,635 INFO [train.py:1198] (0/2) Epoch 8, batch 1550, loss[loss=0.298, ctc_loss=0.2082, cr_loss=0.449, attn_decoder_loss=0.298, over 29518.00 frames. ], tot_loss[loss=0.2762, ctc_loss=0.1894, cr_loss=0.4195, attn_decoder_loss=0.2765, over 5779402.60 frames. 
], batch size: 90, lr: 1.41e-02, grad_scale: 4.0 2024-09-17 03:00:41,776 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.026e+01 9.829e+01 1.097e+02 1.218e+02 3.935e+02, threshold=2.194e+02, percent-clipped=3.0 2024-09-17 03:00:47,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.05 vs. limit=15.0 2024-09-17 03:01:12,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=133020.0, ans=0.125 2024-09-17 03:01:28,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=133060.0, ans=0.2 2024-09-17 03:01:35,980 INFO [train.py:1198] (0/2) Epoch 8, batch 1600, loss[loss=0.2957, ctc_loss=0.2054, cr_loss=0.4699, attn_decoder_loss=0.2952, over 29673.00 frames. ], tot_loss[loss=0.2758, ctc_loss=0.1891, cr_loss=0.4193, attn_decoder_loss=0.2761, over 5762091.99 frames. ], batch size: 85, lr: 1.41e-02, grad_scale: 8.0 2024-09-17 03:01:41,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=133100.0, ans=0.0 2024-09-17 03:02:45,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=133260.0, ans=0.0 2024-09-17 03:02:50,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.31 vs. limit=10.0 2024-09-17 03:02:55,932 INFO [train.py:1198] (0/2) Epoch 8, batch 1650, loss[loss=0.2982, ctc_loss=0.2157, cr_loss=0.4517, attn_decoder_loss=0.2974, over 29713.00 frames. ], tot_loss[loss=0.2758, ctc_loss=0.1891, cr_loss=0.4191, attn_decoder_loss=0.2761, over 5756605.96 frames. 
], batch size: 89, lr: 1.41e-02, grad_scale: 4.0 2024-09-17 03:03:05,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133300.0, ans=0.1 2024-09-17 03:03:11,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=133340.0, ans=0.125 2024-09-17 03:03:12,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133340.0, ans=0.1 2024-09-17 03:03:18,410 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.840e+01 1.022e+02 1.128e+02 1.304e+02 4.033e+02, threshold=2.256e+02, percent-clipped=2.0 2024-09-17 03:03:30,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=133380.0, ans=0.125 2024-09-17 03:03:35,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2024-09-17 03:03:49,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=133420.0, ans=0.2 2024-09-17 03:04:00,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=15.0 2024-09-17 03:04:11,213 INFO [train.py:1198] (0/2) Epoch 8, batch 1700, loss[loss=0.2401, ctc_loss=0.1741, cr_loss=0.3908, attn_decoder_loss=0.2387, over 29565.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1883, cr_loss=0.4182, attn_decoder_loss=0.2757, over 5779091.30 frames. 
], batch size: 69, lr: 1.41e-02, grad_scale: 8.0 2024-09-17 03:04:37,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=133540.0, ans=0.1 2024-09-17 03:04:38,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=133540.0, ans=0.125 2024-09-17 03:04:48,639 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.44 vs. limit=15.0 2024-09-17 03:05:20,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=133660.0, ans=0.2 2024-09-17 03:05:23,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=133660.0, ans=0.0 2024-09-17 03:05:23,737 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-09-17 03:05:24,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=133660.0, ans=0.2 2024-09-17 03:05:27,787 INFO [train.py:1198] (0/2) Epoch 8, batch 1750, loss[loss=0.2489, ctc_loss=0.1679, cr_loss=0.3929, attn_decoder_loss=0.2492, over 29365.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1879, cr_loss=0.4173, attn_decoder_loss=0.2753, over 5788189.72 frames. 
], batch size: 67, lr: 1.41e-02, grad_scale: 4.0 2024-09-17 03:05:48,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=133740.0, ans=0.025 2024-09-17 03:05:55,424 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.904e+01 9.818e+01 1.049e+02 1.183e+02 2.492e+02, threshold=2.098e+02, percent-clipped=1.0 2024-09-17 03:06:00,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=133780.0, ans=0.2 2024-09-17 03:06:00,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.39 vs. limit=15.0 2024-09-17 03:06:04,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=133780.0, ans=0.0 2024-09-17 03:06:31,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=133820.0, ans=0.125 2024-09-17 03:06:44,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=133860.0, ans=0.0 2024-09-17 03:06:48,889 INFO [train.py:1198] (0/2) Epoch 8, batch 1800, loss[loss=0.2854, ctc_loss=0.1957, cr_loss=0.4039, attn_decoder_loss=0.2864, over 29676.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1886, cr_loss=0.4178, attn_decoder_loss=0.2757, over 5790435.38 frames. ], batch size: 83, lr: 1.41e-02, grad_scale: 8.0 2024-09-17 03:06:58,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=133900.0, ans=0.2 2024-09-17 03:07:11,203 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. 
limit=6.0 2024-09-17 03:07:15,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=133940.0, ans=0.0 2024-09-17 03:07:15,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=133940.0, ans=0.125 2024-09-17 03:07:21,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=133980.0, ans=0.0 2024-09-17 03:07:27,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=133980.0, ans=0.125 2024-09-17 03:08:05,025 INFO [train.py:1198] (0/2) Epoch 8, batch 1850, loss[loss=0.2797, ctc_loss=0.18, cr_loss=0.4053, attn_decoder_loss=0.2818, over 29648.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1882, cr_loss=0.4181, attn_decoder_loss=0.2754, over 5796224.17 frames. ], batch size: 86, lr: 1.41e-02, grad_scale: 4.0 2024-09-17 03:08:06,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=134100.0, ans=0.0 2024-09-17 03:08:06,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=134100.0, ans=0.5 2024-09-17 03:08:08,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=134100.0, ans=0.2 2024-09-17 03:08:12,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=134100.0, ans=0.125 2024-09-17 03:08:30,753 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.292e+01 1.011e+02 1.086e+02 1.212e+02 2.686e+02, threshold=2.172e+02, percent-clipped=1.0 2024-09-17 03:08:40,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, 
batch_count=134180.0, ans=0.125 2024-09-17 03:08:53,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=134220.0, ans=0.125 2024-09-17 03:08:57,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.65 vs. limit=15.0 2024-09-17 03:09:03,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=134220.0, ans=0.2 2024-09-17 03:09:05,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=134260.0, ans=0.0 2024-09-17 03:09:20,861 INFO [train.py:1198] (0/2) Epoch 8, batch 1900, loss[loss=0.2887, ctc_loss=0.2017, cr_loss=0.4578, attn_decoder_loss=0.2882, over 29712.00 frames. ], tot_loss[loss=0.2756, ctc_loss=0.1885, cr_loss=0.419, attn_decoder_loss=0.276, over 5803448.23 frames. ], batch size: 89, lr: 1.41e-02, grad_scale: 8.0 2024-09-17 03:09:37,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=134340.0, ans=0.125 2024-09-17 03:09:46,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=134340.0, ans=0.0 2024-09-17 03:09:49,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=134340.0, ans=0.125 2024-09-17 03:10:05,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=134380.0, ans=0.2 2024-09-17 03:10:28,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=134460.0, ans=0.125 2024-09-17 03:10:41,890 INFO [train.py:1198] (0/2) Epoch 8, batch 1950, loss[loss=0.2663, ctc_loss=0.1798, cr_loss=0.3935, attn_decoder_loss=0.2671, over 29438.00 
frames. ], tot_loss[loss=0.2765, ctc_loss=0.1891, cr_loss=0.4205, attn_decoder_loss=0.2768, over 5818462.75 frames. ], batch size: 78, lr: 1.40e-02, grad_scale: 4.0 2024-09-17 03:10:45,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=134500.0, ans=0.125 2024-09-17 03:11:07,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.49 vs. limit=10.0 2024-09-17 03:11:09,497 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.585e+01 1.007e+02 1.092e+02 1.214e+02 3.508e+02, threshold=2.184e+02, percent-clipped=3.0 2024-09-17 03:11:17,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=134580.0, ans=0.025 2024-09-17 03:11:22,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=134580.0, ans=0.125 2024-09-17 03:11:32,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=134620.0, ans=0.2 2024-09-17 03:11:49,239 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:11:57,948 INFO [train.py:1198] (0/2) Epoch 8, batch 2000, loss[loss=0.2372, ctc_loss=0.1498, cr_loss=0.3641, attn_decoder_loss=0.2388, over 29316.00 frames. ], tot_loss[loss=0.277, ctc_loss=0.1897, cr_loss=0.4208, attn_decoder_loss=0.2773, over 5794632.01 frames. 
], batch size: 67, lr: 1.40e-02, grad_scale: 8.0 2024-09-17 03:12:04,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=134700.0, ans=0.125 2024-09-17 03:12:10,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=134700.0, ans=0.025 2024-09-17 03:12:19,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=134740.0, ans=0.025 2024-09-17 03:12:25,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.69 vs. limit=15.0 2024-09-17 03:12:31,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=134780.0, ans=0.125 2024-09-17 03:12:35,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=134780.0, ans=0.125 2024-09-17 03:12:41,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=134780.0, ans=0.125 2024-09-17 03:12:48,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=134820.0, ans=0.0 2024-09-17 03:12:52,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=134820.0, ans=0.0 2024-09-17 03:13:11,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=134860.0, ans=0.0 2024-09-17 03:13:14,528 INFO [train.py:1198] (0/2) Epoch 8, batch 2050, loss[loss=0.2473, ctc_loss=0.1633, cr_loss=0.3971, attn_decoder_loss=0.2478, over 29439.00 frames. ], tot_loss[loss=0.2759, ctc_loss=0.1888, cr_loss=0.4196, attn_decoder_loss=0.2763, over 5787964.89 frames. 
], batch size: 70, lr: 1.40e-02, grad_scale: 4.0 2024-09-17 03:13:23,592 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:13:25,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=134900.0, ans=0.0 2024-09-17 03:13:37,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=12.0 2024-09-17 03:13:45,831 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.747e+01 9.821e+01 1.060e+02 1.158e+02 2.378e+02, threshold=2.119e+02, percent-clipped=1.0 2024-09-17 03:13:53,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=134980.0, ans=0.95 2024-09-17 03:14:17,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=135020.0, ans=0.125 2024-09-17 03:14:21,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=135060.0, ans=0.125 2024-09-17 03:14:31,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.80 vs. limit=22.5 2024-09-17 03:14:35,271 INFO [train.py:1198] (0/2) Epoch 8, batch 2100, loss[loss=0.268, ctc_loss=0.1784, cr_loss=0.4428, attn_decoder_loss=0.2682, over 29759.00 frames. ], tot_loss[loss=0.2751, ctc_loss=0.1877, cr_loss=0.4182, attn_decoder_loss=0.2755, over 5799100.71 frames. ], batch size: 81, lr: 1.40e-02, grad_scale: 8.0 2024-09-17 03:14:52,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.87 vs. 
limit=15.0 2024-09-17 03:15:02,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.24 vs. limit=22.5 2024-09-17 03:15:31,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=135220.0, ans=0.125 2024-09-17 03:15:50,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=135300.0, ans=0.1 2024-09-17 03:15:51,537 INFO [train.py:1198] (0/2) Epoch 8, batch 2150, loss[loss=0.2569, ctc_loss=0.1708, cr_loss=0.4024, attn_decoder_loss=0.2575, over 29415.00 frames. ], tot_loss[loss=0.2741, ctc_loss=0.1864, cr_loss=0.4169, attn_decoder_loss=0.2746, over 5813709.10 frames. ], batch size: 78, lr: 1.40e-02, grad_scale: 4.0 2024-09-17 03:16:10,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=135340.0, ans=0.125 2024-09-17 03:16:21,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=135380.0, ans=0.125 2024-09-17 03:16:22,341 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.928e+01 9.784e+01 1.043e+02 1.111e+02 1.443e+02, threshold=2.086e+02, percent-clipped=0.0 2024-09-17 03:16:30,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=135380.0, ans=0.1 2024-09-17 03:16:31,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=135380.0, ans=0.025 2024-09-17 03:16:48,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=135420.0, ans=0.125 2024-09-17 03:16:54,528 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=135460.0, ans=0.125 2024-09-17 03:17:02,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=135460.0, ans=0.125 2024-09-17 03:17:06,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=135500.0, ans=0.1 2024-09-17 03:17:07,882 INFO [train.py:1198] (0/2) Epoch 8, batch 2200, loss[loss=0.2876, ctc_loss=0.193, cr_loss=0.4384, attn_decoder_loss=0.2883, over 29622.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1871, cr_loss=0.4179, attn_decoder_loss=0.2749, over 5810736.58 frames. ], batch size: 86, lr: 1.40e-02, grad_scale: 8.0 2024-09-17 03:17:15,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=135500.0, ans=0.125 2024-09-17 03:17:17,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=135500.0, ans=0.0 2024-09-17 03:17:35,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=135540.0, ans=0.125 2024-09-17 03:17:42,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=135580.0, ans=0.125 2024-09-17 03:17:55,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=135620.0, ans=0.125 2024-09-17 03:18:15,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=135660.0, ans=0.125 2024-09-17 03:18:22,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=135660.0, ans=0.1 2024-09-17 
03:18:28,897 INFO [train.py:1198] (0/2) Epoch 8, batch 2250, loss[loss=0.2732, ctc_loss=0.1786, cr_loss=0.39, attn_decoder_loss=0.2751, over 29699.00 frames. ], tot_loss[loss=0.2741, ctc_loss=0.1865, cr_loss=0.4164, attn_decoder_loss=0.2746, over 5810439.11 frames. ], batch size: 82, lr: 1.40e-02, grad_scale: 4.0 2024-09-17 03:18:44,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=135740.0, ans=0.0 2024-09-17 03:18:47,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=135740.0, ans=0.025 2024-09-17 03:18:53,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=135740.0, ans=0.0 2024-09-17 03:18:54,111 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=15.0 2024-09-17 03:19:00,614 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.680e+01 9.920e+01 1.107e+02 1.209e+02 3.496e+02, threshold=2.214e+02, percent-clipped=1.0 2024-09-17 03:19:04,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=135780.0, ans=0.125 2024-09-17 03:19:09,269 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.30 vs. limit=15.0 2024-09-17 03:19:10,933 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=15.0 2024-09-17 03:19:21,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.08 vs. 
limit=15.0 2024-09-17 03:19:22,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=135820.0, ans=0.0 2024-09-17 03:19:29,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=135860.0, ans=0.125 2024-09-17 03:19:42,485 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. limit=15.0 2024-09-17 03:19:44,834 INFO [train.py:1198] (0/2) Epoch 8, batch 2300, loss[loss=0.2535, ctc_loss=0.1703, cr_loss=0.4259, attn_decoder_loss=0.2533, over 29330.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1851, cr_loss=0.4149, attn_decoder_loss=0.2734, over 5798938.31 frames. ], batch size: 71, lr: 1.40e-02, grad_scale: 8.0 2024-09-17 03:20:49,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2024-09-17 03:21:01,311 INFO [train.py:1198] (0/2) Epoch 8, batch 2350, loss[loss=0.2918, ctc_loss=0.1987, cr_loss=0.4678, attn_decoder_loss=0.2917, over 29677.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1853, cr_loss=0.4154, attn_decoder_loss=0.2736, over 5805266.74 frames. 
], batch size: 83, lr: 1.40e-02, grad_scale: 4.0 2024-09-17 03:21:03,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=136100.0, ans=0.1 2024-09-17 03:21:19,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=136140.0, ans=0.125 2024-09-17 03:21:29,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=136140.0, ans=0.125 2024-09-17 03:21:37,145 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.392e+01 1.038e+02 1.165e+02 1.369e+02 2.325e+02, threshold=2.330e+02, percent-clipped=1.0 2024-09-17 03:21:51,863 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=12.0 2024-09-17 03:22:15,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=136260.0, ans=0.125 2024-09-17 03:22:19,683 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2024-09-17 03:22:21,878 INFO [train.py:1198] (0/2) Epoch 8, batch 2400, loss[loss=0.2598, ctc_loss=0.1781, cr_loss=0.3989, attn_decoder_loss=0.26, over 29559.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1857, cr_loss=0.4156, attn_decoder_loss=0.274, over 5808502.42 frames. 
], batch size: 76, lr: 1.40e-02, grad_scale: 8.0 2024-09-17 03:22:28,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=136300.0, ans=0.0 2024-09-17 03:23:04,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=136380.0, ans=0.0 2024-09-17 03:23:13,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=136420.0, ans=0.0 2024-09-17 03:23:22,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=136460.0, ans=0.2 2024-09-17 03:23:34,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=136460.0, ans=0.125 2024-09-17 03:23:37,608 INFO [train.py:1198] (0/2) Epoch 8, batch 2450, loss[loss=0.2763, ctc_loss=0.1833, cr_loss=0.4311, attn_decoder_loss=0.2771, over 29723.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1869, cr_loss=0.4169, attn_decoder_loss=0.2752, over 5784961.70 frames. ], batch size: 82, lr: 1.39e-02, grad_scale: 4.0 2024-09-17 03:23:39,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=136500.0, ans=0.0 2024-09-17 03:24:11,927 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.397e+01 1.019e+02 1.082e+02 1.263e+02 3.288e+02, threshold=2.163e+02, percent-clipped=1.0 2024-09-17 03:24:24,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=136620.0, ans=0.125 2024-09-17 03:24:32,659 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.48 vs. 
limit=15.0 2024-09-17 03:24:36,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=136660.0, ans=0.0 2024-09-17 03:24:39,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=136660.0, ans=0.125 2024-09-17 03:24:45,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=136660.0, ans=0.125 2024-09-17 03:24:52,939 INFO [train.py:1198] (0/2) Epoch 8, batch 2500, loss[loss=0.2729, ctc_loss=0.1823, cr_loss=0.4075, attn_decoder_loss=0.274, over 29643.00 frames. ], tot_loss[loss=0.2749, ctc_loss=0.1872, cr_loss=0.4176, attn_decoder_loss=0.2753, over 5794684.02 frames. ], batch size: 86, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:25:06,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2024-09-17 03:25:28,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=136780.0, ans=0.025 2024-09-17 03:25:32,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=136780.0, ans=0.2 2024-09-17 03:25:50,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136820.0, ans=0.1 2024-09-17 03:25:59,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=136860.0, ans=0.0 2024-09-17 03:26:02,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. 
limit=15.0 2024-09-17 03:26:13,258 INFO [train.py:1198] (0/2) Epoch 8, batch 2550, loss[loss=0.2406, ctc_loss=0.1607, cr_loss=0.3813, attn_decoder_loss=0.241, over 29295.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.187, cr_loss=0.4173, attn_decoder_loss=0.2751, over 5797661.63 frames. ], batch size: 67, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:26:49,205 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.639e+01 1.024e+02 1.084e+02 1.212e+02 4.526e+02, threshold=2.168e+02, percent-clipped=2.0 2024-09-17 03:27:19,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2024-09-17 03:27:28,808 INFO [train.py:1198] (0/2) Epoch 8, batch 2600, loss[loss=0.2625, ctc_loss=0.1712, cr_loss=0.399, attn_decoder_loss=0.2638, over 29457.00 frames. ], tot_loss[loss=0.2753, ctc_loss=0.1875, cr_loss=0.4187, attn_decoder_loss=0.2757, over 5793752.23 frames. ], batch size: 78, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:27:45,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=137140.0, ans=0.2 2024-09-17 03:27:57,475 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:28:01,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=137180.0, ans=0.0 2024-09-17 03:28:20,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=137220.0, ans=0.0 2024-09-17 03:28:32,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137260.0, ans=0.1 2024-09-17 03:28:35,693 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. 
limit=15.0 2024-09-17 03:28:43,790 INFO [train.py:1198] (0/2) Epoch 8, batch 2650, loss[loss=0.2955, ctc_loss=0.208, cr_loss=0.4395, attn_decoder_loss=0.2954, over 29343.00 frames. ], tot_loss[loss=0.2758, ctc_loss=0.1877, cr_loss=0.4197, attn_decoder_loss=0.2762, over 5800782.28 frames. ], batch size: 100, lr: 1.39e-02, grad_scale: 4.0 2024-09-17 03:28:47,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=137300.0, ans=0.0 2024-09-17 03:28:50,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2024-09-17 03:28:55,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=137300.0, ans=0.025 2024-09-17 03:28:56,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=137300.0, ans=0.125 2024-09-17 03:28:58,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=137300.0, ans=0.0 2024-09-17 03:29:23,304 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.237e+01 1.027e+02 1.110e+02 1.218e+02 2.254e+02, threshold=2.220e+02, percent-clipped=2.0 2024-09-17 03:29:25,786 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.26 vs. limit=22.5 2024-09-17 03:29:32,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137420.0, ans=0.1 2024-09-17 03:29:38,809 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.06 vs. 
limit=22.5 2024-09-17 03:30:02,785 INFO [train.py:1198] (0/2) Epoch 8, batch 2700, loss[loss=0.2887, ctc_loss=0.1908, cr_loss=0.4502, attn_decoder_loss=0.2896, over 29538.00 frames. ], tot_loss[loss=0.2764, ctc_loss=0.1884, cr_loss=0.4206, attn_decoder_loss=0.2768, over 5797346.09 frames. ], batch size: 87, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:30:06,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=137500.0, ans=0.025 2024-09-17 03:30:09,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137500.0, ans=0.1 2024-09-17 03:30:13,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137500.0, ans=0.1 2024-09-17 03:30:52,043 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:31:02,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=137660.0, ans=0.0 2024-09-17 03:31:13,661 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2024-09-17 03:31:18,721 INFO [train.py:1198] (0/2) Epoch 8, batch 2750, loss[loss=0.2619, ctc_loss=0.1782, cr_loss=0.4189, attn_decoder_loss=0.2619, over 29524.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1871, cr_loss=0.4181, attn_decoder_loss=0.2752, over 5795411.61 frames. 
], batch size: 75, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:31:31,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=137700.0, ans=0.0 2024-09-17 03:31:37,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=137740.0, ans=0.025 2024-09-17 03:31:53,030 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.22 vs. limit=22.5 2024-09-17 03:31:56,143 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.690e+01 1.009e+02 1.091e+02 1.195e+02 3.553e+02, threshold=2.183e+02, percent-clipped=1.0 2024-09-17 03:32:05,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=137820.0, ans=0.125 2024-09-17 03:32:34,231 INFO [train.py:1198] (0/2) Epoch 8, batch 2800, loss[loss=0.321, ctc_loss=0.2713, cr_loss=0.4452, attn_decoder_loss=0.3166, over 20472.00 frames. ], tot_loss[loss=0.2754, ctc_loss=0.1882, cr_loss=0.4187, attn_decoder_loss=0.2758, over 5777584.66 frames. ], batch size: 209, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:32:45,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=137900.0, ans=0.0 2024-09-17 03:33:00,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137940.0, ans=0.1 2024-09-17 03:33:11,661 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.90 vs. limit=15.0 2024-09-17 03:33:17,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.63 vs. 
limit=15.0 2024-09-17 03:33:34,652 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.79 vs. limit=15.0 2024-09-17 03:33:53,380 INFO [train.py:1198] (0/2) Epoch 8, batch 2850, loss[loss=0.2684, ctc_loss=0.1809, cr_loss=0.4146, attn_decoder_loss=0.2689, over 29516.00 frames. ], tot_loss[loss=0.2764, ctc_loss=0.1893, cr_loss=0.4203, attn_decoder_loss=0.2767, over 5763441.95 frames. ], batch size: 77, lr: 1.39e-02, grad_scale: 4.0 2024-09-17 03:34:16,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=138140.0, ans=0.0 2024-09-17 03:34:34,371 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.049e+01 1.050e+02 1.191e+02 1.407e+02 3.981e+02, threshold=2.382e+02, percent-clipped=5.0 2024-09-17 03:34:51,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=138220.0, ans=6.0 2024-09-17 03:34:55,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=138260.0, ans=0.5 2024-09-17 03:34:57,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.18 vs. limit=10.0 2024-09-17 03:35:06,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=138260.0, ans=0.0 2024-09-17 03:35:09,045 INFO [train.py:1198] (0/2) Epoch 8, batch 2900, loss[loss=0.2657, ctc_loss=0.1709, cr_loss=0.3983, attn_decoder_loss=0.2674, over 29437.00 frames. ], tot_loss[loss=0.2773, ctc_loss=0.1896, cr_loss=0.422, attn_decoder_loss=0.2777, over 5788952.23 frames. 
], batch size: 79, lr: 1.39e-02, grad_scale: 8.0 2024-09-17 03:36:13,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.28 vs. limit=22.5 2024-09-17 03:36:18,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=138460.0, ans=0.125 2024-09-17 03:36:23,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=138500.0, ans=0.125 2024-09-17 03:36:24,465 INFO [train.py:1198] (0/2) Epoch 8, batch 2950, loss[loss=0.2626, ctc_loss=0.1669, cr_loss=0.385, attn_decoder_loss=0.2647, over 29533.00 frames. ], tot_loss[loss=0.2757, ctc_loss=0.1883, cr_loss=0.4196, attn_decoder_loss=0.2761, over 5781903.50 frames. ], batch size: 75, lr: 1.38e-02, grad_scale: 4.0 2024-09-17 03:37:00,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=138580.0, ans=0.0 2024-09-17 03:37:08,774 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.343e+01 1.007e+02 1.102e+02 1.224e+02 2.215e+02, threshold=2.205e+02, percent-clipped=0.0 2024-09-17 03:37:26,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5 2024-09-17 03:37:42,131 INFO [train.py:1198] (0/2) Epoch 8, batch 3000, loss[loss=0.2782, ctc_loss=0.194, cr_loss=0.435, attn_decoder_loss=0.2779, over 29746.00 frames. ], tot_loss[loss=0.2756, ctc_loss=0.1878, cr_loss=0.4192, attn_decoder_loss=0.276, over 5783130.90 frames. ], batch size: 81, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:37:42,132 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 03:38:01,065 INFO [train.py:1230] (0/2) Epoch 8, validation: loss=0.2156, ctc_loss=0.0545, cr_loss=4.305e-15, attn_decoder_loss=0.2335, over 944034.00 frames. 
2024-09-17 03:38:01,066 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 03:38:02,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=138700.0, ans=0.025 2024-09-17 03:38:04,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=138700.0, ans=0.04949747468305833 2024-09-17 03:38:06,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=138700.0, ans=0.125 2024-09-17 03:38:10,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=138700.0, ans=0.125 2024-09-17 03:38:13,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=138700.0, ans=0.0 2024-09-17 03:38:23,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=138740.0, ans=6.0 2024-09-17 03:38:24,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=138740.0, ans=0.1 2024-09-17 03:39:01,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=138860.0, ans=0.125 2024-09-17 03:39:13,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=138860.0, ans=0.07 2024-09-17 03:39:16,488 INFO [train.py:1198] (0/2) Epoch 8, batch 3050, loss[loss=0.2582, ctc_loss=0.1715, cr_loss=0.4009, attn_decoder_loss=0.259, over 29551.00 frames. ], tot_loss[loss=0.276, ctc_loss=0.1882, cr_loss=0.4193, attn_decoder_loss=0.2764, over 5776967.42 frames. 
], batch size: 76, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:39:36,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=138940.0, ans=0.125 2024-09-17 03:39:40,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=138940.0, ans=0.125 2024-09-17 03:39:58,627 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.688e+01 1.026e+02 1.087e+02 1.186e+02 2.791e+02, threshold=2.173e+02, percent-clipped=1.0 2024-09-17 03:40:03,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=139020.0, ans=0.125 2024-09-17 03:40:15,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=139020.0, ans=0.125 2024-09-17 03:40:24,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=139060.0, ans=0.0 2024-09-17 03:40:29,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=139060.0, ans=0.125 2024-09-17 03:40:30,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=139060.0, ans=0.04949747468305833 2024-09-17 03:40:32,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=139100.0, ans=0.09899494936611666 2024-09-17 03:40:33,616 INFO [train.py:1198] (0/2) Epoch 8, batch 3100, loss[loss=0.3015, ctc_loss=0.2094, cr_loss=0.4319, attn_decoder_loss=0.3022, over 29287.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1876, cr_loss=0.4181, attn_decoder_loss=0.2756, over 5776640.30 frames. 
], batch size: 100, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:40:59,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=139140.0, ans=0.09899494936611666 2024-09-17 03:41:08,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139180.0, ans=0.1 2024-09-17 03:41:09,211 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=24.19 vs. limit=22.5 2024-09-17 03:41:33,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=139220.0, ans=0.95 2024-09-17 03:41:45,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139260.0, ans=0.1 2024-09-17 03:41:51,268 INFO [train.py:1198] (0/2) Epoch 8, batch 3150, loss[loss=0.292, ctc_loss=0.202, cr_loss=0.4581, attn_decoder_loss=0.2918, over 28894.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1874, cr_loss=0.418, attn_decoder_loss=0.2754, over 5783598.22 frames. ], batch size: 104, lr: 1.38e-02, grad_scale: 4.0 2024-09-17 03:41:54,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139300.0, ans=0.1 2024-09-17 03:42:17,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=139340.0, ans=0.2 2024-09-17 03:42:29,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=139380.0, ans=0.125 2024-09-17 03:42:29,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.62 vs. 
limit=22.5 2024-09-17 03:42:30,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=139380.0, ans=0.2 2024-09-17 03:42:36,631 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.236e+01 1.018e+02 1.127e+02 1.309e+02 2.778e+02, threshold=2.254e+02, percent-clipped=1.0 2024-09-17 03:42:45,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=139420.0, ans=0.0 2024-09-17 03:42:47,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=139420.0, ans=0.125 2024-09-17 03:42:59,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=139460.0, ans=0.125 2024-09-17 03:43:06,665 INFO [train.py:1198] (0/2) Epoch 8, batch 3200, loss[loss=0.2731, ctc_loss=0.1815, cr_loss=0.4183, attn_decoder_loss=0.274, over 29414.00 frames. ], tot_loss[loss=0.2743, ctc_loss=0.1866, cr_loss=0.4168, attn_decoder_loss=0.2748, over 5794481.96 frames. 
], batch size: 79, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:43:32,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=139540.0, ans=0.0 2024-09-17 03:43:57,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=139620.0, ans=0.125 2024-09-17 03:44:02,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=139620.0, ans=0.07 2024-09-17 03:44:06,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=139620.0, ans=0.0 2024-09-17 03:44:14,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=139660.0, ans=0.2 2024-09-17 03:44:18,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=139660.0, ans=0.2 2024-09-17 03:44:24,504 INFO [train.py:1198] (0/2) Epoch 8, batch 3250, loss[loss=0.284, ctc_loss=0.195, cr_loss=0.4385, attn_decoder_loss=0.2841, over 29712.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1871, cr_loss=0.4182, attn_decoder_loss=0.2755, over 5800033.22 frames. ], batch size: 84, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:44:48,818 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:44:52,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.61 vs. limit=15.0 2024-09-17 03:44:59,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.60 vs. 
limit=10.0 2024-09-17 03:45:08,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=139820.0, ans=0.0 2024-09-17 03:45:09,633 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.457e+01 9.664e+01 1.027e+02 1.100e+02 2.131e+02, threshold=2.054e+02, percent-clipped=0.0 2024-09-17 03:45:11,643 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:45:32,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=139860.0, ans=0.125 2024-09-17 03:45:41,708 INFO [train.py:1198] (0/2) Epoch 8, batch 3300, loss[loss=0.301, ctc_loss=0.2091, cr_loss=0.4643, attn_decoder_loss=0.3009, over 28325.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1862, cr_loss=0.4166, attn_decoder_loss=0.2742, over 5797312.70 frames. ], batch size: 111, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:45:47,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. 
limit=6.0 2024-09-17 03:45:55,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=139940.0, ans=0.0 2024-09-17 03:46:04,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=139940.0, ans=0.125 2024-09-17 03:46:15,291 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:46:39,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=140020.0, ans=0.125 2024-09-17 03:46:51,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=140060.0, ans=0.0 2024-09-17 03:46:57,288 INFO [train.py:1198] (0/2) Epoch 8, batch 3350, loss[loss=0.2967, ctc_loss=0.2077, cr_loss=0.4401, attn_decoder_loss=0.2968, over 29018.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1875, cr_loss=0.4176, attn_decoder_loss=0.2753, over 5776061.67 frames. 
], batch size: 104, lr: 1.38e-02, grad_scale: 4.0 2024-09-17 03:47:04,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=140100.0, ans=0.07 2024-09-17 03:47:21,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=140140.0, ans=0.2 2024-09-17 03:47:22,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.whiten.whitening_limit, batch_count=140140.0, ans=12.0 2024-09-17 03:47:32,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=140180.0, ans=0.5 2024-09-17 03:47:43,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=140220.0, ans=0.125 2024-09-17 03:47:43,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=140220.0, ans=0.125 2024-09-17 03:47:46,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=140220.0, ans=0.04949747468305833 2024-09-17 03:47:47,553 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.416e+01 1.028e+02 1.095e+02 1.236e+02 5.561e+02, threshold=2.191e+02, percent-clipped=3.0 2024-09-17 03:47:54,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140220.0, ans=0.1 2024-09-17 03:47:56,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.95 vs. 
limit=15.0 2024-09-17 03:48:04,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=140260.0, ans=0.1 2024-09-17 03:48:14,970 INFO [train.py:1198] (0/2) Epoch 8, batch 3400, loss[loss=0.2377, ctc_loss=0.1569, cr_loss=0.3521, attn_decoder_loss=0.2388, over 29346.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1871, cr_loss=0.4175, attn_decoder_loss=0.275, over 5768943.02 frames. ], batch size: 67, lr: 1.38e-02, grad_scale: 8.0 2024-09-17 03:48:19,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=140300.0, ans=0.0 2024-09-17 03:48:35,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=140340.0, ans=15.0 2024-09-17 03:49:07,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=140420.0, ans=0.2 2024-09-17 03:49:31,833 INFO [train.py:1198] (0/2) Epoch 8, batch 3450, loss[loss=0.2843, ctc_loss=0.1975, cr_loss=0.413, attn_decoder_loss=0.2847, over 28511.00 frames. ], tot_loss[loss=0.2752, ctc_loss=0.1879, cr_loss=0.4188, attn_decoder_loss=0.2756, over 5777423.82 frames. 
], batch size: 112, lr: 1.38e-02, grad_scale: 4.0 2024-09-17 03:49:51,776 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:49:57,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff3.min_abs, batch_count=140540.0, ans=0.2 2024-09-17 03:50:00,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=140580.0, ans=0.125 2024-09-17 03:50:21,397 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.539e+01 1.005e+02 1.084e+02 1.145e+02 2.009e+02, threshold=2.168e+02, percent-clipped=0.0 2024-09-17 03:50:38,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=140660.0, ans=0.2 2024-09-17 03:50:47,169 INFO [train.py:1198] (0/2) Epoch 8, batch 3500, loss[loss=0.257, ctc_loss=0.1736, cr_loss=0.4132, attn_decoder_loss=0.2571, over 29329.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1875, cr_loss=0.4184, attn_decoder_loss=0.2751, over 5779003.89 frames. 
], batch size: 71, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 03:50:50,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=140700.0, ans=0.125 2024-09-17 03:50:52,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=140700.0, ans=0.0 2024-09-17 03:51:18,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=140780.0, ans=0.125 2024-09-17 03:51:41,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=140820.0, ans=0.1 2024-09-17 03:51:42,064 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=12.0 2024-09-17 03:51:50,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140860.0, ans=0.1 2024-09-17 03:51:53,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=140860.0, ans=0.125 2024-09-17 03:51:58,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=140860.0, ans=0.125 2024-09-17 03:52:01,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=140860.0, ans=0.125 2024-09-17 03:52:03,852 INFO [train.py:1198] (0/2) Epoch 8, batch 3550, loss[loss=0.2885, ctc_loss=0.1972, cr_loss=0.4359, attn_decoder_loss=0.289, over 29712.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.1869, cr_loss=0.4177, attn_decoder_loss=0.2747, over 5784977.58 frames. 
], batch size: 89, lr: 1.37e-02, grad_scale: 4.0 2024-09-17 03:52:12,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140900.0, ans=0.1 2024-09-17 03:52:22,242 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=10.18 vs. limit=15.0 2024-09-17 03:52:23,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=140940.0, ans=0.0 2024-09-17 03:52:23,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=140940.0, ans=0.125 2024-09-17 03:52:23,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=140940.0, ans=0.2 2024-09-17 03:52:23,742 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.72 vs. limit=15.0 2024-09-17 03:52:40,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=140980.0, ans=0.07 2024-09-17 03:52:43,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2024-09-17 03:52:53,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.75 vs. 
limit=15.0 2024-09-17 03:52:53,894 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.324e+01 1.017e+02 1.100e+02 1.203e+02 4.569e+02, threshold=2.200e+02, percent-clipped=1.0 2024-09-17 03:53:01,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=141060.0, ans=0.125 2024-09-17 03:53:02,128 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.94 vs. limit=15.0 2024-09-17 03:53:06,788 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=22.5 2024-09-17 03:53:17,658 INFO [train.py:1198] (0/2) Epoch 8, batch 3600, loss[loss=0.2705, ctc_loss=0.1836, cr_loss=0.4252, attn_decoder_loss=0.2707, over 29471.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1862, cr_loss=0.417, attn_decoder_loss=0.2744, over 5793799.72 frames. 
], batch size: 77, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 03:53:19,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=141100.0, ans=0.0 2024-09-17 03:53:46,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141180.0, ans=0.1 2024-09-17 03:54:02,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=141220.0, ans=0.125 2024-09-17 03:54:09,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141220.0, ans=0.1 2024-09-17 03:54:11,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=141220.0, ans=0.07 2024-09-17 03:54:18,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.69 vs. limit=15.0 2024-09-17 03:54:21,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=141260.0, ans=0.0 2024-09-17 03:54:32,134 INFO [train.py:1198] (0/2) Epoch 8, batch 3650, loss[loss=0.3029, ctc_loss=0.2144, cr_loss=0.4606, attn_decoder_loss=0.3025, over 29531.00 frames. ], tot_loss[loss=0.2733, ctc_loss=0.1853, cr_loss=0.4161, attn_decoder_loss=0.2738, over 5795743.78 frames. 
], batch size: 90, lr: 1.37e-02, grad_scale: 4.0 2024-09-17 03:54:54,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=141340.0, ans=0.0 2024-09-17 03:54:56,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=141340.0, ans=0.125 2024-09-17 03:54:59,385 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:55:07,512 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2024-09-17 03:55:22,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.21 vs. limit=10.0 2024-09-17 03:55:26,037 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.265e+01 1.005e+02 1.060e+02 1.174e+02 2.245e+02, threshold=2.119e+02, percent-clipped=1.0 2024-09-17 03:55:33,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141460.0, ans=0.1 2024-09-17 03:55:47,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=141500.0, ans=0.125 2024-09-17 03:55:48,544 INFO [train.py:1198] (0/2) Epoch 8, batch 3700, loss[loss=0.2778, ctc_loss=0.186, cr_loss=0.4217, attn_decoder_loss=0.2786, over 29720.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.185, cr_loss=0.4158, attn_decoder_loss=0.2737, over 5805537.72 frames. 
], batch size: 84, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 03:55:51,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=141500.0, ans=0.125 2024-09-17 03:56:03,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141540.0, ans=0.1 2024-09-17 03:56:18,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=141580.0, ans=0.125 2024-09-17 03:56:21,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141580.0, ans=0.1 2024-09-17 03:56:22,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=141580.0, ans=0.07 2024-09-17 03:56:40,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=141620.0, ans=0.2 2024-09-17 03:56:58,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=141660.0, ans=0.0 2024-09-17 03:57:02,878 INFO [train.py:1198] (0/2) Epoch 8, batch 3750, loss[loss=0.2463, ctc_loss=0.1655, cr_loss=0.4192, attn_decoder_loss=0.246, over 29315.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1847, cr_loss=0.4163, attn_decoder_loss=0.2735, over 5808361.26 frames. ], batch size: 67, lr: 1.37e-02, grad_scale: 4.0 2024-09-17 03:57:15,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.10 vs. 
limit=15.0 2024-09-17 03:57:30,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=141740.0, ans=0.125 2024-09-17 03:57:36,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.05 vs. limit=15.0 2024-09-17 03:57:55,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=141820.0, ans=0.0 2024-09-17 03:57:56,036 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.35 vs. limit=15.0 2024-09-17 03:57:56,595 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.547e+01 9.777e+01 1.089e+02 1.271e+02 6.127e+02, threshold=2.178e+02, percent-clipped=4.0 2024-09-17 03:58:04,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=141860.0, ans=0.2 2024-09-17 03:58:09,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=141860.0, ans=0.025 2024-09-17 03:58:10,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=141860.0, ans=0.025 2024-09-17 03:58:18,896 INFO [train.py:1198] (0/2) Epoch 8, batch 3800, loss[loss=0.2914, ctc_loss=0.1992, cr_loss=0.4432, attn_decoder_loss=0.2918, over 29645.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1848, cr_loss=0.4156, attn_decoder_loss=0.2737, over 5798818.12 frames. 
], batch size: 86, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 03:58:28,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=141900.0, ans=0.09899494936611666 2024-09-17 03:58:33,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=141940.0, ans=0.0 2024-09-17 03:58:36,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=141940.0, ans=0.05 2024-09-17 03:58:38,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=141940.0, ans=0.0 2024-09-17 03:58:38,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=141940.0, ans=0.2 2024-09-17 03:59:16,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=142060.0, ans=0.0 2024-09-17 03:59:24,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=142060.0, ans=0.2 2024-09-17 03:59:33,130 INFO [train.py:1198] (0/2) Epoch 8, batch 3850, loss[loss=0.2955, ctc_loss=0.1986, cr_loss=0.4216, attn_decoder_loss=0.2969, over 29309.00 frames. ], tot_loss[loss=0.2728, ctc_loss=0.1843, cr_loss=0.4153, attn_decoder_loss=0.2735, over 5812186.26 frames. 
], batch size: 100, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 03:59:33,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142100.0, ans=0.1 2024-09-17 03:59:46,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=142140.0, ans=0.125 2024-09-17 04:00:02,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=142180.0, ans=0.0 2024-09-17 04:00:15,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.20 vs. limit=22.5 2024-09-17 04:00:19,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=142220.0, ans=0.1 2024-09-17 04:00:26,681 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.741e+01 9.765e+01 1.055e+02 1.135e+02 1.958e+02, threshold=2.110e+02, percent-clipped=1.0 2024-09-17 04:00:40,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.53 vs. limit=15.0 2024-09-17 04:00:47,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=142300.0, ans=0.1 2024-09-17 04:00:48,846 INFO [train.py:1198] (0/2) Epoch 8, batch 3900, loss[loss=0.2865, ctc_loss=0.1824, cr_loss=0.4207, attn_decoder_loss=0.2888, over 29630.00 frames. ], tot_loss[loss=0.2734, ctc_loss=0.1846, cr_loss=0.4161, attn_decoder_loss=0.274, over 5816542.36 frames. 
], batch size: 86, lr: 1.37e-02, grad_scale: 8.0 2024-09-17 04:00:50,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=142300.0, ans=0.125 2024-09-17 04:00:52,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=142300.0, ans=0.125 2024-09-17 04:00:55,191 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:01:04,065 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:01:11,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=142340.0, ans=0.2 2024-09-17 04:01:32,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=142420.0, ans=0.0 2024-09-17 04:01:45,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=142420.0, ans=0.2 2024-09-17 04:01:46,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=142460.0, ans=0.125 2024-09-17 04:01:49,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=142460.0, ans=0.0 2024-09-17 04:02:02,699 INFO [train.py:1198] (0/2) Epoch 8, batch 3950, loss[loss=0.2866, ctc_loss=0.1851, cr_loss=0.4135, attn_decoder_loss=0.2887, over 29474.00 frames. ], tot_loss[loss=0.2734, ctc_loss=0.1844, cr_loss=0.4166, attn_decoder_loss=0.274, over 5836000.69 frames. ], batch size: 97, lr: 1.37e-02, grad_scale: 4.0 2024-09-17 04:02:06,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.78 vs. 
limit=15.0 2024-09-17 04:02:12,726 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.57 vs. limit=22.5 2024-09-17 04:02:19,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=142540.0, ans=0.125 2024-09-17 04:02:22,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=142540.0, ans=0.2 2024-09-17 04:02:32,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=142580.0, ans=0.0 2024-09-17 04:02:58,361 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 9.742e+01 1.045e+02 1.185e+02 2.599e+02, threshold=2.090e+02, percent-clipped=1.0 2024-09-17 04:03:13,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0 2024-09-17 04:03:17,017 INFO [train.py:1198] (0/2) Epoch 8, batch 4000, loss[loss=0.2573, ctc_loss=0.1676, cr_loss=0.4027, attn_decoder_loss=0.2583, over 29505.00 frames. ], tot_loss[loss=0.2734, ctc_loss=0.1846, cr_loss=0.4168, attn_decoder_loss=0.274, over 5813588.09 frames. 
], batch size: 74, lr: 1.36e-02, grad_scale: 8.0
2024-09-17 04:03:26,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142700.0, ans=0.1
2024-09-17 04:03:36,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142740.0, ans=0.1
2024-09-17 04:04:04,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=142820.0, ans=0.125
2024-09-17 04:04:09,519 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.52 vs. limit=10.0
2024-09-17 04:04:30,885 INFO [train.py:1198] (0/2) Epoch 8, batch 4050, loss[loss=0.3076, ctc_loss=0.2408, cr_loss=0.4279, attn_decoder_loss=0.3055, over 20128.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1848, cr_loss=0.4165, attn_decoder_loss=0.2741, over 5796498.80 frames. ], batch size: 209, lr: 1.36e-02, grad_scale: 4.0
2024-09-17 04:04:45,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=142940.0, ans=0.1
2024-09-17 04:05:20,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=143020.0, ans=0.125
2024-09-17 04:05:23,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=143020.0, ans=0.125
2024-09-17 04:05:29,425 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.245e+01 1.069e+02 1.234e+02 1.438e+02 3.012e+02, threshold=2.468e+02, percent-clipped=5.0
2024-09-17 04:05:37,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0
2024-09-17 04:05:45,646 INFO [train.py:1198] (0/2) Epoch 8, batch 4100, loss[loss=0.2742, ctc_loss=0.1813, cr_loss=0.423, attn_decoder_loss=0.2751, over 29501.00 frames. ], tot_loss[loss=0.2733, ctc_loss=0.1845, cr_loss=0.4161, attn_decoder_loss=0.2739, over 5791291.76 frames. ], batch size: 90, lr: 1.36e-02, grad_scale: 8.0
2024-09-17 04:05:45,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=143100.0, ans=0.0
2024-09-17 04:05:47,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143100.0, ans=0.1
2024-09-17 04:06:21,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=143180.0, ans=0.05
2024-09-17 04:06:42,513 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.73 vs. limit=22.5
2024-09-17 04:06:48,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.68 vs. limit=12.0
2024-09-17 04:06:59,231 INFO [train.py:1198] (0/2) Epoch 8, batch 4150, loss[loss=0.2736, ctc_loss=0.1861, cr_loss=0.4512, attn_decoder_loss=0.2733, over 29506.00 frames. ], tot_loss[loss=0.2732, ctc_loss=0.1844, cr_loss=0.416, attn_decoder_loss=0.2738, over 5796746.70 frames. ], batch size: 77, lr: 1.36e-02, grad_scale: 4.0
2024-09-17 04:07:15,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=143340.0, ans=0.125
2024-09-17 04:07:38,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=143380.0, ans=0.125
2024-09-17 04:07:39,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.55 vs. limit=15.0
2024-09-17 04:07:39,208 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.32 vs. limit=6.0
2024-09-17 04:07:56,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=143420.0, ans=0.1
2024-09-17 04:07:57,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=143460.0, ans=10.0
2024-09-17 04:07:58,960 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.371e+01 9.814e+01 1.059e+02 1.146e+02 1.859e+02, threshold=2.118e+02, percent-clipped=0.0
2024-09-17 04:08:06,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=143460.0, ans=0.05
2024-09-17 04:08:09,599 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:08:13,622 INFO [train.py:1198] (0/2) Epoch 8, batch 4200, loss[loss=0.2944, ctc_loss=0.2036, cr_loss=0.4266, attn_decoder_loss=0.295, over 29488.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.185, cr_loss=0.4172, attn_decoder_loss=0.2745, over 5799359.61 frames. ], batch size: 90, lr: 1.36e-02, grad_scale: 8.0
2024-09-17 04:08:33,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0
2024-09-17 04:08:47,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=143580.0, ans=0.025
2024-09-17 04:09:06,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=143620.0, ans=0.1
2024-09-17 04:09:27,761 INFO [train.py:1198] (0/2) Epoch 8, batch 4250, loss[loss=0.2429, ctc_loss=0.1583, cr_loss=0.3585, attn_decoder_loss=0.2444, over 29534.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1847, cr_loss=0.4162, attn_decoder_loss=0.2745, over 5805565.48 frames. ], batch size: 74, lr: 1.36e-02, grad_scale: 4.0
2024-09-17 04:09:38,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143700.0, ans=0.1
2024-09-17 04:09:39,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=143700.0, ans=0.1
2024-09-17 04:09:48,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=143740.0, ans=0.5
2024-09-17 04:09:56,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.59 vs. limit=10.0
2024-09-17 04:10:25,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=143860.0, ans=0.035
2024-09-17 04:10:27,815 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.334e+01 1.014e+02 1.108e+02 1.214e+02 2.997e+02, threshold=2.217e+02, percent-clipped=4.0
2024-09-17 04:10:31,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0
2024-09-17 04:10:32,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=143860.0, ans=0.0
2024-09-17 04:10:36,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143860.0, ans=0.1
2024-09-17 04:10:40,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.25 vs. limit=22.5
2024-09-17 04:10:41,039 INFO [train.py:1198] (0/2) Epoch 8, batch 4300, loss[loss=0.2809, ctc_loss=0.1857, cr_loss=0.4374, attn_decoder_loss=0.2818, over 29505.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1845, cr_loss=0.4156, attn_decoder_loss=0.2744, over 5794779.35 frames. ], batch size: 87, lr: 1.36e-02, grad_scale: 8.0
2024-09-17 04:10:43,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.08 vs. limit=22.5
2024-09-17 04:10:44,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=143900.0, ans=0.125
2024-09-17 04:10:47,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=143900.0, ans=0.0
2024-09-17 04:11:02,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143940.0, ans=0.1
2024-09-17 04:11:17,807 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-36000.pt
2024-09-17 04:11:36,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=144020.0, ans=0.0
2024-09-17 04:11:48,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=144060.0, ans=0.0
2024-09-17 04:11:49,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=144060.0, ans=0.0
2024-09-17 04:11:49,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=144060.0, ans=0.125
2024-09-17 04:11:52,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=144060.0, ans=0.04949747468305833
2024-09-17 04:11:54,633 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.09 vs. limit=15.0
2024-09-17 04:12:02,548 INFO [train.py:1198] (0/2) Epoch 8, batch 4350, loss[loss=0.2984, ctc_loss=0.2102, cr_loss=0.4612, attn_decoder_loss=0.2979, over 29478.00 frames. ], tot_loss[loss=0.2774, ctc_loss=0.1876, cr_loss=0.4211, attn_decoder_loss=0.2781, over 5796484.78 frames. ], batch size: 97, lr: 1.36e-02, grad_scale: 8.0
2024-09-17 04:12:26,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=144140.0, ans=0.1
2024-09-17 04:12:46,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=144220.0, ans=0.2
2024-09-17 04:12:50,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=144220.0, ans=0.125
2024-09-17 04:13:03,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=144260.0, ans=0.2
2024-09-17 04:13:04,677 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.658e+01 1.032e+02 1.110e+02 1.170e+02 3.272e+02, threshold=2.221e+02, percent-clipped=1.0
2024-09-17 04:13:09,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=144260.0, ans=0.0
2024-09-17 04:13:16,186 INFO [train.py:1198] (0/2) Epoch 8, batch 4400, loss[loss=0.29, ctc_loss=0.2089, cr_loss=0.4593, attn_decoder_loss=0.2888, over 27229.00 frames. ], tot_loss[loss=0.28, ctc_loss=0.1902, cr_loss=0.424, attn_decoder_loss=0.2806, over 5767387.90 frames. ], batch size: 124, lr: 1.36e-02, grad_scale: 8.0
2024-09-17 04:13:17,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5
2024-09-17 04:13:38,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=144340.0, ans=0.035
2024-09-17 04:13:47,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=144380.0, ans=0.0
2024-09-17 04:14:00,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=144420.0, ans=0.125
2024-09-17 04:14:18,933 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.25 vs. limit=10.0
2024-09-17 04:14:21,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=144460.0, ans=0.1
2024-09-17 04:14:29,767 INFO [train.py:1198] (0/2) Epoch 8, batch 4450, loss[loss=0.3193, ctc_loss=0.2617, cr_loss=0.4495, attn_decoder_loss=0.3157, over 19898.00 frames. ], tot_loss[loss=0.2836, ctc_loss=0.1963, cr_loss=0.4276, attn_decoder_loss=0.2838, over 5580029.42 frames. ], batch size: 209, lr: 1.36e-02, grad_scale: 4.0
2024-09-17 04:14:30,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=144500.0, ans=0.0
2024-09-17 04:14:48,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=144540.0, ans=0.0
2024-09-17 04:14:58,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=144580.0, ans=0.125
2024-09-17 04:15:11,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=144580.0, ans=0.0
2024-09-17 04:15:21,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=144620.0, ans=0.125
2024-09-17 04:15:32,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=144660.0, ans=0.125
2024-09-17 04:15:34,821 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.417e+01 1.090e+02 1.182e+02 1.322e+02 3.138e+02, threshold=2.364e+02, percent-clipped=1.0
2024-09-17 04:15:45,077 INFO [train.py:1198] (0/2) Epoch 8, batch 4500, loss[loss=0.3112, ctc_loss=0.2385, cr_loss=0.4635, attn_decoder_loss=0.3089, over 20228.00 frames. ], tot_loss[loss=0.2873, ctc_loss=0.2036, cr_loss=0.4288, attn_decoder_loss=0.2871, over 5235560.03 frames. ], batch size: 209, lr: 1.36e-02, grad_scale: 8.0
2024-09-17 04:15:53,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=144700.0, ans=22.5
2024-09-17 04:16:01,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=144740.0, ans=0.2
2024-09-17 04:16:10,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=144740.0, ans=0.125
2024-09-17 04:16:22,247 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-8.pt
2024-09-17 04:17:10,883 INFO [train.py:1198] (0/2) Epoch 9, batch 0, loss[loss=0.2639, ctc_loss=0.1725, cr_loss=0.383, attn_decoder_loss=0.2655, over 29618.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1725, cr_loss=0.383, attn_decoder_loss=0.2655, over 29618.00 frames. ], batch size: 73, lr: 1.28e-02, grad_scale: 8.0
2024-09-17 04:17:10,884 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 04:17:29,062 INFO [train.py:1230] (0/2) Epoch 9, validation: loss=0.2184, ctc_loss=0.05457, cr_loss=4.594e-15, attn_decoder_loss=0.2366, over 944034.00 frames.
2024-09-17 04:17:29,062 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-17 04:17:58,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=144840.0, ans=0.125
2024-09-17 04:18:05,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=15.0
2024-09-17 04:18:09,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=144880.0, ans=0.2
2024-09-17 04:18:16,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.64 vs. limit=22.5
2024-09-17 04:18:26,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.31 vs. limit=15.0
2024-09-17 04:18:27,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=144920.0, ans=0.025
2024-09-17 04:18:44,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=144960.0, ans=0.125
2024-09-17 04:18:48,438 INFO [train.py:1198] (0/2) Epoch 9, batch 50, loss[loss=0.2455, ctc_loss=0.1629, cr_loss=0.3747, attn_decoder_loss=0.2463, over 29400.00 frames. ], tot_loss[loss=0.2757, ctc_loss=0.1899, cr_loss=0.421, attn_decoder_loss=0.2759, over 1267956.52 frames. ], batch size: 70, lr: 1.28e-02, grad_scale: 4.0
2024-09-17 04:18:59,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=145000.0, ans=0.125
2024-09-17 04:19:03,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=145040.0, ans=0.0
2024-09-17 04:19:16,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=145040.0, ans=0.025
2024-09-17 04:19:18,666 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.315e+01 1.028e+02 1.122e+02 1.290e+02 1.269e+03, threshold=2.245e+02, percent-clipped=1.0
2024-09-17 04:19:27,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=145080.0, ans=15.0
2024-09-17 04:19:52,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=145160.0, ans=0.025
2024-09-17 04:20:04,125 INFO [train.py:1198] (0/2) Epoch 9, batch 100, loss[loss=0.2629, ctc_loss=0.1817, cr_loss=0.4226, attn_decoder_loss=0.2625, over 29549.00 frames. ], tot_loss[loss=0.2768, ctc_loss=0.1893, cr_loss=0.4208, attn_decoder_loss=0.2772, over 2250943.85 frames. ], batch size: 76, lr: 1.28e-02, grad_scale: 8.0
2024-09-17 04:20:07,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=145200.0, ans=0.5
2024-09-17 04:20:18,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=145240.0, ans=0.125
2024-09-17 04:20:43,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=145280.0, ans=0.2
2024-09-17 04:20:50,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0
2024-09-17 04:20:52,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=145320.0, ans=0.125
2024-09-17 04:21:10,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145360.0, ans=0.1
2024-09-17 04:21:10,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=145360.0, ans=0.125
2024-09-17 04:21:12,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=145360.0, ans=0.09899494936611666
2024-09-17 04:21:18,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=145400.0, ans=0.125
2024-09-17 04:21:19,399 INFO [train.py:1198] (0/2) Epoch 9, batch 150, loss[loss=0.2448, ctc_loss=0.1609, cr_loss=0.4009, attn_decoder_loss=0.2452, over 29442.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.186, cr_loss=0.4163, attn_decoder_loss=0.2747, over 3045710.84 frames. ], batch size: 70, lr: 1.28e-02, grad_scale: 4.0
2024-09-17 04:21:19,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=145400.0, ans=0.0
2024-09-17 04:21:26,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=145400.0, ans=0.05
2024-09-17 04:21:52,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0
2024-09-17 04:21:54,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=145480.0, ans=0.0
2024-09-17 04:21:55,712 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.472e+01 1.015e+02 1.087e+02 1.260e+02 1.994e+02, threshold=2.174e+02, percent-clipped=0.0
2024-09-17 04:22:18,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=145520.0, ans=0.025
2024-09-17 04:22:29,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=145560.0, ans=0.0
2024-09-17 04:22:30,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=145560.0, ans=0.125
2024-09-17 04:22:38,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=145600.0, ans=0.125
2024-09-17 04:22:39,823 INFO [train.py:1198] (0/2) Epoch 9, batch 200, loss[loss=0.2852, ctc_loss=0.2, cr_loss=0.4401, attn_decoder_loss=0.2849, over 27244.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1842, cr_loss=0.4155, attn_decoder_loss=0.2735, over 3659186.65 frames. ], batch size: 124, lr: 1.28e-02, grad_scale: 8.0
2024-09-17 04:22:49,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.55 vs. limit=22.5
2024-09-17 04:22:55,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=145640.0, ans=0.95
2024-09-17 04:22:56,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=145640.0, ans=0.0
2024-09-17 04:22:59,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=145640.0, ans=0.2
2024-09-17 04:23:11,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=145680.0, ans=0.025
2024-09-17 04:23:13,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=145680.0, ans=0.0
2024-09-17 04:23:13,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=145680.0, ans=0.025
2024-09-17 04:23:13,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145680.0, ans=0.1
2024-09-17 04:23:46,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.06 vs. limit=10.0
2024-09-17 04:23:55,899 INFO [train.py:1198] (0/2) Epoch 9, batch 250, loss[loss=0.2678, ctc_loss=0.171, cr_loss=0.3872, attn_decoder_loss=0.2699, over 29257.00 frames. ], tot_loss[loss=0.2723, ctc_loss=0.1829, cr_loss=0.4144, attn_decoder_loss=0.273, over 4141703.28 frames. ], batch size: 100, lr: 1.28e-02, grad_scale: 4.0
2024-09-17 04:24:25,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.56 vs. limit=15.0
2024-09-17 04:24:28,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.48 vs. limit=15.0
2024-09-17 04:24:29,185 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.608e+01 1.032e+02 1.129e+02 1.433e+02, threshold=2.064e+02, percent-clipped=0.0
2024-09-17 04:24:49,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=145920.0, ans=0.1
2024-09-17 04:24:55,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=145960.0, ans=0.125
2024-09-17 04:25:01,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0
2024-09-17 04:25:02,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=145960.0, ans=0.1
2024-09-17 04:25:11,661 INFO [train.py:1198] (0/2) Epoch 9, batch 300, loss[loss=0.2735, ctc_loss=0.1784, cr_loss=0.4157, attn_decoder_loss=0.2748, over 29528.00 frames. ], tot_loss[loss=0.272, ctc_loss=0.1823, cr_loss=0.4146, attn_decoder_loss=0.2727, over 4509035.52 frames. ], batch size: 92, lr: 1.28e-02, grad_scale: 8.0
2024-09-17 04:25:33,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=146040.0, ans=0.025
2024-09-17 04:25:56,722 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=12.0
2024-09-17 04:26:00,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=146120.0, ans=0.125
2024-09-17 04:26:02,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=146120.0, ans=0.125
2024-09-17 04:26:06,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=146120.0, ans=0.0
2024-09-17 04:26:18,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=146160.0, ans=0.125
2024-09-17 04:26:24,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=5.08 vs. limit=12.0
2024-09-17 04:26:32,383 INFO [train.py:1198] (0/2) Epoch 9, batch 350, loss[loss=0.243, ctc_loss=0.1578, cr_loss=0.3792, attn_decoder_loss=0.244, over 29311.00 frames. ], tot_loss[loss=0.2727, ctc_loss=0.183, cr_loss=0.4152, attn_decoder_loss=0.2734, over 4793451.15 frames. ], batch size: 71, lr: 1.28e-02, grad_scale: 4.0
2024-09-17 04:26:41,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=146200.0, ans=0.0
2024-09-17 04:26:43,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=146200.0, ans=0.2
2024-09-17 04:27:06,992 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.424e+01 9.475e+01 1.008e+02 1.084e+02 2.956e+02, threshold=2.017e+02, percent-clipped=2.0
2024-09-17 04:27:09,000 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:27:10,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=146280.0, ans=0.0
2024-09-17 04:27:24,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=146320.0, ans=0.125
2024-09-17 04:27:39,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=146360.0, ans=0.1
2024-09-17 04:27:47,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=146400.0, ans=0.0
2024-09-17 04:27:48,634 INFO [train.py:1198] (0/2) Epoch 9, batch 400, loss[loss=0.2786, ctc_loss=0.1824, cr_loss=0.4261, attn_decoder_loss=0.2798, over 29687.00 frames. ], tot_loss[loss=0.2724, ctc_loss=0.1825, cr_loss=0.4146, attn_decoder_loss=0.2731, over 5023530.68 frames. ], batch size: 82, lr: 1.28e-02, grad_scale: 8.0
2024-09-17 04:27:57,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.56 vs. limit=12.0
2024-09-17 04:27:59,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=146400.0, ans=0.0
2024-09-17 04:28:19,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=146480.0, ans=0.125
2024-09-17 04:28:27,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=146480.0, ans=0.0
2024-09-17 04:28:39,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=146520.0, ans=0.025
2024-09-17 04:28:59,216 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:29:04,922 INFO [train.py:1198] (0/2) Epoch 9, batch 450, loss[loss=0.2869, ctc_loss=0.1949, cr_loss=0.4252, attn_decoder_loss=0.2877, over 29687.00 frames. ], tot_loss[loss=0.2727, ctc_loss=0.1829, cr_loss=0.4154, attn_decoder_loss=0.2735, over 5184933.77 frames. ], batch size: 83, lr: 1.28e-02, grad_scale: 4.0
2024-09-17 04:29:16,652 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0
2024-09-17 04:29:46,241 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.248e+01 9.637e+01 1.024e+02 1.129e+02 3.219e+02, threshold=2.049e+02, percent-clipped=1.0
2024-09-17 04:30:21,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=146760.0, ans=0.125
2024-09-17 04:30:22,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0
2024-09-17 04:30:25,861 INFO [train.py:1198] (0/2) Epoch 9, batch 500, loss[loss=0.3045, ctc_loss=0.2079, cr_loss=0.4593, attn_decoder_loss=0.305, over 29441.00 frames. ], tot_loss[loss=0.2713, ctc_loss=0.1815, cr_loss=0.4135, attn_decoder_loss=0.2721, over 5328409.84 frames. ], batch size: 94, lr: 1.27e-02, grad_scale: 8.0
2024-09-17 04:30:29,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=146800.0, ans=0.025
2024-09-17 04:30:30,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=146800.0, ans=0.125
2024-09-17 04:31:10,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=146920.0, ans=0.125
2024-09-17 04:31:41,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0
2024-09-17 04:31:42,468 INFO [train.py:1198] (0/2) Epoch 9, batch 550, loss[loss=0.2875, ctc_loss=0.1986, cr_loss=0.4442, attn_decoder_loss=0.2876, over 28888.00 frames. ], tot_loss[loss=0.2714, ctc_loss=0.1817, cr_loss=0.4139, attn_decoder_loss=0.2721, over 5421415.62 frames. ], batch size: 104, lr: 1.27e-02, grad_scale: 8.0
2024-09-17 04:32:01,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=147040.0, ans=0.125
2024-09-17 04:32:19,195 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.353e+01 9.413e+01 1.021e+02 1.124e+02 5.702e+02, threshold=2.041e+02, percent-clipped=1.0
2024-09-17 04:32:22,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=147080.0, ans=0.125
2024-09-17 04:32:59,451 INFO [train.py:1198] (0/2) Epoch 9, batch 600, loss[loss=0.2716, ctc_loss=0.1782, cr_loss=0.4017, attn_decoder_loss=0.273, over 29220.00 frames. ], tot_loss[loss=0.2715, ctc_loss=0.1818, cr_loss=0.4146, attn_decoder_loss=0.2722, over 5507517.01 frames. ], batch size: 100, lr: 1.27e-02, grad_scale: 8.0
2024-09-17 04:33:13,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=147240.0, ans=0.2
2024-09-17 04:33:17,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.44 vs. limit=10.0
2024-09-17 04:33:50,221 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.87 vs. limit=22.5
2024-09-17 04:33:57,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=147320.0, ans=0.1
2024-09-17 04:33:59,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.41 vs. limit=22.5
2024-09-17 04:34:00,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=147320.0, ans=0.2
2024-09-17 04:34:00,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=147320.0, ans=0.125
2024-09-17 04:34:20,290 INFO [train.py:1198] (0/2) Epoch 9, batch 650, loss[loss=0.2656, ctc_loss=0.1788, cr_loss=0.4047, attn_decoder_loss=0.2663, over 29792.00 frames. ], tot_loss[loss=0.2703, ctc_loss=0.1803, cr_loss=0.4122, attn_decoder_loss=0.2711, over 5585769.73 frames. ], batch size: 81, lr: 1.27e-02, grad_scale: 4.0
2024-09-17 04:34:29,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=147400.0, ans=0.125
2024-09-17 04:34:45,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0
2024-09-17 04:34:51,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=147480.0, ans=0.125
2024-09-17 04:34:58,647 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.437e+01 9.683e+01 1.026e+02 1.151e+02 1.521e+02, threshold=2.052e+02, percent-clipped=0.0
2024-09-17 04:35:09,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=147520.0, ans=0.125
2024-09-17 04:35:23,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=147560.0, ans=0.0
2024-09-17 04:35:36,634 INFO [train.py:1198] (0/2) Epoch 9, batch 700, loss[loss=0.258, ctc_loss=0.1706, cr_loss=0.3956, attn_decoder_loss=0.2589, over 29546.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1809, cr_loss=0.4133, attn_decoder_loss=0.2721, over 5635767.17 frames. ], batch size: 76, lr: 1.27e-02, grad_scale: 8.0
2024-09-17 04:35:50,553 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:35:54,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=147640.0, ans=0.2
2024-09-17 04:35:58,021 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 04:35:58,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0
2024-09-17 04:36:12,532 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.66 vs. limit=15.0
2024-09-17 04:36:13,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=147680.0, ans=0.125
2024-09-17 04:36:52,810 INFO [train.py:1198] (0/2) Epoch 9, batch 750, loss[loss=0.2825, ctc_loss=0.1841, cr_loss=0.4199, attn_decoder_loss=0.2842, over 29704.00 frames. ], tot_loss[loss=0.2707, ctc_loss=0.1805, cr_loss=0.4126, attn_decoder_loss=0.2715, over 5676897.43 frames. ], batch size: 82, lr: 1.27e-02, grad_scale: 8.0
2024-09-17 04:36:57,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=147800.0, ans=0.125
2024-09-17 04:37:37,020 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.147e+01 9.690e+01 1.045e+02 1.120e+02 4.390e+02, threshold=2.090e+02, percent-clipped=1.0
2024-09-17 04:37:57,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=147960.0, ans=0.2
2024-09-17 04:38:10,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=147960.0, ans=0.125
2024-09-17 04:38:13,961 INFO [train.py:1198] (0/2) Epoch 9, batch 800, loss[loss=0.2383, ctc_loss=0.1494, cr_loss=0.3591, attn_decoder_loss=0.2402, over 29594.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1804, cr_loss=0.4122, attn_decoder_loss=0.2715, over 5706087.84 frames. ], batch size: 73, lr: 1.27e-02, grad_scale: 8.0
2024-09-17 04:38:18,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=148000.0, ans=0.125
2024-09-17 04:38:24,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=148000.0, ans=0.0
2024-09-17 04:38:24,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=148000.0, ans=0.025
2024-09-17 04:38:29,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=148040.0, ans=0.0
2024-09-17 04:39:29,746 INFO [train.py:1198] (0/2) Epoch 9, batch 850, loss[loss=0.2799, ctc_loss=0.1843, cr_loss=0.4181, attn_decoder_loss=0.2812, over 29717.00 frames. ], tot_loss[loss=0.2701, ctc_loss=0.1798, cr_loss=0.4121, attn_decoder_loss=0.271, over 5736119.88 frames.
], batch size: 89, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:39:43,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=148240.0, ans=0.0 2024-09-17 04:39:55,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=148240.0, ans=0.125 2024-09-17 04:40:01,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=148280.0, ans=0.2 2024-09-17 04:40:06,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.62 vs. limit=15.0 2024-09-17 04:40:10,338 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.251e+01 9.624e+01 1.050e+02 1.134e+02 2.702e+02, threshold=2.101e+02, percent-clipped=1.0 2024-09-17 04:40:15,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2024-09-17 04:40:26,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=148320.0, ans=0.09899494936611666 2024-09-17 04:40:42,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=148360.0, ans=0.125 2024-09-17 04:40:45,743 INFO [train.py:1198] (0/2) Epoch 9, batch 900, loss[loss=0.2459, ctc_loss=0.1626, cr_loss=0.3693, attn_decoder_loss=0.247, over 29614.00 frames. ], tot_loss[loss=0.2704, ctc_loss=0.1801, cr_loss=0.4124, attn_decoder_loss=0.2712, over 5740208.50 frames. 
], batch size: 73, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:40:54,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=148400.0, ans=0.2 2024-09-17 04:40:55,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.98 vs. limit=22.5 2024-09-17 04:41:01,842 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.02 vs. limit=10.0 2024-09-17 04:41:04,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=148440.0, ans=0.125 2024-09-17 04:41:05,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=148440.0, ans=0.1 2024-09-17 04:41:16,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=148480.0, ans=0.1 2024-09-17 04:41:18,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=148480.0, ans=0.0 2024-09-17 04:41:21,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=148480.0, ans=0.025 2024-09-17 04:41:31,863 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.25 vs. 
limit=10.0 2024-09-17 04:41:38,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=148520.0, ans=0.2 2024-09-17 04:41:39,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=148520.0, ans=0.125 2024-09-17 04:41:40,147 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0 2024-09-17 04:42:06,663 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0 2024-09-17 04:42:06,884 INFO [train.py:1198] (0/2) Epoch 9, batch 950, loss[loss=0.2417, ctc_loss=0.153, cr_loss=0.3782, attn_decoder_loss=0.2432, over 29523.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1803, cr_loss=0.4123, attn_decoder_loss=0.2715, over 5741593.40 frames. ], batch size: 74, lr: 1.27e-02, grad_scale: 4.0 2024-09-17 04:42:40,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=148680.0, ans=0.125 2024-09-17 04:42:41,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=148680.0, ans=0.125 2024-09-17 04:42:49,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.631e+01 1.018e+02 1.126e+02 1.313e+02 4.383e+02, threshold=2.253e+02, percent-clipped=5.0 2024-09-17 04:42:53,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.74 vs. 
limit=22.5 2024-09-17 04:43:08,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=148760.0, ans=0.125 2024-09-17 04:43:11,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2024-09-17 04:43:16,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=148760.0, ans=0.0 2024-09-17 04:43:23,704 INFO [train.py:1198] (0/2) Epoch 9, batch 1000, loss[loss=0.2564, ctc_loss=0.1691, cr_loss=0.4221, attn_decoder_loss=0.2567, over 29528.00 frames. ], tot_loss[loss=0.2714, ctc_loss=0.1814, cr_loss=0.4139, attn_decoder_loss=0.2722, over 5736856.12 frames. ], batch size: 77, lr: 1.27e-02, grad_scale: 8.0 2024-09-17 04:43:31,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=148800.0, ans=0.2 2024-09-17 04:43:47,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=148840.0, ans=0.0 2024-09-17 04:44:01,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=148880.0, ans=0.07 2024-09-17 04:44:14,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=148920.0, ans=0.125 2024-09-17 04:44:30,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=148960.0, ans=0.125 2024-09-17 04:44:31,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.98 vs. 
limit=15.0 2024-09-17 04:44:32,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=148960.0, ans=0.1 2024-09-17 04:44:32,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=148960.0, ans=0.025 2024-09-17 04:44:32,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=148960.0, ans=0.125 2024-09-17 04:44:38,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=149000.0, ans=0.2 2024-09-17 04:44:39,605 INFO [train.py:1198] (0/2) Epoch 9, batch 1050, loss[loss=0.2709, ctc_loss=0.1822, cr_loss=0.4033, attn_decoder_loss=0.2718, over 29691.00 frames. ], tot_loss[loss=0.2705, ctc_loss=0.1806, cr_loss=0.4132, attn_decoder_loss=0.2713, over 5744111.30 frames. ], batch size: 85, lr: 1.27e-02, grad_scale: 4.0 2024-09-17 04:44:53,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=149040.0, ans=0.125 2024-09-17 04:45:04,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=149040.0, ans=0.125 2024-09-17 04:45:06,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=149040.0, ans=0.125 2024-09-17 04:45:26,426 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 9.706e+01 1.051e+02 1.142e+02 2.250e+02, threshold=2.101e+02, percent-clipped=0.0 2024-09-17 04:45:27,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.08 vs. 
limit=10.0 2024-09-17 04:46:00,313 INFO [train.py:1198] (0/2) Epoch 9, batch 1100, loss[loss=0.2686, ctc_loss=0.173, cr_loss=0.4182, attn_decoder_loss=0.2699, over 29449.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1806, cr_loss=0.4132, attn_decoder_loss=0.2714, over 5755979.91 frames. ], batch size: 78, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:46:32,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=149280.0, ans=0.125 2024-09-17 04:46:41,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=149280.0, ans=0.125 2024-09-17 04:46:46,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=149320.0, ans=0.0 2024-09-17 04:46:54,324 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.28 vs. limit=10.0 2024-09-17 04:46:54,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=149320.0, ans=0.125 2024-09-17 04:47:08,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=149360.0, ans=0.0 2024-09-17 04:47:08,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=149360.0, ans=0.025 2024-09-17 04:47:14,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=149400.0, ans=0.0 2024-09-17 04:47:16,059 INFO [train.py:1198] (0/2) Epoch 9, batch 1150, loss[loss=0.2712, ctc_loss=0.1801, cr_loss=0.4413, attn_decoder_loss=0.2715, over 29474.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1805, cr_loss=0.4132, attn_decoder_loss=0.2714, over 5753695.31 frames. 
], batch size: 78, lr: 1.26e-02, grad_scale: 4.0 2024-09-17 04:47:23,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.32 vs. limit=22.5 2024-09-17 04:47:33,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.28 vs. limit=22.5 2024-09-17 04:47:36,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=149440.0, ans=0.07 2024-09-17 04:47:44,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=149440.0, ans=0.125 2024-09-17 04:48:01,923 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.308e+01 9.807e+01 1.085e+02 1.342e+02 2.441e+02, threshold=2.171e+02, percent-clipped=4.0 2024-09-17 04:48:06,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=149520.0, ans=0.2 2024-09-17 04:48:07,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.34 vs. limit=22.5 2024-09-17 04:48:33,333 INFO [train.py:1198] (0/2) Epoch 9, batch 1200, loss[loss=0.273, ctc_loss=0.1788, cr_loss=0.3918, attn_decoder_loss=0.2748, over 29679.00 frames. ], tot_loss[loss=0.2717, ctc_loss=0.1815, cr_loss=0.4143, attn_decoder_loss=0.2725, over 5747368.80 frames. 
], batch size: 85, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:49:11,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=149680.0, ans=0.125 2024-09-17 04:49:11,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=149680.0, ans=0.125 2024-09-17 04:49:14,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=149680.0, ans=0.125 2024-09-17 04:49:17,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=149680.0, ans=0.04949747468305833 2024-09-17 04:49:19,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=149720.0, ans=0.0 2024-09-17 04:49:35,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=12.0 2024-09-17 04:49:38,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=149760.0, ans=0.5 2024-09-17 04:49:43,046 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2024-09-17 04:49:50,701 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.88 vs. limit=22.5 2024-09-17 04:49:50,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2024-09-17 04:49:52,898 INFO [train.py:1198] (0/2) Epoch 9, batch 1250, loss[loss=0.2973, ctc_loss=0.2076, cr_loss=0.4638, attn_decoder_loss=0.297, over 29531.00 frames. 
], tot_loss[loss=0.2723, ctc_loss=0.1819, cr_loss=0.4154, attn_decoder_loss=0.2731, over 5774468.10 frames. ], batch size: 92, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:49:56,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=149800.0, ans=0.125 2024-09-17 04:49:57,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=149800.0, ans=0.125 2024-09-17 04:50:13,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2024-09-17 04:50:31,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=149880.0, ans=0.125 2024-09-17 04:50:34,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=149880.0, ans=0.125 2024-09-17 04:50:38,399 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.042e+01 9.591e+01 1.046e+02 1.160e+02 1.832e+02, threshold=2.092e+02, percent-clipped=0.0 2024-09-17 04:51:08,756 INFO [train.py:1198] (0/2) Epoch 9, batch 1300, loss[loss=0.2859, ctc_loss=0.1925, cr_loss=0.4262, attn_decoder_loss=0.2868, over 28170.00 frames. ], tot_loss[loss=0.2718, ctc_loss=0.1814, cr_loss=0.4147, attn_decoder_loss=0.2726, over 5778849.66 frames. 
], batch size: 111, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:51:19,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=150000.0, ans=0.125 2024-09-17 04:51:25,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=150040.0, ans=0.125 2024-09-17 04:51:37,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150080.0, ans=0.1 2024-09-17 04:51:50,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=150080.0, ans=0.125 2024-09-17 04:52:06,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=150120.0, ans=0.02 2024-09-17 04:52:20,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=150160.0, ans=0.05 2024-09-17 04:52:21,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=150160.0, ans=0.2 2024-09-17 04:52:24,373 INFO [train.py:1198] (0/2) Epoch 9, batch 1350, loss[loss=0.2718, ctc_loss=0.1777, cr_loss=0.418, attn_decoder_loss=0.273, over 29757.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1806, cr_loss=0.414, attn_decoder_loss=0.2721, over 5795996.35 frames. ], batch size: 81, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:52:27,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=150200.0, ans=0.0 2024-09-17 04:52:34,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.38 vs. 
limit=22.5 2024-09-17 04:52:41,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=150240.0, ans=0.0 2024-09-17 04:52:50,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=150240.0, ans=0.125 2024-09-17 04:52:55,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-09-17 04:53:01,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=150280.0, ans=0.0 2024-09-17 04:53:02,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.50 vs. limit=10.0 2024-09-17 04:53:11,520 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.083e+01 9.658e+01 1.049e+02 1.137e+02 1.500e+02, threshold=2.097e+02, percent-clipped=0.0 2024-09-17 04:53:32,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0 2024-09-17 04:53:44,222 INFO [train.py:1198] (0/2) Epoch 9, batch 1400, loss[loss=0.2357, ctc_loss=0.1528, cr_loss=0.3783, attn_decoder_loss=0.2365, over 29605.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1802, cr_loss=0.4135, attn_decoder_loss=0.2719, over 5807450.23 frames. ], batch size: 69, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:54:11,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.16 vs. 
limit=15.0 2024-09-17 04:54:30,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=150520.0, ans=0.125 2024-09-17 04:54:36,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=150520.0, ans=0.025 2024-09-17 04:54:41,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=150520.0, ans=0.0 2024-09-17 04:54:43,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=150560.0, ans=10.0 2024-09-17 04:54:53,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=150560.0, ans=0.125 2024-09-17 04:54:55,746 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2024-09-17 04:54:59,196 INFO [train.py:1198] (0/2) Epoch 9, batch 1450, loss[loss=0.2809, ctc_loss=0.1865, cr_loss=0.4229, attn_decoder_loss=0.282, over 29410.00 frames. ], tot_loss[loss=0.2714, ctc_loss=0.1804, cr_loss=0.4138, attn_decoder_loss=0.2724, over 5803808.47 frames. 
], batch size: 94, lr: 1.26e-02, grad_scale: 4.0 2024-09-17 04:55:16,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=150640.0, ans=0.125 2024-09-17 04:55:16,125 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:55:23,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=150640.0, ans=0.2 2024-09-17 04:55:30,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=150680.0, ans=0.125 2024-09-17 04:55:45,592 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.347e+01 1.006e+02 1.117e+02 1.243e+02 2.760e+02, threshold=2.234e+02, percent-clipped=2.0 2024-09-17 04:55:56,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150720.0, ans=0.1 2024-09-17 04:56:13,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=150800.0, ans=0.125 2024-09-17 04:56:14,397 INFO [train.py:1198] (0/2) Epoch 9, batch 1500, loss[loss=0.2788, ctc_loss=0.1794, cr_loss=0.4253, attn_decoder_loss=0.2804, over 29638.00 frames. ], tot_loss[loss=0.2718, ctc_loss=0.1806, cr_loss=0.4145, attn_decoder_loss=0.2727, over 5805081.65 frames. ], batch size: 86, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:56:31,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=150840.0, ans=0.125 2024-09-17 04:57:23,111 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.59 vs. 
limit=15.0 2024-09-17 04:57:31,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150960.0, ans=0.1 2024-09-17 04:57:33,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=151000.0, ans=0.125 2024-09-17 04:57:34,238 INFO [train.py:1198] (0/2) Epoch 9, batch 1550, loss[loss=0.2818, ctc_loss=0.1924, cr_loss=0.4437, attn_decoder_loss=0.2818, over 29505.00 frames. ], tot_loss[loss=0.2718, ctc_loss=0.181, cr_loss=0.4142, attn_decoder_loss=0.2727, over 5780926.64 frames. ], batch size: 90, lr: 1.26e-02, grad_scale: 4.0 2024-09-17 04:57:54,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=151040.0, ans=0.125 2024-09-17 04:58:04,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=151080.0, ans=0.125 2024-09-17 04:58:22,185 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.049e+01 9.832e+01 1.106e+02 1.253e+02 2.763e+02, threshold=2.212e+02, percent-clipped=1.0 2024-09-17 04:58:24,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=151120.0, ans=0.0 2024-09-17 04:58:30,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=151120.0, ans=0.125 2024-09-17 04:58:48,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=151200.0, ans=0.1 2024-09-17 04:58:49,783 INFO [train.py:1198] (0/2) Epoch 9, batch 1600, loss[loss=0.2879, ctc_loss=0.1989, cr_loss=0.4532, attn_decoder_loss=0.2877, over 29660.00 frames. ], tot_loss[loss=0.2715, ctc_loss=0.1807, cr_loss=0.4139, attn_decoder_loss=0.2724, over 5765228.09 frames. 
], batch size: 85, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 04:59:02,652 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.28 vs. limit=15.0 2024-09-17 04:59:42,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=151320.0, ans=0.04949747468305833 2024-09-17 04:59:55,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=151360.0, ans=0.125 2024-09-17 04:59:58,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=151360.0, ans=0.1 2024-09-17 05:00:05,099 INFO [train.py:1198] (0/2) Epoch 9, batch 1650, loss[loss=0.2828, ctc_loss=0.1893, cr_loss=0.4185, attn_decoder_loss=0.2839, over 29715.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1804, cr_loss=0.413, attn_decoder_loss=0.2721, over 5758831.92 frames. 
], batch size: 89, lr: 1.26e-02, grad_scale: 4.0 2024-09-17 05:00:23,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=151440.0, ans=0.0 2024-09-17 05:00:55,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=151520.0, ans=0.02 2024-09-17 05:00:57,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.805e+01 9.589e+01 1.020e+02 1.089e+02 1.544e+02, threshold=2.040e+02, percent-clipped=0.0 2024-09-17 05:00:58,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=151520.0, ans=0.035 2024-09-17 05:01:03,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=151520.0, ans=0.0 2024-09-17 05:01:24,480 INFO [train.py:1198] (0/2) Epoch 9, batch 1700, loss[loss=0.247, ctc_loss=0.1692, cr_loss=0.4065, attn_decoder_loss=0.2466, over 29561.00 frames. ], tot_loss[loss=0.271, ctc_loss=0.1802, cr_loss=0.4133, attn_decoder_loss=0.2719, over 5781189.10 frames. ], batch size: 69, lr: 1.26e-02, grad_scale: 8.0 2024-09-17 05:01:45,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=151640.0, ans=0.0 2024-09-17 05:01:47,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=151640.0, ans=0.125 2024-09-17 05:01:58,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. 
limit=5.0 2024-09-17 05:02:12,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=151720.0, ans=0.0 2024-09-17 05:02:39,565 INFO [train.py:1198] (0/2) Epoch 9, batch 1750, loss[loss=0.2435, ctc_loss=0.1584, cr_loss=0.392, attn_decoder_loss=0.2443, over 29323.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1798, cr_loss=0.4129, attn_decoder_loss=0.2715, over 5790068.74 frames. ], batch size: 67, lr: 1.25e-02, grad_scale: 8.0 2024-09-17 05:02:51,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=151800.0, ans=0.0 2024-09-17 05:02:59,469 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:03:16,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=151880.0, ans=0.125 2024-09-17 05:03:29,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=151920.0, ans=0.0 2024-09-17 05:03:29,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.07 vs. limit=22.5 2024-09-17 05:03:30,572 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.963e+01 9.433e+01 1.015e+02 1.120e+02 2.449e+02, threshold=2.030e+02, percent-clipped=1.0 2024-09-17 05:03:54,783 INFO [train.py:1198] (0/2) Epoch 9, batch 1800, loss[loss=0.2719, ctc_loss=0.1751, cr_loss=0.3896, attn_decoder_loss=0.274, over 29683.00 frames. ], tot_loss[loss=0.2709, ctc_loss=0.1801, cr_loss=0.4134, attn_decoder_loss=0.2718, over 5792250.98 frames. 
], batch size: 83, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:04:01,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=152000.0, ans=0.1
2024-09-17 05:04:28,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=152080.0, ans=0.125
2024-09-17 05:04:29,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=152080.0, ans=0.125
2024-09-17 05:04:40,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=152120.0, ans=0.0
2024-09-17 05:05:02,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0
2024-09-17 05:05:12,107 INFO [train.py:1198] (0/2) Epoch 9, batch 1850, loss[loss=0.2803, ctc_loss=0.1851, cr_loss=0.4138, attn_decoder_loss=0.2817, over 29627.00 frames. ], tot_loss[loss=0.2708, ctc_loss=0.1802, cr_loss=0.4144, attn_decoder_loss=0.2717, over 5798348.38 frames. ], batch size: 86, lr: 1.25e-02, grad_scale: 4.0
2024-09-17 05:05:23,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=152200.0, ans=0.1
2024-09-17 05:05:27,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=152240.0, ans=0.0
2024-09-17 05:05:35,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=152240.0, ans=0.2
2024-09-17 05:05:51,244 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.98 vs.
limit=15.0
2024-09-17 05:06:06,686 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.561e+01 1.000e+02 1.112e+02 1.269e+02 1.875e+02, threshold=2.225e+02, percent-clipped=0.0
2024-09-17 05:06:13,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=152360.0, ans=0.125
2024-09-17 05:06:22,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=152360.0, ans=0.125
2024-09-17 05:06:29,019 INFO [train.py:1198] (0/2) Epoch 9, batch 1900, loss[loss=0.2833, ctc_loss=0.1929, cr_loss=0.4342, attn_decoder_loss=0.2837, over 29707.00 frames. ], tot_loss[loss=0.2718, ctc_loss=0.1809, cr_loss=0.415, attn_decoder_loss=0.2726, over 5805115.50 frames. ], batch size: 89, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:06:37,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0
2024-09-17 05:06:42,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=152440.0, ans=0.125
2024-09-17 05:06:45,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.42 vs. limit=15.0
2024-09-17 05:06:50,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.89 vs.
limit=22.5
2024-09-17 05:07:25,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=152520.0, ans=0.125
2024-09-17 05:07:35,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152560.0, ans=0.1
2024-09-17 05:07:38,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=152560.0, ans=0.025
2024-09-17 05:07:42,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=152600.0, ans=0.0
2024-09-17 05:07:44,261 INFO [train.py:1198] (0/2) Epoch 9, batch 1950, loss[loss=0.2693, ctc_loss=0.1804, cr_loss=0.4056, attn_decoder_loss=0.2701, over 29458.00 frames. ], tot_loss[loss=0.2728, ctc_loss=0.1817, cr_loss=0.417, attn_decoder_loss=0.2737, over 5819884.90 frames. ], batch size: 78, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:08:16,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=152680.0, ans=0.05
2024-09-17 05:08:40,100 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.369e+01 9.742e+01 1.027e+02 1.111e+02 1.388e+02, threshold=2.054e+02, percent-clipped=0.0
2024-09-17 05:08:46,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=152760.0, ans=0.125
2024-09-17 05:08:46,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=152760.0, ans=0.125
2024-09-17 05:08:46,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=152760.0, ans=0.2
2024-09-17 05:09:00,618 INFO [scaling.py:214] (0/2) ScheduledFloat:
name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=152800.0, ans=0.1
2024-09-17 05:09:01,811 INFO [train.py:1198] (0/2) Epoch 9, batch 2000, loss[loss=0.2386, ctc_loss=0.1569, cr_loss=0.381, attn_decoder_loss=0.2392, over 29302.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1821, cr_loss=0.4168, attn_decoder_loss=0.2739, over 5796815.62 frames. ], batch size: 67, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:09:02,867 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0
2024-09-17 05:09:14,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=152800.0, ans=0.125
2024-09-17 05:09:58,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=152920.0, ans=0.125
2024-09-17 05:10:02,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=152960.0, ans=0.2
2024-09-17 05:10:04,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=152960.0, ans=0.2
2024-09-17 05:10:07,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=152960.0, ans=0.125
2024-09-17 05:10:08,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=152960.0, ans=0.125
2024-09-17 05:10:19,051 INFO [train.py:1198] (0/2) Epoch 9, batch 2050, loss[loss=0.2366, ctc_loss=0.1443, cr_loss=0.3671, attn_decoder_loss=0.2387, over 29442.00 frames. ], tot_loss[loss=0.2717, ctc_loss=0.1812, cr_loss=0.4151, attn_decoder_loss=0.2726, over 5789217.53 frames.
], batch size: 70, lr: 1.25e-02, grad_scale: 4.0
2024-09-17 05:10:40,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=153040.0, ans=0.0
2024-09-17 05:10:45,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=153040.0, ans=0.0
2024-09-17 05:11:01,917 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:11:07,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=153120.0, ans=0.95
2024-09-17 05:11:12,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=153120.0, ans=0.125
2024-09-17 05:11:15,051 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.385e+01 9.413e+01 1.004e+02 1.102e+02 4.512e+02, threshold=2.009e+02, percent-clipped=3.0
2024-09-17 05:11:34,703 INFO [train.py:1198] (0/2) Epoch 9, batch 2100, loss[loss=0.2664, ctc_loss=0.1692, cr_loss=0.4139, attn_decoder_loss=0.2681, over 29769.00 frames. ], tot_loss[loss=0.2707, ctc_loss=0.18, cr_loss=0.4137, attn_decoder_loss=0.2716, over 5800147.05 frames. ], batch size: 81, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:11:35,574 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.57 vs. limit=15.0
2024-09-17 05:12:01,123 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.77 vs.
limit=15.0
2024-09-17 05:12:01,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=153240.0, ans=0.0
2024-09-17 05:12:05,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.75 vs. limit=15.0
2024-09-17 05:12:17,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=153280.0, ans=0.125
2024-09-17 05:12:17,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=153280.0, ans=0.0
2024-09-17 05:12:17,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=153280.0, ans=0.125
2024-09-17 05:12:21,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=153320.0, ans=0.0
2024-09-17 05:12:27,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=153320.0, ans=0.125
2024-09-17 05:12:51,528 INFO [train.py:1198] (0/2) Epoch 9, batch 2150, loss[loss=0.2516, ctc_loss=0.1547, cr_loss=0.3741, attn_decoder_loss=0.254, over 29440.00 frames. ], tot_loss[loss=0.27, ctc_loss=0.1788, cr_loss=0.4122, attn_decoder_loss=0.2709, over 5814935.89 frames. ], batch size: 78, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:13:27,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.07 vs.
limit=15.0
2024-09-17 05:13:51,016 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.919e+01 9.836e+01 1.055e+02 1.144e+02 2.218e+02, threshold=2.111e+02, percent-clipped=2.0
2024-09-17 05:13:57,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=153560.0, ans=0.0
2024-09-17 05:14:08,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=153600.0, ans=0.125
2024-09-17 05:14:09,673 INFO [train.py:1198] (0/2) Epoch 9, batch 2200, loss[loss=0.2911, ctc_loss=0.1983, cr_loss=0.4394, attn_decoder_loss=0.2917, over 29626.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1796, cr_loss=0.4133, attn_decoder_loss=0.2715, over 5811617.05 frames. ], batch size: 86, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:14:19,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=153600.0, ans=0.125
2024-09-17 05:14:29,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=153640.0, ans=0.0
2024-09-17 05:14:39,259 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.09 vs.
limit=22.5
2024-09-17 05:14:55,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=153720.0, ans=10.0
2024-09-17 05:14:56,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=153720.0, ans=0.0
2024-09-17 05:15:02,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=153720.0, ans=0.1
2024-09-17 05:15:04,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.72 vs. limit=15.0
2024-09-17 05:15:09,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=153760.0, ans=0.1
2024-09-17 05:15:19,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=153760.0, ans=0.125
2024-09-17 05:15:20,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=153760.0, ans=0.125
2024-09-17 05:15:25,276 INFO [train.py:1198] (0/2) Epoch 9, batch 2250, loss[loss=0.262, ctc_loss=0.1656, cr_loss=0.3847, attn_decoder_loss=0.2641, over 29748.00 frames. ], tot_loss[loss=0.2704, ctc_loss=0.1793, cr_loss=0.4129, attn_decoder_loss=0.2714, over 5810736.13 frames. ], batch size: 82, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:15:33,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=153800.0, ans=0.2
2024-09-17 05:15:37,179 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.40 vs.
limit=22.5
2024-09-17 05:15:45,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=153840.0, ans=10.0
2024-09-17 05:15:52,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=153840.0, ans=0.125
2024-09-17 05:16:18,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.07 vs. limit=15.0
2024-09-17 05:16:24,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0
2024-09-17 05:16:24,413 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.430e+01 9.555e+01 1.015e+02 1.096e+02 3.730e+02, threshold=2.031e+02, percent-clipped=3.0
2024-09-17 05:16:29,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=153960.0, ans=0.2
2024-09-17 05:16:32,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=153960.0, ans=0.125
2024-09-17 05:16:42,434 INFO [train.py:1198] (0/2) Epoch 9, batch 2300, loss[loss=0.2516, ctc_loss=0.1636, cr_loss=0.3839, attn_decoder_loss=0.2528, over 29321.00 frames. ], tot_loss[loss=0.2689, ctc_loss=0.1778, cr_loss=0.4104, attn_decoder_loss=0.2699, over 5797824.72 frames.
], batch size: 71, lr: 1.25e-02, grad_scale: 8.0
2024-09-17 05:17:30,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154120.0, ans=0.125
2024-09-17 05:17:42,690 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:17:50,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=154160.0, ans=10.0
2024-09-17 05:17:51,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154160.0, ans=0.1
2024-09-17 05:17:51,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=154160.0, ans=0.125
2024-09-17 05:17:53,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=154160.0, ans=0.125
2024-09-17 05:18:02,041 INFO [train.py:1198] (0/2) Epoch 9, batch 2350, loss[loss=0.2765, ctc_loss=0.181, cr_loss=0.4283, attn_decoder_loss=0.2776, over 29686.00 frames. ], tot_loss[loss=0.2691, ctc_loss=0.1779, cr_loss=0.4105, attn_decoder_loss=0.2701, over 5804098.15 frames.
], batch size: 83, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:18:05,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=154200.0, ans=0.0
2024-09-17 05:18:09,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=154200.0, ans=0.125
2024-09-17 05:18:26,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=154240.0, ans=0.2
2024-09-17 05:18:51,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=15.0
2024-09-17 05:18:53,091 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0
2024-09-17 05:18:58,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=154320.0, ans=0.125
2024-09-17 05:18:59,586 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.337e+01 9.484e+01 1.020e+02 1.101e+02 1.845e+02, threshold=2.040e+02, percent-clipped=0.0
2024-09-17 05:19:17,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=154400.0, ans=0.125
2024-09-17 05:19:18,465 INFO [train.py:1198] (0/2) Epoch 9, batch 2400, loss[loss=0.2495, ctc_loss=0.1634, cr_loss=0.384, attn_decoder_loss=0.2506, over 29524.00 frames. ], tot_loss[loss=0.2699, ctc_loss=0.1788, cr_loss=0.4119, attn_decoder_loss=0.2708, over 5807424.17 frames.
], batch size: 76, lr: 1.24e-02, grad_scale: 16.0
2024-09-17 05:19:18,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=154400.0, ans=0.125
2024-09-17 05:19:20,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_na.min_abs, batch_count=154400.0, ans=0.02
2024-09-17 05:20:03,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=154480.0, ans=0.125
2024-09-17 05:20:04,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=154520.0, ans=0.125
2024-09-17 05:20:13,934 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 05:20:21,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=154560.0, ans=0.125
2024-09-17 05:20:27,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=154560.0, ans=0.125
2024-09-17 05:20:36,503 INFO [train.py:1198] (0/2) Epoch 9, batch 2450, loss[loss=0.2726, ctc_loss=0.1728, cr_loss=0.394, attn_decoder_loss=0.2749, over 29717.00 frames. ], tot_loss[loss=0.2709, ctc_loss=0.1798, cr_loss=0.4135, attn_decoder_loss=0.2718, over 5784768.76 frames.
], batch size: 82, lr: 1.24e-02, grad_scale: 4.0
2024-09-17 05:20:39,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=154600.0, ans=0.125
2024-09-17 05:20:42,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=154600.0, ans=0.125
2024-09-17 05:21:02,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.55 vs. limit=15.0
2024-09-17 05:21:10,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154680.0, ans=0.125
2024-09-17 05:21:38,515 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.373e+01 9.786e+01 1.038e+02 1.229e+02 2.658e+02, threshold=2.076e+02, percent-clipped=2.0
2024-09-17 05:21:43,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=154760.0, ans=0.125
2024-09-17 05:21:49,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=154760.0, ans=0.125
2024-09-17 05:21:52,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=154800.0, ans=0.0
2024-09-17 05:21:53,788 INFO [train.py:1198] (0/2) Epoch 9, batch 2500, loss[loss=0.2831, ctc_loss=0.1886, cr_loss=0.4342, attn_decoder_loss=0.284, over 29615.00 frames. ], tot_loss[loss=0.2702, ctc_loss=0.1788, cr_loss=0.4123, attn_decoder_loss=0.2712, over 5795048.54 frames.
], batch size: 86, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:22:09,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=154840.0, ans=0.0
2024-09-17 05:22:21,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0
2024-09-17 05:22:22,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=154880.0, ans=0.025
2024-09-17 05:22:32,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=154880.0, ans=0.2
2024-09-17 05:22:48,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=154920.0, ans=0.0
2024-09-17 05:23:03,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154960.0, ans=0.125
2024-09-17 05:23:06,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=154960.0, ans=0.125
2024-09-17 05:23:09,607 INFO [train.py:1198] (0/2) Epoch 9, batch 2550, loss[loss=0.2341, ctc_loss=0.1476, cr_loss=0.3869, attn_decoder_loss=0.2351, over 29351.00 frames. ], tot_loss[loss=0.2701, ctc_loss=0.1784, cr_loss=0.4122, attn_decoder_loss=0.2712, over 5800061.40 frames. ], batch size: 67, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:23:13,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.63 vs.
limit=15.0
2024-09-17 05:23:29,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=155040.0, ans=0.0
2024-09-17 05:23:31,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.51 vs. limit=15.0
2024-09-17 05:23:37,640 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.18 vs. limit=15.0
2024-09-17 05:23:39,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0
2024-09-17 05:23:49,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=155080.0, ans=0.125
2024-09-17 05:23:52,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=155080.0, ans=0.2
2024-09-17 05:24:06,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=155120.0, ans=0.0
2024-09-17 05:24:06,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=155120.0, ans=0.125
2024-09-17 05:24:12,478 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.204e+01 9.986e+01 1.053e+02 1.251e+02 2.083e+02, threshold=2.107e+02, percent-clipped=1.0
2024-09-17 05:24:22,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.02 vs.
limit=15.0
2024-09-17 05:24:24,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=155160.0, ans=0.0
2024-09-17 05:24:28,054 INFO [train.py:1198] (0/2) Epoch 9, batch 2600, loss[loss=0.2588, ctc_loss=0.1667, cr_loss=0.4163, attn_decoder_loss=0.2598, over 29450.00 frames. ], tot_loss[loss=0.2712, ctc_loss=0.1795, cr_loss=0.4144, attn_decoder_loss=0.2721, over 5796228.05 frames. ], batch size: 78, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:24:37,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=155200.0, ans=0.125
2024-09-17 05:24:53,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=155240.0, ans=0.125
2024-09-17 05:25:06,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=155280.0, ans=0.2
2024-09-17 05:25:12,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=155280.0, ans=0.125
2024-09-17 05:25:14,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.87 vs. limit=6.0
2024-09-17 05:25:31,371 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0
2024-09-17 05:25:33,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=155360.0, ans=0.0
2024-09-17 05:25:45,390 INFO [train.py:1198] (0/2) Epoch 9, batch 2650, loss[loss=0.2907, ctc_loss=0.1969, cr_loss=0.4555, attn_decoder_loss=0.291, over 29212.00 frames. ], tot_loss[loss=0.2705, ctc_loss=0.1786, cr_loss=0.4137, attn_decoder_loss=0.2715, over 5802667.30 frames.
], batch size: 100, lr: 1.24e-02, grad_scale: 4.0
2024-09-17 05:25:50,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=155400.0, ans=0.0
2024-09-17 05:25:51,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=155400.0, ans=0.125
2024-09-17 05:26:42,574 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0
2024-09-17 05:26:47,528 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.070e+01 9.715e+01 1.022e+02 1.111e+02 3.079e+02, threshold=2.044e+02, percent-clipped=1.0
2024-09-17 05:27:01,170 INFO [train.py:1198] (0/2) Epoch 9, batch 2700, loss[loss=0.2853, ctc_loss=0.1903, cr_loss=0.4197, attn_decoder_loss=0.2866, over 29512.00 frames. ], tot_loss[loss=0.2714, ctc_loss=0.1797, cr_loss=0.4153, attn_decoder_loss=0.2723, over 5798415.18 frames. ], batch size: 87, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:27:12,706 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0
2024-09-17 05:27:27,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=155640.0, ans=0.125
2024-09-17 05:27:31,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=155680.0, ans=0.0
2024-09-17 05:27:43,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.98 vs. limit=15.0
2024-09-17 05:28:09,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.64 vs.
limit=22.5
2024-09-17 05:28:10,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.82 vs. limit=15.0
2024-09-17 05:28:19,255 INFO [train.py:1198] (0/2) Epoch 9, batch 2750, loss[loss=0.2644, ctc_loss=0.1775, cr_loss=0.4157, attn_decoder_loss=0.2649, over 29521.00 frames. ], tot_loss[loss=0.27, ctc_loss=0.1786, cr_loss=0.413, attn_decoder_loss=0.271, over 5797837.16 frames. ], batch size: 75, lr: 1.24e-02, grad_scale: 4.0
2024-09-17 05:28:22,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=155800.0, ans=0.5
2024-09-17 05:28:22,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=155800.0, ans=0.125
2024-09-17 05:28:23,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=155800.0, ans=10.0
2024-09-17 05:28:40,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=155840.0, ans=0.07
2024-09-17 05:28:59,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=155880.0, ans=0.95
2024-09-17 05:29:03,089 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.54 vs. limit=15.0
2024-09-17 05:29:13,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=155920.0, ans=0.125
2024-09-17 05:29:15,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.69 vs.
limit=22.5
2024-09-17 05:29:23,084 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.85 vs. limit=15.0
2024-09-17 05:29:25,258 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.518e+01 1.047e+02 1.158e+02 3.298e+02, threshold=2.093e+02, percent-clipped=1.0
2024-09-17 05:29:36,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=156000.0, ans=0.125
2024-09-17 05:29:37,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=156000.0, ans=0.125
2024-09-17 05:29:38,023 INFO [train.py:1198] (0/2) Epoch 9, batch 2800, loss[loss=0.3038, ctc_loss=0.2406, cr_loss=0.4369, attn_decoder_loss=0.3011, over 20605.00 frames. ], tot_loss[loss=0.2702, ctc_loss=0.1789, cr_loss=0.4129, attn_decoder_loss=0.2712, over 5778106.06 frames. ], batch size: 210, lr: 1.24e-02, grad_scale: 8.0
2024-09-17 05:29:54,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=156040.0, ans=0.025
2024-09-17 05:29:56,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=156040.0, ans=0.0
2024-09-17 05:29:56,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs.
limit=15.0 2024-09-17 05:29:57,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=156040.0, ans=0.125 2024-09-17 05:30:14,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=156080.0, ans=0.0 2024-09-17 05:30:14,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0 2024-09-17 05:30:16,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.43 vs. limit=22.5 2024-09-17 05:30:24,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=156120.0, ans=0.2 2024-09-17 05:30:29,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=156120.0, ans=0.07 2024-09-17 05:30:30,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=156120.0, ans=0.125 2024-09-17 05:30:34,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2024-09-17 05:30:52,945 INFO [train.py:1198] (0/2) Epoch 9, batch 2850, loss[loss=0.2738, ctc_loss=0.1873, cr_loss=0.4138, attn_decoder_loss=0.2742, over 29490.00 frames. ], tot_loss[loss=0.2709, ctc_loss=0.1798, cr_loss=0.4134, attn_decoder_loss=0.2719, over 5763668.05 frames. 
], batch size: 77, lr: 1.24e-02, grad_scale: 4.0 2024-09-17 05:30:57,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=156200.0, ans=0.125 2024-09-17 05:30:59,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=156200.0, ans=0.125 2024-09-17 05:31:03,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=156200.0, ans=0.1 2024-09-17 05:31:30,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=156280.0, ans=0.125 2024-09-17 05:31:38,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.96 vs. limit=10.0 2024-09-17 05:32:00,105 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.550e+01 9.773e+01 1.033e+02 1.202e+02 1.627e+02, threshold=2.066e+02, percent-clipped=0.0 2024-09-17 05:32:10,736 INFO [train.py:1198] (0/2) Epoch 9, batch 2900, loss[loss=0.274, ctc_loss=0.1818, cr_loss=0.4443, attn_decoder_loss=0.2744, over 29805.00 frames. ], tot_loss[loss=0.2716, ctc_loss=0.1797, cr_loss=0.4148, attn_decoder_loss=0.2726, over 5789444.03 frames. 
], batch size: 80, lr: 1.24e-02, grad_scale: 8.0 2024-09-17 05:32:21,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=156400.0, ans=0.1 2024-09-17 05:32:27,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=156440.0, ans=0.125 2024-09-17 05:32:30,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=156440.0, ans=0.0 2024-09-17 05:32:33,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=156440.0, ans=0.125 2024-09-17 05:32:48,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=156480.0, ans=0.125 2024-09-17 05:32:49,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=156480.0, ans=0.125 2024-09-17 05:32:56,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=156520.0, ans=0.025 2024-09-17 05:32:59,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=156520.0, ans=0.2 2024-09-17 05:33:28,635 INFO [train.py:1198] (0/2) Epoch 9, batch 2950, loss[loss=0.2625, ctc_loss=0.181, cr_loss=0.4046, attn_decoder_loss=0.2626, over 29532.00 frames. ], tot_loss[loss=0.2701, ctc_loss=0.1787, cr_loss=0.4131, attn_decoder_loss=0.2711, over 5785377.58 frames. 
], batch size: 75, lr: 1.24e-02, grad_scale: 8.0 2024-09-17 05:34:09,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=156680.0, ans=0.2 2024-09-17 05:34:11,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.38 vs. limit=22.5 2024-09-17 05:34:24,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=156720.0, ans=0.0 2024-09-17 05:34:33,744 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.085e+01 9.535e+01 1.020e+02 1.127e+02 2.521e+02, threshold=2.039e+02, percent-clipped=1.0 2024-09-17 05:34:38,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=156760.0, ans=0.125 2024-09-17 05:34:44,910 INFO [train.py:1198] (0/2) Epoch 9, batch 3000, loss[loss=0.2679, ctc_loss=0.1737, cr_loss=0.4156, attn_decoder_loss=0.2691, over 29738.00 frames. ], tot_loss[loss=0.27, ctc_loss=0.1787, cr_loss=0.4127, attn_decoder_loss=0.2709, over 5786830.55 frames. ], batch size: 81, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:34:44,911 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 05:35:03,249 INFO [train.py:1230] (0/2) Epoch 9, validation: loss=0.2139, ctc_loss=0.05057, cr_loss=4.328e-15, attn_decoder_loss=0.232, over 944034.00 frames. 
2024-09-17 05:35:03,249 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 05:35:09,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156800.0, ans=0.1 2024-09-17 05:35:16,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=156800.0, ans=0.125 2024-09-17 05:35:25,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2024-09-17 05:35:25,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=156840.0, ans=0.0 2024-09-17 05:36:05,663 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0 2024-09-17 05:36:11,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=156960.0, ans=0.125 2024-09-17 05:36:12,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=156960.0, ans=0.2 2024-09-17 05:36:15,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=156960.0, ans=0.0 2024-09-17 05:36:15,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=156960.0, ans=0.0 2024-09-17 05:36:21,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=15.0 2024-09-17 05:36:21,550 INFO [train.py:1198] (0/2) Epoch 9, batch 3050, loss[loss=0.2493, ctc_loss=0.1597, cr_loss=0.394, attn_decoder_loss=0.2505, over 29535.00 frames. 
], tot_loss[loss=0.271, ctc_loss=0.1796, cr_loss=0.4138, attn_decoder_loss=0.2719, over 5779958.83 frames. ], batch size: 76, lr: 1.23e-02, grad_scale: 4.0 2024-09-17 05:36:29,554 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:36:29,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=157000.0, ans=0.0 2024-09-17 05:36:30,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=157000.0, ans=0.5 2024-09-17 05:36:40,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=157040.0, ans=0.2 2024-09-17 05:36:45,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2024-09-17 05:36:54,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=157080.0, ans=0.125 2024-09-17 05:37:00,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=157080.0, ans=0.0 2024-09-17 05:37:00,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.62 vs. 
limit=22.5 2024-09-17 05:37:12,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=157120.0, ans=0.09899494936611666 2024-09-17 05:37:13,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_ff3.min_abs, batch_count=157120.0, ans=0.2 2024-09-17 05:37:29,764 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.561e+01 1.002e+02 1.065e+02 1.234e+02 3.157e+02, threshold=2.130e+02, percent-clipped=3.0 2024-09-17 05:37:37,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=157200.0, ans=0.125 2024-09-17 05:37:38,812 INFO [train.py:1198] (0/2) Epoch 9, batch 3100, loss[loss=0.2828, ctc_loss=0.1862, cr_loss=0.4262, attn_decoder_loss=0.2841, over 29321.00 frames. ], tot_loss[loss=0.2705, ctc_loss=0.1793, cr_loss=0.4132, attn_decoder_loss=0.2714, over 5778907.11 frames. ], batch size: 100, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:37:43,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=157200.0, ans=0.025 2024-09-17 05:37:46,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=157200.0, ans=0.0 2024-09-17 05:37:52,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=157240.0, ans=0.125 2024-09-17 05:37:54,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=157240.0, ans=0.0 2024-09-17 05:38:00,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=157240.0, ans=0.0 2024-09-17 05:38:02,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.24 vs. 
limit=15.0 2024-09-17 05:38:18,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=157280.0, ans=10.0 2024-09-17 05:38:31,327 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.36 vs. limit=15.0 2024-09-17 05:38:43,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.69 vs. limit=22.5 2024-09-17 05:38:54,760 INFO [train.py:1198] (0/2) Epoch 9, batch 3150, loss[loss=0.2759, ctc_loss=0.1765, cr_loss=0.4223, attn_decoder_loss=0.2776, over 28944.00 frames. ], tot_loss[loss=0.2699, ctc_loss=0.1787, cr_loss=0.4123, attn_decoder_loss=0.2709, over 5784661.06 frames. ], batch size: 104, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:38:55,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=157400.0, ans=0.025 2024-09-17 05:39:24,588 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:39:47,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.65 vs. 
limit=15.0 2024-09-17 05:40:04,837 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.427e+01 1.013e+02 1.077e+02 1.205e+02 2.021e+02, threshold=2.154e+02, percent-clipped=0.0 2024-09-17 05:40:08,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=157560.0, ans=0.1 2024-09-17 05:40:09,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=157560.0, ans=0.125 2024-09-17 05:40:12,971 INFO [train.py:1198] (0/2) Epoch 9, batch 3200, loss[loss=0.2627, ctc_loss=0.1706, cr_loss=0.414, attn_decoder_loss=0.2637, over 29423.00 frames. ], tot_loss[loss=0.2695, ctc_loss=0.1781, cr_loss=0.4121, attn_decoder_loss=0.2705, over 5793940.27 frames. ], batch size: 79, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:40:21,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.19 vs. limit=10.0 2024-09-17 05:40:30,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.04 vs. limit=15.0 2024-09-17 05:40:32,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.79 vs. 
limit=15.0 2024-09-17 05:40:33,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=157640.0, ans=0.0 2024-09-17 05:40:41,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=157640.0, ans=0.0 2024-09-17 05:41:07,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=157720.0, ans=0.125 2024-09-17 05:41:11,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.72 vs. limit=15.0 2024-09-17 05:41:16,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=157760.0, ans=0.125 2024-09-17 05:41:25,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=157760.0, ans=0.015 2024-09-17 05:41:31,266 INFO [train.py:1198] (0/2) Epoch 9, batch 3250, loss[loss=0.2799, ctc_loss=0.1821, cr_loss=0.4206, attn_decoder_loss=0.2814, over 29705.00 frames. ], tot_loss[loss=0.2701, ctc_loss=0.1785, cr_loss=0.4125, attn_decoder_loss=0.2711, over 5801379.23 frames. 
], batch size: 84, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:41:36,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=157800.0, ans=0.125 2024-09-17 05:41:54,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=157840.0, ans=0.1 2024-09-17 05:41:54,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=157840.0, ans=0.125 2024-09-17 05:42:00,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=157880.0, ans=0.05 2024-09-17 05:42:01,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=157880.0, ans=0.2 2024-09-17 05:42:14,119 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:42:36,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=157960.0, ans=0.0 2024-09-17 05:42:39,128 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.206e+01 9.558e+01 1.067e+02 1.153e+02 2.320e+02, threshold=2.135e+02, percent-clipped=2.0 2024-09-17 05:42:46,865 INFO [train.py:1198] (0/2) Epoch 9, batch 3300, loss[loss=0.2788, ctc_loss=0.1773, cr_loss=0.3715, attn_decoder_loss=0.2818, over 28533.00 frames. ], tot_loss[loss=0.2687, ctc_loss=0.1771, cr_loss=0.4107, attn_decoder_loss=0.2697, over 5798699.25 frames. 
], batch size: 112, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:42:47,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=158000.0, ans=0.125 2024-09-17 05:42:59,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2024-09-17 05:43:07,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=158040.0, ans=0.125 2024-09-17 05:43:31,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=158080.0, ans=0.0 2024-09-17 05:43:55,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=158160.0, ans=0.025 2024-09-17 05:44:01,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=158160.0, ans=0.07 2024-09-17 05:44:02,221 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2024-09-17 05:44:04,444 INFO [train.py:1198] (0/2) Epoch 9, batch 3350, loss[loss=0.2896, ctc_loss=0.1984, cr_loss=0.4365, attn_decoder_loss=0.29, over 28911.00 frames. ], tot_loss[loss=0.2698, ctc_loss=0.1785, cr_loss=0.4128, attn_decoder_loss=0.2707, over 5774734.46 frames. 
], batch size: 104, lr: 1.23e-02, grad_scale: 4.0 2024-09-17 05:44:04,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=158200.0, ans=0.125 2024-09-17 05:44:13,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=158200.0, ans=0.125 2024-09-17 05:44:13,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=158200.0, ans=0.0 2024-09-17 05:44:17,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.82 vs. limit=22.5 2024-09-17 05:44:18,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=158240.0, ans=0.0 2024-09-17 05:44:34,450 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:44:41,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=158280.0, ans=0.125 2024-09-17 05:44:44,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=158280.0, ans=0.125 2024-09-17 05:44:44,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=158280.0, ans=0.125 2024-09-17 05:44:53,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=158320.0, ans=0.125 2024-09-17 05:44:55,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=158320.0, ans=0.0 2024-09-17 05:45:09,908 INFO [scaling.py:1024] (0/2) Whitening: 
name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2024-09-17 05:45:15,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=158360.0, ans=0.2 2024-09-17 05:45:16,179 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.103e+01 9.844e+01 1.079e+02 1.203e+02 3.746e+02, threshold=2.158e+02, percent-clipped=3.0 2024-09-17 05:45:18,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=158360.0, ans=0.1 2024-09-17 05:45:19,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=158360.0, ans=0.09899494936611666 2024-09-17 05:45:22,611 INFO [train.py:1198] (0/2) Epoch 9, batch 3400, loss[loss=0.2386, ctc_loss=0.1514, cr_loss=0.3554, attn_decoder_loss=0.2404, over 29363.00 frames. ], tot_loss[loss=0.2698, ctc_loss=0.1789, cr_loss=0.4123, attn_decoder_loss=0.2707, over 5767787.35 frames. ], batch size: 67, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:45:38,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=158440.0, ans=0.125 2024-09-17 05:45:43,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2024-09-17 05:45:53,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=158480.0, ans=0.125 2024-09-17 05:46:21,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.28 vs. limit=12.0 2024-09-17 05:46:38,414 INFO [train.py:1198] (0/2) Epoch 9, batch 3450, loss[loss=0.2925, ctc_loss=0.2019, cr_loss=0.4366, attn_decoder_loss=0.2929, over 28149.00 frames. 
], tot_loss[loss=0.2705, ctc_loss=0.1793, cr_loss=0.4135, attn_decoder_loss=0.2714, over 5775926.38 frames. ], batch size: 111, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:46:48,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=158600.0, ans=0.2 2024-09-17 05:46:51,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=158600.0, ans=0.1 2024-09-17 05:47:00,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=158640.0, ans=0.1 2024-09-17 05:47:21,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=158680.0, ans=0.125 2024-09-17 05:47:21,522 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:47:32,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=158720.0, ans=0.125 2024-09-17 05:47:49,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.56 vs. limit=22.5 2024-09-17 05:47:50,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2024-09-17 05:47:51,398 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.572e+01 9.332e+01 9.969e+01 1.060e+02 1.614e+02, threshold=1.994e+02, percent-clipped=0.0 2024-09-17 05:47:55,996 INFO [train.py:1198] (0/2) Epoch 9, batch 3500, loss[loss=0.2588, ctc_loss=0.1778, cr_loss=0.4192, attn_decoder_loss=0.2585, over 29311.00 frames. ], tot_loss[loss=0.2697, ctc_loss=0.1784, cr_loss=0.4128, attn_decoder_loss=0.2707, over 5777687.84 frames. 
], batch size: 71, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:48:03,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=158800.0, ans=0.025 2024-09-17 05:48:05,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=158800.0, ans=0.1 2024-09-17 05:48:07,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=158800.0, ans=0.0 2024-09-17 05:48:14,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=158840.0, ans=0.125 2024-09-17 05:48:17,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=158840.0, ans=0.1 2024-09-17 05:48:19,949 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.80 vs. limit=10.0 2024-09-17 05:48:37,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=158880.0, ans=0.1 2024-09-17 05:48:42,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=158920.0, ans=0.125 2024-09-17 05:48:52,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=158920.0, ans=0.125 2024-09-17 05:49:07,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=158960.0, ans=0.125 2024-09-17 05:49:12,739 INFO [train.py:1198] (0/2) Epoch 9, batch 3550, loss[loss=0.2887, ctc_loss=0.1889, cr_loss=0.4221, attn_decoder_loss=0.2904, over 29714.00 frames. 
], tot_loss[loss=0.2695, ctc_loss=0.178, cr_loss=0.4123, attn_decoder_loss=0.2705, over 5784731.85 frames. ], batch size: 89, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:49:12,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=159000.0, ans=0.0 2024-09-17 05:49:23,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=159000.0, ans=0.125 2024-09-17 05:49:31,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-09-17 05:49:35,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=159040.0, ans=0.125 2024-09-17 05:49:45,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=159080.0, ans=0.125 2024-09-17 05:50:06,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=159120.0, ans=0.025 2024-09-17 05:50:06,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=159120.0, ans=0.0 2024-09-17 05:50:07,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=159120.0, ans=0.125 2024-09-17 05:50:23,964 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.321e+01 9.598e+01 1.057e+02 1.166e+02 3.699e+02, threshold=2.113e+02, percent-clipped=1.0 2024-09-17 05:50:27,357 INFO [train.py:1198] (0/2) Epoch 9, batch 3600, loss[loss=0.2558, ctc_loss=0.1639, cr_loss=0.4103, attn_decoder_loss=0.2569, over 29506.00 frames. ], tot_loss[loss=0.2699, ctc_loss=0.178, cr_loss=0.4125, attn_decoder_loss=0.271, over 5793281.63 frames. 
], batch size: 77, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:50:37,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.03 vs. limit=10.0 2024-09-17 05:50:38,751 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.21 vs. limit=15.0 2024-09-17 05:51:00,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=159280.0, ans=0.2 2024-09-17 05:51:09,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=159280.0, ans=0.0 2024-09-17 05:51:27,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.83 vs. limit=22.5 2024-09-17 05:51:41,635 INFO [train.py:1198] (0/2) Epoch 9, batch 3650, loss[loss=0.2903, ctc_loss=0.1906, cr_loss=0.4737, attn_decoder_loss=0.2908, over 29472.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1772, cr_loss=0.4112, attn_decoder_loss=0.2701, over 5795130.13 frames. ], batch size: 90, lr: 1.23e-02, grad_scale: 8.0 2024-09-17 05:51:59,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=159440.0, ans=0.1 2024-09-17 05:52:27,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.87 vs. 
limit=15.0 2024-09-17 05:52:28,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=159520.0, ans=0.125 2024-09-17 05:52:40,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=159520.0, ans=0.025 2024-09-17 05:52:43,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=159560.0, ans=0.2 2024-09-17 05:52:54,980 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.190e+01 9.468e+01 1.034e+02 1.121e+02 1.943e+02, threshold=2.068e+02, percent-clipped=0.0 2024-09-17 05:52:57,919 INFO [train.py:1198] (0/2) Epoch 9, batch 3700, loss[loss=0.276, ctc_loss=0.1766, cr_loss=0.3883, attn_decoder_loss=0.2784, over 29718.00 frames. ], tot_loss[loss=0.2689, ctc_loss=0.1771, cr_loss=0.4108, attn_decoder_loss=0.27, over 5805702.79 frames. ], batch size: 84, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 05:53:07,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=159600.0, ans=0.0 2024-09-17 05:53:36,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=159680.0, ans=0.05 2024-09-17 05:53:48,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=159720.0, ans=0.125 2024-09-17 05:53:53,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=159720.0, ans=0.02 2024-09-17 05:54:12,622 INFO [train.py:1198] (0/2) Epoch 9, batch 3750, loss[loss=0.2468, ctc_loss=0.1535, cr_loss=0.3696, attn_decoder_loss=0.2489, over 29347.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1776, cr_loss=0.4111, attn_decoder_loss=0.27, over 5808920.46 frames. 
], batch size: 67, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 05:54:29,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=159840.0, ans=0.125 2024-09-17 05:54:44,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=15.0 2024-09-17 05:54:48,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=159880.0, ans=0.125 2024-09-17 05:54:53,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=159880.0, ans=0.125 2024-09-17 05:55:08,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=159920.0, ans=0.125 2024-09-17 05:55:13,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-09-17 05:55:18,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=159960.0, ans=10.0 2024-09-17 05:55:25,527 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.200e+01 9.758e+01 1.061e+02 1.222e+02 3.852e+02, threshold=2.121e+02, percent-clipped=3.0 2024-09-17 05:55:27,372 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-40000.pt 2024-09-17 05:55:35,963 INFO [train.py:1198] (0/2) Epoch 9, batch 3800, loss[loss=0.2775, ctc_loss=0.1773, cr_loss=0.4209, attn_decoder_loss=0.2793, over 29609.00 frames. ], tot_loss[loss=0.2689, ctc_loss=0.1775, cr_loss=0.4109, attn_decoder_loss=0.2699, over 5799932.54 frames. 
], batch size: 86, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 05:55:46,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=160000.0, ans=0.0 2024-09-17 05:55:57,479 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.01 vs. limit=15.0 2024-09-17 05:56:11,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=160080.0, ans=0.125 2024-09-17 05:56:16,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.44 vs. limit=22.5 2024-09-17 05:56:27,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=17.44 vs. limit=15.0 2024-09-17 05:56:44,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=160160.0, ans=0.125 2024-09-17 05:56:50,300 INFO [train.py:1198] (0/2) Epoch 9, batch 3850, loss[loss=0.2945, ctc_loss=0.2069, cr_loss=0.4685, attn_decoder_loss=0.2939, over 29283.00 frames. ], tot_loss[loss=0.2688, ctc_loss=0.1773, cr_loss=0.411, attn_decoder_loss=0.2698, over 5813097.96 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 05:57:06,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=160240.0, ans=0.2 2024-09-17 05:57:11,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160240.0, ans=0.1 2024-09-17 05:57:16,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.87 vs. 
limit=15.0 2024-09-17 05:57:36,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=160320.0, ans=0.1 2024-09-17 05:57:39,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=160320.0, ans=0.025 2024-09-17 05:58:03,426 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.356e+01 9.561e+01 1.016e+02 1.083e+02 1.844e+02, threshold=2.033e+02, percent-clipped=0.0 2024-09-17 05:58:06,432 INFO [train.py:1198] (0/2) Epoch 9, batch 3900, loss[loss=0.2851, ctc_loss=0.1837, cr_loss=0.4017, attn_decoder_loss=0.2875, over 29614.00 frames. ], tot_loss[loss=0.2693, ctc_loss=0.1777, cr_loss=0.4118, attn_decoder_loss=0.2704, over 5817288.54 frames. ], batch size: 86, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 05:58:12,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=160400.0, ans=0.0 2024-09-17 05:58:20,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=160440.0, ans=0.0 2024-09-17 05:58:33,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=160440.0, ans=0.1 2024-09-17 05:58:41,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=160480.0, ans=0.125 2024-09-17 05:59:09,199 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:59:20,923 INFO [train.py:1198] (0/2) Epoch 9, batch 3950, loss[loss=0.2987, ctc_loss=0.2067, cr_loss=0.471, attn_decoder_loss=0.2985, over 29460.00 frames. ], tot_loss[loss=0.2694, ctc_loss=0.1775, cr_loss=0.4127, attn_decoder_loss=0.2705, over 5836286.26 frames. 
], batch size: 97, lr: 1.22e-02, grad_scale: 4.0 2024-09-17 05:59:28,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=160600.0, ans=0.0 2024-09-17 05:59:34,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=160640.0, ans=0.02 2024-09-17 06:00:16,454 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-09-17 06:00:34,492 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.022e+01 9.640e+01 1.057e+02 1.201e+02 4.208e+02, threshold=2.114e+02, percent-clipped=4.0 2024-09-17 06:00:36,383 INFO [train.py:1198] (0/2) Epoch 9, batch 4000, loss[loss=0.237, ctc_loss=0.1452, cr_loss=0.3646, attn_decoder_loss=0.2391, over 29507.00 frames. ], tot_loss[loss=0.2697, ctc_loss=0.1779, cr_loss=0.4127, attn_decoder_loss=0.2708, over 5813149.69 frames. ], batch size: 74, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 06:01:05,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=160880.0, ans=0.0 2024-09-17 06:01:50,828 INFO [train.py:1198] (0/2) Epoch 9, batch 4050, loss[loss=0.3068, ctc_loss=0.2345, cr_loss=0.4361, attn_decoder_loss=0.3052, over 20249.00 frames. ], tot_loss[loss=0.2696, ctc_loss=0.1776, cr_loss=0.4115, attn_decoder_loss=0.2707, over 5796803.30 frames. ], batch size: 210, lr: 1.22e-02, grad_scale: 4.0 2024-09-17 06:01:53,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0 2024-09-17 06:02:03,251 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.15 vs. 
limit=15.0 2024-09-17 06:02:03,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.45 vs. limit=15.0 2024-09-17 06:02:05,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=161040.0, ans=0.025 2024-09-17 06:02:10,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=161040.0, ans=0.2 2024-09-17 06:02:13,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161040.0, ans=0.1 2024-09-17 06:02:20,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=161080.0, ans=0.2 2024-09-17 06:02:21,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=161080.0, ans=0.025 2024-09-17 06:02:38,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=161120.0, ans=0.125 2024-09-17 06:03:05,382 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.114e+01 9.617e+01 1.028e+02 1.240e+02 2.479e+02, threshold=2.055e+02, percent-clipped=2.0 2024-09-17 06:03:05,412 INFO [train.py:1198] (0/2) Epoch 9, batch 4100, loss[loss=0.2943, ctc_loss=0.1999, cr_loss=0.4541, attn_decoder_loss=0.2947, over 29515.00 frames. ], tot_loss[loss=0.2698, ctc_loss=0.1777, cr_loss=0.4117, attn_decoder_loss=0.2709, over 5791828.07 frames. 
], batch size: 90, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 06:03:11,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=161200.0, ans=0.1 2024-09-17 06:03:21,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=161240.0, ans=0.0 2024-09-17 06:04:19,361 INFO [train.py:1198] (0/2) Epoch 9, batch 4150, loss[loss=0.2568, ctc_loss=0.1614, cr_loss=0.4009, attn_decoder_loss=0.2585, over 29519.00 frames. ], tot_loss[loss=0.2694, ctc_loss=0.1773, cr_loss=0.4115, attn_decoder_loss=0.2705, over 5797248.76 frames. ], batch size: 77, lr: 1.22e-02, grad_scale: 4.0 2024-09-17 06:04:26,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=161400.0, ans=0.0 2024-09-17 06:04:40,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=161440.0, ans=0.125 2024-09-17 06:04:48,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=161480.0, ans=0.0 2024-09-17 06:04:55,906 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:05:04,167 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. 
limit=6.0 2024-09-17 06:05:21,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=161560.0, ans=0.0 2024-09-17 06:05:26,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=161560.0, ans=0.125 2024-09-17 06:05:28,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=161560.0, ans=0.125 2024-09-17 06:05:33,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=161600.0, ans=0.125 2024-09-17 06:05:34,469 INFO [train.py:1198] (0/2) Epoch 9, batch 4200, loss[loss=0.2877, ctc_loss=0.1948, cr_loss=0.4464, attn_decoder_loss=0.2881, over 29555.00 frames. ], tot_loss[loss=0.2697, ctc_loss=0.1776, cr_loss=0.412, attn_decoder_loss=0.2707, over 5799820.75 frames. ], batch size: 90, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 06:05:35,869 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 9.556e+01 1.027e+02 1.111e+02 2.120e+02, threshold=2.054e+02, percent-clipped=2.0 2024-09-17 06:05:38,197 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. 
limit=15.0 2024-09-17 06:06:11,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=161680.0, ans=0.07 2024-09-17 06:06:28,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161720.0, ans=0.1 2024-09-17 06:06:30,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=161720.0, ans=0.125 2024-09-17 06:06:37,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=161760.0, ans=0.125 2024-09-17 06:06:48,814 INFO [train.py:1198] (0/2) Epoch 9, batch 4250, loss[loss=0.2509, ctc_loss=0.1589, cr_loss=0.3832, attn_decoder_loss=0.2526, over 29505.00 frames. ], tot_loss[loss=0.2695, ctc_loss=0.1773, cr_loss=0.4117, attn_decoder_loss=0.2707, over 5805153.07 frames. ], batch size: 74, lr: 1.22e-02, grad_scale: 4.0 2024-09-17 06:07:05,907 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0 2024-09-17 06:07:20,204 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:07:25,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.72 vs. limit=12.0 2024-09-17 06:07:45,758 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.21 vs. limit=22.5 2024-09-17 06:08:02,495 INFO [train.py:1198] (0/2) Epoch 9, batch 4300, loss[loss=0.282, ctc_loss=0.1845, cr_loss=0.4514, attn_decoder_loss=0.2828, over 29556.00 frames. ], tot_loss[loss=0.2699, ctc_loss=0.1775, cr_loss=0.4123, attn_decoder_loss=0.271, over 5793667.29 frames. 
], batch size: 87, lr: 1.22e-02, grad_scale: 8.0 2024-09-17 06:08:05,467 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.660e+01 9.959e+01 1.074e+02 1.170e+02 2.141e+02, threshold=2.147e+02, percent-clipped=1.0 2024-09-17 06:08:36,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=162080.0, ans=0.0 2024-09-17 06:08:42,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.55 vs. limit=15.0 2024-09-17 06:08:43,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=162080.0, ans=0.2 2024-09-17 06:09:17,152 INFO [train.py:1198] (0/2) Epoch 9, batch 4350, loss[loss=0.2844, ctc_loss=0.1926, cr_loss=0.433, attn_decoder_loss=0.285, over 29487.00 frames. ], tot_loss[loss=0.2731, ctc_loss=0.1804, cr_loss=0.4169, attn_decoder_loss=0.2742, over 5795674.96 frames. ], batch size: 97, lr: 1.21e-02, grad_scale: 8.0 2024-09-17 06:09:40,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=162240.0, ans=0.2 2024-09-17 06:09:47,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=162280.0, ans=0.125 2024-09-17 06:09:50,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=162280.0, ans=0.125 2024-09-17 06:10:30,458 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.40 vs. limit=15.0 2024-09-17 06:10:31,075 INFO [train.py:1198] (0/2) Epoch 9, batch 4400, loss[loss=0.2801, ctc_loss=0.1958, cr_loss=0.4088, attn_decoder_loss=0.2803, over 27133.00 frames. 
], tot_loss[loss=0.2756, ctc_loss=0.1828, cr_loss=0.4197, attn_decoder_loss=0.2766, over 5765689.79 frames. ], batch size: 124, lr: 1.21e-02, grad_scale: 8.0 2024-09-17 06:10:34,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162400.0, ans=0.1 2024-09-17 06:10:35,517 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.675e+01 9.860e+01 1.034e+02 1.169e+02 1.757e+02, threshold=2.069e+02, percent-clipped=0.0 2024-09-17 06:11:11,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=162480.0, ans=0.2 2024-09-17 06:11:37,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162560.0, ans=0.1 2024-09-17 06:11:46,095 INFO [train.py:1198] (0/2) Epoch 9, batch 4450, loss[loss=0.3029, ctc_loss=0.2203, cr_loss=0.4344, attn_decoder_loss=0.3024, over 20222.00 frames. ], tot_loss[loss=0.2795, ctc_loss=0.1889, cr_loss=0.4244, attn_decoder_loss=0.2801, over 5574389.15 frames. ], batch size: 210, lr: 1.21e-02, grad_scale: 4.0 2024-09-17 06:11:55,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=162600.0, ans=0.015 2024-09-17 06:11:55,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=162600.0, ans=0.0 2024-09-17 06:11:55,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=162600.0, ans=0.0 2024-09-17 06:12:16,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=162680.0, ans=0.125 2024-09-17 06:12:19,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.65 vs. 
limit=15.0 2024-09-17 06:12:45,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.31 vs. limit=15.0 2024-09-17 06:12:56,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=162760.0, ans=0.1 2024-09-17 06:13:01,695 INFO [train.py:1198] (0/2) Epoch 9, batch 4500, loss[loss=0.2997, ctc_loss=0.2322, cr_loss=0.4458, attn_decoder_loss=0.2973, over 20024.00 frames. ], tot_loss[loss=0.2831, ctc_loss=0.1961, cr_loss=0.4264, attn_decoder_loss=0.2833, over 5235550.11 frames. ], batch size: 209, lr: 1.21e-02, grad_scale: 8.0 2024-09-17 06:13:07,510 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.470e+01 1.060e+02 1.171e+02 1.308e+02 2.646e+02, threshold=2.342e+02, percent-clipped=3.0 2024-09-17 06:13:09,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=162800.0, ans=0.1 2024-09-17 06:13:15,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=11.10 vs. limit=10.0 2024-09-17 06:13:25,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=162840.0, ans=0.0 2024-09-17 06:13:31,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=162880.0, ans=0.0 2024-09-17 06:13:38,199 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-9.pt 2024-09-17 06:14:29,144 INFO [train.py:1198] (0/2) Epoch 10, batch 0, loss[loss=0.2468, ctc_loss=0.1506, cr_loss=0.3715, attn_decoder_loss=0.2493, over 29599.00 frames. 
], tot_loss[loss=0.2468, ctc_loss=0.1506, cr_loss=0.3715, attn_decoder_loss=0.2493, over 29599.00 frames. ], batch size: 73, lr: 1.15e-02, grad_scale: 8.0 2024-09-17 06:14:29,145 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 06:14:47,512 INFO [train.py:1230] (0/2) Epoch 10, validation: loss=0.2171, ctc_loss=0.05118, cr_loss=4.759e-15, attn_decoder_loss=0.2355, over 944034.00 frames. 2024-09-17 06:14:47,512 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 06:14:50,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=162900.0, ans=0.125 2024-09-17 06:15:02,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=162940.0, ans=0.2 2024-09-17 06:15:13,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=162940.0, ans=0.125 2024-09-17 06:15:25,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=162980.0, ans=0.2 2024-09-17 06:15:32,490 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.14 vs. 
limit=15.0 2024-09-17 06:15:49,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=163060.0, ans=0.125 2024-09-17 06:15:49,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=163060.0, ans=0.125 2024-09-17 06:15:49,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163060.0, ans=0.1 2024-09-17 06:15:55,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=163060.0, ans=0.125 2024-09-17 06:16:02,849 INFO [train.py:1198] (0/2) Epoch 10, batch 50, loss[loss=0.246, ctc_loss=0.1603, cr_loss=0.3829, attn_decoder_loss=0.247, over 29441.00 frames. ], tot_loss[loss=0.2723, ctc_loss=0.1814, cr_loss=0.4167, attn_decoder_loss=0.2732, over 1268198.61 frames. ], batch size: 70, lr: 1.15e-02, grad_scale: 4.0 2024-09-17 06:16:07,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=163100.0, ans=0.125 2024-09-17 06:16:10,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=163100.0, ans=0.025 2024-09-17 06:16:33,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=163180.0, ans=0.95 2024-09-17 06:16:36,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=163180.0, ans=0.0 2024-09-17 06:16:52,270 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.150e+01 9.660e+01 1.078e+02 1.244e+02 7.750e+02, threshold=2.155e+02, percent-clipped=3.0 2024-09-17 06:17:22,987 INFO [train.py:1198] (0/2) Epoch 10, batch 100, loss[loss=0.2598, ctc_loss=0.1726, cr_loss=0.4062, 
attn_decoder_loss=0.2605, over 29532.00 frames. ], tot_loss[loss=0.2736, ctc_loss=0.1811, cr_loss=0.4181, attn_decoder_loss=0.2746, over 2253636.68 frames. ], batch size: 76, lr: 1.15e-02, grad_scale: 8.0 2024-09-17 06:17:29,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=163300.0, ans=0.0 2024-09-17 06:17:29,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=163300.0, ans=0.125 2024-09-17 06:18:09,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=163420.0, ans=0.2 2024-09-17 06:18:09,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=163420.0, ans=0.5 2024-09-17 06:18:18,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=163420.0, ans=0.125 2024-09-17 06:18:22,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163460.0, ans=0.1 2024-09-17 06:18:22,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=163460.0, ans=0.0 2024-09-17 06:18:30,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=163460.0, ans=0.2 2024-09-17 06:18:35,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0 2024-09-17 06:18:37,444 INFO [train.py:1198] (0/2) Epoch 10, batch 150, loss[loss=0.2355, ctc_loss=0.1499, cr_loss=0.3779, attn_decoder_loss=0.2366, over 29432.00 frames. ], tot_loss[loss=0.2701, ctc_loss=0.1777, cr_loss=0.4144, attn_decoder_loss=0.2712, over 3048279.39 frames. 
], batch size: 70, lr: 1.15e-02, grad_scale: 4.0 2024-09-17 06:18:40,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=163500.0, ans=0.125 2024-09-17 06:19:03,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=163540.0, ans=0.125 2024-09-17 06:19:04,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=163540.0, ans=0.0 2024-09-17 06:19:25,266 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 9.231e+01 9.712e+01 1.046e+02 1.496e+02, threshold=1.942e+02, percent-clipped=0.0 2024-09-17 06:19:30,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=163620.0, ans=0.1 2024-09-17 06:19:39,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=163660.0, ans=0.025 2024-09-17 06:19:52,318 INFO [train.py:1198] (0/2) Epoch 10, batch 200, loss[loss=0.2928, ctc_loss=0.2045, cr_loss=0.461, attn_decoder_loss=0.2924, over 27488.00 frames. ], tot_loss[loss=0.2687, ctc_loss=0.1762, cr_loss=0.4134, attn_decoder_loss=0.2698, over 3660246.32 frames. ], batch size: 124, lr: 1.15e-02, grad_scale: 8.0 2024-09-17 06:20:00,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.30 vs. limit=22.5 2024-09-17 06:20:04,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=163700.0, ans=0.0 2024-09-17 06:20:05,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.97 vs. 
limit=22.5 2024-09-17 06:20:20,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=163780.0, ans=0.125 2024-09-17 06:20:26,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=22.5 2024-09-17 06:20:27,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=163780.0, ans=0.125 2024-09-17 06:20:33,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=163780.0, ans=0.0 2024-09-17 06:20:41,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=163820.0, ans=0.0 2024-09-17 06:20:54,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=163820.0, ans=0.1 2024-09-17 06:20:57,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=163860.0, ans=0.125 2024-09-17 06:21:11,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=163900.0, ans=0.125 2024-09-17 06:21:12,289 INFO [train.py:1198] (0/2) Epoch 10, batch 250, loss[loss=0.2986, ctc_loss=0.208, cr_loss=0.4642, attn_decoder_loss=0.2983, over 29258.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1764, cr_loss=0.4129, attn_decoder_loss=0.2701, over 4142685.48 frames. 
], batch size: 100, lr: 1.15e-02, grad_scale: 4.0
2024-09-17 06:21:26,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=163940.0, ans=0.07
2024-09-17 06:21:44,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=163980.0, ans=0.125
2024-09-17 06:21:44,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=163980.0, ans=0.0
2024-09-17 06:21:49,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=163980.0, ans=0.04949747468305833
2024-09-17 06:21:50,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=163980.0, ans=0.0
2024-09-17 06:22:02,587 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.245e+01 9.493e+01 1.020e+02 1.129e+02 1.613e+02, threshold=2.040e+02, percent-clipped=0.0
2024-09-17 06:22:10,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=164020.0, ans=0.025
2024-09-17 06:22:28,437 INFO [train.py:1198] (0/2) Epoch 10, batch 300, loss[loss=0.2778, ctc_loss=0.183, cr_loss=0.4143, attn_decoder_loss=0.2791, over 29545.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.1752, cr_loss=0.411, attn_decoder_loss=0.2695, over 4510690.87 frames. ], batch size: 92, lr: 1.15e-02, grad_scale: 8.0
2024-09-17 06:22:30,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=164100.0, ans=0.125
2024-09-17 06:22:41,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.34 vs. limit=10.0
2024-09-17 06:22:53,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=164140.0, ans=0.125
2024-09-17 06:23:29,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=164260.0, ans=0.0
2024-09-17 06:23:35,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=164260.0, ans=0.125
2024-09-17 06:23:35,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=164260.0, ans=0.2
2024-09-17 06:23:44,702 INFO [train.py:1198] (0/2) Epoch 10, batch 350, loss[loss=0.2364, ctc_loss=0.143, cr_loss=0.3499, attn_decoder_loss=0.2391, over 29332.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.1751, cr_loss=0.4108, attn_decoder_loss=0.2695, over 4796276.05 frames. ], batch size: 71, lr: 1.15e-02, grad_scale: 8.0
2024-09-17 06:23:58,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=164340.0, ans=0.125
2024-09-17 06:24:06,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=164340.0, ans=0.5
2024-09-17 06:24:24,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=164380.0, ans=0.1
2024-09-17 06:24:25,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=164380.0, ans=0.2
2024-09-17 06:24:28,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=164420.0, ans=0.125
2024-09-17 06:24:34,757 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.056e+01 9.667e+01 1.045e+02 1.260e+02 3.351e+02, threshold=2.090e+02, percent-clipped=2.0
2024-09-17 06:25:05,711 INFO [train.py:1198] (0/2) Epoch 10, batch 400, loss[loss=0.2789, ctc_loss=0.184, cr_loss=0.4218, attn_decoder_loss=0.28, over 29707.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1747, cr_loss=0.4097, attn_decoder_loss=0.2691, over 5026691.05 frames. ], batch size: 82, lr: 1.15e-02, grad_scale: 16.0
2024-09-17 06:25:30,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.33 vs. limit=22.5
2024-09-17 06:25:35,354 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.03 vs. limit=15.0
2024-09-17 06:25:58,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=164620.0, ans=0.125
2024-09-17 06:26:20,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=164700.0, ans=0.2
2024-09-17 06:26:21,196 INFO [train.py:1198] (0/2) Epoch 10, batch 450, loss[loss=0.2663, ctc_loss=0.1682, cr_loss=0.4074, attn_decoder_loss=0.2682, over 29693.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.175, cr_loss=0.4094, attn_decoder_loss=0.2694, over 5187681.68 frames. ], batch size: 83, lr: 1.15e-02, grad_scale: 8.0
2024-09-17 06:26:52,360 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0
2024-09-17 06:27:13,194 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 9.295e+01 1.006e+02 1.063e+02 1.826e+02, threshold=2.013e+02, percent-clipped=0.0
2024-09-17 06:27:31,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=164860.0, ans=0.125
2024-09-17 06:27:31,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=164860.0, ans=0.125
2024-09-17 06:27:36,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0
2024-09-17 06:27:37,052 INFO [train.py:1198] (0/2) Epoch 10, batch 500, loss[loss=0.276, ctc_loss=0.1696, cr_loss=0.4105, attn_decoder_loss=0.2787, over 29439.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1744, cr_loss=0.4088, attn_decoder_loss=0.2687, over 5330404.34 frames. ], batch size: 94, lr: 1.15e-02, grad_scale: 8.0
2024-09-17 06:27:39,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.07 vs. limit=15.0
2024-09-17 06:28:00,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.21 vs. limit=22.5
2024-09-17 06:28:01,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=164940.0, ans=0.04949747468305833
2024-09-17 06:28:03,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=164940.0, ans=0.1
2024-09-17 06:28:15,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=164980.0, ans=0.1
2024-09-17 06:28:57,306 INFO [train.py:1198] (0/2) Epoch 10, batch 550, loss[loss=0.2833, ctc_loss=0.1884, cr_loss=0.4299, attn_decoder_loss=0.2843, over 28800.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1751, cr_loss=0.4105, attn_decoder_loss=0.2691, over 5424215.36 frames. ], batch size: 104, lr: 1.15e-02, grad_scale: 8.0
2024-09-17 06:29:09,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=165100.0, ans=0.0
2024-09-17 06:29:20,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=165140.0, ans=0.125
2024-09-17 06:29:43,457 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0
2024-09-17 06:29:50,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=165220.0, ans=0.1
2024-09-17 06:29:51,813 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.131e+01 9.579e+01 1.029e+02 1.127e+02 2.367e+02, threshold=2.058e+02, percent-clipped=2.0
2024-09-17 06:29:57,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.57 vs. limit=15.0
2024-09-17 06:29:59,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=165260.0, ans=0.125
2024-09-17 06:30:00,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.41 vs. limit=15.0
2024-09-17 06:30:13,110 INFO [train.py:1198] (0/2) Epoch 10, batch 600, loss[loss=0.2803, ctc_loss=0.1828, cr_loss=0.4353, attn_decoder_loss=0.2814, over 29246.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1754, cr_loss=0.4114, attn_decoder_loss=0.2697, over 5510033.08 frames. ], batch size: 100, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:30:13,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=165300.0, ans=0.0
2024-09-17 06:30:29,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=165340.0, ans=0.125
2024-09-17 06:30:38,308 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0
2024-09-17 06:30:39,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=165340.0, ans=0.2
2024-09-17 06:31:03,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=1.99 vs. limit=15.0
2024-09-17 06:31:06,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.51 vs. limit=15.0
2024-09-17 06:31:09,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=165420.0, ans=0.125
2024-09-17 06:31:19,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=165460.0, ans=0.0
2024-09-17 06:31:23,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=165460.0, ans=0.0
2024-09-17 06:31:27,698 INFO [train.py:1198] (0/2) Epoch 10, batch 650, loss[loss=0.2651, ctc_loss=0.17, cr_loss=0.4241, attn_decoder_loss=0.2662, over 29751.00 frames. ], tot_loss[loss=0.2673, ctc_loss=0.174, cr_loss=0.4102, attn_decoder_loss=0.2686, over 5586982.39 frames. ], batch size: 81, lr: 1.14e-02, grad_scale: 4.0
2024-09-17 06:31:28,457 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0
2024-09-17 06:31:56,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=165580.0, ans=0.09899494936611666
2024-09-17 06:31:58,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=165580.0, ans=0.125
2024-09-17 06:32:12,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=165620.0, ans=0.1
2024-09-17 06:32:12,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.08 vs. limit=15.0
2024-09-17 06:32:23,906 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.175e+01 9.258e+01 9.852e+01 1.047e+02 1.585e+02, threshold=1.970e+02, percent-clipped=0.0
2024-09-17 06:32:35,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=165660.0, ans=0.125
2024-09-17 06:32:45,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=165660.0, ans=0.125
2024-09-17 06:32:47,996 INFO [train.py:1198] (0/2) Epoch 10, batch 700, loss[loss=0.2625, ctc_loss=0.1695, cr_loss=0.402, attn_decoder_loss=0.264, over 29539.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1741, cr_loss=0.4108, attn_decoder_loss=0.2688, over 5638100.27 frames. ], batch size: 76, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:33:02,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.08 vs. limit=15.0
2024-09-17 06:33:16,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=165780.0, ans=0.125
2024-09-17 06:33:18,348 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.502e-03
2024-09-17 06:33:19,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=165780.0, ans=0.2
2024-09-17 06:33:34,818 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 06:33:41,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.53 vs. limit=15.0
2024-09-17 06:33:47,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=165860.0, ans=0.125
2024-09-17 06:33:47,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.31 vs. limit=15.0
2024-09-17 06:33:53,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=165860.0, ans=0.125
2024-09-17 06:34:03,341 INFO [train.py:1198] (0/2) Epoch 10, batch 750, loss[loss=0.2693, ctc_loss=0.1734, cr_loss=0.4356, attn_decoder_loss=0.2703, over 29712.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.1733, cr_loss=0.4089, attn_decoder_loss=0.268, over 5676643.21 frames. ], batch size: 82, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:34:15,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0
2024-09-17 06:34:23,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=165940.0, ans=0.0
2024-09-17 06:34:34,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0
2024-09-17 06:34:39,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=165980.0, ans=0.125
2024-09-17 06:34:58,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=166020.0, ans=0.125
2024-09-17 06:35:00,790 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.819e+01 1.062e+02 1.153e+02 3.541e+02, threshold=2.124e+02, percent-clipped=2.0
2024-09-17 06:35:05,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=166060.0, ans=0.0
2024-09-17 06:35:11,660 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 06:35:18,940 INFO [train.py:1198] (0/2) Epoch 10, batch 800, loss[loss=0.2377, ctc_loss=0.1374, cr_loss=0.3501, attn_decoder_loss=0.241, over 29611.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1734, cr_loss=0.4093, attn_decoder_loss=0.2681, over 5708001.17 frames. ], batch size: 73, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:35:36,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0
2024-09-17 06:35:37,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=166140.0, ans=0.0
2024-09-17 06:36:01,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=166180.0, ans=0.025
2024-09-17 06:36:11,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=166220.0, ans=0.125
2024-09-17 06:36:36,139 INFO [train.py:1198] (0/2) Epoch 10, batch 850, loss[loss=0.2763, ctc_loss=0.1737, cr_loss=0.3967, attn_decoder_loss=0.2789, over 29710.00 frames. ], tot_loss[loss=0.2666, ctc_loss=0.1734, cr_loss=0.4089, attn_decoder_loss=0.2679, over 5735597.04 frames. ], batch size: 89, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:36:59,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=166340.0, ans=0.2
2024-09-17 06:37:02,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=166340.0, ans=0.125
2024-09-17 06:37:25,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=166420.0, ans=0.95
2024-09-17 06:37:36,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=166420.0, ans=0.125
2024-09-17 06:37:37,424 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.461e+01 9.790e+01 1.072e+02 1.196e+02 1.464e+02, threshold=2.145e+02, percent-clipped=0.0
2024-09-17 06:37:48,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=166460.0, ans=0.0
2024-09-17 06:37:54,165 INFO [train.py:1198] (0/2) Epoch 10, batch 900, loss[loss=0.2393, ctc_loss=0.1511, cr_loss=0.3744, attn_decoder_loss=0.2408, over 29616.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1739, cr_loss=0.4093, attn_decoder_loss=0.2682, over 5741004.98 frames. ], batch size: 73, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:38:26,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=166580.0, ans=0.0
2024-09-17 06:38:32,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0
2024-09-17 06:38:41,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=166620.0, ans=0.125
2024-09-17 06:38:52,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=166620.0, ans=22.5
2024-09-17 06:38:58,156 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0
2024-09-17 06:39:09,487 INFO [train.py:1198] (0/2) Epoch 10, batch 950, loss[loss=0.249, ctc_loss=0.1544, cr_loss=0.389, attn_decoder_loss=0.2509, over 29514.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1741, cr_loss=0.409, attn_decoder_loss=0.2681, over 5744250.11 frames. ], batch size: 74, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:39:18,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=166700.0, ans=0.125
2024-09-17 06:39:29,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=166740.0, ans=0.2
2024-09-17 06:39:47,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=166780.0, ans=0.125
2024-09-17 06:39:50,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=166780.0, ans=0.125
2024-09-17 06:39:58,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=166820.0, ans=0.0
2024-09-17 06:40:00,037 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.08 vs. limit=15.0
2024-09-17 06:40:05,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=166820.0, ans=0.125
2024-09-17 06:40:09,834 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.298e+01 9.770e+01 1.085e+02 1.240e+02 2.634e+02, threshold=2.170e+02, percent-clipped=2.0
2024-09-17 06:40:16,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0
2024-09-17 06:40:26,957 INFO [train.py:1198] (0/2) Epoch 10, batch 1000, loss[loss=0.2684, ctc_loss=0.1733, cr_loss=0.3976, attn_decoder_loss=0.2702, over 29530.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1758, cr_loss=0.4109, attn_decoder_loss=0.2693, over 5737324.30 frames. ], batch size: 77, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:40:28,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=166900.0, ans=0.04949747468305833
2024-09-17 06:40:35,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=166900.0, ans=0.2
2024-09-17 06:40:38,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=166900.0, ans=0.125
2024-09-17 06:41:01,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=166980.0, ans=0.125
2024-09-17 06:41:13,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=167020.0, ans=0.0
2024-09-17 06:41:33,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.37 vs. limit=15.0
2024-09-17 06:41:39,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=167060.0, ans=0.04949747468305833
2024-09-17 06:41:44,652 INFO [train.py:1198] (0/2) Epoch 10, batch 1050, loss[loss=0.2791, ctc_loss=0.1852, cr_loss=0.4329, attn_decoder_loss=0.2799, over 29683.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1746, cr_loss=0.4095, attn_decoder_loss=0.2683, over 5746408.54 frames. ], batch size: 85, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:42:03,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=167140.0, ans=0.125
2024-09-17 06:42:05,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=167140.0, ans=0.015
2024-09-17 06:42:26,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167180.0, ans=0.1
2024-09-17 06:42:45,691 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.318e+01 9.401e+01 9.855e+01 1.069e+02 2.033e+02, threshold=1.971e+02, percent-clipped=0.0
2024-09-17 06:42:46,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=15.0
2024-09-17 06:43:00,925 INFO [train.py:1198] (0/2) Epoch 10, batch 1100, loss[loss=0.2583, ctc_loss=0.1669, cr_loss=0.4136, attn_decoder_loss=0.2593, over 29433.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.1743, cr_loss=0.409, attn_decoder_loss=0.2679, over 5757530.38 frames. ], batch size: 78, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:43:04,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0
2024-09-17 06:43:06,123 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.10 vs. limit=22.5
2024-09-17 06:43:10,239 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 06:43:10,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=167300.0, ans=0.1
2024-09-17 06:43:11,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=167300.0, ans=0.2
2024-09-17 06:43:25,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167340.0, ans=0.1
2024-09-17 06:43:37,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=167380.0, ans=0.125
2024-09-17 06:43:43,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=167380.0, ans=0.0
2024-09-17 06:43:47,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=167420.0, ans=0.125
2024-09-17 06:44:05,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=167460.0, ans=0.0
2024-09-17 06:44:06,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=167460.0, ans=0.025
2024-09-17 06:44:18,500 INFO [train.py:1198] (0/2) Epoch 10, batch 1150, loss[loss=0.2651, ctc_loss=0.1718, cr_loss=0.3948, attn_decoder_loss=0.2667, over 29455.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1744, cr_loss=0.4096, attn_decoder_loss=0.268, over 5755220.74 frames. ], batch size: 78, lr: 1.14e-02, grad_scale: 4.0
2024-09-17 06:44:19,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.04 vs. limit=15.0
2024-09-17 06:44:34,848 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 06:44:37,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=167540.0, ans=0.0
2024-09-17 06:44:50,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=167580.0, ans=0.1
2024-09-17 06:45:22,896 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.006e+01 9.612e+01 1.039e+02 1.179e+02 2.688e+02, threshold=2.078e+02, percent-clipped=2.0
2024-09-17 06:45:26,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=167660.0, ans=0.125
2024-09-17 06:45:36,415 INFO [train.py:1198] (0/2) Epoch 10, batch 1200, loss[loss=0.2656, ctc_loss=0.1596, cr_loss=0.4013, attn_decoder_loss=0.2684, over 29671.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1752, cr_loss=0.4107, attn_decoder_loss=0.269, over 5749426.87 frames. ], batch size: 85, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:45:42,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=167700.0, ans=0.2
2024-09-17 06:45:43,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0
2024-09-17 06:46:51,948 INFO [train.py:1198] (0/2) Epoch 10, batch 1250, loss[loss=0.284, ctc_loss=0.1814, cr_loss=0.4177, attn_decoder_loss=0.2861, over 29540.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1751, cr_loss=0.411, attn_decoder_loss=0.2693, over 5776377.48 frames. ], batch size: 92, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:46:57,374 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.89 vs. limit=15.0
2024-09-17 06:46:59,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=167900.0, ans=0.125
2024-09-17 06:47:07,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=167940.0, ans=0.0
2024-09-17 06:47:13,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=167940.0, ans=0.1
2024-09-17 06:47:19,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=167940.0, ans=0.025
2024-09-17 06:47:26,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=167980.0, ans=0.125
2024-09-17 06:47:28,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=167980.0, ans=0.125
2024-09-17 06:47:49,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168020.0, ans=0.1
2024-09-17 06:47:51,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=168060.0, ans=0.95
2024-09-17 06:47:56,413 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.293e+01 9.363e+01 1.028e+02 1.124e+02 2.251e+02, threshold=2.057e+02, percent-clipped=1.0
2024-09-17 06:47:59,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=168060.0, ans=0.125
2024-09-17 06:48:02,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=168060.0, ans=0.125
2024-09-17 06:48:06,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0
2024-09-17 06:48:08,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=168100.0, ans=0.125
2024-09-17 06:48:09,973 INFO [train.py:1198] (0/2) Epoch 10, batch 1300, loss[loss=0.279, ctc_loss=0.182, cr_loss=0.4311, attn_decoder_loss=0.2802, over 28201.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1742, cr_loss=0.4102, attn_decoder_loss=0.2686, over 5779979.19 frames. ], batch size: 111, lr: 1.14e-02, grad_scale: 8.0
2024-09-17 06:48:22,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=168100.0, ans=0.125
2024-09-17 06:48:30,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.05 vs. limit=22.5
2024-09-17 06:48:57,127 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=12.0
2024-09-17 06:49:11,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=168260.0, ans=0.015
2024-09-17 06:49:20,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=168260.0, ans=0.125
2024-09-17 06:49:27,980 INFO [train.py:1198] (0/2) Epoch 10, batch 1350, loss[loss=0.2631, ctc_loss=0.1664, cr_loss=0.3978, attn_decoder_loss=0.265, over 29751.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1738, cr_loss=0.4097, attn_decoder_loss=0.2683, over 5796748.01 frames. ], batch size: 81, lr: 1.13e-02, grad_scale: 8.0
2024-09-17 06:49:44,710 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 06:50:21,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168420.0, ans=0.1
2024-09-17 06:50:29,347 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.801e+01 9.605e+01 1.036e+02 1.132e+02 1.597e+02, threshold=2.072e+02, percent-clipped=0.0
2024-09-17 06:50:38,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=168460.0, ans=0.125
2024-09-17 06:50:42,763 INFO [train.py:1198] (0/2) Epoch 10, batch 1400, loss[loss=0.2296, ctc_loss=0.147, cr_loss=0.3693, attn_decoder_loss=0.2306, over 29541.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.1732, cr_loss=0.4091, attn_decoder_loss=0.2677, over 5807534.98 frames. ], batch size: 69, lr: 1.13e-02, grad_scale: 8.0
2024-09-17 06:50:52,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=168500.0, ans=0.125
2024-09-17 06:51:20,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=168580.0, ans=0.125
2024-09-17 06:51:34,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=168620.0, ans=0.0
2024-09-17 06:51:40,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=168620.0, ans=0.125
2024-09-17 06:51:51,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=168660.0, ans=0.125
2024-09-17 06:52:00,286 INFO [train.py:1198] (0/2) Epoch 10, batch 1450, loss[loss=0.2806, ctc_loss=0.1788, cr_loss=0.4122, attn_decoder_loss=0.2828, over 29427.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.1732, cr_loss=0.4087, attn_decoder_loss=0.268, over 5803468.49 frames. ], batch size: 94, lr: 1.13e-02, grad_scale: 8.0
2024-09-17 06:52:00,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=168700.0, ans=0.2
2024-09-17 06:52:02,838 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=22.5
2024-09-17 06:52:18,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=168740.0, ans=0.125
2024-09-17 06:52:32,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=168780.0, ans=0.125
2024-09-17 06:52:48,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=168820.0, ans=0.125
2024-09-17 06:52:51,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. limit=10.0
2024-09-17 06:52:55,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=168820.0, ans=0.2
2024-09-17 06:52:58,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=168820.0, ans=0.0
2024-09-17 06:53:00,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=168820.0, ans=0.125
2024-09-17 06:53:06,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.081e+01 9.586e+01 1.053e+02 1.129e+02 3.740e+02, threshold=2.106e+02, percent-clipped=3.0
2024-09-17 06:53:16,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.89 vs. limit=22.5
2024-09-17 06:53:18,307 INFO [train.py:1198] (0/2) Epoch 10, batch 1500, loss[loss=0.2793, ctc_loss=0.18, cr_loss=0.4193, attn_decoder_loss=0.281, over 29642.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1736, cr_loss=0.4098, attn_decoder_loss=0.2687, over 5804705.95 frames. ], batch size: 86, lr: 1.13e-02, grad_scale: 8.0
2024-09-17 06:53:33,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=168940.0, ans=0.1
2024-09-17 06:53:39,002 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0
2024-09-17 06:53:43,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.07 vs. limit=6.0
2024-09-17 06:53:45,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=168940.0, ans=0.125
2024-09-17 06:53:50,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=168980.0, ans=0.125
2024-09-17 06:53:56,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=168980.0, ans=0.1
2024-09-17 06:53:58,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=168980.0, ans=0.2
2024-09-17 06:54:34,392 INFO [train.py:1198] (0/2) Epoch 10, batch 1550, loss[loss=0.2826, ctc_loss=0.1902, cr_loss=0.4478, attn_decoder_loss=0.2829, over 29522.00 frames. ], tot_loss[loss=0.268, ctc_loss=0.1749, cr_loss=0.411, attn_decoder_loss=0.2692, over 5780615.61 frames. ], batch size: 90, lr: 1.13e-02, grad_scale: 4.0
2024-09-17 06:54:45,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169100.0, ans=0.1
2024-09-17 06:54:56,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.24 vs. limit=22.5
2024-09-17 06:55:19,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=169220.0, ans=0.035
2024-09-17 06:55:22,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=169220.0, ans=0.09899494936611666
2024-09-17 06:55:39,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.97 vs.
limit=12.0 2024-09-17 06:55:41,101 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.001e+01 9.536e+01 1.067e+02 1.173e+02 2.612e+02, threshold=2.133e+02, percent-clipped=1.0 2024-09-17 06:55:44,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=169260.0, ans=0.2 2024-09-17 06:55:51,706 INFO [train.py:1198] (0/2) Epoch 10, batch 1600, loss[loss=0.2759, ctc_loss=0.1814, cr_loss=0.4237, attn_decoder_loss=0.2769, over 29680.00 frames. ], tot_loss[loss=0.268, ctc_loss=0.1752, cr_loss=0.4109, attn_decoder_loss=0.2691, over 5763623.98 frames. ], batch size: 85, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 06:55:55,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=169300.0, ans=0.125 2024-09-17 06:56:03,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.02 vs. limit=22.5 2024-09-17 06:56:04,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=169300.0, ans=0.0 2024-09-17 06:56:05,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=169340.0, ans=0.025 2024-09-17 06:56:13,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. limit=10.0 2024-09-17 06:56:16,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=169340.0, ans=0.04949747468305833 2024-09-17 06:57:09,454 INFO [train.py:1198] (0/2) Epoch 10, batch 1650, loss[loss=0.2772, ctc_loss=0.1736, cr_loss=0.4211, attn_decoder_loss=0.2793, over 29695.00 frames. ], tot_loss[loss=0.2675, ctc_loss=0.1745, cr_loss=0.4109, attn_decoder_loss=0.2687, over 5758589.64 frames. 
], batch size: 89, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 06:57:14,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=169500.0, ans=0.025 2024-09-17 06:57:36,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169540.0, ans=0.1 2024-09-17 06:57:44,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=169580.0, ans=0.04949747468305833 2024-09-17 06:57:49,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=169580.0, ans=0.125 2024-09-17 06:57:52,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=169580.0, ans=0.0 2024-09-17 06:58:14,708 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.189e+01 9.368e+01 9.821e+01 1.048e+02 1.434e+02, threshold=1.964e+02, percent-clipped=0.0 2024-09-17 06:58:21,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.52 vs. limit=15.0 2024-09-17 06:58:25,120 INFO [train.py:1198] (0/2) Epoch 10, batch 1700, loss[loss=0.23, ctc_loss=0.143, cr_loss=0.3621, attn_decoder_loss=0.2316, over 29551.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1738, cr_loss=0.4105, attn_decoder_loss=0.2682, over 5779936.06 frames. ], batch size: 69, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 06:58:48,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=169740.0, ans=0.125 2024-09-17 06:58:49,112 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. 
limit=6.0 2024-09-17 06:58:50,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.10 vs. limit=10.0 2024-09-17 06:59:05,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-09-17 06:59:15,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=169820.0, ans=0.0 2024-09-17 06:59:18,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=169820.0, ans=0.125 2024-09-17 06:59:24,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169860.0, ans=0.1 2024-09-17 06:59:31,479 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.09 vs. limit=22.5 2024-09-17 06:59:42,516 INFO [train.py:1198] (0/2) Epoch 10, batch 1750, loss[loss=0.2365, ctc_loss=0.143, cr_loss=0.3506, attn_decoder_loss=0.2391, over 29353.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.1732, cr_loss=0.4091, attn_decoder_loss=0.2678, over 5788056.87 frames. 
], batch size: 67, lr: 1.13e-02, grad_scale: 4.0 2024-09-17 06:59:56,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=169940.0, ans=0.125 2024-09-17 07:00:03,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=169940.0, ans=0.0 2024-09-17 07:00:10,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=169940.0, ans=0.125 2024-09-17 07:00:40,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=170020.0, ans=0.125 2024-09-17 07:00:43,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=170060.0, ans=0.0 2024-09-17 07:00:51,161 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.948e+01 9.325e+01 9.999e+01 1.093e+02 1.950e+02, threshold=2.000e+02, percent-clipped=0.0 2024-09-17 07:00:59,701 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2024-09-17 07:01:00,145 INFO [train.py:1198] (0/2) Epoch 10, batch 1800, loss[loss=0.285, ctc_loss=0.189, cr_loss=0.4399, attn_decoder_loss=0.2859, over 29694.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.173, cr_loss=0.4091, attn_decoder_loss=0.2678, over 5790303.28 frames. 
], batch size: 83, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 07:01:14,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=170140.0, ans=0.0 2024-09-17 07:01:21,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=170140.0, ans=0.2 2024-09-17 07:01:51,061 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.75 vs. limit=22.5 2024-09-17 07:02:15,924 INFO [train.py:1198] (0/2) Epoch 10, batch 1850, loss[loss=0.2777, ctc_loss=0.184, cr_loss=0.4301, attn_decoder_loss=0.2786, over 29647.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.173, cr_loss=0.4095, attn_decoder_loss=0.2677, over 5797165.17 frames. ], batch size: 86, lr: 1.13e-02, grad_scale: 4.0 2024-09-17 07:02:19,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=170300.0, ans=0.125 2024-09-17 07:02:34,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=170340.0, ans=0.1 2024-09-17 07:02:55,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=170380.0, ans=0.125 2024-09-17 07:03:03,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=170420.0, ans=0.0 2024-09-17 07:03:03,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=170420.0, ans=0.0 2024-09-17 07:03:10,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=170420.0, ans=0.125 2024-09-17 07:03:11,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, 
num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2024-09-17 07:03:18,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.29 vs. limit=15.0 2024-09-17 07:03:26,159 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.871e+01 9.382e+01 1.051e+02 1.159e+02 3.606e+02, threshold=2.101e+02, percent-clipped=3.0 2024-09-17 07:03:33,540 INFO [train.py:1198] (0/2) Epoch 10, batch 1900, loss[loss=0.2805, ctc_loss=0.174, cr_loss=0.4282, attn_decoder_loss=0.2828, over 29684.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1734, cr_loss=0.4097, attn_decoder_loss=0.2685, over 5805251.83 frames. ], batch size: 89, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 07:03:41,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=170500.0, ans=0.125 2024-09-17 07:03:45,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=170500.0, ans=0.125 2024-09-17 07:04:15,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=170580.0, ans=0.0 2024-09-17 07:04:15,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=170580.0, ans=0.0 2024-09-17 07:04:15,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. 
limit=6.0 2024-09-17 07:04:38,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=170660.0, ans=0.125 2024-09-17 07:04:41,386 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:04:42,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=170660.0, ans=0.125 2024-09-17 07:04:51,547 INFO [train.py:1198] (0/2) Epoch 10, batch 1950, loss[loss=0.273, ctc_loss=0.1781, cr_loss=0.4351, attn_decoder_loss=0.2739, over 29456.00 frames. ], tot_loss[loss=0.2683, ctc_loss=0.174, cr_loss=0.4115, attn_decoder_loss=0.2696, over 5819929.42 frames. ], batch size: 78, lr: 1.13e-02, grad_scale: 4.0 2024-09-17 07:04:59,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=170700.0, ans=0.125 2024-09-17 07:05:09,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. 
limit=12.0 2024-09-17 07:05:34,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170780.0, ans=0.1 2024-09-17 07:05:38,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=170820.0, ans=0.025 2024-09-17 07:05:44,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=170820.0, ans=0.95 2024-09-17 07:06:00,806 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.428e+01 1.003e+02 1.077e+02 1.161e+02 3.833e+02, threshold=2.155e+02, percent-clipped=2.0 2024-09-17 07:06:04,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=170860.0, ans=0.2 2024-09-17 07:06:06,876 INFO [train.py:1198] (0/2) Epoch 10, batch 2000, loss[loss=0.2585, ctc_loss=0.1794, cr_loss=0.4195, attn_decoder_loss=0.2579, over 29377.00 frames. ], tot_loss[loss=0.2689, ctc_loss=0.1751, cr_loss=0.4127, attn_decoder_loss=0.2702, over 5797252.74 frames. 
], batch size: 67, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 07:06:17,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=170900.0, ans=0.0 2024-09-17 07:06:28,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=170940.0, ans=0.0 2024-09-17 07:06:30,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=170940.0, ans=22.5 2024-09-17 07:06:31,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=170940.0, ans=0.125 2024-09-17 07:06:32,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=170940.0, ans=0.125 2024-09-17 07:07:14,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=171060.0, ans=0.1 2024-09-17 07:07:24,546 INFO [train.py:1198] (0/2) Epoch 10, batch 2050, loss[loss=0.2514, ctc_loss=0.1674, cr_loss=0.4111, attn_decoder_loss=0.2516, over 29458.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1746, cr_loss=0.4116, attn_decoder_loss=0.2694, over 5788791.56 frames. 
], batch size: 70, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 07:07:30,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=171100.0, ans=10.0 2024-09-17 07:07:39,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=171140.0, ans=0.125 2024-09-17 07:07:39,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=171140.0, ans=0.2 2024-09-17 07:07:46,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2024-09-17 07:07:49,602 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.18 vs. limit=12.0 2024-09-17 07:07:56,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=171180.0, ans=0.125 2024-09-17 07:08:04,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=171180.0, ans=0.2 2024-09-17 07:08:05,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=171180.0, ans=0.0 2024-09-17 07:08:24,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=171220.0, ans=0.125 2024-09-17 07:08:37,667 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.251e+01 9.516e+01 1.015e+02 1.092e+02 1.956e+02, threshold=2.031e+02, percent-clipped=0.0 2024-09-17 07:08:42,365 INFO [train.py:1198] (0/2) Epoch 10, batch 2100, loss[loss=0.2579, ctc_loss=0.1584, cr_loss=0.3974, attn_decoder_loss=0.2601, over 29760.00 frames. 
], tot_loss[loss=0.2674, ctc_loss=0.1736, cr_loss=0.4109, attn_decoder_loss=0.2687, over 5801160.98 frames. ], batch size: 81, lr: 1.13e-02, grad_scale: 8.0 2024-09-17 07:09:03,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=171340.0, ans=0.5 2024-09-17 07:09:08,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=171340.0, ans=0.125 2024-09-17 07:09:09,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=171340.0, ans=0.2 2024-09-17 07:09:19,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=171380.0, ans=0.0 2024-09-17 07:09:29,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=171420.0, ans=0.0 2024-09-17 07:09:31,271 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2024-09-17 07:09:32,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=171420.0, ans=0.0 2024-09-17 07:09:57,457 INFO [train.py:1198] (0/2) Epoch 10, batch 2150, loss[loss=0.2641, ctc_loss=0.1767, cr_loss=0.4131, attn_decoder_loss=0.2646, over 29454.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1731, cr_loss=0.4098, attn_decoder_loss=0.2681, over 5815222.25 frames. ], batch size: 78, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:10:02,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.78 vs. 
limit=15.0 2024-09-17 07:10:05,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=171500.0, ans=0.125 2024-09-17 07:10:05,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=171500.0, ans=0.0 2024-09-17 07:10:06,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=171500.0, ans=0.0 2024-09-17 07:10:08,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=171500.0, ans=0.125 2024-09-17 07:10:13,163 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.59 vs. limit=12.0 2024-09-17 07:10:50,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=171620.0, ans=0.2 2024-09-17 07:10:52,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=171620.0, ans=0.125 2024-09-17 07:11:11,933 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.141e+01 9.514e+01 1.011e+02 1.070e+02 3.193e+02, threshold=2.022e+02, percent-clipped=1.0 2024-09-17 07:11:15,096 INFO [train.py:1198] (0/2) Epoch 10, batch 2200, loss[loss=0.2639, ctc_loss=0.1622, cr_loss=0.3928, attn_decoder_loss=0.2665, over 29626.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1732, cr_loss=0.409, attn_decoder_loss=0.2682, over 5811576.17 frames. 
], batch size: 86, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:11:39,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=171740.0, ans=0.05 2024-09-17 07:11:43,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=171780.0, ans=0.125 2024-09-17 07:11:51,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=171780.0, ans=0.07 2024-09-17 07:11:55,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=22.5 2024-09-17 07:11:59,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.85 vs. limit=15.0 2024-09-17 07:12:00,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=171820.0, ans=0.0 2024-09-17 07:12:19,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=171860.0, ans=0.125 2024-09-17 07:12:32,669 INFO [train.py:1198] (0/2) Epoch 10, batch 2250, loss[loss=0.2703, ctc_loss=0.1673, cr_loss=0.4118, attn_decoder_loss=0.2726, over 29703.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1728, cr_loss=0.4086, attn_decoder_loss=0.2683, over 5810293.80 frames. ], batch size: 82, lr: 1.12e-02, grad_scale: 4.0 2024-09-17 07:12:45,733 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.46 vs. limit=10.0 2024-09-17 07:13:00,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. 
limit=6.0 2024-09-17 07:13:04,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=171980.0, ans=0.0 2024-09-17 07:13:06,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=171980.0, ans=0.125 2024-09-17 07:13:12,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=171980.0, ans=0.09899494936611666 2024-09-17 07:13:26,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=172020.0, ans=10.0 2024-09-17 07:13:38,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2024-09-17 07:13:42,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=172060.0, ans=0.0 2024-09-17 07:13:46,983 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.071e+01 9.786e+01 1.069e+02 1.181e+02 2.871e+02, threshold=2.139e+02, percent-clipped=1.0 2024-09-17 07:13:48,478 INFO [train.py:1198] (0/2) Epoch 10, batch 2300, loss[loss=0.2368, ctc_loss=0.1357, cr_loss=0.3582, attn_decoder_loss=0.2401, over 29305.00 frames. ], tot_loss[loss=0.2661, ctc_loss=0.1722, cr_loss=0.4075, attn_decoder_loss=0.2674, over 5797021.43 frames. ], batch size: 71, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:13:55,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.99 vs. 
limit=12.0 2024-09-17 07:14:29,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=172180.0, ans=0.125 2024-09-17 07:14:44,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=172220.0, ans=0.0 2024-09-17 07:15:05,736 INFO [train.py:1198] (0/2) Epoch 10, batch 2350, loss[loss=0.2705, ctc_loss=0.1757, cr_loss=0.4174, attn_decoder_loss=0.2718, over 29692.00 frames. ], tot_loss[loss=0.2661, ctc_loss=0.172, cr_loss=0.4078, attn_decoder_loss=0.2674, over 5803581.47 frames. ], batch size: 83, lr: 1.12e-02, grad_scale: 8.0 2024-09-17 07:15:14,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=172300.0, ans=0.2 2024-09-17 07:15:36,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.59 vs. limit=22.5 2024-09-17 07:16:00,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=172420.0, ans=0.125 2024-09-17 07:16:06,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0 2024-09-17 07:16:10,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=172460.0, ans=0.125 2024-09-17 07:16:21,797 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 9.247e+01 1.012e+02 1.078e+02 1.616e+02, threshold=2.023e+02, percent-clipped=0.0 2024-09-17 07:16:23,396 INFO [train.py:1198] (0/2) Epoch 10, batch 2400, loss[loss=0.2544, ctc_loss=0.1613, cr_loss=0.4175, attn_decoder_loss=0.2555, over 29543.00 frames. 
], tot_loss[loss=0.2664, ctc_loss=0.1722, cr_loss=0.4084, attn_decoder_loss=0.2678, over 5808037.13 frames. ], batch size: 76, lr: 1.12e-02, grad_scale: 16.0 2024-09-17 07:16:37,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=172540.0, ans=0.125 2024-09-17 07:16:54,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=172580.0, ans=0.2 2024-09-17 07:17:10,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=172620.0, ans=0.05 2024-09-17 07:17:13,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=172620.0, ans=0.025 2024-09-17 07:17:21,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=172620.0, ans=0.025 2024-09-17 07:17:36,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=172660.0, ans=0.125 2024-09-17 07:17:38,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.66 vs. limit=15.0 2024-09-17 07:17:39,188 INFO [train.py:1198] (0/2) Epoch 10, batch 2450, loss[loss=0.2709, ctc_loss=0.1764, cr_loss=0.4125, attn_decoder_loss=0.2722, over 29685.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.1735, cr_loss=0.4101, attn_decoder_loss=0.269, over 5783643.26 frames. 
], batch size: 82, lr: 1.12e-02, grad_scale: 8.0
2024-09-17 07:17:50,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=172700.0, ans=0.125
2024-09-17 07:18:11,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=172780.0, ans=0.125
2024-09-17 07:18:17,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=172780.0, ans=0.2
2024-09-17 07:18:39,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=172860.0, ans=0.2
2024-09-17 07:18:53,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=172860.0, ans=0.125
2024-09-17 07:18:57,667 INFO [train.py:1198] (0/2) Epoch 10, batch 2500, loss[loss=0.2782, ctc_loss=0.18, cr_loss=0.4443, attn_decoder_loss=0.2793, over 29613.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.1735, cr_loss=0.4111, attn_decoder_loss=0.2689, over 5794131.21 frames. ], batch size: 86, lr: 1.12e-02, grad_scale: 8.0
2024-09-17 07:18:59,187 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.164e+01 9.501e+01 9.966e+01 1.113e+02 2.388e+02, threshold=1.993e+02, percent-clipped=1.0
2024-09-17 07:19:03,027 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.84 vs. limit=15.0
2024-09-17 07:19:04,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=172900.0, ans=0.125
2024-09-17 07:19:25,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=172940.0, ans=0.125
2024-09-17 07:19:36,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.03 vs. limit=15.0
2024-09-17 07:19:52,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173020.0, ans=0.1
2024-09-17 07:20:15,466 INFO [train.py:1198] (0/2) Epoch 10, batch 2550, loss[loss=0.2319, ctc_loss=0.1475, cr_loss=0.3615, attn_decoder_loss=0.2332, over 29388.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.1734, cr_loss=0.4106, attn_decoder_loss=0.2689, over 5799143.15 frames. ], batch size: 67, lr: 1.12e-02, grad_scale: 4.0
2024-09-17 07:20:17,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=173100.0, ans=0.125
2024-09-17 07:20:41,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=173140.0, ans=0.025
2024-09-17 07:20:47,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=173180.0, ans=0.125
2024-09-17 07:21:14,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=173260.0, ans=0.125
2024-09-17 07:21:19,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=173260.0, ans=0.125
2024-09-17 07:21:30,618 INFO [train.py:1198] (0/2) Epoch 10, batch 2600, loss[loss=0.256, ctc_loss=0.1642, cr_loss=0.4094, attn_decoder_loss=0.2571, over 29431.00 frames. ], tot_loss[loss=0.2677, ctc_loss=0.1733, cr_loss=0.4101, attn_decoder_loss=0.2691, over 5794678.32 frames. ], batch size: 78, lr: 1.12e-02, grad_scale: 8.0
2024-09-17 07:21:33,522 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.476e+01 9.594e+01 1.032e+02 1.139e+02 3.672e+02, threshold=2.065e+02, percent-clipped=4.0
2024-09-17 07:21:36,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.33 vs. limit=15.0
2024-09-17 07:21:39,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=173300.0, ans=0.0
2024-09-17 07:21:48,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=173340.0, ans=0.0
2024-09-17 07:21:50,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=173340.0, ans=0.5
2024-09-17 07:22:18,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=173420.0, ans=0.125
2024-09-17 07:22:33,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=173460.0, ans=0.2
2024-09-17 07:22:47,840 INFO [train.py:1198] (0/2) Epoch 10, batch 2650, loss[loss=0.2768, ctc_loss=0.1885, cr_loss=0.4342, attn_decoder_loss=0.277, over 29312.00 frames. ], tot_loss[loss=0.2678, ctc_loss=0.1737, cr_loss=0.4109, attn_decoder_loss=0.2692, over 5800667.47 frames. ], batch size: 100, lr: 1.12e-02, grad_scale: 8.0
2024-09-17 07:22:58,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=173500.0, ans=0.125
2024-09-17 07:23:03,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=173540.0, ans=0.0
2024-09-17 07:23:40,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=173620.0, ans=0.0
2024-09-17 07:24:03,239 INFO [train.py:1198] (0/2) Epoch 10, batch 2700, loss[loss=0.2817, ctc_loss=0.184, cr_loss=0.4259, attn_decoder_loss=0.2831, over 29513.00 frames. ], tot_loss[loss=0.2679, ctc_loss=0.1734, cr_loss=0.4108, attn_decoder_loss=0.2692, over 5796376.80 frames. ], batch size: 87, lr: 1.12e-02, grad_scale: 8.0
2024-09-17 07:24:08,381 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 9.630e+01 1.023e+02 1.091e+02 1.557e+02, threshold=2.045e+02, percent-clipped=0.0
2024-09-17 07:24:31,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173740.0, ans=0.1
2024-09-17 07:24:34,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=173780.0, ans=0.0
2024-09-17 07:25:15,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=173860.0, ans=0.04949747468305833
2024-09-17 07:25:17,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173860.0, ans=0.1
2024-09-17 07:25:21,533 INFO [train.py:1198] (0/2) Epoch 10, batch 2750, loss[loss=0.2622, ctc_loss=0.1689, cr_loss=0.4056, attn_decoder_loss=0.2635, over 29548.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.172, cr_loss=0.409, attn_decoder_loss=0.2679, over 5794471.40 frames. ], batch size: 75, lr: 1.12e-02, grad_scale: 8.0
2024-09-17 07:25:32,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=173900.0, ans=0.0
2024-09-17 07:25:35,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff2.min_abs, batch_count=173940.0, ans=0.1
2024-09-17 07:25:41,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=173940.0, ans=6.0
2024-09-17 07:25:54,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=173980.0, ans=0.125
2024-09-17 07:25:56,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=173980.0, ans=0.025
2024-09-17 07:26:05,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=174020.0, ans=0.0
2024-09-17 07:26:27,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=174060.0, ans=0.125
2024-09-17 07:26:36,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=174060.0, ans=0.0
2024-09-17 07:26:39,170 INFO [train.py:1198] (0/2) Epoch 10, batch 2800, loss[loss=0.3036, ctc_loss=0.2314, cr_loss=0.4154, attn_decoder_loss=0.3024, over 20239.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1724, cr_loss=0.4098, attn_decoder_loss=0.2683, over 5775947.76 frames. ], batch size: 209, lr: 1.12e-02, grad_scale: 16.0
2024-09-17 07:26:43,478 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.595e+01 1.029e+02 1.148e+02 1.291e+02 2.335e+02, threshold=2.295e+02, percent-clipped=2.0
2024-09-17 07:26:57,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.39 vs. limit=12.0
2024-09-17 07:27:13,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0
2024-09-17 07:27:31,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0
2024-09-17 07:27:38,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=174260.0, ans=0.125
2024-09-17 07:27:42,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=174260.0, ans=0.025
2024-09-17 07:27:54,278 INFO [train.py:1198] (0/2) Epoch 10, batch 2850, loss[loss=0.2686, ctc_loss=0.1743, cr_loss=0.3776, attn_decoder_loss=0.2707, over 29485.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1732, cr_loss=0.4103, attn_decoder_loss=0.2685, over 5761410.03 frames. ], batch size: 77, lr: 1.12e-02, grad_scale: 8.0
2024-09-17 07:28:19,836 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.35 vs. limit=15.0
2024-09-17 07:28:23,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=174340.0, ans=0.125
2024-09-17 07:28:51,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=174420.0, ans=0.2
2024-09-17 07:29:11,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=174500.0, ans=0.025
2024-09-17 07:29:12,372 INFO [train.py:1198] (0/2) Epoch 10, batch 2900, loss[loss=0.2494, ctc_loss=0.1502, cr_loss=0.3791, attn_decoder_loss=0.252, over 29439.00 frames. ], tot_loss[loss=0.2684, ctc_loss=0.174, cr_loss=0.4122, attn_decoder_loss=0.2698, over 5786168.27 frames. ], batch size: 79, lr: 1.12e-02, grad_scale: 8.0
2024-09-17 07:29:18,301 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.890e+01 9.394e+01 1.011e+02 1.079e+02 3.902e+02, threshold=2.022e+02, percent-clipped=2.0
2024-09-17 07:29:27,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.08 vs. limit=22.5
2024-09-17 07:29:50,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=174580.0, ans=0.125
2024-09-17 07:29:56,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=174620.0, ans=0.025
2024-09-17 07:30:01,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=174620.0, ans=0.0
2024-09-17 07:30:18,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=174660.0, ans=0.95
2024-09-17 07:30:27,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=174660.0, ans=0.0
2024-09-17 07:30:30,071 INFO [train.py:1198] (0/2) Epoch 10, batch 2950, loss[loss=0.253, ctc_loss=0.1604, cr_loss=0.388, attn_decoder_loss=0.2547, over 29530.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1724, cr_loss=0.4089, attn_decoder_loss=0.2683, over 5781632.00 frames. ], batch size: 75, lr: 1.11e-02, grad_scale: 4.0
2024-09-17 07:30:30,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=174700.0, ans=0.5
2024-09-17 07:30:59,698 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0
2024-09-17 07:31:15,043 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.73 vs. limit=15.0
2024-09-17 07:31:24,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.26 vs. limit=15.0
2024-09-17 07:31:28,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=174820.0, ans=0.125
2024-09-17 07:31:46,173 INFO [train.py:1198] (0/2) Epoch 10, batch 3000, loss[loss=0.2705, ctc_loss=0.1691, cr_loss=0.4437, attn_decoder_loss=0.2719, over 29761.00 frames. ], tot_loss[loss=0.267, ctc_loss=0.1727, cr_loss=0.4091, attn_decoder_loss=0.2684, over 5782139.92 frames. ], batch size: 81, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:31:46,173 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 07:32:05,336 INFO [train.py:1230] (0/2) Epoch 10, validation: loss=0.2137, ctc_loss=0.04855, cr_loss=4.713e-15, attn_decoder_loss=0.232, over 944034.00 frames.
2024-09-17 07:32:05,337 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-17 07:32:06,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0
2024-09-17 07:32:14,596 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.331e+01 9.561e+01 1.037e+02 1.121e+02 2.530e+02, threshold=2.075e+02, percent-clipped=2.0
2024-09-17 07:32:52,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=175020.0, ans=0.125
2024-09-17 07:33:00,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=175020.0, ans=0.07
2024-09-17 07:33:01,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=175020.0, ans=0.025
2024-09-17 07:33:04,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=175060.0, ans=0.125
2024-09-17 07:33:07,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=175060.0, ans=0.0
2024-09-17 07:33:07,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=175060.0, ans=0.0
2024-09-17 07:33:16,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=175060.0, ans=0.95
2024-09-17 07:33:20,910 INFO [train.py:1198] (0/2) Epoch 10, batch 3050, loss[loss=0.2519, ctc_loss=0.1623, cr_loss=0.4096, attn_decoder_loss=0.2527, over 29535.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.1735, cr_loss=0.4107, attn_decoder_loss=0.269, over 5775788.27 frames. ], batch size: 76, lr: 1.11e-02, grad_scale: 4.0
2024-09-17 07:33:30,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=175100.0, ans=0.0
2024-09-17 07:33:55,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175180.0, ans=0.1
2024-09-17 07:34:12,444 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 07:34:19,001 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.01 vs. limit=15.0
2024-09-17 07:34:36,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=175260.0, ans=0.2
2024-09-17 07:34:39,150 INFO [train.py:1198] (0/2) Epoch 10, batch 3100, loss[loss=0.2929, ctc_loss=0.1933, cr_loss=0.449, attn_decoder_loss=0.294, over 29285.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.173, cr_loss=0.4099, attn_decoder_loss=0.2686, over 5776765.80 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:34:39,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=175300.0, ans=0.2
2024-09-17 07:34:48,269 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.394e+01 9.542e+01 1.021e+02 1.174e+02 1.946e+02, threshold=2.041e+02, percent-clipped=0.0
2024-09-17 07:34:48,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=175300.0, ans=0.2
2024-09-17 07:34:50,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=175300.0, ans=0.125
2024-09-17 07:34:57,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=175340.0, ans=0.07
2024-09-17 07:35:03,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=175340.0, ans=0.025
2024-09-17 07:35:21,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=175380.0, ans=0.1
2024-09-17 07:35:39,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.33 vs. limit=22.5
2024-09-17 07:35:57,191 INFO [train.py:1198] (0/2) Epoch 10, batch 3150, loss[loss=0.274, ctc_loss=0.175, cr_loss=0.4064, attn_decoder_loss=0.2759, over 28817.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1725, cr_loss=0.4093, attn_decoder_loss=0.2683, over 5783188.77 frames. ], batch size: 104, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:36:00,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=175500.0, ans=0.1
2024-09-17 07:36:21,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=175540.0, ans=10.0
2024-09-17 07:36:30,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=175580.0, ans=0.0
2024-09-17 07:36:32,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=175580.0, ans=0.1
2024-09-17 07:36:51,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0
2024-09-17 07:37:12,407 INFO [train.py:1198] (0/2) Epoch 10, batch 3200, loss[loss=0.2567, ctc_loss=0.1622, cr_loss=0.3981, attn_decoder_loss=0.2583, over 29402.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.1723, cr_loss=0.4094, attn_decoder_loss=0.2681, over 5794011.20 frames. ], batch size: 79, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:37:24,405 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.118e+01 9.421e+01 9.970e+01 1.120e+02 1.872e+02, threshold=1.994e+02, percent-clipped=0.0
2024-09-17 07:37:29,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=175740.0, ans=0.1
2024-09-17 07:37:45,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=175780.0, ans=0.2
2024-09-17 07:37:49,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=175780.0, ans=0.0
2024-09-17 07:38:10,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=175820.0, ans=0.0
2024-09-17 07:38:30,173 INFO [train.py:1198] (0/2) Epoch 10, batch 3250, loss[loss=0.2747, ctc_loss=0.1745, cr_loss=0.4224, attn_decoder_loss=0.2764, over 29712.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1728, cr_loss=0.4103, attn_decoder_loss=0.2686, over 5800208.71 frames. ], batch size: 84, lr: 1.11e-02, grad_scale: 4.0
2024-09-17 07:38:30,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=175900.0, ans=0.09899494936611666
2024-09-17 07:38:46,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.57 vs. limit=10.0
2024-09-17 07:39:06,443 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-44000.pt
2024-09-17 07:39:19,569 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 07:39:37,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=176060.0, ans=0.0
2024-09-17 07:39:45,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=176060.0, ans=0.07
2024-09-17 07:39:54,677 INFO [train.py:1198] (0/2) Epoch 10, batch 3300, loss[loss=0.2864, ctc_loss=0.1917, cr_loss=0.4319, attn_decoder_loss=0.2873, over 28312.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.1721, cr_loss=0.4093, attn_decoder_loss=0.2677, over 5796883.31 frames. ], batch size: 111, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:40:01,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=176100.0, ans=0.125
2024-09-17 07:40:06,777 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 9.364e+01 1.005e+02 1.120e+02 3.139e+02, threshold=2.009e+02, percent-clipped=4.0
2024-09-17 07:40:20,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=176140.0, ans=0.1
2024-09-17 07:40:32,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=176180.0, ans=0.2
2024-09-17 07:40:35,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=176180.0, ans=0.025
2024-09-17 07:40:38,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176220.0, ans=0.1
2024-09-17 07:40:56,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=176260.0, ans=0.2
2024-09-17 07:40:58,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0
2024-09-17 07:41:09,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.66 vs. limit=22.5
2024-09-17 07:41:09,871 INFO [train.py:1198] (0/2) Epoch 10, batch 3350, loss[loss=0.2794, ctc_loss=0.1827, cr_loss=0.4324, attn_decoder_loss=0.2805, over 28870.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1733, cr_loss=0.4102, attn_decoder_loss=0.2687, over 5773905.06 frames. ], batch size: 104, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:41:17,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=176300.0, ans=0.0
2024-09-17 07:41:32,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176340.0, ans=0.1
2024-09-17 07:41:33,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.02 vs. limit=12.0
2024-09-17 07:41:34,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=176340.0, ans=0.125
2024-09-17 07:41:49,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=176380.0, ans=0.0
2024-09-17 07:41:55,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=176420.0, ans=0.2
2024-09-17 07:42:01,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=176420.0, ans=0.125
2024-09-17 07:42:02,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.66 vs. limit=15.0
2024-09-17 07:42:27,214 INFO [train.py:1198] (0/2) Epoch 10, batch 3400, loss[loss=0.2413, ctc_loss=0.1557, cr_loss=0.3896, attn_decoder_loss=0.2422, over 29330.00 frames. ], tot_loss[loss=0.2674, ctc_loss=0.1736, cr_loss=0.4107, attn_decoder_loss=0.2687, over 5766880.42 frames. ], batch size: 67, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:42:36,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=176500.0, ans=0.95
2024-09-17 07:42:39,315 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.353e+01 9.300e+01 1.006e+02 1.112e+02 2.316e+02, threshold=2.013e+02, percent-clipped=1.0
2024-09-17 07:42:41,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=176540.0, ans=0.1
2024-09-17 07:42:56,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=176580.0, ans=0.025
2024-09-17 07:43:05,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=176580.0, ans=0.125
2024-09-17 07:43:05,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0
2024-09-17 07:43:06,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=176580.0, ans=0.07
2024-09-17 07:43:14,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=176620.0, ans=0.2
2024-09-17 07:43:21,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=176620.0, ans=0.0
2024-09-17 07:43:31,697 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=12.0
2024-09-17 07:43:44,824 INFO [train.py:1198] (0/2) Epoch 10, batch 3450, loss[loss=0.2648, ctc_loss=0.1618, cr_loss=0.3791, attn_decoder_loss=0.2678, over 28216.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1731, cr_loss=0.41, attn_decoder_loss=0.2686, over 5775709.15 frames. ], batch size: 111, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:43:58,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=176740.0, ans=0.1
2024-09-17 07:43:58,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=176740.0, ans=0.0
2024-09-17 07:44:12,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.29 vs. limit=22.5
2024-09-17 07:44:27,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=176780.0, ans=0.09899494936611666
2024-09-17 07:44:29,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=176820.0, ans=0.125
2024-09-17 07:44:48,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=176860.0, ans=0.125
2024-09-17 07:44:57,189 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.45 vs. limit=22.5
2024-09-17 07:44:57,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=176860.0, ans=0.125
2024-09-17 07:45:00,783 INFO [train.py:1198] (0/2) Epoch 10, batch 3500, loss[loss=0.2404, ctc_loss=0.1551, cr_loss=0.3864, attn_decoder_loss=0.2412, over 29315.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1728, cr_loss=0.4097, attn_decoder_loss=0.2681, over 5777256.16 frames. ], batch size: 71, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:45:12,847 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 9.560e+01 1.051e+02 1.170e+02 3.242e+02, threshold=2.102e+02, percent-clipped=4.0
2024-09-17 07:45:22,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=176940.0, ans=0.1
2024-09-17 07:45:30,071 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.93 vs. limit=15.0
2024-09-17 07:45:31,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=176980.0, ans=0.1
2024-09-17 07:45:32,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=176980.0, ans=0.0
2024-09-17 07:45:37,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=176980.0, ans=0.125
2024-09-17 07:45:39,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=176980.0, ans=0.025
2024-09-17 07:45:45,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=176980.0, ans=0.125
2024-09-17 07:46:16,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177100.0, ans=0.1
2024-09-17 07:46:17,482 INFO [train.py:1198] (0/2) Epoch 10, batch 3550, loss[loss=0.28, ctc_loss=0.1875, cr_loss=0.4393, attn_decoder_loss=0.2805, over 29708.00 frames. ], tot_loss[loss=0.2666, ctc_loss=0.1724, cr_loss=0.4091, attn_decoder_loss=0.268, over 5784224.02 frames. ], batch size: 89, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:46:33,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=177140.0, ans=0.125
2024-09-17 07:46:38,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=177140.0, ans=0.125
2024-09-17 07:46:47,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=177180.0, ans=0.125
2024-09-17 07:46:53,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=177180.0, ans=0.125
2024-09-17 07:47:06,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=177220.0, ans=0.125
2024-09-17 07:47:20,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=177260.0, ans=0.2
2024-09-17 07:47:21,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=177260.0, ans=0.125
2024-09-17 07:47:31,489 INFO [train.py:1198] (0/2) Epoch 10, batch 3600, loss[loss=0.2664, ctc_loss=0.1691, cr_loss=0.4141, attn_decoder_loss=0.268, over 29495.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.1718, cr_loss=0.4082, attn_decoder_loss=0.2677, over 5792891.47 frames. ], batch size: 77, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 07:47:39,443 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 07:47:42,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=177300.0, ans=0.025
2024-09-17 07:47:44,996 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 9.296e+01 9.828e+01 1.086e+02 1.804e+02, threshold=1.966e+02, percent-clipped=0.0
2024-09-17 07:48:04,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=177380.0, ans=0.125
2024-09-17 07:48:11,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.06 vs. limit=6.0
2024-09-17 07:48:40,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=177460.0, ans=0.125
2024-09-17 07:48:44,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=22.5
2024-09-17 07:48:45,917 INFO [train.py:1198] (0/2) Epoch 10, batch 3650, loss[loss=0.2791, ctc_loss=0.1767, cr_loss=0.4274, attn_decoder_loss=0.281, over 29523.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1705, cr_loss=0.406, attn_decoder_loss=0.2667, over 5792929.07 frames. ], batch size: 90, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:48:46,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=177500.0, ans=0.125
2024-09-17 07:48:50,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=177500.0, ans=0.025
2024-09-17 07:48:53,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=15.0
2024-09-17 07:48:58,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.77 vs. limit=22.5
2024-09-17 07:48:59,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=177540.0, ans=0.0
2024-09-17 07:49:06,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=22.5
2024-09-17 07:49:08,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=177540.0, ans=0.125
2024-09-17 07:49:25,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=177580.0, ans=0.1
2024-09-17 07:49:25,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=177580.0, ans=0.0
2024-09-17 07:49:37,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=177620.0, ans=0.025
2024-09-17 07:50:03,432 INFO [train.py:1198] (0/2) Epoch 10, batch 3700, loss[loss=0.2732, ctc_loss=0.1709, cr_loss=0.4155, attn_decoder_loss=0.2754, over 29711.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.1703, cr_loss=0.4059, attn_decoder_loss=0.2667, over 5803799.23 frames. ], batch size: 84, lr: 1.11e-02, grad_scale: 8.0
2024-09-17 07:50:15,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=177700.0, ans=0.125
2024-09-17 07:50:16,866 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 9.275e+01 9.841e+01 1.076e+02 3.002e+02, threshold=1.968e+02, percent-clipped=1.0
2024-09-17 07:50:21,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=177740.0, ans=0.2
2024-09-17 07:50:55,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=177820.0, ans=0.2
2024-09-17 07:51:07,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=177860.0, ans=0.95
2024-09-17 07:51:17,406 INFO [train.py:1198] (0/2) Epoch 10, batch 3750, loss[loss=0.2418, ctc_loss=0.1517, cr_loss=0.3831, attn_decoder_loss=0.2433, over 29308.00 frames. ], tot_loss[loss=0.2655, ctc_loss=0.1709, cr_loss=0.4069, attn_decoder_loss=0.267, over 5806864.01 frames. ], batch size: 67, lr: 1.10e-02, grad_scale: 4.0
2024-09-17 07:51:52,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=177980.0, ans=0.0
2024-09-17 07:52:10,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=178020.0, ans=0.0
2024-09-17 07:52:16,419 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.57 vs. limit=10.0
2024-09-17 07:52:17,944 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.15 vs. limit=15.0
2024-09-17 07:52:21,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=178060.0, ans=0.125
2024-09-17 07:52:28,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=178060.0, ans=0.125
2024-09-17 07:52:32,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=178100.0, ans=0.0
2024-09-17 07:52:33,648 INFO [train.py:1198] (0/2) Epoch 10, batch 3800, loss[loss=0.2709, ctc_loss=0.1694, cr_loss=0.4216, attn_decoder_loss=0.2728, over 29627.00 frames. ], tot_loss[loss=0.2648, ctc_loss=0.1699, cr_loss=0.4052, attn_decoder_loss=0.2663, over 5797175.08 frames. ], batch size: 86, lr: 1.10e-02, grad_scale: 8.0
2024-09-17 07:52:48,525 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.369e+01 9.594e+01 1.015e+02 1.096e+02 4.461e+02, threshold=2.030e+02, percent-clipped=1.0
2024-09-17 07:52:54,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=12.0
2024-09-17 07:53:05,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=178180.0, ans=0.125
2024-09-17 07:53:26,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=178220.0, ans=0.125
2024-09-17 07:53:48,156 INFO [train.py:1198] (0/2) Epoch 10, batch 3850, loss[loss=0.2761, ctc_loss=0.1767, cr_loss=0.4189, attn_decoder_loss=0.2778, over 29230.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1697, cr_loss=0.4055, attn_decoder_loss=0.2662, over 5811755.85 frames.
], batch size: 100, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 07:54:16,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178380.0, ans=0.1 2024-09-17 07:54:19,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=178380.0, ans=0.0 2024-09-17 07:54:24,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=178380.0, ans=0.0 2024-09-17 07:54:45,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=178420.0, ans=0.0 2024-09-17 07:55:01,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=178500.0, ans=0.0 2024-09-17 07:55:04,010 INFO [train.py:1198] (0/2) Epoch 10, batch 3900, loss[loss=0.284, ctc_loss=0.1926, cr_loss=0.4425, attn_decoder_loss=0.2843, over 29638.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.1701, cr_loss=0.4063, attn_decoder_loss=0.2665, over 5816508.48 frames. ], batch size: 86, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 07:55:11,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=178500.0, ans=0.0 2024-09-17 07:55:20,365 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 9.621e+01 1.032e+02 1.104e+02 1.342e+02, threshold=2.064e+02, percent-clipped=0.0 2024-09-17 07:55:22,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=178540.0, ans=0.0 2024-09-17 07:55:22,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.75 vs. 
limit=15.0 2024-09-17 07:55:23,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178540.0, ans=0.1 2024-09-17 07:55:26,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=178540.0, ans=0.125 2024-09-17 07:56:15,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=178660.0, ans=0.2 2024-09-17 07:56:18,296 INFO [train.py:1198] (0/2) Epoch 10, batch 3950, loss[loss=0.2834, ctc_loss=0.1807, cr_loss=0.4176, attn_decoder_loss=0.2855, over 29490.00 frames. ], tot_loss[loss=0.2652, ctc_loss=0.17, cr_loss=0.4069, attn_decoder_loss=0.2668, over 5836067.45 frames. ], batch size: 97, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 07:57:01,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=178820.0, ans=0.0 2024-09-17 07:57:01,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.90 vs. 
limit=15.0 2024-09-17 07:57:02,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=178820.0, ans=0.0 2024-09-17 07:57:09,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=178820.0, ans=0.125 2024-09-17 07:57:10,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=178820.0, ans=0.125 2024-09-17 07:57:27,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=178860.0, ans=0.2 2024-09-17 07:57:33,108 INFO [train.py:1198] (0/2) Epoch 10, batch 4000, loss[loss=0.2473, ctc_loss=0.158, cr_loss=0.3984, attn_decoder_loss=0.2484, over 29506.00 frames. ], tot_loss[loss=0.2656, ctc_loss=0.1707, cr_loss=0.4078, attn_decoder_loss=0.2671, over 5812341.99 frames. ], batch size: 74, lr: 1.10e-02, grad_scale: 16.0 2024-09-17 07:57:40,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=178900.0, ans=0.0 2024-09-17 07:57:50,421 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 9.319e+01 1.030e+02 1.152e+02 2.635e+02, threshold=2.059e+02, percent-clipped=1.0 2024-09-17 07:58:47,041 INFO [train.py:1198] (0/2) Epoch 10, batch 4050, loss[loss=0.3077, ctc_loss=0.2443, cr_loss=0.4507, attn_decoder_loss=0.3047, over 20444.00 frames. ], tot_loss[loss=0.2658, ctc_loss=0.171, cr_loss=0.4077, attn_decoder_loss=0.2672, over 5796888.87 frames. 
], batch size: 210, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 07:59:01,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179140.0, ans=0.1 2024-09-17 07:59:01,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=179140.0, ans=0.125 2024-09-17 07:59:16,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=179180.0, ans=10.0 2024-09-17 07:59:37,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2024-09-17 07:59:38,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=179220.0, ans=0.2 2024-09-17 07:59:56,363 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0 2024-09-17 08:00:00,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=179300.0, ans=0.0 2024-09-17 08:00:01,773 INFO [train.py:1198] (0/2) Epoch 10, batch 4100, loss[loss=0.284, ctc_loss=0.1859, cr_loss=0.4158, attn_decoder_loss=0.2857, over 29493.00 frames. ], tot_loss[loss=0.2656, ctc_loss=0.1707, cr_loss=0.4071, attn_decoder_loss=0.2671, over 5791699.78 frames. ], batch size: 90, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:00:03,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=179300.0, ans=0.125 2024-09-17 08:00:12,705 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.04 vs. 
limit=22.5 2024-09-17 08:00:20,760 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.192e+01 9.136e+01 9.895e+01 1.094e+02 2.839e+02, threshold=1.979e+02, percent-clipped=1.0 2024-09-17 08:00:46,416 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:01:00,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=179460.0, ans=0.0 2024-09-17 08:01:05,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=179460.0, ans=0.0 2024-09-17 08:01:05,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=179460.0, ans=0.125 2024-09-17 08:01:15,649 INFO [train.py:1198] (0/2) Epoch 10, batch 4150, loss[loss=0.2479, ctc_loss=0.1598, cr_loss=0.3947, attn_decoder_loss=0.2489, over 29488.00 frames. ], tot_loss[loss=0.2651, ctc_loss=0.1704, cr_loss=0.4064, attn_decoder_loss=0.2666, over 5797811.55 frames. 
], batch size: 77, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:01:33,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179540.0, ans=0.1 2024-09-17 08:01:36,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=179540.0, ans=0.2 2024-09-17 08:01:39,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=179540.0, ans=0.0 2024-09-17 08:01:46,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=179580.0, ans=0.2 2024-09-17 08:01:47,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=179580.0, ans=0.125 2024-09-17 08:01:50,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=179580.0, ans=0.1 2024-09-17 08:02:02,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=179620.0, ans=0.0 2024-09-17 08:02:16,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=179660.0, ans=0.0 2024-09-17 08:02:30,720 INFO [train.py:1198] (0/2) Epoch 10, batch 4200, loss[loss=0.2814, ctc_loss=0.182, cr_loss=0.423, attn_decoder_loss=0.283, over 29512.00 frames. ], tot_loss[loss=0.2658, ctc_loss=0.171, cr_loss=0.4071, attn_decoder_loss=0.2673, over 5800154.09 frames. 
], batch size: 90, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:02:31,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=179700.0, ans=0.95 2024-09-17 08:02:46,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=179740.0, ans=0.125 2024-09-17 08:02:48,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=179740.0, ans=0.125 2024-09-17 08:02:49,387 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=12.0 2024-09-17 08:02:50,150 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.369e+01 9.480e+01 1.011e+02 1.105e+02 3.367e+02, threshold=2.021e+02, percent-clipped=4.0 2024-09-17 08:02:59,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=179780.0, ans=0.125 2024-09-17 08:03:14,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.45 vs. limit=15.0 2024-09-17 08:03:44,285 INFO [train.py:1198] (0/2) Epoch 10, batch 4250, loss[loss=0.2552, ctc_loss=0.1658, cr_loss=0.4041, attn_decoder_loss=0.2562, over 29501.00 frames. ], tot_loss[loss=0.2659, ctc_loss=0.1707, cr_loss=0.4068, attn_decoder_loss=0.2675, over 5805306.04 frames. 
], batch size: 74, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:03:45,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=179900.0, ans=0.025 2024-09-17 08:03:52,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=179900.0, ans=0.0 2024-09-17 08:03:54,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=179900.0, ans=0.07 2024-09-17 08:04:13,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=179980.0, ans=0.125 2024-09-17 08:04:25,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=179980.0, ans=0.0 2024-09-17 08:04:33,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=180020.0, ans=0.125 2024-09-17 08:04:49,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=180060.0, ans=0.125 2024-09-17 08:04:59,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=180100.0, ans=0.125 2024-09-17 08:05:00,424 INFO [train.py:1198] (0/2) Epoch 10, batch 4300, loss[loss=0.2636, ctc_loss=0.1574, cr_loss=0.3865, attn_decoder_loss=0.2668, over 29536.00 frames. ], tot_loss[loss=0.2662, ctc_loss=0.1707, cr_loss=0.4067, attn_decoder_loss=0.2678, over 5794726.40 frames. 
], batch size: 87, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:05:19,786 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 9.707e+01 1.038e+02 1.136e+02 2.980e+02, threshold=2.076e+02, percent-clipped=1.0 2024-09-17 08:05:34,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=180180.0, ans=0.1 2024-09-17 08:05:37,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=180180.0, ans=0.0 2024-09-17 08:05:49,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=180220.0, ans=0.125 2024-09-17 08:05:49,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=180220.0, ans=0.0 2024-09-17 08:05:55,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=180220.0, ans=15.0 2024-09-17 08:06:10,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.77 vs. limit=10.0 2024-09-17 08:06:14,238 INFO [train.py:1198] (0/2) Epoch 10, batch 4350, loss[loss=0.2812, ctc_loss=0.1753, cr_loss=0.4214, attn_decoder_loss=0.2836, over 29435.00 frames. ], tot_loss[loss=0.2697, ctc_loss=0.1738, cr_loss=0.4118, attn_decoder_loss=0.2712, over 5796891.06 frames. 
], batch size: 97, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:07:12,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=180460.0, ans=0.2 2024-09-17 08:07:17,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=180460.0, ans=0.2 2024-09-17 08:07:27,831 INFO [train.py:1198] (0/2) Epoch 10, batch 4400, loss[loss=0.2868, ctc_loss=0.2008, cr_loss=0.4554, attn_decoder_loss=0.2863, over 27168.00 frames. ], tot_loss[loss=0.2722, ctc_loss=0.1762, cr_loss=0.4154, attn_decoder_loss=0.2737, over 5767897.93 frames. ], batch size: 124, lr: 1.10e-02, grad_scale: 16.0 2024-09-17 08:07:38,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=180500.0, ans=0.0 2024-09-17 08:07:46,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=180540.0, ans=0.95 2024-09-17 08:07:48,750 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.847e+01 9.762e+01 1.026e+02 1.096e+02 2.982e+02, threshold=2.053e+02, percent-clipped=1.0 2024-09-17 08:07:53,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=180540.0, ans=0.125 2024-09-17 08:08:24,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=180620.0, ans=0.0 2024-09-17 08:08:26,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=180660.0, ans=0.125 2024-09-17 08:08:41,975 INFO [train.py:1198] (0/2) Epoch 10, batch 4450, loss[loss=0.3005, ctc_loss=0.2256, cr_loss=0.4356, attn_decoder_loss=0.2992, over 19687.00 frames. 
], tot_loss[loss=0.2756, ctc_loss=0.1817, cr_loss=0.4196, attn_decoder_loss=0.2767, over 5582632.39 frames. ], batch size: 209, lr: 1.10e-02, grad_scale: 4.0 2024-09-17 08:08:48,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=180700.0, ans=0.125 2024-09-17 08:08:57,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.83 vs. limit=22.5 2024-09-17 08:09:01,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.62 vs. limit=15.0 2024-09-17 08:09:05,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=180740.0, ans=0.125 2024-09-17 08:09:09,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=9.00 vs. limit=15.0 2024-09-17 08:09:18,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0 2024-09-17 08:09:23,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=180780.0, ans=0.125 2024-09-17 08:09:29,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=180820.0, ans=0.125 2024-09-17 08:09:33,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=180820.0, ans=0.125 2024-09-17 08:09:35,846 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. 
limit=15.0 2024-09-17 08:09:58,575 INFO [train.py:1198] (0/2) Epoch 10, batch 4500, loss[loss=0.2983, ctc_loss=0.2231, cr_loss=0.4442, attn_decoder_loss=0.2968, over 19704.00 frames. ], tot_loss[loss=0.2797, ctc_loss=0.1894, cr_loss=0.4223, attn_decoder_loss=0.2804, over 5239847.92 frames. ], batch size: 209, lr: 1.10e-02, grad_scale: 8.0 2024-09-17 08:10:07,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=180900.0, ans=0.1 2024-09-17 08:10:21,315 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.766e+01 1.077e+02 1.142e+02 1.231e+02 1.732e+02, threshold=2.283e+02, percent-clipped=0.0 2024-09-17 08:10:24,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=180940.0, ans=0.0 2024-09-17 08:10:35,968 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-10.pt 2024-09-17 08:11:32,465 INFO [train.py:1198] (0/2) Epoch 11, batch 0, loss[loss=0.2543, ctc_loss=0.1563, cr_loss=0.3716, attn_decoder_loss=0.2569, over 29639.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1563, cr_loss=0.3716, attn_decoder_loss=0.2569, over 29639.00 frames. ], batch size: 73, lr: 1.05e-02, grad_scale: 16.0 2024-09-17 08:11:32,465 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 08:11:50,861 INFO [train.py:1230] (0/2) Epoch 11, validation: loss=0.2172, ctc_loss=0.0495, cr_loss=4.7e-15, attn_decoder_loss=0.2358, over 944034.00 frames. 
2024-09-17 08:11:50,862 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 08:11:54,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=181000.0, ans=0.125 2024-09-17 08:12:12,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=181040.0, ans=0.125 2024-09-17 08:12:18,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=181040.0, ans=0.125 2024-09-17 08:12:20,838 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0 2024-09-17 08:12:28,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=181080.0, ans=0.125 2024-09-17 08:12:30,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0 2024-09-17 08:12:32,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=181080.0, ans=0.0 2024-09-17 08:12:58,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181160.0, ans=0.1 2024-09-17 08:13:01,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=181160.0, ans=0.125 2024-09-17 08:13:02,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.64 vs. 
limit=15.0 2024-09-17 08:13:10,329 INFO [train.py:1198] (0/2) Epoch 11, batch 50, loss[loss=0.2368, ctc_loss=0.1469, cr_loss=0.3612, attn_decoder_loss=0.2387, over 29422.00 frames. ], tot_loss[loss=0.2673, ctc_loss=0.1735, cr_loss=0.4106, attn_decoder_loss=0.2687, over 1268309.62 frames. ], batch size: 70, lr: 1.05e-02, grad_scale: 8.0 2024-09-17 08:13:10,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=181200.0, ans=0.0 2024-09-17 08:13:35,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2024-09-17 08:13:47,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=181280.0, ans=22.5 2024-09-17 08:14:13,813 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.260e+01 9.758e+01 1.123e+02 1.302e+02 1.602e+03, threshold=2.247e+02, percent-clipped=5.0 2024-09-17 08:14:25,859 INFO [train.py:1198] (0/2) Epoch 11, batch 100, loss[loss=0.2614, ctc_loss=0.1727, cr_loss=0.4104, attn_decoder_loss=0.2622, over 29537.00 frames. ], tot_loss[loss=0.2692, ctc_loss=0.1742, cr_loss=0.4131, attn_decoder_loss=0.2706, over 2252457.89 frames. ], batch size: 76, lr: 1.04e-02, grad_scale: 8.0 2024-09-17 08:14:34,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=181400.0, ans=0.125 2024-09-17 08:14:44,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.35 vs. 
limit=15.0 2024-09-17 08:14:45,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=181440.0, ans=0.0 2024-09-17 08:14:50,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=181440.0, ans=0.125 2024-09-17 08:14:50,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=181440.0, ans=0.125 2024-09-17 08:14:53,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=181440.0, ans=0.2 2024-09-17 08:14:56,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=181480.0, ans=0.125 2024-09-17 08:15:05,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.14 vs. limit=10.0 2024-09-17 08:15:06,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181480.0, ans=0.1 2024-09-17 08:15:13,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0 2024-09-17 08:15:33,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181560.0, ans=0.1 2024-09-17 08:15:35,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.67 vs. 
limit=10.0
2024-09-17 08:15:39,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=181600.0, ans=0.1
2024-09-17 08:15:41,049 INFO [train.py:1198] (0/2) Epoch 11, batch 150, loss[loss=0.2391, ctc_loss=0.1446, cr_loss=0.3746, attn_decoder_loss=0.2412, over 29400.00 frames. ], tot_loss[loss=0.2667, ctc_loss=0.1714, cr_loss=0.4085, attn_decoder_loss=0.2682, over 3047301.48 frames. ], batch size: 70, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:15:49,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=181600.0, ans=0.125
2024-09-17 08:15:51,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181600.0, ans=0.1
2024-09-17 08:16:05,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=181640.0, ans=0.2
2024-09-17 08:16:09,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=181640.0, ans=0.0
2024-09-17 08:16:22,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=181680.0, ans=0.0
2024-09-17 08:16:25,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=181680.0, ans=0.0
2024-09-17 08:16:49,222 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.120e+01 9.120e+01 9.727e+01 1.024e+02 1.360e+02, threshold=1.945e+02, percent-clipped=0.0
2024-09-17 08:16:56,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=181760.0, ans=0.04949747468305833
2024-09-17 08:17:01,151 INFO [train.py:1198] (0/2) Epoch 11, batch 200, loss[loss=0.2901, ctc_loss=0.1911, cr_loss=0.435, attn_decoder_loss=0.2915, over 27562.00 frames. ], tot_loss[loss=0.2655, ctc_loss=0.1702, cr_loss=0.4074, attn_decoder_loss=0.267, over 3659115.41 frames. ], batch size: 125, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:17:15,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=181840.0, ans=0.125
2024-09-17 08:17:22,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=181840.0, ans=0.0
2024-09-17 08:17:57,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=181920.0, ans=0.0
2024-09-17 08:18:04,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=181960.0, ans=0.125
2024-09-17 08:18:16,509 INFO [train.py:1198] (0/2) Epoch 11, batch 250, loss[loss=0.2846, ctc_loss=0.1821, cr_loss=0.438, attn_decoder_loss=0.2863, over 29243.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1687, cr_loss=0.4047, attn_decoder_loss=0.2663, over 4141709.34 frames. ], batch size: 100, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:18:16,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=182000.0, ans=0.0
2024-09-17 08:18:26,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0
2024-09-17 08:18:32,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=182040.0, ans=0.0
2024-09-17 08:18:34,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.59 vs. limit=22.5
2024-09-17 08:18:36,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=182040.0, ans=0.125
2024-09-17 08:18:44,737 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0
2024-09-17 08:18:47,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.33 vs. limit=22.5
2024-09-17 08:19:11,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=182120.0, ans=0.2
2024-09-17 08:19:16,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=182160.0, ans=0.025
2024-09-17 08:19:20,161 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 9.171e+01 1.004e+02 1.090e+02 1.755e+02, threshold=2.009e+02, percent-clipped=0.0
2024-09-17 08:19:32,196 INFO [train.py:1198] (0/2) Epoch 11, batch 300, loss[loss=0.2835, ctc_loss=0.1764, cr_loss=0.4261, attn_decoder_loss=0.286, over 29560.00 frames. ], tot_loss[loss=0.2643, ctc_loss=0.1684, cr_loss=0.4047, attn_decoder_loss=0.266, over 4510296.05 frames. ], batch size: 92, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:19:57,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=182240.0, ans=0.0
2024-09-17 08:20:05,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=12.0
2024-09-17 08:20:15,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=182280.0, ans=0.09899494936611666
2024-09-17 08:20:31,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=182320.0, ans=0.0
2024-09-17 08:20:37,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=182360.0, ans=0.0
2024-09-17 08:20:45,017 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:20:45,686 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0
2024-09-17 08:20:52,577 INFO [train.py:1198] (0/2) Epoch 11, batch 350, loss[loss=0.2546, ctc_loss=0.1649, cr_loss=0.4017, attn_decoder_loss=0.2557, over 29342.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.1689, cr_loss=0.4059, attn_decoder_loss=0.2667, over 4796724.14 frames. ], batch size: 71, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:21:06,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=182440.0, ans=0.025
2024-09-17 08:21:27,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=182480.0, ans=0.5
2024-09-17 08:21:36,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=182520.0, ans=0.125
2024-09-17 08:21:42,597 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:21:45,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=182520.0, ans=0.0
2024-09-17 08:21:55,726 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.030e+01 9.345e+01 9.958e+01 1.088e+02 1.726e+02, threshold=1.992e+02, percent-clipped=0.0
2024-09-17 08:22:07,832 INFO [train.py:1198] (0/2) Epoch 11, batch 400, loss[loss=0.2807, ctc_loss=0.1801, cr_loss=0.4426, attn_decoder_loss=0.282, over 29686.00 frames. ], tot_loss[loss=0.2644, ctc_loss=0.1686, cr_loss=0.4057, attn_decoder_loss=0.2661, over 5025969.95 frames. ], batch size: 82, lr: 1.04e-02, grad_scale: 16.0
2024-09-17 08:22:28,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0
2024-09-17 08:22:49,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0
2024-09-17 08:23:03,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=182720.0, ans=0.2
2024-09-17 08:23:23,314 INFO [train.py:1198] (0/2) Epoch 11, batch 450, loss[loss=0.2692, ctc_loss=0.167, cr_loss=0.4207, attn_decoder_loss=0.2712, over 29699.00 frames. ], tot_loss[loss=0.2648, ctc_loss=0.169, cr_loss=0.4068, attn_decoder_loss=0.2663, over 5187708.50 frames. ], batch size: 83, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:23:23,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=182800.0, ans=0.125
2024-09-17 08:23:26,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=182800.0, ans=0.0
2024-09-17 08:23:56,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=182880.0, ans=0.0
2024-09-17 08:24:17,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=182920.0, ans=0.0
2024-09-17 08:24:32,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 9.150e+01 9.879e+01 1.056e+02 3.994e+02, threshold=1.976e+02, percent-clipped=1.0
2024-09-17 08:24:33,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=182960.0, ans=0.1
2024-09-17 08:24:43,360 INFO [train.py:1198] (0/2) Epoch 11, batch 500, loss[loss=0.2933, ctc_loss=0.1957, cr_loss=0.4628, attn_decoder_loss=0.2938, over 29440.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1678, cr_loss=0.4048, attn_decoder_loss=0.2653, over 5329822.86 frames. ], batch size: 94, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:24:57,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=183040.0, ans=0.125
2024-09-17 08:25:06,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=183040.0, ans=0.2
2024-09-17 08:25:09,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=183040.0, ans=0.125
2024-09-17 08:25:23,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=183080.0, ans=0.125
2024-09-17 08:25:32,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=183120.0, ans=0.0
2024-09-17 08:25:32,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.91 vs. limit=10.0
2024-09-17 08:25:44,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=183160.0, ans=0.125
2024-09-17 08:25:59,355 INFO [train.py:1198] (0/2) Epoch 11, batch 550, loss[loss=0.2651, ctc_loss=0.1637, cr_loss=0.3914, attn_decoder_loss=0.2677, over 28872.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1681, cr_loss=0.4046, attn_decoder_loss=0.2654, over 5422839.44 frames. ], batch size: 104, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:25:59,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=183200.0, ans=0.125
2024-09-17 08:26:37,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=183280.0, ans=0.07
2024-09-17 08:26:44,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0
2024-09-17 08:26:50,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=183320.0, ans=0.025
2024-09-17 08:26:56,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=183320.0, ans=0.5
2024-09-17 08:27:02,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=183360.0, ans=0.0
2024-09-17 08:27:05,004 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.204e+01 9.712e+01 1.043e+02 1.936e+02, threshold=1.942e+02, percent-clipped=0.0
2024-09-17 08:27:14,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=183400.0, ans=0.1
2024-09-17 08:27:15,705 INFO [train.py:1198] (0/2) Epoch 11, batch 600, loss[loss=0.2807, ctc_loss=0.1754, cr_loss=0.4224, attn_decoder_loss=0.283, over 29207.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.168, cr_loss=0.405, attn_decoder_loss=0.2656, over 5509685.71 frames. ], batch size: 100, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:27:16,107 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:27:43,096 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5
2024-09-17 08:27:43,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=183440.0, ans=0.125
2024-09-17 08:28:20,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=183560.0, ans=0.0
2024-09-17 08:28:35,708 INFO [train.py:1198] (0/2) Epoch 11, batch 650, loss[loss=0.2686, ctc_loss=0.1709, cr_loss=0.4206, attn_decoder_loss=0.2701, over 29740.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1671, cr_loss=0.4039, attn_decoder_loss=0.2649, over 5587091.60 frames. ], batch size: 81, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:28:36,649 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0
2024-09-17 08:28:56,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.13 vs. limit=22.5
2024-09-17 08:28:57,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=183640.0, ans=0.025
2024-09-17 08:29:21,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=183720.0, ans=0.0
2024-09-17 08:29:36,644 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:29:42,309 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 9.127e+01 9.643e+01 1.047e+02 1.455e+02, threshold=1.929e+02, percent-clipped=0.0
2024-09-17 08:29:51,506 INFO [train.py:1198] (0/2) Epoch 11, batch 700, loss[loss=0.2618, ctc_loss=0.1655, cr_loss=0.4076, attn_decoder_loss=0.2635, over 29516.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1673, cr_loss=0.4049, attn_decoder_loss=0.2653, over 5636975.61 frames. ], batch size: 76, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:29:53,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=183800.0, ans=0.125
2024-09-17 08:29:54,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=183800.0, ans=0.125
2024-09-17 08:30:05,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=183840.0, ans=0.125
2024-09-17 08:30:26,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=183880.0, ans=0.1
2024-09-17 08:30:35,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=183920.0, ans=0.125
2024-09-17 08:30:39,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.93 vs. limit=15.0
2024-09-17 08:30:39,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0
2024-09-17 08:30:45,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=183920.0, ans=0.1
2024-09-17 08:30:48,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=183920.0, ans=0.125
2024-09-17 08:31:00,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=183960.0, ans=0.0
2024-09-17 08:31:00,569 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:31:08,141 INFO [train.py:1198] (0/2) Epoch 11, batch 750, loss[loss=0.2606, ctc_loss=0.164, cr_loss=0.4162, attn_decoder_loss=0.2621, over 29719.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1675, cr_loss=0.4049, attn_decoder_loss=0.2647, over 5676967.82 frames. ], batch size: 82, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:31:14,929 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.40 vs. limit=15.0
2024-09-17 08:31:15,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=184000.0, ans=0.02
2024-09-17 08:31:18,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=184000.0, ans=0.125
2024-09-17 08:31:18,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=184000.0, ans=0.125
2024-09-17 08:31:20,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=184000.0, ans=0.0
2024-09-17 08:31:24,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184040.0, ans=0.1
2024-09-17 08:31:51,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0
2024-09-17 08:32:16,691 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.265e+01 9.471e+01 1.047e+02 1.151e+02 2.834e+02, threshold=2.094e+02, percent-clipped=4.0
2024-09-17 08:32:27,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0
2024-09-17 08:32:28,008 INFO [train.py:1198] (0/2) Epoch 11, batch 800, loss[loss=0.2381, ctc_loss=0.1432, cr_loss=0.3732, attn_decoder_loss=0.2404, over 29590.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1679, cr_loss=0.4056, attn_decoder_loss=0.2652, over 5708108.86 frames. ], batch size: 73, lr: 1.04e-02, grad_scale: 16.0
2024-09-17 08:32:45,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=184240.0, ans=0.125
2024-09-17 08:32:46,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=184240.0, ans=0.2
2024-09-17 08:32:51,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=184240.0, ans=0.1
2024-09-17 08:33:35,269 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.03 vs. limit=15.0
2024-09-17 08:33:35,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=184360.0, ans=0.0
2024-09-17 08:33:42,956 INFO [train.py:1198] (0/2) Epoch 11, batch 850, loss[loss=0.2691, ctc_loss=0.1623, cr_loss=0.3659, attn_decoder_loss=0.2728, over 29726.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1675, cr_loss=0.4045, attn_decoder_loss=0.2649, over 5736575.20 frames. ], batch size: 89, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:33:45,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.17 vs. limit=15.0
2024-09-17 08:33:58,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=184440.0, ans=0.125
2024-09-17 08:34:08,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=184440.0, ans=0.0
2024-09-17 08:34:16,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.53 vs. limit=22.5
2024-09-17 08:34:36,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=184520.0, ans=0.0
2024-09-17 08:34:37,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=184520.0, ans=0.0
2024-09-17 08:34:40,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=184520.0, ans=0.0
2024-09-17 08:34:40,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=184520.0, ans=0.125
2024-09-17 08:34:40,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=184520.0, ans=0.125
2024-09-17 08:34:50,949 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.310e+01 9.399e+01 9.999e+01 1.067e+02 1.963e+02, threshold=2.000e+02, percent-clipped=0.0
2024-09-17 08:34:58,536 INFO [train.py:1198] (0/2) Epoch 11, batch 900, loss[loss=0.2416, ctc_loss=0.1501, cr_loss=0.3887, attn_decoder_loss=0.2431, over 29604.00 frames. ], tot_loss[loss=0.2633, ctc_loss=0.1678, cr_loss=0.405, attn_decoder_loss=0.265, over 5740937.85 frames. ], batch size: 73, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:35:03,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=184600.0, ans=0.125
2024-09-17 08:35:03,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0
2024-09-17 08:35:06,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=184600.0, ans=0.1
2024-09-17 08:35:49,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=184720.0, ans=0.125
2024-09-17 08:35:49,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=184720.0, ans=0.05
2024-09-17 08:35:52,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=184720.0, ans=0.0
2024-09-17 08:35:55,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=184720.0, ans=0.125
2024-09-17 08:36:16,596 INFO [train.py:1198] (0/2) Epoch 11, batch 950, loss[loss=0.2418, ctc_loss=0.1453, cr_loss=0.3668, attn_decoder_loss=0.2443, over 29502.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1677, cr_loss=0.4041, attn_decoder_loss=0.2652, over 5742525.53 frames. ], batch size: 74, lr: 1.04e-02, grad_scale: 8.0
2024-09-17 08:36:28,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=184800.0, ans=0.0
2024-09-17 08:36:52,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=184880.0, ans=0.125
2024-09-17 08:37:02,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=184920.0, ans=0.5
2024-09-17 08:37:26,871 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 9.626e+01 1.055e+02 1.179e+02 5.157e+02, threshold=2.111e+02, percent-clipped=4.0
2024-09-17 08:37:34,438 INFO [train.py:1198] (0/2) Epoch 11, batch 1000, loss[loss=0.2501, ctc_loss=0.1566, cr_loss=0.397, attn_decoder_loss=0.2517, over 29517.00 frames. ], tot_loss[loss=0.2643, ctc_loss=0.1685, cr_loss=0.4055, attn_decoder_loss=0.2659, over 5734684.07 frames. ], batch size: 77, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:37:39,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=185000.0, ans=0.0
2024-09-17 08:37:42,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=185000.0, ans=0.1
2024-09-17 08:38:02,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=185040.0, ans=0.1
2024-09-17 08:38:02,196 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:38:02,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=185040.0, ans=0.0
2024-09-17 08:38:03,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=185080.0, ans=0.025
2024-09-17 08:38:09,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=185080.0, ans=0.125
2024-09-17 08:38:29,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=185120.0, ans=0.125
2024-09-17 08:38:35,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=185160.0, ans=0.04949747468305833
2024-09-17 08:38:50,018 INFO [train.py:1198] (0/2) Epoch 11, batch 1050, loss[loss=0.2785, ctc_loss=0.1798, cr_loss=0.4399, attn_decoder_loss=0.2797, over 29671.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1683, cr_loss=0.4054, attn_decoder_loss=0.2655, over 5744922.05 frames. ], batch size: 85, lr: 1.03e-02, grad_scale: 4.0
2024-09-17 08:40:01,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.188e+01 9.776e+01 1.031e+02 2.876e+02, threshold=1.955e+02, percent-clipped=0.0
2024-09-17 08:40:07,595 INFO [train.py:1198] (0/2) Epoch 11, batch 1100, loss[loss=0.2512, ctc_loss=0.1593, cr_loss=0.3903, attn_decoder_loss=0.2527, over 29446.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1679, cr_loss=0.4048, attn_decoder_loss=0.2652, over 5757288.87 frames. ], batch size: 78, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:40:15,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=185400.0, ans=0.125
2024-09-17 08:40:31,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=185440.0, ans=0.125
2024-09-17 08:41:06,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=185520.0, ans=0.0
2024-09-17 08:41:10,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=185560.0, ans=0.125
2024-09-17 08:41:19,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=185560.0, ans=0.0
2024-09-17 08:41:25,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5
2024-09-17 08:41:25,830 INFO [train.py:1198] (0/2) Epoch 11, batch 1150, loss[loss=0.2546, ctc_loss=0.1672, cr_loss=0.4009, attn_decoder_loss=0.2554, over 29435.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1683, cr_loss=0.405, attn_decoder_loss=0.2652, over 5755582.45 frames. ], batch size: 78, lr: 1.03e-02, grad_scale: 4.0
2024-09-17 08:41:39,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.14 vs. limit=15.0
2024-09-17 08:41:41,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=185640.0, ans=0.125
2024-09-17 08:41:42,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5
2024-09-17 08:41:59,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=185680.0, ans=0.125
2024-09-17 08:42:09,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=185680.0, ans=0.1
2024-09-17 08:42:19,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0
2024-09-17 08:42:25,886 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:42:33,865 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=151.70 vs. limit=15.0
2024-09-17 08:42:37,684 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.117e+01 9.543e+01 1.008e+02 1.096e+02 1.940e+02, threshold=2.016e+02, percent-clipped=1.0
2024-09-17 08:42:39,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=185760.0, ans=0.125
2024-09-17 08:42:42,184 INFO [train.py:1198] (0/2) Epoch 11, batch 1200, loss[loss=0.2666, ctc_loss=0.1568, cr_loss=0.3987, attn_decoder_loss=0.27, over 29689.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1689, cr_loss=0.4057, attn_decoder_loss=0.2662, over 5747895.58 frames. ], batch size: 85, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:43:05,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=185840.0, ans=0.125
2024-09-17 08:43:21,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.69 vs. limit=22.5
2024-09-17 08:43:31,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.78 vs. limit=10.0
2024-09-17 08:43:39,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.76 vs. limit=15.0
2024-09-17 08:43:51,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0
2024-09-17 08:44:00,288 INFO [train.py:1198] (0/2) Epoch 11, batch 1250, loss[loss=0.2782, ctc_loss=0.182, cr_loss=0.4319, attn_decoder_loss=0.2793, over 29520.00 frames. ], tot_loss[loss=0.2648, ctc_loss=0.1688, cr_loss=0.4062, attn_decoder_loss=0.2664, over 5775017.11 frames. ], batch size: 92, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:44:05,552 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0
2024-09-17 08:44:09,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=186000.0, ans=0.0
2024-09-17 08:44:20,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.91 vs. limit=22.5
2024-09-17 08:44:39,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=186080.0, ans=0.125
2024-09-17 08:44:47,583 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.52 vs. limit=10.0
2024-09-17 08:44:58,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=186120.0, ans=0.125
2024-09-17 08:45:05,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=186160.0, ans=0.0
2024-09-17 08:45:12,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=186160.0, ans=0.2
2024-09-17 08:45:13,656 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.030e+01 9.338e+01 9.972e+01 1.044e+02 2.073e+02, threshold=1.994e+02, percent-clipped=1.0
2024-09-17 08:45:18,172 INFO [train.py:1198] (0/2) Epoch 11, batch 1300, loss[loss=0.2786, ctc_loss=0.1809, cr_loss=0.4148, attn_decoder_loss=0.2802, over 28134.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1678, cr_loss=0.4046, attn_decoder_loss=0.2655, over 5779399.43 frames. ], batch size: 111, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:45:36,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=186240.0, ans=0.2
2024-09-17 08:45:40,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0
2024-09-17 08:46:00,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=186280.0, ans=0.0
2024-09-17 08:46:08,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=186320.0, ans=0.125
2024-09-17 08:46:15,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=186320.0, ans=0.0
2024-09-17 08:46:23,508 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:46:34,061 INFO [train.py:1198] (0/2) Epoch 11, batch 1350, loss[loss=0.2747, ctc_loss=0.1757, cr_loss=0.4324, attn_decoder_loss=0.2761, over 29746.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1668, cr_loss=0.4037, attn_decoder_loss=0.2649, over 5796966.46 frames. ], batch size: 81, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:46:37,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=186400.0, ans=0.0
2024-09-17 08:46:38,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=22.5
2024-09-17 08:46:47,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=186440.0, ans=0.1
2024-09-17 08:47:01,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=186440.0, ans=0.0
2024-09-17 08:47:07,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=186480.0, ans=0.125
2024-09-17 08:47:14,479 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0
2024-09-17 08:47:46,678 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.722e+01 1.047e+02 1.106e+02 2.453e+02, threshold=2.093e+02, percent-clipped=1.0
2024-09-17 08:47:51,365 INFO [train.py:1198] (0/2) Epoch 11, batch 1400, loss[loss=0.2309, ctc_loss=0.1391, cr_loss=0.356, attn_decoder_loss=0.2332, over 29576.00 frames. ], tot_loss[loss=0.2633, ctc_loss=0.1669, cr_loss=0.404, attn_decoder_loss=0.265, over 5807559.50 frames. ], batch size: 69, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:47:54,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=186600.0, ans=0.0
2024-09-17 08:48:14,944 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.70 vs. limit=15.0
2024-09-17 08:48:22,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=186680.0, ans=0.125
2024-09-17 08:48:24,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=186680.0, ans=0.0
2024-09-17 08:48:49,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=186720.0, ans=0.125
2024-09-17 08:49:09,108 INFO [train.py:1198] (0/2) Epoch 11, batch 1450, loss[loss=0.2819, ctc_loss=0.1874, cr_loss=0.4428, attn_decoder_loss=0.2826, over 29458.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1673, cr_loss=0.4046, attn_decoder_loss=0.2655, over 5804797.93 frames. ], batch size: 94, lr: 1.03e-02, grad_scale: 8.0
2024-09-17 08:49:51,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=186880.0, ans=0.125
2024-09-17 08:50:11,684 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0
2024-09-17 08:50:12,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=186960.0, ans=0.1
2024-09-17 08:50:19,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=186960.0, ans=0.025
2024-09-17 08:50:21,159 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 9.456e+01 1.033e+02 1.133e+02 2.904e+02, threshold=2.066e+02, percent-clipped=2.0
2024-09-17 08:50:24,328 INFO [train.py:1198] (0/2) Epoch 11, batch 1500, loss[loss=0.2863, ctc_loss=0.1849, cr_loss=0.4481, attn_decoder_loss=0.2876, over 29629.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1672, cr_loss=0.4045, attn_decoder_loss=0.2659, over 5805836.85 frames.
], batch size: 86, lr: 1.03e-02, grad_scale: 8.0 2024-09-17 08:51:01,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=187080.0, ans=0.125 2024-09-17 08:51:14,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=187120.0, ans=0.125 2024-09-17 08:51:14,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=187120.0, ans=0.2 2024-09-17 08:51:20,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=187120.0, ans=0.125 2024-09-17 08:51:43,029 INFO [train.py:1198] (0/2) Epoch 11, batch 1550, loss[loss=0.2814, ctc_loss=0.1761, cr_loss=0.4269, attn_decoder_loss=0.2836, over 29535.00 frames. ], tot_loss[loss=0.2647, ctc_loss=0.1683, cr_loss=0.4057, attn_decoder_loss=0.2664, over 5781826.35 frames. ], batch size: 90, lr: 1.03e-02, grad_scale: 8.0 2024-09-17 08:51:46,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=187200.0, ans=0.125 2024-09-17 08:51:52,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=12.0 2024-09-17 08:52:09,372 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.09 vs. 
limit=22.5 2024-09-17 08:52:46,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=187360.0, ans=0.0 2024-09-17 08:52:57,784 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.021e+01 9.413e+01 1.008e+02 1.137e+02 5.479e+02, threshold=2.016e+02, percent-clipped=2.0 2024-09-17 08:53:00,778 INFO [train.py:1198] (0/2) Epoch 11, batch 1600, loss[loss=0.2738, ctc_loss=0.1719, cr_loss=0.4048, attn_decoder_loss=0.2761, over 29682.00 frames. ], tot_loss[loss=0.2645, ctc_loss=0.1681, cr_loss=0.4053, attn_decoder_loss=0.2662, over 5764983.74 frames. ], batch size: 85, lr: 1.03e-02, grad_scale: 16.0 2024-09-17 08:53:14,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=187440.0, ans=0.0 2024-09-17 08:53:23,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187440.0, ans=0.1 2024-09-17 08:53:35,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=187480.0, ans=0.0 2024-09-17 08:53:46,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187520.0, ans=0.1 2024-09-17 08:53:48,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=187520.0, ans=0.125 2024-09-17 08:53:49,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=187520.0, ans=0.025 2024-09-17 08:53:49,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=187520.0, ans=0.125 2024-09-17 08:54:16,158 INFO [train.py:1198] (0/2) Epoch 11, batch 1650, loss[loss=0.2871, ctc_loss=0.1844, cr_loss=0.4474, attn_decoder_loss=0.2886, over 29678.00 
frames. ], tot_loss[loss=0.2642, ctc_loss=0.1676, cr_loss=0.4044, attn_decoder_loss=0.2659, over 5760764.09 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 8.0 2024-09-17 08:54:24,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=187600.0, ans=0.05 2024-09-17 08:54:25,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=187600.0, ans=0.0 2024-09-17 08:55:00,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2024-09-17 08:55:08,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.46 vs. limit=22.5 2024-09-17 08:55:11,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187720.0, ans=0.1 2024-09-17 08:55:30,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=187760.0, ans=0.125 2024-09-17 08:55:32,180 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 9.133e+01 1.002e+02 1.058e+02 1.581e+02, threshold=2.003e+02, percent-clipped=0.0 2024-09-17 08:55:33,701 INFO [train.py:1198] (0/2) Epoch 11, batch 1700, loss[loss=0.2449, ctc_loss=0.1579, cr_loss=0.3898, attn_decoder_loss=0.2459, over 29592.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1675, cr_loss=0.4045, attn_decoder_loss=0.2657, over 5783071.64 frames. ], batch size: 69, lr: 1.03e-02, grad_scale: 8.0 2024-09-17 08:55:52,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.25 vs. 
limit=15.0 2024-09-17 08:56:47,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.99 vs. limit=22.5 2024-09-17 08:56:51,928 INFO [train.py:1198] (0/2) Epoch 11, batch 1750, loss[loss=0.2377, ctc_loss=0.1619, cr_loss=0.379, attn_decoder_loss=0.2377, over 29333.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1671, cr_loss=0.4037, attn_decoder_loss=0.2655, over 5790035.53 frames. ], batch size: 67, lr: 1.03e-02, grad_scale: 8.0 2024-09-17 08:57:06,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.49 vs. limit=15.0 2024-09-17 08:57:30,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=188080.0, ans=0.125 2024-09-17 08:58:04,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=188160.0, ans=0.0 2024-09-17 08:58:05,833 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.041e+01 9.226e+01 9.775e+01 1.040e+02 1.595e+02, threshold=1.955e+02, percent-clipped=0.0 2024-09-17 08:58:07,329 INFO [train.py:1198] (0/2) Epoch 11, batch 1800, loss[loss=0.2738, ctc_loss=0.176, cr_loss=0.436, attn_decoder_loss=0.2749, over 29670.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1672, cr_loss=0.4036, attn_decoder_loss=0.2656, over 5791899.89 frames. ], batch size: 83, lr: 1.03e-02, grad_scale: 8.0 2024-09-17 08:58:38,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=188280.0, ans=0.1 2024-09-17 08:58:43,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. 
limit=6.0 2024-09-17 08:58:52,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=188280.0, ans=0.0 2024-09-17 08:58:55,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2024-09-17 08:58:57,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=188320.0, ans=0.0 2024-09-17 08:59:06,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.50 vs. limit=10.0 2024-09-17 08:59:12,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2024-09-17 08:59:24,941 INFO [train.py:1198] (0/2) Epoch 11, batch 1850, loss[loss=0.2704, ctc_loss=0.1753, cr_loss=0.4003, attn_decoder_loss=0.272, over 29641.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1672, cr_loss=0.4041, attn_decoder_loss=0.2653, over 5797563.27 frames. ], batch size: 86, lr: 1.03e-02, grad_scale: 4.0 2024-09-17 08:59:34,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=188400.0, ans=0.2 2024-09-17 08:59:38,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=188440.0, ans=0.0 2024-09-17 08:59:45,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.04 vs. 
limit=15.0 2024-09-17 08:59:47,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=188440.0, ans=0.125 2024-09-17 08:59:59,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=188480.0, ans=0.125 2024-09-17 08:59:59,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=188480.0, ans=0.125 2024-09-17 09:00:13,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=188520.0, ans=15.0 2024-09-17 09:00:13,091 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=15.0 2024-09-17 09:00:18,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=188520.0, ans=0.125 2024-09-17 09:00:30,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=188560.0, ans=22.5 2024-09-17 09:00:41,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=188600.0, ans=0.0 2024-09-17 09:00:42,135 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.959e+01 9.168e+01 9.699e+01 1.053e+02 1.276e+02, threshold=1.940e+02, percent-clipped=0.0 2024-09-17 09:00:42,161 INFO [train.py:1198] (0/2) Epoch 11, batch 1900, loss[loss=0.2856, ctc_loss=0.1845, cr_loss=0.4469, attn_decoder_loss=0.2869, over 29710.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.1675, cr_loss=0.4054, attn_decoder_loss=0.2659, over 5805276.02 frames. 
], batch size: 89, lr: 1.03e-02, grad_scale: 8.0 2024-09-17 09:01:00,538 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:01:11,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=188680.0, ans=0.125 2024-09-17 09:01:17,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=188680.0, ans=0.0 2024-09-17 09:01:50,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=188760.0, ans=0.125 2024-09-17 09:01:57,962 INFO [train.py:1198] (0/2) Epoch 11, batch 1950, loss[loss=0.2541, ctc_loss=0.1609, cr_loss=0.401, attn_decoder_loss=0.2556, over 29468.00 frames. ], tot_loss[loss=0.2651, ctc_loss=0.1681, cr_loss=0.4071, attn_decoder_loss=0.2669, over 5819719.28 frames. ], batch size: 78, lr: 1.02e-02, grad_scale: 4.0 2024-09-17 09:01:58,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0 2024-09-17 09:02:12,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.19 vs. 
limit=15.0 2024-09-17 09:02:22,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188840.0, ans=0.1 2024-09-17 09:02:22,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=188840.0, ans=10.0 2024-09-17 09:02:27,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188840.0, ans=0.1 2024-09-17 09:02:27,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=188840.0, ans=0.125 2024-09-17 09:02:30,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.79 vs. limit=6.0 2024-09-17 09:02:33,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=188880.0, ans=0.0 2024-09-17 09:02:37,127 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.97 vs. 
limit=10.0 2024-09-17 09:02:39,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=188880.0, ans=0.125 2024-09-17 09:02:48,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=188920.0, ans=0.125 2024-09-17 09:02:57,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=188920.0, ans=0.125 2024-09-17 09:02:58,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=188960.0, ans=0.2 2024-09-17 09:02:58,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=188960.0, ans=0.025 2024-09-17 09:03:15,373 INFO [train.py:1198] (0/2) Epoch 11, batch 2000, loss[loss=0.2433, ctc_loss=0.1568, cr_loss=0.3898, attn_decoder_loss=0.2442, over 29347.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1688, cr_loss=0.4071, attn_decoder_loss=0.2675, over 5796372.65 frames. 
], batch size: 67, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:03:16,928 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.350e+01 9.444e+01 9.987e+01 1.091e+02 4.605e+02, threshold=1.997e+02, percent-clipped=2.0 2024-09-17 09:03:29,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=189040.0, ans=0.0 2024-09-17 09:03:44,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=189080.0, ans=0.2 2024-09-17 09:04:01,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=189120.0, ans=0.125 2024-09-17 09:04:33,454 INFO [train.py:1198] (0/2) Epoch 11, batch 2050, loss[loss=0.2364, ctc_loss=0.1434, cr_loss=0.3716, attn_decoder_loss=0.2384, over 29430.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.168, cr_loss=0.4055, attn_decoder_loss=0.2664, over 5790097.64 frames. ], batch size: 70, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:04:36,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=189200.0, ans=0.2 2024-09-17 09:04:36,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=189200.0, ans=0.125 2024-09-17 09:04:53,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=189240.0, ans=0.0 2024-09-17 09:04:56,827 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.23 vs. 
limit=22.5 2024-09-17 09:05:13,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=189280.0, ans=0.125 2024-09-17 09:05:24,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=189320.0, ans=0.025 2024-09-17 09:05:33,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.10 vs. limit=10.0 2024-09-17 09:05:40,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=189360.0, ans=0.125 2024-09-17 09:05:47,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=189400.0, ans=0.0 2024-09-17 09:05:49,139 INFO [train.py:1198] (0/2) Epoch 11, batch 2100, loss[loss=0.2631, ctc_loss=0.1603, cr_loss=0.3767, attn_decoder_loss=0.2662, over 29785.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1667, cr_loss=0.4044, attn_decoder_loss=0.2653, over 5801026.20 frames. 
], batch size: 81, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:05:50,610 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.970e+01 9.676e+01 1.062e+02 4.848e+02, threshold=1.935e+02, percent-clipped=1.0 2024-09-17 09:05:55,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=189400.0, ans=0.125 2024-09-17 09:05:56,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=189400.0, ans=0.125 2024-09-17 09:06:15,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=189440.0, ans=0.125 2024-09-17 09:06:30,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=189480.0, ans=0.125 2024-09-17 09:06:53,097 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:06:53,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.14 vs. 
limit=15.0 2024-09-17 09:06:57,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=189560.0, ans=0.09899494936611666 2024-09-17 09:06:57,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=189560.0, ans=0.125 2024-09-17 09:06:58,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=189560.0, ans=0.0 2024-09-17 09:06:59,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=189560.0, ans=0.1 2024-09-17 09:07:06,749 INFO [train.py:1198] (0/2) Epoch 11, batch 2150, loss[loss=0.255, ctc_loss=0.152, cr_loss=0.3915, attn_decoder_loss=0.2577, over 29432.00 frames. ], tot_loss[loss=0.2627, ctc_loss=0.1659, cr_loss=0.4037, attn_decoder_loss=0.2645, over 5815267.35 frames. ], batch size: 78, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:07:08,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=189600.0, ans=0.0 2024-09-17 09:07:08,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=189600.0, ans=0.2 2024-09-17 09:07:11,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.15 vs. 
limit=22.5 2024-09-17 09:07:48,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=189680.0, ans=0.025 2024-09-17 09:07:58,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=189720.0, ans=0.05 2024-09-17 09:08:08,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=189760.0, ans=0.0 2024-09-17 09:08:24,836 INFO [train.py:1198] (0/2) Epoch 11, batch 2200, loss[loss=0.2764, ctc_loss=0.1762, cr_loss=0.4225, attn_decoder_loss=0.2781, over 29632.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1664, cr_loss=0.4038, attn_decoder_loss=0.2648, over 5812008.77 frames. ], batch size: 86, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:08:26,327 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.251e+01 9.350e+01 9.957e+01 1.083e+02 2.059e+02, threshold=1.991e+02, percent-clipped=1.0 2024-09-17 09:08:43,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=189840.0, ans=0.125 2024-09-17 09:08:54,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-09-17 09:09:02,797 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:09:06,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=189880.0, ans=15.0 2024-09-17 09:09:40,357 INFO [train.py:1198] (0/2) Epoch 11, batch 2250, loss[loss=0.2544, ctc_loss=0.1547, cr_loss=0.3873, attn_decoder_loss=0.2568, over 29705.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1663, cr_loss=0.404, attn_decoder_loss=0.2646, over 5811565.07 frames. 
], batch size: 82, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:09:40,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=190000.0, ans=0.035 2024-09-17 09:09:54,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-09-17 09:10:06,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=190040.0, ans=0.2 2024-09-17 09:10:24,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=190080.0, ans=0.0 2024-09-17 09:10:32,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=190120.0, ans=0.5 2024-09-17 09:10:45,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.59 vs. limit=15.0 2024-09-17 09:10:57,794 INFO [train.py:1198] (0/2) Epoch 11, batch 2300, loss[loss=0.2366, ctc_loss=0.1397, cr_loss=0.3946, attn_decoder_loss=0.2386, over 29335.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1655, cr_loss=0.4025, attn_decoder_loss=0.2635, over 5798975.57 frames. 
], batch size: 71, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:10:59,281 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 9.241e+01 9.973e+01 1.088e+02 2.493e+02, threshold=1.995e+02, percent-clipped=2.0 2024-09-17 09:11:11,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=190240.0, ans=0.0 2024-09-17 09:11:14,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=190240.0, ans=0.2 2024-09-17 09:11:17,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=190240.0, ans=0.0 2024-09-17 09:11:23,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=190240.0, ans=0.0 2024-09-17 09:11:25,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=190240.0, ans=0.0 2024-09-17 09:11:42,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=190320.0, ans=10.0 2024-09-17 09:11:49,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190320.0, ans=0.1 2024-09-17 09:11:54,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.62 vs. 
limit=22.5 2024-09-17 09:12:09,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=190360.0, ans=0.0 2024-09-17 09:12:11,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=190360.0, ans=0.2 2024-09-17 09:12:15,946 INFO [train.py:1198] (0/2) Epoch 11, batch 2350, loss[loss=0.276, ctc_loss=0.1777, cr_loss=0.4294, attn_decoder_loss=0.2774, over 29686.00 frames. ], tot_loss[loss=0.2625, ctc_loss=0.1661, cr_loss=0.4035, attn_decoder_loss=0.2642, over 5804386.65 frames. ], batch size: 83, lr: 1.02e-02, grad_scale: 4.0 2024-09-17 09:12:19,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=190400.0, ans=0.125 2024-09-17 09:12:26,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=190400.0, ans=0.2 2024-09-17 09:12:38,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=190440.0, ans=0.07 2024-09-17 09:12:39,120 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:13:14,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.65 vs. limit=15.0 2024-09-17 09:13:14,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.13 vs. 
limit=15.0 2024-09-17 09:13:16,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=190560.0, ans=0.125 2024-09-17 09:13:18,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=190560.0, ans=10.0 2024-09-17 09:13:31,664 INFO [train.py:1198] (0/2) Epoch 11, batch 2400, loss[loss=0.254, ctc_loss=0.1557, cr_loss=0.3979, attn_decoder_loss=0.256, over 29531.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1665, cr_loss=0.4042, attn_decoder_loss=0.265, over 5807297.56 frames. ], batch size: 76, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:13:34,606 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 9.144e+01 9.902e+01 1.071e+02 1.818e+02, threshold=1.980e+02, percent-clipped=0.0 2024-09-17 09:14:01,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=190640.0, ans=0.2 2024-09-17 09:14:11,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=190680.0, ans=0.125 2024-09-17 09:14:21,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=190720.0, ans=0.125 2024-09-17 09:14:38,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.47 vs. limit=12.0 2024-09-17 09:14:50,026 INFO [train.py:1198] (0/2) Epoch 11, batch 2450, loss[loss=0.2739, ctc_loss=0.1741, cr_loss=0.4132, attn_decoder_loss=0.2758, over 29717.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1674, cr_loss=0.4055, attn_decoder_loss=0.2658, over 5783302.04 frames. 
], batch size: 82, lr: 1.02e-02, grad_scale: 4.0 2024-09-17 09:14:57,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2024-09-17 09:15:03,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=190840.0, ans=0.0 2024-09-17 09:15:33,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=190920.0, ans=0.1 2024-09-17 09:15:35,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=190920.0, ans=0.0 2024-09-17 09:16:00,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=190960.0, ans=0.025 2024-09-17 09:16:00,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.31 vs. limit=15.0 2024-09-17 09:16:06,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=191000.0, ans=0.0 2024-09-17 09:16:06,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.61 vs. limit=22.5 2024-09-17 09:16:07,524 INFO [train.py:1198] (0/2) Epoch 11, batch 2500, loss[loss=0.2827, ctc_loss=0.1759, cr_loss=0.3962, attn_decoder_loss=0.2857, over 29611.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1673, cr_loss=0.4058, attn_decoder_loss=0.2658, over 5794113.25 frames. 
], batch size: 86, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:16:07,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=191000.0, ans=0.0 2024-09-17 09:16:12,137 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 9.413e+01 9.956e+01 1.120e+02 1.816e+02, threshold=1.991e+02, percent-clipped=0.0 2024-09-17 09:16:12,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=191000.0, ans=0.0 2024-09-17 09:16:56,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=191120.0, ans=0.2 2024-09-17 09:16:59,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=191120.0, ans=0.0 2024-09-17 09:17:05,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=191120.0, ans=0.2 2024-09-17 09:17:23,755 INFO [train.py:1198] (0/2) Epoch 11, batch 2550, loss[loss=0.239, ctc_loss=0.1487, cr_loss=0.3674, attn_decoder_loss=0.2408, over 29356.00 frames. ], tot_loss[loss=0.264, ctc_loss=0.1674, cr_loss=0.4054, attn_decoder_loss=0.2657, over 5798354.35 frames. 
], batch size: 67, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:17:37,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=191240.0, ans=0.0 2024-09-17 09:17:45,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=191240.0, ans=0.0 2024-09-17 09:17:57,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=191280.0, ans=0.0 2024-09-17 09:18:19,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=191320.0, ans=0.125 2024-09-17 09:18:19,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=191320.0, ans=0.125 2024-09-17 09:18:39,911 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-09-17 09:18:41,963 INFO [train.py:1198] (0/2) Epoch 11, batch 2600, loss[loss=0.2658, ctc_loss=0.1698, cr_loss=0.3943, attn_decoder_loss=0.2677, over 29465.00 frames. ], tot_loss[loss=0.2644, ctc_loss=0.1674, cr_loss=0.4056, attn_decoder_loss=0.2661, over 5794505.11 frames. ], batch size: 78, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:18:45,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=191400.0, ans=0.2 2024-09-17 09:18:46,523 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.056e+01 9.455e+01 1.019e+02 1.112e+02 3.211e+02, threshold=2.037e+02, percent-clipped=2.0 2024-09-17 09:19:20,327 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.57 vs. 
limit=15.0 2024-09-17 09:19:34,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=191520.0, ans=0.125 2024-09-17 09:19:47,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.48 vs. limit=10.0 2024-09-17 09:19:59,103 INFO [train.py:1198] (0/2) Epoch 11, batch 2650, loss[loss=0.2808, ctc_loss=0.1799, cr_loss=0.4219, attn_decoder_loss=0.2827, over 29222.00 frames. ], tot_loss[loss=0.2644, ctc_loss=0.1672, cr_loss=0.4055, attn_decoder_loss=0.2662, over 5801307.17 frames. ], batch size: 100, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:19:59,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=191600.0, ans=0.125 2024-09-17 09:20:06,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=191600.0, ans=0.0 2024-09-17 09:20:22,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=191640.0, ans=0.1 2024-09-17 09:20:22,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=191640.0, ans=0.125 2024-09-17 09:20:26,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=191640.0, ans=0.025 2024-09-17 09:20:26,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=191640.0, ans=0.125 2024-09-17 09:20:40,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=191680.0, ans=0.125 2024-09-17 09:20:43,011 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=191720.0, ans=0.125 2024-09-17 09:20:56,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=191720.0, ans=0.125 2024-09-17 09:21:14,609 INFO [train.py:1198] (0/2) Epoch 11, batch 2700, loss[loss=0.2816, ctc_loss=0.1671, cr_loss=0.4044, attn_decoder_loss=0.2853, over 29540.00 frames. ], tot_loss[loss=0.2649, ctc_loss=0.1676, cr_loss=0.4062, attn_decoder_loss=0.2667, over 5796694.53 frames. ], batch size: 87, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:21:20,545 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 9.206e+01 9.835e+01 1.075e+02 2.605e+02, threshold=1.967e+02, percent-clipped=2.0 2024-09-17 09:21:20,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=191800.0, ans=0.025 2024-09-17 09:21:20,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=191800.0, ans=0.125 2024-09-17 09:21:21,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.69 vs. limit=22.5 2024-09-17 09:21:48,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=191880.0, ans=0.0 2024-09-17 09:21:56,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=191880.0, ans=0.0 2024-09-17 09:22:05,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=191920.0, ans=0.0 2024-09-17 09:22:12,173 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.48 vs. 
limit=22.5 2024-09-17 09:22:26,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=191960.0, ans=0.125 2024-09-17 09:22:31,571 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-48000.pt 2024-09-17 09:22:39,940 INFO [train.py:1198] (0/2) Epoch 11, batch 2750, loss[loss=0.2535, ctc_loss=0.1587, cr_loss=0.4204, attn_decoder_loss=0.2546, over 29511.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1667, cr_loss=0.4043, attn_decoder_loss=0.2654, over 5794957.27 frames. ], batch size: 75, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:22:46,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=192000.0, ans=0.0 2024-09-17 09:22:50,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=192000.0, ans=0.125 2024-09-17 09:22:57,236 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.78 vs. 
limit=12.0 2024-09-17 09:23:17,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=192080.0, ans=0.125 2024-09-17 09:23:32,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=192120.0, ans=0.125 2024-09-17 09:23:40,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=192160.0, ans=0.0 2024-09-17 09:23:42,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=192160.0, ans=0.2 2024-09-17 09:23:44,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.99 vs. limit=15.0 2024-09-17 09:23:48,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192160.0, ans=0.1 2024-09-17 09:23:57,820 INFO [train.py:1198] (0/2) Epoch 11, batch 2800, loss[loss=0.3007, ctc_loss=0.2352, cr_loss=0.4176, attn_decoder_loss=0.2987, over 20215.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1674, cr_loss=0.4053, attn_decoder_loss=0.2656, over 5775822.31 frames. ], batch size: 211, lr: 1.02e-02, grad_scale: 16.0 2024-09-17 09:23:59,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=15.0 2024-09-17 09:24:04,596 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.74 vs. 
limit=22.5 2024-09-17 09:24:05,067 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.696e+01 8.863e+01 9.648e+01 1.109e+02 4.510e+02, threshold=1.930e+02, percent-clipped=4.0 2024-09-17 09:24:05,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.73 vs. limit=15.0 2024-09-17 09:24:07,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=192200.0, ans=0.0 2024-09-17 09:24:08,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=192200.0, ans=0.0 2024-09-17 09:24:19,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=192240.0, ans=0.2 2024-09-17 09:24:22,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=192240.0, ans=0.0 2024-09-17 09:24:49,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=192320.0, ans=0.125 2024-09-17 09:25:13,021 INFO [train.py:1198] (0/2) Epoch 11, batch 2850, loss[loss=0.2512, ctc_loss=0.1473, cr_loss=0.3796, attn_decoder_loss=0.2543, over 29459.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1682, cr_loss=0.4055, attn_decoder_loss=0.2663, over 5760529.73 frames. ], batch size: 77, lr: 1.02e-02, grad_scale: 8.0 2024-09-17 09:25:15,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.11 vs. 
limit=10.0 2024-09-17 09:25:19,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=192400.0, ans=0.125 2024-09-17 09:25:25,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=192400.0, ans=0.125 2024-09-17 09:25:54,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=192480.0, ans=0.0 2024-09-17 09:25:57,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=192480.0, ans=0.0 2024-09-17 09:26:25,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=192560.0, ans=0.125 2024-09-17 09:26:30,890 INFO [train.py:1198] (0/2) Epoch 11, batch 2900, loss[loss=0.2475, ctc_loss=0.1531, cr_loss=0.3854, attn_decoder_loss=0.2494, over 29423.00 frames. ], tot_loss[loss=0.2651, ctc_loss=0.1682, cr_loss=0.4065, attn_decoder_loss=0.2669, over 5786612.56 frames. ], batch size: 79, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:26:32,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=192600.0, ans=0.0 2024-09-17 09:26:38,267 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 9.694e+01 1.018e+02 1.122e+02 2.522e+02, threshold=2.035e+02, percent-clipped=2.0 2024-09-17 09:26:48,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. 
limit=12.0 2024-09-17 09:26:50,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=192640.0, ans=0.025 2024-09-17 09:27:13,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=192680.0, ans=0.125 2024-09-17 09:27:21,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=192720.0, ans=0.125 2024-09-17 09:27:49,119 INFO [train.py:1198] (0/2) Epoch 11, batch 2950, loss[loss=0.2607, ctc_loss=0.1669, cr_loss=0.4278, attn_decoder_loss=0.2616, over 29511.00 frames. ], tot_loss[loss=0.2637, ctc_loss=0.1667, cr_loss=0.404, attn_decoder_loss=0.2655, over 5782239.33 frames. ], batch size: 75, lr: 1.01e-02, grad_scale: 4.0 2024-09-17 09:27:58,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=192800.0, ans=0.025 2024-09-17 09:28:18,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=192880.0, ans=0.125 2024-09-17 09:28:41,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.09 vs. limit=12.0 2024-09-17 09:29:04,914 INFO [train.py:1198] (0/2) Epoch 11, batch 3000, loss[loss=0.2658, ctc_loss=0.1638, cr_loss=0.3938, attn_decoder_loss=0.2684, over 29756.00 frames. ], tot_loss[loss=0.2636, ctc_loss=0.1666, cr_loss=0.4038, attn_decoder_loss=0.2654, over 5782372.17 frames. 
], batch size: 81, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:29:04,915 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 09:29:17,446 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.9760, 4.1393, 4.3561, 4.3107], device='cuda:0') 2024-09-17 09:29:24,076 INFO [train.py:1230] (0/2) Epoch 11, validation: loss=0.2124, ctc_loss=0.04636, cr_loss=4.851e-15, attn_decoder_loss=0.2308, over 944034.00 frames. 2024-09-17 09:29:24,076 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 09:29:33,324 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.860e+01 9.274e+01 9.995e+01 1.117e+02 3.922e+02, threshold=1.999e+02, percent-clipped=3.0 2024-09-17 09:30:16,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=193120.0, ans=0.2 2024-09-17 09:30:17,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=193120.0, ans=0.2 2024-09-17 09:30:22,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193120.0, ans=0.1 2024-09-17 09:30:34,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193160.0, ans=0.1 2024-09-17 09:30:39,846 INFO [train.py:1198] (0/2) Epoch 11, batch 3050, loss[loss=0.2577, ctc_loss=0.1717, cr_loss=0.3986, attn_decoder_loss=0.2584, over 29543.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1676, cr_loss=0.4055, attn_decoder_loss=0.2664, over 5777200.52 frames. 
], batch size: 76, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:30:52,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=193200.0, ans=0.125 2024-09-17 09:31:17,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=193280.0, ans=0.0 2024-09-17 09:31:48,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=193360.0, ans=0.125 2024-09-17 09:31:51,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-09-17 09:31:57,177 INFO [train.py:1198] (0/2) Epoch 11, batch 3100, loss[loss=0.2875, ctc_loss=0.1834, cr_loss=0.4555, attn_decoder_loss=0.2889, over 29270.00 frames. ], tot_loss[loss=0.2643, ctc_loss=0.1675, cr_loss=0.4053, attn_decoder_loss=0.266, over 5776739.36 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:32:07,733 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.928e+01 9.888e+01 1.137e+02 1.275e+02 2.184e+02, threshold=2.273e+02, percent-clipped=1.0 2024-09-17 09:32:17,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=22.5 2024-09-17 09:32:36,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193480.0, ans=0.1 2024-09-17 09:32:42,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193520.0, ans=0.1 2024-09-17 09:32:47,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. 
limit=15.0 2024-09-17 09:32:50,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=193520.0, ans=0.125 2024-09-17 09:33:07,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=193560.0, ans=0.0 2024-09-17 09:33:09,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=193560.0, ans=0.0 2024-09-17 09:33:15,296 INFO [train.py:1198] (0/2) Epoch 11, batch 3150, loss[loss=0.2783, ctc_loss=0.1748, cr_loss=0.4258, attn_decoder_loss=0.2803, over 28956.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1672, cr_loss=0.4052, attn_decoder_loss=0.2659, over 5782795.76 frames. ], batch size: 104, lr: 1.01e-02, grad_scale: 4.0 2024-09-17 09:33:37,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=193640.0, ans=0.125 2024-09-17 09:33:38,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193640.0, ans=0.1 2024-09-17 09:33:45,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=193680.0, ans=0.125 2024-09-17 09:33:47,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.97 vs. limit=22.5 2024-09-17 09:33:57,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=193680.0, ans=0.0 2024-09-17 09:34:10,677 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.24 vs. 
limit=10.0 2024-09-17 09:34:15,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=193760.0, ans=0.125 2024-09-17 09:34:21,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=193760.0, ans=0.0 2024-09-17 09:34:30,662 INFO [train.py:1198] (0/2) Epoch 11, batch 3200, loss[loss=0.2712, ctc_loss=0.1708, cr_loss=0.422, attn_decoder_loss=0.273, over 29442.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1668, cr_loss=0.4051, attn_decoder_loss=0.2657, over 5794093.28 frames. ], batch size: 79, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:34:38,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=193800.0, ans=0.025 2024-09-17 09:34:42,601 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 9.244e+01 9.694e+01 1.030e+02 2.478e+02, threshold=1.939e+02, percent-clipped=1.0 2024-09-17 09:34:45,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.01 vs. 
limit=15.0 2024-09-17 09:35:01,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=193880.0, ans=0.125 2024-09-17 09:35:10,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=193880.0, ans=0.0 2024-09-17 09:35:18,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193920.0, ans=0.1 2024-09-17 09:35:21,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=193920.0, ans=0.125 2024-09-17 09:35:24,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=193920.0, ans=0.5 2024-09-17 09:35:49,419 INFO [train.py:1198] (0/2) Epoch 11, batch 3250, loss[loss=0.2694, ctc_loss=0.1746, cr_loss=0.4148, attn_decoder_loss=0.2707, over 29708.00 frames. ], tot_loss[loss=0.2647, ctc_loss=0.1675, cr_loss=0.4065, attn_decoder_loss=0.2664, over 5801643.74 frames. ], batch size: 84, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:35:59,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.50 vs. 
limit=22.5 2024-09-17 09:36:33,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=194120.0, ans=0.025 2024-09-17 09:36:48,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=194160.0, ans=0.125 2024-09-17 09:36:59,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=194160.0, ans=0.0 2024-09-17 09:37:07,223 INFO [train.py:1198] (0/2) Epoch 11, batch 3300, loss[loss=0.2731, ctc_loss=0.1694, cr_loss=0.3897, attn_decoder_loss=0.276, over 28629.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.166, cr_loss=0.4042, attn_decoder_loss=0.2647, over 5797994.07 frames. ], batch size: 112, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:37:19,493 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.489e+01 1.035e+02 1.154e+02 2.549e+02, threshold=2.070e+02, percent-clipped=1.0 2024-09-17 09:37:19,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=194200.0, ans=0.025 2024-09-17 09:37:25,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194240.0, ans=0.1 2024-09-17 09:37:51,997 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2024-09-17 09:38:22,981 INFO [train.py:1198] (0/2) Epoch 11, batch 3350, loss[loss=0.282, ctc_loss=0.1754, cr_loss=0.404, attn_decoder_loss=0.2849, over 28774.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1665, cr_loss=0.4046, attn_decoder_loss=0.2652, over 5774748.66 frames. 
], batch size: 104, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:38:38,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=194440.0, ans=0.0 2024-09-17 09:38:41,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=194440.0, ans=0.07 2024-09-17 09:38:49,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194440.0, ans=0.1 2024-09-17 09:38:58,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=194480.0, ans=0.025 2024-09-17 09:39:05,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=194480.0, ans=0.125 2024-09-17 09:39:12,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.73 vs. limit=8.0 2024-09-17 09:39:27,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=194560.0, ans=0.125 2024-09-17 09:39:28,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=194560.0, ans=0.025 2024-09-17 09:39:40,726 INFO [train.py:1198] (0/2) Epoch 11, batch 3400, loss[loss=0.2408, ctc_loss=0.1485, cr_loss=0.3977, attn_decoder_loss=0.2422, over 29323.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1671, cr_loss=0.4045, attn_decoder_loss=0.2655, over 5768936.10 frames. ], batch size: 67, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:39:51,025 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.76 vs. 
limit=22.5 2024-09-17 09:39:52,867 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.042e+01 9.232e+01 1.004e+02 1.095e+02 3.484e+02, threshold=2.008e+02, percent-clipped=1.0 2024-09-17 09:40:14,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.46 vs. limit=15.0 2024-09-17 09:40:32,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=194720.0, ans=0.125 2024-09-17 09:40:58,438 INFO [train.py:1198] (0/2) Epoch 11, batch 3450, loss[loss=0.2754, ctc_loss=0.1807, cr_loss=0.426, attn_decoder_loss=0.2765, over 28404.00 frames. ], tot_loss[loss=0.2638, ctc_loss=0.1668, cr_loss=0.4054, attn_decoder_loss=0.2656, over 5775925.10 frames. ], batch size: 111, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:41:03,436 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:41:32,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=194880.0, ans=22.5 2024-09-17 09:41:34,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.79 vs. 
limit=22.5 2024-09-17 09:41:48,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=194920.0, ans=0.2 2024-09-17 09:41:59,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=194960.0, ans=0.0 2024-09-17 09:42:11,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=194960.0, ans=0.125 2024-09-17 09:42:12,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=22.5 2024-09-17 09:42:13,890 INFO [train.py:1198] (0/2) Epoch 11, batch 3500, loss[loss=0.2243, ctc_loss=0.13, cr_loss=0.3367, attn_decoder_loss=0.2273, over 29325.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1665, cr_loss=0.4047, attn_decoder_loss=0.2649, over 5777987.30 frames. ], batch size: 71, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:42:14,998 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.70 vs. limit=10.0 2024-09-17 09:42:19,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.28 vs. 
limit=15.0 2024-09-17 09:42:26,191 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.959e+01 9.210e+01 9.867e+01 1.123e+02 1.745e+02, threshold=1.973e+02, percent-clipped=0.0 2024-09-17 09:42:26,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=195000.0, ans=0.125 2024-09-17 09:42:29,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=195040.0, ans=0.2 2024-09-17 09:42:31,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=12.0 2024-09-17 09:43:14,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=195160.0, ans=0.2 2024-09-17 09:43:29,167 INFO [train.py:1198] (0/2) Epoch 11, batch 3550, loss[loss=0.2607, ctc_loss=0.1563, cr_loss=0.3877, attn_decoder_loss=0.2636, over 29702.00 frames. ], tot_loss[loss=0.2628, ctc_loss=0.166, cr_loss=0.4035, attn_decoder_loss=0.2646, over 5783763.48 frames. ], batch size: 89, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:43:57,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. 
limit=6.0 2024-09-17 09:43:58,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=195240.0, ans=0.1 2024-09-17 09:44:08,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=195280.0, ans=0.025 2024-09-17 09:44:24,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=195320.0, ans=0.1 2024-09-17 09:44:32,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=195360.0, ans=0.125 2024-09-17 09:44:37,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.87 vs. limit=10.0 2024-09-17 09:44:41,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=195360.0, ans=0.1 2024-09-17 09:44:42,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=195360.0, ans=0.0 2024-09-17 09:44:45,311 INFO [train.py:1198] (0/2) Epoch 11, batch 3600, loss[loss=0.2563, ctc_loss=0.1612, cr_loss=0.396, attn_decoder_loss=0.2581, over 29503.00 frames. ], tot_loss[loss=0.2627, ctc_loss=0.1657, cr_loss=0.4032, attn_decoder_loss=0.2646, over 5793398.36 frames. 
], batch size: 77, lr: 1.01e-02, grad_scale: 16.0 2024-09-17 09:44:47,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=195400.0, ans=0.0 2024-09-17 09:44:48,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=195400.0, ans=0.0 2024-09-17 09:44:58,575 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.269e+01 9.046e+01 9.949e+01 1.066e+02 3.484e+02, threshold=1.990e+02, percent-clipped=1.0 2024-09-17 09:45:04,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=195440.0, ans=0.0 2024-09-17 09:45:10,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=195440.0, ans=0.0 2024-09-17 09:45:24,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=195480.0, ans=0.125 2024-09-17 09:45:28,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=195520.0, ans=0.0 2024-09-17 09:45:37,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=195520.0, ans=0.125 2024-09-17 09:45:39,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-17 09:45:44,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.17 vs. 
limit=12.0 2024-09-17 09:45:49,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=195560.0, ans=0.1 2024-09-17 09:45:59,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=195600.0, ans=0.2 2024-09-17 09:46:01,226 INFO [train.py:1198] (0/2) Epoch 11, batch 3650, loss[loss=0.2818, ctc_loss=0.1807, cr_loss=0.4038, attn_decoder_loss=0.284, over 29513.00 frames. ], tot_loss[loss=0.2622, ctc_loss=0.165, cr_loss=0.4019, attn_decoder_loss=0.264, over 5795049.37 frames. ], batch size: 90, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:46:01,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=195600.0, ans=0.125 2024-09-17 09:46:09,203 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.86 vs. limit=10.0 2024-09-17 09:46:14,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=195640.0, ans=0.2 2024-09-17 09:46:20,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=195640.0, ans=0.2 2024-09-17 09:46:23,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=195640.0, ans=0.0 2024-09-17 09:46:43,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.60 vs. 
limit=15.0 2024-09-17 09:46:49,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=195720.0, ans=0.0 2024-09-17 09:47:03,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=195760.0, ans=0.0 2024-09-17 09:47:05,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=195760.0, ans=0.0 2024-09-17 09:47:15,840 INFO [train.py:1198] (0/2) Epoch 11, batch 3700, loss[loss=0.2684, ctc_loss=0.1689, cr_loss=0.3907, attn_decoder_loss=0.2708, over 29709.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1653, cr_loss=0.4025, attn_decoder_loss=0.2643, over 5806028.67 frames. ], batch size: 84, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:47:20,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=195800.0, ans=0.0 2024-09-17 09:47:29,199 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.150e+01 9.197e+01 9.899e+01 1.076e+02 2.230e+02, threshold=1.980e+02, percent-clipped=1.0 2024-09-17 09:47:29,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=195840.0, ans=0.125 2024-09-17 09:47:39,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.18 vs. limit=15.0 2024-09-17 09:47:43,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.53 vs. 
limit=15.0 2024-09-17 09:48:10,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=195920.0, ans=0.025 2024-09-17 09:48:22,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=195960.0, ans=0.0 2024-09-17 09:48:30,211 INFO [train.py:1198] (0/2) Epoch 11, batch 3750, loss[loss=0.2371, ctc_loss=0.1461, cr_loss=0.3559, attn_decoder_loss=0.2393, over 29359.00 frames. ], tot_loss[loss=0.2622, ctc_loss=0.165, cr_loss=0.4017, attn_decoder_loss=0.264, over 5809283.93 frames. ], batch size: 67, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:48:49,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196040.0, ans=0.1 2024-09-17 09:49:10,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=196080.0, ans=0.1 2024-09-17 09:49:11,153 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-09-17 09:49:34,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=196160.0, ans=0.125 2024-09-17 09:49:36,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-09-17 09:49:44,089 INFO [train.py:1198] (0/2) Epoch 11, batch 3800, loss[loss=0.2594, ctc_loss=0.1631, cr_loss=0.3918, attn_decoder_loss=0.2615, over 29620.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1646, cr_loss=0.401, attn_decoder_loss=0.2635, over 5799597.88 frames. 
], batch size: 86, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:49:47,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=196200.0, ans=0.2 2024-09-17 09:49:54,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=196200.0, ans=0.125 2024-09-17 09:49:57,597 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.230e+01 9.992e+01 1.078e+02 1.190e+02 1.793e+02, threshold=2.156e+02, percent-clipped=0.0 2024-09-17 09:50:03,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=196240.0, ans=0.125 2024-09-17 09:50:36,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=196320.0, ans=0.125 2024-09-17 09:50:38,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.34 vs. limit=15.0 2024-09-17 09:51:00,333 INFO [train.py:1198] (0/2) Epoch 11, batch 3850, loss[loss=0.272, ctc_loss=0.1713, cr_loss=0.4005, attn_decoder_loss=0.2743, over 29268.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1644, cr_loss=0.4008, attn_decoder_loss=0.2633, over 5812701.42 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 8.0 2024-09-17 09:51:14,718 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.70 vs. limit=22.5 2024-09-17 09:51:26,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.29 vs. 
limit=15.0 2024-09-17 09:51:58,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=196520.0, ans=0.125 2024-09-17 09:51:58,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=196520.0, ans=0.125 2024-09-17 09:52:05,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=196560.0, ans=0.025 2024-09-17 09:52:16,108 INFO [train.py:1198] (0/2) Epoch 11, batch 3900, loss[loss=0.2813, ctc_loss=0.1814, cr_loss=0.4221, attn_decoder_loss=0.283, over 29620.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.1649, cr_loss=0.402, attn_decoder_loss=0.2638, over 5816717.29 frames. ], batch size: 86, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:52:25,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=196600.0, ans=0.09899494936611666 2024-09-17 09:52:29,224 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.236e+01 9.717e+01 1.078e+02 1.405e+02, threshold=1.943e+02, percent-clipped=0.0 2024-09-17 09:52:51,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=196680.0, ans=0.0 2024-09-17 09:53:30,389 INFO [train.py:1198] (0/2) Epoch 11, batch 3950, loss[loss=0.2718, ctc_loss=0.17, cr_loss=0.4286, attn_decoder_loss=0.2735, over 29438.00 frames. ], tot_loss[loss=0.2619, ctc_loss=0.1645, cr_loss=0.402, attn_decoder_loss=0.2638, over 5835963.12 frames. 
], batch size: 97, lr: 1.00e-02, grad_scale: 4.0 2024-09-17 09:54:04,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=196880.0, ans=0.125 2024-09-17 09:54:16,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=196920.0, ans=0.0 2024-09-17 09:54:35,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=196960.0, ans=0.0 2024-09-17 09:54:43,897 INFO [train.py:1198] (0/2) Epoch 11, batch 4000, loss[loss=0.2555, ctc_loss=0.1645, cr_loss=0.4011, attn_decoder_loss=0.2567, over 29503.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.1649, cr_loss=0.402, attn_decoder_loss=0.2639, over 5813709.51 frames. ], batch size: 74, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:54:47,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.53 vs. 
limit=22.5 2024-09-17 09:54:58,574 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.223e+01 9.139e+01 9.851e+01 1.070e+02 1.973e+02, threshold=1.970e+02, percent-clipped=1.0 2024-09-17 09:54:58,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=197040.0, ans=0.025 2024-09-17 09:55:03,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=197040.0, ans=0.0 2024-09-17 09:55:20,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=197080.0, ans=0.125 2024-09-17 09:55:24,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=197080.0, ans=0.2 2024-09-17 09:55:38,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=197120.0, ans=0.0 2024-09-17 09:55:52,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197160.0, ans=0.1 2024-09-17 09:55:59,402 INFO [train.py:1198] (0/2) Epoch 11, batch 4050, loss[loss=0.3126, ctc_loss=0.2353, cr_loss=0.4227, attn_decoder_loss=0.3118, over 19807.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1648, cr_loss=0.4013, attn_decoder_loss=0.2636, over 5796320.19 frames. 
], batch size: 210, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:56:02,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=197200.0, ans=0.125 2024-09-17 09:56:08,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=197200.0, ans=0.125 2024-09-17 09:56:16,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=197240.0, ans=0.0 2024-09-17 09:56:31,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2024-09-17 09:56:43,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=197320.0, ans=0.2 2024-09-17 09:56:47,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=197320.0, ans=0.2 2024-09-17 09:56:55,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=197320.0, ans=0.125 2024-09-17 09:57:09,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=197360.0, ans=0.125 2024-09-17 09:57:12,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=197400.0, ans=0.0 2024-09-17 09:57:13,997 INFO [train.py:1198] (0/2) Epoch 11, batch 4100, loss[loss=0.2738, ctc_loss=0.1673, cr_loss=0.4028, attn_decoder_loss=0.2767, over 29541.00 frames. ], tot_loss[loss=0.2619, ctc_loss=0.1647, cr_loss=0.4013, attn_decoder_loss=0.2638, over 5792508.45 frames. 
], batch size: 90, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:57:17,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=197400.0, ans=0.1 2024-09-17 09:57:30,189 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 9.399e+01 1.013e+02 1.118e+02 3.429e+02, threshold=2.026e+02, percent-clipped=2.0 2024-09-17 09:57:42,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=197480.0, ans=0.2 2024-09-17 09:57:55,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=197480.0, ans=0.2 2024-09-17 09:58:05,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=197520.0, ans=0.0 2024-09-17 09:58:05,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=197520.0, ans=0.125 2024-09-17 09:58:22,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.82 vs. limit=22.5 2024-09-17 09:58:28,185 INFO [train.py:1198] (0/2) Epoch 11, batch 4150, loss[loss=0.2611, ctc_loss=0.1755, cr_loss=0.4162, attn_decoder_loss=0.2613, over 29495.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.165, cr_loss=0.4021, attn_decoder_loss=0.2636, over 5798602.36 frames. ], batch size: 77, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:58:57,395 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.65 vs. 
limit=15.0 2024-09-17 09:59:34,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=197760.0, ans=0.025 2024-09-17 09:59:37,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=197760.0, ans=0.0 2024-09-17 09:59:42,929 INFO [train.py:1198] (0/2) Epoch 11, batch 4200, loss[loss=0.2771, ctc_loss=0.1836, cr_loss=0.4513, attn_decoder_loss=0.2775, over 29515.00 frames. ], tot_loss[loss=0.2622, ctc_loss=0.1653, cr_loss=0.4028, attn_decoder_loss=0.2641, over 5800223.46 frames. ], batch size: 90, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 09:59:59,216 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 8.983e+01 9.678e+01 1.042e+02 2.526e+02, threshold=1.936e+02, percent-clipped=1.0 2024-09-17 09:59:59,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=197840.0, ans=0.125 2024-09-17 10:00:22,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.49 vs. limit=10.0 2024-09-17 10:00:31,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=197920.0, ans=0.125 2024-09-17 10:00:57,507 INFO [train.py:1198] (0/2) Epoch 11, batch 4250, loss[loss=0.2517, ctc_loss=0.1547, cr_loss=0.3908, attn_decoder_loss=0.2538, over 29517.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1651, cr_loss=0.4026, attn_decoder_loss=0.2643, over 5805909.02 frames. 
], batch size: 74, lr: 1.00e-02, grad_scale: 4.0 2024-09-17 10:01:21,209 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:02:02,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=198160.0, ans=15.0 2024-09-17 10:02:09,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=198200.0, ans=0.125 2024-09-17 10:02:11,149 INFO [train.py:1198] (0/2) Epoch 11, batch 4300, loss[loss=0.2852, ctc_loss=0.1843, cr_loss=0.4495, attn_decoder_loss=0.2864, over 29541.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1655, cr_loss=0.4039, attn_decoder_loss=0.2648, over 5794604.47 frames. ], batch size: 87, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 10:02:19,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=198200.0, ans=0.125 2024-09-17 10:02:19,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=198200.0, ans=0.125 2024-09-17 10:02:23,969 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.71 vs. limit=15.0 2024-09-17 10:02:29,144 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.197e+01 9.375e+01 1.006e+02 1.083e+02 2.279e+02, threshold=2.011e+02, percent-clipped=1.0 2024-09-17 10:02:33,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=198240.0, ans=0.025 2024-09-17 10:03:13,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.89 vs. 
limit=12.0 2024-09-17 10:03:17,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=198360.0, ans=0.025 2024-09-17 10:03:20,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2024-09-17 10:03:23,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-09-17 10:03:27,241 INFO [train.py:1198] (0/2) Epoch 11, batch 4350, loss[loss=0.284, ctc_loss=0.1819, cr_loss=0.4354, attn_decoder_loss=0.2856, over 29471.00 frames. ], tot_loss[loss=0.2666, ctc_loss=0.1686, cr_loss=0.4084, attn_decoder_loss=0.2684, over 5796074.70 frames. ], batch size: 97, lr: 1.00e-02, grad_scale: 8.0 2024-09-17 10:03:27,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=198400.0, ans=0.125 2024-09-17 10:03:29,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=198400.0, ans=0.2 2024-09-17 10:03:35,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=198400.0, ans=0.025 2024-09-17 10:03:40,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.32 vs. limit=12.0 2024-09-17 10:03:42,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=198440.0, ans=0.125 2024-09-17 10:03:59,025 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.12 vs. 
limit=12.0 2024-09-17 10:04:18,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198520.0, ans=0.1 2024-09-17 10:04:19,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.30 vs. limit=15.0 2024-09-17 10:04:24,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=198560.0, ans=0.125 2024-09-17 10:04:26,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=198560.0, ans=10.0 2024-09-17 10:04:31,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=198560.0, ans=0.0 2024-09-17 10:04:40,225 INFO [train.py:1198] (0/2) Epoch 11, batch 4400, loss[loss=0.2749, ctc_loss=0.1854, cr_loss=0.41, attn_decoder_loss=0.2757, over 27294.00 frames. ], tot_loss[loss=0.2688, ctc_loss=0.1704, cr_loss=0.411, attn_decoder_loss=0.2706, over 5766234.78 frames. ], batch size: 124, lr: 1.00e-02, grad_scale: 16.0 2024-09-17 10:04:49,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=198600.0, ans=0.0 2024-09-17 10:04:56,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=198640.0, ans=0.0 2024-09-17 10:04:59,277 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.516e+01 9.761e+01 1.030e+02 1.162e+02 9.107e+02, threshold=2.060e+02, percent-clipped=3.0 2024-09-17 10:05:30,097 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.12 vs. 
limit=22.5 2024-09-17 10:05:40,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=198760.0, ans=0.125 2024-09-17 10:05:54,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=15.0 2024-09-17 10:05:55,218 INFO [train.py:1198] (0/2) Epoch 11, batch 4450, loss[loss=0.2922, ctc_loss=0.2213, cr_loss=0.4232, attn_decoder_loss=0.2907, over 19948.00 frames. ], tot_loss[loss=0.2724, ctc_loss=0.1765, cr_loss=0.416, attn_decoder_loss=0.2738, over 5570670.87 frames. ], batch size: 209, lr: 9.99e-03, grad_scale: 8.0 2024-09-17 10:05:57,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=198800.0, ans=0.0 2024-09-17 10:05:57,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2024-09-17 10:06:23,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. 
limit=15.0 2024-09-17 10:06:34,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=198880.0, ans=0.125 2024-09-17 10:06:43,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=198920.0, ans=0.125 2024-09-17 10:06:45,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=198920.0, ans=0.0 2024-09-17 10:06:45,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=198920.0, ans=0.125 2024-09-17 10:06:45,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=198920.0, ans=0.125 2024-09-17 10:07:08,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=198960.0, ans=0.0 2024-09-17 10:07:08,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=198960.0, ans=0.0 2024-09-17 10:07:11,094 INFO [train.py:1198] (0/2) Epoch 11, batch 4500, loss[loss=0.2945, ctc_loss=0.2246, cr_loss=0.4396, attn_decoder_loss=0.2925, over 20357.00 frames. ], tot_loss[loss=0.276, ctc_loss=0.1831, cr_loss=0.4184, attn_decoder_loss=0.277, over 5228962.96 frames. 
], batch size: 210, lr: 9.99e-03, grad_scale: 8.0 2024-09-17 10:07:11,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=199000.0, ans=0.125 2024-09-17 10:07:21,754 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:07:31,558 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.514e+01 1.082e+02 1.141e+02 1.239e+02 5.446e+02, threshold=2.282e+02, percent-clipped=2.0 2024-09-17 10:07:47,669 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-11.pt 2024-09-17 10:08:38,959 INFO [train.py:1198] (0/2) Epoch 12, batch 0, loss[loss=0.2504, ctc_loss=0.1528, cr_loss=0.3984, attn_decoder_loss=0.2524, over 29594.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1528, cr_loss=0.3984, attn_decoder_loss=0.2524, over 29594.00 frames. ], batch size: 73, lr: 9.56e-03, grad_scale: 16.0 2024-09-17 10:08:38,960 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 10:08:46,432 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.0889, 3.9492, 3.6319, 3.4240], device='cuda:0') 2024-09-17 10:08:57,356 INFO [train.py:1230] (0/2) Epoch 12, validation: loss=0.2149, ctc_loss=0.04611, cr_loss=4.481e-15, attn_decoder_loss=0.2337, over 944034.00 frames. 
2024-09-17 10:08:57,357 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 10:09:08,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=199100.0, ans=0.0 2024-09-17 10:09:32,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=199180.0, ans=0.1 2024-09-17 10:09:50,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=199220.0, ans=15.0 2024-09-17 10:09:54,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=199220.0, ans=10.0 2024-09-17 10:09:58,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=199260.0, ans=0.0 2024-09-17 10:10:13,553 INFO [train.py:1198] (0/2) Epoch 12, batch 50, loss[loss=0.2316, ctc_loss=0.1392, cr_loss=0.3646, attn_decoder_loss=0.2338, over 29436.00 frames. ], tot_loss[loss=0.2654, ctc_loss=0.1689, cr_loss=0.4098, attn_decoder_loss=0.267, over 1267159.47 frames. 
], batch size: 70, lr: 9.56e-03, grad_scale: 8.0 2024-09-17 10:10:32,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=199340.0, ans=0.1 2024-09-17 10:10:37,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=199340.0, ans=0.125 2024-09-17 10:10:40,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=199340.0, ans=0.0 2024-09-17 10:10:44,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=199380.0, ans=0.0 2024-09-17 10:10:50,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=199380.0, ans=0.125 2024-09-17 10:11:16,127 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.442e+01 9.517e+01 1.012e+02 1.140e+02 5.609e+02, threshold=2.023e+02, percent-clipped=2.0 2024-09-17 10:11:33,306 INFO [train.py:1198] (0/2) Epoch 12, batch 100, loss[loss=0.2563, ctc_loss=0.1647, cr_loss=0.3945, attn_decoder_loss=0.2577, over 29569.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1686, cr_loss=0.4084, attn_decoder_loss=0.2674, over 2250980.15 frames. ], batch size: 76, lr: 9.56e-03, grad_scale: 8.0 2024-09-17 10:11:38,778 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.66 vs. 
limit=22.5 2024-09-17 10:11:50,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=199540.0, ans=0.1 2024-09-17 10:11:56,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=199540.0, ans=0.125 2024-09-17 10:11:59,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=199540.0, ans=0.0 2024-09-17 10:12:03,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=199580.0, ans=0.0 2024-09-17 10:12:47,661 INFO [train.py:1198] (0/2) Epoch 12, batch 150, loss[loss=0.238, ctc_loss=0.14, cr_loss=0.3718, attn_decoder_loss=0.2407, over 29446.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1656, cr_loss=0.4038, attn_decoder_loss=0.2651, over 3046390.88 frames. ], batch size: 70, lr: 9.55e-03, grad_scale: 8.0 2024-09-17 10:12:52,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=199700.0, ans=0.1 2024-09-17 10:13:04,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=199740.0, ans=0.125 2024-09-17 10:13:07,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=199740.0, ans=0.125 2024-09-17 10:13:34,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=199820.0, ans=0.0 2024-09-17 10:13:34,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=199820.0, ans=0.1 2024-09-17 10:13:36,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, 
num_channels=384, metric=4.02 vs. limit=10.0 2024-09-17 10:13:40,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=199820.0, ans=0.125 2024-09-17 10:13:41,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=199820.0, ans=0.0 2024-09-17 10:13:41,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=199820.0, ans=0.2 2024-09-17 10:13:47,582 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 9.054e+01 9.523e+01 1.007e+02 1.391e+02, threshold=1.905e+02, percent-clipped=0.0 2024-09-17 10:13:49,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.47 vs. limit=22.5 2024-09-17 10:13:51,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2024-09-17 10:13:52,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=199860.0, ans=0.0 2024-09-17 10:13:53,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=199860.0, ans=0.0 2024-09-17 10:13:56,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=199860.0, ans=0.025 2024-09-17 10:14:02,683 INFO [train.py:1198] (0/2) Epoch 12, batch 200, loss[loss=0.2827, ctc_loss=0.1858, cr_loss=0.4394, attn_decoder_loss=0.2837, over 27407.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.1641, cr_loss=0.4027, attn_decoder_loss=0.264, over 3657176.50 frames. 
], batch size: 124, lr: 9.55e-03, grad_scale: 8.0 2024-09-17 10:14:04,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=199900.0, ans=0.125 2024-09-17 10:14:32,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=199940.0, ans=10.0 2024-09-17 10:14:38,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=199980.0, ans=0.05 2024-09-17 10:14:43,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=199980.0, ans=0.125 2024-09-17 10:14:45,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.35 vs. limit=12.0 2024-09-17 10:14:57,694 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-09-17 10:15:01,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=200020.0, ans=0.125 2024-09-17 10:15:20,793 INFO [train.py:1198] (0/2) Epoch 12, batch 250, loss[loss=0.2695, ctc_loss=0.1643, cr_loss=0.3994, attn_decoder_loss=0.2723, over 29249.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1634, cr_loss=0.4032, attn_decoder_loss=0.2636, over 4139777.95 frames. 
], batch size: 100, lr: 9.54e-03, grad_scale: 8.0 2024-09-17 10:15:28,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=200100.0, ans=0.05 2024-09-17 10:15:35,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=200100.0, ans=0.125 2024-09-17 10:15:37,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=12.0 2024-09-17 10:15:54,383 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2024-09-17 10:16:23,314 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.074e+01 9.707e+01 1.061e+02 3.060e+02, threshold=1.941e+02, percent-clipped=1.0 2024-09-17 10:16:28,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=200260.0, ans=0.025 2024-09-17 10:16:37,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=200300.0, ans=0.125 2024-09-17 10:16:38,674 INFO [train.py:1198] (0/2) Epoch 12, batch 300, loss[loss=0.2725, ctc_loss=0.1766, cr_loss=0.4138, attn_decoder_loss=0.274, over 29543.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1639, cr_loss=0.4035, attn_decoder_loss=0.2637, over 4507694.14 frames. ], batch size: 92, lr: 9.54e-03, grad_scale: 8.0 2024-09-17 10:17:31,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=200420.0, ans=0.09899494936611666 2024-09-17 10:17:32,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.24 vs. 
limit=15.0 2024-09-17 10:17:54,250 INFO [train.py:1198] (0/2) Epoch 12, batch 350, loss[loss=0.2424, ctc_loss=0.1381, cr_loss=0.3731, attn_decoder_loss=0.2457, over 29278.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.1643, cr_loss=0.4039, attn_decoder_loss=0.2642, over 4793592.44 frames. ], batch size: 71, lr: 9.53e-03, grad_scale: 8.0 2024-09-17 10:17:57,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=200500.0, ans=0.0 2024-09-17 10:18:31,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=200580.0, ans=0.0 2024-09-17 10:18:45,448 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2024-09-17 10:18:56,615 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 9.289e+01 1.000e+02 1.114e+02 4.401e+02, threshold=2.000e+02, percent-clipped=4.0 2024-09-17 10:19:02,401 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=15.0 2024-09-17 10:19:11,730 INFO [train.py:1198] (0/2) Epoch 12, batch 400, loss[loss=0.2707, ctc_loss=0.175, cr_loss=0.4476, attn_decoder_loss=0.2714, over 29696.00 frames. ], tot_loss[loss=0.2619, ctc_loss=0.1637, cr_loss=0.4034, attn_decoder_loss=0.2638, over 5022792.67 frames. 
], batch size: 82, lr: 9.53e-03, grad_scale: 16.0 2024-09-17 10:19:11,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=200700.0, ans=0.125 2024-09-17 10:19:15,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=200700.0, ans=0.5 2024-09-17 10:19:15,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=200700.0, ans=0.125 2024-09-17 10:19:16,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=200700.0, ans=0.025 2024-09-17 10:19:18,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=200700.0, ans=0.2 2024-09-17 10:19:35,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=200740.0, ans=0.0 2024-09-17 10:19:55,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=200780.0, ans=0.0 2024-09-17 10:20:03,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=200820.0, ans=0.0 2024-09-17 10:20:16,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=200860.0, ans=0.0 2024-09-17 10:20:18,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=200860.0, ans=0.125 2024-09-17 10:20:26,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=200860.0, ans=0.125 2024-09-17 10:20:30,351 INFO [train.py:1198] (0/2) Epoch 12, batch 450, loss[loss=0.2779, ctc_loss=0.1795, cr_loss=0.4294, attn_decoder_loss=0.2793, 
over 29695.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.164, cr_loss=0.4034, attn_decoder_loss=0.264, over 5186766.53 frames. ], batch size: 83, lr: 9.52e-03, grad_scale: 8.0 2024-09-17 10:20:49,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=200940.0, ans=0.125 2024-09-17 10:20:50,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=200940.0, ans=0.125 2024-09-17 10:20:56,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=200940.0, ans=0.0 2024-09-17 10:20:58,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=200940.0, ans=0.125 2024-09-17 10:20:59,734 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:21:16,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=201020.0, ans=0.125 2024-09-17 10:21:30,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=201060.0, ans=0.0 2024-09-17 10:21:32,802 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.842e+01 9.090e+01 9.686e+01 1.023e+02 4.799e+02, threshold=1.937e+02, percent-clipped=1.0 2024-09-17 10:21:40,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2024-09-17 10:21:46,302 INFO [train.py:1198] (0/2) Epoch 12, batch 500, loss[loss=0.2764, ctc_loss=0.1762, cr_loss=0.4326, attn_decoder_loss=0.2779, over 29445.00 frames. ], tot_loss[loss=0.2613, ctc_loss=0.1634, cr_loss=0.4026, attn_decoder_loss=0.2632, over 5329922.58 frames. 
], batch size: 94, lr: 9.52e-03, grad_scale: 8.0 2024-09-17 10:21:57,408 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:22:01,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=201140.0, ans=0.0 2024-09-17 10:22:13,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=201140.0, ans=0.125 2024-09-17 10:22:16,645 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-09-17 10:22:20,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=201180.0, ans=0.125 2024-09-17 10:22:26,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201180.0, ans=0.1 2024-09-17 10:22:52,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=201260.0, ans=15.0 2024-09-17 10:22:56,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=201260.0, ans=0.2 2024-09-17 10:23:03,902 INFO [train.py:1198] (0/2) Epoch 12, batch 550, loss[loss=0.2745, ctc_loss=0.1771, cr_loss=0.4369, attn_decoder_loss=0.2756, over 28794.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1635, cr_loss=0.4029, attn_decoder_loss=0.2633, over 5421586.34 frames. 
], batch size: 104, lr: 9.51e-03, grad_scale: 4.0 2024-09-17 10:23:05,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=201300.0, ans=0.125 2024-09-17 10:23:46,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=201380.0, ans=0.125 2024-09-17 10:24:01,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=201420.0, ans=0.125 2024-09-17 10:24:02,141 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.64 vs. limit=15.0 2024-09-17 10:24:06,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=201460.0, ans=0.125 2024-09-17 10:24:06,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2024-09-17 10:24:09,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=201460.0, ans=6.0 2024-09-17 10:24:10,193 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.800e+01 9.023e+01 9.660e+01 1.069e+02 2.891e+02, threshold=1.932e+02, percent-clipped=2.0 2024-09-17 10:24:10,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=201460.0, ans=0.0 2024-09-17 10:24:18,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=201460.0, ans=0.125 2024-09-17 10:24:22,362 INFO [train.py:1198] (0/2) Epoch 12, batch 600, loss[loss=0.2733, ctc_loss=0.1716, cr_loss=0.3822, attn_decoder_loss=0.2761, over 29254.00 frames. 
], tot_loss[loss=0.2619, ctc_loss=0.1638, cr_loss=0.4029, attn_decoder_loss=0.2639, over 5509166.16 frames. ], batch size: 100, lr: 9.51e-03, grad_scale: 8.0 2024-09-17 10:24:28,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=201500.0, ans=0.1 2024-09-17 10:24:40,757 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:25:18,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=201620.0, ans=0.05 2024-09-17 10:25:23,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201660.0, ans=0.1 2024-09-17 10:25:23,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=201660.0, ans=0.125 2024-09-17 10:25:28,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=201660.0, ans=0.2 2024-09-17 10:25:38,287 INFO [train.py:1198] (0/2) Epoch 12, batch 650, loss[loss=0.256, ctc_loss=0.1615, cr_loss=0.3886, attn_decoder_loss=0.2578, over 29752.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1626, cr_loss=0.4011, attn_decoder_loss=0.263, over 5586415.78 frames. 
], batch size: 81, lr: 9.50e-03, grad_scale: 8.0 2024-09-17 10:26:23,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=201780.0, ans=0.0 2024-09-17 10:26:23,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=201780.0, ans=0.125 2024-09-17 10:26:29,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=201820.0, ans=0.125 2024-09-17 10:26:32,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=201820.0, ans=0.0 2024-09-17 10:26:35,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=201820.0, ans=0.2 2024-09-17 10:26:43,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.44 vs. limit=15.0 2024-09-17 10:26:43,924 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.946e+01 9.154e+01 9.702e+01 1.037e+02 1.837e+02, threshold=1.940e+02, percent-clipped=0.0 2024-09-17 10:26:56,072 INFO [train.py:1198] (0/2) Epoch 12, batch 700, loss[loss=0.2513, ctc_loss=0.1543, cr_loss=0.4142, attn_decoder_loss=0.2529, over 29520.00 frames. ], tot_loss[loss=0.2613, ctc_loss=0.1629, cr_loss=0.4014, attn_decoder_loss=0.2633, over 5634595.34 frames. 
], batch size: 76, lr: 9.50e-03, grad_scale: 8.0 2024-09-17 10:27:03,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=201900.0, ans=0.0 2024-09-17 10:27:11,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=201940.0, ans=0.0 2024-09-17 10:27:12,102 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.97 vs. limit=15.0 2024-09-17 10:27:19,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2024-09-17 10:27:23,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=201940.0, ans=0.0 2024-09-17 10:27:31,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=201980.0, ans=0.125 2024-09-17 10:27:31,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=201980.0, ans=0.2 2024-09-17 10:27:43,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=202020.0, ans=0.2 2024-09-17 10:27:44,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.13 vs. 
limit=15.0 2024-09-17 10:27:54,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=202020.0, ans=0.025 2024-09-17 10:28:03,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=202060.0, ans=0.125 2024-09-17 10:28:14,116 INFO [train.py:1198] (0/2) Epoch 12, batch 750, loss[loss=0.2721, ctc_loss=0.1715, cr_loss=0.4226, attn_decoder_loss=0.2739, over 29718.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1624, cr_loss=0.4004, attn_decoder_loss=0.2627, over 5673991.79 frames. ], batch size: 82, lr: 9.49e-03, grad_scale: 8.0 2024-09-17 10:28:29,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=202140.0, ans=0.125 2024-09-17 10:28:36,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=202140.0, ans=0.125 2024-09-17 10:28:39,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=202140.0, ans=0.1 2024-09-17 10:28:55,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202180.0, ans=0.1 2024-09-17 10:29:07,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=202220.0, ans=0.125 2024-09-17 10:29:11,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202220.0, ans=0.1 2024-09-17 10:29:17,540 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.082e+01 9.622e+01 1.037e+02 1.120e+02 2.104e+02, threshold=2.074e+02, percent-clipped=1.0 2024-09-17 10:29:29,713 INFO [train.py:1198] (0/2) Epoch 12, batch 800, loss[loss=0.2373, ctc_loss=0.1343, cr_loss=0.3651, attn_decoder_loss=0.2406, 
over 29584.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1623, cr_loss=0.4002, attn_decoder_loss=0.2627, over 5704209.08 frames. ], batch size: 73, lr: 9.49e-03, grad_scale: 16.0 2024-09-17 10:30:08,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=202380.0, ans=0.2 2024-09-17 10:30:47,505 INFO [train.py:1198] (0/2) Epoch 12, batch 850, loss[loss=0.278, ctc_loss=0.1773, cr_loss=0.443, attn_decoder_loss=0.2793, over 29708.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1614, cr_loss=0.3993, attn_decoder_loss=0.2621, over 5734490.50 frames. ], batch size: 89, lr: 9.49e-03, grad_scale: 4.0 2024-09-17 10:31:03,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.19 vs. limit=15.0 2024-09-17 10:31:38,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=202620.0, ans=0.07 2024-09-17 10:31:55,954 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.132e+01 9.370e+01 1.044e+02 1.217e+02 3.517e+02, threshold=2.088e+02, percent-clipped=3.0 2024-09-17 10:32:04,986 INFO [train.py:1198] (0/2) Epoch 12, batch 900, loss[loss=0.2437, ctc_loss=0.1531, cr_loss=0.3867, attn_decoder_loss=0.2452, over 29592.00 frames. ], tot_loss[loss=0.2606, ctc_loss=0.1621, cr_loss=0.4004, attn_decoder_loss=0.2627, over 5738378.15 frames. ], batch size: 73, lr: 9.48e-03, grad_scale: 8.0 2024-09-17 10:32:32,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.47 vs. 
limit=15.0 2024-09-17 10:32:33,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=202780.0, ans=0.125 2024-09-17 10:33:02,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=202820.0, ans=0.025 2024-09-17 10:33:20,313 INFO [train.py:1198] (0/2) Epoch 12, batch 950, loss[loss=0.2435, ctc_loss=0.1466, cr_loss=0.3897, attn_decoder_loss=0.2456, over 29545.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1627, cr_loss=0.4018, attn_decoder_loss=0.2633, over 5740413.56 frames. ], batch size: 74, lr: 9.48e-03, grad_scale: 8.0 2024-09-17 10:33:20,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=202900.0, ans=0.125 2024-09-17 10:33:33,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202940.0, ans=0.1 2024-09-17 10:34:19,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.66 vs. limit=15.0 2024-09-17 10:34:28,763 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.408e+01 9.367e+01 1.035e+02 1.151e+02 3.076e+02, threshold=2.071e+02, percent-clipped=5.0 2024-09-17 10:34:37,634 INFO [train.py:1198] (0/2) Epoch 12, batch 1000, loss[loss=0.2508, ctc_loss=0.1562, cr_loss=0.3762, attn_decoder_loss=0.253, over 29484.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1638, cr_loss=0.4023, attn_decoder_loss=0.2638, over 5734450.83 frames. 
], batch size: 77, lr: 9.47e-03, grad_scale: 8.0 2024-09-17 10:35:03,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=203140.0, ans=0.125 2024-09-17 10:35:55,928 INFO [train.py:1198] (0/2) Epoch 12, batch 1050, loss[loss=0.271, ctc_loss=0.1713, cr_loss=0.4121, attn_decoder_loss=0.273, over 29673.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1629, cr_loss=0.4011, attn_decoder_loss=0.2631, over 5741932.12 frames. ], batch size: 85, lr: 9.47e-03, grad_scale: 4.0 2024-09-17 10:36:03,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=203300.0, ans=0.0 2024-09-17 10:36:10,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=203340.0, ans=0.125 2024-09-17 10:36:15,166 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.15 vs. limit=22.5 2024-09-17 10:36:25,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=203380.0, ans=0.125 2024-09-17 10:36:26,661 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:36:38,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=203380.0, ans=0.2 2024-09-17 10:37:04,276 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.982e+01 9.127e+01 9.739e+01 1.067e+02 1.550e+02, threshold=1.948e+02, percent-clipped=0.0 2024-09-17 10:37:06,555 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.79 vs. 
limit=15.0 2024-09-17 10:37:11,894 INFO [train.py:1198] (0/2) Epoch 12, batch 1100, loss[loss=0.2558, ctc_loss=0.1587, cr_loss=0.4178, attn_decoder_loss=0.2573, over 29434.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1623, cr_loss=0.4006, attn_decoder_loss=0.2625, over 5754298.05 frames. ], batch size: 78, lr: 9.46e-03, grad_scale: 8.0 2024-09-17 10:37:13,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=203500.0, ans=0.125 2024-09-17 10:37:27,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=203540.0, ans=0.125 2024-09-17 10:37:46,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=203580.0, ans=0.025 2024-09-17 10:37:46,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=203580.0, ans=0.0 2024-09-17 10:38:03,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=203620.0, ans=0.0 2024-09-17 10:38:10,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=203620.0, ans=0.125 2024-09-17 10:38:30,356 INFO [train.py:1198] (0/2) Epoch 12, batch 1150, loss[loss=0.2634, ctc_loss=0.1646, cr_loss=0.4047, attn_decoder_loss=0.2654, over 29457.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1625, cr_loss=0.4002, attn_decoder_loss=0.2627, over 5753240.27 frames. 
], batch size: 78, lr: 9.46e-03, grad_scale: 8.0 2024-09-17 10:38:36,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203700.0, ans=0.1 2024-09-17 10:38:47,693 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=12.0 2024-09-17 10:38:53,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=203740.0, ans=0.125 2024-09-17 10:38:59,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=203780.0, ans=0.0 2024-09-17 10:39:01,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0 2024-09-17 10:39:03,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.58 vs. limit=15.0 2024-09-17 10:39:05,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=203780.0, ans=0.125 2024-09-17 10:39:30,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=203820.0, ans=0.0 2024-09-17 10:39:40,830 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.985e+01 9.529e+01 1.043e+02 1.150e+02 3.679e+02, threshold=2.085e+02, percent-clipped=2.0 2024-09-17 10:39:48,418 INFO [train.py:1198] (0/2) Epoch 12, batch 1200, loss[loss=0.2561, ctc_loss=0.1529, cr_loss=0.3975, attn_decoder_loss=0.2587, over 29660.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1628, cr_loss=0.4006, attn_decoder_loss=0.2632, over 5746732.94 frames. 
], batch size: 85, lr: 9.45e-03, grad_scale: 16.0 2024-09-17 10:39:50,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=203900.0, ans=0.0 2024-09-17 10:39:54,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=203900.0, ans=0.125 2024-09-17 10:40:36,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=204020.0, ans=0.125 2024-09-17 10:41:04,794 INFO [train.py:1198] (0/2) Epoch 12, batch 1250, loss[loss=0.2682, ctc_loss=0.1659, cr_loss=0.4034, attn_decoder_loss=0.2706, over 29523.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1629, cr_loss=0.4014, attn_decoder_loss=0.2637, over 5774692.62 frames. ], batch size: 92, lr: 9.45e-03, grad_scale: 8.0 2024-09-17 10:41:45,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=204180.0, ans=0.125 2024-09-17 10:41:45,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2024-09-17 10:41:58,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=204220.0, ans=0.025 2024-09-17 10:42:08,722 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.08 vs. 
limit=15.0 2024-09-17 10:42:15,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=204260.0, ans=0.125 2024-09-17 10:42:16,783 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.687e+01 9.051e+01 9.485e+01 1.007e+02 2.061e+02, threshold=1.897e+02, percent-clipped=0.0 2024-09-17 10:42:22,709 INFO [train.py:1198] (0/2) Epoch 12, batch 1300, loss[loss=0.2712, ctc_loss=0.1634, cr_loss=0.3919, attn_decoder_loss=0.2745, over 28166.00 frames. ], tot_loss[loss=0.2613, ctc_loss=0.1627, cr_loss=0.4017, attn_decoder_loss=0.2633, over 5779840.78 frames. ], batch size: 111, lr: 9.44e-03, grad_scale: 8.0 2024-09-17 10:42:50,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=204340.0, ans=0.5 2024-09-17 10:42:57,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=12.0 2024-09-17 10:43:02,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=204380.0, ans=0.04949747468305833 2024-09-17 10:43:04,952 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.71 vs. limit=15.0 2024-09-17 10:43:20,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=204420.0, ans=0.125 2024-09-17 10:43:39,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=204500.0, ans=0.125 2024-09-17 10:43:40,857 INFO [train.py:1198] (0/2) Epoch 12, batch 1350, loss[loss=0.2588, ctc_loss=0.1571, cr_loss=0.3895, attn_decoder_loss=0.2615, over 29769.00 frames. 
], tot_loss[loss=0.2608, ctc_loss=0.1621, cr_loss=0.401, attn_decoder_loss=0.2629, over 5797949.68 frames. ], batch size: 81, lr: 9.44e-03, grad_scale: 8.0 2024-09-17 10:44:22,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=204580.0, ans=0.07 2024-09-17 10:44:49,480 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 9.112e+01 9.581e+01 1.019e+02 1.292e+02, threshold=1.916e+02, percent-clipped=0.0 2024-09-17 10:44:55,537 INFO [train.py:1198] (0/2) Epoch 12, batch 1400, loss[loss=0.2214, ctc_loss=0.1276, cr_loss=0.3526, attn_decoder_loss=0.224, over 29573.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1621, cr_loss=0.4015, attn_decoder_loss=0.263, over 5808011.04 frames. ], batch size: 69, lr: 9.44e-03, grad_scale: 8.0 2024-09-17 10:45:01,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=204700.0, ans=0.5 2024-09-17 10:45:14,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.82 vs. limit=15.0 2024-09-17 10:45:15,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=204740.0, ans=0.125 2024-09-17 10:45:30,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.02 vs. 
limit=22.5 2024-09-17 10:45:54,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=204820.0, ans=0.2 2024-09-17 10:46:06,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=204860.0, ans=0.0 2024-09-17 10:46:13,815 INFO [train.py:1198] (0/2) Epoch 12, batch 1450, loss[loss=0.2775, ctc_loss=0.1758, cr_loss=0.4179, attn_decoder_loss=0.2795, over 29467.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1623, cr_loss=0.4015, attn_decoder_loss=0.2632, over 5804863.14 frames. ], batch size: 94, lr: 9.43e-03, grad_scale: 4.0 2024-09-17 10:46:22,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.37 vs. limit=10.0 2024-09-17 10:46:27,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=204940.0, ans=0.125 2024-09-17 10:46:40,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.15 vs. limit=15.0 2024-09-17 10:46:55,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.79 vs. 
limit=15.0 2024-09-17 10:47:08,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=205020.0, ans=0.0 2024-09-17 10:47:09,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=205020.0, ans=0.125 2024-09-17 10:47:14,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=205060.0, ans=0.0 2024-09-17 10:47:25,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=205060.0, ans=0.025 2024-09-17 10:47:26,854 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.748e+01 9.404e+01 1.003e+02 1.073e+02 8.206e+02, threshold=2.005e+02, percent-clipped=2.0 2024-09-17 10:47:28,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=205060.0, ans=0.025 2024-09-17 10:47:30,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=205100.0, ans=0.1 2024-09-17 10:47:31,567 INFO [train.py:1198] (0/2) Epoch 12, batch 1500, loss[loss=0.2694, ctc_loss=0.1632, cr_loss=0.3906, attn_decoder_loss=0.2725, over 29633.00 frames. ], tot_loss[loss=0.2613, ctc_loss=0.1623, cr_loss=0.4013, attn_decoder_loss=0.2634, over 5805494.17 frames. 
], batch size: 86, lr: 9.43e-03, grad_scale: 8.0 2024-09-17 10:47:33,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=205100.0, ans=0.125 2024-09-17 10:47:38,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=205100.0, ans=0.1 2024-09-17 10:47:42,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=205100.0, ans=0.0 2024-09-17 10:47:47,545 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.09 vs. limit=15.0 2024-09-17 10:47:56,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=205140.0, ans=0.125 2024-09-17 10:48:07,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=205180.0, ans=0.2 2024-09-17 10:48:22,776 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.67 vs. limit=15.0 2024-09-17 10:48:35,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=205260.0, ans=0.0 2024-09-17 10:48:47,594 INFO [train.py:1198] (0/2) Epoch 12, batch 1550, loss[loss=0.2766, ctc_loss=0.174, cr_loss=0.4268, attn_decoder_loss=0.2785, over 29505.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1628, cr_loss=0.4023, attn_decoder_loss=0.2637, over 5782574.07 frames. 
], batch size: 90, lr: 9.42e-03, grad_scale: 8.0 2024-09-17 10:48:47,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=205300.0, ans=0.125 2024-09-17 10:48:48,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=205300.0, ans=0.125 2024-09-17 10:49:23,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2024-09-17 10:49:25,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5 2024-09-17 10:49:48,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=205460.0, ans=0.1 2024-09-17 10:49:59,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=205460.0, ans=0.1 2024-09-17 10:50:01,909 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.192e+01 9.435e+01 1.085e+02 1.264e+02 1.596e+02, threshold=2.170e+02, percent-clipped=0.0 2024-09-17 10:50:04,927 INFO [train.py:1198] (0/2) Epoch 12, batch 1600, loss[loss=0.2671, ctc_loss=0.1645, cr_loss=0.3928, attn_decoder_loss=0.2698, over 29679.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1631, cr_loss=0.4024, attn_decoder_loss=0.2638, over 5764157.60 frames. ], batch size: 85, lr: 9.42e-03, grad_scale: 8.0 2024-09-17 10:50:05,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.34 vs. 
limit=22.5 2024-09-17 10:50:09,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=205500.0, ans=0.0 2024-09-17 10:50:10,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-09-17 10:50:46,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=205580.0, ans=0.0 2024-09-17 10:50:48,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=205580.0, ans=0.125 2024-09-17 10:51:05,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.26 vs. limit=15.0 2024-09-17 10:51:06,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2024-09-17 10:51:08,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=205660.0, ans=0.1 2024-09-17 10:51:14,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=205660.0, ans=0.0 2024-09-17 10:51:15,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.73 vs. limit=15.0 2024-09-17 10:51:19,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=205660.0, ans=0.07 2024-09-17 10:51:23,146 INFO [train.py:1198] (0/2) Epoch 12, batch 1650, loss[loss=0.274, ctc_loss=0.1665, cr_loss=0.414, attn_decoder_loss=0.2768, over 29670.00 frames. 
], tot_loss[loss=0.2615, ctc_loss=0.163, cr_loss=0.4022, attn_decoder_loss=0.2635, over 5758245.06 frames. ], batch size: 89, lr: 9.41e-03, grad_scale: 4.0 2024-09-17 10:51:42,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=205740.0, ans=0.1 2024-09-17 10:51:53,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=205780.0, ans=0.125 2024-09-17 10:52:10,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=205820.0, ans=0.2 2024-09-17 10:52:36,820 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 9.373e+01 9.988e+01 1.108e+02 2.072e+02, threshold=1.998e+02, percent-clipped=0.0 2024-09-17 10:52:38,333 INFO [train.py:1198] (0/2) Epoch 12, batch 1700, loss[loss=0.2294, ctc_loss=0.1332, cr_loss=0.3624, attn_decoder_loss=0.232, over 29610.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1627, cr_loss=0.4022, attn_decoder_loss=0.2634, over 5780361.23 frames. ], batch size: 69, lr: 9.41e-03, grad_scale: 8.0 2024-09-17 10:52:46,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=205900.0, ans=0.2 2024-09-17 10:53:28,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=206020.0, ans=0.95 2024-09-17 10:53:39,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206060.0, ans=0.1 2024-09-17 10:53:46,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=206060.0, ans=0.125 2024-09-17 10:53:49,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.49 vs. 
limit=15.0 2024-09-17 10:53:51,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=206060.0, ans=0.0 2024-09-17 10:53:55,950 INFO [train.py:1198] (0/2) Epoch 12, batch 1750, loss[loss=0.2294, ctc_loss=0.1376, cr_loss=0.3611, attn_decoder_loss=0.2316, over 29323.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.162, cr_loss=0.4009, attn_decoder_loss=0.2628, over 5788488.71 frames. ], batch size: 67, lr: 9.40e-03, grad_scale: 8.0 2024-09-17 10:54:00,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=206100.0, ans=0.1 2024-09-17 10:54:18,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=206140.0, ans=0.125 2024-09-17 10:54:39,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=206220.0, ans=0.0 2024-09-17 10:54:44,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=206220.0, ans=0.0 2024-09-17 10:55:11,662 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.922e+01 9.583e+01 1.012e+02 1.403e+02, threshold=1.917e+02, percent-clipped=0.0 2024-09-17 10:55:13,154 INFO [train.py:1198] (0/2) Epoch 12, batch 1800, loss[loss=0.2784, ctc_loss=0.1733, cr_loss=0.4454, attn_decoder_loss=0.2802, over 29689.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1623, cr_loss=0.4016, attn_decoder_loss=0.263, over 5790504.00 frames. 
], batch size: 83, lr: 9.40e-03, grad_scale: 8.0 2024-09-17 10:55:16,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=206300.0, ans=0.2 2024-09-17 10:55:22,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=206300.0, ans=0.07 2024-09-17 10:55:31,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206340.0, ans=0.1 2024-09-17 10:55:31,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=206340.0, ans=0.125 2024-09-17 10:55:33,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.63 vs. limit=12.0 2024-09-17 10:55:34,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=206340.0, ans=0.0 2024-09-17 10:55:54,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=206380.0, ans=0.1 2024-09-17 10:55:57,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=206420.0, ans=0.025 2024-09-17 10:55:57,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=206420.0, ans=0.125 2024-09-17 10:56:05,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=206420.0, ans=0.125 2024-09-17 10:56:15,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=206460.0, ans=0.125 2024-09-17 10:56:29,237 INFO [train.py:1198] (0/2) Epoch 12, batch 1850, loss[loss=0.2641, 
ctc_loss=0.1591, cr_loss=0.4073, attn_decoder_loss=0.2667, over 29638.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1617, cr_loss=0.4004, attn_decoder_loss=0.2626, over 5795036.01 frames. ], batch size: 86, lr: 9.40e-03, grad_scale: 8.0 2024-09-17 10:56:50,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=206540.0, ans=0.125 2024-09-17 10:56:57,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=206540.0, ans=0.125 2024-09-17 10:57:26,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=22.5 2024-09-17 10:57:32,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.41 vs. limit=22.5 2024-09-17 10:57:44,672 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 9.093e+01 9.711e+01 1.054e+02 1.569e+02, threshold=1.942e+02, percent-clipped=0.0 2024-09-17 10:57:46,184 INFO [train.py:1198] (0/2) Epoch 12, batch 1900, loss[loss=0.269, ctc_loss=0.1731, cr_loss=0.3999, attn_decoder_loss=0.2708, over 29677.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1619, cr_loss=0.4007, attn_decoder_loss=0.2631, over 5802388.24 frames. 
], batch size: 89, lr: 9.39e-03, grad_scale: 8.0 2024-09-17 10:58:01,653 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 10:58:09,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=206740.0, ans=0.0 2024-09-17 10:58:13,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=206740.0, ans=0.05 2024-09-17 10:58:26,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206780.0, ans=0.1 2024-09-17 10:58:51,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=206860.0, ans=0.0 2024-09-17 10:59:01,592 INFO [train.py:1198] (0/2) Epoch 12, batch 1950, loss[loss=0.263, ctc_loss=0.1621, cr_loss=0.4187, attn_decoder_loss=0.2649, over 29439.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.1626, cr_loss=0.4028, attn_decoder_loss=0.2642, over 5817328.13 frames. ], batch size: 78, lr: 9.39e-03, grad_scale: 8.0 2024-09-17 10:59:30,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2024-09-17 10:59:31,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2024-09-17 10:59:37,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.73 vs. 
limit=22.5 2024-09-17 10:59:43,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=206980.0, ans=0.125 2024-09-17 10:59:48,237 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2024-09-17 10:59:50,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=207020.0, ans=0.125 2024-09-17 10:59:52,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=207020.0, ans=0.025 2024-09-17 11:00:17,490 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 9.308e+01 9.738e+01 1.045e+02 2.594e+02, threshold=1.948e+02, percent-clipped=1.0 2024-09-17 11:00:19,010 INFO [train.py:1198] (0/2) Epoch 12, batch 2000, loss[loss=0.2281, ctc_loss=0.1375, cr_loss=0.3511, attn_decoder_loss=0.2304, over 29297.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1637, cr_loss=0.4036, attn_decoder_loss=0.2649, over 5795539.78 frames. ], batch size: 67, lr: 9.38e-03, grad_scale: 16.0 2024-09-17 11:00:21,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.40 vs. 
limit=15.0 2024-09-17 11:00:53,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=207180.0, ans=0.125 2024-09-17 11:01:02,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=207180.0, ans=0.125 2024-09-17 11:01:05,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=207220.0, ans=0.0 2024-09-17 11:01:07,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=207220.0, ans=0.2 2024-09-17 11:01:37,086 INFO [train.py:1198] (0/2) Epoch 12, batch 2050, loss[loss=0.2403, ctc_loss=0.1439, cr_loss=0.3844, attn_decoder_loss=0.2425, over 29432.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1629, cr_loss=0.4017, attn_decoder_loss=0.2639, over 5788049.78 frames. ], batch size: 70, lr: 9.38e-03, grad_scale: 4.0 2024-09-17 11:01:43,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=207300.0, ans=0.1 2024-09-17 11:02:12,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=207380.0, ans=0.125 2024-09-17 11:02:16,161 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0 2024-09-17 11:02:18,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.23 vs. 
limit=15.0 2024-09-17 11:02:19,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=207380.0, ans=0.1 2024-09-17 11:02:36,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.76 vs. limit=15.0 2024-09-17 11:02:39,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=207460.0, ans=0.125 2024-09-17 11:02:42,710 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:02:44,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0 2024-09-17 11:02:48,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=207460.0, ans=0.05 2024-09-17 11:02:52,887 INFO [train.py:1198] (0/2) Epoch 12, batch 2100, loss[loss=0.265, ctc_loss=0.164, cr_loss=0.3953, attn_decoder_loss=0.2674, over 29766.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1621, cr_loss=0.4009, attn_decoder_loss=0.263, over 5799784.53 frames. 
], batch size: 81, lr: 9.37e-03, grad_scale: 8.0 2024-09-17 11:02:54,457 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.176e+01 9.560e+01 1.030e+02 1.406e+02, threshold=1.912e+02, percent-clipped=0.0 2024-09-17 11:03:04,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=207500.0, ans=0.125 2024-09-17 11:03:21,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=207540.0, ans=0.1 2024-09-17 11:03:51,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=207620.0, ans=0.2 2024-09-17 11:04:00,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=207660.0, ans=0.04949747468305833 2024-09-17 11:04:01,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=207660.0, ans=0.125 2024-09-17 11:04:02,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=207660.0, ans=0.125 2024-09-17 11:04:09,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=207700.0, ans=0.125 2024-09-17 11:04:10,887 INFO [train.py:1198] (0/2) Epoch 12, batch 2150, loss[loss=0.2778, ctc_loss=0.1791, cr_loss=0.4628, attn_decoder_loss=0.2784, over 29452.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1613, cr_loss=0.3992, attn_decoder_loss=0.2625, over 5815169.54 frames. 
], batch size: 78, lr: 9.37e-03, grad_scale: 4.0 2024-09-17 11:04:35,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=207740.0, ans=0.0 2024-09-17 11:04:58,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=207820.0, ans=0.1 2024-09-17 11:04:59,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=207820.0, ans=0.0 2024-09-17 11:05:01,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=207820.0, ans=0.04949747468305833 2024-09-17 11:05:28,908 INFO [train.py:1198] (0/2) Epoch 12, batch 2200, loss[loss=0.2617, ctc_loss=0.172, cr_loss=0.404, attn_decoder_loss=0.2626, over 29639.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1611, cr_loss=0.3993, attn_decoder_loss=0.2623, over 5811610.08 frames. ], batch size: 86, lr: 9.36e-03, grad_scale: 8.0 2024-09-17 11:05:31,920 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 9.159e+01 9.816e+01 1.050e+02 6.382e+02, threshold=1.963e+02, percent-clipped=1.0 2024-09-17 11:05:32,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=207900.0, ans=15.0 2024-09-17 11:05:42,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=207940.0, ans=0.0 2024-09-17 11:06:05,712 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-52000.pt 2024-09-17 11:06:16,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=207980.0, ans=0.125 2024-09-17 11:06:30,176 INFO 
[scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=12.0 2024-09-17 11:06:49,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=208060.0, ans=0.0 2024-09-17 11:06:52,015 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2024-09-17 11:06:52,587 INFO [train.py:1198] (0/2) Epoch 12, batch 2250, loss[loss=0.2733, ctc_loss=0.1784, cr_loss=0.4018, attn_decoder_loss=0.2749, over 29699.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.1609, cr_loss=0.3987, attn_decoder_loss=0.2621, over 5809694.74 frames. ], batch size: 82, lr: 9.36e-03, grad_scale: 8.0 2024-09-17 11:06:52,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=208100.0, ans=0.125 2024-09-17 11:06:56,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.65 vs. limit=15.0 2024-09-17 11:07:15,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=208140.0, ans=0.125 2024-09-17 11:07:34,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=208180.0, ans=0.125 2024-09-17 11:07:37,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=208180.0, ans=0.125 2024-09-17 11:07:37,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.82 vs. 
limit=12.0 2024-09-17 11:08:10,390 INFO [train.py:1198] (0/2) Epoch 12, batch 2300, loss[loss=0.2434, ctc_loss=0.1455, cr_loss=0.3803, attn_decoder_loss=0.2458, over 29345.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1609, cr_loss=0.3984, attn_decoder_loss=0.2614, over 5797100.49 frames. ], batch size: 71, lr: 9.36e-03, grad_scale: 8.0 2024-09-17 11:08:13,459 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.929e+01 9.033e+01 9.553e+01 1.076e+02 7.023e+02, threshold=1.911e+02, percent-clipped=3.0 2024-09-17 11:08:32,528 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2024-09-17 11:08:44,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=208380.0, ans=0.125 2024-09-17 11:08:50,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2024-09-17 11:08:56,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=208420.0, ans=0.125 2024-09-17 11:08:58,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.84 vs. limit=10.0 2024-09-17 11:09:01,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=208420.0, ans=0.125 2024-09-17 11:09:14,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=208460.0, ans=0.125 2024-09-17 11:09:28,438 INFO [train.py:1198] (0/2) Epoch 12, batch 2350, loss[loss=0.2754, ctc_loss=0.1712, cr_loss=0.4265, attn_decoder_loss=0.2775, over 29690.00 frames. 
], tot_loss[loss=0.2597, ctc_loss=0.1615, cr_loss=0.3991, attn_decoder_loss=0.2618, over 5802717.16 frames. ], batch size: 83, lr: 9.35e-03, grad_scale: 8.0 2024-09-17 11:09:30,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2024-09-17 11:09:33,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=208500.0, ans=0.0 2024-09-17 11:10:03,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=208580.0, ans=0.0 2024-09-17 11:10:10,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=208580.0, ans=0.125 2024-09-17 11:10:18,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=208620.0, ans=0.0 2024-09-17 11:10:39,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=208660.0, ans=0.0 2024-09-17 11:10:43,835 INFO [train.py:1198] (0/2) Epoch 12, batch 2400, loss[loss=0.2627, ctc_loss=0.1693, cr_loss=0.4045, attn_decoder_loss=0.2641, over 29549.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1615, cr_loss=0.3993, attn_decoder_loss=0.2622, over 5807264.33 frames. 
], batch size: 76, lr: 9.35e-03, grad_scale: 16.0 2024-09-17 11:10:49,768 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 9.246e+01 9.641e+01 1.033e+02 3.378e+02, threshold=1.928e+02, percent-clipped=1.0 2024-09-17 11:11:02,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=208740.0, ans=0.2 2024-09-17 11:11:44,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=208820.0, ans=0.05 2024-09-17 11:12:02,486 INFO [train.py:1198] (0/2) Epoch 12, batch 2450, loss[loss=0.265, ctc_loss=0.1576, cr_loss=0.4003, attn_decoder_loss=0.268, over 29713.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1627, cr_loss=0.4015, attn_decoder_loss=0.2634, over 5785247.68 frames. ], batch size: 82, lr: 9.34e-03, grad_scale: 4.0 2024-09-17 11:12:02,900 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:12:04,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=208900.0, ans=0.125 2024-09-17 11:12:16,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=208940.0, ans=0.0 2024-09-17 11:12:31,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=208940.0, ans=0.0 2024-09-17 11:12:36,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. 
limit=6.0 2024-09-17 11:12:37,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=208980.0, ans=0.125 2024-09-17 11:12:54,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=209020.0, ans=0.125 2024-09-17 11:13:09,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=209060.0, ans=0.125 2024-09-17 11:13:11,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=209060.0, ans=0.125 2024-09-17 11:13:18,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=15.0 2024-09-17 11:13:19,879 INFO [train.py:1198] (0/2) Epoch 12, batch 2500, loss[loss=0.2758, ctc_loss=0.176, cr_loss=0.4058, attn_decoder_loss=0.2779, over 29628.00 frames. ], tot_loss[loss=0.2613, ctc_loss=0.1627, cr_loss=0.4017, attn_decoder_loss=0.2633, over 5795523.80 frames. 
], batch size: 86, lr: 9.34e-03, grad_scale: 8.0 2024-09-17 11:13:25,836 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.198e+01 9.207e+01 9.738e+01 1.065e+02 1.820e+02, threshold=1.948e+02, percent-clipped=0.0 2024-09-17 11:13:33,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=209140.0, ans=0.125 2024-09-17 11:13:35,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=209140.0, ans=0.0 2024-09-17 11:14:04,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=209220.0, ans=0.2 2024-09-17 11:14:05,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=209220.0, ans=0.025 2024-09-17 11:14:06,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.81 vs. limit=15.0 2024-09-17 11:14:29,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.05 vs. limit=15.0 2024-09-17 11:14:29,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=209260.0, ans=0.125 2024-09-17 11:14:36,063 INFO [train.py:1198] (0/2) Epoch 12, batch 2550, loss[loss=0.2298, ctc_loss=0.1367, cr_loss=0.3709, attn_decoder_loss=0.2319, over 29362.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1622, cr_loss=0.4004, attn_decoder_loss=0.2632, over 5797980.27 frames. ], batch size: 67, lr: 9.33e-03, grad_scale: 8.0 2024-09-17 11:14:53,722 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.54 vs. 
limit=15.0 2024-09-17 11:14:56,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-09-17 11:14:56,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=12.0 2024-09-17 11:15:35,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=209420.0, ans=0.0 2024-09-17 11:15:37,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=209460.0, ans=0.125 2024-09-17 11:15:47,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=209460.0, ans=0.0 2024-09-17 11:15:49,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=15.0 2024-09-17 11:15:53,672 INFO [train.py:1198] (0/2) Epoch 12, batch 2600, loss[loss=0.2415, ctc_loss=0.1368, cr_loss=0.366, attn_decoder_loss=0.245, over 29457.00 frames. ], tot_loss[loss=0.2614, ctc_loss=0.1625, cr_loss=0.4015, attn_decoder_loss=0.2635, over 5794233.94 frames. 
], batch size: 78, lr: 9.33e-03, grad_scale: 8.0 2024-09-17 11:15:56,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=209500.0, ans=0.125 2024-09-17 11:16:01,133 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.249e+01 9.008e+01 9.501e+01 1.038e+02 1.745e+02, threshold=1.900e+02, percent-clipped=0.0 2024-09-17 11:16:01,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=209500.0, ans=0.0 2024-09-17 11:16:02,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=22.5 2024-09-17 11:16:13,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=209540.0, ans=0.2 2024-09-17 11:16:23,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.10 vs. limit=15.0 2024-09-17 11:16:29,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=209580.0, ans=0.0 2024-09-17 11:16:53,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=209620.0, ans=0.2 2024-09-17 11:16:58,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=209660.0, ans=0.125 2024-09-17 11:17:01,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=209660.0, ans=0.125 2024-09-17 11:17:11,200 INFO [train.py:1198] (0/2) Epoch 12, batch 2650, loss[loss=0.2868, ctc_loss=0.1774, cr_loss=0.4283, attn_decoder_loss=0.2895, over 29285.00 frames. 
], tot_loss[loss=0.2616, ctc_loss=0.1624, cr_loss=0.4019, attn_decoder_loss=0.2637, over 5801021.06 frames. ], batch size: 100, lr: 9.32e-03, grad_scale: 8.0 2024-09-17 11:17:14,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=209700.0, ans=0.025 2024-09-17 11:17:36,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2024-09-17 11:17:40,099 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:17:44,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=209780.0, ans=0.09899494936611666 2024-09-17 11:17:59,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=209820.0, ans=0.0 2024-09-17 11:18:16,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=209860.0, ans=0.025 2024-09-17 11:18:19,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209860.0, ans=0.1 2024-09-17 11:18:20,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=209860.0, ans=0.125 2024-09-17 11:18:26,269 INFO [train.py:1198] (0/2) Epoch 12, batch 2700, loss[loss=0.2801, ctc_loss=0.1806, cr_loss=0.441, attn_decoder_loss=0.2814, over 29524.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1626, cr_loss=0.4022, attn_decoder_loss=0.2638, over 5796400.65 frames. 
], batch size: 87, lr: 9.32e-03, grad_scale: 8.0 2024-09-17 11:18:32,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=209900.0, ans=0.125 2024-09-17 11:18:35,137 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.890e+01 9.487e+01 1.014e+02 1.859e+02, threshold=1.897e+02, percent-clipped=0.0 2024-09-17 11:18:36,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.87 vs. limit=15.0 2024-09-17 11:19:04,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-09-17 11:19:12,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=210020.0, ans=0.1 2024-09-17 11:19:14,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=210020.0, ans=0.125 2024-09-17 11:19:16,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0 2024-09-17 11:19:40,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=210060.0, ans=0.1 2024-09-17 11:19:44,376 INFO [train.py:1198] (0/2) Epoch 12, batch 2750, loss[loss=0.2486, ctc_loss=0.1564, cr_loss=0.3888, attn_decoder_loss=0.2502, over 29525.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1617, cr_loss=0.4009, attn_decoder_loss=0.2626, over 5796045.41 frames. 
], batch size: 75, lr: 9.32e-03, grad_scale: 8.0 2024-09-17 11:19:55,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=210100.0, ans=0.0 2024-09-17 11:19:55,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=210100.0, ans=0.125 2024-09-17 11:20:07,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=210140.0, ans=0.1 2024-09-17 11:20:38,487 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.49 vs. limit=22.5 2024-09-17 11:20:44,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=210220.0, ans=0.0 2024-09-17 11:20:55,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2024-09-17 11:21:02,205 INFO [train.py:1198] (0/2) Epoch 12, batch 2800, loss[loss=0.2927, ctc_loss=0.213, cr_loss=0.4275, attn_decoder_loss=0.2921, over 20080.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1624, cr_loss=0.4014, attn_decoder_loss=0.263, over 5775761.35 frames. 
], batch size: 209, lr: 9.31e-03, grad_scale: 16.0 2024-09-17 11:21:12,652 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.172e+01 9.480e+01 1.026e+02 1.256e+02 4.560e+02, threshold=2.052e+02, percent-clipped=3.0 2024-09-17 11:21:31,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=210380.0, ans=0.125 2024-09-17 11:22:07,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=210460.0, ans=0.0 2024-09-17 11:22:18,260 INFO [train.py:1198] (0/2) Epoch 12, batch 2850, loss[loss=0.2424, ctc_loss=0.1419, cr_loss=0.3667, attn_decoder_loss=0.2454, over 29524.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1624, cr_loss=0.4008, attn_decoder_loss=0.2631, over 5762222.69 frames. ], batch size: 77, lr: 9.31e-03, grad_scale: 4.0 2024-09-17 11:22:29,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=210500.0, ans=0.0 2024-09-17 11:22:36,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=210540.0, ans=0.125 2024-09-17 11:22:55,670 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:22:55,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=210580.0, ans=0.025 2024-09-17 11:22:57,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=210580.0, ans=0.125 2024-09-17 11:23:04,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=210620.0, ans=0.2 2024-09-17 11:23:06,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=210620.0, ans=0.2 2024-09-17 
11:23:36,113 INFO [train.py:1198] (0/2) Epoch 12, batch 2900, loss[loss=0.2543, ctc_loss=0.1494, cr_loss=0.3867, attn_decoder_loss=0.2574, over 29409.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.1631, cr_loss=0.4036, attn_decoder_loss=0.2643, over 5787796.24 frames. ], batch size: 79, lr: 9.30e-03, grad_scale: 8.0 2024-09-17 11:23:46,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=210700.0, ans=0.2 2024-09-17 11:23:48,035 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.952e+01 8.957e+01 9.627e+01 1.010e+02 3.114e+02, threshold=1.925e+02, percent-clipped=2.0 2024-09-17 11:24:20,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=210780.0, ans=0.0 2024-09-17 11:24:28,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=210820.0, ans=0.125 2024-09-17 11:24:53,856 INFO [train.py:1198] (0/2) Epoch 12, batch 2950, loss[loss=0.2464, ctc_loss=0.1531, cr_loss=0.3952, attn_decoder_loss=0.2479, over 29519.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1616, cr_loss=0.401, attn_decoder_loss=0.2629, over 5782971.46 frames. ], batch size: 75, lr: 9.30e-03, grad_scale: 8.0 2024-09-17 11:25:27,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=210980.0, ans=0.125 2024-09-17 11:26:09,713 INFO [train.py:1198] (0/2) Epoch 12, batch 3000, loss[loss=0.2676, ctc_loss=0.1746, cr_loss=0.4194, attn_decoder_loss=0.2686, over 29734.00 frames. ], tot_loss[loss=0.2607, ctc_loss=0.1618, cr_loss=0.4018, attn_decoder_loss=0.2628, over 5783520.39 frames. 
], batch size: 81, lr: 9.29e-03, grad_scale: 8.0 2024-09-17 11:26:09,714 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 11:26:15,328 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.7040, 6.6246, 6.0556, 6.3602], device='cuda:0') 2024-09-17 11:26:28,165 INFO [train.py:1230] (0/2) Epoch 12, validation: loss=0.2128, ctc_loss=0.04571, cr_loss=4.818e-15, attn_decoder_loss=0.2314, over 944034.00 frames. 2024-09-17 11:26:28,165 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 11:26:34,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=211100.0, ans=0.125 2024-09-17 11:26:42,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 9.212e+01 9.963e+01 1.087e+02 2.371e+02, threshold=1.993e+02, percent-clipped=1.0 2024-09-17 11:26:44,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=211140.0, ans=0.0 2024-09-17 11:27:36,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=211260.0, ans=0.125 2024-09-17 11:27:42,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=211260.0, ans=0.125 2024-09-17 11:27:48,324 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2024-09-17 11:27:48,913 INFO [train.py:1198] (0/2) Epoch 12, batch 3050, loss[loss=0.2502, ctc_loss=0.1478, cr_loss=0.3888, attn_decoder_loss=0.253, over 29521.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1627, cr_loss=0.4024, attn_decoder_loss=0.2637, over 5776808.83 frames. 
], batch size: 76, lr: 9.29e-03, grad_scale: 8.0 2024-09-17 11:27:54,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=211300.0, ans=0.0 2024-09-17 11:28:00,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=211300.0, ans=0.2 2024-09-17 11:29:04,429 INFO [train.py:1198] (0/2) Epoch 12, batch 3100, loss[loss=0.2753, ctc_loss=0.1768, cr_loss=0.4312, attn_decoder_loss=0.2767, over 29290.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1621, cr_loss=0.4015, attn_decoder_loss=0.2631, over 5776052.73 frames. ], batch size: 100, lr: 9.29e-03, grad_scale: 8.0 2024-09-17 11:29:12,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=211500.0, ans=0.2 2024-09-17 11:29:16,535 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.116e+01 9.262e+01 9.866e+01 1.070e+02 1.746e+02, threshold=1.973e+02, percent-clipped=0.0 2024-09-17 11:29:21,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=211540.0, ans=0.2 2024-09-17 11:29:31,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=211540.0, ans=0.125 2024-09-17 11:29:40,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=211580.0, ans=0.07 2024-09-17 11:29:54,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=211620.0, ans=0.125 2024-09-17 11:30:02,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=211620.0, ans=0.125 2024-09-17 11:30:05,625 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, 
num_channels=768, metric=9.22 vs. limit=15.0 2024-09-17 11:30:19,858 INFO [train.py:1198] (0/2) Epoch 12, batch 3150, loss[loss=0.2857, ctc_loss=0.1858, cr_loss=0.4263, attn_decoder_loss=0.2873, over 28740.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.1617, cr_loss=0.4016, attn_decoder_loss=0.2629, over 5781737.47 frames. ], batch size: 104, lr: 9.28e-03, grad_scale: 8.0 2024-09-17 11:30:45,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=211740.0, ans=0.0 2024-09-17 11:31:05,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=211780.0, ans=0.125 2024-09-17 11:31:08,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0 2024-09-17 11:31:28,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=211860.0, ans=0.0 2024-09-17 11:31:39,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=211900.0, ans=0.025 2024-09-17 11:31:40,401 INFO [train.py:1198] (0/2) Epoch 12, batch 3200, loss[loss=0.248, ctc_loss=0.1408, cr_loss=0.3808, attn_decoder_loss=0.2514, over 29420.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.161, cr_loss=0.4004, attn_decoder_loss=0.2622, over 5792530.98 frames. ], batch size: 79, lr: 9.28e-03, grad_scale: 16.0 2024-09-17 11:31:43,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=211900.0, ans=0.125 2024-09-17 11:31:45,981 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.13 vs. 
limit=15.0 2024-09-17 11:31:53,891 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 9.023e+01 9.600e+01 1.061e+02 2.809e+02, threshold=1.920e+02, percent-clipped=1.0 2024-09-17 11:32:19,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=211980.0, ans=0.2 2024-09-17 11:32:29,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=212020.0, ans=10.0 2024-09-17 11:32:38,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=212020.0, ans=0.125 2024-09-17 11:32:56,861 INFO [train.py:1198] (0/2) Epoch 12, batch 3250, loss[loss=0.2857, ctc_loss=0.1819, cr_loss=0.4415, attn_decoder_loss=0.2874, over 29702.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1615, cr_loss=0.4017, attn_decoder_loss=0.2631, over 5799069.98 frames. ], batch size: 84, lr: 9.27e-03, grad_scale: 8.0 2024-09-17 11:33:30,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=212180.0, ans=0.0 2024-09-17 11:33:51,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=212220.0, ans=0.125 2024-09-17 11:34:00,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=212260.0, ans=0.125 2024-09-17 11:34:12,003 INFO [train.py:1198] (0/2) Epoch 12, batch 3300, loss[loss=0.276, ctc_loss=0.1696, cr_loss=0.3986, attn_decoder_loss=0.279, over 28338.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1604, cr_loss=0.3991, attn_decoder_loss=0.2617, over 5796384.07 frames. 
], batch size: 111, lr: 9.27e-03, grad_scale: 8.0 2024-09-17 11:34:14,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=212300.0, ans=0.125 2024-09-17 11:34:23,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=212300.0, ans=0.125 2024-09-17 11:34:27,366 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 9.323e+01 1.013e+02 1.133e+02 3.364e+02, threshold=2.026e+02, percent-clipped=1.0 2024-09-17 11:34:43,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=212380.0, ans=0.125 2024-09-17 11:34:52,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=212380.0, ans=0.2 2024-09-17 11:34:57,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=212380.0, ans=0.125 2024-09-17 11:35:04,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=212420.0, ans=0.0 2024-09-17 11:35:28,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=212460.0, ans=0.1 2024-09-17 11:35:32,396 INFO [train.py:1198] (0/2) Epoch 12, batch 3350, loss[loss=0.2814, ctc_loss=0.1834, cr_loss=0.4154, attn_decoder_loss=0.283, over 28881.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1614, cr_loss=0.4005, attn_decoder_loss=0.2626, over 5773547.28 frames. 
], batch size: 104, lr: 9.26e-03, grad_scale: 8.0 2024-09-17 11:35:34,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=212500.0, ans=0.1 2024-09-17 11:35:35,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=212500.0, ans=0.1 2024-09-17 11:35:58,651 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:36:06,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=212580.0, ans=0.125 2024-09-17 11:36:24,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=212620.0, ans=0.0 2024-09-17 11:36:48,158 INFO [train.py:1198] (0/2) Epoch 12, batch 3400, loss[loss=0.2334, ctc_loss=0.1451, cr_loss=0.3925, attn_decoder_loss=0.2345, over 29351.00 frames. ], tot_loss[loss=0.2604, ctc_loss=0.1617, cr_loss=0.4008, attn_decoder_loss=0.2625, over 5767012.89 frames. ], batch size: 67, lr: 9.26e-03, grad_scale: 8.0 2024-09-17 11:37:03,318 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 9.423e+01 1.002e+02 1.091e+02 2.670e+02, threshold=2.004e+02, percent-clipped=1.0 2024-09-17 11:37:05,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=212740.0, ans=0.2 2024-09-17 11:37:21,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=212780.0, ans=0.125 2024-09-17 11:37:32,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.77 vs. 
limit=12.0 2024-09-17 11:37:35,937 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:38:02,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=212900.0, ans=0.125 2024-09-17 11:38:04,056 INFO [train.py:1198] (0/2) Epoch 12, batch 3450, loss[loss=0.2676, ctc_loss=0.1666, cr_loss=0.391, attn_decoder_loss=0.2702, over 28306.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1621, cr_loss=0.4016, attn_decoder_loss=0.2629, over 5774725.46 frames. ], batch size: 111, lr: 9.26e-03, grad_scale: 8.0 2024-09-17 11:38:59,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=213020.0, ans=0.125 2024-09-17 11:39:04,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=213020.0, ans=0.125 2024-09-17 11:39:16,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=213060.0, ans=0.125 2024-09-17 11:39:23,612 INFO [train.py:1198] (0/2) Epoch 12, batch 3500, loss[loss=0.2466, ctc_loss=0.1597, cr_loss=0.4104, attn_decoder_loss=0.2471, over 29312.00 frames. ], tot_loss[loss=0.2604, ctc_loss=0.1623, cr_loss=0.4022, attn_decoder_loss=0.2624, over 5777918.90 frames. ], batch size: 71, lr: 9.25e-03, grad_scale: 8.0 2024-09-17 11:39:23,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=213100.0, ans=0.125 2024-09-17 11:39:36,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. 
limit=15.0 2024-09-17 11:39:40,421 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.771e+01 8.906e+01 9.633e+01 1.043e+02 3.728e+02, threshold=1.927e+02, percent-clipped=3.0 2024-09-17 11:39:42,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.14 vs. limit=10.0 2024-09-17 11:39:45,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.54 vs. limit=22.5 2024-09-17 11:39:58,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=213180.0, ans=0.0 2024-09-17 11:40:04,998 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.66 vs. limit=12.0 2024-09-17 11:40:13,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=213220.0, ans=0.5 2024-09-17 11:40:26,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=213260.0, ans=0.1 2024-09-17 11:40:38,548 INFO [train.py:1198] (0/2) Epoch 12, batch 3550, loss[loss=0.2547, ctc_loss=0.1459, cr_loss=0.4088, attn_decoder_loss=0.2577, over 29718.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1615, cr_loss=0.4011, attn_decoder_loss=0.262, over 5784372.48 frames. 
], batch size: 89, lr: 9.25e-03, grad_scale: 8.0 2024-09-17 11:40:41,822 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:40:44,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=213300.0, ans=0.1 2024-09-17 11:40:53,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=213340.0, ans=0.125 2024-09-17 11:41:02,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=213340.0, ans=0.025 2024-09-17 11:41:15,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=213380.0, ans=0.125 2024-09-17 11:41:15,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=213380.0, ans=0.0 2024-09-17 11:41:23,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2024-09-17 11:41:27,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=213420.0, ans=0.125 2024-09-17 11:41:52,784 INFO [train.py:1198] (0/2) Epoch 12, batch 3600, loss[loss=0.2667, ctc_loss=0.1765, cr_loss=0.4118, attn_decoder_loss=0.2676, over 29494.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1613, cr_loss=0.401, attn_decoder_loss=0.2624, over 5792475.19 frames. 
], batch size: 77, lr: 9.24e-03, grad_scale: 16.0 2024-09-17 11:41:54,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=213500.0, ans=0.125 2024-09-17 11:42:10,825 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.945e+01 9.059e+01 9.779e+01 1.035e+02 3.079e+02, threshold=1.956e+02, percent-clipped=1.0 2024-09-17 11:42:26,538 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.12 vs. limit=15.0 2024-09-17 11:42:35,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=213580.0, ans=0.125 2024-09-17 11:42:36,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=213620.0, ans=0.0 2024-09-17 11:43:05,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.97 vs. limit=15.0 2024-09-17 11:43:08,013 INFO [train.py:1198] (0/2) Epoch 12, batch 3650, loss[loss=0.2667, ctc_loss=0.1594, cr_loss=0.4071, attn_decoder_loss=0.2695, over 29511.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1607, cr_loss=0.3998, attn_decoder_loss=0.2617, over 5794304.20 frames. 
], batch size: 90, lr: 9.24e-03, grad_scale: 8.0 2024-09-17 11:43:18,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=213700.0, ans=0.0 2024-09-17 11:43:29,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=213740.0, ans=0.2 2024-09-17 11:44:23,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=213900.0, ans=0.025 2024-09-17 11:44:24,701 INFO [train.py:1198] (0/2) Epoch 12, batch 3700, loss[loss=0.2754, ctc_loss=0.1744, cr_loss=0.4143, attn_decoder_loss=0.2774, over 29718.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1603, cr_loss=0.3999, attn_decoder_loss=0.2617, over 5804564.03 frames. ], batch size: 84, lr: 9.23e-03, grad_scale: 8.0 2024-09-17 11:44:42,611 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 9.238e+01 9.737e+01 1.052e+02 3.934e+02, threshold=1.947e+02, percent-clipped=3.0 2024-09-17 11:45:11,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=214020.0, ans=0.125 2024-09-17 11:45:38,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=214060.0, ans=0.125 2024-09-17 11:45:40,794 INFO [train.py:1198] (0/2) Epoch 12, batch 3750, loss[loss=0.2312, ctc_loss=0.1342, cr_loss=0.3622, attn_decoder_loss=0.234, over 29399.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1604, cr_loss=0.3995, attn_decoder_loss=0.2618, over 5807922.45 frames. 
], batch size: 67, lr: 9.23e-03, grad_scale: 8.0 2024-09-17 11:45:41,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=214100.0, ans=0.125 2024-09-17 11:45:43,077 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.29 vs. limit=15.0 2024-09-17 11:46:21,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214180.0, ans=0.1 2024-09-17 11:46:22,846 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:46:22,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=214180.0, ans=0.0 2024-09-17 11:46:31,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=214220.0, ans=0.0 2024-09-17 11:46:40,955 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2024-09-17 11:46:46,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=214260.0, ans=0.125 2024-09-17 11:46:55,366 INFO [train.py:1198] (0/2) Epoch 12, batch 3800, loss[loss=0.2791, ctc_loss=0.1787, cr_loss=0.4438, attn_decoder_loss=0.2804, over 29645.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1603, cr_loss=0.399, attn_decoder_loss=0.2614, over 5799019.34 frames. 
], batch size: 86, lr: 9.23e-03, grad_scale: 8.0 2024-09-17 11:46:59,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=214300.0, ans=0.125 2024-09-17 11:46:59,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=214300.0, ans=0.0 2024-09-17 11:47:13,094 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 9.449e+01 1.046e+02 1.140e+02 2.045e+02, threshold=2.093e+02, percent-clipped=1.0 2024-09-17 11:47:20,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=214340.0, ans=0.125 2024-09-17 11:47:26,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=214380.0, ans=0.125 2024-09-17 11:47:36,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=214380.0, ans=0.025 2024-09-17 11:47:43,486 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:47:43,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=214420.0, ans=0.0 2024-09-17 11:47:56,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.03 vs. limit=15.0 2024-09-17 11:48:09,978 INFO [train.py:1198] (0/2) Epoch 12, batch 3850, loss[loss=0.2693, ctc_loss=0.1608, cr_loss=0.4175, attn_decoder_loss=0.2721, over 29263.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1599, cr_loss=0.3988, attn_decoder_loss=0.2614, over 5813076.11 frames. 
], batch size: 100, lr: 9.22e-03, grad_scale: 8.0 2024-09-17 11:48:11,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214500.0, ans=0.1 2024-09-17 11:48:26,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=214540.0, ans=0.125 2024-09-17 11:48:29,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=214540.0, ans=0.2 2024-09-17 11:48:39,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=214580.0, ans=0.125 2024-09-17 11:48:55,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.43 vs. limit=15.0 2024-09-17 11:48:56,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=214620.0, ans=0.0 2024-09-17 11:48:59,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=214620.0, ans=0.125 2024-09-17 11:49:04,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.36 vs. limit=22.5 2024-09-17 11:49:17,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214660.0, ans=0.1 2024-09-17 11:49:24,514 INFO [train.py:1198] (0/2) Epoch 12, batch 3900, loss[loss=0.2722, ctc_loss=0.1711, cr_loss=0.4252, attn_decoder_loss=0.274, over 29623.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.1607, cr_loss=0.4, attn_decoder_loss=0.262, over 5817441.25 frames. 
], batch size: 86, lr: 9.22e-03, grad_scale: 8.0 2024-09-17 11:49:34,374 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2024-09-17 11:49:36,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=214700.0, ans=0.1 2024-09-17 11:49:42,147 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.696e+01 9.064e+01 9.520e+01 1.003e+02 3.590e+02, threshold=1.904e+02, percent-clipped=1.0 2024-09-17 11:50:03,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=214780.0, ans=0.125 2024-09-17 11:50:38,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=214860.0, ans=0.125 2024-09-17 11:50:40,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214900.0, ans=0.1 2024-09-17 11:50:41,382 INFO [train.py:1198] (0/2) Epoch 12, batch 3950, loss[loss=0.2733, ctc_loss=0.1643, cr_loss=0.4063, attn_decoder_loss=0.2764, over 29420.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1598, cr_loss=0.3995, attn_decoder_loss=0.2618, over 5836506.16 frames. ], batch size: 97, lr: 9.21e-03, grad_scale: 8.0 2024-09-17 11:50:45,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.52 vs. limit=15.0 2024-09-17 11:50:47,037 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.95 vs. 
limit=10.0 2024-09-17 11:50:56,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=214940.0, ans=0.0 2024-09-17 11:51:08,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.42 vs. limit=15.0 2024-09-17 11:51:21,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=214980.0, ans=0.0 2024-09-17 11:51:40,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=215060.0, ans=0.125 2024-09-17 11:51:42,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=215060.0, ans=0.0 2024-09-17 11:51:55,353 INFO [train.py:1198] (0/2) Epoch 12, batch 4000, loss[loss=0.2394, ctc_loss=0.1393, cr_loss=0.3793, attn_decoder_loss=0.2421, over 29489.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1606, cr_loss=0.4002, attn_decoder_loss=0.2621, over 5812185.62 frames. ], batch size: 74, lr: 9.21e-03, grad_scale: 16.0 2024-09-17 11:51:56,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.38 vs. limit=5.0 2024-09-17 11:52:14,267 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.967e+01 9.592e+01 1.062e+02 2.028e+02, threshold=1.918e+02, percent-clipped=1.0 2024-09-17 11:52:46,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=215220.0, ans=0.0 2024-09-17 11:53:10,002 INFO [train.py:1198] (0/2) Epoch 12, batch 4050, loss[loss=0.2966, ctc_loss=0.2172, cr_loss=0.4422, attn_decoder_loss=0.2955, over 19684.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.1606, cr_loss=0.3995, attn_decoder_loss=0.262, over 5795612.58 frames. 
], batch size: 210, lr: 9.21e-03, grad_scale: 8.0 2024-09-17 11:53:13,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2024-09-17 11:53:27,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=215340.0, ans=0.0 2024-09-17 11:53:41,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=215380.0, ans=0.05 2024-09-17 11:53:45,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=215380.0, ans=0.2 2024-09-17 11:54:09,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=215460.0, ans=0.09899494936611666 2024-09-17 11:54:23,751 INFO [train.py:1198] (0/2) Epoch 12, batch 4100, loss[loss=0.2694, ctc_loss=0.1624, cr_loss=0.3855, attn_decoder_loss=0.2728, over 29512.00 frames. ], tot_loss[loss=0.2602, ctc_loss=0.1609, cr_loss=0.4, attn_decoder_loss=0.2624, over 5791537.94 frames. ], batch size: 90, lr: 9.20e-03, grad_scale: 8.0 2024-09-17 11:54:27,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.33 vs. 
limit=10.0 2024-09-17 11:54:43,939 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.986e+01 9.324e+01 9.990e+01 1.134e+02 3.141e+02, threshold=1.998e+02, percent-clipped=1.0 2024-09-17 11:54:55,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=215580.0, ans=0.125 2024-09-17 11:55:05,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=215580.0, ans=0.0 2024-09-17 11:55:14,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=215620.0, ans=0.05 2024-09-17 11:55:16,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=215620.0, ans=0.07 2024-09-17 11:55:29,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=215660.0, ans=0.0 2024-09-17 11:55:35,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=215660.0, ans=0.125 2024-09-17 11:55:38,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=215700.0, ans=0.125 2024-09-17 11:55:39,604 INFO [train.py:1198] (0/2) Epoch 12, batch 4150, loss[loss=0.2549, ctc_loss=0.1508, cr_loss=0.3842, attn_decoder_loss=0.258, over 29487.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1605, cr_loss=0.399, attn_decoder_loss=0.2618, over 5798059.24 frames. ], batch size: 77, lr: 9.20e-03, grad_scale: 8.0 2024-09-17 11:55:53,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.25 vs. 
limit=22.5 2024-09-17 11:55:57,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=215740.0, ans=0.2 2024-09-17 11:56:00,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=215740.0, ans=0.2 2024-09-17 11:56:05,271 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2024-09-17 11:56:15,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=215780.0, ans=0.0 2024-09-17 11:56:43,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.80 vs. limit=10.0 2024-09-17 11:56:53,538 INFO [train.py:1198] (0/2) Epoch 12, batch 4200, loss[loss=0.2863, ctc_loss=0.1849, cr_loss=0.4494, attn_decoder_loss=0.2876, over 29488.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1608, cr_loss=0.3996, attn_decoder_loss=0.2623, over 5800043.24 frames. ], batch size: 90, lr: 9.19e-03, grad_scale: 8.0 2024-09-17 11:57:04,991 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.19 vs. limit=15.0 2024-09-17 11:57:12,888 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.094e+01 9.529e+01 1.014e+02 1.072e+02 1.789e+02, threshold=2.028e+02, percent-clipped=0.0 2024-09-17 11:57:53,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.75 vs. limit=15.0 2024-09-17 11:58:07,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. 
limit=6.0 2024-09-17 11:58:07,691 INFO [train.py:1198] (0/2) Epoch 12, batch 4250, loss[loss=0.2349, ctc_loss=0.1338, cr_loss=0.3496, attn_decoder_loss=0.2384, over 29496.00 frames. ], tot_loss[loss=0.2603, ctc_loss=0.1607, cr_loss=0.3989, attn_decoder_loss=0.2625, over 5805892.81 frames. ], batch size: 74, lr: 9.19e-03, grad_scale: 8.0 2024-09-17 11:58:09,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=216100.0, ans=0.125 2024-09-17 11:58:10,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=216100.0, ans=0.125 2024-09-17 11:58:14,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2024-09-17 11:58:15,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=216100.0, ans=0.0 2024-09-17 11:58:16,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=216100.0, ans=0.0 2024-09-17 11:58:49,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=216180.0, ans=0.2 2024-09-17 11:59:02,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=216220.0, ans=0.125 2024-09-17 11:59:23,269 INFO [train.py:1198] (0/2) Epoch 12, batch 4300, loss[loss=0.2635, ctc_loss=0.1591, cr_loss=0.403, attn_decoder_loss=0.2661, over 29528.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1602, cr_loss=0.3981, attn_decoder_loss=0.2624, over 5795914.46 frames. 
], batch size: 87, lr: 9.18e-03, grad_scale: 8.0 2024-09-17 11:59:31,640 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.44 vs. limit=15.0 2024-09-17 11:59:32,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=216300.0, ans=0.2 2024-09-17 11:59:36,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.67 vs. limit=15.0 2024-09-17 11:59:37,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.01 vs. limit=22.5 2024-09-17 11:59:44,352 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.076e+01 9.375e+01 1.010e+02 1.083e+02 2.799e+02, threshold=2.019e+02, percent-clipped=3.0 2024-09-17 12:00:14,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.04 vs. limit=15.0 2024-09-17 12:00:30,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=216460.0, ans=0.0 2024-09-17 12:00:37,748 INFO [train.py:1198] (0/2) Epoch 12, batch 4350, loss[loss=0.2815, ctc_loss=0.1809, cr_loss=0.4292, attn_decoder_loss=0.2832, over 29515.00 frames. ], tot_loss[loss=0.2632, ctc_loss=0.1628, cr_loss=0.4033, attn_decoder_loss=0.2654, over 5798105.91 frames. 
], batch size: 97, lr: 9.18e-03, grad_scale: 4.0 2024-09-17 12:00:47,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=216500.0, ans=0.0 2024-09-17 12:00:54,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=216540.0, ans=0.2 2024-09-17 12:01:10,681 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2024-09-17 12:01:13,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=216580.0, ans=0.07 2024-09-17 12:01:16,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=216580.0, ans=0.0 2024-09-17 12:01:21,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=216620.0, ans=0.125 2024-09-17 12:01:28,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=216620.0, ans=0.125 2024-09-17 12:01:34,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=216620.0, ans=0.125 2024-09-17 12:01:37,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=216660.0, ans=0.125 2024-09-17 12:01:45,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=216660.0, ans=0.125 2024-09-17 12:01:52,274 INFO [train.py:1198] (0/2) Epoch 12, batch 4400, loss[loss=0.2729, ctc_loss=0.1757, cr_loss=0.4361, attn_decoder_loss=0.274, over 27296.00 frames. ], tot_loss[loss=0.2655, ctc_loss=0.1647, cr_loss=0.4059, attn_decoder_loss=0.2677, over 5768845.56 frames. 
], batch size: 125, lr: 9.18e-03, grad_scale: 8.0 2024-09-17 12:02:12,851 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.351e+01 9.429e+01 9.897e+01 1.056e+02 1.811e+02, threshold=1.979e+02, percent-clipped=0.0 2024-09-17 12:02:41,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=216820.0, ans=0.125 2024-09-17 12:03:02,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=216860.0, ans=0.0 2024-09-17 12:03:03,037 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.71 vs. limit=22.5 2024-09-17 12:03:06,651 INFO [train.py:1198] (0/2) Epoch 12, batch 4450, loss[loss=0.298, ctc_loss=0.225, cr_loss=0.4492, attn_decoder_loss=0.2962, over 20492.00 frames. ], tot_loss[loss=0.2689, ctc_loss=0.1698, cr_loss=0.4103, attn_decoder_loss=0.2708, over 5584792.55 frames. ], batch size: 209, lr: 9.17e-03, grad_scale: 8.0 2024-09-17 12:03:08,597 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:03:42,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=216980.0, ans=0.125 2024-09-17 12:03:55,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.05 vs. 
limit=15.0 2024-09-17 12:04:12,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=217060.0, ans=0.0 2024-09-17 12:04:17,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=217060.0, ans=0.125 2024-09-17 12:04:23,109 INFO [train.py:1198] (0/2) Epoch 12, batch 4500, loss[loss=0.3003, ctc_loss=0.2235, cr_loss=0.4326, attn_decoder_loss=0.2992, over 20064.00 frames. ], tot_loss[loss=0.2722, ctc_loss=0.1759, cr_loss=0.4123, attn_decoder_loss=0.2738, over 5244026.66 frames. ], batch size: 210, lr: 9.17e-03, grad_scale: 8.0 2024-09-17 12:04:45,821 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.368e+01 1.035e+02 1.137e+02 1.264e+02 3.702e+02, threshold=2.273e+02, percent-clipped=1.0 2024-09-17 12:05:00,373 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-12.pt 2024-09-17 12:05:52,066 INFO [train.py:1198] (0/2) Epoch 13, batch 0, loss[loss=0.2425, ctc_loss=0.1363, cr_loss=0.3769, attn_decoder_loss=0.2459, over 29595.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1363, cr_loss=0.3769, attn_decoder_loss=0.2459, over 29595.00 frames. ], batch size: 73, lr: 8.81e-03, grad_scale: 16.0 2024-09-17 12:05:52,067 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 12:06:10,485 INFO [train.py:1230] (0/2) Epoch 13, validation: loss=0.214, ctc_loss=0.04435, cr_loss=4.652e-15, attn_decoder_loss=0.2329, over 944034.00 frames. 
2024-09-17 12:06:10,485 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 12:06:10,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=217200.0, ans=0.1 2024-09-17 12:06:10,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=217200.0, ans=0.05 2024-09-17 12:06:13,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=217200.0, ans=0.125 2024-09-17 12:06:33,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=217240.0, ans=0.125 2024-09-17 12:06:50,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=217280.0, ans=0.0 2024-09-17 12:07:19,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=217360.0, ans=0.1 2024-09-17 12:07:22,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=217360.0, ans=0.2 2024-09-17 12:07:28,728 INFO [train.py:1198] (0/2) Epoch 13, batch 50, loss[loss=0.2398, ctc_loss=0.1449, cr_loss=0.3748, attn_decoder_loss=0.242, over 29444.00 frames. ], tot_loss[loss=0.2647, ctc_loss=0.1665, cr_loss=0.4088, attn_decoder_loss=0.2665, over 1269537.14 frames. 
], batch size: 70, lr: 8.80e-03, grad_scale: 8.0 2024-09-17 12:07:51,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=217440.0, ans=0.2 2024-09-17 12:07:59,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=217480.0, ans=0.0 2024-09-17 12:08:03,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=217480.0, ans=0.0 2024-09-17 12:08:12,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.16 vs. limit=15.0 2024-09-17 12:08:23,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=217520.0, ans=0.0 2024-09-17 12:08:26,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=217520.0, ans=0.125 2024-09-17 12:08:30,976 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.263e+01 9.622e+01 1.023e+02 1.146e+02 3.788e+02, threshold=2.046e+02, percent-clipped=2.0 2024-09-17 12:08:45,179 INFO [train.py:1198] (0/2) Epoch 13, batch 100, loss[loss=0.2547, ctc_loss=0.1572, cr_loss=0.4009, attn_decoder_loss=0.2566, over 29516.00 frames. ], tot_loss[loss=0.2649, ctc_loss=0.1653, cr_loss=0.4066, attn_decoder_loss=0.267, over 2251992.19 frames. ], batch size: 76, lr: 8.80e-03, grad_scale: 8.0 2024-09-17 12:08:46,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=217600.0, ans=0.125 2024-09-17 12:08:59,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.45 vs. 
limit=15.0 2024-09-17 12:09:06,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=217640.0, ans=0.125 2024-09-17 12:09:12,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217640.0, ans=0.1 2024-09-17 12:09:13,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=217680.0, ans=0.0 2024-09-17 12:09:33,847 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:09:41,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=217720.0, ans=0.0 2024-09-17 12:09:51,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=217760.0, ans=0.125 2024-09-17 12:10:01,871 INFO [train.py:1198] (0/2) Epoch 13, batch 150, loss[loss=0.2369, ctc_loss=0.1441, cr_loss=0.3816, attn_decoder_loss=0.2387, over 29429.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1609, cr_loss=0.4008, attn_decoder_loss=0.2632, over 3046915.09 frames. 
], batch size: 70, lr: 8.80e-03, grad_scale: 8.0 2024-09-17 12:10:32,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=217880.0, ans=0.125 2024-09-17 12:10:59,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=217920.0, ans=0.125 2024-09-17 12:11:06,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.881e+01 9.835e+01 1.094e+02 1.657e+02, threshold=1.967e+02, percent-clipped=0.0 2024-09-17 12:11:14,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=217960.0, ans=0.5 2024-09-17 12:11:20,034 INFO [train.py:1198] (0/2) Epoch 13, batch 200, loss[loss=0.2862, ctc_loss=0.191, cr_loss=0.4541, attn_decoder_loss=0.2867, over 27351.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1601, cr_loss=0.4001, attn_decoder_loss=0.2619, over 3659107.09 frames. 
], batch size: 124, lr: 8.79e-03, grad_scale: 8.0 2024-09-17 12:11:21,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=218000.0, ans=0.125 2024-09-17 12:11:33,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=218040.0, ans=0.2 2024-09-17 12:11:59,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=218080.0, ans=0.0 2024-09-17 12:12:01,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=218080.0, ans=0.1 2024-09-17 12:12:20,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=218160.0, ans=0.125 2024-09-17 12:12:35,451 INFO [train.py:1198] (0/2) Epoch 13, batch 250, loss[loss=0.2768, ctc_loss=0.1787, cr_loss=0.4112, attn_decoder_loss=0.2786, over 29275.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1592, cr_loss=0.3994, attn_decoder_loss=0.2618, over 4141365.92 frames. 
], batch size: 100, lr: 8.79e-03, grad_scale: 8.0 2024-09-17 12:12:43,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=218200.0, ans=0.0 2024-09-17 12:12:50,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=218240.0, ans=0.2 2024-09-17 12:13:01,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=218240.0, ans=0.1 2024-09-17 12:13:02,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=218240.0, ans=0.0 2024-09-17 12:13:05,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=218280.0, ans=0.0 2024-09-17 12:13:11,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=218280.0, ans=0.125 2024-09-17 12:13:11,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=218280.0, ans=0.025 2024-09-17 12:13:12,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=218280.0, ans=0.95 2024-09-17 12:13:14,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=218280.0, ans=0.025 2024-09-17 12:13:16,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=218280.0, ans=0.0 2024-09-17 12:13:32,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=218320.0, ans=0.1 2024-09-17 12:13:34,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, 
batch_count=218320.0, ans=0.125 2024-09-17 12:13:38,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=218360.0, ans=0.125 2024-09-17 12:13:39,915 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.019e+01 9.647e+01 1.091e+02 1.389e+02, threshold=1.929e+02, percent-clipped=0.0 2024-09-17 12:13:53,954 INFO [train.py:1198] (0/2) Epoch 13, batch 300, loss[loss=0.2807, ctc_loss=0.1738, cr_loss=0.4294, attn_decoder_loss=0.2831, over 29528.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.159, cr_loss=0.3993, attn_decoder_loss=0.2612, over 4510866.46 frames. ], batch size: 92, lr: 8.78e-03, grad_scale: 8.0 2024-09-17 12:14:16,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=218440.0, ans=0.125 2024-09-17 12:14:27,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.94 vs. limit=22.5 2024-09-17 12:14:30,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=218480.0, ans=0.1 2024-09-17 12:14:50,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=218520.0, ans=0.0 2024-09-17 12:14:59,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=218560.0, ans=10.0 2024-09-17 12:15:09,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=218560.0, ans=0.0 2024-09-17 12:15:11,640 INFO [train.py:1198] (0/2) Epoch 13, batch 350, loss[loss=0.2271, ctc_loss=0.139, cr_loss=0.3527, attn_decoder_loss=0.229, over 29328.00 frames. 
], tot_loss[loss=0.2593, ctc_loss=0.1591, cr_loss=0.3991, attn_decoder_loss=0.2615, over 4794884.81 frames. ], batch size: 71, lr: 8.78e-03, grad_scale: 8.0 2024-09-17 12:15:13,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=218600.0, ans=0.1 2024-09-17 12:15:27,031 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:15:35,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.01 vs. limit=22.5 2024-09-17 12:15:37,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=218640.0, ans=0.125 2024-09-17 12:15:37,531 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:15:52,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=218680.0, ans=0.0 2024-09-17 12:16:08,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.09 vs. limit=15.0 2024-09-17 12:16:13,452 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 9.251e+01 9.818e+01 1.107e+02 7.103e+02, threshold=1.964e+02, percent-clipped=3.0 2024-09-17 12:16:27,031 INFO [train.py:1198] (0/2) Epoch 13, batch 400, loss[loss=0.2607, ctc_loss=0.1591, cr_loss=0.4134, attn_decoder_loss=0.2629, over 29715.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1586, cr_loss=0.3978, attn_decoder_loss=0.2611, over 5024477.02 frames. 
], batch size: 82, lr: 8.78e-03, grad_scale: 16.0 2024-09-17 12:16:34,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=218800.0, ans=0.1 2024-09-17 12:16:36,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=218800.0, ans=0.2 2024-09-17 12:16:53,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=218840.0, ans=0.2 2024-09-17 12:16:54,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=218840.0, ans=0.0 2024-09-17 12:16:57,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=218880.0, ans=0.025 2024-09-17 12:16:58,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.37 vs. limit=22.5 2024-09-17 12:17:19,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=218920.0, ans=0.0 2024-09-17 12:17:39,041 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-09-17 12:17:40,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2024-09-17 12:17:42,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=218960.0, ans=0.0 2024-09-17 12:17:45,480 INFO [train.py:1198] (0/2) Epoch 13, batch 450, loss[loss=0.2442, ctc_loss=0.125, cr_loss=0.3513, attn_decoder_loss=0.2496, over 29716.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1586, cr_loss=0.3973, attn_decoder_loss=0.2612, over 5186895.45 frames. 
], batch size: 83, lr: 8.77e-03, grad_scale: 8.0 2024-09-17 12:17:47,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=219000.0, ans=0.2 2024-09-17 12:18:00,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=219040.0, ans=0.0 2024-09-17 12:18:03,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=219040.0, ans=0.125 2024-09-17 12:18:08,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219040.0, ans=0.1 2024-09-17 12:18:29,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=219120.0, ans=0.5 2024-09-17 12:18:31,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=219120.0, ans=0.1 2024-09-17 12:18:47,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=219160.0, ans=0.125 2024-09-17 12:18:51,589 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.793e+01 9.370e+01 9.843e+01 2.913e+02, threshold=1.874e+02, percent-clipped=1.0 2024-09-17 12:19:04,120 INFO [train.py:1198] (0/2) Epoch 13, batch 500, loss[loss=0.2806, ctc_loss=0.182, cr_loss=0.4562, attn_decoder_loss=0.2814, over 29407.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1579, cr_loss=0.3967, attn_decoder_loss=0.2603, over 5330132.75 frames. 
], batch size: 94, lr: 8.77e-03, grad_scale: 8.0 2024-09-17 12:19:05,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=219200.0, ans=0.125 2024-09-17 12:19:28,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=219240.0, ans=0.5 2024-09-17 12:19:37,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=219280.0, ans=0.0 2024-09-17 12:19:38,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.76 vs. limit=6.0 2024-09-17 12:19:57,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=219320.0, ans=0.125 2024-09-17 12:20:19,694 INFO [train.py:1198] (0/2) Epoch 13, batch 550, loss[loss=0.2683, ctc_loss=0.1585, cr_loss=0.4123, attn_decoder_loss=0.2713, over 28828.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1582, cr_loss=0.3972, attn_decoder_loss=0.2605, over 5423798.61 frames. 
], batch size: 104, lr: 8.76e-03, grad_scale: 8.0 2024-09-17 12:20:19,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=219400.0, ans=0.125 2024-09-17 12:20:30,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=219400.0, ans=0.025 2024-09-17 12:20:56,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=219480.0, ans=0.2 2024-09-17 12:21:08,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=219520.0, ans=0.125 2024-09-17 12:21:11,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=219520.0, ans=0.125 2024-09-17 12:21:26,147 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.042e+01 9.312e+01 1.008e+02 1.110e+02 1.901e+02, threshold=2.017e+02, percent-clipped=1.0 2024-09-17 12:21:29,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=219560.0, ans=0.125 2024-09-17 12:21:31,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=22.5 2024-09-17 12:21:38,306 INFO [train.py:1198] (0/2) Epoch 13, batch 600, loss[loss=0.2666, ctc_loss=0.1583, cr_loss=0.3866, attn_decoder_loss=0.27, over 29261.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1587, cr_loss=0.3979, attn_decoder_loss=0.2611, over 5511558.11 frames. ], batch size: 100, lr: 8.76e-03, grad_scale: 8.0 2024-09-17 12:21:58,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.98 vs. 
limit=15.0 2024-09-17 12:22:08,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=219680.0, ans=0.2 2024-09-17 12:22:08,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=219680.0, ans=0.0 2024-09-17 12:22:28,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=219720.0, ans=0.2 2024-09-17 12:22:43,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=219760.0, ans=0.125 2024-09-17 12:22:49,295 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.87 vs. limit=15.0 2024-09-17 12:22:53,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=219760.0, ans=0.125 2024-09-17 12:22:55,878 INFO [train.py:1198] (0/2) Epoch 13, batch 650, loss[loss=0.262, ctc_loss=0.1593, cr_loss=0.3958, attn_decoder_loss=0.2646, over 29755.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1572, cr_loss=0.3957, attn_decoder_loss=0.2599, over 5588068.26 frames. ], batch size: 81, lr: 8.76e-03, grad_scale: 8.0 2024-09-17 12:23:06,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.96 vs. 
limit=10.0 2024-09-17 12:23:08,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=219800.0, ans=0.2 2024-09-17 12:23:11,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=219840.0, ans=0.125 2024-09-17 12:23:15,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=219840.0, ans=0.125 2024-09-17 12:23:29,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=219880.0, ans=0.0 2024-09-17 12:23:30,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-09-17 12:23:42,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2024-09-17 12:23:43,395 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:23:59,542 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 9.144e+01 9.957e+01 1.061e+02 1.597e+02, threshold=1.991e+02, percent-clipped=0.0 2024-09-17 12:24:12,258 INFO [train.py:1198] (0/2) Epoch 13, batch 700, loss[loss=0.239, ctc_loss=0.135, cr_loss=0.359, attn_decoder_loss=0.2426, over 29536.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1576, cr_loss=0.3967, attn_decoder_loss=0.2607, over 5637638.60 frames. 
], batch size: 76, lr: 8.75e-03, grad_scale: 8.0 2024-09-17 12:24:24,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=220000.0, ans=0.125 2024-09-17 12:24:24,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=220000.0, ans=0.025 2024-09-17 12:24:24,952 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-09-17 12:24:26,640 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.29 vs. limit=10.0 2024-09-17 12:24:35,044 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:24:36,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=220040.0, ans=0.125 2024-09-17 12:24:53,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=220080.0, ans=0.0 2024-09-17 12:25:30,054 INFO [train.py:1198] (0/2) Epoch 13, batch 750, loss[loss=0.2696, ctc_loss=0.1705, cr_loss=0.4244, attn_decoder_loss=0.2711, over 29724.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1574, cr_loss=0.3968, attn_decoder_loss=0.2604, over 5675423.52 frames. 
], batch size: 82, lr: 8.75e-03, grad_scale: 8.0 2024-09-17 12:25:37,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=220200.0, ans=0.0 2024-09-17 12:25:40,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=220200.0, ans=10.0 2024-09-17 12:25:45,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=220240.0, ans=0.125 2024-09-17 12:25:57,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=220240.0, ans=0.07 2024-09-17 12:26:21,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=220320.0, ans=0.0 2024-09-17 12:26:30,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=220360.0, ans=0.125 2024-09-17 12:26:33,426 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 9.372e+01 1.007e+02 1.108e+02 5.289e+02, threshold=2.013e+02, percent-clipped=1.0 2024-09-17 12:26:41,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=220360.0, ans=0.0 2024-09-17 12:26:42,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=220360.0, ans=0.125 2024-09-17 12:26:45,666 INFO [train.py:1198] (0/2) Epoch 13, batch 800, loss[loss=0.2305, ctc_loss=0.1279, cr_loss=0.3447, attn_decoder_loss=0.2342, over 29576.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1575, cr_loss=0.3973, attn_decoder_loss=0.2603, over 5706516.52 frames. 
], batch size: 73, lr: 8.74e-03, grad_scale: 16.0 2024-09-17 12:26:45,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220400.0, ans=0.1 2024-09-17 12:27:27,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=220480.0, ans=0.125 2024-09-17 12:27:32,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=220520.0, ans=0.125 2024-09-17 12:27:33,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=220520.0, ans=0.125 2024-09-17 12:28:03,606 INFO [train.py:1198] (0/2) Epoch 13, batch 850, loss[loss=0.271, ctc_loss=0.1667, cr_loss=0.4136, attn_decoder_loss=0.2733, over 29692.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1567, cr_loss=0.3958, attn_decoder_loss=0.2598, over 5736895.10 frames. 
], batch size: 89, lr: 8.74e-03, grad_scale: 8.0 2024-09-17 12:28:11,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=220600.0, ans=0.125 2024-09-17 12:28:11,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=220600.0, ans=0.0 2024-09-17 12:28:38,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=220680.0, ans=0.0 2024-09-17 12:28:55,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=220720.0, ans=0.025 2024-09-17 12:29:11,979 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.923e+01 9.332e+01 1.023e+02 2.147e+02, threshold=1.866e+02, percent-clipped=2.0 2024-09-17 12:29:23,103 INFO [train.py:1198] (0/2) Epoch 13, batch 900, loss[loss=0.231, ctc_loss=0.1221, cr_loss=0.3393, attn_decoder_loss=0.2355, over 29587.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1571, cr_loss=0.3966, attn_decoder_loss=0.2604, over 5741553.95 frames. ], batch size: 73, lr: 8.74e-03, grad_scale: 8.0 2024-09-17 12:29:59,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=220880.0, ans=0.0 2024-09-17 12:30:38,551 INFO [train.py:1198] (0/2) Epoch 13, batch 950, loss[loss=0.2404, ctc_loss=0.1394, cr_loss=0.3586, attn_decoder_loss=0.2436, over 29523.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1576, cr_loss=0.3972, attn_decoder_loss=0.2607, over 5743906.01 frames. ], batch size: 74, lr: 8.73e-03, grad_scale: 8.0 2024-09-17 12:30:40,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.95 vs. 
limit=15.0 2024-09-17 12:30:47,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=221000.0, ans=10.0 2024-09-17 12:30:48,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=221000.0, ans=0.0 2024-09-17 12:31:05,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=221040.0, ans=0.025 2024-09-17 12:31:46,093 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 9.884e+01 1.087e+02 1.225e+02 3.377e+02, threshold=2.174e+02, percent-clipped=3.0 2024-09-17 12:31:47,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=221160.0, ans=0.0 2024-09-17 12:31:50,144 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.07 vs. limit=10.0 2024-09-17 12:31:56,570 INFO [train.py:1198] (0/2) Epoch 13, batch 1000, loss[loss=0.2558, ctc_loss=0.1548, cr_loss=0.415, attn_decoder_loss=0.2578, over 29509.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.159, cr_loss=0.3986, attn_decoder_loss=0.2616, over 5736927.58 frames. 
], batch size: 77, lr: 8.73e-03, grad_scale: 8.0 2024-09-17 12:31:58,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=221200.0, ans=0.0 2024-09-17 12:32:32,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=221280.0, ans=0.1 2024-09-17 12:33:01,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=221360.0, ans=0.2 2024-09-17 12:33:13,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=221400.0, ans=0.2 2024-09-17 12:33:15,011 INFO [train.py:1198] (0/2) Epoch 13, batch 1050, loss[loss=0.2715, ctc_loss=0.1679, cr_loss=0.4192, attn_decoder_loss=0.2737, over 29669.00 frames. ], tot_loss[loss=0.2585, ctc_loss=0.1585, cr_loss=0.3978, attn_decoder_loss=0.2608, over 5744739.55 frames. ], batch size: 85, lr: 8.73e-03, grad_scale: 8.0 2024-09-17 12:33:16,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.18 vs. 
limit=15.0 2024-09-17 12:33:33,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=221440.0, ans=0.125 2024-09-17 12:33:55,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=221480.0, ans=0.0 2024-09-17 12:34:04,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=221520.0, ans=0.1 2024-09-17 12:34:19,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=221560.0, ans=0.125 2024-09-17 12:34:20,746 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.817e+01 9.337e+01 1.034e+02 1.952e+02, threshold=1.867e+02, percent-clipped=0.0 2024-09-17 12:34:27,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=221560.0, ans=0.125 2024-09-17 12:34:32,019 INFO [train.py:1198] (0/2) Epoch 13, batch 1100, loss[loss=0.247, ctc_loss=0.1489, cr_loss=0.3576, attn_decoder_loss=0.2499, over 29468.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1584, cr_loss=0.3975, attn_decoder_loss=0.2607, over 5757738.88 frames. 
], batch size: 78, lr: 8.72e-03, grad_scale: 8.0 2024-09-17 12:34:39,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=221600.0, ans=0.125 2024-09-17 12:34:54,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=221640.0, ans=0.04949747468305833 2024-09-17 12:34:55,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=221640.0, ans=0.125 2024-09-17 12:35:14,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=221680.0, ans=0.0 2024-09-17 12:35:23,955 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.82 vs. limit=15.0 2024-09-17 12:35:38,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=221760.0, ans=22.5 2024-09-17 12:35:49,961 INFO [train.py:1198] (0/2) Epoch 13, batch 1150, loss[loss=0.2574, ctc_loss=0.1614, cr_loss=0.3921, attn_decoder_loss=0.2593, over 29443.00 frames. ], tot_loss[loss=0.2585, ctc_loss=0.1587, cr_loss=0.3981, attn_decoder_loss=0.2607, over 5755295.33 frames. 
], batch size: 78, lr: 8.72e-03, grad_scale: 8.0 2024-09-17 12:35:57,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=221800.0, ans=0.1 2024-09-17 12:36:08,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=221840.0, ans=0.125 2024-09-17 12:36:57,523 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.374e+01 9.019e+01 9.917e+01 1.067e+02 1.578e+02, threshold=1.983e+02, percent-clipped=0.0 2024-09-17 12:37:00,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=221960.0, ans=0.125 2024-09-17 12:37:08,002 INFO [train.py:1198] (0/2) Epoch 13, batch 1200, loss[loss=0.2676, ctc_loss=0.157, cr_loss=0.4101, attn_decoder_loss=0.2707, over 29695.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.159, cr_loss=0.399, attn_decoder_loss=0.2614, over 5747285.07 frames. ], batch size: 85, lr: 8.71e-03, grad_scale: 16.0 2024-09-17 12:37:19,102 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:37:31,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=222040.0, ans=0.2 2024-09-17 12:37:37,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=222080.0, ans=0.125 2024-09-17 12:37:38,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=222080.0, ans=0.125 2024-09-17 12:37:39,638 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.66 vs. 
limit=6.0 2024-09-17 12:37:43,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222080.0, ans=0.1 2024-09-17 12:37:57,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=222120.0, ans=0.09899494936611666 2024-09-17 12:38:00,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=222120.0, ans=0.0 2024-09-17 12:38:00,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=222120.0, ans=0.2 2024-09-17 12:38:09,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=222160.0, ans=0.125 2024-09-17 12:38:24,383 INFO [train.py:1198] (0/2) Epoch 13, batch 1250, loss[loss=0.2895, ctc_loss=0.196, cr_loss=0.4554, attn_decoder_loss=0.2898, over 29539.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.1596, cr_loss=0.4005, attn_decoder_loss=0.262, over 5774717.62 frames. ], batch size: 92, lr: 8.71e-03, grad_scale: 8.0 2024-09-17 12:38:48,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=222240.0, ans=0.125 2024-09-17 12:38:54,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.95 vs. 
limit=10.0 2024-09-17 12:39:00,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=222280.0, ans=0.1 2024-09-17 12:39:21,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=222320.0, ans=0.0 2024-09-17 12:39:24,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=222320.0, ans=0.125 2024-09-17 12:39:26,226 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:39:33,605 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 9.226e+01 9.923e+01 1.052e+02 2.205e+02, threshold=1.985e+02, percent-clipped=1.0 2024-09-17 12:39:35,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=222360.0, ans=0.1 2024-09-17 12:39:36,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=222360.0, ans=0.0 2024-09-17 12:39:42,944 INFO [train.py:1198] (0/2) Epoch 13, batch 1300, loss[loss=0.2781, ctc_loss=0.1772, cr_loss=0.4246, attn_decoder_loss=0.2799, over 28150.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1594, cr_loss=0.4, attn_decoder_loss=0.2616, over 5779372.09 frames. ], batch size: 111, lr: 8.71e-03, grad_scale: 8.0 2024-09-17 12:39:56,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.55 vs. 
limit=22.5
2024-09-17 12:39:58,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=222440.0, ans=0.125
2024-09-17 12:40:22,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=222480.0, ans=0.2
2024-09-17 12:40:32,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=222520.0, ans=0.125
2024-09-17 12:40:50,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=222560.0, ans=0.2
2024-09-17 12:40:54,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.35 vs. limit=15.0
2024-09-17 12:41:00,772 INFO [train.py:1198] (0/2) Epoch 13, batch 1350, loss[loss=0.2553, ctc_loss=0.1582, cr_loss=0.3962, attn_decoder_loss=0.2573, over 29757.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1583, cr_loss=0.3988, attn_decoder_loss=0.2608, over 5796347.14 frames. ], batch size: 81, lr: 8.70e-03, grad_scale: 8.0
2024-09-17 12:41:20,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=222640.0, ans=0.125
2024-09-17 12:41:32,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=222680.0, ans=0.025
2024-09-17 12:41:35,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=222680.0, ans=0.125
2024-09-17 12:41:46,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0
2024-09-17 12:41:59,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=222760.0, ans=0.125
2024-09-17 12:42:06,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.825e+01 9.390e+01 1.007e+02 1.307e+02, threshold=1.878e+02, percent-clipped=0.0
2024-09-17 12:42:15,644 INFO [train.py:1198] (0/2) Epoch 13, batch 1400, loss[loss=0.2253, ctc_loss=0.1258, cr_loss=0.3299, attn_decoder_loss=0.2291, over 29574.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1579, cr_loss=0.398, attn_decoder_loss=0.2606, over 5806884.53 frames. ], batch size: 69, lr: 8.70e-03, grad_scale: 8.0
2024-09-17 12:42:26,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=12.0
2024-09-17 12:42:49,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=222880.0, ans=0.125
2024-09-17 12:42:51,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=222880.0, ans=0.2
2024-09-17 12:43:05,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.05 vs. limit=22.5
2024-09-17 12:43:33,422 INFO [train.py:1198] (0/2) Epoch 13, batch 1450, loss[loss=0.2729, ctc_loss=0.1704, cr_loss=0.4372, attn_decoder_loss=0.2746, over 29478.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1582, cr_loss=0.3987, attn_decoder_loss=0.2612, over 5803433.11 frames. ], batch size: 94, lr: 8.69e-03, grad_scale: 8.0
2024-09-17 12:43:35,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=223000.0, ans=0.2
2024-09-17 12:43:52,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=223040.0, ans=0.125
2024-09-17 12:43:59,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=223040.0, ans=0.125
2024-09-17 12:43:59,638 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:44:02,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=223080.0, ans=0.0
2024-09-17 12:44:09,494 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5
2024-09-17 12:44:28,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=223120.0, ans=0.0
2024-09-17 12:44:31,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=223120.0, ans=0.125
2024-09-17 12:44:38,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=22.5
2024-09-17 12:44:39,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=223160.0, ans=0.2
2024-09-17 12:44:41,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=223160.0, ans=0.125
2024-09-17 12:44:42,126 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 9.179e+01 9.900e+01 1.065e+02 2.201e+02, threshold=1.980e+02, percent-clipped=1.0
2024-09-17 12:44:51,611 INFO [train.py:1198] (0/2) Epoch 13, batch 1500, loss[loss=0.2649, ctc_loss=0.1612, cr_loss=0.398, attn_decoder_loss=0.2676, over 29615.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1582, cr_loss=0.3988, attn_decoder_loss=0.2615, over 5804074.34 frames. ], batch size: 86, lr: 8.69e-03, grad_scale: 8.0
2024-09-17 12:45:07,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=223240.0, ans=0.0
2024-09-17 12:45:14,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=223240.0, ans=0.0
2024-09-17 12:45:16,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=223240.0, ans=0.125
2024-09-17 12:45:20,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.96 vs. limit=15.0
2024-09-17 12:45:26,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0
2024-09-17 12:45:43,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=223320.0, ans=0.125
2024-09-17 12:45:48,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=223320.0, ans=0.0
2024-09-17 12:46:07,360 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.66 vs. limit=15.0
2024-09-17 12:46:08,024 INFO [train.py:1198] (0/2) Epoch 13, batch 1550, loss[loss=0.2726, ctc_loss=0.1636, cr_loss=0.4123, attn_decoder_loss=0.2756, over 29536.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1581, cr_loss=0.3985, attn_decoder_loss=0.2612, over 5781224.51 frames. ], batch size: 90, lr: 8.69e-03, grad_scale: 8.0
2024-09-17 12:46:10,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=12.0
2024-09-17 12:46:27,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=223440.0, ans=0.125
2024-09-17 12:46:42,680 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.40 vs. limit=22.5
2024-09-17 12:46:49,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=223480.0, ans=0.2
2024-09-17 12:46:52,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=223480.0, ans=0.125
2024-09-17 12:47:01,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=223520.0, ans=0.125
2024-09-17 12:47:16,424 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.020e+01 9.851e+01 1.181e+02 1.437e+02 2.605e+02, threshold=2.361e+02, percent-clipped=3.0
2024-09-17 12:47:19,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0
2024-09-17 12:47:25,512 INFO [train.py:1198] (0/2) Epoch 13, batch 1600, loss[loss=0.2693, ctc_loss=0.1629, cr_loss=0.4184, attn_decoder_loss=0.2718, over 29678.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1582, cr_loss=0.3972, attn_decoder_loss=0.261, over 5764523.17 frames. ], batch size: 85, lr: 8.68e-03, grad_scale: 16.0
2024-09-17 12:47:30,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=223600.0, ans=0.07
2024-09-17 12:47:54,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=223680.0, ans=0.0
2024-09-17 12:48:43,161 INFO [train.py:1198] (0/2) Epoch 13, batch 1650, loss[loss=0.2753, ctc_loss=0.1708, cr_loss=0.3982, attn_decoder_loss=0.2781, over 29716.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1577, cr_loss=0.3962, attn_decoder_loss=0.2605, over 5758598.81 frames. ], batch size: 89, lr: 8.68e-03, grad_scale: 8.0
2024-09-17 12:48:49,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=223800.0, ans=0.2
2024-09-17 12:49:29,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=223920.0, ans=0.07
2024-09-17 12:49:30,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=223920.0, ans=0.025
2024-09-17 12:49:41,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.40 vs. limit=15.0
2024-09-17 12:49:50,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=223960.0, ans=0.125
2024-09-17 12:49:51,351 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.178e+01 9.964e+01 1.088e+02 2.882e+02, threshold=1.993e+02, percent-clipped=2.0
2024-09-17 12:49:57,808 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-56000.pt
2024-09-17 12:50:06,240 INFO [train.py:1198] (0/2) Epoch 13, batch 1700, loss[loss=0.2347, ctc_loss=0.1495, cr_loss=0.3696, attn_decoder_loss=0.2359, over 29610.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1571, cr_loss=0.3961, attn_decoder_loss=0.26, over 5781013.65 frames. ], batch size: 69, lr: 8.68e-03, grad_scale: 8.0
2024-09-17 12:50:22,646 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=15.0
2024-09-17 12:50:30,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=224040.0, ans=0.125
2024-09-17 12:51:00,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.71 vs. limit=22.5
2024-09-17 12:51:13,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=224160.0, ans=0.2
2024-09-17 12:51:21,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=224160.0, ans=0.125
2024-09-17 12:51:24,020 INFO [train.py:1198] (0/2) Epoch 13, batch 1750, loss[loss=0.2366, ctc_loss=0.1416, cr_loss=0.3711, attn_decoder_loss=0.2389, over 29355.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.1565, cr_loss=0.3946, attn_decoder_loss=0.2595, over 5789810.81 frames. ], batch size: 67, lr: 8.67e-03, grad_scale: 8.0
2024-09-17 12:51:46,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=224240.0, ans=0.2
2024-09-17 12:52:08,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=224280.0, ans=0.125
2024-09-17 12:52:19,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=224320.0, ans=0.125
2024-09-17 12:52:33,950 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.908e+01 9.548e+01 1.035e+02 2.424e+02, threshold=1.910e+02, percent-clipped=1.0
2024-09-17 12:52:40,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=224400.0, ans=0.125
2024-09-17 12:52:41,365 INFO [train.py:1198] (0/2) Epoch 13, batch 1800, loss[loss=0.2739, ctc_loss=0.1682, cr_loss=0.4026, attn_decoder_loss=0.2766, over 29685.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1568, cr_loss=0.3951, attn_decoder_loss=0.26, over 5792227.07 frames. ], batch size: 83, lr: 8.67e-03, grad_scale: 8.0
2024-09-17 12:52:49,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=224400.0, ans=0.125
2024-09-17 12:52:53,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=224400.0, ans=0.0
2024-09-17 12:52:56,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=224440.0, ans=0.125
2024-09-17 12:53:22,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=224480.0, ans=0.125
2024-09-17 12:53:35,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=224520.0, ans=0.125
2024-09-17 12:53:38,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=12.0
2024-09-17 12:53:44,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=224560.0, ans=0.2
2024-09-17 12:53:44,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=224560.0, ans=0.125
2024-09-17 12:53:57,464 INFO [train.py:1198] (0/2) Epoch 13, batch 1850, loss[loss=0.2644, ctc_loss=0.1485, cr_loss=0.3851, attn_decoder_loss=0.2687, over 29640.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1567, cr_loss=0.3952, attn_decoder_loss=0.2599, over 5799699.76 frames. ], batch size: 86, lr: 8.66e-03, grad_scale: 8.0
2024-09-17 12:54:05,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=224600.0, ans=0.07
2024-09-17 12:54:08,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=224600.0, ans=0.125
2024-09-17 12:54:24,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=224640.0, ans=0.0
2024-09-17 12:54:45,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=224720.0, ans=0.125
2024-09-17 12:54:48,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=224720.0, ans=0.1
2024-09-17 12:54:57,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=224720.0, ans=0.125
2024-09-17 12:55:07,362 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.966e+01 9.738e+01 1.037e+02 3.444e+02, threshold=1.948e+02, percent-clipped=2.0
2024-09-17 12:55:12,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=224760.0, ans=0.2
2024-09-17 12:55:15,199 INFO [train.py:1198] (0/2) Epoch 13, batch 1900, loss[loss=0.2622, ctc_loss=0.1564, cr_loss=0.4115, attn_decoder_loss=0.2649, over 29732.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1574, cr_loss=0.396, attn_decoder_loss=0.2608, over 5806774.79 frames. ], batch size: 89, lr: 8.66e-03, grad_scale: 8.0
2024-09-17 12:55:21,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=224800.0, ans=0.0
2024-09-17 12:55:26,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0
2024-09-17 12:55:29,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=224840.0, ans=0.025
2024-09-17 12:55:32,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=224840.0, ans=0.0
2024-09-17 12:55:33,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=224840.0, ans=0.2
2024-09-17 12:55:35,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=224840.0, ans=0.2
2024-09-17 12:55:42,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=224840.0, ans=0.125
2024-09-17 12:55:51,458 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.68 vs. limit=15.0
2024-09-17 12:55:53,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=224880.0, ans=10.0
2024-09-17 12:55:53,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=224880.0, ans=0.07
2024-09-17 12:56:10,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.17 vs. limit=15.0
2024-09-17 12:56:29,064 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:56:33,262 INFO [train.py:1198] (0/2) Epoch 13, batch 1950, loss[loss=0.2538, ctc_loss=0.1469, cr_loss=0.3907, attn_decoder_loss=0.257, over 29452.00 frames. ], tot_loss[loss=0.2597, ctc_loss=0.1583, cr_loss=0.3981, attn_decoder_loss=0.2622, over 5820807.59 frames. ], batch size: 78, lr: 8.66e-03, grad_scale: 8.0
2024-09-17 12:56:50,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=225040.0, ans=0.0
2024-09-17 12:57:12,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=225080.0, ans=0.0
2024-09-17 12:57:20,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys.whitening_limit, batch_count=225120.0, ans=6.0
2024-09-17 12:57:30,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=225120.0, ans=0.1
2024-09-17 12:57:40,939 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 9.122e+01 9.575e+01 1.024e+02 4.346e+02, threshold=1.915e+02, percent-clipped=1.0
2024-09-17 12:57:48,512 INFO [train.py:1198] (0/2) Epoch 13, batch 2000, loss[loss=0.226, ctc_loss=0.1322, cr_loss=0.3582, attn_decoder_loss=0.2285, over 29344.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1587, cr_loss=0.3986, attn_decoder_loss=0.2624, over 5798326.15 frames. ], batch size: 67, lr: 8.65e-03, grad_scale: 16.0
2024-09-17 12:57:56,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=225200.0, ans=0.07
2024-09-17 12:57:58,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0
2024-09-17 12:58:15,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.28 vs. limit=10.0
2024-09-17 12:58:54,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=225360.0, ans=0.0
2024-09-17 12:59:06,707 INFO [train.py:1198] (0/2) Epoch 13, batch 2050, loss[loss=0.2327, ctc_loss=0.1339, cr_loss=0.3661, attn_decoder_loss=0.2356, over 29454.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1582, cr_loss=0.3972, attn_decoder_loss=0.2615, over 5788991.15 frames. ], batch size: 70, lr: 8.65e-03, grad_scale: 8.0
2024-09-17 12:59:37,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=225480.0, ans=0.125
2024-09-17 12:59:48,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=225480.0, ans=0.125
2024-09-17 13:00:06,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=225520.0, ans=0.0
2024-09-17 13:00:18,377 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 8.858e+01 9.418e+01 1.005e+02 1.765e+02, threshold=1.884e+02, percent-clipped=0.0
2024-09-17 13:00:18,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=225560.0, ans=0.2
2024-09-17 13:00:24,865 INFO [train.py:1198] (0/2) Epoch 13, batch 2100, loss[loss=0.2625, ctc_loss=0.1581, cr_loss=0.3888, attn_decoder_loss=0.2654, over 29777.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1575, cr_loss=0.3963, attn_decoder_loss=0.2607, over 5799424.64 frames. ], batch size: 81, lr: 8.65e-03, grad_scale: 8.0
2024-09-17 13:00:25,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=225600.0, ans=0.125
2024-09-17 13:00:37,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=225600.0, ans=0.5
2024-09-17 13:01:23,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0
2024-09-17 13:01:29,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=12.0
2024-09-17 13:01:36,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0
2024-09-17 13:01:40,020 INFO [train.py:1198] (0/2) Epoch 13, batch 2150, loss[loss=0.2509, ctc_loss=0.1501, cr_loss=0.4056, attn_decoder_loss=0.2531, over 29454.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1563, cr_loss=0.3949, attn_decoder_loss=0.2599, over 5815450.86 frames. ], batch size: 78, lr: 8.64e-03, grad_scale: 8.0
2024-09-17 13:01:41,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=225800.0, ans=0.125
2024-09-17 13:02:01,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.94 vs. limit=22.5
2024-09-17 13:02:35,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=225920.0, ans=0.0
2024-09-17 13:02:51,961 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.903e+01 9.039e+01 9.591e+01 1.017e+02 1.428e+02, threshold=1.918e+02, percent-clipped=0.0
2024-09-17 13:02:52,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=225960.0, ans=0.0
2024-09-17 13:02:58,182 INFO [train.py:1198] (0/2) Epoch 13, batch 2200, loss[loss=0.2604, ctc_loss=0.1549, cr_loss=0.3997, attn_decoder_loss=0.2632, over 29642.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1569, cr_loss=0.3963, attn_decoder_loss=0.2601, over 5813621.14 frames. ], batch size: 86, lr: 8.64e-03, grad_scale: 8.0
2024-09-17 13:03:02,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=226000.0, ans=0.125
2024-09-17 13:03:04,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=226000.0, ans=0.125
2024-09-17 13:03:27,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=226080.0, ans=0.125
2024-09-17 13:04:09,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=226160.0, ans=0.125
2024-09-17 13:04:16,304 INFO [train.py:1198] (0/2) Epoch 13, batch 2250, loss[loss=0.2698, ctc_loss=0.1661, cr_loss=0.4401, attn_decoder_loss=0.2715, over 29723.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1567, cr_loss=0.3964, attn_decoder_loss=0.2602, over 5810761.88 frames. ], batch size: 82, lr: 8.63e-03, grad_scale: 4.0
2024-09-17 13:04:39,454 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:05:00,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=226320.0, ans=0.0
2024-09-17 13:05:00,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=226320.0, ans=0.1
2024-09-17 13:05:11,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=226320.0, ans=0.0
2024-09-17 13:05:27,519 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 9.102e+01 9.665e+01 1.015e+02 1.637e+02, threshold=1.933e+02, percent-clipped=0.0
2024-09-17 13:05:32,600 INFO [train.py:1198] (0/2) Epoch 13, batch 2300, loss[loss=0.2235, ctc_loss=0.125, cr_loss=0.3429, attn_decoder_loss=0.2268, over 29304.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1562, cr_loss=0.395, attn_decoder_loss=0.2592, over 5797021.96 frames. ], batch size: 71, lr: 8.63e-03, grad_scale: 8.0
2024-09-17 13:05:55,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=226440.0, ans=0.0
2024-09-17 13:06:09,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.68 vs. limit=15.0
2024-09-17 13:06:21,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=226520.0, ans=0.125
2024-09-17 13:06:25,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=226520.0, ans=0.2
2024-09-17 13:06:27,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0
2024-09-17 13:06:31,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=226520.0, ans=0.2
2024-09-17 13:06:31,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0
2024-09-17 13:06:50,457 INFO [train.py:1198] (0/2) Epoch 13, batch 2350, loss[loss=0.2711, ctc_loss=0.1663, cr_loss=0.4301, attn_decoder_loss=0.2732, over 29697.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1566, cr_loss=0.3961, attn_decoder_loss=0.2595, over 5802689.70 frames. ], batch size: 83, lr: 8.63e-03, grad_scale: 8.0
2024-09-17 13:06:56,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=226600.0, ans=0.0
2024-09-17 13:07:21,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.32 vs. limit=15.0
2024-09-17 13:07:22,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=226680.0, ans=0.1
2024-09-17 13:07:33,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=226680.0, ans=0.0
2024-09-17 13:07:46,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.19 vs. limit=12.0
2024-09-17 13:08:05,463 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.113e+01 9.470e+01 1.028e+02 1.156e+02 2.779e+02, threshold=2.056e+02, percent-clipped=1.0
2024-09-17 13:08:05,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=226760.0, ans=0.125
2024-09-17 13:08:08,454 INFO [train.py:1198] (0/2) Epoch 13, batch 2400, loss[loss=0.2446, ctc_loss=0.1446, cr_loss=0.3698, attn_decoder_loss=0.2475, over 29518.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.157, cr_loss=0.3971, attn_decoder_loss=0.26, over 5806733.11 frames. ], batch size: 76, lr: 8.62e-03, grad_scale: 8.0
2024-09-17 13:08:26,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=226840.0, ans=0.1
2024-09-17 13:08:28,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=226840.0, ans=0.125
2024-09-17 13:08:51,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.83 vs. limit=15.0
2024-09-17 13:08:51,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0
2024-09-17 13:09:15,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.83 vs. limit=22.5
2024-09-17 13:09:22,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=227000.0, ans=0.1
2024-09-17 13:09:24,019 INFO [train.py:1198] (0/2) Epoch 13, batch 2450, loss[loss=0.2622, ctc_loss=0.1519, cr_loss=0.3932, attn_decoder_loss=0.2657, over 29718.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1578, cr_loss=0.398, attn_decoder_loss=0.2612, over 5783114.47 frames. ], batch size: 82, lr: 8.62e-03, grad_scale: 8.0
2024-09-17 13:09:25,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=227000.0, ans=0.1
2024-09-17 13:09:30,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.04 vs. limit=12.0
2024-09-17 13:09:31,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=227000.0, ans=0.125
2024-09-17 13:09:57,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=227080.0, ans=0.07
2024-09-17 13:10:36,594 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.15 vs. limit=10.0
2024-09-17 13:10:38,634 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 9.238e+01 9.776e+01 1.100e+02 2.445e+02, threshold=1.955e+02, percent-clipped=1.0
2024-09-17 13:10:40,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=227200.0, ans=0.125
2024-09-17 13:10:41,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=22.5
2024-09-17 13:10:42,079 INFO [train.py:1198] (0/2) Epoch 13, batch 2500, loss[loss=0.2615, ctc_loss=0.1458, cr_loss=0.3776, attn_decoder_loss=0.266, over 29622.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.158, cr_loss=0.3983, attn_decoder_loss=0.2612, over 5793535.65 frames. ], batch size: 86, lr: 8.62e-03, grad_scale: 8.0
2024-09-17 13:10:45,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=227200.0, ans=0.0
2024-09-17 13:10:47,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=227200.0, ans=0.0
2024-09-17 13:10:47,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227200.0, ans=0.1
2024-09-17 13:10:53,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=227200.0, ans=0.025
2024-09-17 13:11:08,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=227240.0, ans=0.09899494936611666
2024-09-17 13:11:11,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=227280.0, ans=0.95
2024-09-17 13:11:14,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=227280.0, ans=0.0
2024-09-17 13:11:31,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=227320.0, ans=0.125
2024-09-17 13:11:57,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=227360.0, ans=0.2
2024-09-17 13:11:59,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=227400.0, ans=0.025
2024-09-17 13:12:00,449 INFO [train.py:1198] (0/2) Epoch 13, batch 2550, loss[loss=0.2259, ctc_loss=0.1314, cr_loss=0.3645, attn_decoder_loss=0.2283, over 29361.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1579, cr_loss=0.3985, attn_decoder_loss=0.2613, over 5797032.29 frames. ], batch size: 67, lr: 8.61e-03, grad_scale: 8.0
2024-09-17 13:12:31,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.13 vs. limit=15.0
2024-09-17 13:12:35,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=227480.0, ans=0.07
2024-09-17 13:12:36,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=227480.0, ans=0.125
2024-09-17 13:12:47,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=227520.0, ans=0.1
2024-09-17 13:12:48,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227520.0, ans=0.1
2024-09-17 13:12:52,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=227520.0, ans=0.125
2024-09-17 13:12:56,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=227520.0, ans=0.0
2024-09-17 13:13:07,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=227560.0, ans=0.2
2024-09-17 13:13:13,202 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 9.236e+01 9.928e+01 1.060e+02 5.337e+02, threshold=1.986e+02, percent-clipped=3.0
2024-09-17 13:13:16,294 INFO [train.py:1198] (0/2) Epoch 13, batch 2600, loss[loss=0.2539, ctc_loss=0.15, cr_loss=0.4147, attn_decoder_loss=0.2563, over 29431.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.158, cr_loss=0.3984, attn_decoder_loss=0.2614, over 5793828.63 frames. ], batch size: 78, lr: 8.61e-03, grad_scale: 8.0
2024-09-17 13:13:33,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=227640.0, ans=0.125
2024-09-17 13:13:37,885 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:13:37,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=227640.0, ans=0.025
2024-09-17 13:13:51,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=227680.0, ans=0.0
2024-09-17 13:13:54,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=227680.0, ans=0.0
2024-09-17 13:13:58,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.15 vs. limit=15.0
2024-09-17 13:13:58,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.22 vs. limit=12.0
2024-09-17 13:14:26,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=227760.0, ans=0.125
2024-09-17 13:14:32,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=227800.0, ans=0.125
2024-09-17 13:14:34,059 INFO [train.py:1198] (0/2) Epoch 13, batch 2650, loss[loss=0.2856, ctc_loss=0.1826, cr_loss=0.4439, attn_decoder_loss=0.2872, over 29270.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1577, cr_loss=0.3984, attn_decoder_loss=0.2614, over 5800571.80 frames.
], batch size: 100, lr: 8.60e-03, grad_scale: 8.0 2024-09-17 13:14:37,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=227800.0, ans=0.1 2024-09-17 13:14:38,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=227800.0, ans=0.0 2024-09-17 13:14:51,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2024-09-17 13:14:59,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=227840.0, ans=0.07 2024-09-17 13:15:27,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=227920.0, ans=0.125 2024-09-17 13:15:39,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=227960.0, ans=0.0 2024-09-17 13:15:41,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.07 vs. limit=12.0 2024-09-17 13:15:48,540 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 9.244e+01 9.728e+01 1.060e+02 3.050e+02, threshold=1.946e+02, percent-clipped=2.0 2024-09-17 13:15:52,081 INFO [train.py:1198] (0/2) Epoch 13, batch 2700, loss[loss=0.2604, ctc_loss=0.1543, cr_loss=0.4058, attn_decoder_loss=0.2632, over 29530.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1578, cr_loss=0.3982, attn_decoder_loss=0.2615, over 5797366.61 frames. ], batch size: 87, lr: 8.60e-03, grad_scale: 8.0 2024-09-17 13:16:11,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.29 vs. 
limit=15.0 2024-09-17 13:16:28,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=228080.0, ans=0.125 2024-09-17 13:16:47,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=228120.0, ans=0.125 2024-09-17 13:17:07,973 INFO [train.py:1198] (0/2) Epoch 13, batch 2750, loss[loss=0.2563, ctc_loss=0.164, cr_loss=0.4011, attn_decoder_loss=0.2576, over 29509.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1573, cr_loss=0.3968, attn_decoder_loss=0.2606, over 5795452.74 frames. ], batch size: 75, lr: 8.60e-03, grad_scale: 8.0 2024-09-17 13:17:40,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=228280.0, ans=0.125 2024-09-17 13:17:41,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=228280.0, ans=0.025 2024-09-17 13:18:20,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=228360.0, ans=0.125 2024-09-17 13:18:23,264 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.445e+01 9.348e+01 1.004e+02 1.120e+02 2.904e+02, threshold=2.008e+02, percent-clipped=2.0 2024-09-17 13:18:26,304 INFO [train.py:1198] (0/2) Epoch 13, batch 2800, loss[loss=0.2848, ctc_loss=0.2144, cr_loss=0.4482, attn_decoder_loss=0.2827, over 19866.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.1579, cr_loss=0.3971, attn_decoder_loss=0.2611, over 5776913.60 frames. 
], batch size: 209, lr: 8.59e-03, grad_scale: 16.0 2024-09-17 13:18:41,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=228440.0, ans=0.125 2024-09-17 13:19:00,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=228480.0, ans=0.0 2024-09-17 13:19:29,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=228560.0, ans=0.05 2024-09-17 13:19:30,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=228560.0, ans=0.09899494936611666 2024-09-17 13:19:37,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5 2024-09-17 13:19:44,134 INFO [train.py:1198] (0/2) Epoch 13, batch 2850, loss[loss=0.2412, ctc_loss=0.1384, cr_loss=0.3553, attn_decoder_loss=0.2447, over 29513.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1585, cr_loss=0.3983, attn_decoder_loss=0.2616, over 5763029.28 frames. ], batch size: 77, lr: 8.59e-03, grad_scale: 8.0 2024-09-17 13:19:51,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=228600.0, ans=0.125 2024-09-17 13:20:07,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=228640.0, ans=0.2 2024-09-17 13:20:15,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.15 vs. 
limit=22.5 2024-09-17 13:20:18,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=228680.0, ans=0.0 2024-09-17 13:20:20,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228680.0, ans=0.1 2024-09-17 13:20:30,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=228720.0, ans=0.125 2024-09-17 13:20:32,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=228720.0, ans=0.0 2024-09-17 13:20:51,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=228760.0, ans=0.125 2024-09-17 13:21:00,202 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.259e+01 1.060e+02 1.394e+02 3.143e+02, threshold=2.120e+02, percent-clipped=6.0 2024-09-17 13:21:00,235 INFO [train.py:1198] (0/2) Epoch 13, batch 2900, loss[loss=0.2516, ctc_loss=0.1559, cr_loss=0.3903, attn_decoder_loss=0.2535, over 29420.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.1586, cr_loss=0.3997, attn_decoder_loss=0.2623, over 5788822.76 frames. 
], batch size: 79, lr: 8.59e-03, grad_scale: 8.0 2024-09-17 13:21:04,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=228800.0, ans=0.025 2024-09-17 13:21:27,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=228840.0, ans=0.125 2024-09-17 13:21:53,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=228920.0, ans=0.125 2024-09-17 13:21:55,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.20 vs. limit=15.0 2024-09-17 13:22:18,343 INFO [train.py:1198] (0/2) Epoch 13, batch 2950, loss[loss=0.2458, ctc_loss=0.1547, cr_loss=0.4209, attn_decoder_loss=0.2466, over 29507.00 frames. ], tot_loss[loss=0.2586, ctc_loss=0.1577, cr_loss=0.3982, attn_decoder_loss=0.2609, over 5783263.48 frames. ], batch size: 75, lr: 8.58e-03, grad_scale: 4.0 2024-09-17 13:22:20,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=229000.0, ans=0.0 2024-09-17 13:23:15,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=229120.0, ans=0.0 2024-09-17 13:23:36,399 INFO [train.py:1198] (0/2) Epoch 13, batch 3000, loss[loss=0.2526, ctc_loss=0.1424, cr_loss=0.3677, attn_decoder_loss=0.2567, over 29754.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1574, cr_loss=0.398, attn_decoder_loss=0.2607, over 5784730.47 frames. 
], batch size: 81, lr: 8.58e-03, grad_scale: 8.0 2024-09-17 13:23:36,400 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 13:23:49,537 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.2314, 5.9274, 5.8425, 5.5486], device='cuda:0') 2024-09-17 13:23:54,824 INFO [train.py:1230] (0/2) Epoch 13, validation: loss=0.212, ctc_loss=0.04384, cr_loss=4.97e-15, attn_decoder_loss=0.2307, over 944034.00 frames. 2024-09-17 13:23:54,824 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 13:23:56,318 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.967e+01 9.683e+01 1.075e+02 2.883e+02, threshold=1.937e+02, percent-clipped=1.0 2024-09-17 13:23:57,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.05 vs. limit=15.0 2024-09-17 13:24:02,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=229200.0, ans=0.125 2024-09-17 13:24:07,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=229200.0, ans=0.0 2024-09-17 13:24:11,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229240.0, ans=0.1 2024-09-17 13:24:16,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=229240.0, ans=0.125 2024-09-17 13:24:25,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=229280.0, ans=0.2 2024-09-17 13:24:29,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.00 vs. 
limit=10.0 2024-09-17 13:24:31,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=229280.0, ans=0.0 2024-09-17 13:24:33,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=229280.0, ans=0.0 2024-09-17 13:24:33,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=229280.0, ans=0.07 2024-09-17 13:24:34,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=229280.0, ans=0.125 2024-09-17 13:24:39,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229320.0, ans=0.1 2024-09-17 13:25:04,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=229360.0, ans=0.125 2024-09-17 13:25:10,620 INFO [train.py:1198] (0/2) Epoch 13, batch 3050, loss[loss=0.2498, ctc_loss=0.1511, cr_loss=0.3774, attn_decoder_loss=0.2524, over 29531.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1582, cr_loss=0.399, attn_decoder_loss=0.2618, over 5777319.31 frames. ], batch size: 76, lr: 8.57e-03, grad_scale: 8.0 2024-09-17 13:25:18,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=229400.0, ans=0.0 2024-09-17 13:25:35,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=229440.0, ans=0.125 2024-09-17 13:25:40,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.69 vs. 
limit=10.0 2024-09-17 13:26:08,729 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.60 vs. limit=15.0 2024-09-17 13:26:29,228 INFO [train.py:1198] (0/2) Epoch 13, batch 3100, loss[loss=0.2794, ctc_loss=0.1827, cr_loss=0.4287, attn_decoder_loss=0.2806, over 29350.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1583, cr_loss=0.3988, attn_decoder_loss=0.2614, over 5777495.62 frames. ], batch size: 100, lr: 8.57e-03, grad_scale: 8.0 2024-09-17 13:26:32,980 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.799e+01 9.409e+01 1.035e+02 1.210e+02 2.103e+02, threshold=2.070e+02, percent-clipped=1.0 2024-09-17 13:27:20,281 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:27:29,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.28 vs. limit=15.0 2024-09-17 13:27:47,054 INFO [train.py:1198] (0/2) Epoch 13, batch 3150, loss[loss=0.2722, ctc_loss=0.1576, cr_loss=0.3875, attn_decoder_loss=0.2763, over 28812.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.158, cr_loss=0.3981, attn_decoder_loss=0.2613, over 5784431.40 frames. 
], batch size: 104, lr: 8.57e-03, grad_scale: 8.0 2024-09-17 13:27:47,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=229800.0, ans=0.025 2024-09-17 13:27:57,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=229800.0, ans=0.125 2024-09-17 13:28:03,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=229840.0, ans=0.125 2024-09-17 13:28:03,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=229840.0, ans=0.0 2024-09-17 13:28:27,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=22.5 2024-09-17 13:28:45,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=229960.0, ans=0.125 2024-09-17 13:28:48,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=229960.0, ans=0.125 2024-09-17 13:28:50,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229960.0, ans=0.1 2024-09-17 13:29:02,205 INFO [train.py:1198] (0/2) Epoch 13, batch 3200, loss[loss=0.2566, ctc_loss=0.1686, cr_loss=0.4068, attn_decoder_loss=0.2573, over 29420.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1572, cr_loss=0.3976, attn_decoder_loss=0.2605, over 5793648.43 frames. 
], batch size: 79, lr: 8.56e-03, grad_scale: 16.0 2024-09-17 13:29:05,036 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 9.025e+01 9.709e+01 1.089e+02 2.819e+02, threshold=1.942e+02, percent-clipped=2.0 2024-09-17 13:29:20,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=230040.0, ans=0.1 2024-09-17 13:29:56,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.84 vs. limit=15.0 2024-09-17 13:30:17,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=230160.0, ans=0.0 2024-09-17 13:30:20,633 INFO [train.py:1198] (0/2) Epoch 13, batch 3250, loss[loss=0.2675, ctc_loss=0.1608, cr_loss=0.427, attn_decoder_loss=0.2699, over 29704.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1576, cr_loss=0.3982, attn_decoder_loss=0.2612, over 5799797.33 frames. ], batch size: 84, lr: 8.56e-03, grad_scale: 8.0 2024-09-17 13:30:20,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=230200.0, ans=0.125 2024-09-17 13:30:39,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=230240.0, ans=0.125 2024-09-17 13:30:51,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=230280.0, ans=0.0 2024-09-17 13:31:07,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.59 vs. 
limit=15.0 2024-09-17 13:31:08,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=230320.0, ans=0.07 2024-09-17 13:31:14,935 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.85 vs. limit=22.5 2024-09-17 13:31:38,613 INFO [train.py:1198] (0/2) Epoch 13, batch 3300, loss[loss=0.2611, ctc_loss=0.1565, cr_loss=0.3723, attn_decoder_loss=0.2644, over 28169.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1566, cr_loss=0.396, attn_decoder_loss=0.2598, over 5796810.28 frames. ], batch size: 111, lr: 8.56e-03, grad_scale: 8.0 2024-09-17 13:31:39,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=230400.0, ans=0.1 2024-09-17 13:31:41,815 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.916e+01 9.519e+01 1.032e+02 2.087e+02, threshold=1.904e+02, percent-clipped=1.0 2024-09-17 13:32:10,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=230480.0, ans=0.09899494936611666 2024-09-17 13:32:27,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=230520.0, ans=0.5 2024-09-17 13:32:29,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=230520.0, ans=0.2 2024-09-17 13:32:39,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=230560.0, ans=0.125 2024-09-17 13:32:42,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=230560.0, ans=0.2 2024-09-17 13:32:53,828 INFO [train.py:1198] (0/2) Epoch 13, batch 3350, loss[loss=0.2638, ctc_loss=0.1632, cr_loss=0.4085, 
attn_decoder_loss=0.2659, over 28862.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1574, cr_loss=0.3969, attn_decoder_loss=0.2607, over 5773126.06 frames. ], batch size: 104, lr: 8.55e-03, grad_scale: 8.0 2024-09-17 13:33:03,254 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 13:33:07,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=230640.0, ans=0.125 2024-09-17 13:33:12,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=230640.0, ans=0.0 2024-09-17 13:33:37,570 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. limit=6.0 2024-09-17 13:33:57,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=230760.0, ans=0.125 2024-09-17 13:34:14,315 INFO [train.py:1198] (0/2) Epoch 13, batch 3400, loss[loss=0.2264, ctc_loss=0.1304, cr_loss=0.3573, attn_decoder_loss=0.2292, over 29318.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1577, cr_loss=0.3976, attn_decoder_loss=0.2607, over 5765588.45 frames. ], batch size: 67, lr: 8.55e-03, grad_scale: 8.0 2024-09-17 13:34:14,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=230800.0, ans=0.0 2024-09-17 13:34:17,283 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.853e+01 8.984e+01 9.781e+01 1.096e+02 3.563e+02, threshold=1.956e+02, percent-clipped=2.0 2024-09-17 13:34:19,984 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. 
limit=15.0 2024-09-17 13:34:20,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=230800.0, ans=0.125 2024-09-17 13:34:23,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=230800.0, ans=0.125 2024-09-17 13:34:26,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=15.0 2024-09-17 13:34:32,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=230840.0, ans=0.125 2024-09-17 13:34:46,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=230880.0, ans=0.1 2024-09-17 13:34:51,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=230880.0, ans=0.0 2024-09-17 13:34:55,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=230880.0, ans=0.2 2024-09-17 13:35:22,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=230960.0, ans=0.0 2024-09-17 13:35:23,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=230960.0, ans=0.125 2024-09-17 13:35:26,736 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.20 vs. limit=22.5 2024-09-17 13:35:30,141 INFO [train.py:1198] (0/2) Epoch 13, batch 3450, loss[loss=0.2691, ctc_loss=0.1643, cr_loss=0.4234, attn_decoder_loss=0.2714, over 28547.00 frames. ], tot_loss[loss=0.2584, ctc_loss=0.1575, cr_loss=0.3978, attn_decoder_loss=0.2608, over 5774511.30 frames. 
], batch size: 112, lr: 8.55e-03, grad_scale: 8.0 2024-09-17 13:35:32,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2024-09-17 13:35:36,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231000.0, ans=0.1 2024-09-17 13:35:41,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=231000.0, ans=0.0 2024-09-17 13:35:43,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231040.0, ans=0.1 2024-09-17 13:35:44,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=231040.0, ans=0.125 2024-09-17 13:36:03,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=231080.0, ans=0.2 2024-09-17 13:36:03,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=231080.0, ans=0.125 2024-09-17 13:36:11,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=231080.0, ans=0.125 2024-09-17 13:36:14,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=231120.0, ans=0.1 2024-09-17 13:36:32,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231160.0, ans=0.1 2024-09-17 13:36:45,950 INFO [train.py:1198] (0/2) Epoch 13, batch 3500, loss[loss=0.233, ctc_loss=0.1339, cr_loss=0.3554, attn_decoder_loss=0.2361, over 29343.00 frames. 
], tot_loss[loss=0.2575, ctc_loss=0.1571, cr_loss=0.397, attn_decoder_loss=0.2598, over 5777909.79 frames. ], batch size: 71, lr: 8.54e-03, grad_scale: 8.0 2024-09-17 13:36:49,015 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.194e+01 9.091e+01 9.756e+01 1.067e+02 1.863e+02, threshold=1.951e+02, percent-clipped=0.0 2024-09-17 13:36:50,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=231200.0, ans=0.125 2024-09-17 13:36:58,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=231200.0, ans=0.0 2024-09-17 13:37:05,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231240.0, ans=0.125 2024-09-17 13:37:22,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=231280.0, ans=0.0 2024-09-17 13:37:32,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=231320.0, ans=0.05 2024-09-17 13:37:37,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-09-17 13:37:53,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=231360.0, ans=0.0 2024-09-17 13:38:00,821 INFO [train.py:1198] (0/2) Epoch 13, batch 3550, loss[loss=0.2615, ctc_loss=0.1516, cr_loss=0.3886, attn_decoder_loss=0.2651, over 29680.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1567, cr_loss=0.3966, attn_decoder_loss=0.2598, over 5783075.48 frames. 
], batch size: 89, lr: 8.54e-03, grad_scale: 8.0
2024-09-17 13:38:14,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=231440.0, ans=0.125
2024-09-17 13:38:26,093 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.57 vs. limit=15.0
2024-09-17 13:38:48,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=231520.0, ans=0.125
2024-09-17 13:39:01,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=231520.0, ans=0.2
2024-09-17 13:39:18,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=231600.0, ans=0.125
2024-09-17 13:39:19,457 INFO [train.py:1198] (0/2) Epoch 13, batch 3600, loss[loss=0.2558, ctc_loss=0.1538, cr_loss=0.4067, attn_decoder_loss=0.2581, over 29498.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1566, cr_loss=0.3969, attn_decoder_loss=0.2601, over 5792395.21 frames. ], batch size: 77, lr: 8.53e-03, grad_scale: 16.0
2024-09-17 13:39:23,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=22.5
2024-09-17 13:39:24,014 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.860e+01 8.866e+01 9.672e+01 1.060e+02 2.375e+02, threshold=1.934e+02, percent-clipped=1.0
2024-09-17 13:39:41,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=231640.0, ans=0.04949747468305833
2024-09-17 13:40:21,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=231760.0, ans=0.0
2024-09-17 13:40:21,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=231760.0, ans=0.125
2024-09-17 13:40:24,617 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.84 vs. limit=22.5
2024-09-17 13:40:34,315 INFO [train.py:1198] (0/2) Epoch 13, batch 3650, loss[loss=0.2804, ctc_loss=0.1824, cr_loss=0.4416, attn_decoder_loss=0.2815, over 29485.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1562, cr_loss=0.3962, attn_decoder_loss=0.2599, over 5794459.49 frames. ], batch size: 90, lr: 8.53e-03, grad_scale: 8.0
2024-09-17 13:40:44,052 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=15.0
2024-09-17 13:40:47,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=231840.0, ans=0.125
2024-09-17 13:41:10,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=231880.0, ans=0.05
2024-09-17 13:41:17,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=231920.0, ans=0.125
2024-09-17 13:41:30,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=231920.0, ans=0.0
2024-09-17 13:41:35,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.53 vs. limit=15.0
2024-09-17 13:41:36,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=231960.0, ans=0.0
2024-09-17 13:41:47,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=232000.0, ans=0.0
2024-09-17 13:41:48,931 INFO [train.py:1198] (0/2) Epoch 13, batch 3700, loss[loss=0.2705, ctc_loss=0.162, cr_loss=0.42, attn_decoder_loss=0.2732, over 29725.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1565, cr_loss=0.3968, attn_decoder_loss=0.2602, over 5805104.57 frames. ], batch size: 84, lr: 8.53e-03, grad_scale: 8.0
2024-09-17 13:41:53,428 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 9.152e+01 9.843e+01 1.065e+02 3.437e+02, threshold=1.969e+02, percent-clipped=3.0
2024-09-17 13:42:29,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=232080.0, ans=0.07
2024-09-17 13:42:38,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=232120.0, ans=0.025
2024-09-17 13:42:48,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=232160.0, ans=0.125
2024-09-17 13:42:48,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=22.5
2024-09-17 13:42:51,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=232160.0, ans=0.0
2024-09-17 13:42:59,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.77 vs. limit=22.5
2024-09-17 13:43:02,995 INFO [train.py:1198] (0/2) Epoch 13, batch 3750, loss[loss=0.227, ctc_loss=0.1294, cr_loss=0.3443, attn_decoder_loss=0.2302, over 29358.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1563, cr_loss=0.3967, attn_decoder_loss=0.2598, over 5807994.64 frames. ], batch size: 67, lr: 8.52e-03, grad_scale: 4.0
2024-09-17 13:43:05,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=232200.0, ans=15.0
2024-09-17 13:43:15,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=232200.0, ans=0.125
2024-09-17 13:43:21,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=232240.0, ans=0.125
2024-09-17 13:43:46,911 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.83 vs. limit=15.0
2024-09-17 13:43:47,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=232320.0, ans=0.1
2024-09-17 13:43:49,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=232320.0, ans=0.1
2024-09-17 13:44:17,438 INFO [train.py:1198] (0/2) Epoch 13, batch 3800, loss[loss=0.2672, ctc_loss=0.1556, cr_loss=0.425, attn_decoder_loss=0.2702, over 29623.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.1562, cr_loss=0.3966, attn_decoder_loss=0.2595, over 5798528.12 frames. ], batch size: 86, lr: 8.52e-03, grad_scale: 8.0
2024-09-17 13:44:23,394 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.154e+01 9.685e+01 1.039e+02 2.233e+02, threshold=1.937e+02, percent-clipped=1.0
2024-09-17 13:44:25,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=232400.0, ans=0.125
2024-09-17 13:44:27,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=232400.0, ans=6.0
2024-09-17 13:44:31,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=15.0
2024-09-17 13:44:35,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=232440.0, ans=0.0
2024-09-17 13:45:01,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=232480.0, ans=0.025
2024-09-17 13:45:05,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.48 vs. limit=15.0
2024-09-17 13:45:06,765 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=24.53 vs. limit=15.0
2024-09-17 13:45:22,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=232560.0, ans=0.04949747468305833
2024-09-17 13:45:23,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=232560.0, ans=0.0
2024-09-17 13:45:35,422 INFO [train.py:1198] (0/2) Epoch 13, batch 3850, loss[loss=0.2741, ctc_loss=0.1726, cr_loss=0.4097, attn_decoder_loss=0.2762, over 29261.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.156, cr_loss=0.3966, attn_decoder_loss=0.2594, over 5810100.79 frames. ], batch size: 100, lr: 8.52e-03, grad_scale: 4.0
2024-09-17 13:45:37,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=232600.0, ans=0.2
2024-09-17 13:45:50,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=232640.0, ans=0.0
2024-09-17 13:45:52,738 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0
2024-09-17 13:46:02,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=232640.0, ans=0.125
2024-09-17 13:46:09,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=232680.0, ans=0.0
2024-09-17 13:46:12,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.90 vs. limit=10.0
2024-09-17 13:46:17,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=232680.0, ans=0.0
2024-09-17 13:46:19,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=232720.0, ans=0.125
2024-09-17 13:46:26,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=232720.0, ans=0.0
2024-09-17 13:46:50,403 INFO [train.py:1198] (0/2) Epoch 13, batch 3900, loss[loss=0.2607, ctc_loss=0.1449, cr_loss=0.3745, attn_decoder_loss=0.2652, over 29640.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1563, cr_loss=0.3972, attn_decoder_loss=0.2601, over 5815219.94 frames. ], batch size: 86, lr: 8.51e-03, grad_scale: 8.0
2024-09-17 13:46:52,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=232800.0, ans=0.0
2024-09-17 13:46:56,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=232800.0, ans=0.2
2024-09-17 13:46:57,792 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 8.946e+01 9.477e+01 1.034e+02 1.292e+02, threshold=1.895e+02, percent-clipped=0.0
2024-09-17 13:47:44,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=232920.0, ans=0.0
2024-09-17 13:47:55,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=232960.0, ans=0.125
2024-09-17 13:48:04,562 INFO [train.py:1198] (0/2) Epoch 13, batch 3950, loss[loss=0.2674, ctc_loss=0.164, cr_loss=0.4052, attn_decoder_loss=0.2699, over 29487.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1558, cr_loss=0.3971, attn_decoder_loss=0.26, over 5834877.80 frames. ], batch size: 97, lr: 8.51e-03, grad_scale: 8.0
2024-09-17 13:48:19,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=233040.0, ans=0.0
2024-09-17 13:48:32,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=233080.0, ans=0.125
2024-09-17 13:48:34,972 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0
2024-09-17 13:48:35,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=233080.0, ans=0.0
2024-09-17 13:48:43,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=233080.0, ans=0.2
2024-09-17 13:48:46,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=233080.0, ans=0.025
2024-09-17 13:49:05,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=233160.0, ans=0.125
2024-09-17 13:49:15,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=233160.0, ans=0.125
2024-09-17 13:49:18,341 INFO [train.py:1198] (0/2) Epoch 13, batch 4000, loss[loss=0.2378, ctc_loss=0.1377, cr_loss=0.3628, attn_decoder_loss=0.2408, over 29512.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1562, cr_loss=0.3971, attn_decoder_loss=0.2599, over 5811630.90 frames. ], batch size: 74, lr: 8.51e-03, grad_scale: 16.0
2024-09-17 13:49:19,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=233200.0, ans=0.125
2024-09-17 13:49:22,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=233200.0, ans=0.0
2024-09-17 13:49:24,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233200.0, ans=0.1
2024-09-17 13:49:25,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=233200.0, ans=0.0
2024-09-17 13:49:27,095 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 9.222e+01 9.816e+01 1.053e+02 2.750e+02, threshold=1.963e+02, percent-clipped=1.0
2024-09-17 13:49:39,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=233240.0, ans=0.0
2024-09-17 13:49:45,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=233240.0, ans=0.0
2024-09-17 13:49:49,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=233280.0, ans=0.125
2024-09-17 13:49:52,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=233280.0, ans=0.125
2024-09-17 13:49:58,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=233280.0, ans=0.125
2024-09-17 13:50:00,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=233280.0, ans=0.2
2024-09-17 13:50:20,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.87 vs. limit=22.5
2024-09-17 13:50:23,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=233360.0, ans=0.0
2024-09-17 13:50:34,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=233400.0, ans=0.125
2024-09-17 13:50:35,265 INFO [train.py:1198] (0/2) Epoch 13, batch 4050, loss[loss=0.2821, ctc_loss=0.1975, cr_loss=0.4011, attn_decoder_loss=0.2826, over 20300.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1562, cr_loss=0.3966, attn_decoder_loss=0.2596, over 5796041.71 frames. ], batch size: 209, lr: 8.50e-03, grad_scale: 8.0
2024-09-17 13:50:48,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=233440.0, ans=0.125
2024-09-17 13:51:16,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=233480.0, ans=0.125
2024-09-17 13:51:27,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=233520.0, ans=0.125
2024-09-17 13:51:49,528 INFO [train.py:1198] (0/2) Epoch 13, batch 4100, loss[loss=0.2679, ctc_loss=0.1593, cr_loss=0.404, attn_decoder_loss=0.2709, over 29516.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1561, cr_loss=0.3961, attn_decoder_loss=0.2598, over 5791704.27 frames. ], batch size: 90, lr: 8.50e-03, grad_scale: 8.0
2024-09-17 13:51:59,586 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 9.234e+01 9.794e+01 1.124e+02 2.298e+02, threshold=1.959e+02, percent-clipped=3.0
2024-09-17 13:52:07,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=233640.0, ans=0.0
2024-09-17 13:52:11,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=233640.0, ans=0.025
2024-09-17 13:52:12,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.70 vs. limit=15.0
2024-09-17 13:52:14,570 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:52:27,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=233680.0, ans=0.125
2024-09-17 13:52:49,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=233760.0, ans=0.2
2024-09-17 13:53:00,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=233760.0, ans=0.125
2024-09-17 13:53:02,944 INFO [train.py:1198] (0/2) Epoch 13, batch 4150, loss[loss=0.2471, ctc_loss=0.1518, cr_loss=0.3869, attn_decoder_loss=0.249, over 29487.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1562, cr_loss=0.3955, attn_decoder_loss=0.2596, over 5798081.59 frames. ], batch size: 77, lr: 8.49e-03, grad_scale: 8.0
2024-09-17 13:53:19,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=233840.0, ans=0.125
2024-09-17 13:53:20,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.88 vs. limit=15.0
2024-09-17 13:53:22,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=233840.0, ans=0.07
2024-09-17 13:53:51,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0
2024-09-17 13:54:18,774 INFO [train.py:1198] (0/2) Epoch 13, batch 4200, loss[loss=0.2771, ctc_loss=0.1712, cr_loss=0.4374, attn_decoder_loss=0.2791, over 29506.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1561, cr_loss=0.396, attn_decoder_loss=0.2597, over 5799614.80 frames. ], batch size: 90, lr: 8.49e-03, grad_scale: 8.0
2024-09-17 13:54:30,795 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.689e+01 8.618e+01 9.139e+01 9.691e+01 3.040e+02, threshold=1.828e+02, percent-clipped=1.0
2024-09-17 13:55:20,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=234160.0, ans=12.0
2024-09-17 13:55:28,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=234160.0, ans=0.025
2024-09-17 13:55:32,366 INFO [train.py:1198] (0/2) Epoch 13, batch 4250, loss[loss=0.2418, ctc_loss=0.1357, cr_loss=0.3777, attn_decoder_loss=0.2453, over 29537.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1561, cr_loss=0.3962, attn_decoder_loss=0.26, over 5805963.13 frames. ], batch size: 74, lr: 8.49e-03, grad_scale: 8.0
2024-09-17 13:55:32,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=234200.0, ans=0.125
2024-09-17 13:55:38,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=234200.0, ans=0.125
2024-09-17 13:55:45,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=234240.0, ans=0.2
2024-09-17 13:55:59,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=234240.0, ans=0.1
2024-09-17 13:55:59,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=234240.0, ans=0.125
2024-09-17 13:56:46,323 INFO [train.py:1198] (0/2) Epoch 13, batch 4300, loss[loss=0.2642, ctc_loss=0.1552, cr_loss=0.4133, attn_decoder_loss=0.2672, over 29530.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1562, cr_loss=0.3957, attn_decoder_loss=0.2599, over 5795333.35 frames. ], batch size: 87, lr: 8.48e-03, grad_scale: 8.0
2024-09-17 13:56:52,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=234400.0, ans=0.125
2024-09-17 13:56:58,272 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.092e+01 9.409e+01 9.956e+01 1.092e+02 6.321e+02, threshold=1.991e+02, percent-clipped=4.0
2024-09-17 13:57:03,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=234440.0, ans=0.125
2024-09-17 13:57:09,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=234440.0, ans=0.125
2024-09-17 13:57:10,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=234440.0, ans=0.09899494936611666
2024-09-17 13:57:38,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=234520.0, ans=0.1
2024-09-17 13:57:54,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0
2024-09-17 13:58:02,348 INFO [train.py:1198] (0/2) Epoch 13, batch 4350, loss[loss=0.2653, ctc_loss=0.1658, cr_loss=0.4134, attn_decoder_loss=0.2672, over 29490.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1594, cr_loss=0.4013, attn_decoder_loss=0.2636, over 5797594.59 frames. ], batch size: 97, lr: 8.48e-03, grad_scale: 8.0
2024-09-17 13:58:10,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.49 vs. limit=6.0
2024-09-17 13:58:14,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=234600.0, ans=0.0
2024-09-17 13:58:23,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=234640.0, ans=0.0
2024-09-17 13:58:24,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=234640.0, ans=0.1
2024-09-17 13:58:34,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=234680.0, ans=0.125
2024-09-17 13:58:42,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=234680.0, ans=0.025
2024-09-17 13:58:49,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=234720.0, ans=0.125
2024-09-17 13:58:59,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=234760.0, ans=0.125
2024-09-17 13:59:15,471 INFO [train.py:1198] (0/2) Epoch 13, batch 4400, loss[loss=0.2726, ctc_loss=0.1728, cr_loss=0.4252, attn_decoder_loss=0.2743, over 27198.00 frames. ], tot_loss[loss=0.2634, ctc_loss=0.1611, cr_loss=0.4037, attn_decoder_loss=0.2658, over 5768238.50 frames. ], batch size: 124, lr: 8.48e-03, grad_scale: 16.0
2024-09-17 13:59:15,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=234800.0, ans=0.125
2024-09-17 13:59:15,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=234800.0, ans=0.1
2024-09-17 13:59:15,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=234800.0, ans=0.2
2024-09-17 13:59:17,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.88 vs. limit=15.0
2024-09-17 13:59:26,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.55 vs. limit=15.0
2024-09-17 13:59:26,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.67 vs. limit=22.5
2024-09-17 13:59:28,465 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.453e+01 9.581e+01 9.987e+01 1.106e+02 2.626e+02, threshold=1.997e+02, percent-clipped=1.0
2024-09-17 14:00:06,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=234920.0, ans=0.025
2024-09-17 14:00:15,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=234960.0, ans=0.025
2024-09-17 14:00:30,055 INFO [train.py:1198] (0/2) Epoch 13, batch 4450, loss[loss=0.2841, ctc_loss=0.201, cr_loss=0.4171, attn_decoder_loss=0.2841, over 19480.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1669, cr_loss=0.4092, attn_decoder_loss=0.2693, over 5580623.82 frames. ], batch size: 210, lr: 8.47e-03, grad_scale: 8.0
2024-09-17 14:00:34,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=235000.0, ans=0.95
2024-09-17 14:01:03,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=235080.0, ans=0.0
2024-09-17 14:01:33,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.45 vs. limit=15.0
2024-09-17 14:01:46,529 INFO [train.py:1198] (0/2) Epoch 13, batch 4500, loss[loss=0.2799, ctc_loss=0.1926, cr_loss=0.4141, attn_decoder_loss=0.2804, over 20601.00 frames. ], tot_loss[loss=0.2704, ctc_loss=0.1728, cr_loss=0.4105, attn_decoder_loss=0.2721, over 5236862.99 frames. ], batch size: 211, lr: 8.47e-03, grad_scale: 8.0
2024-09-17 14:02:00,043 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.936e+01 1.022e+02 1.119e+02 1.227e+02 3.439e+02, threshold=2.238e+02, percent-clipped=3.0
2024-09-17 14:02:02,117 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 14:02:07,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=235240.0, ans=0.125
2024-09-17 14:02:16,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=235280.0, ans=0.0
2024-09-17 14:02:23,768 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-13.pt
2024-09-17 14:03:16,468 INFO [train.py:1198] (0/2) Epoch 14, batch 0, loss[loss=0.2307, ctc_loss=0.1288, cr_loss=0.3515, attn_decoder_loss=0.2342, over 29584.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1288, cr_loss=0.3515, attn_decoder_loss=0.2342, over 29584.00 frames. ], batch size: 73, lr: 8.16e-03, grad_scale: 16.0
2024-09-17 14:03:16,468 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 14:03:34,828 INFO [train.py:1230] (0/2) Epoch 14, validation: loss=0.2137, ctc_loss=0.04354, cr_loss=5.325e-15, attn_decoder_loss=0.2326, over 944034.00 frames.
2024-09-17 14:03:34,828 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-17 14:03:42,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=235300.0, ans=0.125
2024-09-17 14:03:56,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.55 vs. limit=12.0
2024-09-17 14:04:01,428 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0
2024-09-17 14:04:21,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=235420.0, ans=0.0
2024-09-17 14:04:33,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235420.0, ans=0.1
2024-09-17 14:04:39,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=235460.0, ans=0.0
2024-09-17 14:04:39,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.56 vs. limit=10.0
2024-09-17 14:04:40,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=235460.0, ans=0.09899494936611666
2024-09-17 14:04:50,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=235460.0, ans=0.125
2024-09-17 14:04:52,727 INFO [train.py:1198] (0/2) Epoch 14, batch 50, loss[loss=0.2291, ctc_loss=0.1286, cr_loss=0.3466, attn_decoder_loss=0.2326, over 29434.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1598, cr_loss=0.4001, attn_decoder_loss=0.2616, over 1267581.49 frames. ], batch size: 70, lr: 8.16e-03, grad_scale: 8.0
2024-09-17 14:05:04,955 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 14:05:07,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.35 vs. limit=15.0
2024-09-17 14:05:28,562 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.49 vs. limit=10.0
2024-09-17 14:05:29,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=235580.0, ans=0.02
2024-09-17 14:05:45,794 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.961e+01 9.193e+01 1.002e+02 1.099e+02 2.018e+02, threshold=2.003e+02, percent-clipped=0.0
2024-09-17 14:05:56,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=235660.0, ans=0.0
2024-09-17 14:06:04,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.53 vs. limit=15.0
2024-09-17 14:06:05,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=235660.0, ans=0.025
2024-09-17 14:06:08,487 INFO [train.py:1198] (0/2) Epoch 14, batch 100, loss[loss=0.2376, ctc_loss=0.1404, cr_loss=0.3589, attn_decoder_loss=0.2404, over 29519.00 frames. ], tot_loss[loss=0.2609, ctc_loss=0.1601, cr_loss=0.402, attn_decoder_loss=0.2632, over 2251805.97 frames. ], batch size: 76, lr: 8.15e-03, grad_scale: 8.0
2024-09-17 14:06:20,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=235700.0, ans=0.125
2024-09-17 14:06:22,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=235740.0, ans=0.1
2024-09-17 14:06:45,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=235780.0, ans=0.125
2024-09-17 14:06:45,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=235780.0, ans=0.125
2024-09-17 14:06:48,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=235780.0, ans=0.125
2024-09-17 14:06:50,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=235780.0, ans=0.125
2024-09-17 14:07:00,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=235820.0, ans=0.125
2024-09-17 14:07:10,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.98 vs. limit=15.0
2024-09-17 14:07:25,367 INFO [train.py:1198] (0/2) Epoch 14, batch 150, loss[loss=0.2303, ctc_loss=0.1337, cr_loss=0.368, attn_decoder_loss=0.2329, over 29412.00 frames. ], tot_loss[loss=0.2579, ctc_loss=0.1567, cr_loss=0.3969, attn_decoder_loss=0.2603, over 3046189.15 frames. ], batch size: 70, lr: 8.15e-03, grad_scale: 8.0
2024-09-17 14:07:36,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=235900.0, ans=0.2
2024-09-17 14:07:43,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=235940.0, ans=0.125
2024-09-17 14:08:01,291 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.71 vs. limit=15.0
2024-09-17 14:08:20,474 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 9.181e+01 9.587e+01 1.009e+02 1.798e+02, threshold=1.917e+02, percent-clipped=0.0
2024-09-17 14:08:29,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=236060.0, ans=0.125
2024-09-17 14:08:34,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=236060.0, ans=0.125
2024-09-17 14:08:34,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0
2024-09-17 14:08:40,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=236060.0, ans=0.0
2024-09-17 14:08:43,261 INFO [train.py:1198] (0/2) Epoch 14, batch 200, loss[loss=0.2765, ctc_loss=0.1798, cr_loss=0.4266, attn_decoder_loss=0.2777, over 27232.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.1558, cr_loss=0.3955, attn_decoder_loss=0.2595, over 3658354.32 frames. ], batch size: 124, lr: 8.15e-03, grad_scale: 8.0
2024-09-17 14:09:10,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=236140.0, ans=0.0
2024-09-17 14:09:58,992 INFO [train.py:1198] (0/2) Epoch 14, batch 250, loss[loss=0.2697, ctc_loss=0.1668, cr_loss=0.4115, attn_decoder_loss=0.272, over 29219.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1556, cr_loss=0.3958, attn_decoder_loss=0.2594, over 4140504.95 frames. ], batch size: 100, lr: 8.14e-03, grad_scale: 8.0
2024-09-17 14:09:59,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=236300.0, ans=0.0
2024-09-17 14:10:07,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=236300.0, ans=15.0
2024-09-17 14:10:09,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0
2024-09-17 14:10:16,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.29 vs. limit=12.0
2024-09-17 14:10:17,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=236340.0, ans=0.1
2024-09-17 14:10:38,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.33 vs. limit=15.0
2024-09-17 14:10:47,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=236420.0, ans=0.2
2024-09-17 14:10:54,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.995e+01 9.389e+01 1.000e+02 1.684e+02, threshold=1.878e+02, percent-clipped=0.0
2024-09-17 14:11:16,983 INFO [train.py:1198] (0/2) Epoch 14, batch 300, loss[loss=0.2625, ctc_loss=0.1575, cr_loss=0.3873, attn_decoder_loss=0.2656, over 29558.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1544, cr_loss=0.394, attn_decoder_loss=0.2588, over 4508197.05 frames. ], batch size: 92, lr: 8.14e-03, grad_scale: 8.0
2024-09-17 14:12:05,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=236620.0, ans=0.0
2024-09-17 14:12:10,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.23 vs. limit=12.0
2024-09-17 14:12:10,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=236620.0, ans=0.125
2024-09-17 14:12:17,549 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.76 vs. limit=22.5
2024-09-17 14:12:18,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=236660.0, ans=10.0
2024-09-17 14:12:35,064 INFO [train.py:1198] (0/2) Epoch 14, batch 350, loss[loss=0.2407, ctc_loss=0.1424, cr_loss=0.3721, attn_decoder_loss=0.2434, over 29740.00 frames. ], tot_loss[loss=0.2566, ctc_loss=0.1546, cr_loss=0.395, attn_decoder_loss=0.2591, over 4795616.44 frames.
], batch size: 72, lr: 8.14e-03, grad_scale: 8.0 2024-09-17 14:13:00,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=236740.0, ans=0.125 2024-09-17 14:13:01,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=236740.0, ans=0.07 2024-09-17 14:13:28,295 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.698e+01 9.344e+01 1.025e+02 1.871e+02, threshold=1.869e+02, percent-clipped=0.0 2024-09-17 14:13:50,803 INFO [train.py:1198] (0/2) Epoch 14, batch 400, loss[loss=0.2628, ctc_loss=0.1577, cr_loss=0.4206, attn_decoder_loss=0.2652, over 29676.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1544, cr_loss=0.3946, attn_decoder_loss=0.2588, over 5025228.99 frames. ], batch size: 82, lr: 8.13e-03, grad_scale: 16.0 2024-09-17 14:14:01,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=236900.0, ans=0.125 2024-09-17 14:14:06,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=236940.0, ans=0.0 2024-09-17 14:14:13,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=236940.0, ans=0.2 2024-09-17 14:14:21,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=236980.0, ans=0.125 2024-09-17 14:14:47,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.78 vs. 
limit=22.5 2024-09-17 14:14:58,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=237060.0, ans=0.125 2024-09-17 14:15:08,978 INFO [train.py:1198] (0/2) Epoch 14, batch 450, loss[loss=0.2706, ctc_loss=0.1589, cr_loss=0.4138, attn_decoder_loss=0.2738, over 29709.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1541, cr_loss=0.3942, attn_decoder_loss=0.2588, over 5186979.11 frames. ], batch size: 83, lr: 8.13e-03, grad_scale: 8.0 2024-09-17 14:15:16,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=237100.0, ans=0.0 2024-09-17 14:15:30,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=237140.0, ans=0.0 2024-09-17 14:15:36,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=237140.0, ans=0.125 2024-09-17 14:15:39,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237180.0, ans=0.1 2024-09-17 14:15:42,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=237180.0, ans=0.04949747468305833 2024-09-17 14:15:48,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=237180.0, ans=0.125 2024-09-17 14:15:50,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=237180.0, ans=0.125 2024-09-17 14:16:03,313 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:16:05,940 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.900e+01 9.763e+01 1.081e+02 1.650e+02, threshold=1.953e+02, percent-clipped=0.0 2024-09-17 
14:16:13,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=237260.0, ans=0.2 2024-09-17 14:16:14,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.38 vs. limit=22.5 2024-09-17 14:16:27,058 INFO [train.py:1198] (0/2) Epoch 14, batch 500, loss[loss=0.2728, ctc_loss=0.1678, cr_loss=0.4188, attn_decoder_loss=0.2752, over 29428.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1537, cr_loss=0.3936, attn_decoder_loss=0.2585, over 5330049.56 frames. ], batch size: 94, lr: 8.13e-03, grad_scale: 8.0 2024-09-17 14:16:36,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=237300.0, ans=0.125 2024-09-17 14:16:46,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=237340.0, ans=0.2 2024-09-17 14:16:54,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=237340.0, ans=0.125 2024-09-17 14:17:05,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=237380.0, ans=0.04949747468305833 2024-09-17 14:17:21,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-09-17 14:17:23,762 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.23 vs. limit=22.5 2024-09-17 14:17:42,723 INFO [train.py:1198] (0/2) Epoch 14, batch 550, loss[loss=0.2753, ctc_loss=0.1714, cr_loss=0.4285, attn_decoder_loss=0.2773, over 28851.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1543, cr_loss=0.3944, attn_decoder_loss=0.2588, over 5421815.96 frames. 
], batch size: 104, lr: 8.12e-03, grad_scale: 8.0 2024-09-17 14:18:14,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237580.0, ans=0.1 2024-09-17 14:18:35,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=237620.0, ans=0.07 2024-09-17 14:18:40,100 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.963e+01 9.623e+01 1.012e+02 2.800e+02, threshold=1.925e+02, percent-clipped=3.0 2024-09-17 14:18:41,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-09-17 14:19:01,495 INFO [train.py:1198] (0/2) Epoch 14, batch 600, loss[loss=0.2731, ctc_loss=0.1737, cr_loss=0.4371, attn_decoder_loss=0.2744, over 29207.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1539, cr_loss=0.3939, attn_decoder_loss=0.2586, over 5509942.19 frames. ], batch size: 100, lr: 8.12e-03, grad_scale: 8.0 2024-09-17 14:19:01,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=237700.0, ans=0.125 2024-09-17 14:19:21,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=237740.0, ans=0.0 2024-09-17 14:19:29,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0 2024-09-17 14:19:35,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.93 vs. 
limit=15.0 2024-09-17 14:20:05,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=237860.0, ans=0.125 2024-09-17 14:20:08,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=237860.0, ans=0.125 2024-09-17 14:20:13,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=237860.0, ans=0.125 2024-09-17 14:20:19,153 INFO [train.py:1198] (0/2) Epoch 14, batch 650, loss[loss=0.2507, ctc_loss=0.1493, cr_loss=0.3664, attn_decoder_loss=0.2538, over 29753.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1534, cr_loss=0.3932, attn_decoder_loss=0.2581, over 5586895.04 frames. ], batch size: 81, lr: 8.12e-03, grad_scale: 8.0 2024-09-17 14:20:19,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=237900.0, ans=0.125 2024-09-17 14:20:35,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=237940.0, ans=0.125 2024-09-17 14:20:43,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=237940.0, ans=0.0 2024-09-17 14:20:45,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=237940.0, ans=0.0 2024-09-17 14:20:46,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237940.0, ans=0.1 2024-09-17 14:20:48,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237980.0, ans=0.1 2024-09-17 14:21:05,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=238020.0, 
ans=0.125 2024-09-17 14:21:11,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=238020.0, ans=0.07 2024-09-17 14:21:13,714 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.771e+01 9.255e+01 1.013e+02 1.766e+02, threshold=1.851e+02, percent-clipped=0.0 2024-09-17 14:21:19,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=238060.0, ans=0.0 2024-09-17 14:21:19,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238060.0, ans=0.1 2024-09-17 14:21:33,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=238100.0, ans=0.025 2024-09-17 14:21:34,761 INFO [train.py:1198] (0/2) Epoch 14, batch 700, loss[loss=0.2444, ctc_loss=0.1415, cr_loss=0.3678, attn_decoder_loss=0.2477, over 29533.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1541, cr_loss=0.3943, attn_decoder_loss=0.2588, over 5637624.88 frames. ], batch size: 76, lr: 8.11e-03, grad_scale: 8.0 2024-09-17 14:21:36,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=238100.0, ans=0.025 2024-09-17 14:21:43,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.13 vs. 
limit=22.5 2024-09-17 14:21:44,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=238100.0, ans=0.0 2024-09-17 14:21:45,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=238100.0, ans=0.09899494936611666 2024-09-17 14:22:01,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2024-09-17 14:22:21,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=238220.0, ans=0.2 2024-09-17 14:22:23,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=238220.0, ans=0.125 2024-09-17 14:22:30,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=238220.0, ans=0.125 2024-09-17 14:22:32,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=238220.0, ans=0.0 2024-09-17 14:22:37,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=238260.0, ans=0.125 2024-09-17 14:22:52,564 INFO [train.py:1198] (0/2) Epoch 14, batch 750, loss[loss=0.2513, ctc_loss=0.146, cr_loss=0.3978, attn_decoder_loss=0.2542, over 29725.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1538, cr_loss=0.3938, attn_decoder_loss=0.2585, over 5675731.37 frames. ], batch size: 82, lr: 8.11e-03, grad_scale: 8.0 2024-09-17 14:23:34,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.15 vs. 
limit=22.5 2024-09-17 14:23:49,572 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 9.200e+01 9.849e+01 1.104e+02 2.206e+02, threshold=1.970e+02, percent-clipped=2.0 2024-09-17 14:23:54,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=238460.0, ans=0.125 2024-09-17 14:24:10,889 INFO [train.py:1198] (0/2) Epoch 14, batch 800, loss[loss=0.2313, ctc_loss=0.1247, cr_loss=0.3366, attn_decoder_loss=0.2357, over 29594.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1535, cr_loss=0.3924, attn_decoder_loss=0.2583, over 5706662.91 frames. ], batch size: 73, lr: 8.11e-03, grad_scale: 16.0 2024-09-17 14:24:14,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=238500.0, ans=0.125 2024-09-17 14:25:10,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=238660.0, ans=0.125 2024-09-17 14:25:11,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=238660.0, ans=0.2 2024-09-17 14:25:26,169 INFO [train.py:1198] (0/2) Epoch 14, batch 850, loss[loss=0.277, ctc_loss=0.1683, cr_loss=0.4331, attn_decoder_loss=0.2794, over 29717.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1532, cr_loss=0.3927, attn_decoder_loss=0.258, over 5735619.19 frames. ], batch size: 89, lr: 8.10e-03, grad_scale: 8.0 2024-09-17 14:25:36,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.31 vs. 
limit=8.0 2024-09-17 14:25:41,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=238740.0, ans=0.125 2024-09-17 14:26:04,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=238780.0, ans=0.2 2024-09-17 14:26:11,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=238820.0, ans=0.125 2024-09-17 14:26:22,028 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 9.039e+01 9.635e+01 1.057e+02 1.739e+02, threshold=1.927e+02, percent-clipped=0.0 2024-09-17 14:26:31,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=238860.0, ans=0.125 2024-09-17 14:26:31,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=238860.0, ans=0.125 2024-09-17 14:26:44,209 INFO [train.py:1198] (0/2) Epoch 14, batch 900, loss[loss=0.2334, ctc_loss=0.1291, cr_loss=0.3512, attn_decoder_loss=0.2372, over 29616.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1535, cr_loss=0.3931, attn_decoder_loss=0.2582, over 5741470.53 frames. ], batch size: 73, lr: 8.10e-03, grad_scale: 8.0 2024-09-17 14:26:44,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238900.0, ans=0.1 2024-09-17 14:26:52,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=238900.0, ans=0.125 2024-09-17 14:27:01,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.88 vs. 
limit=10.0 2024-09-17 14:27:02,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=238940.0, ans=0.1 2024-09-17 14:27:16,350 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:27:27,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=238980.0, ans=0.125 2024-09-17 14:27:34,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=239020.0, ans=0.125 2024-09-17 14:27:47,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=239060.0, ans=0.125 2024-09-17 14:27:57,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239060.0, ans=0.1 2024-09-17 14:28:01,802 INFO [train.py:1198] (0/2) Epoch 14, batch 950, loss[loss=0.2277, ctc_loss=0.1341, cr_loss=0.3612, attn_decoder_loss=0.2301, over 29527.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1537, cr_loss=0.3938, attn_decoder_loss=0.2582, over 5743491.56 frames. ], batch size: 74, lr: 8.10e-03, grad_scale: 8.0 2024-09-17 14:28:19,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-09-17 14:28:19,179 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.06 vs. limit=15.0 2024-09-17 14:28:32,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.38 vs. 
limit=15.0 2024-09-17 14:28:49,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=239220.0, ans=0.125 2024-09-17 14:28:52,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.12 vs. limit=22.5 2024-09-17 14:28:58,295 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 9.217e+01 9.958e+01 1.123e+02 9.034e+02, threshold=1.992e+02, percent-clipped=2.0 2024-09-17 14:29:03,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=239260.0, ans=0.125 2024-09-17 14:29:17,716 INFO [train.py:1198] (0/2) Epoch 14, batch 1000, loss[loss=0.2381, ctc_loss=0.14, cr_loss=0.3717, attn_decoder_loss=0.2407, over 29525.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1545, cr_loss=0.3944, attn_decoder_loss=0.2589, over 5736575.52 frames. ], batch size: 77, lr: 8.09e-03, grad_scale: 8.0 2024-09-17 14:29:19,840 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.74 vs. 
limit=22.5 2024-09-17 14:29:34,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=239340.0, ans=0.09899494936611666 2024-09-17 14:29:52,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=239380.0, ans=0.0 2024-09-17 14:29:57,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=239380.0, ans=0.125 2024-09-17 14:30:08,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=239420.0, ans=0.125 2024-09-17 14:30:35,791 INFO [train.py:1198] (0/2) Epoch 14, batch 1050, loss[loss=0.2621, ctc_loss=0.1539, cr_loss=0.4007, attn_decoder_loss=0.2652, over 29690.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1545, cr_loss=0.395, attn_decoder_loss=0.2587, over 5742600.60 frames. ], batch size: 85, lr: 8.09e-03, grad_scale: 8.0 2024-09-17 14:30:36,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.61 vs. 
limit=15.0 2024-09-17 14:30:52,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=239540.0, ans=0.125 2024-09-17 14:30:56,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=239540.0, ans=0.1 2024-09-17 14:30:56,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=239540.0, ans=0.0 2024-09-17 14:31:03,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=239540.0, ans=0.07 2024-09-17 14:31:09,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=239580.0, ans=0.2 2024-09-17 14:31:31,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=239620.0, ans=0.125 2024-09-17 14:31:34,117 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 8.789e+01 9.469e+01 1.013e+02 1.494e+02, threshold=1.894e+02, percent-clipped=0.0 2024-09-17 14:31:45,602 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.57 vs. limit=15.0 2024-09-17 14:31:52,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=239700.0, ans=0.0 2024-09-17 14:31:53,880 INFO [train.py:1198] (0/2) Epoch 14, batch 1100, loss[loss=0.251, ctc_loss=0.1449, cr_loss=0.3784, attn_decoder_loss=0.2543, over 29444.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1542, cr_loss=0.3939, attn_decoder_loss=0.2584, over 5756596.13 frames. 
], batch size: 78, lr: 8.09e-03, grad_scale: 8.0 2024-09-17 14:32:23,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=239780.0, ans=0.025 2024-09-17 14:32:31,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.84 vs. limit=10.0 2024-09-17 14:32:51,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=239820.0, ans=0.125 2024-09-17 14:32:56,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=239860.0, ans=0.2 2024-09-17 14:33:09,748 INFO [train.py:1198] (0/2) Epoch 14, batch 1150, loss[loss=0.2514, ctc_loss=0.1504, cr_loss=0.3884, attn_decoder_loss=0.2539, over 29432.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1544, cr_loss=0.394, attn_decoder_loss=0.2584, over 5755133.88 frames. ], batch size: 78, lr: 8.08e-03, grad_scale: 8.0 2024-09-17 14:33:30,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=239940.0, ans=0.125 2024-09-17 14:33:37,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=239940.0, ans=0.125 2024-09-17 14:33:47,026 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-60000.pt 2024-09-17 14:33:59,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=239980.0, ans=0.0 2024-09-17 14:34:09,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=240020.0, ans=0.125 2024-09-17 14:34:12,592 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=240020.0, ans=0.0 2024-09-17 14:34:13,791 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 9.029e+01 9.820e+01 1.050e+02 2.109e+02, threshold=1.964e+02, percent-clipped=1.0 2024-09-17 14:34:21,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=240060.0, ans=0.0 2024-09-17 14:34:36,161 INFO [train.py:1198] (0/2) Epoch 14, batch 1200, loss[loss=0.2613, ctc_loss=0.1533, cr_loss=0.3977, attn_decoder_loss=0.2645, over 29658.00 frames. ], tot_loss[loss=0.2565, ctc_loss=0.155, cr_loss=0.3949, attn_decoder_loss=0.259, over 5749203.46 frames. ], batch size: 85, lr: 8.08e-03, grad_scale: 16.0 2024-09-17 14:34:37,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240100.0, ans=0.1 2024-09-17 14:35:54,244 INFO [train.py:1198] (0/2) Epoch 14, batch 1250, loss[loss=0.2727, ctc_loss=0.1725, cr_loss=0.4223, attn_decoder_loss=0.2744, over 29550.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1556, cr_loss=0.3967, attn_decoder_loss=0.2598, over 5776692.86 frames. ], batch size: 92, lr: 8.08e-03, grad_scale: 8.0 2024-09-17 14:35:59,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=240300.0, ans=0.1 2024-09-17 14:36:09,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=240340.0, ans=0.125 2024-09-17 14:36:13,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=240340.0, ans=0.2 2024-09-17 14:36:25,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.90 vs. 
limit=12.0 2024-09-17 14:36:52,184 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.786e+01 9.275e+01 9.951e+01 3.249e+02, threshold=1.855e+02, percent-clipped=3.0 2024-09-17 14:37:09,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=240500.0, ans=0.2 2024-09-17 14:37:10,335 INFO [train.py:1198] (0/2) Epoch 14, batch 1300, loss[loss=0.2614, ctc_loss=0.1532, cr_loss=0.385, attn_decoder_loss=0.2649, over 28227.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1545, cr_loss=0.3952, attn_decoder_loss=0.259, over 5778962.28 frames. ], batch size: 111, lr: 8.07e-03, grad_scale: 8.0 2024-09-17 14:37:30,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-17 14:37:31,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=240540.0, ans=0.2 2024-09-17 14:37:38,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.12 vs. 
limit=22.5 2024-09-17 14:38:00,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=240620.0, ans=0.0 2024-09-17 14:38:09,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=240660.0, ans=0.025 2024-09-17 14:38:12,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=240660.0, ans=0.125 2024-09-17 14:38:19,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240660.0, ans=0.1 2024-09-17 14:38:24,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=240700.0, ans=0.0 2024-09-17 14:38:25,787 INFO [train.py:1198] (0/2) Epoch 14, batch 1350, loss[loss=0.2595, ctc_loss=0.1526, cr_loss=0.4093, attn_decoder_loss=0.2623, over 29794.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1541, cr_loss=0.3954, attn_decoder_loss=0.2587, over 5796414.13 frames. ], batch size: 81, lr: 8.07e-03, grad_scale: 8.0 2024-09-17 14:38:26,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-09-17 14:38:38,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=240700.0, ans=0.0 2024-09-17 14:38:49,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.22 vs. 
limit=15.0 2024-09-17 14:38:53,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=240740.0, ans=0.0 2024-09-17 14:39:07,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=240780.0, ans=0.125 2024-09-17 14:39:12,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.17 vs. limit=22.5 2024-09-17 14:39:15,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=240820.0, ans=0.05 2024-09-17 14:39:27,567 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.834e+01 9.288e+01 9.876e+01 1.389e+02, threshold=1.858e+02, percent-clipped=0.0 2024-09-17 14:39:45,916 INFO [train.py:1198] (0/2) Epoch 14, batch 1400, loss[loss=0.2229, ctc_loss=0.128, cr_loss=0.3447, attn_decoder_loss=0.2258, over 29579.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1537, cr_loss=0.3941, attn_decoder_loss=0.2584, over 5807958.36 frames. ], batch size: 69, lr: 8.07e-03, grad_scale: 8.0 2024-09-17 14:41:01,373 INFO [train.py:1198] (0/2) Epoch 14, batch 1450, loss[loss=0.2642, ctc_loss=0.1591, cr_loss=0.4143, attn_decoder_loss=0.2667, over 29453.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1535, cr_loss=0.3939, attn_decoder_loss=0.2587, over 5805378.07 frames. ], batch size: 94, lr: 8.06e-03, grad_scale: 8.0 2024-09-17 14:41:10,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=241100.0, ans=0.125 2024-09-17 14:41:17,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. 
limit=15.0 2024-09-17 14:41:28,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=241140.0, ans=0.125 2024-09-17 14:41:42,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=241180.0, ans=0.125 2024-09-17 14:41:53,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.61 vs. limit=15.0 2024-09-17 14:41:58,396 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 9.187e+01 9.748e+01 1.026e+02 3.155e+02, threshold=1.950e+02, percent-clipped=2.0 2024-09-17 14:42:06,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=241260.0, ans=0.1 2024-09-17 14:42:16,803 INFO [train.py:1198] (0/2) Epoch 14, batch 1500, loss[loss=0.2727, ctc_loss=0.1602, cr_loss=0.4277, attn_decoder_loss=0.2757, over 29644.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.154, cr_loss=0.3952, attn_decoder_loss=0.2595, over 5807204.68 frames. ], batch size: 86, lr: 8.06e-03, grad_scale: 8.0 2024-09-17 14:42:35,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=241340.0, ans=0.2 2024-09-17 14:43:11,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=241420.0, ans=0.1 2024-09-17 14:43:16,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=241420.0, ans=0.125 2024-09-17 14:43:24,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.20 vs. 
limit=15.0 2024-09-17 14:43:26,069 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.71 vs. limit=15.0 2024-09-17 14:43:33,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=241460.0, ans=0.125 2024-09-17 14:43:37,276 INFO [train.py:1198] (0/2) Epoch 14, batch 1550, loss[loss=0.2745, ctc_loss=0.1689, cr_loss=0.4126, attn_decoder_loss=0.2771, over 29501.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1545, cr_loss=0.3957, attn_decoder_loss=0.2598, over 5782855.59 frames. ], batch size: 90, lr: 8.06e-03, grad_scale: 8.0 2024-09-17 14:43:51,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=241540.0, ans=0.0 2024-09-17 14:43:52,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=241540.0, ans=0.0 2024-09-17 14:43:57,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.87 vs. limit=15.0 2024-09-17 14:44:09,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=241580.0, ans=0.125 2024-09-17 14:44:10,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=241580.0, ans=0.1 2024-09-17 14:44:34,866 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 9.004e+01 9.910e+01 1.078e+02 4.071e+02, threshold=1.982e+02, percent-clipped=2.0 2024-09-17 14:44:44,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.52 vs. 
limit=22.5 2024-09-17 14:44:46,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=241660.0, ans=0.025 2024-09-17 14:44:50,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=241660.0, ans=0.125 2024-09-17 14:44:53,190 INFO [train.py:1198] (0/2) Epoch 14, batch 1600, loss[loss=0.275, ctc_loss=0.1744, cr_loss=0.426, attn_decoder_loss=0.2767, over 29670.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1548, cr_loss=0.3955, attn_decoder_loss=0.2595, over 5765408.60 frames. ], batch size: 85, lr: 8.05e-03, grad_scale: 16.0 2024-09-17 14:45:31,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=241780.0, ans=0.0 2024-09-17 14:45:35,061 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.11 vs. limit=15.0 2024-09-17 14:45:43,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241820.0, ans=0.1 2024-09-17 14:45:57,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=241860.0, ans=0.0 2024-09-17 14:46:04,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=241860.0, ans=0.0 2024-09-17 14:46:06,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=241860.0, ans=0.2 2024-09-17 14:46:08,674 INFO [train.py:1198] (0/2) Epoch 14, batch 1650, loss[loss=0.2581, ctc_loss=0.1638, cr_loss=0.4097, attn_decoder_loss=0.2595, over 29696.00 frames. ], tot_loss[loss=0.2566, ctc_loss=0.1547, cr_loss=0.3951, attn_decoder_loss=0.2592, over 5760055.59 frames. 
], batch size: 89, lr: 8.05e-03, grad_scale: 8.0 2024-09-17 14:46:52,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=241980.0, ans=0.1 2024-09-17 14:47:12,401 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 8.773e+01 9.391e+01 1.036e+02 1.444e+02, threshold=1.878e+02, percent-clipped=0.0 2024-09-17 14:47:28,904 INFO [train.py:1198] (0/2) Epoch 14, batch 1700, loss[loss=0.2196, ctc_loss=0.1229, cr_loss=0.3368, attn_decoder_loss=0.2228, over 29590.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1539, cr_loss=0.3943, attn_decoder_loss=0.2586, over 5781712.23 frames. ], batch size: 69, lr: 8.05e-03, grad_scale: 8.0 2024-09-17 14:47:48,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.64 vs. limit=22.5 2024-09-17 14:48:04,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=242180.0, ans=0.07 2024-09-17 14:48:05,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=242180.0, ans=0.125 2024-09-17 14:48:17,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.06 vs. 
limit=15.0 2024-09-17 14:48:25,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=242220.0, ans=0.0 2024-09-17 14:48:28,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=242260.0, ans=0.04949747468305833 2024-09-17 14:48:34,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242260.0, ans=0.1 2024-09-17 14:48:40,585 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:48:40,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=242260.0, ans=0.125 2024-09-17 14:48:44,838 INFO [train.py:1198] (0/2) Epoch 14, batch 1750, loss[loss=0.2282, ctc_loss=0.1337, cr_loss=0.3677, attn_decoder_loss=0.2305, over 29351.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1537, cr_loss=0.3944, attn_decoder_loss=0.2584, over 5788070.99 frames. 
], batch size: 67, lr: 8.04e-03, grad_scale: 8.0 2024-09-17 14:49:07,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=242340.0, ans=0.125 2024-09-17 14:49:30,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=242420.0, ans=0.1 2024-09-17 14:49:35,532 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:49:39,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=242420.0, ans=0.1 2024-09-17 14:49:41,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=242420.0, ans=0.0 2024-09-17 14:49:44,145 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.812e+01 9.337e+01 1.025e+02 2.569e+02, threshold=1.867e+02, percent-clipped=1.0 2024-09-17 14:49:53,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=242460.0, ans=0.09899494936611666 2024-09-17 14:50:00,705 INFO [train.py:1198] (0/2) Epoch 14, batch 1800, loss[loss=0.2816, ctc_loss=0.1807, cr_loss=0.449, attn_decoder_loss=0.2828, over 29688.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.154, cr_loss=0.3948, attn_decoder_loss=0.2587, over 5790977.29 frames. ], batch size: 83, lr: 8.04e-03, grad_scale: 8.0 2024-09-17 14:50:23,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.14 vs. 
limit=15.0 2024-09-17 14:50:24,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=242540.0, ans=15.0 2024-09-17 14:50:32,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=242580.0, ans=0.125 2024-09-17 14:50:33,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=242580.0, ans=0.2 2024-09-17 14:50:33,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=242580.0, ans=0.025 2024-09-17 14:50:36,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=242580.0, ans=0.125 2024-09-17 14:50:48,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=242620.0, ans=0.1 2024-09-17 14:50:48,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=242620.0, ans=0.2 2024-09-17 14:50:56,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=242620.0, ans=0.025 2024-09-17 14:51:17,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-09-17 14:51:20,988 INFO [train.py:1198] (0/2) Epoch 14, batch 1850, loss[loss=0.2667, ctc_loss=0.1542, cr_loss=0.3967, attn_decoder_loss=0.2704, over 29649.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1538, cr_loss=0.3948, attn_decoder_loss=0.2588, over 5797135.17 frames. 
], batch size: 86, lr: 8.04e-03, grad_scale: 8.0 2024-09-17 14:51:28,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=242700.0, ans=0.1 2024-09-17 14:51:34,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=242740.0, ans=0.0 2024-09-17 14:51:35,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.00 vs. limit=15.0 2024-09-17 14:51:40,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242740.0, ans=0.1 2024-09-17 14:52:05,202 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:52:11,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=242820.0, ans=0.125 2024-09-17 14:52:15,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=242820.0, ans=0.05 2024-09-17 14:52:18,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=242820.0, ans=0.0 2024-09-17 14:52:19,872 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.993e+01 9.601e+01 1.027e+02 2.401e+02, threshold=1.920e+02, percent-clipped=1.0 2024-09-17 14:52:36,268 INFO [train.py:1198] (0/2) Epoch 14, batch 1900, loss[loss=0.259, ctc_loss=0.1392, cr_loss=0.3807, attn_decoder_loss=0.2638, over 29710.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1538, cr_loss=0.3948, attn_decoder_loss=0.2591, over 5804903.88 frames. 
], batch size: 89, lr: 8.03e-03, grad_scale: 8.0 2024-09-17 14:53:10,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=242980.0, ans=0.1 2024-09-17 14:53:36,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.04 vs. limit=22.5 2024-09-17 14:53:39,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=243060.0, ans=0.125 2024-09-17 14:53:43,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=243060.0, ans=0.0 2024-09-17 14:53:52,712 INFO [train.py:1198] (0/2) Epoch 14, batch 1950, loss[loss=0.2474, ctc_loss=0.1539, cr_loss=0.4006, attn_decoder_loss=0.2489, over 29438.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1546, cr_loss=0.3971, attn_decoder_loss=0.2601, over 5819482.67 frames. ], batch size: 78, lr: 8.03e-03, grad_scale: 8.0 2024-09-17 14:54:26,005 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.10 vs. limit=22.5 2024-09-17 14:54:33,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=243180.0, ans=0.125 2024-09-17 14:54:48,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. 
limit=10.0 2024-09-17 14:54:57,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.855e+01 9.181e+01 9.574e+01 1.007e+02 1.903e+02, threshold=1.915e+02, percent-clipped=0.0 2024-09-17 14:55:11,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=243300.0, ans=0.0 2024-09-17 14:55:13,071 INFO [train.py:1198] (0/2) Epoch 14, batch 2000, loss[loss=0.2287, ctc_loss=0.1252, cr_loss=0.3569, attn_decoder_loss=0.2323, over 29346.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.155, cr_loss=0.3971, attn_decoder_loss=0.2604, over 5796240.74 frames. ], batch size: 67, lr: 8.03e-03, grad_scale: 8.0 2024-09-17 14:55:28,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=243340.0, ans=0.025 2024-09-17 14:56:02,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.83 vs. limit=15.0 2024-09-17 14:56:06,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=243420.0, ans=0.1 2024-09-17 14:56:29,003 INFO [train.py:1198] (0/2) Epoch 14, batch 2050, loss[loss=0.2256, ctc_loss=0.1263, cr_loss=0.3531, attn_decoder_loss=0.2288, over 29422.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1547, cr_loss=0.3957, attn_decoder_loss=0.2596, over 5787800.97 frames. ], batch size: 70, lr: 8.02e-03, grad_scale: 8.0 2024-09-17 14:56:29,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=243500.0, ans=0.0 2024-09-17 14:56:37,594 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.73 vs. 
limit=15.0 2024-09-17 14:57:19,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=243620.0, ans=0.2 2024-09-17 14:57:29,378 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.883e+01 9.401e+01 1.013e+02 1.488e+02, threshold=1.880e+02, percent-clipped=0.0 2024-09-17 14:57:37,686 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2024-09-17 14:57:44,643 INFO [train.py:1198] (0/2) Epoch 14, batch 2100, loss[loss=0.255, ctc_loss=0.1526, cr_loss=0.4039, attn_decoder_loss=0.2574, over 29754.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1536, cr_loss=0.3942, attn_decoder_loss=0.2586, over 5799297.89 frames. ], batch size: 81, lr: 8.02e-03, grad_scale: 8.0 2024-09-17 14:57:46,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=243700.0, ans=0.125 2024-09-17 14:58:27,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=243780.0, ans=0.2 2024-09-17 14:59:04,622 INFO [train.py:1198] (0/2) Epoch 14, batch 2150, loss[loss=0.2626, ctc_loss=0.1643, cr_loss=0.4281, attn_decoder_loss=0.264, over 29436.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1527, cr_loss=0.3932, attn_decoder_loss=0.2578, over 5814861.69 frames. 
], batch size: 78, lr: 8.02e-03, grad_scale: 8.0 2024-09-17 14:59:20,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=243940.0, ans=0.0 2024-09-17 14:59:27,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=243940.0, ans=0.0 2024-09-17 14:59:32,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=243940.0, ans=0.1 2024-09-17 14:59:33,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=243980.0, ans=0.125 2024-09-17 14:59:46,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=243980.0, ans=0.025 2024-09-17 14:59:56,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=244020.0, ans=0.125 2024-09-17 15:00:05,494 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.957e+01 9.631e+01 1.031e+02 4.379e+02, threshold=1.926e+02, percent-clipped=1.0 2024-09-17 15:00:10,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=244060.0, ans=0.125 2024-09-17 15:00:20,631 INFO [train.py:1198] (0/2) Epoch 14, batch 2200, loss[loss=0.2599, ctc_loss=0.1542, cr_loss=0.3899, attn_decoder_loss=0.263, over 29629.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1526, cr_loss=0.3932, attn_decoder_loss=0.2578, over 5811169.34 frames. 
], batch size: 86, lr: 8.01e-03, grad_scale: 8.0 2024-09-17 15:00:24,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=244100.0, ans=0.2 2024-09-17 15:00:28,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=244100.0, ans=0.125 2024-09-17 15:00:46,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=244140.0, ans=0.2 2024-09-17 15:00:54,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=244180.0, ans=0.125 2024-09-17 15:01:10,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=244220.0, ans=0.0 2024-09-17 15:01:31,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2024-09-17 15:01:32,827 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0 2024-09-17 15:01:36,352 INFO [train.py:1198] (0/2) Epoch 14, batch 2250, loss[loss=0.2662, ctc_loss=0.1578, cr_loss=0.423, attn_decoder_loss=0.2688, over 29701.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1528, cr_loss=0.3934, attn_decoder_loss=0.258, over 5810835.89 frames. 
], batch size: 82, lr: 8.01e-03, grad_scale: 8.0 2024-09-17 15:01:44,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=244300.0, ans=0.0 2024-09-17 15:01:54,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=244340.0, ans=0.125 2024-09-17 15:02:06,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244380.0, ans=0.1 2024-09-17 15:02:10,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.30 vs. limit=12.0 2024-09-17 15:02:19,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2024-09-17 15:02:40,758 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.724e+01 9.348e+01 1.021e+02 5.677e+02, threshold=1.870e+02, percent-clipped=2.0 2024-09-17 15:02:49,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0 2024-09-17 15:02:56,102 INFO [train.py:1198] (0/2) Epoch 14, batch 2300, loss[loss=0.2329, ctc_loss=0.1261, cr_loss=0.3595, attn_decoder_loss=0.2368, over 29302.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1518, cr_loss=0.3915, attn_decoder_loss=0.2569, over 5797724.07 frames. ], batch size: 71, lr: 8.01e-03, grad_scale: 8.0 2024-09-17 15:02:58,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=244500.0, ans=0.025 2024-09-17 15:03:01,405 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. 
limit=6.0 2024-09-17 15:03:12,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=244540.0, ans=0.0 2024-09-17 15:03:49,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.92 vs. limit=15.0 2024-09-17 15:04:00,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.80 vs. limit=22.5 2024-09-17 15:04:10,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=244700.0, ans=0.125 2024-09-17 15:04:11,746 INFO [train.py:1198] (0/2) Epoch 14, batch 2350, loss[loss=0.2702, ctc_loss=0.1685, cr_loss=0.4192, attn_decoder_loss=0.2722, over 29704.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1524, cr_loss=0.3929, attn_decoder_loss=0.2576, over 5804128.79 frames. ], batch size: 83, lr: 8.00e-03, grad_scale: 8.0 2024-09-17 15:04:29,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=244740.0, ans=0.125 2024-09-17 15:04:33,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.68 vs. limit=12.0 2024-09-17 15:04:45,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5 2024-09-17 15:05:12,275 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.941e+01 9.524e+01 1.022e+02 1.702e+02, threshold=1.905e+02, percent-clipped=0.0 2024-09-17 15:05:15,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.09 vs. 
limit=12.0 2024-09-17 15:05:20,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244860.0, ans=0.1 2024-09-17 15:05:24,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=244860.0, ans=0.0 2024-09-17 15:05:26,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=244900.0, ans=0.09899494936611666 2024-09-17 15:05:27,589 INFO [train.py:1198] (0/2) Epoch 14, batch 2400, loss[loss=0.2515, ctc_loss=0.1552, cr_loss=0.4046, attn_decoder_loss=0.2532, over 29539.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1533, cr_loss=0.3941, attn_decoder_loss=0.2585, over 5806919.45 frames. ], batch size: 76, lr: 8.00e-03, grad_scale: 16.0 2024-09-17 15:05:27,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=244900.0, ans=0.125 2024-09-17 15:05:39,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=244900.0, ans=0.07 2024-09-17 15:05:56,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=244980.0, ans=0.125 2024-09-17 15:06:08,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=244980.0, ans=0.125 2024-09-17 15:06:13,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=245020.0, ans=0.125 2024-09-17 15:06:14,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=245020.0, ans=0.125 2024-09-17 15:06:19,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.48 vs. 
limit=22.5 2024-09-17 15:06:26,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=245020.0, ans=0.125 2024-09-17 15:06:29,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=245060.0, ans=0.0 2024-09-17 15:06:32,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0 2024-09-17 15:06:44,548 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:06:45,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.37 vs. limit=15.0 2024-09-17 15:06:45,766 INFO [train.py:1198] (0/2) Epoch 14, batch 2450, loss[loss=0.254, ctc_loss=0.145, cr_loss=0.3713, attn_decoder_loss=0.2579, over 29712.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1543, cr_loss=0.3957, attn_decoder_loss=0.2596, over 5785425.57 frames. ], batch size: 82, lr: 8.00e-03, grad_scale: 4.0 2024-09-17 15:06:50,944 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.71 vs. limit=10.0 2024-09-17 15:06:55,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.91 vs. limit=15.0 2024-09-17 15:06:59,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=245140.0, ans=0.0 2024-09-17 15:07:03,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.97 vs. 
limit=15.0 2024-09-17 15:07:03,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=245140.0, ans=0.0 2024-09-17 15:07:06,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.72 vs. limit=15.0 2024-09-17 15:07:31,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=245220.0, ans=0.0 2024-09-17 15:07:34,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=245220.0, ans=0.125 2024-09-17 15:07:34,827 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0 2024-09-17 15:07:44,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.47 vs. limit=15.0 2024-09-17 15:07:46,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.36 vs. limit=15.0 2024-09-17 15:07:49,617 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.704e+01 8.888e+01 9.584e+01 1.028e+02 5.136e+02, threshold=1.917e+02, percent-clipped=2.0 2024-09-17 15:08:01,731 INFO [train.py:1198] (0/2) Epoch 14, batch 2500, loss[loss=0.2616, ctc_loss=0.1494, cr_loss=0.3827, attn_decoder_loss=0.2656, over 29658.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1543, cr_loss=0.3955, attn_decoder_loss=0.2596, over 5796886.75 frames. 
], batch size: 86, lr: 7.99e-03, grad_scale: 8.0 2024-09-17 15:08:14,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=245300.0, ans=0.125 2024-09-17 15:08:38,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=245380.0, ans=0.125 2024-09-17 15:09:16,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=245500.0, ans=0.2 2024-09-17 15:09:18,071 INFO [train.py:1198] (0/2) Epoch 14, batch 2550, loss[loss=0.24, ctc_loss=0.1484, cr_loss=0.3863, attn_decoder_loss=0.2416, over 29348.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.154, cr_loss=0.3949, attn_decoder_loss=0.2594, over 5800765.02 frames. ], batch size: 67, lr: 7.99e-03, grad_scale: 8.0 2024-09-17 15:09:18,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245500.0, ans=0.1 2024-09-17 15:09:28,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.80 vs. 
limit=15.0 2024-09-17 15:09:36,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=245540.0, ans=0.0 2024-09-17 15:09:37,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=245540.0, ans=0.2 2024-09-17 15:09:42,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=245540.0, ans=0.0 2024-09-17 15:10:13,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=245620.0, ans=0.0 2024-09-17 15:10:22,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=245660.0, ans=0.125 2024-09-17 15:10:27,974 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.823e+01 9.211e+01 1.016e+02 2.509e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-17 15:10:31,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245660.0, ans=0.1 2024-09-17 15:10:38,504 INFO [train.py:1198] (0/2) Epoch 14, batch 2600, loss[loss=0.2449, ctc_loss=0.1398, cr_loss=0.3626, attn_decoder_loss=0.2485, over 29469.00 frames. ], tot_loss[loss=0.2569, ctc_loss=0.154, cr_loss=0.3946, attn_decoder_loss=0.2596, over 5796834.07 frames. ], batch size: 78, lr: 7.99e-03, grad_scale: 8.0 2024-09-17 15:11:18,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=245780.0, ans=0.125 2024-09-17 15:11:18,625 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.76 vs. 
limit=22.5 2024-09-17 15:11:31,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.25 vs. limit=15.0 2024-09-17 15:11:33,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=245820.0, ans=0.125 2024-09-17 15:11:54,236 INFO [train.py:1198] (0/2) Epoch 14, batch 2650, loss[loss=0.2764, ctc_loss=0.1739, cr_loss=0.4298, attn_decoder_loss=0.2782, over 29308.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1543, cr_loss=0.3954, attn_decoder_loss=0.2598, over 5802504.93 frames. ], batch size: 100, lr: 7.98e-03, grad_scale: 8.0 2024-09-17 15:11:54,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=245900.0, ans=0.0 2024-09-17 15:11:54,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=245900.0, ans=0.02 2024-09-17 15:11:55,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=245900.0, ans=0.1 2024-09-17 15:12:32,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=245980.0, ans=0.125 2024-09-17 15:12:41,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=246020.0, ans=0.125 2024-09-17 15:12:51,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.28 vs. 
limit=15.0 2024-09-17 15:12:59,459 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.821e+01 9.405e+01 9.920e+01 1.834e+02, threshold=1.881e+02, percent-clipped=0.0 2024-09-17 15:12:59,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=246060.0, ans=0.125 2024-09-17 15:13:02,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=246060.0, ans=0.125 2024-09-17 15:13:10,174 INFO [train.py:1198] (0/2) Epoch 14, batch 2700, loss[loss=0.2668, ctc_loss=0.1636, cr_loss=0.4017, attn_decoder_loss=0.2693, over 29513.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.1541, cr_loss=0.3948, attn_decoder_loss=0.2598, over 5797414.77 frames. ], batch size: 87, lr: 7.98e-03, grad_scale: 8.0 2024-09-17 15:13:35,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.88 vs. limit=15.0 2024-09-17 15:13:37,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=246140.0, ans=0.125 2024-09-17 15:13:55,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=246220.0, ans=0.1 2024-09-17 15:13:55,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=246220.0, ans=0.125 2024-09-17 15:14:04,044 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:14:30,686 INFO [train.py:1198] (0/2) Epoch 14, batch 2750, loss[loss=0.243, ctc_loss=0.1475, cr_loss=0.3977, attn_decoder_loss=0.2448, over 29531.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.153, cr_loss=0.3937, attn_decoder_loss=0.2585, over 5794672.57 frames. 
], batch size: 75, lr: 7.98e-03, grad_scale: 8.0 2024-09-17 15:14:53,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=246340.0, ans=0.07 2024-09-17 15:15:01,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=246380.0, ans=0.2 2024-09-17 15:15:35,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=246460.0, ans=0.025 2024-09-17 15:15:36,359 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.822e+01 9.426e+01 1.011e+02 2.167e+02, threshold=1.885e+02, percent-clipped=1.0 2024-09-17 15:15:41,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=246460.0, ans=0.025 2024-09-17 15:15:42,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=246460.0, ans=0.125 2024-09-17 15:15:47,144 INFO [train.py:1198] (0/2) Epoch 14, batch 2800, loss[loss=0.277, ctc_loss=0.1901, cr_loss=0.4241, attn_decoder_loss=0.2773, over 19850.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1534, cr_loss=0.3939, attn_decoder_loss=0.2587, over 5776466.45 frames. 
], batch size: 209, lr: 7.97e-03, grad_scale: 16.0 2024-09-17 15:15:57,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=246500.0, ans=0.1 2024-09-17 15:16:01,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=246540.0, ans=0.0 2024-09-17 15:16:09,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=246540.0, ans=0.1 2024-09-17 15:16:11,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=246540.0, ans=0.0 2024-09-17 15:16:22,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=246580.0, ans=0.125 2024-09-17 15:16:32,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=246620.0, ans=0.0 2024-09-17 15:16:46,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=246660.0, ans=0.0 2024-09-17 15:16:46,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=246660.0, ans=0.0 2024-09-17 15:16:58,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-09-17 15:17:02,338 INFO [train.py:1198] (0/2) Epoch 14, batch 2850, loss[loss=0.2367, ctc_loss=0.1362, cr_loss=0.3719, attn_decoder_loss=0.2396, over 29516.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1539, cr_loss=0.3942, attn_decoder_loss=0.259, over 5761316.69 frames. 
], batch size: 77, lr: 7.97e-03, grad_scale: 8.0 2024-09-17 15:17:22,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=246740.0, ans=0.09899494936611666 2024-09-17 15:17:27,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.28 vs. limit=15.0 2024-09-17 15:18:06,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=246860.0, ans=0.125 2024-09-17 15:18:13,427 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.023e+01 9.108e+01 9.602e+01 1.037e+02 1.624e+02, threshold=1.920e+02, percent-clipped=0.0 2024-09-17 15:18:22,639 INFO [train.py:1198] (0/2) Epoch 14, batch 2900, loss[loss=0.2389, ctc_loss=0.1355, cr_loss=0.3863, attn_decoder_loss=0.2419, over 29400.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1547, cr_loss=0.3966, attn_decoder_loss=0.2602, over 5786979.97 frames. ], batch size: 79, lr: 7.97e-03, grad_scale: 8.0 2024-09-17 15:18:28,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=246900.0, ans=0.125 2024-09-17 15:18:44,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=246940.0, ans=0.125 2024-09-17 15:18:54,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=246980.0, ans=0.025 2024-09-17 15:19:38,652 INFO [train.py:1198] (0/2) Epoch 14, batch 2950, loss[loss=0.2314, ctc_loss=0.127, cr_loss=0.3582, attn_decoder_loss=0.2351, over 29519.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1535, cr_loss=0.3947, attn_decoder_loss=0.2589, over 5781576.78 frames. 
], batch size: 75, lr: 7.97e-03, grad_scale: 8.0 2024-09-17 15:19:40,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.17 vs. limit=15.0 2024-09-17 15:19:43,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=247100.0, ans=0.125 2024-09-17 15:20:03,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=247140.0, ans=0.125 2024-09-17 15:20:07,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.23 vs. limit=22.5 2024-09-17 15:20:09,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=247180.0, ans=0.05 2024-09-17 15:20:21,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.66 vs. limit=15.0 2024-09-17 15:20:25,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=247220.0, ans=0.125 2024-09-17 15:20:46,137 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.963e+01 9.607e+01 1.034e+02 4.390e+02, threshold=1.921e+02, percent-clipped=3.0 2024-09-17 15:20:55,398 INFO [train.py:1198] (0/2) Epoch 14, batch 3000, loss[loss=0.2541, ctc_loss=0.1491, cr_loss=0.3749, attn_decoder_loss=0.2574, over 29737.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1535, cr_loss=0.3947, attn_decoder_loss=0.2587, over 5783242.02 frames. 
], batch size: 81, lr: 7.96e-03, grad_scale: 8.0 2024-09-17 15:20:55,399 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 15:21:06,924 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.8895, 5.6860, 5.5123, 5.1472], device='cuda:0') 2024-09-17 15:21:13,886 INFO [train.py:1230] (0/2) Epoch 14, validation: loss=0.212, ctc_loss=0.04343, cr_loss=5.03e-15, attn_decoder_loss=0.2308, over 944034.00 frames. 2024-09-17 15:21:13,886 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 15:21:23,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=247300.0, ans=0.125 2024-09-17 15:21:25,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0 2024-09-17 15:21:37,132 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2024-09-17 15:22:16,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=247420.0, ans=0.0 2024-09-17 15:22:34,534 INFO [train.py:1198] (0/2) Epoch 14, batch 3050, loss[loss=0.2484, ctc_loss=0.1519, cr_loss=0.3955, attn_decoder_loss=0.2503, over 29541.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1546, cr_loss=0.3968, attn_decoder_loss=0.2599, over 5777703.82 frames. ], batch size: 76, lr: 7.96e-03, grad_scale: 8.0 2024-09-17 15:22:55,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=247540.0, ans=0.125 2024-09-17 15:22:56,538 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.54 vs. 
limit=15.0 2024-09-17 15:23:02,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0 2024-09-17 15:23:12,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=247580.0, ans=0.125 2024-09-17 15:23:40,609 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.297e+01 9.179e+01 9.620e+01 1.029e+02 1.592e+02, threshold=1.924e+02, percent-clipped=0.0 2024-09-17 15:23:49,652 INFO [train.py:1198] (0/2) Epoch 14, batch 3100, loss[loss=0.2578, ctc_loss=0.1534, cr_loss=0.3913, attn_decoder_loss=0.2607, over 29250.00 frames. ], tot_loss[loss=0.2566, ctc_loss=0.1539, cr_loss=0.3954, attn_decoder_loss=0.2592, over 5777800.97 frames. ], batch size: 100, lr: 7.96e-03, grad_scale: 8.0 2024-09-17 15:23:54,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=247700.0, ans=0.125 2024-09-17 15:23:55,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=247700.0, ans=0.0 2024-09-17 15:24:14,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=247740.0, ans=0.125 2024-09-17 15:24:15,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=247740.0, ans=0.0 2024-09-17 15:24:42,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.80 vs. 
limit=15.0 2024-09-17 15:24:46,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=247820.0, ans=0.125 2024-09-17 15:24:50,046 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-09-17 15:25:02,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=247860.0, ans=0.2 2024-09-17 15:25:05,570 INFO [train.py:1198] (0/2) Epoch 14, batch 3150, loss[loss=0.2793, ctc_loss=0.1758, cr_loss=0.439, attn_decoder_loss=0.2811, over 28852.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1541, cr_loss=0.396, attn_decoder_loss=0.2594, over 5784014.90 frames. ], batch size: 104, lr: 7.95e-03, grad_scale: 4.0 2024-09-17 15:25:14,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.91 vs. 
limit=10.0 2024-09-17 15:25:17,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=247900.0, ans=0.025 2024-09-17 15:25:24,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=247940.0, ans=0.0 2024-09-17 15:25:26,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=247940.0, ans=0.125 2024-09-17 15:25:27,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=247940.0, ans=0.125 2024-09-17 15:25:35,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=247940.0, ans=0.125 2024-09-17 15:25:44,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=247980.0, ans=0.02 2024-09-17 15:26:03,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=248020.0, ans=0.125 2024-09-17 15:26:18,271 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 9.083e+01 9.655e+01 1.039e+02 2.253e+02, threshold=1.931e+02, percent-clipped=1.0 2024-09-17 15:26:25,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=15.0 2024-09-17 15:26:25,943 INFO [train.py:1198] (0/2) Epoch 14, batch 3200, loss[loss=0.2542, ctc_loss=0.1498, cr_loss=0.393, attn_decoder_loss=0.2571, over 29426.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1538, cr_loss=0.3952, attn_decoder_loss=0.2589, over 5795067.13 frames. 
], batch size: 79, lr: 7.95e-03, grad_scale: 8.0 2024-09-17 15:26:26,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.47 vs. limit=6.0 2024-09-17 15:26:53,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=248140.0, ans=0.125 2024-09-17 15:27:08,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=248180.0, ans=0.125 2024-09-17 15:27:12,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.08 vs. limit=22.5 2024-09-17 15:27:26,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=248260.0, ans=0.125 2024-09-17 15:27:42,187 INFO [train.py:1198] (0/2) Epoch 14, batch 3250, loss[loss=0.2588, ctc_loss=0.1524, cr_loss=0.3927, attn_decoder_loss=0.2619, over 29725.00 frames. ], tot_loss[loss=0.2566, ctc_loss=0.1538, cr_loss=0.3952, attn_decoder_loss=0.2592, over 5800572.53 frames. ], batch size: 84, lr: 7.95e-03, grad_scale: 8.0 2024-09-17 15:27:43,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.85 vs. 
limit=22.5 2024-09-17 15:28:15,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=248380.0, ans=0.125 2024-09-17 15:28:24,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=248380.0, ans=0.0 2024-09-17 15:28:37,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=248420.0, ans=0.125 2024-09-17 15:28:49,497 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.916e+01 9.075e+01 9.653e+01 1.031e+02 3.050e+02, threshold=1.931e+02, percent-clipped=1.0 2024-09-17 15:28:57,457 INFO [train.py:1198] (0/2) Epoch 14, batch 3300, loss[loss=0.2639, ctc_loss=0.1552, cr_loss=0.3807, attn_decoder_loss=0.2675, over 28328.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1523, cr_loss=0.3927, attn_decoder_loss=0.2575, over 5798526.54 frames. ], batch size: 111, lr: 7.94e-03, grad_scale: 8.0 2024-09-17 15:29:00,145 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.99 vs. 
limit=10.0 2024-09-17 15:29:13,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=248540.0, ans=0.025 2024-09-17 15:29:13,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=248540.0, ans=0.125 2024-09-17 15:29:13,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=248540.0, ans=0.2 2024-09-17 15:29:24,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=248540.0, ans=0.0 2024-09-17 15:29:29,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=248580.0, ans=0.125 2024-09-17 15:30:09,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.43 vs. limit=15.0 2024-09-17 15:30:17,821 INFO [train.py:1198] (0/2) Epoch 14, batch 3350, loss[loss=0.2702, ctc_loss=0.1761, cr_loss=0.4186, attn_decoder_loss=0.2714, over 28813.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1534, cr_loss=0.3943, attn_decoder_loss=0.2583, over 5774783.68 frames. 
], batch size: 104, lr: 7.94e-03, grad_scale: 8.0 2024-09-17 15:30:22,726 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:30:22,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=248700.0, ans=0.1 2024-09-17 15:30:36,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=248740.0, ans=0.0 2024-09-17 15:30:50,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=248780.0, ans=0.1 2024-09-17 15:30:53,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=248780.0, ans=0.0 2024-09-17 15:30:59,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=248780.0, ans=0.125 2024-09-17 15:31:03,095 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.53 vs. limit=12.0 2024-09-17 15:31:08,927 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.64 vs. limit=15.0 2024-09-17 15:31:26,422 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 9.128e+01 9.625e+01 1.037e+02 1.571e+02, threshold=1.925e+02, percent-clipped=0.0 2024-09-17 15:31:27,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=15.0 2024-09-17 15:31:34,176 INFO [train.py:1198] (0/2) Epoch 14, batch 3400, loss[loss=0.2318, ctc_loss=0.1405, cr_loss=0.361, attn_decoder_loss=0.2339, over 29341.00 frames. 
], tot_loss[loss=0.2557, ctc_loss=0.1535, cr_loss=0.3941, attn_decoder_loss=0.2583, over 5768521.76 frames. ], batch size: 67, lr: 7.94e-03, grad_scale: 8.0 2024-09-17 15:31:36,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0 2024-09-17 15:31:43,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=248900.0, ans=0.125 2024-09-17 15:31:51,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=248940.0, ans=0.125 2024-09-17 15:31:55,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=248940.0, ans=0.2 2024-09-17 15:32:16,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=248980.0, ans=0.1 2024-09-17 15:32:50,236 INFO [train.py:1198] (0/2) Epoch 14, batch 3450, loss[loss=0.2789, ctc_loss=0.1687, cr_loss=0.4164, attn_decoder_loss=0.2819, over 28254.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1538, cr_loss=0.3949, attn_decoder_loss=0.2588, over 5775546.64 frames. ], batch size: 111, lr: 7.93e-03, grad_scale: 8.0 2024-09-17 15:32:51,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=249100.0, ans=0.125 2024-09-17 15:33:09,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=249140.0, ans=0.125 2024-09-17 15:33:14,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. 
limit=6.0 2024-09-17 15:33:32,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=249180.0, ans=0.125 2024-09-17 15:33:37,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.55 vs. limit=22.5 2024-09-17 15:33:41,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=249220.0, ans=0.02 2024-09-17 15:33:51,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.75 vs. limit=15.0 2024-09-17 15:34:03,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.913e+01 9.467e+01 9.956e+01 4.435e+02, threshold=1.893e+02, percent-clipped=2.0 2024-09-17 15:34:10,846 INFO [train.py:1198] (0/2) Epoch 14, batch 3500, loss[loss=0.2394, ctc_loss=0.1417, cr_loss=0.3817, attn_decoder_loss=0.2418, over 29355.00 frames. ], tot_loss[loss=0.2556, ctc_loss=0.1532, cr_loss=0.3937, attn_decoder_loss=0.2583, over 5776803.72 frames. ], batch size: 71, lr: 7.93e-03, grad_scale: 8.0 2024-09-17 15:34:35,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=249340.0, ans=0.125 2024-09-17 15:34:49,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.81 vs. 
limit=15.0 2024-09-17 15:35:04,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=249420.0, ans=0.0 2024-09-17 15:35:11,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=249460.0, ans=0.0 2024-09-17 15:35:22,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=249460.0, ans=10.0 2024-09-17 15:35:25,488 INFO [train.py:1198] (0/2) Epoch 14, batch 3550, loss[loss=0.2655, ctc_loss=0.1504, cr_loss=0.3891, attn_decoder_loss=0.2697, over 29737.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1526, cr_loss=0.3934, attn_decoder_loss=0.2579, over 5782880.64 frames. ], batch size: 89, lr: 7.93e-03, grad_scale: 8.0 2024-09-17 15:35:27,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=249500.0, ans=0.025 2024-09-17 15:35:33,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.27 vs. 
limit=10.0 2024-09-17 15:35:36,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=249500.0, ans=0.1 2024-09-17 15:35:43,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=249540.0, ans=0.2 2024-09-17 15:35:52,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=249540.0, ans=0.125 2024-09-17 15:35:58,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=249580.0, ans=0.125 2024-09-17 15:36:31,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=249660.0, ans=0.125 2024-09-17 15:36:33,062 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.867e+01 9.428e+01 1.003e+02 3.029e+02, threshold=1.886e+02, percent-clipped=1.0 2024-09-17 15:36:40,437 INFO [train.py:1198] (0/2) Epoch 14, batch 3600, loss[loss=0.2377, ctc_loss=0.1372, cr_loss=0.3609, attn_decoder_loss=0.2408, over 29516.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1527, cr_loss=0.3932, attn_decoder_loss=0.2582, over 5791676.02 frames. ], batch size: 77, lr: 7.92e-03, grad_scale: 16.0 2024-09-17 15:37:00,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.32 vs. limit=15.0 2024-09-17 15:37:02,706 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.42 vs. 
limit=22.5 2024-09-17 15:37:04,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=249740.0, ans=0.0 2024-09-17 15:37:07,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=249740.0, ans=0.125 2024-09-17 15:37:12,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=249780.0, ans=0.0 2024-09-17 15:37:12,722 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2024-09-17 15:37:16,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=249780.0, ans=0.1 2024-09-17 15:37:26,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.72 vs. limit=22.5 2024-09-17 15:37:33,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=249820.0, ans=0.125 2024-09-17 15:37:36,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=249820.0, ans=0.09899494936611666 2024-09-17 15:37:40,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=249860.0, ans=0.125 2024-09-17 15:37:55,265 INFO [train.py:1198] (0/2) Epoch 14, batch 3650, loss[loss=0.2654, ctc_loss=0.1609, cr_loss=0.4105, attn_decoder_loss=0.2679, over 29483.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1522, cr_loss=0.3931, attn_decoder_loss=0.2578, over 5793872.93 frames. 
], batch size: 90, lr: 7.92e-03, grad_scale: 8.0 2024-09-17 15:38:18,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.23 vs. limit=22.5 2024-09-17 15:38:20,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=249940.0, ans=0.125 2024-09-17 15:38:46,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=250020.0, ans=0.1 2024-09-17 15:38:54,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=250020.0, ans=0.1 2024-09-17 15:38:55,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=250060.0, ans=0.125 2024-09-17 15:39:05,889 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.885e+01 9.531e+01 1.024e+02 1.907e+02, threshold=1.906e+02, percent-clipped=1.0 2024-09-17 15:39:06,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=250060.0, ans=0.0 2024-09-17 15:39:11,991 INFO [train.py:1198] (0/2) Epoch 14, batch 3700, loss[loss=0.267, ctc_loss=0.1593, cr_loss=0.4002, attn_decoder_loss=0.27, over 29705.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1528, cr_loss=0.3939, attn_decoder_loss=0.2584, over 5803668.44 frames. 
], batch size: 84, lr: 7.92e-03, grad_scale: 8.0 2024-09-17 15:39:15,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=250100.0, ans=0.1 2024-09-17 15:39:15,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=250100.0, ans=0.0 2024-09-17 15:39:18,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=12.0 2024-09-17 15:39:23,370 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.60 vs. limit=15.0 2024-09-17 15:39:28,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=250140.0, ans=0.125 2024-09-17 15:39:31,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2024-09-17 15:39:37,119 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=15.0 2024-09-17 15:39:49,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=250180.0, ans=0.07 2024-09-17 15:40:22,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=250260.0, ans=0.125 2024-09-17 15:40:28,282 INFO [train.py:1198] (0/2) Epoch 14, batch 3750, loss[loss=0.224, ctc_loss=0.1268, cr_loss=0.3418, attn_decoder_loss=0.2272, over 29348.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1525, cr_loss=0.3936, attn_decoder_loss=0.2581, over 5806852.01 frames. 
], batch size: 67, lr: 7.92e-03, grad_scale: 8.0 2024-09-17 15:40:41,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=250340.0, ans=0.125 2024-09-17 15:40:55,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=250340.0, ans=0.0 2024-09-17 15:41:10,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=250380.0, ans=0.0 2024-09-17 15:41:21,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=250420.0, ans=0.125 2024-09-17 15:41:27,092 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:41:37,288 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 9.041e+01 9.799e+01 1.080e+02 3.062e+02, threshold=1.960e+02, percent-clipped=2.0 2024-09-17 15:41:42,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.84 vs. limit=10.0 2024-09-17 15:41:43,358 INFO [train.py:1198] (0/2) Epoch 14, batch 3800, loss[loss=0.2611, ctc_loss=0.1454, cr_loss=0.3696, attn_decoder_loss=0.2657, over 29633.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1523, cr_loss=0.3926, attn_decoder_loss=0.2579, over 5796680.26 frames. 
], batch size: 86, lr: 7.91e-03, grad_scale: 8.0 2024-09-17 15:41:47,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=250500.0, ans=0.125 2024-09-17 15:41:48,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=250500.0, ans=0.2 2024-09-17 15:41:54,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=250500.0, ans=0.125 2024-09-17 15:42:07,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=250540.0, ans=0.2 2024-09-17 15:42:24,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0 2024-09-17 15:42:25,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=15.0 2024-09-17 15:42:43,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=250660.0, ans=0.1 2024-09-17 15:42:50,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=250660.0, ans=0.2 2024-09-17 15:42:57,842 INFO [train.py:1198] (0/2) Epoch 14, batch 3850, loss[loss=0.2755, ctc_loss=0.1716, cr_loss=0.4305, attn_decoder_loss=0.2774, over 29295.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1517, cr_loss=0.3925, attn_decoder_loss=0.2576, over 5809725.84 frames. 
], batch size: 100, lr: 7.91e-03, grad_scale: 8.0 2024-09-17 15:43:12,908 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:43:18,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=250740.0, ans=0.0 2024-09-17 15:43:37,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.62 vs. limit=15.0 2024-09-17 15:43:54,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=8.0 2024-09-17 15:43:59,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=250860.0, ans=0.125 2024-09-17 15:44:01,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=250860.0, ans=0.0 2024-09-17 15:44:06,771 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.879e+01 9.449e+01 1.016e+02 1.639e+02, threshold=1.890e+02, percent-clipped=0.0 2024-09-17 15:44:12,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.28 vs. limit=15.0 2024-09-17 15:44:14,352 INFO [train.py:1198] (0/2) Epoch 14, batch 3900, loss[loss=0.2556, ctc_loss=0.1512, cr_loss=0.3891, attn_decoder_loss=0.2585, over 29623.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.152, cr_loss=0.3933, attn_decoder_loss=0.258, over 5815157.84 frames. 
], batch size: 86, lr: 7.91e-03, grad_scale: 8.0 2024-09-17 15:44:26,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=250900.0, ans=15.0 2024-09-17 15:44:28,887 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2024-09-17 15:44:38,611 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.20 vs. limit=15.0 2024-09-17 15:45:02,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=251020.0, ans=0.125 2024-09-17 15:45:25,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=251060.0, ans=0.025 2024-09-17 15:45:28,451 INFO [train.py:1198] (0/2) Epoch 14, batch 3950, loss[loss=0.2686, ctc_loss=0.1653, cr_loss=0.4261, attn_decoder_loss=0.2706, over 29473.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1512, cr_loss=0.3925, attn_decoder_loss=0.2577, over 5835004.13 frames. 
], batch size: 97, lr: 7.90e-03, grad_scale: 8.0 2024-09-17 15:45:42,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=251100.0, ans=0.0 2024-09-17 15:45:49,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=251140.0, ans=0.2 2024-09-17 15:46:07,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=251180.0, ans=0.125 2024-09-17 15:46:07,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=251180.0, ans=0.09899494936611666 2024-09-17 15:46:20,061 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0 2024-09-17 15:46:28,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=251260.0, ans=0.2 2024-09-17 15:46:29,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.44 vs. limit=15.0 2024-09-17 15:46:38,426 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.707e+01 9.297e+01 1.013e+02 1.953e+02, threshold=1.859e+02, percent-clipped=1.0 2024-09-17 15:46:44,344 INFO [train.py:1198] (0/2) Epoch 14, batch 4000, loss[loss=0.2384, ctc_loss=0.1397, cr_loss=0.3676, attn_decoder_loss=0.2411, over 29489.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1516, cr_loss=0.3924, attn_decoder_loss=0.2577, over 5811830.55 frames. 
], batch size: 74, lr: 7.90e-03, grad_scale: 16.0 2024-09-17 15:46:44,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=251300.0, ans=0.025 2024-09-17 15:46:47,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=251300.0, ans=0.125 2024-09-17 15:46:53,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=251300.0, ans=0.125 2024-09-17 15:47:02,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=251340.0, ans=0.0 2024-09-17 15:47:19,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=251380.0, ans=0.95 2024-09-17 15:47:27,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=251420.0, ans=0.125 2024-09-17 15:47:39,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=251420.0, ans=0.05 2024-09-17 15:47:42,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=251460.0, ans=0.0 2024-09-17 15:47:57,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=251500.0, ans=0.2 2024-09-17 15:47:58,628 INFO [train.py:1198] (0/2) Epoch 14, batch 4050, loss[loss=0.2935, ctc_loss=0.214, cr_loss=0.4319, attn_decoder_loss=0.2927, over 19526.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1519, cr_loss=0.3923, attn_decoder_loss=0.2578, over 5794382.49 frames. ], batch size: 209, lr: 7.90e-03, grad_scale: 8.0 2024-09-17 15:48:01,295 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. 
limit=8.0 2024-09-17 15:48:01,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=251500.0, ans=0.04949747468305833 2024-09-17 15:48:11,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=251540.0, ans=0.125 2024-09-17 15:48:12,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2024-09-17 15:48:19,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=251540.0, ans=0.125 2024-09-17 15:48:22,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=251540.0, ans=0.125 2024-09-17 15:48:31,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=251580.0, ans=0.125 2024-09-17 15:48:34,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-09-17 15:48:41,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=251620.0, ans=0.125 2024-09-17 15:48:41,652 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.70 vs. 
limit=22.5 2024-09-17 15:48:45,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=251620.0, ans=0.125 2024-09-17 15:49:04,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=251660.0, ans=0.1 2024-09-17 15:49:08,786 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.922e+01 9.172e+01 9.805e+01 1.134e+02 3.956e+02, threshold=1.961e+02, percent-clipped=2.0 2024-09-17 15:49:13,316 INFO [train.py:1198] (0/2) Epoch 14, batch 4100, loss[loss=0.2777, ctc_loss=0.1684, cr_loss=0.4152, attn_decoder_loss=0.2807, over 29496.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1524, cr_loss=0.393, attn_decoder_loss=0.2581, over 5791613.65 frames. ], batch size: 90, lr: 7.89e-03, grad_scale: 8.0 2024-09-17 15:49:28,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=251740.0, ans=0.0 2024-09-17 15:49:42,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=251780.0, ans=0.2 2024-09-17 15:49:43,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=251780.0, ans=0.0 2024-09-17 15:49:51,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=251780.0, ans=0.0 2024-09-17 15:50:09,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=12.0 2024-09-17 15:50:28,383 INFO [train.py:1198] (0/2) Epoch 14, batch 4150, loss[loss=0.2511, ctc_loss=0.1545, cr_loss=0.4005, attn_decoder_loss=0.2529, over 29526.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1521, cr_loss=0.3928, attn_decoder_loss=0.2577, over 5796987.72 frames. 
], batch size: 77, lr: 7.89e-03, grad_scale: 8.0 2024-09-17 15:50:46,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.75 vs. limit=22.5 2024-09-17 15:50:47,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=251940.0, ans=0.2 2024-09-17 15:51:08,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.03 vs. limit=10.0 2024-09-17 15:51:14,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=252020.0, ans=0.025 2024-09-17 15:51:22,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=252020.0, ans=0.125 2024-09-17 15:51:29,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=252060.0, ans=0.0 2024-09-17 15:51:29,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2024-09-17 15:51:37,954 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.795e+01 9.407e+01 9.935e+01 3.892e+02, threshold=1.881e+02, percent-clipped=3.0 2024-09-17 15:51:42,419 INFO [train.py:1198] (0/2) Epoch 14, batch 4200, loss[loss=0.2751, ctc_loss=0.1785, cr_loss=0.4195, attn_decoder_loss=0.2765, over 29532.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1523, cr_loss=0.3938, attn_decoder_loss=0.258, over 5798155.89 frames. 
], batch size: 90, lr: 7.89e-03, grad_scale: 8.0 2024-09-17 15:51:47,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=252100.0, ans=0.125 2024-09-17 15:51:50,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.30 vs. limit=15.0 2024-09-17 15:51:58,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.03 vs. limit=10.0 2024-09-17 15:52:22,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=252180.0, ans=0.0 2024-09-17 15:52:24,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=252180.0, ans=0.2 2024-09-17 15:52:25,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=252220.0, ans=0.0 2024-09-17 15:52:28,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=252220.0, ans=10.0 2024-09-17 15:52:38,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=252220.0, ans=0.125 2024-09-17 15:52:56,848 INFO [train.py:1198] (0/2) Epoch 14, batch 4250, loss[loss=0.2294, ctc_loss=0.1222, cr_loss=0.3553, attn_decoder_loss=0.2334, over 29518.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1518, cr_loss=0.3934, attn_decoder_loss=0.2581, over 5804732.15 frames. 
], batch size: 74, lr: 7.88e-03, grad_scale: 8.0 2024-09-17 15:53:31,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=252380.0, ans=0.0 2024-09-17 15:53:57,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=252460.0, ans=0.0 2024-09-17 15:54:06,787 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.779e+01 8.931e+01 9.548e+01 1.031e+02 6.441e+02, threshold=1.910e+02, percent-clipped=3.0 2024-09-17 15:54:08,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=252460.0, ans=0.125 2024-09-17 15:54:11,264 INFO [train.py:1198] (0/2) Epoch 14, batch 4300, loss[loss=0.2599, ctc_loss=0.1546, cr_loss=0.425, attn_decoder_loss=0.2622, over 29533.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1522, cr_loss=0.3936, attn_decoder_loss=0.2583, over 5793951.36 frames. ], batch size: 87, lr: 7.88e-03, grad_scale: 8.0 2024-09-17 15:54:12,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0 2024-09-17 15:54:21,011 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2024-09-17 15:54:49,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.39 vs. limit=15.0 2024-09-17 15:55:15,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=252660.0, ans=0.0 2024-09-17 15:55:25,799 INFO [train.py:1198] (0/2) Epoch 14, batch 4350, loss[loss=0.2722, ctc_loss=0.1589, cr_loss=0.4135, attn_decoder_loss=0.2756, over 29489.00 frames. 
], tot_loss[loss=0.2589, ctc_loss=0.1549, cr_loss=0.3985, attn_decoder_loss=0.2616, over 5795244.14 frames. ], batch size: 97, lr: 7.88e-03, grad_scale: 8.0 2024-09-17 15:55:30,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=252700.0, ans=0.125 2024-09-17 15:55:54,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=252780.0, ans=0.0 2024-09-17 15:55:56,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=252780.0, ans=0.0 2024-09-17 15:56:09,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=252820.0, ans=0.125 2024-09-17 15:56:12,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=252820.0, ans=0.0 2024-09-17 15:56:16,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=252820.0, ans=0.0 2024-09-17 15:56:27,371 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-09-17 15:56:35,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2024-09-17 15:56:35,458 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.459e+01 9.298e+01 9.711e+01 1.038e+02 2.895e+02, threshold=1.942e+02, percent-clipped=1.0 2024-09-17 15:56:39,910 INFO [train.py:1198] (0/2) Epoch 14, batch 4400, loss[loss=0.2583, ctc_loss=0.1569, cr_loss=0.3871, attn_decoder_loss=0.261, over 27302.00 frames. ], tot_loss[loss=0.2613, ctc_loss=0.1568, cr_loss=0.4012, attn_decoder_loss=0.264, over 5766591.99 frames. 
], batch size: 124, lr: 7.87e-03, grad_scale: 16.0 2024-09-17 15:56:45,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=252900.0, ans=0.1 2024-09-17 15:57:12,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=252980.0, ans=0.0 2024-09-17 15:57:27,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=253020.0, ans=0.125 2024-09-17 15:57:34,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=253020.0, ans=0.1 2024-09-17 15:57:36,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=253020.0, ans=10.0 2024-09-17 15:57:55,065 INFO [train.py:1198] (0/2) Epoch 14, batch 4450, loss[loss=0.2945, ctc_loss=0.2137, cr_loss=0.448, attn_decoder_loss=0.2935, over 19987.00 frames. ], tot_loss[loss=0.2645, ctc_loss=0.1618, cr_loss=0.4055, attn_decoder_loss=0.2669, over 5579062.46 frames. 
], batch size: 210, lr: 7.87e-03, grad_scale: 8.0 2024-09-17 15:58:07,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=253100.0, ans=0.0 2024-09-17 15:58:13,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=253140.0, ans=0.125 2024-09-17 15:58:30,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=253180.0, ans=0.025 2024-09-17 15:58:48,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=253220.0, ans=0.0 2024-09-17 15:58:48,602 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.37 vs. limit=10.0 2024-09-17 15:58:54,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=10.75 vs. limit=15.0 2024-09-17 15:59:02,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=253260.0, ans=0.1 2024-09-17 15:59:08,786 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.421e+01 1.004e+02 1.135e+02 1.248e+02 2.199e+02, threshold=2.271e+02, percent-clipped=1.0 2024-09-17 15:59:09,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=253300.0, ans=0.0 2024-09-17 15:59:10,364 INFO [train.py:1198] (0/2) Epoch 14, batch 4500, loss[loss=0.2795, ctc_loss=0.1869, cr_loss=0.3981, attn_decoder_loss=0.2809, over 20589.00 frames. ], tot_loss[loss=0.2672, ctc_loss=0.1674, cr_loss=0.4073, attn_decoder_loss=0.2693, over 5236849.44 frames. 
], batch size: 209, lr: 7.87e-03, grad_scale: 8.0 2024-09-17 15:59:27,258 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:59:34,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=253340.0, ans=0.125 2024-09-17 15:59:47,763 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-14.pt 2024-09-17 16:00:38,581 INFO [train.py:1198] (0/2) Epoch 15, batch 0, loss[loss=0.2413, ctc_loss=0.1374, cr_loss=0.3759, attn_decoder_loss=0.2445, over 29563.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1374, cr_loss=0.3759, attn_decoder_loss=0.2445, over 29563.00 frames. ], batch size: 73, lr: 7.60e-03, grad_scale: 16.0 2024-09-17 16:00:38,582 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 16:00:56,922 INFO [train.py:1230] (0/2) Epoch 15, validation: loss=0.2128, ctc_loss=0.04201, cr_loss=5.567e-15, attn_decoder_loss=0.2317, over 944034.00 frames. 2024-09-17 16:00:56,923 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 16:00:58,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=253400.0, ans=0.0 2024-09-17 16:01:02,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.75 vs. 
limit=15.0 2024-09-17 16:01:19,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=253440.0, ans=0.125 2024-09-17 16:01:39,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=253480.0, ans=0.2 2024-09-17 16:01:48,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=253520.0, ans=0.125 2024-09-17 16:02:03,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=253560.0, ans=0.025 2024-09-17 16:02:09,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.11 vs. limit=10.0 2024-09-17 16:02:15,230 INFO [train.py:1198] (0/2) Epoch 15, batch 50, loss[loss=0.2306, ctc_loss=0.1325, cr_loss=0.356, attn_decoder_loss=0.2336, over 29463.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.154, cr_loss=0.3981, attn_decoder_loss=0.2588, over 1269337.52 frames. ], batch size: 70, lr: 7.60e-03, grad_scale: 8.0 2024-09-17 16:02:17,582 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.89 vs. 
limit=15.0 2024-09-17 16:02:20,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=253600.0, ans=0.0 2024-09-17 16:02:24,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=253600.0, ans=0.2 2024-09-17 16:02:53,177 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 9.844e+01 1.055e+02 1.171e+02 3.873e+02, threshold=2.109e+02, percent-clipped=1.0 2024-09-17 16:03:21,738 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.89 vs. limit=22.5 2024-09-17 16:03:33,144 INFO [train.py:1198] (0/2) Epoch 15, batch 100, loss[loss=0.2503, ctc_loss=0.1535, cr_loss=0.3923, attn_decoder_loss=0.2523, over 29538.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1565, cr_loss=0.4014, attn_decoder_loss=0.2617, over 2252512.09 frames. ], batch size: 76, lr: 7.59e-03, grad_scale: 8.0 2024-09-17 16:03:39,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=253800.0, ans=0.1 2024-09-17 16:03:56,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=253840.0, ans=0.0 2024-09-17 16:04:00,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=253840.0, ans=0.125 2024-09-17 16:04:31,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.40 vs. limit=15.0 2024-09-17 16:04:47,906 INFO [train.py:1198] (0/2) Epoch 15, batch 150, loss[loss=0.2342, ctc_loss=0.1379, cr_loss=0.3832, attn_decoder_loss=0.2364, over 29405.00 frames. 
], tot_loss[loss=0.256, ctc_loss=0.1529, cr_loss=0.3957, attn_decoder_loss=0.2587, over 3046931.10 frames. ], batch size: 70, lr: 7.59e-03, grad_scale: 8.0
2024-09-17 16:04:50,446 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.94 vs. limit=10.0
2024-09-17 16:05:08,446 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.33 vs. limit=15.0
2024-09-17 16:05:10,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=254040.0, ans=0.2
2024-09-17 16:05:25,782 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 8.837e+01 9.448e+01 1.022e+02 1.353e+02, threshold=1.890e+02, percent-clipped=0.0
2024-09-17 16:05:59,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=254160.0, ans=0.125
2024-09-17 16:06:03,418 INFO [train.py:1198] (0/2) Epoch 15, batch 200, loss[loss=0.2586, ctc_loss=0.1502, cr_loss=0.3838, attn_decoder_loss=0.2621, over 27273.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1521, cr_loss=0.3947, attn_decoder_loss=0.2577, over 3658384.43 frames.
], batch size: 124, lr: 7.59e-03, grad_scale: 8.0
2024-09-17 16:06:10,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=254200.0, ans=0.125
2024-09-17 16:06:44,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=254280.0, ans=0.5
2024-09-17 16:07:19,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=254360.0, ans=0.125
2024-09-17 16:07:24,607 INFO [train.py:1198] (0/2) Epoch 15, batch 250, loss[loss=0.2749, ctc_loss=0.1682, cr_loss=0.4344, attn_decoder_loss=0.2771, over 29250.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.152, cr_loss=0.3949, attn_decoder_loss=0.2578, over 4140721.06 frames. ], batch size: 100, lr: 7.58e-03, grad_scale: 8.0
2024-09-17 16:07:27,145 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0
2024-09-17 16:07:45,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=254440.0, ans=15.0
2024-09-17 16:08:02,186 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.896e+01 9.283e+01 1.022e+02 2.095e+02, threshold=1.857e+02, percent-clipped=1.0
2024-09-17 16:08:26,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=254560.0, ans=0.125
2024-09-17 16:08:39,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=254600.0, ans=0.125
2024-09-17 16:08:40,127 INFO [train.py:1198] (0/2) Epoch 15, batch 300, loss[loss=0.2755, ctc_loss=0.1735, cr_loss=0.4303, attn_decoder_loss=0.2773, over 29516.00 frames.
], tot_loss[loss=0.2548, ctc_loss=0.1519, cr_loss=0.3943, attn_decoder_loss=0.2575, over 4510343.75 frames. ], batch size: 92, lr: 7.58e-03, grad_scale: 8.0
2024-09-17 16:08:47,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=254600.0, ans=0.05
2024-09-17 16:08:52,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=254600.0, ans=0.0
2024-09-17 16:09:45,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=254760.0, ans=0.0
2024-09-17 16:09:49,508 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.56 vs. limit=15.0
2024-09-17 16:09:50,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=254760.0, ans=0.125
2024-09-17 16:09:54,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=254800.0, ans=0.125
2024-09-17 16:09:54,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=254800.0, ans=0.05
2024-09-17 16:09:56,022 INFO [train.py:1198] (0/2) Epoch 15, batch 350, loss[loss=0.2247, ctc_loss=0.1305, cr_loss=0.3487, attn_decoder_loss=0.2274, over 29329.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1518, cr_loss=0.3951, attn_decoder_loss=0.2579, over 4795972.86 frames. ], batch size: 71, lr: 7.58e-03, grad_scale: 8.0
2024-09-17 16:10:00,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.22 vs.
limit=22.5
2024-09-17 16:10:15,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=254840.0, ans=0.125
2024-09-17 16:10:15,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=254840.0, ans=0.125
2024-09-17 16:10:16,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=254840.0, ans=0.125
2024-09-17 16:10:18,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=254840.0, ans=0.2
2024-09-17 16:10:23,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.07 vs. limit=22.5
2024-09-17 16:10:32,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.89 vs. limit=15.0
2024-09-17 16:10:35,870 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.706e+01 9.394e+01 1.041e+02 2.813e+02, threshold=1.879e+02, percent-clipped=1.0
2024-09-17 16:10:38,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0
2024-09-17 16:10:51,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=254920.0, ans=0.0
2024-09-17 16:10:55,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=254920.0, ans=0.125
2024-09-17 16:11:16,018 INFO [train.py:1198] (0/2) Epoch 15, batch 400, loss[loss=0.2605, ctc_loss=0.1531, cr_loss=0.3978, attn_decoder_loss=0.2636, over 29730.00 frames.
], tot_loss[loss=0.2551, ctc_loss=0.1518, cr_loss=0.3947, attn_decoder_loss=0.2578, over 5024796.63 frames. ], batch size: 82, lr: 7.58e-03, grad_scale: 16.0
2024-09-17 16:11:17,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=255000.0, ans=0.0
2024-09-17 16:11:32,512 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.61 vs. limit=15.0
2024-09-17 16:11:37,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=255040.0, ans=0.125
2024-09-17 16:12:15,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=255160.0, ans=0.125
2024-09-17 16:12:31,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=255200.0, ans=0.125
2024-09-17 16:12:32,477 INFO [train.py:1198] (0/2) Epoch 15, batch 450, loss[loss=0.2729, ctc_loss=0.1576, cr_loss=0.4117, attn_decoder_loss=0.2766, over 29704.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1519, cr_loss=0.3942, attn_decoder_loss=0.258, over 5187054.86 frames. ], batch size: 83, lr: 7.57e-03, grad_scale: 8.0
2024-09-17 16:12:35,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=255200.0, ans=0.05
2024-09-17 16:12:46,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=255240.0, ans=0.125
2024-09-17 16:12:50,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.29 vs.
limit=22.5
2024-09-17 16:12:54,905 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=22.5
2024-09-17 16:13:09,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=255280.0, ans=0.125
2024-09-17 16:13:10,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=255280.0, ans=0.0
2024-09-17 16:13:11,919 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.843e+01 9.415e+01 1.015e+02 2.907e+02, threshold=1.883e+02, percent-clipped=1.0
2024-09-17 16:13:16,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=255320.0, ans=0.125
2024-09-17 16:13:37,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.33 vs. limit=15.0
2024-09-17 16:13:38,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=255360.0, ans=0.2
2024-09-17 16:13:48,522 INFO [train.py:1198] (0/2) Epoch 15, batch 500, loss[loss=0.2755, ctc_loss=0.1678, cr_loss=0.4305, attn_decoder_loss=0.2778, over 29436.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1509, cr_loss=0.3928, attn_decoder_loss=0.257, over 5329737.46 frames.
], batch size: 94, lr: 7.57e-03, grad_scale: 8.0
2024-09-17 16:13:50,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=255400.0, ans=0.0
2024-09-17 16:14:08,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=255440.0, ans=0.04949747468305833
2024-09-17 16:14:21,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=255480.0, ans=0.0
2024-09-17 16:14:22,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=255480.0, ans=0.0
2024-09-17 16:15:07,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=255600.0, ans=0.125
2024-09-17 16:15:08,807 INFO [train.py:1198] (0/2) Epoch 15, batch 550, loss[loss=0.2684, ctc_loss=0.159, cr_loss=0.4169, attn_decoder_loss=0.2713, over 28792.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.151, cr_loss=0.3923, attn_decoder_loss=0.2569, over 5421703.10 frames. ], batch size: 104, lr: 7.57e-03, grad_scale: 8.0
2024-09-17 16:15:48,133 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.993e+01 9.917e+01 1.076e+02 7.641e+02, threshold=1.983e+02, percent-clipped=4.0
2024-09-17 16:15:51,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=255680.0, ans=0.0
2024-09-17 16:16:06,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=255720.0, ans=0.125
2024-09-17 16:16:10,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.56 vs.
limit=12.0
2024-09-17 16:16:24,510 INFO [train.py:1198] (0/2) Epoch 15, batch 600, loss[loss=0.2647, ctc_loss=0.158, cr_loss=0.4115, attn_decoder_loss=0.2674, over 29196.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1504, cr_loss=0.3913, attn_decoder_loss=0.2568, over 5507484.68 frames. ], batch size: 100, lr: 7.56e-03, grad_scale: 8.0
2024-09-17 16:16:30,256 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=12.0
2024-09-17 16:16:31,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=255800.0, ans=0.125
2024-09-17 16:16:37,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.88 vs. limit=15.0
2024-09-17 16:16:49,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=255840.0, ans=0.0
2024-09-17 16:17:17,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=255920.0, ans=0.125
2024-09-17 16:17:25,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=255960.0, ans=0.0
2024-09-17 16:17:35,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=255960.0, ans=0.0
2024-09-17 16:17:38,683 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-64000.pt
2024-09-17 16:17:47,272 INFO [train.py:1198] (0/2) Epoch 15, batch 650, loss[loss=0.2421, ctc_loss=0.139, cr_loss=0.3733, attn_decoder_loss=0.2453, over 29746.00 frames.
], tot_loss[loss=0.2536, ctc_loss=0.15, cr_loss=0.3907, attn_decoder_loss=0.2564, over 5584868.09 frames. ], batch size: 81, lr: 7.56e-03, grad_scale: 8.0
2024-09-17 16:18:08,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=256040.0, ans=0.125
2024-09-17 16:18:22,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=256080.0, ans=10.0
2024-09-17 16:18:28,867 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.547e+01 9.070e+01 9.577e+01 1.264e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-17 16:18:42,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=256120.0, ans=0.0
2024-09-17 16:18:53,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=256160.0, ans=0.0
2024-09-17 16:19:07,304 INFO [train.py:1198] (0/2) Epoch 15, batch 700, loss[loss=0.243, ctc_loss=0.1436, cr_loss=0.3752, attn_decoder_loss=0.2457, over 29529.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1504, cr_loss=0.3918, attn_decoder_loss=0.257, over 5635969.80 frames. ], batch size: 76, lr: 7.56e-03, grad_scale: 8.0
2024-09-17 16:20:08,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=22.5
2024-09-17 16:20:10,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=256360.0, ans=10.0
2024-09-17 16:20:23,553 INFO [train.py:1198] (0/2) Epoch 15, batch 750, loss[loss=0.256, ctc_loss=0.1482, cr_loss=0.3851, attn_decoder_loss=0.2595, over 29710.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1501, cr_loss=0.3911, attn_decoder_loss=0.2569, over 5675552.67 frames.
], batch size: 82, lr: 7.55e-03, grad_scale: 8.0
2024-09-17 16:20:25,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=256400.0, ans=0.0
2024-09-17 16:21:02,922 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.739e+01 9.295e+01 9.820e+01 3.813e+02, threshold=1.859e+02, percent-clipped=2.0
2024-09-17 16:21:10,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256520.0, ans=0.1
2024-09-17 16:21:39,086 INFO [train.py:1198] (0/2) Epoch 15, batch 800, loss[loss=0.2232, ctc_loss=0.1239, cr_loss=0.3518, attn_decoder_loss=0.2265, over 29618.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1506, cr_loss=0.3919, attn_decoder_loss=0.2571, over 5706763.84 frames. ], batch size: 73, lr: 7.55e-03, grad_scale: 16.0
2024-09-17 16:22:26,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=256720.0, ans=0.0
2024-09-17 16:22:34,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=256720.0, ans=0.2
2024-09-17 16:22:37,701 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.34 vs. limit=15.0
2024-09-17 16:22:56,861 INFO [train.py:1198] (0/2) Epoch 15, batch 850, loss[loss=0.2681, ctc_loss=0.1592, cr_loss=0.4215, attn_decoder_loss=0.2708, over 29732.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1496, cr_loss=0.3903, attn_decoder_loss=0.2564, over 5736590.34 frames.
], batch size: 89, lr: 7.55e-03, grad_scale: 8.0
2024-09-17 16:23:00,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=256800.0, ans=0.125
2024-09-17 16:23:06,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=256800.0, ans=0.125
2024-09-17 16:23:12,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=256840.0, ans=0.0
2024-09-17 16:23:18,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.59 vs. limit=12.0
2024-09-17 16:23:32,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=256880.0, ans=0.0
2024-09-17 16:23:33,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=256880.0, ans=0.125
2024-09-17 16:23:36,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=256880.0, ans=0.125
2024-09-17 16:23:39,536 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.676e+01 9.240e+01 9.818e+01 3.041e+02, threshold=1.848e+02, percent-clipped=1.0
2024-09-17 16:23:46,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=256920.0, ans=0.07
2024-09-17 16:23:51,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0
2024-09-17 16:24:14,685 INFO [train.py:1198] (0/2) Epoch 15, batch 900, loss[loss=0.2342, ctc_loss=0.1326, cr_loss=0.3706, attn_decoder_loss=0.2372, over 29592.00 frames.
], tot_loss[loss=0.2539, ctc_loss=0.1501, cr_loss=0.3905, attn_decoder_loss=0.2568, over 5740625.06 frames. ], batch size: 73, lr: 7.55e-03, grad_scale: 8.0
2024-09-17 16:24:15,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=257000.0, ans=0.2
2024-09-17 16:24:37,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=257040.0, ans=0.2
2024-09-17 16:24:59,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=257120.0, ans=0.0
2024-09-17 16:25:14,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=257160.0, ans=0.5
2024-09-17 16:25:17,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=257160.0, ans=0.07
2024-09-17 16:25:27,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=257160.0, ans=0.0
2024-09-17 16:25:30,433 INFO [train.py:1198] (0/2) Epoch 15, batch 950, loss[loss=0.23, ctc_loss=0.1265, cr_loss=0.3501, attn_decoder_loss=0.2337, over 29518.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1499, cr_loss=0.3902, attn_decoder_loss=0.2569, over 5742006.31 frames. ], batch size: 74, lr: 7.54e-03, grad_scale: 8.0
2024-09-17 16:25:32,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.69 vs.
limit=15.0
2024-09-17 16:25:45,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=257240.0, ans=0.125
2024-09-17 16:25:47,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=257240.0, ans=0.0
2024-09-17 16:25:53,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=257240.0, ans=0.0
2024-09-17 16:25:57,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=12.0
2024-09-17 16:26:13,743 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.071e+01 9.347e+01 1.011e+02 1.116e+02 3.125e+02, threshold=2.021e+02, percent-clipped=4.0
2024-09-17 16:26:19,479 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.60 vs. limit=10.0
2024-09-17 16:26:50,490 INFO [train.py:1198] (0/2) Epoch 15, batch 1000, loss[loss=0.2416, ctc_loss=0.1357, cr_loss=0.3761, attn_decoder_loss=0.245, over 29519.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1511, cr_loss=0.3918, attn_decoder_loss=0.2577, over 5736406.60 frames. ], batch size: 77, lr: 7.54e-03, grad_scale: 8.0
2024-09-17 16:26:55,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.22 vs. limit=10.0
2024-09-17 16:27:04,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=257440.0, ans=0.125
2024-09-17 16:27:05,291 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.87 vs.
limit=22.5
2024-09-17 16:28:02,463 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 16:28:06,975 INFO [train.py:1198] (0/2) Epoch 15, batch 1050, loss[loss=0.2621, ctc_loss=0.1541, cr_loss=0.3909, attn_decoder_loss=0.2655, over 29691.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1503, cr_loss=0.39, attn_decoder_loss=0.2569, over 5743830.88 frames. ], batch size: 85, lr: 7.54e-03, grad_scale: 8.0
2024-09-17 16:28:27,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=257640.0, ans=0.125
2024-09-17 16:28:28,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=257640.0, ans=0.0
2024-09-17 16:28:28,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=257640.0, ans=0.0
2024-09-17 16:28:30,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=257640.0, ans=0.0
2024-09-17 16:28:48,451 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.880e+01 9.520e+01 1.043e+02 1.808e+02, threshold=1.904e+02, percent-clipped=0.0
2024-09-17 16:28:55,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.33 vs. limit=6.0
2024-09-17 16:29:23,481 INFO [train.py:1198] (0/2) Epoch 15, batch 1100, loss[loss=0.2488, ctc_loss=0.1476, cr_loss=0.3769, attn_decoder_loss=0.2517, over 29444.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1504, cr_loss=0.3906, attn_decoder_loss=0.257, over 5755748.26 frames.
], batch size: 78, lr: 7.53e-03, grad_scale: 8.0
2024-09-17 16:29:28,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=257800.0, ans=0.0
2024-09-17 16:29:35,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=257800.0, ans=0.07
2024-09-17 16:30:03,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=257880.0, ans=0.125
2024-09-17 16:30:05,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.54 vs. limit=15.0
2024-09-17 16:30:43,787 INFO [train.py:1198] (0/2) Epoch 15, batch 1150, loss[loss=0.2485, ctc_loss=0.1492, cr_loss=0.4006, attn_decoder_loss=0.2506, over 29458.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1504, cr_loss=0.3904, attn_decoder_loss=0.2569, over 5755823.87 frames.
], batch size: 78, lr: 7.53e-03, grad_scale: 8.0
2024-09-17 16:30:50,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=258000.0, ans=0.0
2024-09-17 16:30:53,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=258000.0, ans=0.125
2024-09-17 16:31:03,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=258040.0, ans=0.125
2024-09-17 16:31:08,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=258040.0, ans=0.2
2024-09-17 16:31:10,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=258040.0, ans=0.0
2024-09-17 16:31:16,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=258080.0, ans=0.0
2024-09-17 16:31:22,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=258080.0, ans=10.0
2024-09-17 16:31:25,063 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.854e+01 8.950e+01 9.429e+01 1.052e+02 4.091e+02, threshold=1.886e+02, percent-clipped=2.0
2024-09-17 16:31:43,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=258160.0, ans=0.125
2024-09-17 16:31:47,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=258160.0, ans=0.125
2024-09-17 16:31:56,163 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs.
limit=15.0
2024-09-17 16:31:57,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=258160.0, ans=0.125
2024-09-17 16:32:00,022 INFO [train.py:1198] (0/2) Epoch 15, batch 1200, loss[loss=0.262, ctc_loss=0.1578, cr_loss=0.3966, attn_decoder_loss=0.2648, over 29686.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1513, cr_loss=0.3921, attn_decoder_loss=0.2579, over 5748659.94 frames. ], batch size: 85, lr: 7.53e-03, grad_scale: 16.0
2024-09-17 16:32:18,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=258240.0, ans=0.0
2024-09-17 16:32:34,643 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.53 vs. limit=22.5
2024-09-17 16:32:44,167 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2024-09-17 16:32:56,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=258320.0, ans=0.2
2024-09-17 16:32:56,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=258320.0, ans=0.0
2024-09-17 16:33:06,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=258360.0, ans=0.95
2024-09-17 16:33:15,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=258400.0, ans=0.125
2024-09-17 16:33:16,675 INFO [train.py:1198] (0/2) Epoch 15, batch 1250, loss[loss=0.2705, ctc_loss=0.1683, cr_loss=0.4284, attn_decoder_loss=0.2723, over 29544.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1516, cr_loss=0.3932, attn_decoder_loss=0.2583, over 5776231.78 frames.
], batch size: 92, lr: 7.53e-03, grad_scale: 8.0
2024-09-17 16:33:26,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=258400.0, ans=0.125
2024-09-17 16:33:29,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=258400.0, ans=0.0
2024-09-17 16:33:37,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=258440.0, ans=0.025
2024-09-17 16:33:46,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=258480.0, ans=0.0
2024-09-17 16:33:59,437 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.854e+01 8.868e+01 9.518e+01 1.036e+02 1.703e+02, threshold=1.904e+02, percent-clipped=0.0
2024-09-17 16:34:09,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=258520.0, ans=0.0
2024-09-17 16:34:18,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=258560.0, ans=0.2
2024-09-17 16:34:20,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258560.0, ans=0.1
2024-09-17 16:34:24,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=258560.0, ans=0.5
2024-09-17 16:34:37,185 INFO [train.py:1198] (0/2) Epoch 15, batch 1300, loss[loss=0.2741, ctc_loss=0.166, cr_loss=0.4204, attn_decoder_loss=0.2768, over 28290.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.151, cr_loss=0.3925, attn_decoder_loss=0.2577, over 5780101.10 frames.
], batch size: 111, lr: 7.52e-03, grad_scale: 8.0
2024-09-17 16:35:10,215 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.10 vs. limit=15.0
2024-09-17 16:35:14,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=258680.0, ans=0.125
2024-09-17 16:35:35,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=258720.0, ans=0.0
2024-09-17 16:35:53,263 INFO [train.py:1198] (0/2) Epoch 15, batch 1350, loss[loss=0.2473, ctc_loss=0.1468, cr_loss=0.3856, attn_decoder_loss=0.2499, over 29761.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1503, cr_loss=0.3921, attn_decoder_loss=0.2572, over 5796704.36 frames. ], batch size: 81, lr: 7.52e-03, grad_scale: 8.0
2024-09-17 16:36:18,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=258840.0, ans=22.5
2024-09-17 16:36:32,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=258880.0, ans=0.125
2024-09-17 16:36:35,392 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.925e+01 9.317e+01 1.009e+02 1.483e+02, threshold=1.863e+02, percent-clipped=0.0
2024-09-17 16:36:40,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=258920.0, ans=0.125
2024-09-17 16:36:43,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs.
limit=15.0 2024-09-17 16:37:04,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=258960.0, ans=0.05 2024-09-17 16:37:08,673 INFO [train.py:1198] (0/2) Epoch 15, batch 1400, loss[loss=0.2263, ctc_loss=0.1256, cr_loss=0.3269, attn_decoder_loss=0.2303, over 29559.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1499, cr_loss=0.391, attn_decoder_loss=0.257, over 5807329.97 frames. ], batch size: 69, lr: 7.52e-03, grad_scale: 8.0 2024-09-17 16:37:31,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=259040.0, ans=0.125 2024-09-17 16:37:51,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259080.0, ans=0.1 2024-09-17 16:38:19,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=259160.0, ans=0.125 2024-09-17 16:38:19,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259160.0, ans=0.1 2024-09-17 16:38:27,012 INFO [train.py:1198] (0/2) Epoch 15, batch 1450, loss[loss=0.2669, ctc_loss=0.1698, cr_loss=0.4141, attn_decoder_loss=0.2685, over 29428.00 frames. ], tot_loss[loss=0.2549, ctc_loss=0.1507, cr_loss=0.3921, attn_decoder_loss=0.2577, over 5803354.57 frames. 
], batch size: 94, lr: 7.51e-03, grad_scale: 8.0 2024-09-17 16:38:28,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=259200.0, ans=0.125 2024-09-17 16:38:50,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=259240.0, ans=0.125 2024-09-17 16:38:58,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=259280.0, ans=0.125 2024-09-17 16:39:05,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=259280.0, ans=0.125 2024-09-17 16:39:11,108 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.827e+01 9.578e+01 1.049e+02 2.248e+02, threshold=1.916e+02, percent-clipped=2.0 2024-09-17 16:39:41,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=259360.0, ans=0.125 2024-09-17 16:39:44,393 INFO [train.py:1198] (0/2) Epoch 15, batch 1500, loss[loss=0.2647, ctc_loss=0.1536, cr_loss=0.413, attn_decoder_loss=0.2678, over 29647.00 frames. ], tot_loss[loss=0.2553, ctc_loss=0.1511, cr_loss=0.3931, attn_decoder_loss=0.2582, over 5805023.82 frames. ], batch size: 86, lr: 7.51e-03, grad_scale: 8.0 2024-09-17 16:39:44,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=259400.0, ans=0.125 2024-09-17 16:39:47,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=259400.0, ans=0.125 2024-09-17 16:39:49,718 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. 
limit=6.0 2024-09-17 16:40:35,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=259520.0, ans=0.125 2024-09-17 16:40:40,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=259520.0, ans=0.125 2024-09-17 16:40:44,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=259560.0, ans=0.125 2024-09-17 16:40:50,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259560.0, ans=0.1 2024-09-17 16:40:55,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=259560.0, ans=0.125 2024-09-17 16:41:00,899 INFO [train.py:1198] (0/2) Epoch 15, batch 1550, loss[loss=0.279, ctc_loss=0.1746, cr_loss=0.4483, attn_decoder_loss=0.2806, over 29518.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1518, cr_loss=0.3937, attn_decoder_loss=0.2583, over 5779703.75 frames. ], batch size: 90, lr: 7.51e-03, grad_scale: 8.0 2024-09-17 16:41:09,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=8.0 2024-09-17 16:41:25,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=12.0 2024-09-17 16:41:28,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.46 vs. 
limit=15.0 2024-09-17 16:41:31,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=259680.0, ans=0.125 2024-09-17 16:41:42,748 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.774e+01 9.466e+01 1.042e+02 2.668e+02, threshold=1.893e+02, percent-clipped=3.0 2024-09-17 16:41:59,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=259720.0, ans=0.125 2024-09-17 16:42:03,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=259760.0, ans=0.125 2024-09-17 16:42:12,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=259760.0, ans=0.125 2024-09-17 16:42:19,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.12 vs. limit=15.0 2024-09-17 16:42:20,434 INFO [train.py:1198] (0/2) Epoch 15, batch 1600, loss[loss=0.2586, ctc_loss=0.1542, cr_loss=0.4028, attn_decoder_loss=0.2612, over 29660.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1515, cr_loss=0.393, attn_decoder_loss=0.2579, over 5761772.11 frames. 
], batch size: 85, lr: 7.51e-03, grad_scale: 16.0 2024-09-17 16:42:35,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=259840.0, ans=0.025 2024-09-17 16:42:52,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=259880.0, ans=0.125 2024-09-17 16:42:58,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=259880.0, ans=0.125 2024-09-17 16:43:01,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=259880.0, ans=0.125 2024-09-17 16:43:04,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=259920.0, ans=0.0 2024-09-17 16:43:10,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=259920.0, ans=0.125 2024-09-17 16:43:15,807 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.34 vs. limit=12.0 2024-09-17 16:43:15,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2024-09-17 16:43:36,457 INFO [train.py:1198] (0/2) Epoch 15, batch 1650, loss[loss=0.2681, ctc_loss=0.1583, cr_loss=0.3969, attn_decoder_loss=0.2715, over 29718.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1517, cr_loss=0.3931, attn_decoder_loss=0.258, over 5757871.51 frames. 
], batch size: 89, lr: 7.50e-03, grad_scale: 8.0 2024-09-17 16:43:56,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=260040.0, ans=0.2 2024-09-17 16:44:08,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=260080.0, ans=0.2 2024-09-17 16:44:11,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=260080.0, ans=0.2 2024-09-17 16:44:17,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=260080.0, ans=0.025 2024-09-17 16:44:20,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.810e+01 9.523e+01 1.053e+02 2.945e+02, threshold=1.905e+02, percent-clipped=2.0 2024-09-17 16:44:26,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=260120.0, ans=0.0 2024-09-17 16:44:28,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=260120.0, ans=0.2 2024-09-17 16:44:36,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.48 vs. limit=15.0 2024-09-17 16:44:41,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=260160.0, ans=0.2 2024-09-17 16:44:44,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=260160.0, ans=0.035 2024-09-17 16:44:45,158 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.30 vs. 
limit=15.0 2024-09-17 16:44:46,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=260160.0, ans=0.125 2024-09-17 16:44:50,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=260200.0, ans=0.09899494936611666 2024-09-17 16:44:51,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=15.0 2024-09-17 16:44:51,638 INFO [train.py:1198] (0/2) Epoch 15, batch 1700, loss[loss=0.2294, ctc_loss=0.138, cr_loss=0.3753, attn_decoder_loss=0.2313, over 29560.00 frames. ], tot_loss[loss=0.2551, ctc_loss=0.1515, cr_loss=0.3931, attn_decoder_loss=0.2579, over 5779649.02 frames. ], batch size: 69, lr: 7.50e-03, grad_scale: 8.0 2024-09-17 16:44:54,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=260200.0, ans=0.2 2024-09-17 16:44:54,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=260200.0, ans=0.0 2024-09-17 16:44:56,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=260200.0, ans=0.2 2024-09-17 16:45:04,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.48 vs. 
limit=15.0 2024-09-17 16:45:16,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=260240.0, ans=0.125 2024-09-17 16:45:17,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=260240.0, ans=0.0 2024-09-17 16:45:23,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=260280.0, ans=0.125 2024-09-17 16:45:23,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=260280.0, ans=0.0 2024-09-17 16:45:52,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=260360.0, ans=0.1 2024-09-17 16:46:11,460 INFO [train.py:1198] (0/2) Epoch 15, batch 1750, loss[loss=0.2236, ctc_loss=0.1288, cr_loss=0.3571, attn_decoder_loss=0.2262, over 29390.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1507, cr_loss=0.3922, attn_decoder_loss=0.2572, over 5789162.43 frames. ], batch size: 67, lr: 7.50e-03, grad_scale: 8.0 2024-09-17 16:46:25,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=260440.0, ans=0.025 2024-09-17 16:46:27,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=260440.0, ans=0.025 2024-09-17 16:46:27,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2024-09-17 16:46:44,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.49 vs. 
limit=22.5 2024-09-17 16:46:48,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.45 vs. limit=12.0 2024-09-17 16:46:54,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. limit=10.0 2024-09-17 16:46:55,375 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.854e+01 9.659e+01 1.042e+02 2.660e+02, threshold=1.932e+02, percent-clipped=4.0 2024-09-17 16:47:01,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=260520.0, ans=0.07 2024-09-17 16:47:08,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.28 vs. limit=12.0 2024-09-17 16:47:26,582 INFO [train.py:1198] (0/2) Epoch 15, batch 1800, loss[loss=0.2603, ctc_loss=0.1563, cr_loss=0.3984, attn_decoder_loss=0.263, over 29688.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1504, cr_loss=0.392, attn_decoder_loss=0.2571, over 5790405.36 frames. ], batch size: 83, lr: 7.49e-03, grad_scale: 8.0 2024-09-17 16:47:39,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=260600.0, ans=0.0 2024-09-17 16:47:49,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=260640.0, ans=0.0 2024-09-17 16:47:56,723 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.96 vs. 
limit=12.0 2024-09-17 16:48:08,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=260680.0, ans=0.125 2024-09-17 16:48:29,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=260760.0, ans=0.125 2024-09-17 16:48:35,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=260760.0, ans=0.125 2024-09-17 16:48:37,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=260760.0, ans=0.2 2024-09-17 16:48:43,406 INFO [train.py:1198] (0/2) Epoch 15, batch 1850, loss[loss=0.2576, ctc_loss=0.1452, cr_loss=0.3923, attn_decoder_loss=0.2614, over 29620.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1502, cr_loss=0.3921, attn_decoder_loss=0.2571, over 5797155.55 frames. ], batch size: 86, lr: 7.49e-03, grad_scale: 8.0 2024-09-17 16:49:27,360 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.760e+01 9.313e+01 1.001e+02 1.511e+02, threshold=1.863e+02, percent-clipped=0.0 2024-09-17 16:49:36,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=260920.0, ans=0.0 2024-09-17 16:50:01,057 INFO [train.py:1198] (0/2) Epoch 15, batch 1900, loss[loss=0.2616, ctc_loss=0.1449, cr_loss=0.3835, attn_decoder_loss=0.2661, over 29666.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1501, cr_loss=0.3928, attn_decoder_loss=0.2574, over 5805517.69 frames. 
], batch size: 89, lr: 7.49e-03, grad_scale: 8.0 2024-09-17 16:50:23,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=261040.0, ans=0.2 2024-09-17 16:51:03,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.93 vs. limit=15.0 2024-09-17 16:51:07,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=261160.0, ans=0.2 2024-09-17 16:51:10,997 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-09-17 16:51:13,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=261160.0, ans=0.125 2024-09-17 16:51:18,992 INFO [train.py:1198] (0/2) Epoch 15, batch 1950, loss[loss=0.2577, ctc_loss=0.1562, cr_loss=0.4006, attn_decoder_loss=0.2601, over 29449.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.1511, cr_loss=0.3953, attn_decoder_loss=0.2587, over 5820124.12 frames. 
], batch size: 78, lr: 7.49e-03, grad_scale: 8.0 2024-09-17 16:51:35,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=261240.0, ans=0.0 2024-09-17 16:51:56,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=261280.0, ans=0.2 2024-09-17 16:52:02,591 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.191e+01 8.962e+01 9.463e+01 1.031e+02 5.545e+02, threshold=1.893e+02, percent-clipped=1.0 2024-09-17 16:52:27,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261360.0, ans=0.1 2024-09-17 16:52:34,222 INFO [train.py:1198] (0/2) Epoch 15, batch 2000, loss[loss=0.2207, ctc_loss=0.125, cr_loss=0.3377, attn_decoder_loss=0.2238, over 29363.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1517, cr_loss=0.3955, attn_decoder_loss=0.2591, over 5797425.66 frames. ], batch size: 67, lr: 7.48e-03, grad_scale: 16.0 2024-09-17 16:53:00,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=261440.0, ans=0.125 2024-09-17 16:53:11,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.25 vs. limit=10.0 2024-09-17 16:53:16,170 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=15.0 2024-09-17 16:53:34,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.16 vs. limit=15.0 2024-09-17 16:53:54,998 INFO [train.py:1198] (0/2) Epoch 15, batch 2050, loss[loss=0.2258, ctc_loss=0.1226, cr_loss=0.3393, attn_decoder_loss=0.2297, over 29450.00 frames. 
], tot_loss[loss=0.2553, ctc_loss=0.151, cr_loss=0.3934, attn_decoder_loss=0.2581, over 5790028.81 frames. ], batch size: 70, lr: 7.48e-03, grad_scale: 8.0 2024-09-17 16:53:56,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=261600.0, ans=0.125 2024-09-17 16:54:07,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=261600.0, ans=0.0 2024-09-17 16:54:19,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=261640.0, ans=0.2 2024-09-17 16:54:40,624 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.073e+01 9.648e+01 1.067e+02 2.180e+02, threshold=1.930e+02, percent-clipped=1.0 2024-09-17 16:54:46,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=261720.0, ans=0.2 2024-09-17 16:54:46,954 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:54:48,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=261720.0, ans=0.125 2024-09-17 16:54:58,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=261760.0, ans=0.1 2024-09-17 16:55:10,759 INFO [train.py:1198] (0/2) Epoch 15, batch 2100, loss[loss=0.2566, ctc_loss=0.1466, cr_loss=0.3848, attn_decoder_loss=0.2603, over 29762.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1499, cr_loss=0.391, attn_decoder_loss=0.257, over 5802229.02 frames. 
], batch size: 81, lr: 7.48e-03, grad_scale: 8.0 2024-09-17 16:55:12,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=261800.0, ans=0.0 2024-09-17 16:55:13,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.95 vs. limit=22.5 2024-09-17 16:55:42,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=261880.0, ans=0.2 2024-09-17 16:55:50,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=261880.0, ans=10.0 2024-09-17 16:55:56,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=261920.0, ans=0.0 2024-09-17 16:56:25,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=262000.0, ans=0.125 2024-09-17 16:56:26,348 INFO [train.py:1198] (0/2) Epoch 15, batch 2150, loss[loss=0.2517, ctc_loss=0.1524, cr_loss=0.4101, attn_decoder_loss=0.2536, over 29455.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1491, cr_loss=0.3902, attn_decoder_loss=0.2561, over 5816688.95 frames. ], batch size: 78, lr: 7.47e-03, grad_scale: 8.0 2024-09-17 16:56:31,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=262000.0, ans=0.1 2024-09-17 16:56:33,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.08 vs. 
limit=15.0 2024-09-17 16:57:11,851 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.623e+01 9.076e+01 9.705e+01 5.465e+02, threshold=1.815e+02, percent-clipped=1.0 2024-09-17 16:57:14,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.25 vs. limit=15.0 2024-09-17 16:57:40,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=262200.0, ans=0.125 2024-09-17 16:57:41,951 INFO [train.py:1198] (0/2) Epoch 15, batch 2200, loss[loss=0.2733, ctc_loss=0.1596, cr_loss=0.4107, attn_decoder_loss=0.2768, over 29628.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1496, cr_loss=0.3913, attn_decoder_loss=0.2568, over 5812763.10 frames. ], batch size: 86, lr: 7.47e-03, grad_scale: 8.0 2024-09-17 16:57:45,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=262200.0, ans=0.125 2024-09-17 16:57:59,570 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2024-09-17 16:58:03,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=262240.0, ans=0.1 2024-09-17 16:58:18,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=262280.0, ans=0.025 2024-09-17 16:58:18,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.04 vs. limit=15.0 2024-09-17 16:58:48,096 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.87 vs. 
limit=22.5 2024-09-17 16:58:54,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=262360.0, ans=0.125 2024-09-17 16:59:02,695 INFO [train.py:1198] (0/2) Epoch 15, batch 2250, loss[loss=0.2589, ctc_loss=0.1521, cr_loss=0.4103, attn_decoder_loss=0.2616, over 29686.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1494, cr_loss=0.3911, attn_decoder_loss=0.2566, over 5813074.02 frames. ], batch size: 82, lr: 7.47e-03, grad_scale: 8.0 2024-09-17 16:59:07,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=262400.0, ans=0.1 2024-09-17 16:59:15,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=262400.0, ans=0.0 2024-09-17 16:59:30,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=262440.0, ans=0.0 2024-09-17 16:59:30,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=262440.0, ans=0.0 2024-09-17 16:59:37,743 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:59:39,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=262480.0, ans=0.0 2024-09-17 16:59:40,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=262480.0, ans=0.125 2024-09-17 16:59:47,797 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.637e+01 9.303e+01 1.004e+02 1.390e+02, threshold=1.861e+02, percent-clipped=0.0 2024-09-17 17:00:04,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=262560.0, 
ans=0.125 2024-09-17 17:00:06,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=262560.0, ans=0.125 2024-09-17 17:00:10,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=262560.0, ans=0.025 2024-09-17 17:00:18,400 INFO [train.py:1198] (0/2) Epoch 15, batch 2300, loss[loss=0.2306, ctc_loss=0.1286, cr_loss=0.3606, attn_decoder_loss=0.2339, over 29309.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1491, cr_loss=0.3902, attn_decoder_loss=0.2559, over 5800010.59 frames. ], batch size: 71, lr: 7.47e-03, grad_scale: 8.0 2024-09-17 17:00:18,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=262600.0, ans=0.2 2024-09-17 17:00:26,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=262600.0, ans=0.2 2024-09-17 17:00:45,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=262640.0, ans=0.0 2024-09-17 17:00:54,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=262680.0, ans=0.125 2024-09-17 17:01:17,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=262760.0, ans=0.1 2024-09-17 17:01:26,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=262760.0, ans=0.0 2024-09-17 17:01:30,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=262760.0, ans=0.0 2024-09-17 17:01:34,196 INFO [train.py:1198] (0/2) Epoch 15, batch 2350, loss[loss=0.2664, ctc_loss=0.1648, cr_loss=0.4124, attn_decoder_loss=0.2685, over 29679.00 frames. 
], tot_loss[loss=0.2535, ctc_loss=0.1498, cr_loss=0.3916, attn_decoder_loss=0.2563, over 5804849.86 frames. ], batch size: 83, lr: 7.46e-03, grad_scale: 8.0
2024-09-17 17:01:58,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=262840.0, ans=0.0
2024-09-17 17:02:06,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.35 vs. limit=12.0
2024-09-17 17:02:22,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=262880.0, ans=15.0
2024-09-17 17:02:24,197 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.797e+01 9.531e+01 1.053e+02 3.289e+02, threshold=1.906e+02, percent-clipped=2.0
2024-09-17 17:02:29,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=262920.0, ans=0.125
2024-09-17 17:02:30,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=262920.0, ans=0.0
2024-09-17 17:02:39,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=262960.0, ans=0.125
2024-09-17 17:02:54,793 INFO [train.py:1198] (0/2) Epoch 15, batch 2400, loss[loss=0.2388, ctc_loss=0.1325, cr_loss=0.361, attn_decoder_loss=0.2426, over 29531.00 frames. ], tot_loss[loss=0.2541, ctc_loss=0.1502, cr_loss=0.392, attn_decoder_loss=0.2569, over 5809103.59 frames. ], batch size: 76, lr: 7.46e-03, grad_scale: 16.0
2024-09-17 17:02:55,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=263000.0, ans=0.125
2024-09-17 17:03:02,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=263000.0, ans=0.125
2024-09-17 17:03:25,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263080.0, ans=0.1
2024-09-17 17:03:39,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=263120.0, ans=0.0
2024-09-17 17:04:11,096 INFO [train.py:1198] (0/2) Epoch 15, batch 2450, loss[loss=0.2682, ctc_loss=0.1595, cr_loss=0.4289, attn_decoder_loss=0.2708, over 29702.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1508, cr_loss=0.3926, attn_decoder_loss=0.2576, over 5785484.69 frames. ], batch size: 82, lr: 7.46e-03, grad_scale: 8.0
2024-09-17 17:04:27,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=263240.0, ans=0.125
2024-09-17 17:04:57,816 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.779e+01 9.083e+01 9.931e+01 1.099e+02 3.144e+02, threshold=1.986e+02, percent-clipped=3.0
2024-09-17 17:05:08,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=263320.0, ans=0.125
2024-09-17 17:05:10,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=263360.0, ans=0.025
2024-09-17 17:05:14,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.98 vs. limit=10.0
2024-09-17 17:05:26,721 INFO [train.py:1198] (0/2) Epoch 15, batch 2500, loss[loss=0.2719, ctc_loss=0.16, cr_loss=0.4033, attn_decoder_loss=0.2754, over 29629.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1507, cr_loss=0.393, attn_decoder_loss=0.2576, over 5795320.07 frames. ], batch size: 86, lr: 7.46e-03, grad_scale: 8.0
2024-09-17 17:05:29,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=263400.0, ans=0.125
2024-09-17 17:05:41,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=263400.0, ans=0.125
2024-09-17 17:06:12,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=263480.0, ans=0.125
2024-09-17 17:06:26,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=263520.0, ans=0.125
2024-09-17 17:06:36,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=263560.0, ans=0.0
2024-09-17 17:06:41,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=263560.0, ans=0.0
2024-09-17 17:06:44,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=263560.0, ans=0.0
2024-09-17 17:06:47,418 INFO [train.py:1198] (0/2) Epoch 15, batch 2550, loss[loss=0.2165, ctc_loss=0.1169, cr_loss=0.337, attn_decoder_loss=0.22, over 29359.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1506, cr_loss=0.393, attn_decoder_loss=0.2576, over 5798390.17 frames. ], batch size: 67, lr: 7.45e-03, grad_scale: 8.0
2024-09-17 17:07:25,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=263680.0, ans=0.125
2024-09-17 17:07:25,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=263680.0, ans=0.0
2024-09-17 17:07:32,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=263720.0, ans=0.125
2024-09-17 17:07:34,221 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.892e+01 9.357e+01 1.015e+02 2.489e+02, threshold=1.871e+02, percent-clipped=2.0
2024-09-17 17:07:35,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0
2024-09-17 17:07:39,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.37 vs. limit=15.0
2024-09-17 17:07:43,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=263720.0, ans=0.2
2024-09-17 17:07:57,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=263760.0, ans=0.125
2024-09-17 17:08:02,867 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=22.5
2024-09-17 17:08:03,405 INFO [train.py:1198] (0/2) Epoch 15, batch 2600, loss[loss=0.2536, ctc_loss=0.1521, cr_loss=0.3994, attn_decoder_loss=0.256, over 29447.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1513, cr_loss=0.394, attn_decoder_loss=0.2582, over 5794505.68 frames. ], batch size: 78, lr: 7.45e-03, grad_scale: 8.0
2024-09-17 17:08:11,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=263800.0, ans=0.0
2024-09-17 17:08:12,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=263800.0, ans=0.0
2024-09-17 17:08:20,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=263840.0, ans=0.125
2024-09-17 17:08:37,096 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.24 vs. limit=12.0
2024-09-17 17:08:48,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=263920.0, ans=0.125
2024-09-17 17:09:06,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=263960.0, ans=0.07
2024-09-17 17:09:10,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.40 vs. limit=10.0
2024-09-17 17:09:15,529 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 17:09:15,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=263960.0, ans=0.0
2024-09-17 17:09:18,684 INFO [train.py:1198] (0/2) Epoch 15, batch 2650, loss[loss=0.2671, ctc_loss=0.1604, cr_loss=0.4012, attn_decoder_loss=0.27, over 29282.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1512, cr_loss=0.3936, attn_decoder_loss=0.2584, over 5801075.20 frames. ], batch size: 100, lr: 7.45e-03, grad_scale: 8.0
2024-09-17 17:09:31,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0
2024-09-17 17:09:33,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264000.0, ans=0.1
2024-09-17 17:09:39,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=264040.0, ans=0.125
2024-09-17 17:09:46,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0
2024-09-17 17:10:00,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=264080.0, ans=0.0
2024-09-17 17:10:09,780 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 9.042e+01 9.386e+01 1.019e+02 2.005e+02, threshold=1.877e+02, percent-clipped=1.0
2024-09-17 17:10:10,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5
2024-09-17 17:10:12,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=264120.0, ans=15.0
2024-09-17 17:10:28,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=264160.0, ans=0.125
2024-09-17 17:10:38,713 INFO [train.py:1198] (0/2) Epoch 15, batch 2700, loss[loss=0.2666, ctc_loss=0.1605, cr_loss=0.4013, attn_decoder_loss=0.2694, over 29537.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1518, cr_loss=0.3943, attn_decoder_loss=0.2587, over 5797393.42 frames. ], batch size: 87, lr: 7.44e-03, grad_scale: 8.0
2024-09-17 17:10:38,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=264200.0, ans=0.025
2024-09-17 17:10:54,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=264240.0, ans=0.1
2024-09-17 17:11:07,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=264280.0, ans=0.025
2024-09-17 17:11:38,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=264360.0, ans=0.0
2024-09-17 17:11:48,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=264360.0, ans=0.2
2024-09-17 17:11:52,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=264360.0, ans=0.125
2024-09-17 17:11:55,013 INFO [train.py:1198] (0/2) Epoch 15, batch 2750, loss[loss=0.25, ctc_loss=0.1545, cr_loss=0.3859, attn_decoder_loss=0.252, over 29520.00 frames. ], tot_loss[loss=0.2546, ctc_loss=0.1506, cr_loss=0.3925, attn_decoder_loss=0.2574, over 5796200.17 frames. ], batch size: 75, lr: 7.44e-03, grad_scale: 8.0
2024-09-17 17:12:12,101 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 17:12:24,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.04 vs. limit=12.0
2024-09-17 17:12:33,680 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.46 vs. limit=15.0
2024-09-17 17:12:41,851 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.996e+01 9.846e+01 1.075e+02 1.941e+02, threshold=1.969e+02, percent-clipped=1.0
2024-09-17 17:12:47,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=12.0
2024-09-17 17:13:13,279 INFO [train.py:1198] (0/2) Epoch 15, batch 2800, loss[loss=0.267, ctc_loss=0.1715, cr_loss=0.3922, attn_decoder_loss=0.2689, over 20280.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1509, cr_loss=0.3927, attn_decoder_loss=0.2575, over 5776975.42 frames. ], batch size: 210, lr: 7.44e-03, grad_scale: 16.0
2024-09-17 17:13:13,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=264600.0, ans=0.125
2024-09-17 17:13:19,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=264600.0, ans=0.1
2024-09-17 17:13:19,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264600.0, ans=0.1
2024-09-17 17:13:25,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=264600.0, ans=0.025
2024-09-17 17:14:25,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264760.0, ans=0.1
2024-09-17 17:14:31,449 INFO [train.py:1198] (0/2) Epoch 15, batch 2850, loss[loss=0.2396, ctc_loss=0.1432, cr_loss=0.3893, attn_decoder_loss=0.2417, over 29498.00 frames. ], tot_loss[loss=0.255, ctc_loss=0.1513, cr_loss=0.393, attn_decoder_loss=0.2578, over 5763409.72 frames. ], batch size: 77, lr: 7.44e-03, grad_scale: 8.0
2024-09-17 17:14:45,447 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 17:14:47,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.42 vs. limit=10.0
2024-09-17 17:15:05,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=264880.0, ans=0.2
2024-09-17 17:15:11,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=264880.0, ans=0.125
2024-09-17 17:15:11,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=264880.0, ans=0.125
2024-09-17 17:15:20,060 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 9.061e+01 9.943e+01 1.094e+02 2.532e+02, threshold=1.989e+02, percent-clipped=2.0
2024-09-17 17:15:47,574 INFO [train.py:1198] (0/2) Epoch 15, batch 2900, loss[loss=0.2446, ctc_loss=0.1425, cr_loss=0.3768, attn_decoder_loss=0.2476, over 29415.00 frames. ], tot_loss[loss=0.256, ctc_loss=0.1515, cr_loss=0.3943, attn_decoder_loss=0.2589, over 5788003.58 frames. ], batch size: 79, lr: 7.43e-03, grad_scale: 8.0
2024-09-17 17:15:56,789 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 17:16:38,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.90 vs. limit=15.0
2024-09-17 17:16:43,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.05 vs. limit=22.5
2024-09-17 17:16:49,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0
2024-09-17 17:17:05,973 INFO [train.py:1198] (0/2) Epoch 15, batch 2950, loss[loss=0.2418, ctc_loss=0.141, cr_loss=0.3738, attn_decoder_loss=0.2447, over 29501.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1502, cr_loss=0.3922, attn_decoder_loss=0.2572, over 5782760.38 frames. ], batch size: 75, lr: 7.43e-03, grad_scale: 8.0
2024-09-17 17:17:10,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=265200.0, ans=0.0
2024-09-17 17:17:13,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=265200.0, ans=0.0
2024-09-17 17:17:15,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=265200.0, ans=0.025
2024-09-17 17:17:27,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=265240.0, ans=0.0
2024-09-17 17:17:56,690 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.763e+01 8.981e+01 9.731e+01 1.093e+02 3.344e+02, threshold=1.946e+02, percent-clipped=1.0
2024-09-17 17:18:05,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0
2024-09-17 17:18:07,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=265360.0, ans=0.0
2024-09-17 17:18:10,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=265360.0, ans=0.0
2024-09-17 17:18:13,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265360.0, ans=0.1
2024-09-17 17:18:24,078 INFO [train.py:1198] (0/2) Epoch 15, batch 3000, loss[loss=0.259, ctc_loss=0.1534, cr_loss=0.4145, attn_decoder_loss=0.2615, over 29751.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1499, cr_loss=0.3915, attn_decoder_loss=0.2572, over 5784078.60 frames. ], batch size: 81, lr: 7.43e-03, grad_scale: 8.0
2024-09-17 17:18:24,079 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-17 17:18:42,411 INFO [train.py:1230] (0/2) Epoch 15, validation: loss=0.2111, ctc_loss=0.04175, cr_loss=4.872e-15, attn_decoder_loss=0.23, over 944034.00 frames.
2024-09-17 17:18:42,411 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-17 17:18:48,317 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5
2024-09-17 17:18:56,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=265440.0, ans=0.0
2024-09-17 17:19:54,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=265560.0, ans=0.0
2024-09-17 17:19:57,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=265600.0, ans=0.1
2024-09-17 17:19:58,993 INFO [train.py:1198] (0/2) Epoch 15, batch 3050, loss[loss=0.239, ctc_loss=0.1377, cr_loss=0.3894, attn_decoder_loss=0.2416, over 29540.00 frames. ], tot_loss[loss=0.2552, ctc_loss=0.1508, cr_loss=0.3933, attn_decoder_loss=0.2581, over 5777185.16 frames. ], batch size: 76, lr: 7.42e-03, grad_scale: 8.0
2024-09-17 17:20:11,620 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 17:20:13,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=265640.0, ans=0.0
2024-09-17 17:20:15,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265640.0, ans=0.1
2024-09-17 17:20:45,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=265720.0, ans=0.2
2024-09-17 17:20:49,578 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 9.345e+01 1.016e+02 1.110e+02 2.723e+02, threshold=2.032e+02, percent-clipped=2.0
2024-09-17 17:20:51,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.30 vs. limit=10.0
2024-09-17 17:20:59,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0
2024-09-17 17:21:03,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=265760.0, ans=0.0
2024-09-17 17:21:16,563 INFO [train.py:1198] (0/2) Epoch 15, batch 3100, loss[loss=0.2721, ctc_loss=0.1563, cr_loss=0.4015, attn_decoder_loss=0.2761, over 29300.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1502, cr_loss=0.3927, attn_decoder_loss=0.2574, over 5777404.79 frames. ], batch size: 100, lr: 7.42e-03, grad_scale: 8.0
2024-09-17 17:21:46,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0
2024-09-17 17:21:49,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0
2024-09-17 17:22:12,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=265920.0, ans=0.125
2024-09-17 17:22:16,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=265920.0, ans=0.04949747468305833
2024-09-17 17:22:34,714 INFO [train.py:1198] (0/2) Epoch 15, batch 3150, loss[loss=0.2675, ctc_loss=0.161, cr_loss=0.3955, attn_decoder_loss=0.2706, over 28827.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1498, cr_loss=0.3915, attn_decoder_loss=0.2572, over 5783964.54 frames. ], batch size: 104, lr: 7.42e-03, grad_scale: 8.0
2024-09-17 17:22:38,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=266000.0, ans=0.1
2024-09-17 17:22:41,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=266000.0, ans=0.0
2024-09-17 17:22:48,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=266040.0, ans=0.0
2024-09-17 17:22:50,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0
2024-09-17 17:23:01,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0
2024-09-17 17:23:23,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 8.876e+01 9.219e+01 9.735e+01 3.011e+02, threshold=1.844e+02, percent-clipped=1.0
2024-09-17 17:23:34,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0
2024-09-17 17:23:43,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.15 vs. limit=15.0
2024-09-17 17:23:49,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=266200.0, ans=0.125
2024-09-17 17:23:50,659 INFO [train.py:1198] (0/2) Epoch 15, batch 3200, loss[loss=0.2563, ctc_loss=0.1521, cr_loss=0.4137, attn_decoder_loss=0.2587, over 29430.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1494, cr_loss=0.3912, attn_decoder_loss=0.2568, over 5793834.60 frames. ], batch size: 79, lr: 7.42e-03, grad_scale: 16.0
2024-09-17 17:24:12,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=266240.0, ans=0.0
2024-09-17 17:24:14,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5
2024-09-17 17:24:22,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=266280.0, ans=0.125
2024-09-17 17:24:30,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=266280.0, ans=0.125
2024-09-17 17:24:36,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=266320.0, ans=0.1
2024-09-17 17:24:55,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=266360.0, ans=0.025
2024-09-17 17:25:09,281 INFO [train.py:1198] (0/2) Epoch 15, batch 3250, loss[loss=0.2574, ctc_loss=0.1425, cr_loss=0.3785, attn_decoder_loss=0.2617, over 29704.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1496, cr_loss=0.3919, attn_decoder_loss=0.2572, over 5800882.32 frames. ], batch size: 84, lr: 7.41e-03, grad_scale: 8.0
2024-09-17 17:25:17,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=266400.0, ans=0.0
2024-09-17 17:25:31,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=266440.0, ans=0.125
2024-09-17 17:26:00,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.12 vs. limit=15.0
2024-09-17 17:26:01,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.753e+01 9.181e+01 1.001e+02 1.564e+02, threshold=1.836e+02, percent-clipped=0.0
2024-09-17 17:26:26,784 INFO [train.py:1198] (0/2) Epoch 15, batch 3300, loss[loss=0.2573, ctc_loss=0.1457, cr_loss=0.3755, attn_decoder_loss=0.2613, over 28326.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1484, cr_loss=0.39, attn_decoder_loss=0.2557, over 5798298.49 frames. ], batch size: 111, lr: 7.41e-03, grad_scale: 8.0
2024-09-17 17:26:38,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.76 vs. limit=22.5
2024-09-17 17:26:59,820 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0
2024-09-17 17:27:03,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=266680.0, ans=0.05
2024-09-17 17:27:05,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=266680.0, ans=0.125
2024-09-17 17:27:19,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=266720.0, ans=0.0
2024-09-17 17:27:20,929 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.25 vs. limit=22.5
2024-09-17 17:27:21,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=266720.0, ans=0.0
2024-09-17 17:27:22,318 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.26 vs. limit=22.5
2024-09-17 17:27:29,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=266760.0, ans=0.0
2024-09-17 17:27:38,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=266760.0, ans=0.125
2024-09-17 17:27:42,413 INFO [train.py:1198] (0/2) Epoch 15, batch 3350, loss[loss=0.2736, ctc_loss=0.1714, cr_loss=0.4245, attn_decoder_loss=0.2756, over 28936.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1495, cr_loss=0.3915, attn_decoder_loss=0.2568, over 5775258.89 frames. ], batch size: 104, lr: 7.41e-03, grad_scale: 8.0
2024-09-17 17:27:47,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=266800.0, ans=0.025
2024-09-17 17:27:47,928 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.12 vs. limit=10.0
2024-09-17 17:28:05,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=266840.0, ans=0.0
2024-09-17 17:28:08,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=266840.0, ans=0.05
2024-09-17 17:28:11,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=266880.0, ans=0.95
2024-09-17 17:28:23,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=266880.0, ans=0.2
2024-09-17 17:28:34,702 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.002e+01 8.982e+01 9.740e+01 1.080e+02 2.374e+02, threshold=1.948e+02, percent-clipped=1.0
2024-09-17 17:28:35,371 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0
2024-09-17 17:28:42,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=266920.0, ans=0.1
2024-09-17 17:28:48,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=266960.0, ans=0.125
2024-09-17 17:28:57,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=266960.0, ans=0.125
2024-09-17 17:29:00,482 INFO [train.py:1198] (0/2) Epoch 15, batch 3400, loss[loss=0.2286, ctc_loss=0.135, cr_loss=0.3675, attn_decoder_loss=0.2308, over 29323.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1496, cr_loss=0.3908, attn_decoder_loss=0.2568, over 5767254.96 frames. ], batch size: 67, lr: 7.41e-03, grad_scale: 8.0
2024-09-17 17:29:06,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=267000.0, ans=0.2
2024-09-17 17:29:14,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=267040.0, ans=0.125
2024-09-17 17:29:27,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=267040.0, ans=0.0
2024-09-17 17:29:30,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=267040.0, ans=0.125
2024-09-17 17:29:42,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=267080.0, ans=0.125
2024-09-17 17:29:56,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=267120.0, ans=0.025
2024-09-17 17:30:01,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5
2024-09-17 17:30:05,680 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0
2024-09-17 17:30:18,827 INFO [train.py:1198] (0/2) Epoch 15, batch 3450, loss[loss=0.2599, ctc_loss=0.1528, cr_loss=0.3888, attn_decoder_loss=0.2632, over 28277.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.15, cr_loss=0.3919, attn_decoder_loss=0.2573, over 5776079.01 frames. ], batch size: 111, lr: 7.40e-03, grad_scale: 8.0
2024-09-17 17:30:21,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=22.5
2024-09-17 17:30:22,162 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 17:30:25,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=267200.0, ans=0.07
2024-09-17 17:30:35,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=267240.0, ans=0.125
2024-09-17 17:30:40,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=267240.0, ans=0.2
2024-09-17 17:31:08,470 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.765e+01 9.280e+01 9.883e+01 2.461e+02, threshold=1.856e+02, percent-clipped=1.0
2024-09-17 17:31:30,592 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.41 vs. limit=15.0
2024-09-17 17:31:34,397 INFO [train.py:1198] (0/2) Epoch 15, batch 3500, loss[loss=0.2362, ctc_loss=0.1369, cr_loss=0.3832, attn_decoder_loss=0.2388, over 29327.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1493, cr_loss=0.3908, attn_decoder_loss=0.2566, over 5778309.39 frames. ], batch size: 71, lr: 7.40e-03, grad_scale: 8.0
2024-09-17 17:31:45,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=267400.0, ans=0.0
2024-09-17 17:31:54,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=267440.0, ans=0.0
2024-09-17 17:31:58,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=267440.0, ans=0.0
2024-09-17 17:32:11,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=267480.0, ans=0.125
2024-09-17 17:32:19,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=267520.0, ans=0.125
2024-09-17 17:32:28,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=267520.0, ans=0.2
2024-09-17 17:32:33,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=267560.0, ans=0.125
2024-09-17 17:32:42,544 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 17:32:51,219 INFO [train.py:1198] (0/2) Epoch 15, batch 3550, loss[loss=0.2562, ctc_loss=0.1403, cr_loss=0.3888, attn_decoder_loss=0.2605, over 29713.00 frames. ], tot_loss[loss=0.2538, ctc_loss=0.1491, cr_loss=0.3903, attn_decoder_loss=0.2567, over 5784464.16 frames. ], batch size: 89, lr: 7.40e-03, grad_scale: 8.0
2024-09-17 17:33:04,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=267640.0, ans=0.0
2024-09-17 17:33:10,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=267640.0, ans=0.125
2024-09-17 17:33:13,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.21 vs. limit=22.5
2024-09-17 17:33:39,995 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.842e+01 9.279e+01 9.951e+01 4.838e+02, threshold=1.856e+02, percent-clipped=2.0
2024-09-17 17:34:00,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=267760.0, ans=0.125
2024-09-17 17:34:05,140 INFO [train.py:1198] (0/2) Epoch 15, batch 3600, loss[loss=0.2483, ctc_loss=0.1351, cr_loss=0.3697, attn_decoder_loss=0.2527, over 29523.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.149, cr_loss=0.3908, attn_decoder_loss=0.2569, over 5792742.25 frames. ], batch size: 77, lr: 7.39e-03, grad_scale: 16.0
2024-09-17 17:34:15,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=267800.0, ans=0.025
2024-09-17 17:34:34,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=267840.0, ans=0.025
2024-09-17 17:34:37,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=267880.0, ans=0.125
2024-09-17 17:34:41,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.43 vs. limit=15.0
2024-09-17 17:34:49,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=267880.0, ans=0.125
2024-09-17 17:34:51,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=267920.0, ans=0.1
2024-09-17 17:35:22,549 INFO [train.py:1198] (0/2) Epoch 15, batch 3650, loss[loss=0.26, ctc_loss=0.1472, cr_loss=0.3816, attn_decoder_loss=0.2641, over 29519.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1484, cr_loss=0.3896, attn_decoder_loss=0.2563, over 5795247.96 frames. ], batch size: 90, lr: 7.39e-03, grad_scale: 8.0
2024-09-17 17:35:25,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268000.0, ans=0.1
2024-09-17 17:35:27,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268000.0, ans=0.1
2024-09-17 17:35:28,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=268000.0, ans=0.0
2024-09-17 17:35:30,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=268000.0, ans=0.1
2024-09-17 17:35:34,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=268000.0, ans=0.125
2024-09-17 17:35:38,631 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs.
limit=6.0 2024-09-17 17:35:39,385 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:35:44,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=12.0 2024-09-17 17:36:00,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=268080.0, ans=0.04949747468305833 2024-09-17 17:36:13,525 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.685e+01 9.367e+01 9.867e+01 1.459e+02, threshold=1.873e+02, percent-clipped=0.0 2024-09-17 17:36:24,820 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.04 vs. limit=22.5 2024-09-17 17:36:29,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2024-09-17 17:36:30,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=268160.0, ans=0.125 2024-09-17 17:36:37,436 INFO [train.py:1198] (0/2) Epoch 15, batch 3700, loss[loss=0.261, ctc_loss=0.1531, cr_loss=0.39, attn_decoder_loss=0.2643, over 29698.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1483, cr_loss=0.3895, attn_decoder_loss=0.2563, over 5804548.43 frames. 
], batch size: 84, lr: 7.39e-03, grad_scale: 8.0 2024-09-17 17:36:39,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=268200.0, ans=0.04949747468305833 2024-09-17 17:36:48,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=268200.0, ans=0.125 2024-09-17 17:37:02,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268240.0, ans=0.1 2024-09-17 17:37:43,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=268360.0, ans=0.0 2024-09-17 17:37:51,820 INFO [train.py:1198] (0/2) Epoch 15, batch 3750, loss[loss=0.2284, ctc_loss=0.1328, cr_loss=0.3598, attn_decoder_loss=0.231, over 29324.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1483, cr_loss=0.3895, attn_decoder_loss=0.2561, over 5808547.96 frames. ], batch size: 67, lr: 7.39e-03, grad_scale: 8.0 2024-09-17 17:38:03,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=268400.0, ans=0.0 2024-09-17 17:38:05,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=268440.0, ans=0.125 2024-09-17 17:38:08,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=268440.0, ans=0.125 2024-09-17 17:38:25,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.13 vs. 
limit=15.0 2024-09-17 17:38:42,232 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 8.742e+01 9.217e+01 9.952e+01 4.415e+02, threshold=1.843e+02, percent-clipped=1.0 2024-09-17 17:38:48,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=268520.0, ans=0.2 2024-09-17 17:39:07,748 INFO [train.py:1198] (0/2) Epoch 15, batch 3800, loss[loss=0.2613, ctc_loss=0.1544, cr_loss=0.4144, attn_decoder_loss=0.2639, over 29639.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1483, cr_loss=0.3888, attn_decoder_loss=0.2558, over 5798923.00 frames. ], batch size: 86, lr: 7.38e-03, grad_scale: 8.0 2024-09-17 17:39:35,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268640.0, ans=0.1 2024-09-17 17:39:42,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=268680.0, ans=0.0 2024-09-17 17:39:53,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-17 17:39:56,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=268720.0, ans=0.0 2024-09-17 17:40:06,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=268760.0, ans=0.0 2024-09-17 17:40:12,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.58 vs. 
limit=10.0 2024-09-17 17:40:15,414 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:40:24,555 INFO [train.py:1198] (0/2) Epoch 15, batch 3850, loss[loss=0.2711, ctc_loss=0.1621, cr_loss=0.4053, attn_decoder_loss=0.2743, over 29173.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.148, cr_loss=0.3886, attn_decoder_loss=0.2556, over 5812029.28 frames. ], batch size: 100, lr: 7.38e-03, grad_scale: 8.0 2024-09-17 17:40:30,071 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.67 vs. limit=15.0 2024-09-17 17:40:42,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=268840.0, ans=0.0 2024-09-17 17:41:01,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0 2024-09-17 17:41:08,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=268920.0, ans=0.2 2024-09-17 17:41:16,762 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.057e+01 9.751e+01 1.063e+02 2.027e+02, threshold=1.950e+02, percent-clipped=1.0 2024-09-17 17:41:17,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=268920.0, ans=0.125 2024-09-17 17:41:19,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=268920.0, ans=0.125 2024-09-17 17:41:36,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=268960.0, ans=0.125 2024-09-17 17:41:38,864 INFO [train.py:1198] (0/2) Epoch 15, batch 3900, loss[loss=0.256, ctc_loss=0.1493, cr_loss=0.3943, 
attn_decoder_loss=0.2591, over 29631.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1483, cr_loss=0.389, attn_decoder_loss=0.2559, over 5817144.81 frames. ], batch size: 86, lr: 7.38e-03, grad_scale: 8.0 2024-09-17 17:41:42,257 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:41:42,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0 2024-09-17 17:41:55,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=269040.0, ans=0.125 2024-09-17 17:42:08,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.81 vs. limit=22.5 2024-09-17 17:42:24,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=269120.0, ans=0.0 2024-09-17 17:42:39,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=269160.0, ans=0.125 2024-09-17 17:42:49,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=269160.0, ans=0.5 2024-09-17 17:42:52,609 INFO [train.py:1198] (0/2) Epoch 15, batch 3950, loss[loss=0.2598, ctc_loss=0.1508, cr_loss=0.3809, attn_decoder_loss=0.2635, over 29481.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1487, cr_loss=0.3904, attn_decoder_loss=0.2563, over 5836455.69 frames. 
], batch size: 97, lr: 7.38e-03, grad_scale: 8.0 2024-09-17 17:42:52,998 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:43:10,049 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0 2024-09-17 17:43:16,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269240.0, ans=0.1 2024-09-17 17:43:19,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=269240.0, ans=0.1 2024-09-17 17:43:32,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=269280.0, ans=0.125 2024-09-17 17:43:44,314 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.088e+01 9.576e+01 1.054e+02 2.878e+02, threshold=1.915e+02, percent-clipped=1.0 2024-09-17 17:43:45,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.97 vs. limit=22.5 2024-09-17 17:44:07,787 INFO [train.py:1198] (0/2) Epoch 15, batch 4000, loss[loss=0.2367, ctc_loss=0.129, cr_loss=0.348, attn_decoder_loss=0.241, over 29512.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.149, cr_loss=0.3899, attn_decoder_loss=0.2562, over 5812695.14 frames. ], batch size: 74, lr: 7.37e-03, grad_scale: 16.0 2024-09-17 17:44:09,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=12.0 2024-09-17 17:44:14,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. 
limit=22.5 2024-09-17 17:44:18,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=269400.0, ans=0.125 2024-09-17 17:44:27,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=269440.0, ans=0.125 2024-09-17 17:44:49,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.75 vs. limit=15.0 2024-09-17 17:44:50,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=269520.0, ans=0.05 2024-09-17 17:44:53,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=269520.0, ans=0.125 2024-09-17 17:45:08,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=269560.0, ans=0.1 2024-09-17 17:45:22,419 INFO [train.py:1198] (0/2) Epoch 15, batch 4050, loss[loss=0.2871, ctc_loss=0.2084, cr_loss=0.4481, attn_decoder_loss=0.2859, over 20184.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1489, cr_loss=0.3891, attn_decoder_loss=0.256, over 5796553.32 frames. ], batch size: 210, lr: 7.37e-03, grad_scale: 8.0 2024-09-17 17:46:16,373 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.913e+01 9.474e+01 1.030e+02 4.406e+02, threshold=1.895e+02, percent-clipped=2.0 2024-09-17 17:46:25,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=269760.0, ans=0.125 2024-09-17 17:46:37,091 INFO [train.py:1198] (0/2) Epoch 15, batch 4100, loss[loss=0.2635, ctc_loss=0.1552, cr_loss=0.3957, attn_decoder_loss=0.2667, over 29503.00 frames. 
], tot_loss[loss=0.2532, ctc_loss=0.149, cr_loss=0.3892, attn_decoder_loss=0.2561, over 5791044.48 frames. ], batch size: 90, lr: 7.37e-03, grad_scale: 8.0 2024-09-17 17:46:53,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=269840.0, ans=0.2 2024-09-17 17:46:55,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2024-09-17 17:47:00,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=269840.0, ans=0.125 2024-09-17 17:47:09,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=269880.0, ans=0.0 2024-09-17 17:47:14,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=269880.0, ans=0.0 2024-09-17 17:47:36,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=269960.0, ans=0.125 2024-09-17 17:47:50,742 INFO [train.py:1198] (0/2) Epoch 15, batch 4150, loss[loss=0.2539, ctc_loss=0.1519, cr_loss=0.3934, attn_decoder_loss=0.2565, over 29511.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1488, cr_loss=0.389, attn_decoder_loss=0.256, over 5796622.69 frames. 
], batch size: 77, lr: 7.36e-03, grad_scale: 8.0 2024-09-17 17:47:58,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=270000.0, ans=0.0 2024-09-17 17:48:21,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=270080.0, ans=0.0 2024-09-17 17:48:41,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=270120.0, ans=0.125 2024-09-17 17:48:43,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.23 vs. limit=15.0 2024-09-17 17:48:45,089 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.781e+01 9.267e+01 9.931e+01 2.534e+02, threshold=1.853e+02, percent-clipped=2.0 2024-09-17 17:49:00,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=270160.0, ans=0.1 2024-09-17 17:49:02,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.51 vs. limit=22.5 2024-09-17 17:49:06,155 INFO [train.py:1198] (0/2) Epoch 15, batch 4200, loss[loss=0.2727, ctc_loss=0.1663, cr_loss=0.4252, attn_decoder_loss=0.2751, over 29483.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1492, cr_loss=0.3906, attn_decoder_loss=0.2565, over 5799218.26 frames. 
], batch size: 90, lr: 7.36e-03, grad_scale: 8.0 2024-09-17 17:49:07,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=270200.0, ans=0.125 2024-09-17 17:49:18,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=270200.0, ans=0.0 2024-09-17 17:49:22,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=270240.0, ans=0.125 2024-09-17 17:49:23,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0 2024-09-17 17:49:51,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=270320.0, ans=0.05 2024-09-17 17:49:55,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=270320.0, ans=0.2 2024-09-17 17:49:59,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=270320.0, ans=0.025 2024-09-17 17:50:01,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=270320.0, ans=0.0 2024-09-17 17:50:21,303 INFO [train.py:1198] (0/2) Epoch 15, batch 4250, loss[loss=0.2338, ctc_loss=0.1343, cr_loss=0.3669, attn_decoder_loss=0.2367, over 29511.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.149, cr_loss=0.3903, attn_decoder_loss=0.2564, over 5805216.19 frames. 
], batch size: 74, lr: 7.36e-03, grad_scale: 8.0 2024-09-17 17:50:31,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=270400.0, ans=0.1 2024-09-17 17:50:44,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=270440.0, ans=0.125 2024-09-17 17:51:14,267 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.903e+01 9.469e+01 1.020e+02 2.237e+02, threshold=1.894e+02, percent-clipped=2.0 2024-09-17 17:51:35,157 INFO [train.py:1198] (0/2) Epoch 15, batch 4300, loss[loss=0.2652, ctc_loss=0.1595, cr_loss=0.3996, attn_decoder_loss=0.2681, over 29538.00 frames. ], tot_loss[loss=0.2538, ctc_loss=0.1493, cr_loss=0.3907, attn_decoder_loss=0.2567, over 5794554.82 frames. ], batch size: 87, lr: 7.36e-03, grad_scale: 8.0 2024-09-17 17:51:35,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=12.0 2024-09-17 17:51:40,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=270600.0, ans=0.125 2024-09-17 17:51:45,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=270600.0, ans=0.125 2024-09-17 17:52:11,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.49 vs. limit=10.0 2024-09-17 17:52:30,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.43 vs. 
limit=6.0 2024-09-17 17:52:47,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=270760.0, ans=0.07 2024-09-17 17:52:50,150 INFO [train.py:1198] (0/2) Epoch 15, batch 4350, loss[loss=0.2686, ctc_loss=0.1691, cr_loss=0.4199, attn_decoder_loss=0.2703, over 29480.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1521, cr_loss=0.396, attn_decoder_loss=0.2601, over 5796972.83 frames. ], batch size: 97, lr: 7.35e-03, grad_scale: 8.0 2024-09-17 17:53:10,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=270840.0, ans=0.2 2024-09-17 17:53:11,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=270840.0, ans=0.1 2024-09-17 17:53:26,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=270880.0, ans=0.04949747468305833 2024-09-17 17:53:33,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=270920.0, ans=0.0 2024-09-17 17:53:43,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.462e+01 9.074e+01 9.575e+01 1.004e+02 1.676e+02, threshold=1.915e+02, percent-clipped=0.0 2024-09-17 17:53:46,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=270920.0, ans=0.125 2024-09-17 17:54:04,393 INFO [train.py:1198] (0/2) Epoch 15, batch 4400, loss[loss=0.2732, ctc_loss=0.1709, cr_loss=0.4242, attn_decoder_loss=0.2752, over 27497.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1537, cr_loss=0.3984, attn_decoder_loss=0.2624, over 5768045.99 frames. 
], batch size: 124, lr: 7.35e-03, grad_scale: 16.0 2024-09-17 17:54:26,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=15.0 2024-09-17 17:54:59,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=271120.0, ans=0.125 2024-09-17 17:55:06,265 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.62 vs. limit=10.0 2024-09-17 17:55:07,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=271160.0, ans=0.125 2024-09-17 17:55:12,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=271160.0, ans=0.0 2024-09-17 17:55:15,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=271160.0, ans=0.1 2024-09-17 17:55:15,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=15.20 vs. limit=15.0 2024-09-17 17:55:19,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0 2024-09-17 17:55:19,873 INFO [train.py:1198] (0/2) Epoch 15, batch 4450, loss[loss=0.2903, ctc_loss=0.1907, cr_loss=0.4169, attn_decoder_loss=0.2921, over 19964.00 frames. ], tot_loss[loss=0.2626, ctc_loss=0.1585, cr_loss=0.4026, attn_decoder_loss=0.2652, over 5582213.43 frames. 
], batch size: 210, lr: 7.35e-03, grad_scale: 8.0 2024-09-17 17:55:21,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=271200.0, ans=0.07 2024-09-17 17:55:33,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=271240.0, ans=0.125 2024-09-17 17:55:38,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=271240.0, ans=0.0 2024-09-17 17:55:47,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=271240.0, ans=10.0 2024-09-17 17:56:17,157 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.262e+01 9.506e+01 1.069e+02 1.178e+02 1.981e+02, threshold=2.138e+02, percent-clipped=1.0 2024-09-17 17:56:17,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=271320.0, ans=0.0 2024-09-17 17:56:26,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=271360.0, ans=0.125 2024-09-17 17:56:28,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-09-17 17:56:34,790 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-09-17 17:56:35,243 INFO [train.py:1198] (0/2) Epoch 15, batch 4500, loss[loss=0.2829, ctc_loss=0.1976, cr_loss=0.429, attn_decoder_loss=0.2828, over 20017.00 frames. ], tot_loss[loss=0.2658, ctc_loss=0.1646, cr_loss=0.4051, attn_decoder_loss=0.2681, over 5242031.97 frames. 
], batch size: 209, lr: 7.35e-03, grad_scale: 8.0 2024-09-17 17:56:41,954 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-09-17 17:57:03,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=271480.0, ans=0.1 2024-09-17 17:57:12,404 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-15.pt 2024-09-17 17:58:05,194 INFO [train.py:1198] (0/2) Epoch 16, batch 0, loss[loss=0.2381, ctc_loss=0.1341, cr_loss=0.3719, attn_decoder_loss=0.2414, over 29607.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1341, cr_loss=0.3719, attn_decoder_loss=0.2414, over 29607.00 frames. ], batch size: 73, lr: 7.11e-03, grad_scale: 16.0 2024-09-17 17:58:05,195 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 17:58:11,275 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.3.encoder.layers.4.self_attn_weights, attn_weights_entropy = tensor([4.6519, 4.1508, 3.7418, 4.4602, 3.5596, 3.6036, 3.7345, 3.8424], device='cuda:0') 2024-09-17 17:58:23,633 INFO [train.py:1230] (0/2) Epoch 16, validation: loss=0.2124, ctc_loss=0.04089, cr_loss=4.638e-15, attn_decoder_loss=0.2315, over 944034.00 frames. 
2024-09-17 17:58:23,633 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 17:59:07,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271580.0, ans=0.1 2024-09-17 17:59:31,701 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:59:34,718 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:59:37,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=271660.0, ans=0.2 2024-09-17 17:59:40,418 INFO [train.py:1198] (0/2) Epoch 16, batch 50, loss[loss=0.2235, ctc_loss=0.1261, cr_loss=0.3493, attn_decoder_loss=0.2265, over 29473.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.1504, cr_loss=0.3931, attn_decoder_loss=0.2571, over 1269053.65 frames. ], batch size: 70, lr: 7.11e-03, grad_scale: 8.0 2024-09-17 17:59:45,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.15 vs. 
limit=15.0
2024-09-17 17:59:52,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=271700.0, ans=0.025
2024-09-17 18:00:01,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.262e+01 1.005e+02 1.104e+02 1.206e+02 4.510e+02, threshold=2.208e+02, percent-clipped=2.0
2024-09-17 18:00:27,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271820.0, ans=0.1
2024-09-17 18:00:29,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=271820.0, ans=0.025
2024-09-17 18:00:49,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=271860.0, ans=0.125
2024-09-17 18:00:56,278 INFO [train.py:1198] (0/2) Epoch 16, batch 100, loss[loss=0.2485, ctc_loss=0.1447, cr_loss=0.3974, attn_decoder_loss=0.2512, over 29537.00 frames. ], tot_loss[loss=0.2575, ctc_loss=0.1534, cr_loss=0.3982, attn_decoder_loss=0.2602, over 2254360.75 frames. ], batch size: 76, lr: 7.10e-03, grad_scale: 8.0
2024-09-17 18:01:01,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=271900.0, ans=0.04949747468305833
2024-09-17 18:01:08,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=271900.0, ans=0.0
2024-09-17 18:01:13,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=271940.0, ans=0.0
2024-09-17 18:01:20,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=271940.0, ans=0.5
2024-09-17 18:01:35,169 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-68000.pt
2024-09-17 18:01:53,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=22.5
2024-09-17 18:01:54,958 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.62 vs. limit=22.5
2024-09-17 18:02:00,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=272020.0, ans=0.0
2024-09-17 18:02:00,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=272020.0, ans=0.05
2024-09-17 18:02:16,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=272060.0, ans=0.125
2024-09-17 18:02:21,070 INFO [train.py:1198] (0/2) Epoch 16, batch 150, loss[loss=0.2335, ctc_loss=0.1354, cr_loss=0.3821, attn_decoder_loss=0.2359, over 29424.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1503, cr_loss=0.3947, attn_decoder_loss=0.2577, over 3048836.26 frames. ], batch size: 70, lr: 7.10e-03, grad_scale: 8.0
2024-09-17 18:02:32,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=272100.0, ans=0.125
2024-09-17 18:02:34,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=12.0
2024-09-17 18:02:42,307 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.615e+01 9.462e+01 1.007e+02 3.571e+02, threshold=1.892e+02, percent-clipped=1.0
2024-09-17 18:02:49,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=272140.0, ans=0.125
2024-09-17 18:02:53,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=272180.0, ans=0.125
2024-09-17 18:02:53,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=272180.0, ans=0.95
2024-09-17 18:03:05,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=272180.0, ans=0.125
2024-09-17 18:03:13,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=272220.0, ans=0.0
2024-09-17 18:03:13,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=272220.0, ans=0.0
2024-09-17 18:03:22,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=272260.0, ans=0.0
2024-09-17 18:03:26,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=272260.0, ans=0.2
2024-09-17 18:03:32,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=272260.0, ans=0.0
2024-09-17 18:03:38,613 INFO [train.py:1198] (0/2) Epoch 16, batch 200, loss[loss=0.26, ctc_loss=0.1474, cr_loss=0.3844, attn_decoder_loss=0.264, over 27525.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.149, cr_loss=0.392, attn_decoder_loss=0.2565, over 3661025.70 frames. ], batch size: 125, lr: 7.10e-03, grad_scale: 8.0
2024-09-17 18:03:44,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.17 vs. limit=15.0
2024-09-17 18:03:54,029 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 18:04:31,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0
2024-09-17 18:04:49,206 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.44 vs. limit=15.0
2024-09-17 18:04:54,435 INFO [train.py:1198] (0/2) Epoch 16, batch 250, loss[loss=0.2656, ctc_loss=0.1601, cr_loss=0.3972, attn_decoder_loss=0.2685, over 29222.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.149, cr_loss=0.3921, attn_decoder_loss=0.2566, over 4140752.08 frames. ], batch size: 100, lr: 7.10e-03, grad_scale: 8.0
2024-09-17 18:05:15,431 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.691e+01 9.311e+01 9.688e+01 2.016e+02, threshold=1.862e+02, percent-clipped=1.0
2024-09-17 18:05:17,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=272540.0, ans=0.125
2024-09-17 18:05:17,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=272540.0, ans=0.125
2024-09-17 18:05:46,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=272620.0, ans=0.1
2024-09-17 18:06:00,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=272660.0, ans=0.2
2024-09-17 18:06:00,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0
2024-09-17 18:06:01,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=272660.0, ans=0.0
2024-09-17 18:06:08,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0
2024-09-17 18:06:12,051 INFO [train.py:1198] (0/2) Epoch 16, batch 300, loss[loss=0.2698, ctc_loss=0.1632, cr_loss=0.4016, attn_decoder_loss=0.2727, over 29513.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1483, cr_loss=0.3904, attn_decoder_loss=0.2561, over 4508665.41 frames. ], batch size: 92, lr: 7.09e-03, grad_scale: 8.0
2024-09-17 18:06:49,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=272780.0, ans=0.0
2024-09-17 18:06:52,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=272780.0, ans=0.2
2024-09-17 18:06:58,812 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 18:07:09,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=272820.0, ans=0.0
2024-09-17 18:07:16,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0
2024-09-17 18:07:25,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0
2024-09-17 18:07:30,500 INFO [train.py:1198] (0/2) Epoch 16, batch 350, loss[loss=0.2245, ctc_loss=0.1252, cr_loss=0.3522, attn_decoder_loss=0.2277, over 29358.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1481, cr_loss=0.39, attn_decoder_loss=0.2563, over 4794839.04 frames. ], batch size: 71, lr: 7.09e-03, grad_scale: 8.0
2024-09-17 18:07:51,719 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 8.832e+01 9.583e+01 1.052e+02 2.461e+02, threshold=1.917e+02, percent-clipped=3.0
2024-09-17 18:08:06,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0
2024-09-17 18:08:11,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=272980.0, ans=0.125
2024-09-17 18:08:14,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=273020.0, ans=0.125
2024-09-17 18:08:19,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=273020.0, ans=0.0
2024-09-17 18:08:22,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=273020.0, ans=0.125
2024-09-17 18:08:28,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=273020.0, ans=0.2
2024-09-17 18:08:45,896 INFO [train.py:1198] (0/2) Epoch 16, batch 400, loss[loss=0.2575, ctc_loss=0.1454, cr_loss=0.3875, attn_decoder_loss=0.2614, over 29700.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1476, cr_loss=0.389, attn_decoder_loss=0.2559, over 5024708.72 frames. ], batch size: 82, lr: 7.09e-03, grad_scale: 16.0
2024-09-17 18:08:53,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=273100.0, ans=0.0
2024-09-17 18:09:19,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=273180.0, ans=0.07
2024-09-17 18:09:32,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=273220.0, ans=0.125
2024-09-17 18:09:35,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=273220.0, ans=0.2
2024-09-17 18:09:37,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=273220.0, ans=0.125
2024-09-17 18:09:56,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=273260.0, ans=0.125
2024-09-17 18:10:04,238 INFO [train.py:1198] (0/2) Epoch 16, batch 450, loss[loss=0.2514, ctc_loss=0.14, cr_loss=0.3828, attn_decoder_loss=0.2553, over 29695.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1479, cr_loss=0.3893, attn_decoder_loss=0.2562, over 5186869.79 frames. ], batch size: 83, lr: 7.09e-03, grad_scale: 8.0
2024-09-17 18:10:09,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.47 vs. limit=15.0
2024-09-17 18:10:10,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=273300.0, ans=0.5
2024-09-17 18:10:19,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=273340.0, ans=0.04949747468305833
2024-09-17 18:10:26,870 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.756e+01 9.412e+01 1.001e+02 2.554e+02, threshold=1.882e+02, percent-clipped=1.0
2024-09-17 18:11:19,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=273460.0, ans=0.0
2024-09-17 18:11:22,632 INFO [train.py:1198] (0/2) Epoch 16, batch 500, loss[loss=0.2749, ctc_loss=0.1688, cr_loss=0.4389, attn_decoder_loss=0.2769, over 29398.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1467, cr_loss=0.3877, attn_decoder_loss=0.2553, over 5329425.74 frames. ], batch size: 94, lr: 7.08e-03, grad_scale: 8.0
2024-09-17 18:11:27,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=273500.0, ans=0.1
2024-09-17 18:11:32,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=273500.0, ans=0.05
2024-09-17 18:11:44,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=273540.0, ans=0.0
2024-09-17 18:11:56,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0
2024-09-17 18:12:12,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=273620.0, ans=0.0
2024-09-17 18:12:39,146 INFO [train.py:1198] (0/2) Epoch 16, batch 550, loss[loss=0.2645, ctc_loss=0.155, cr_loss=0.3878, attn_decoder_loss=0.2681, over 28739.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1469, cr_loss=0.388, attn_decoder_loss=0.2555, over 5423177.23 frames. ], batch size: 104, lr: 7.08e-03, grad_scale: 8.0
2024-09-17 18:12:42,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=273700.0, ans=0.125
2024-09-17 18:12:57,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=273740.0, ans=0.1
2024-09-17 18:12:58,327 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. limit=6.0
2024-09-17 18:13:01,896 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.696e+01 8.801e+01 9.400e+01 1.011e+02 1.613e+02, threshold=1.880e+02, percent-clipped=0.0
2024-09-17 18:13:15,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=273780.0, ans=0.0
2024-09-17 18:13:24,490 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.42 vs. limit=10.0
2024-09-17 18:13:33,867 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.49 vs. limit=22.5
2024-09-17 18:13:40,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=273860.0, ans=0.0
2024-09-17 18:13:57,164 INFO [train.py:1198] (0/2) Epoch 16, batch 600, loss[loss=0.2657, ctc_loss=0.1564, cr_loss=0.4048, attn_decoder_loss=0.2689, over 29204.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1474, cr_loss=0.389, attn_decoder_loss=0.2559, over 5509145.01 frames. ], batch size: 100, lr: 7.08e-03, grad_scale: 8.0
2024-09-17 18:14:42,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=273980.0, ans=0.2
2024-09-17 18:14:43,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=274020.0, ans=0.0
2024-09-17 18:14:49,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=274020.0, ans=0.125
2024-09-17 18:15:03,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=274060.0, ans=0.0
2024-09-17 18:15:04,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=274060.0, ans=0.2
2024-09-17 18:15:15,061 INFO [train.py:1198] (0/2) Epoch 16, batch 650, loss[loss=0.2463, ctc_loss=0.1346, cr_loss=0.3688, attn_decoder_loss=0.2506, over 29742.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1465, cr_loss=0.3882, attn_decoder_loss=0.2552, over 5587059.91 frames. ], batch size: 81, lr: 7.08e-03, grad_scale: 8.0
2024-09-17 18:15:21,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=274100.0, ans=0.125
2024-09-17 18:15:22,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=274100.0, ans=0.125
2024-09-17 18:15:28,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=274140.0, ans=0.015
2024-09-17 18:15:32,396 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.74 vs. limit=10.0
2024-09-17 18:15:37,740 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.963e+01 8.981e+01 9.378e+01 1.004e+02 1.703e+02, threshold=1.876e+02, percent-clipped=0.0
2024-09-17 18:15:56,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=274180.0, ans=0.125
2024-09-17 18:15:56,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=274180.0, ans=0.0
2024-09-17 18:16:08,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=274220.0, ans=22.5
2024-09-17 18:16:14,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=274260.0, ans=0.125
2024-09-17 18:16:30,948 INFO [train.py:1198] (0/2) Epoch 16, batch 700, loss[loss=0.2418, ctc_loss=0.1377, cr_loss=0.3823, attn_decoder_loss=0.2449, over 29539.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1472, cr_loss=0.3894, attn_decoder_loss=0.2557, over 5636577.89 frames. ], batch size: 76, lr: 7.07e-03, grad_scale: 8.0
2024-09-17 18:16:35,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=274300.0, ans=0.0
2024-09-17 18:16:40,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=274300.0, ans=0.125
2024-09-17 18:16:44,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=274340.0, ans=0.1
2024-09-17 18:16:52,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=274340.0, ans=0.07
2024-09-17 18:17:03,582 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.08 vs. limit=15.0
2024-09-17 18:17:05,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=274380.0, ans=0.125
2024-09-17 18:17:21,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=274420.0, ans=0.015
2024-09-17 18:17:23,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=274420.0, ans=0.125
2024-09-17 18:17:34,790 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.68 vs. limit=22.5
2024-09-17 18:17:49,231 INFO [train.py:1198] (0/2) Epoch 16, batch 750, loss[loss=0.2581, ctc_loss=0.1551, cr_loss=0.4156, attn_decoder_loss=0.2603, over 29693.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1468, cr_loss=0.3887, attn_decoder_loss=0.2554, over 5675367.89 frames. ], batch size: 82, lr: 7.07e-03, grad_scale: 8.0
2024-09-17 18:17:50,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=274500.0, ans=0.025
2024-09-17 18:17:52,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=274500.0, ans=0.125
2024-09-17 18:17:55,662 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 18:18:05,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.61 vs. limit=15.0
2024-09-17 18:18:05,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=274540.0, ans=0.125
2024-09-17 18:18:07,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=274540.0, ans=0.025
2024-09-17 18:18:11,564 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.564e+01 9.225e+01 9.974e+01 3.199e+02, threshold=1.845e+02, percent-clipped=1.0
2024-09-17 18:18:16,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274540.0, ans=0.1
2024-09-17 18:18:40,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=274620.0, ans=0.125
2024-09-17 18:18:47,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=274620.0, ans=0.0
2024-09-17 18:18:56,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=274660.0, ans=0.0
2024-09-17 18:19:02,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=274660.0, ans=0.0
2024-09-17 18:19:06,958 INFO [train.py:1198] (0/2) Epoch 16, batch 800, loss[loss=0.2278, ctc_loss=0.1285, cr_loss=0.3665, attn_decoder_loss=0.2307, over 29625.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.1463, cr_loss=0.3878, attn_decoder_loss=0.255, over 5705197.54 frames. ], batch size: 73, lr: 7.07e-03, grad_scale: 16.0
2024-09-17 18:19:10,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=274700.0, ans=0.1
2024-09-17 18:19:15,325 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.08 vs. limit=10.0
2024-09-17 18:19:25,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=274740.0, ans=0.0
2024-09-17 18:19:26,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=274740.0, ans=0.125
2024-09-17 18:19:32,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=274740.0, ans=0.125
2024-09-17 18:19:46,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.13 vs. limit=15.0
2024-09-17 18:19:50,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=274820.0, ans=0.0
2024-09-17 18:19:53,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=274820.0, ans=0.025
2024-09-17 18:19:55,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=274820.0, ans=0.125
2024-09-17 18:20:02,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=274820.0, ans=0.125
2024-09-17 18:20:22,032 INFO [train.py:1198] (0/2) Epoch 16, batch 850, loss[loss=0.2642, ctc_loss=0.1459, cr_loss=0.3819, attn_decoder_loss=0.2689, over 29715.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1459, cr_loss=0.3867, attn_decoder_loss=0.2545, over 5734938.20 frames. ], batch size: 89, lr: 7.07e-03, grad_scale: 8.0
2024-09-17 18:20:37,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=274940.0, ans=0.125
2024-09-17 18:20:40,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=274940.0, ans=0.2
2024-09-17 18:20:45,875 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.953e+01 9.515e+01 1.010e+02 2.580e+02, threshold=1.903e+02, percent-clipped=2.0
2024-09-17 18:20:52,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=274980.0, ans=0.025
2024-09-17 18:21:04,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=274980.0, ans=0.2
2024-09-17 18:21:13,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=275020.0, ans=0.125
2024-09-17 18:21:28,645 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=15.0
2024-09-17 18:21:39,955 INFO [train.py:1198] (0/2) Epoch 16, batch 900, loss[loss=0.2319, ctc_loss=0.1322, cr_loss=0.3775, attn_decoder_loss=0.2346, over 29620.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1467, cr_loss=0.3882, attn_decoder_loss=0.2554, over 5739658.09 frames. ], batch size: 73, lr: 7.06e-03, grad_scale: 8.0
2024-09-17 18:21:49,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=275100.0, ans=0.025
2024-09-17 18:22:22,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=275180.0, ans=0.025
2024-09-17 18:22:32,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=275220.0, ans=0.1
2024-09-17 18:22:39,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.76 vs. limit=15.0
2024-09-17 18:22:43,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=275260.0, ans=0.0
2024-09-17 18:22:43,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=275260.0, ans=0.0
2024-09-17 18:22:57,941 INFO [train.py:1198] (0/2) Epoch 16, batch 950, loss[loss=0.2373, ctc_loss=0.1331, cr_loss=0.3602, attn_decoder_loss=0.2408, over 29521.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1472, cr_loss=0.3884, attn_decoder_loss=0.2558, over 5742554.00 frames. ], batch size: 74, lr: 7.06e-03, grad_scale: 8.0
2024-09-17 18:22:59,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=275300.0, ans=0.125
2024-09-17 18:23:13,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=275340.0, ans=0.0
2024-09-17 18:23:21,954 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.870e+01 9.132e+01 9.762e+01 1.082e+02 2.725e+02, threshold=1.952e+02, percent-clipped=3.0
2024-09-17 18:23:35,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=275380.0, ans=0.05
2024-09-17 18:23:50,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=275420.0, ans=0.125
2024-09-17 18:24:13,029 INFO [train.py:1198] (0/2) Epoch 16, batch 1000, loss[loss=0.2485, ctc_loss=0.1492, cr_loss=0.3975, attn_decoder_loss=0.2507, over 29492.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1481, cr_loss=0.3898, attn_decoder_loss=0.2566, over 5736603.71 frames. ], batch size: 77, lr: 7.06e-03, grad_scale: 8.0
2024-09-17 18:24:22,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0
2024-09-17 18:25:01,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=275620.0, ans=0.0
2024-09-17 18:25:30,816 INFO [train.py:1198] (0/2) Epoch 16, batch 1050, loss[loss=0.2672, ctc_loss=0.157, cr_loss=0.3932, attn_decoder_loss=0.2707, over 29668.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1474, cr_loss=0.3882, attn_decoder_loss=0.2556, over 5743351.25 frames. ], batch size: 85, lr: 7.06e-03, grad_scale: 8.0
2024-09-17 18:25:36,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.87 vs. limit=12.0
2024-09-17 18:25:46,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=275740.0, ans=0.125
2024-09-17 18:25:55,380 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.680e+01 9.093e+01 1.030e+02 1.882e+02, threshold=1.819e+02, percent-clipped=0.0
2024-09-17 18:26:04,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=275780.0, ans=0.025
2024-09-17 18:26:04,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275780.0, ans=0.1
2024-09-17 18:26:13,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=275780.0, ans=0.0
2024-09-17 18:26:13,683 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.46 vs. limit=15.0
2024-09-17 18:26:49,383 INFO [train.py:1198] (0/2) Epoch 16, batch 1100, loss[loss=0.2622, ctc_loss=0.1655, cr_loss=0.4261, attn_decoder_loss=0.2635, over 29443.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1476, cr_loss=0.3888, attn_decoder_loss=0.2555, over 5757095.14 frames. ], batch size: 78, lr: 7.05e-03, grad_scale: 8.0
2024-09-17 18:26:58,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=275900.0, ans=0.125
2024-09-17 18:27:00,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=275900.0, ans=0.2
2024-09-17 18:27:07,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=275940.0, ans=0.0
2024-09-17 18:27:15,345 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 18:27:18,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=275980.0, ans=0.0
2024-09-17 18:27:44,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=276020.0, ans=0.0
2024-09-17 18:28:05,490 INFO [train.py:1198] (0/2) Epoch 16, batch 1150, loss[loss=0.2499, ctc_loss=0.1527, cr_loss=0.3898, attn_decoder_loss=0.252, over 29457.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1476, cr_loss=0.3884, attn_decoder_loss=0.2554, over 5755959.48 frames. ], batch size: 78, lr: 7.05e-03, grad_scale: 8.0
2024-09-17 18:28:25,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=276140.0, ans=0.125
2024-09-17 18:28:29,846 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.744e+01 9.236e+01 1.006e+02 2.528e+02, threshold=1.847e+02, percent-clipped=1.0
2024-09-17 18:28:54,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=276220.0, ans=0.0
2024-09-17 18:29:19,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=276260.0, ans=0.0
2024-09-17 18:29:23,625 INFO [train.py:1198] (0/2) Epoch 16, batch 1200, loss[loss=0.2621, ctc_loss=0.1573, cr_loss=0.4058, attn_decoder_loss=0.2647, over 29684.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1482, cr_loss=0.3898, attn_decoder_loss=0.2563, over 5748902.78 frames. ], batch size: 85, lr: 7.05e-03, grad_scale: 16.0
2024-09-17 18:29:31,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=276300.0, ans=0.0
2024-09-17 18:29:42,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.92 vs. limit=15.0
2024-09-17 18:30:05,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=276380.0, ans=0.1
2024-09-17 18:30:22,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=276420.0, ans=0.09899494936611666
2024-09-17 18:30:33,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=276460.0, ans=0.1
2024-09-17 18:30:34,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=276460.0, ans=0.125
2024-09-17 18:30:41,713 INFO [train.py:1198] (0/2) Epoch 16, batch 1250, loss[loss=0.2754, ctc_loss=0.1716, cr_loss=0.4463, attn_decoder_loss=0.277, over 29534.00 frames. ], tot_loss[loss=0.2538, ctc_loss=0.1485, cr_loss=0.391, attn_decoder_loss=0.2568, over 5776132.50 frames. ], batch size: 92, lr: 7.05e-03, grad_scale: 8.0
2024-09-17 18:30:44,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=276500.0, ans=0.0
2024-09-17 18:30:54,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=276500.0, ans=0.125
2024-09-17 18:31:06,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=276540.0, ans=0.0
2024-09-17 18:31:07,625 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.801e+01 9.250e+01 9.945e+01 2.307e+02, threshold=1.850e+02, percent-clipped=1.0
2024-09-17 18:31:13,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=276580.0, ans=0.1
2024-09-17 18:31:15,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=276580.0, ans=0.125
2024-09-17 18:31:38,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=276620.0, ans=0.025
2024-09-17 18:31:41,935 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5
2024-09-17 18:31:43,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=276660.0, ans=0.1
2024-09-17 18:31:57,698 INFO [train.py:1198] (0/2) Epoch 16, batch 1300, loss[loss=0.2639, ctc_loss=0.1513, cr_loss=0.3964, attn_decoder_loss=0.2676, over 28321.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1477, cr_loss=0.3899, attn_decoder_loss=0.2562, over 5780147.25 frames. ], batch size: 111, lr: 7.04e-03, grad_scale: 8.0
2024-09-17 18:31:58,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=276700.0, ans=0.2
2024-09-17 18:32:01,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=276700.0, ans=0.0
2024-09-17 18:32:04,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs.
limit=6.0 2024-09-17 18:32:31,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=276780.0, ans=0.0 2024-09-17 18:32:40,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=276780.0, ans=0.125 2024-09-17 18:32:41,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=276780.0, ans=0.125 2024-09-17 18:32:43,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=276820.0, ans=0.125 2024-09-17 18:33:14,164 INFO [train.py:1198] (0/2) Epoch 16, batch 1350, loss[loss=0.2488, ctc_loss=0.1406, cr_loss=0.3784, attn_decoder_loss=0.2524, over 29782.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1477, cr_loss=0.3901, attn_decoder_loss=0.256, over 5799631.17 frames. ], batch size: 81, lr: 7.04e-03, grad_scale: 8.0 2024-09-17 18:33:24,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=276900.0, ans=0.1 2024-09-17 18:33:28,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=276900.0, ans=0.125 2024-09-17 18:33:31,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=276940.0, ans=0.025 2024-09-17 18:33:31,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=276940.0, ans=0.125 2024-09-17 18:33:41,947 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.515e+01 9.086e+01 9.689e+01 1.239e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-17 18:33:59,337 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=276980.0, ans=0.0 2024-09-17 18:34:10,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=277020.0, ans=0.125 2024-09-17 18:34:19,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=277060.0, ans=0.125 2024-09-17 18:34:21,463 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.33 vs. limit=6.0 2024-09-17 18:34:30,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.29 vs. limit=15.0 2024-09-17 18:34:34,350 INFO [train.py:1198] (0/2) Epoch 16, batch 1400, loss[loss=0.2291, ctc_loss=0.1264, cr_loss=0.3568, attn_decoder_loss=0.2325, over 29610.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.147, cr_loss=0.3888, attn_decoder_loss=0.2553, over 5809866.56 frames. ], batch size: 69, lr: 7.04e-03, grad_scale: 8.0 2024-09-17 18:35:01,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=277140.0, ans=0.0 2024-09-17 18:35:06,949 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-09-17 18:35:15,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277180.0, ans=0.1 2024-09-17 18:35:30,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.52 vs. 
limit=15.0 2024-09-17 18:35:35,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=277260.0, ans=0.05 2024-09-17 18:35:38,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=277260.0, ans=0.2 2024-09-17 18:35:49,912 INFO [train.py:1198] (0/2) Epoch 16, batch 1450, loss[loss=0.272, ctc_loss=0.1667, cr_loss=0.428, attn_decoder_loss=0.2742, over 29466.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1473, cr_loss=0.3895, attn_decoder_loss=0.256, over 5806372.22 frames. ], batch size: 94, lr: 7.04e-03, grad_scale: 8.0 2024-09-17 18:36:15,713 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.888e+01 9.569e+01 1.025e+02 2.533e+02, threshold=1.914e+02, percent-clipped=1.0 2024-09-17 18:36:34,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=277420.0, ans=0.125 2024-09-17 18:36:59,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=277460.0, ans=0.025 2024-09-17 18:37:05,624 INFO [train.py:1198] (0/2) Epoch 16, batch 1500, loss[loss=0.2518, ctc_loss=0.1436, cr_loss=0.3741, attn_decoder_loss=0.2555, over 29648.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1473, cr_loss=0.3899, attn_decoder_loss=0.2561, over 5805872.59 frames. ], batch size: 86, lr: 7.03e-03, grad_scale: 8.0 2024-09-17 18:37:06,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.17 vs. limit=22.5 2024-09-17 18:37:28,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.54 vs. 
limit=15.0 2024-09-17 18:37:50,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=277580.0, ans=0.125 2024-09-17 18:37:55,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=277620.0, ans=0.07 2024-09-17 18:37:55,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=277620.0, ans=0.125 2024-09-17 18:37:58,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=277620.0, ans=0.125 2024-09-17 18:38:26,894 INFO [train.py:1198] (0/2) Epoch 16, batch 1550, loss[loss=0.2541, ctc_loss=0.1512, cr_loss=0.4036, attn_decoder_loss=0.2565, over 29504.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1479, cr_loss=0.3902, attn_decoder_loss=0.2559, over 5781039.01 frames. ], batch size: 90, lr: 7.03e-03, grad_scale: 8.0 2024-09-17 18:38:30,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=277700.0, ans=0.125 2024-09-17 18:38:48,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.28 vs. limit=15.0 2024-09-17 18:38:52,452 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.047e+01 8.973e+01 9.587e+01 1.017e+02 1.956e+02, threshold=1.917e+02, percent-clipped=1.0 2024-09-17 18:38:54,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.73 vs. 
limit=10.0 2024-09-17 18:39:03,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=277780.0, ans=0.0 2024-09-17 18:39:16,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=277820.0, ans=0.125 2024-09-17 18:39:24,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=277820.0, ans=0.125 2024-09-17 18:39:28,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=277860.0, ans=0.125 2024-09-17 18:39:41,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.47 vs. limit=15.0 2024-09-17 18:39:42,135 INFO [train.py:1198] (0/2) Epoch 16, batch 1600, loss[loss=0.2592, ctc_loss=0.1453, cr_loss=0.3701, attn_decoder_loss=0.2636, over 29673.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1477, cr_loss=0.3894, attn_decoder_loss=0.2558, over 5764150.76 frames. ], batch size: 85, lr: 7.03e-03, grad_scale: 16.0 2024-09-17 18:39:43,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=277900.0, ans=0.125 2024-09-17 18:39:59,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0 2024-09-17 18:40:08,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.17 vs. 
limit=10.0 2024-09-17 18:40:09,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=277940.0, ans=0.125 2024-09-17 18:40:17,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=277980.0, ans=0.0 2024-09-17 18:40:23,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=277980.0, ans=0.2 2024-09-17 18:40:42,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=278060.0, ans=0.125 2024-09-17 18:40:56,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=278100.0, ans=0.125 2024-09-17 18:40:57,568 INFO [train.py:1198] (0/2) Epoch 16, batch 1650, loss[loss=0.2716, ctc_loss=0.1602, cr_loss=0.4149, attn_decoder_loss=0.2748, over 29720.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1476, cr_loss=0.389, attn_decoder_loss=0.2555, over 5760374.03 frames. ], batch size: 89, lr: 7.02e-03, grad_scale: 8.0 2024-09-17 18:41:26,854 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.637e+01 9.438e+01 1.013e+02 1.642e+02, threshold=1.888e+02, percent-clipped=0.0 2024-09-17 18:41:55,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=278220.0, ans=0.0 2024-09-17 18:41:57,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.90 vs. 
limit=15.0 2024-09-17 18:41:59,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=278220.0, ans=0.09899494936611666 2024-09-17 18:42:17,015 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.44 vs. limit=22.5 2024-09-17 18:42:17,549 INFO [train.py:1198] (0/2) Epoch 16, batch 1700, loss[loss=0.2239, ctc_loss=0.1198, cr_loss=0.3455, attn_decoder_loss=0.2278, over 29588.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1471, cr_loss=0.3886, attn_decoder_loss=0.2552, over 5780645.65 frames. ], batch size: 69, lr: 7.02e-03, grad_scale: 8.0 2024-09-17 18:42:23,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=278300.0, ans=0.2 2024-09-17 18:42:47,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-09-17 18:42:55,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=278380.0, ans=0.035 2024-09-17 18:43:06,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=278420.0, ans=0.5 2024-09-17 18:43:08,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.45 vs. limit=10.0 2024-09-17 18:43:33,563 INFO [train.py:1198] (0/2) Epoch 16, batch 1750, loss[loss=0.2275, ctc_loss=0.1338, cr_loss=0.3626, attn_decoder_loss=0.2298, over 29360.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1466, cr_loss=0.3883, attn_decoder_loss=0.2548, over 5786877.18 frames. 
], batch size: 67, lr: 7.02e-03, grad_scale: 8.0 2024-09-17 18:43:43,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=278500.0, ans=0.0 2024-09-17 18:43:59,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=278540.0, ans=0.125 2024-09-17 18:44:00,899 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.367e+01 8.955e+01 9.624e+01 1.381e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-17 18:44:31,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=278620.0, ans=0.125 2024-09-17 18:44:49,086 INFO [train.py:1198] (0/2) Epoch 16, batch 1800, loss[loss=0.2632, ctc_loss=0.1549, cr_loss=0.4071, attn_decoder_loss=0.2662, over 29686.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1472, cr_loss=0.3889, attn_decoder_loss=0.2551, over 5790621.66 frames. ], batch size: 83, lr: 7.02e-03, grad_scale: 8.0 2024-09-17 18:44:53,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2024-09-17 18:45:06,856 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:45:11,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.47 vs. limit=15.0 2024-09-17 18:45:36,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=278780.0, ans=0.1 2024-09-17 18:45:44,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.48 vs. 
limit=15.0 2024-09-17 18:46:09,899 INFO [train.py:1198] (0/2) Epoch 16, batch 1850, loss[loss=0.2616, ctc_loss=0.1561, cr_loss=0.3947, attn_decoder_loss=0.2646, over 29639.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1473, cr_loss=0.39, attn_decoder_loss=0.2552, over 5796361.04 frames. ], batch size: 86, lr: 7.02e-03, grad_scale: 8.0 2024-09-17 18:46:23,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=278940.0, ans=0.125 2024-09-17 18:46:36,824 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 8.719e+01 9.438e+01 1.018e+02 2.897e+02, threshold=1.888e+02, percent-clipped=1.0 2024-09-17 18:46:47,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=278980.0, ans=0.0 2024-09-17 18:46:58,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=279020.0, ans=0.125 2024-09-17 18:47:24,870 INFO [train.py:1198] (0/2) Epoch 16, batch 1900, loss[loss=0.2612, ctc_loss=0.1541, cr_loss=0.4219, attn_decoder_loss=0.2637, over 29685.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1478, cr_loss=0.3913, attn_decoder_loss=0.2559, over 5804609.83 frames. ], batch size: 89, lr: 7.01e-03, grad_scale: 8.0 2024-09-17 18:47:28,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=279100.0, ans=0.125 2024-09-17 18:47:32,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=279100.0, ans=0.125 2024-09-17 18:47:44,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.11 vs. 
limit=15.0 2024-09-17 18:48:32,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=279260.0, ans=0.2 2024-09-17 18:48:37,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=279260.0, ans=0.125 2024-09-17 18:48:41,361 INFO [train.py:1198] (0/2) Epoch 16, batch 1950, loss[loss=0.2548, ctc_loss=0.1555, cr_loss=0.404, attn_decoder_loss=0.2569, over 29446.00 frames. ], tot_loss[loss=0.2538, ctc_loss=0.1482, cr_loss=0.3925, attn_decoder_loss=0.2568, over 5819413.50 frames. ], batch size: 78, lr: 7.01e-03, grad_scale: 8.0 2024-09-17 18:48:41,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=279300.0, ans=0.2 2024-09-17 18:48:47,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=279300.0, ans=0.125 2024-09-17 18:48:47,755 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:48:48,582 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.99 vs. limit=15.0 2024-09-17 18:49:01,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.84 vs. 
limit=15.0 2024-09-17 18:49:11,015 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.858e+01 9.399e+01 1.005e+02 1.788e+02, threshold=1.880e+02, percent-clipped=1.0 2024-09-17 18:49:22,679 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:49:26,518 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=22.5 2024-09-17 18:49:56,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=279460.0, ans=0.125 2024-09-17 18:50:01,808 INFO [train.py:1198] (0/2) Epoch 16, batch 2000, loss[loss=0.2251, ctc_loss=0.1302, cr_loss=0.3628, attn_decoder_loss=0.2275, over 29342.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1491, cr_loss=0.3934, attn_decoder_loss=0.2575, over 5796717.16 frames. ], batch size: 67, lr: 7.01e-03, grad_scale: 16.0 2024-09-17 18:50:20,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=279540.0, ans=0.0 2024-09-17 18:50:28,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=279540.0, ans=0.125 2024-09-17 18:50:48,513 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.85 vs. 
limit=15.0 2024-09-17 18:50:50,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=279620.0, ans=0.125 2024-09-17 18:50:56,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=279620.0, ans=0.0 2024-09-17 18:51:17,831 INFO [train.py:1198] (0/2) Epoch 16, batch 2050, loss[loss=0.2229, ctc_loss=0.1252, cr_loss=0.3575, attn_decoder_loss=0.2259, over 29440.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1481, cr_loss=0.3911, attn_decoder_loss=0.2561, over 5788642.51 frames. ], batch size: 70, lr: 7.01e-03, grad_scale: 8.0 2024-09-17 18:51:20,232 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.83 vs. limit=15.0 2024-09-17 18:51:39,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=279740.0, ans=0.125 2024-09-17 18:51:46,761 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.024e+01 1.001e+02 1.116e+02 1.891e+02, threshold=2.001e+02, percent-clipped=1.0 2024-09-17 18:51:50,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=279780.0, ans=0.0 2024-09-17 18:51:56,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279780.0, ans=0.1 2024-09-17 18:52:11,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=279820.0, ans=0.125 2024-09-17 18:52:33,739 INFO [train.py:1198] (0/2) Epoch 16, batch 2100, loss[loss=0.2513, ctc_loss=0.1399, cr_loss=0.4035, attn_decoder_loss=0.2547, over 29766.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1474, cr_loss=0.3896, attn_decoder_loss=0.2556, over 5801259.92 frames. 
], batch size: 81, lr: 7.00e-03, grad_scale: 8.0 2024-09-17 18:52:34,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=279900.0, ans=0.125 2024-09-17 18:53:22,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=280020.0, ans=0.025 2024-09-17 18:53:28,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=280020.0, ans=0.125 2024-09-17 18:53:53,808 INFO [train.py:1198] (0/2) Epoch 16, batch 2150, loss[loss=0.2494, ctc_loss=0.1389, cr_loss=0.3917, attn_decoder_loss=0.253, over 29442.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1467, cr_loss=0.3888, attn_decoder_loss=0.2551, over 5816222.00 frames. ], batch size: 78, lr: 7.00e-03, grad_scale: 8.0 2024-09-17 18:54:18,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=280140.0, ans=0.0 2024-09-17 18:54:21,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=280140.0, ans=0.0 2024-09-17 18:54:22,947 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 8.753e+01 9.321e+01 9.810e+01 1.786e+02, threshold=1.864e+02, percent-clipped=0.0 2024-09-17 18:54:30,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=280180.0, ans=0.0 2024-09-17 18:54:58,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=280260.0, ans=0.0 2024-09-17 18:55:10,162 INFO [train.py:1198] (0/2) Epoch 16, batch 2200, loss[loss=0.267, ctc_loss=0.1649, cr_loss=0.4167, attn_decoder_loss=0.2691, over 29602.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1469, cr_loss=0.3892, attn_decoder_loss=0.2552, over 5812594.36 frames. 
], batch size: 86, lr: 7.00e-03, grad_scale: 8.0 2024-09-17 18:55:12,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.98 vs. limit=15.0 2024-09-17 18:55:16,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=280300.0, ans=0.125 2024-09-17 18:55:57,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=280420.0, ans=0.125 2024-09-17 18:56:03,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=280420.0, ans=0.125 2024-09-17 18:56:03,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=280420.0, ans=0.125 2024-09-17 18:56:06,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=280420.0, ans=0.125 2024-09-17 18:56:25,749 INFO [train.py:1198] (0/2) Epoch 16, batch 2250, loss[loss=0.2484, ctc_loss=0.1387, cr_loss=0.3715, attn_decoder_loss=0.2523, over 29699.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1467, cr_loss=0.3891, attn_decoder_loss=0.2552, over 5812252.83 frames. 
], batch size: 82, lr: 7.00e-03, grad_scale: 8.0 2024-09-17 18:56:26,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=280500.0, ans=0.125 2024-09-17 18:56:50,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=280540.0, ans=0.025 2024-09-17 18:56:53,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=280540.0, ans=0.125 2024-09-17 18:56:53,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=280540.0, ans=0.0 2024-09-17 18:56:54,263 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.834e+01 9.325e+01 1.002e+02 2.125e+02, threshold=1.865e+02, percent-clipped=1.0 2024-09-17 18:57:10,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=280580.0, ans=0.015 2024-09-17 18:57:12,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=280580.0, ans=0.04949747468305833 2024-09-17 18:57:16,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=280620.0, ans=0.0 2024-09-17 18:57:24,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=280620.0, ans=0.125 2024-09-17 18:57:35,742 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.36 vs. limit=22.5 2024-09-17 18:57:45,346 INFO [train.py:1198] (0/2) Epoch 16, batch 2300, loss[loss=0.2262, ctc_loss=0.1223, cr_loss=0.3323, attn_decoder_loss=0.2303, over 29328.00 frames. 
], tot_loss[loss=0.2512, ctc_loss=0.1463, cr_loss=0.3872, attn_decoder_loss=0.2543, over 5800807.70 frames. ], batch size: 71, lr: 6.99e-03, grad_scale: 8.0 2024-09-17 18:57:58,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=280740.0, ans=0.0 2024-09-17 18:58:23,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=280780.0, ans=0.05 2024-09-17 18:58:31,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=280820.0, ans=0.2 2024-09-17 18:58:38,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=280820.0, ans=0.125 2024-09-17 18:58:38,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=280820.0, ans=0.125 2024-09-17 18:58:40,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=280820.0, ans=0.125 2024-09-17 18:58:46,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=280860.0, ans=0.125 2024-09-17 18:58:54,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=280860.0, ans=0.0 2024-09-17 18:59:01,558 INFO [train.py:1198] (0/2) Epoch 16, batch 2350, loss[loss=0.2569, ctc_loss=0.1493, cr_loss=0.392, attn_decoder_loss=0.2601, over 29694.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1463, cr_loss=0.3875, attn_decoder_loss=0.2544, over 5805997.70 frames. 
], batch size: 83, lr: 6.99e-03, grad_scale: 8.0 2024-09-17 18:59:09,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.69 vs. limit=15.0 2024-09-17 18:59:30,274 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 8.837e+01 9.442e+01 1.004e+02 6.270e+02, threshold=1.888e+02, percent-clipped=1.0 2024-09-17 18:59:42,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-09-17 18:59:43,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2024-09-17 19:00:13,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=281060.0, ans=0.125 2024-09-17 19:00:17,727 INFO [train.py:1198] (0/2) Epoch 16, batch 2400, loss[loss=0.237, ctc_loss=0.1328, cr_loss=0.3747, attn_decoder_loss=0.2402, over 29544.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1464, cr_loss=0.3885, attn_decoder_loss=0.2549, over 5809008.32 frames. 
], batch size: 76, lr: 6.99e-03, grad_scale: 16.0 2024-09-17 19:00:18,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=281100.0, ans=0.125 2024-09-17 19:00:25,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=281100.0, ans=0.025 2024-09-17 19:00:58,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=281180.0, ans=0.025 2024-09-17 19:00:59,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=281180.0, ans=0.0 2024-09-17 19:01:01,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=281180.0, ans=0.0 2024-09-17 19:01:22,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281260.0, ans=0.1 2024-09-17 19:01:36,124 INFO [train.py:1198] (0/2) Epoch 16, batch 2450, loss[loss=0.2723, ctc_loss=0.1692, cr_loss=0.4378, attn_decoder_loss=0.274, over 29715.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1473, cr_loss=0.3894, attn_decoder_loss=0.2558, over 5786125.56 frames. 
], batch size: 82, lr: 6.99e-03, grad_scale: 8.0 2024-09-17 19:01:43,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=281300.0, ans=0.125 2024-09-17 19:02:06,405 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 9.367e+01 1.015e+02 1.200e+02 3.423e+02, threshold=2.029e+02, percent-clipped=2.0 2024-09-17 19:02:08,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=281380.0, ans=0.125 2024-09-17 19:02:20,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=281420.0, ans=0.1 2024-09-17 19:02:34,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=281420.0, ans=0.125 2024-09-17 19:02:51,887 INFO [train.py:1198] (0/2) Epoch 16, batch 2500, loss[loss=0.2644, ctc_loss=0.147, cr_loss=0.3932, attn_decoder_loss=0.2687, over 29631.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1472, cr_loss=0.3893, attn_decoder_loss=0.2557, over 5796359.09 frames. 
], batch size: 86, lr: 6.98e-03, grad_scale: 8.0 2024-09-17 19:02:59,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=281500.0, ans=0.0 2024-09-17 19:03:13,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=281540.0, ans=0.125 2024-09-17 19:04:01,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=281660.0, ans=0.125 2024-09-17 19:04:07,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=281700.0, ans=0.125 2024-09-17 19:04:08,348 INFO [train.py:1198] (0/2) Epoch 16, batch 2550, loss[loss=0.2243, ctc_loss=0.1295, cr_loss=0.3531, attn_decoder_loss=0.227, over 29296.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.147, cr_loss=0.3889, attn_decoder_loss=0.2558, over 5800194.08 frames. ], batch size: 67, lr: 6.98e-03, grad_scale: 8.0 2024-09-17 19:04:21,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. 
limit=12.0 2024-09-17 19:04:26,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=281740.0, ans=0.0 2024-09-17 19:04:29,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=281740.0, ans=0.0 2024-09-17 19:04:31,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=281740.0, ans=0.125 2024-09-17 19:04:37,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=281780.0, ans=0.0 2024-09-17 19:04:38,396 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.661e+01 9.348e+01 1.013e+02 3.774e+02, threshold=1.870e+02, percent-clipped=2.0 2024-09-17 19:04:41,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=281780.0, ans=0.0 2024-09-17 19:04:43,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=281780.0, ans=0.125 2024-09-17 19:04:43,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=281780.0, ans=0.125 2024-09-17 19:04:55,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=281780.0, ans=0.125 2024-09-17 19:05:28,338 INFO [train.py:1198] (0/2) Epoch 16, batch 2600, loss[loss=0.2494, ctc_loss=0.1358, cr_loss=0.3805, attn_decoder_loss=0.2535, over 29420.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1473, cr_loss=0.39, attn_decoder_loss=0.2561, over 5796004.96 frames. 
], batch size: 78, lr: 6.98e-03, grad_scale: 8.0 2024-09-17 19:05:30,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=281900.0, ans=0.125 2024-09-17 19:05:30,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2024-09-17 19:05:43,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=281940.0, ans=0.125 2024-09-17 19:05:49,996 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.13 vs. limit=15.0 2024-09-17 19:05:53,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=281940.0, ans=0.2 2024-09-17 19:06:04,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=281980.0, ans=0.125 2024-09-17 19:06:16,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=282020.0, ans=0.125 2024-09-17 19:06:33,743 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2024-09-17 19:06:43,662 INFO [train.py:1198] (0/2) Epoch 16, batch 2650, loss[loss=0.2651, ctc_loss=0.1523, cr_loss=0.3971, attn_decoder_loss=0.2688, over 29211.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1475, cr_loss=0.3904, attn_decoder_loss=0.2564, over 5802902.76 frames. 
], batch size: 100, lr: 6.98e-03, grad_scale: 8.0 2024-09-17 19:06:48,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=282100.0, ans=0.1 2024-09-17 19:07:01,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.34 vs. limit=22.5 2024-09-17 19:07:09,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=282140.0, ans=0.2 2024-09-17 19:07:12,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=282180.0, ans=0.125 2024-09-17 19:07:13,814 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 8.834e+01 9.287e+01 9.746e+01 1.582e+02, threshold=1.857e+02, percent-clipped=0.0 2024-09-17 19:07:17,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=282180.0, ans=0.1 2024-09-17 19:07:19,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=282180.0, ans=0.025 2024-09-17 19:07:51,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.26 vs. limit=15.0 2024-09-17 19:07:59,131 INFO [train.py:1198] (0/2) Epoch 16, batch 2700, loss[loss=0.2549, ctc_loss=0.1452, cr_loss=0.3854, attn_decoder_loss=0.2585, over 29560.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1471, cr_loss=0.3897, attn_decoder_loss=0.2564, over 5797904.31 frames. 
], batch size: 87, lr: 6.97e-03, grad_scale: 8.0 2024-09-17 19:08:07,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten.whitening_limit, batch_count=282300.0, ans=15.0 2024-09-17 19:08:42,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=282380.0, ans=0.125 2024-09-17 19:08:49,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=282420.0, ans=0.0 2024-09-17 19:08:52,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=282420.0, ans=0.1 2024-09-17 19:09:04,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0 2024-09-17 19:09:16,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=282460.0, ans=0.2 2024-09-17 19:09:19,236 INFO [train.py:1198] (0/2) Epoch 16, batch 2750, loss[loss=0.2441, ctc_loss=0.1405, cr_loss=0.3686, attn_decoder_loss=0.2474, over 29534.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1462, cr_loss=0.3881, attn_decoder_loss=0.2552, over 5796513.51 frames. ], batch size: 75, lr: 6.97e-03, grad_scale: 8.0 2024-09-17 19:09:21,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=282500.0, ans=0.125 2024-09-17 19:09:23,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.42 vs. 
limit=22.5 2024-09-17 19:09:24,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=282500.0, ans=0.1 2024-09-17 19:09:49,726 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.861e+01 8.782e+01 9.545e+01 1.036e+02 3.066e+02, threshold=1.909e+02, percent-clipped=3.0 2024-09-17 19:10:35,385 INFO [train.py:1198] (0/2) Epoch 16, batch 2800, loss[loss=0.2658, ctc_loss=0.1693, cr_loss=0.385, attn_decoder_loss=0.2679, over 20683.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1466, cr_loss=0.388, attn_decoder_loss=0.2553, over 5776866.88 frames. ], batch size: 213, lr: 6.97e-03, grad_scale: 16.0 2024-09-17 19:10:43,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=282700.0, ans=0.05 2024-09-17 19:10:59,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=282740.0, ans=0.2 2024-09-17 19:11:16,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.76 vs. 
limit=10.0 2024-09-17 19:11:17,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=282780.0, ans=0.0 2024-09-17 19:11:28,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=282820.0, ans=0.1 2024-09-17 19:11:39,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=282860.0, ans=0.125 2024-09-17 19:11:42,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=282860.0, ans=0.025 2024-09-17 19:11:50,921 INFO [train.py:1198] (0/2) Epoch 16, batch 2850, loss[loss=0.2437, ctc_loss=0.1374, cr_loss=0.3687, attn_decoder_loss=0.2473, over 29496.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1471, cr_loss=0.3887, attn_decoder_loss=0.2559, over 5761665.89 frames. ], batch size: 77, lr: 6.97e-03, grad_scale: 8.0 2024-09-17 19:12:01,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=282900.0, ans=0.125 2024-09-17 19:12:04,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=282940.0, ans=0.0 2024-09-17 19:12:09,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=282940.0, ans=0.07 2024-09-17 19:12:24,809 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.830e+01 9.442e+01 1.037e+02 2.855e+02, threshold=1.888e+02, percent-clipped=3.0 2024-09-17 19:12:32,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=282980.0, ans=0.1 2024-09-17 19:13:10,860 INFO [train.py:1198] (0/2) Epoch 16, batch 2900, loss[loss=0.2331, ctc_loss=0.1247, cr_loss=0.3349, 
attn_decoder_loss=0.2377, over 29412.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1476, cr_loss=0.3908, attn_decoder_loss=0.2572, over 5787685.95 frames. ], batch size: 79, lr: 6.96e-03, grad_scale: 8.0 2024-09-17 19:13:26,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=283140.0, ans=0.125 2024-09-17 19:13:44,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=283180.0, ans=0.025 2024-09-17 19:13:50,944 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:13:59,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=283220.0, ans=0.1 2024-09-17 19:14:02,089 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.68 vs. limit=22.5 2024-09-17 19:14:07,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=283220.0, ans=0.125 2024-09-17 19:14:15,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-09-17 19:14:18,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=283260.0, ans=0.125 2024-09-17 19:14:27,151 INFO [train.py:1198] (0/2) Epoch 16, batch 2950, loss[loss=0.2466, ctc_loss=0.1436, cr_loss=0.3891, attn_decoder_loss=0.2494, over 29511.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1472, cr_loss=0.3898, attn_decoder_loss=0.2559, over 5781383.24 frames. 
], batch size: 75, lr: 6.96e-03, grad_scale: 8.0 2024-09-17 19:14:39,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=283300.0, ans=0.05 2024-09-17 19:14:58,915 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.668e+01 9.077e+01 9.673e+01 1.448e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-17 19:15:05,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=283380.0, ans=0.1 2024-09-17 19:15:27,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=283460.0, ans=0.2 2024-09-17 19:15:42,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.43 vs. limit=15.0 2024-09-17 19:15:42,946 INFO [train.py:1198] (0/2) Epoch 16, batch 3000, loss[loss=0.2522, ctc_loss=0.1476, cr_loss=0.4124, attn_decoder_loss=0.2547, over 29760.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1475, cr_loss=0.3905, attn_decoder_loss=0.2561, over 5782068.62 frames. ], batch size: 81, lr: 6.96e-03, grad_scale: 8.0 2024-09-17 19:15:42,947 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 19:16:01,439 INFO [train.py:1230] (0/2) Epoch 16, validation: loss=0.2115, ctc_loss=0.04131, cr_loss=4.919e-15, attn_decoder_loss=0.2304, over 944034.00 frames. 
2024-09-17 19:16:01,439 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 19:16:48,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=283580.0, ans=0.0 2024-09-17 19:16:50,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=283620.0, ans=0.0 2024-09-17 19:16:54,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=283620.0, ans=0.125 2024-09-17 19:17:16,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=283660.0, ans=0.125 2024-09-17 19:17:22,004 INFO [train.py:1198] (0/2) Epoch 16, batch 3050, loss[loss=0.2496, ctc_loss=0.1521, cr_loss=0.3951, attn_decoder_loss=0.2517, over 29539.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1479, cr_loss=0.3907, attn_decoder_loss=0.2566, over 5776071.40 frames. ], batch size: 76, lr: 6.96e-03, grad_scale: 8.0 2024-09-17 19:17:27,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=16.22 vs. limit=15.0 2024-09-17 19:17:43,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=283740.0, ans=0.125 2024-09-17 19:17:53,959 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 8.926e+01 9.487e+01 1.019e+02 3.855e+02, threshold=1.897e+02, percent-clipped=1.0 2024-09-17 19:18:05,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.43 vs. 
limit=22.5 2024-09-17 19:18:06,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=283820.0, ans=0.125 2024-09-17 19:18:22,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=283860.0, ans=0.125 2024-09-17 19:18:37,844 INFO [train.py:1198] (0/2) Epoch 16, batch 3100, loss[loss=0.2691, ctc_loss=0.1604, cr_loss=0.4174, attn_decoder_loss=0.2718, over 29204.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1477, cr_loss=0.3906, attn_decoder_loss=0.2562, over 5776328.13 frames. ], batch size: 100, lr: 6.95e-03, grad_scale: 8.0 2024-09-17 19:18:57,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=283940.0, ans=0.0 2024-09-17 19:19:11,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=283980.0, ans=0.0 2024-09-17 19:19:54,440 INFO [train.py:1198] (0/2) Epoch 16, batch 3150, loss[loss=0.2757, ctc_loss=0.164, cr_loss=0.4316, attn_decoder_loss=0.2785, over 28844.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1475, cr_loss=0.39, attn_decoder_loss=0.256, over 5782777.57 frames. ], batch size: 104, lr: 6.95e-03, grad_scale: 8.0 2024-09-17 19:20:10,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.23 vs. 
limit=6.0 2024-09-17 19:20:17,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=284140.0, ans=0.125 2024-09-17 19:20:26,927 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:20:28,020 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.635e+01 9.420e+01 9.793e+01 2.697e+02, threshold=1.884e+02, percent-clipped=2.0 2024-09-17 19:20:31,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284180.0, ans=0.1 2024-09-17 19:20:46,071 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.98 vs. limit=15.0 2024-09-17 19:20:56,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=284220.0, ans=0.125 2024-09-17 19:21:06,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=284260.0, ans=0.0 2024-09-17 19:21:13,978 INFO [train.py:1198] (0/2) Epoch 16, batch 3200, loss[loss=0.2489, ctc_loss=0.1419, cr_loss=0.3788, attn_decoder_loss=0.2524, over 29428.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1464, cr_loss=0.3886, attn_decoder_loss=0.2551, over 5793938.76 frames. ], batch size: 79, lr: 6.95e-03, grad_scale: 16.0 2024-09-17 19:21:20,820 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.19 vs. 
limit=15.0 2024-09-17 19:21:21,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=284300.0, ans=0.125 2024-09-17 19:21:30,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=284340.0, ans=0.5 2024-09-17 19:21:44,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=284380.0, ans=0.0 2024-09-17 19:21:46,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=284380.0, ans=0.1 2024-09-17 19:21:56,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=284380.0, ans=0.0 2024-09-17 19:22:01,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=284420.0, ans=0.1 2024-09-17 19:22:06,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=284420.0, ans=0.1 2024-09-17 19:22:15,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=284460.0, ans=0.125 2024-09-17 19:22:25,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=284460.0, ans=0.0 2024-09-17 19:22:29,947 INFO [train.py:1198] (0/2) Epoch 16, batch 3250, loss[loss=0.2587, ctc_loss=0.1489, cr_loss=0.4002, attn_decoder_loss=0.262, over 29703.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1462, cr_loss=0.3889, attn_decoder_loss=0.2555, over 5800888.20 frames. 
], batch size: 84, lr: 6.95e-03, grad_scale: 8.0 2024-09-17 19:22:42,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=284500.0, ans=0.0 2024-09-17 19:22:48,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=284540.0, ans=0.0 2024-09-17 19:23:03,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.619e+01 9.155e+01 9.687e+01 2.235e+02, threshold=1.831e+02, percent-clipped=1.0 2024-09-17 19:23:07,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=284580.0, ans=0.125 2024-09-17 19:23:16,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0 2024-09-17 19:23:45,584 INFO [train.py:1198] (0/2) Epoch 16, batch 3300, loss[loss=0.2626, ctc_loss=0.1521, cr_loss=0.4003, attn_decoder_loss=0.266, over 28533.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1456, cr_loss=0.3874, attn_decoder_loss=0.2544, over 5797880.63 frames. ], batch size: 112, lr: 6.94e-03, grad_scale: 8.0 2024-09-17 19:23:49,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=284700.0, ans=0.125 2024-09-17 19:23:51,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=284700.0, ans=0.125 2024-09-17 19:23:52,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=15.96 vs. 
limit=15.0 2024-09-17 19:24:12,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=284740.0, ans=0.025 2024-09-17 19:24:28,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=284780.0, ans=0.125 2024-09-17 19:24:36,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.20 vs. limit=15.0 2024-09-17 19:24:37,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=284820.0, ans=0.125 2024-09-17 19:24:45,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.63 vs. limit=15.0 2024-09-17 19:25:01,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=284860.0, ans=0.0 2024-09-17 19:25:06,083 INFO [train.py:1198] (0/2) Epoch 16, batch 3350, loss[loss=0.2645, ctc_loss=0.1574, cr_loss=0.3966, attn_decoder_loss=0.2676, over 28931.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1468, cr_loss=0.3891, attn_decoder_loss=0.2553, over 5773658.95 frames. ], batch size: 104, lr: 6.94e-03, grad_scale: 8.0 2024-09-17 19:25:09,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=284900.0, ans=0.04949747468305833 2024-09-17 19:25:39,363 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 8.948e+01 9.628e+01 1.043e+02 1.952e+02, threshold=1.926e+02, percent-clipped=2.0 2024-09-17 19:25:49,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.76 vs. 
limit=15.0 2024-09-17 19:26:03,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=285020.0, ans=0.125 2024-09-17 19:26:21,750 INFO [train.py:1198] (0/2) Epoch 16, batch 3400, loss[loss=0.2202, ctc_loss=0.1221, cr_loss=0.3509, attn_decoder_loss=0.2233, over 29384.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.147, cr_loss=0.3894, attn_decoder_loss=0.2553, over 5765994.03 frames. ], batch size: 67, lr: 6.94e-03, grad_scale: 8.0 2024-09-17 19:26:35,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=285140.0, ans=0.1 2024-09-17 19:26:38,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=285140.0, ans=0.05 2024-09-17 19:26:44,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=285140.0, ans=0.0 2024-09-17 19:26:46,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=285140.0, ans=0.125 2024-09-17 19:27:01,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=285180.0, ans=0.125 2024-09-17 19:27:01,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=285180.0, ans=0.07 2024-09-17 19:27:13,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=285220.0, ans=0.0 2024-09-17 19:27:14,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. 
limit=6.0 2024-09-17 19:27:20,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=285260.0, ans=0.125 2024-09-17 19:27:34,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=285260.0, ans=0.125 2024-09-17 19:27:37,294 INFO [train.py:1198] (0/2) Epoch 16, batch 3450, loss[loss=0.2738, ctc_loss=0.1597, cr_loss=0.3921, attn_decoder_loss=0.2777, over 28214.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1473, cr_loss=0.39, attn_decoder_loss=0.2558, over 5773387.09 frames. ], batch size: 111, lr: 6.94e-03, grad_scale: 8.0 2024-09-17 19:27:56,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=285340.0, ans=0.125 2024-09-17 19:28:12,412 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.860e+01 9.055e+01 9.633e+01 1.034e+02 1.561e+02, threshold=1.927e+02, percent-clipped=0.0 2024-09-17 19:28:57,136 INFO [train.py:1198] (0/2) Epoch 16, batch 3500, loss[loss=0.2232, ctc_loss=0.122, cr_loss=0.3388, attn_decoder_loss=0.2269, over 29357.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1468, cr_loss=0.3889, attn_decoder_loss=0.2552, over 5775426.32 frames. 
], batch size: 71, lr: 6.93e-03, grad_scale: 8.0 2024-09-17 19:29:06,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=285500.0, ans=0.1 2024-09-17 19:29:18,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=285540.0, ans=0.05 2024-09-17 19:30:02,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=285660.0, ans=0.025 2024-09-17 19:30:06,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=285660.0, ans=0.0 2024-09-17 19:30:12,389 INFO [train.py:1198] (0/2) Epoch 16, batch 3550, loss[loss=0.2576, ctc_loss=0.149, cr_loss=0.4023, attn_decoder_loss=0.2607, over 29710.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1468, cr_loss=0.3887, attn_decoder_loss=0.2553, over 5781765.41 frames. ], batch size: 89, lr: 6.93e-03, grad_scale: 8.0 2024-09-17 19:30:23,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=285700.0, ans=0.0 2024-09-17 19:30:39,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=285740.0, ans=0.025 2024-09-17 19:30:45,353 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 8.552e+01 9.135e+01 9.623e+01 1.565e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-17 19:30:51,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=285780.0, ans=0.0 2024-09-17 19:30:51,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=285780.0, ans=0.125 2024-09-17 19:31:06,393 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=285820.0, ans=0.0 2024-09-17 19:31:15,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=285860.0, ans=0.1 2024-09-17 19:31:26,865 INFO [train.py:1198] (0/2) Epoch 16, batch 3600, loss[loss=0.2523, ctc_loss=0.1439, cr_loss=0.3603, attn_decoder_loss=0.2564, over 29481.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1466, cr_loss=0.3882, attn_decoder_loss=0.2555, over 5791055.38 frames. ], batch size: 77, lr: 6.93e-03, grad_scale: 16.0 2024-09-17 19:31:38,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=285900.0, ans=0.125 2024-09-17 19:31:48,186 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:31:56,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=285980.0, ans=0.125 2024-09-17 19:32:00,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.09 vs. limit=15.0 2024-09-17 19:32:13,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=286020.0, ans=0.0 2024-09-17 19:32:15,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.18 vs. limit=15.0 2024-09-17 19:32:40,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=286100.0, ans=0.125 2024-09-17 19:32:41,299 INFO [train.py:1198] (0/2) Epoch 16, batch 3650, loss[loss=0.2727, ctc_loss=0.1672, cr_loss=0.4431, attn_decoder_loss=0.2746, over 29510.00 frames. 
], tot_loss[loss=0.2516, ctc_loss=0.146, cr_loss=0.3874, attn_decoder_loss=0.2547, over 5793005.86 frames. ], batch size: 90, lr: 6.93e-03, grad_scale: 8.0 2024-09-17 19:32:41,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=286100.0, ans=0.125 2024-09-17 19:32:43,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=286100.0, ans=0.125 2024-09-17 19:32:45,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=286100.0, ans=0.125 2024-09-17 19:33:15,469 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.18 vs. limit=15.0 2024-09-17 19:33:17,578 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.668e+01 9.269e+01 9.880e+01 1.402e+02, threshold=1.854e+02, percent-clipped=0.0 2024-09-17 19:33:17,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=286180.0, ans=0.125 2024-09-17 19:33:19,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=286180.0, ans=0.125 2024-09-17 19:33:23,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=286180.0, ans=0.125 2024-09-17 19:33:40,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=286220.0, ans=0.0 2024-09-17 19:33:57,799 INFO [train.py:1198] (0/2) Epoch 16, batch 3700, loss[loss=0.2531, ctc_loss=0.138, cr_loss=0.3819, attn_decoder_loss=0.2574, over 29700.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1454, cr_loss=0.3865, attn_decoder_loss=0.2544, over 5803462.36 frames. 
], batch size: 84, lr: 6.92e-03, grad_scale: 8.0 2024-09-17 19:34:10,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=286300.0, ans=0.0 2024-09-17 19:34:21,382 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.83 vs. limit=15.0 2024-09-17 19:34:41,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=286380.0, ans=0.025 2024-09-17 19:34:43,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=286420.0, ans=0.0 2024-09-17 19:35:07,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=286460.0, ans=0.125 2024-09-17 19:35:14,229 INFO [train.py:1198] (0/2) Epoch 16, batch 3750, loss[loss=0.2148, ctc_loss=0.1153, cr_loss=0.3368, attn_decoder_loss=0.2184, over 29324.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1458, cr_loss=0.3873, attn_decoder_loss=0.2545, over 5807711.39 frames. ], batch size: 67, lr: 6.92e-03, grad_scale: 8.0 2024-09-17 19:35:22,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=286500.0, ans=0.125 2024-09-17 19:35:25,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.33 vs. 
limit=12.0 2024-09-17 19:35:47,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=286580.0, ans=0.0 2024-09-17 19:35:48,295 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 8.849e+01 9.329e+01 1.007e+02 6.454e+02, threshold=1.866e+02, percent-clipped=5.0 2024-09-17 19:35:51,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=286580.0, ans=0.0 2024-09-17 19:36:03,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=286620.0, ans=0.125 2024-09-17 19:36:12,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=286660.0, ans=0.1 2024-09-17 19:36:18,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=286660.0, ans=0.125 2024-09-17 19:36:24,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=286660.0, ans=0.025 2024-09-17 19:36:28,745 INFO [train.py:1198] (0/2) Epoch 16, batch 3800, loss[loss=0.2662, ctc_loss=0.1513, cr_loss=0.421, attn_decoder_loss=0.2696, over 29635.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1455, cr_loss=0.3867, attn_decoder_loss=0.254, over 5798501.19 frames. 
], batch size: 86, lr: 6.92e-03, grad_scale: 8.0 2024-09-17 19:36:39,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=286700.0, ans=0.0 2024-09-17 19:36:43,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=286740.0, ans=0.025 2024-09-17 19:36:49,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=286740.0, ans=0.125 2024-09-17 19:36:54,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=286740.0, ans=0.2 2024-09-17 19:36:55,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=286740.0, ans=0.125 2024-09-17 19:37:10,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=286780.0, ans=0.125 2024-09-17 19:37:20,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=286820.0, ans=0.0 2024-09-17 19:37:28,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=286860.0, ans=0.125 2024-09-17 19:37:42,782 INFO [train.py:1198] (0/2) Epoch 16, batch 3850, loss[loss=0.2782, ctc_loss=0.1776, cr_loss=0.4328, attn_decoder_loss=0.2798, over 29247.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1456, cr_loss=0.3868, attn_decoder_loss=0.2542, over 5813362.39 frames. 
], batch size: 100, lr: 6.92e-03, grad_scale: 8.0 2024-09-17 19:37:43,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=286900.0, ans=0.125 2024-09-17 19:37:46,814 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=22.5 2024-09-17 19:37:48,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=286900.0, ans=0.125 2024-09-17 19:37:48,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=286900.0, ans=0.0 2024-09-17 19:38:16,695 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.163e+01 9.754e+01 1.076e+02 2.177e+02, threshold=1.951e+02, percent-clipped=1.0 2024-09-17 19:38:26,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0 2024-09-17 19:38:39,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=287020.0, ans=0.5 2024-09-17 19:38:56,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.61 vs. limit=22.5 2024-09-17 19:38:58,597 INFO [train.py:1198] (0/2) Epoch 16, batch 3900, loss[loss=0.2582, ctc_loss=0.149, cr_loss=0.3795, attn_decoder_loss=0.2618, over 29631.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.1463, cr_loss=0.3882, attn_decoder_loss=0.255, over 5817229.83 frames. 
], batch size: 86, lr: 6.92e-03, grad_scale: 8.0 2024-09-17 19:39:09,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=287100.0, ans=0.0 2024-09-17 19:39:36,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=287180.0, ans=0.125 2024-09-17 19:39:49,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.78 vs. limit=10.0 2024-09-17 19:40:03,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.30 vs. limit=15.0 2024-09-17 19:40:07,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=287260.0, ans=0.125 2024-09-17 19:40:14,827 INFO [train.py:1198] (0/2) Epoch 16, batch 3950, loss[loss=0.277, ctc_loss=0.1676, cr_loss=0.4295, attn_decoder_loss=0.2796, over 29444.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1464, cr_loss=0.389, attn_decoder_loss=0.2552, over 5836337.45 frames. ], batch size: 97, lr: 6.91e-03, grad_scale: 8.0 2024-09-17 19:40:30,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.37 vs. limit=12.0 2024-09-17 19:40:41,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=287340.0, ans=0.025 2024-09-17 19:40:42,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.86 vs. 
limit=15.0 2024-09-17 19:40:43,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=287380.0, ans=0.0 2024-09-17 19:40:48,603 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.783e+01 9.413e+01 1.005e+02 2.800e+02, threshold=1.883e+02, percent-clipped=1.0 2024-09-17 19:41:09,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=287420.0, ans=0.125 2024-09-17 19:41:25,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=287460.0, ans=0.07 2024-09-17 19:41:27,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=287500.0, ans=0.2 2024-09-17 19:41:28,447 INFO [train.py:1198] (0/2) Epoch 16, batch 4000, loss[loss=0.2371, ctc_loss=0.1296, cr_loss=0.356, attn_decoder_loss=0.2411, over 29498.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1463, cr_loss=0.3882, attn_decoder_loss=0.2552, over 5813896.97 frames. ], batch size: 74, lr: 6.91e-03, grad_scale: 16.0 2024-09-17 19:41:30,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=287500.0, ans=0.2 2024-09-17 19:42:09,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0 2024-09-17 19:42:14,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. 
limit=15.0 2024-09-17 19:42:15,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=287620.0, ans=0.2 2024-09-17 19:42:42,630 INFO [train.py:1198] (0/2) Epoch 16, batch 4050, loss[loss=0.283, ctc_loss=0.1954, cr_loss=0.424, attn_decoder_loss=0.2833, over 19357.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1463, cr_loss=0.388, attn_decoder_loss=0.2551, over 5796254.64 frames. ], batch size: 209, lr: 6.91e-03, grad_scale: 8.0 2024-09-17 19:42:45,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=287700.0, ans=0.0 2024-09-17 19:42:57,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.32 vs. limit=15.0 2024-09-17 19:43:01,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.48 vs. limit=12.0 2024-09-17 19:43:17,958 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.851e+01 8.954e+01 9.709e+01 1.044e+02 2.247e+02, threshold=1.942e+02, percent-clipped=1.0 2024-09-17 19:43:18,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=287780.0, ans=0.125 2024-09-17 19:43:18,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.34 vs. 
limit=22.5 2024-09-17 19:43:38,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=287820.0, ans=0.125 2024-09-17 19:43:41,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=287860.0, ans=0.025 2024-09-17 19:43:57,678 INFO [train.py:1198] (0/2) Epoch 16, batch 4100, loss[loss=0.2673, ctc_loss=0.1619, cr_loss=0.4284, attn_decoder_loss=0.2695, over 29508.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.1464, cr_loss=0.3878, attn_decoder_loss=0.2549, over 5792714.26 frames. ], batch size: 90, lr: 6.91e-03, grad_scale: 8.0 2024-09-17 19:44:11,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.05 vs. limit=15.0 2024-09-17 19:44:33,217 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-72000.pt 2024-09-17 19:44:49,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=288020.0, ans=10.0 2024-09-17 19:44:50,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=288020.0, ans=0.07 2024-09-17 19:45:03,259 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.01 vs. limit=10.0 2024-09-17 19:45:08,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=288060.0, ans=0.0 2024-09-17 19:45:20,188 INFO [train.py:1198] (0/2) Epoch 16, batch 4150, loss[loss=0.2379, ctc_loss=0.136, cr_loss=0.3732, attn_decoder_loss=0.241, over 29498.00 frames. 
], tot_loss[loss=0.2517, ctc_loss=0.1464, cr_loss=0.3881, attn_decoder_loss=0.2548, over 5798203.19 frames. ], batch size: 77, lr: 6.90e-03, grad_scale: 8.0 2024-09-17 19:45:32,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=288100.0, ans=0.125 2024-09-17 19:45:54,966 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 8.454e+01 9.164e+01 9.745e+01 4.465e+02, threshold=1.833e+02, percent-clipped=1.0 2024-09-17 19:45:55,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=288180.0, ans=0.125 2024-09-17 19:45:59,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=288180.0, ans=0.0 2024-09-17 19:46:14,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=288220.0, ans=0.125 2024-09-17 19:46:15,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.17 vs. limit=15.0 2024-09-17 19:46:20,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=288260.0, ans=0.125 2024-09-17 19:46:32,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=288300.0, ans=0.125 2024-09-17 19:46:33,304 INFO [train.py:1198] (0/2) Epoch 16, batch 4200, loss[loss=0.2754, ctc_loss=0.1661, cr_loss=0.4084, attn_decoder_loss=0.2785, over 29481.00 frames. ], tot_loss[loss=0.2522, ctc_loss=0.1467, cr_loss=0.3889, attn_decoder_loss=0.2553, over 5800196.64 frames. 
], batch size: 90, lr: 6.90e-03, grad_scale: 8.0 2024-09-17 19:46:42,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=288300.0, ans=0.125 2024-09-17 19:46:50,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=288340.0, ans=0.2 2024-09-17 19:46:54,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=288340.0, ans=0.2 2024-09-17 19:46:55,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=288340.0, ans=0.125 2024-09-17 19:46:58,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=288340.0, ans=0.125 2024-09-17 19:47:00,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=288340.0, ans=0.125 2024-09-17 19:47:04,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=288380.0, ans=0.2 2024-09-17 19:47:13,287 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:47:22,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=288420.0, ans=0.025 2024-09-17 19:47:30,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=288420.0, ans=0.0 2024-09-17 19:47:36,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=288460.0, ans=0.07 2024-09-17 19:47:40,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=288460.0, ans=0.125 2024-09-17 19:47:47,733 INFO [train.py:1198] (0/2) Epoch 16, batch 4250, 
loss[loss=0.2369, ctc_loss=0.1292, cr_loss=0.3561, attn_decoder_loss=0.241, over 29509.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1465, cr_loss=0.389, attn_decoder_loss=0.2554, over 5806454.89 frames. ], batch size: 74, lr: 6.90e-03, grad_scale: 4.0 2024-09-17 19:47:48,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=288500.0, ans=0.125 2024-09-17 19:47:53,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.10 vs. limit=8.0 2024-09-17 19:47:53,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=288500.0, ans=0.125 2024-09-17 19:48:15,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=288580.0, ans=0.0 2024-09-17 19:48:24,161 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.844e+01 9.399e+01 1.005e+02 1.682e+02, threshold=1.880e+02, percent-clipped=0.0 2024-09-17 19:48:35,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=288620.0, ans=0.05 2024-09-17 19:48:43,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=288620.0, ans=0.125 2024-09-17 19:49:01,841 INFO [train.py:1198] (0/2) Epoch 16, batch 4300, loss[loss=0.26, ctc_loss=0.1514, cr_loss=0.4004, attn_decoder_loss=0.2632, over 29540.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1466, cr_loss=0.3893, attn_decoder_loss=0.2557, over 5795294.69 frames. ], batch size: 87, lr: 6.90e-03, grad_scale: 8.0 2024-09-17 19:49:32,316 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. 
limit=15.0 2024-09-17 19:49:57,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=288820.0, ans=0.025 2024-09-17 19:50:12,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=288860.0, ans=0.125 2024-09-17 19:50:16,551 INFO [train.py:1198] (0/2) Epoch 16, batch 4350, loss[loss=0.2705, ctc_loss=0.1613, cr_loss=0.4074, attn_decoder_loss=0.2735, over 29542.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1497, cr_loss=0.3951, attn_decoder_loss=0.2591, over 5796785.43 frames. ], batch size: 97, lr: 6.89e-03, grad_scale: 8.0 2024-09-17 19:50:25,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=288900.0, ans=0.0 2024-09-17 19:50:30,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=288940.0, ans=0.0 2024-09-17 19:50:34,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.93 vs. limit=22.5 2024-09-17 19:50:53,800 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.345e+01 8.913e+01 9.427e+01 9.937e+01 2.646e+02, threshold=1.885e+02, percent-clipped=2.0 2024-09-17 19:51:12,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=289020.0, ans=0.0 2024-09-17 19:51:21,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=289060.0, ans=0.1 2024-09-17 19:51:27,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=289060.0, ans=0.0 2024-09-17 19:51:31,264 INFO [train.py:1198] (0/2) Epoch 16, batch 4400, loss[loss=0.2696, ctc_loss=0.1628, cr_loss=0.431, attn_decoder_loss=0.2719, over 27178.00 frames. 
], tot_loss[loss=0.2582, ctc_loss=0.1514, cr_loss=0.3972, attn_decoder_loss=0.2613, over 5767594.20 frames. ], batch size: 124, lr: 6.89e-03, grad_scale: 16.0 2024-09-17 19:51:49,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=289140.0, ans=0.2 2024-09-17 19:51:50,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2024-09-17 19:51:53,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=289140.0, ans=0.2 2024-09-17 19:52:04,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.93 vs. limit=22.5 2024-09-17 19:52:25,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.93 vs. limit=12.0 2024-09-17 19:52:33,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=289260.0, ans=0.0 2024-09-17 19:52:44,526 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:52:45,507 INFO [train.py:1198] (0/2) Epoch 16, batch 4450, loss[loss=0.2774, ctc_loss=0.1871, cr_loss=0.4243, attn_decoder_loss=0.278, over 20105.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1562, cr_loss=0.4015, attn_decoder_loss=0.2638, over 5576562.96 frames. ], batch size: 209, lr: 6.89e-03, grad_scale: 4.0 2024-09-17 19:53:05,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=289340.0, ans=0.0 2024-09-17 19:53:16,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=9.81 vs. 
limit=12.0 2024-09-17 19:53:25,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=289380.0, ans=0.0 2024-09-17 19:53:26,477 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.172e+01 9.461e+01 1.058e+02 1.169e+02 3.185e+02, threshold=2.116e+02, percent-clipped=2.0 2024-09-17 19:53:59,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.56 vs. limit=10.0 2024-09-17 19:54:01,252 INFO [train.py:1198] (0/2) Epoch 16, batch 4500, loss[loss=0.2798, ctc_loss=0.1902, cr_loss=0.4156, attn_decoder_loss=0.2805, over 20427.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.1615, cr_loss=0.404, attn_decoder_loss=0.2666, over 5237425.48 frames. ], batch size: 209, lr: 6.89e-03, grad_scale: 8.0 2024-09-17 19:54:03,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.38 vs. limit=15.0 2024-09-17 19:54:23,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=289540.0, ans=0.2 2024-09-17 19:54:38,429 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-16.pt 2024-09-17 19:55:24,188 INFO [train.py:1198] (0/2) Epoch 17, batch 0, loss[loss=0.2367, ctc_loss=0.1266, cr_loss=0.3529, attn_decoder_loss=0.2411, over 29605.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1266, cr_loss=0.3529, attn_decoder_loss=0.2411, over 29605.00 frames. 
], batch size: 73, lr: 6.68e-03, grad_scale: 16.0 2024-09-17 19:55:24,189 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 19:55:42,761 INFO [train.py:1230] (0/2) Epoch 17, validation: loss=0.2133, ctc_loss=0.04137, cr_loss=4.881e-15, attn_decoder_loss=0.2324, over 944034.00 frames. 2024-09-17 19:55:42,762 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 19:55:43,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0 2024-09-17 19:55:50,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=289600.0, ans=0.0 2024-09-17 19:55:53,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=289600.0, ans=0.05 2024-09-17 19:55:59,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=289640.0, ans=0.125 2024-09-17 19:56:01,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=289640.0, ans=0.0 2024-09-17 19:56:01,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=289640.0, ans=0.0 2024-09-17 19:56:17,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=289680.0, ans=0.125 2024-09-17 19:56:29,716 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:56:31,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289720.0, ans=0.1 2024-09-17 19:56:33,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, 
num_groups=1, num_channels=256, metric=11.39 vs. limit=22.5 2024-09-17 19:56:57,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=289760.0, ans=0.125 2024-09-17 19:56:58,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5 2024-09-17 19:57:00,372 INFO [train.py:1198] (0/2) Epoch 17, batch 50, loss[loss=0.2295, ctc_loss=0.1296, cr_loss=0.3507, attn_decoder_loss=0.2328, over 29411.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1479, cr_loss=0.3932, attn_decoder_loss=0.2566, over 1266247.01 frames. ], batch size: 70, lr: 6.68e-03, grad_scale: 8.0 2024-09-17 19:57:05,050 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.027e+01 9.620e+01 1.078e+02 1.162e+02 4.794e+02, threshold=2.156e+02, percent-clipped=2.0 2024-09-17 19:57:06,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=289800.0, ans=0.09899494936611666 2024-09-17 19:57:42,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-09-17 19:58:10,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=289960.0, ans=15.0 2024-09-17 19:58:18,302 INFO [train.py:1198] (0/2) Epoch 17, batch 100, loss[loss=0.238, ctc_loss=0.1339, cr_loss=0.3767, attn_decoder_loss=0.2412, over 29534.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1481, cr_loss=0.3922, attn_decoder_loss=0.2576, over 2251685.56 frames. ], batch size: 76, lr: 6.67e-03, grad_scale: 8.0 2024-09-17 19:58:23,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.78 vs. 
limit=10.0 2024-09-17 19:58:38,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=290040.0, ans=0.125 2024-09-17 19:58:45,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2024-09-17 19:58:52,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=290080.0, ans=0.125 2024-09-17 19:59:01,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=290120.0, ans=0.035 2024-09-17 19:59:04,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.68 vs. limit=15.0 2024-09-17 19:59:10,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=290120.0, ans=0.125 2024-09-17 19:59:12,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=290120.0, ans=0.125 2024-09-17 19:59:14,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=290120.0, ans=0.125 2024-09-17 19:59:15,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=290120.0, ans=0.015 2024-09-17 19:59:19,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=290160.0, ans=0.125 2024-09-17 19:59:32,868 INFO [train.py:1198] (0/2) Epoch 17, batch 150, loss[loss=0.2281, ctc_loss=0.1236, cr_loss=0.3459, attn_decoder_loss=0.232, over 29425.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1454, cr_loss=0.3874, attn_decoder_loss=0.2548, over 3046282.09 frames. 
], batch size: 70, lr: 6.67e-03, grad_scale: 8.0 2024-09-17 19:59:36,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290200.0, ans=0.1 2024-09-17 19:59:37,318 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.030e+01 8.871e+01 9.281e+01 1.009e+02 2.332e+02, threshold=1.856e+02, percent-clipped=1.0 2024-09-17 19:59:54,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=290240.0, ans=0.1 2024-09-17 19:59:55,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=290240.0, ans=0.0 2024-09-17 20:00:20,336 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.76 vs. limit=10.0 2024-09-17 20:00:30,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=290320.0, ans=0.025 2024-09-17 20:00:47,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=290360.0, ans=0.0 2024-09-17 20:00:50,789 INFO [train.py:1198] (0/2) Epoch 17, batch 200, loss[loss=0.2642, ctc_loss=0.1583, cr_loss=0.3874, attn_decoder_loss=0.2674, over 27204.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1453, cr_loss=0.3871, attn_decoder_loss=0.2543, over 3657952.42 frames. 
], batch size: 124, lr: 6.67e-03, grad_scale: 8.0 2024-09-17 20:01:00,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=290400.0, ans=0.09899494936611666 2024-09-17 20:01:14,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=290440.0, ans=0.125 2024-09-17 20:01:33,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=290480.0, ans=0.125 2024-09-17 20:01:42,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290520.0, ans=0.1 2024-09-17 20:01:50,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.21 vs. limit=15.0 2024-09-17 20:02:09,186 INFO [train.py:1198] (0/2) Epoch 17, batch 250, loss[loss=0.2673, ctc_loss=0.1571, cr_loss=0.4192, attn_decoder_loss=0.2703, over 29289.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1454, cr_loss=0.3876, attn_decoder_loss=0.2547, over 4139569.96 frames. ], batch size: 100, lr: 6.67e-03, grad_scale: 8.0 2024-09-17 20:02:11,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. 
limit=6.0 2024-09-17 20:02:13,832 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 8.517e+01 9.040e+01 9.817e+01 1.381e+02, threshold=1.808e+02, percent-clipped=0.0 2024-09-17 20:02:14,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=290600.0, ans=0.125 2024-09-17 20:02:20,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=290600.0, ans=0.125 2024-09-17 20:02:27,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=290640.0, ans=10.0 2024-09-17 20:02:27,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290640.0, ans=0.1 2024-09-17 20:02:27,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=290640.0, ans=10.0 2024-09-17 20:02:35,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=290640.0, ans=0.09899494936611666 2024-09-17 20:02:35,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=290640.0, ans=0.0 2024-09-17 20:02:44,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.76 vs. 
limit=15.0 2024-09-17 20:02:53,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=290720.0, ans=0.0 2024-09-17 20:02:59,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=290720.0, ans=0.125 2024-09-17 20:03:01,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.63 vs. limit=22.5 2024-09-17 20:03:19,928 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=22.5 2024-09-17 20:03:24,667 INFO [train.py:1198] (0/2) Epoch 17, batch 300, loss[loss=0.2681, ctc_loss=0.1588, cr_loss=0.4063, attn_decoder_loss=0.2712, over 29563.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1446, cr_loss=0.3865, attn_decoder_loss=0.2539, over 4508307.28 frames. ], batch size: 92, lr: 6.66e-03, grad_scale: 8.0 2024-09-17 20:03:37,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=290800.0, ans=0.125 2024-09-17 20:03:46,643 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.48 vs. 
limit=6.0 2024-09-17 20:04:04,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=290880.0, ans=0.1 2024-09-17 20:04:07,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=290880.0, ans=0.125 2024-09-17 20:04:19,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=290920.0, ans=0.025 2024-09-17 20:04:42,400 INFO [train.py:1198] (0/2) Epoch 17, batch 350, loss[loss=0.2211, ctc_loss=0.1236, cr_loss=0.3476, attn_decoder_loss=0.2242, over 29345.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.145, cr_loss=0.3877, attn_decoder_loss=0.2547, over 4794779.12 frames. ], batch size: 71, lr: 6.66e-03, grad_scale: 8.0 2024-09-17 20:04:46,785 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.690e+01 9.264e+01 9.789e+01 1.817e+02, threshold=1.853e+02, percent-clipped=1.0 2024-09-17 20:04:48,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=291000.0, ans=0.0 2024-09-17 20:05:02,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=291040.0, ans=0.0 2024-09-17 20:05:09,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=291040.0, ans=0.0 2024-09-17 20:05:15,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=291080.0, ans=0.2 2024-09-17 20:05:33,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291120.0, ans=0.1 2024-09-17 20:05:40,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, 
num_channels=256, metric=13.43 vs. limit=15.0 2024-09-17 20:05:53,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.50 vs. limit=15.0 2024-09-17 20:05:58,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291200.0, ans=0.1 2024-09-17 20:06:00,102 INFO [train.py:1198] (0/2) Epoch 17, batch 400, loss[loss=0.2562, ctc_loss=0.1478, cr_loss=0.3908, attn_decoder_loss=0.2596, over 29678.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1446, cr_loss=0.3868, attn_decoder_loss=0.2545, over 5024044.36 frames. ], batch size: 82, lr: 6.66e-03, grad_scale: 16.0 2024-09-17 20:06:37,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=291280.0, ans=12.0 2024-09-17 20:06:45,851 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:06:47,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=291320.0, ans=0.125 2024-09-17 20:07:08,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=291360.0, ans=0.0 2024-09-17 20:07:15,622 INFO [train.py:1198] (0/2) Epoch 17, batch 450, loss[loss=0.2702, ctc_loss=0.1646, cr_loss=0.4285, attn_decoder_loss=0.2724, over 29700.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1447, cr_loss=0.3868, attn_decoder_loss=0.2544, over 5186332.54 frames. 
], batch size: 83, lr: 6.66e-03, grad_scale: 8.0 2024-09-17 20:07:21,595 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.659e+01 9.188e+01 9.784e+01 2.602e+02, threshold=1.838e+02, percent-clipped=1.0 2024-09-17 20:07:28,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=291400.0, ans=0.2 2024-09-17 20:07:29,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.86 vs. limit=10.0 2024-09-17 20:07:42,308 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.01 vs. limit=15.0 2024-09-17 20:07:47,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=291480.0, ans=0.0 2024-09-17 20:08:19,966 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:08:33,722 INFO [train.py:1198] (0/2) Epoch 17, batch 500, loss[loss=0.2715, ctc_loss=0.1627, cr_loss=0.4325, attn_decoder_loss=0.274, over 29447.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1444, cr_loss=0.3871, attn_decoder_loss=0.2539, over 5329065.67 frames. ], batch size: 94, lr: 6.65e-03, grad_scale: 8.0 2024-09-17 20:08:44,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=291600.0, ans=0.0 2024-09-17 20:08:45,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.41 vs. 
limit=22.5 2024-09-17 20:09:17,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291720.0, ans=0.1 2024-09-17 20:09:19,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=291720.0, ans=0.2 2024-09-17 20:09:36,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.97 vs. limit=15.0 2024-09-17 20:09:40,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=291760.0, ans=0.0 2024-09-17 20:09:40,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=291760.0, ans=0.125 2024-09-17 20:09:51,608 INFO [train.py:1198] (0/2) Epoch 17, batch 550, loss[loss=0.2636, ctc_loss=0.15, cr_loss=0.4055, attn_decoder_loss=0.2672, over 28803.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1446, cr_loss=0.3867, attn_decoder_loss=0.254, over 5423617.86 frames. ], batch size: 104, lr: 6.65e-03, grad_scale: 8.0 2024-09-17 20:09:57,725 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 9.075e+01 9.597e+01 1.052e+02 1.735e+02, threshold=1.919e+02, percent-clipped=0.0 2024-09-17 20:09:58,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=291800.0, ans=0.0 2024-09-17 20:10:20,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.52 vs. 
limit=12.0 2024-09-17 20:10:20,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=291880.0, ans=0.0 2024-09-17 20:10:26,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=291880.0, ans=0.0 2024-09-17 20:10:30,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=291880.0, ans=0.0 2024-09-17 20:10:49,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291920.0, ans=0.1 2024-09-17 20:10:56,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=291960.0, ans=0.0 2024-09-17 20:11:08,250 INFO [train.py:1198] (0/2) Epoch 17, batch 600, loss[loss=0.261, ctc_loss=0.1457, cr_loss=0.3917, attn_decoder_loss=0.2651, over 29320.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.145, cr_loss=0.3878, attn_decoder_loss=0.2547, over 5509737.78 frames. ], batch size: 100, lr: 6.65e-03, grad_scale: 8.0 2024-09-17 20:11:08,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=292000.0, ans=0.025 2024-09-17 20:11:17,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=292000.0, ans=0.125 2024-09-17 20:11:25,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=292040.0, ans=0.2 2024-09-17 20:11:30,286 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.81 vs. 
limit=15.0 2024-09-17 20:11:45,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=292080.0, ans=0.2 2024-09-17 20:11:56,081 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:11:57,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=292120.0, ans=0.07 2024-09-17 20:12:11,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=292160.0, ans=0.125 2024-09-17 20:12:15,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.93 vs. limit=15.0 2024-09-17 20:12:23,226 INFO [train.py:1198] (0/2) Epoch 17, batch 650, loss[loss=0.257, ctc_loss=0.1421, cr_loss=0.4108, attn_decoder_loss=0.2607, over 29768.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.144, cr_loss=0.3859, attn_decoder_loss=0.2538, over 5586661.30 frames. ], batch size: 81, lr: 6.65e-03, grad_scale: 8.0 2024-09-17 20:12:29,211 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.569e+01 9.101e+01 9.967e+01 2.303e+02, threshold=1.820e+02, percent-clipped=2.0 2024-09-17 20:12:34,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=292200.0, ans=0.125 2024-09-17 20:12:57,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=292280.0, ans=0.025 2024-09-17 20:12:58,179 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.58 vs. 
limit=22.5 2024-09-17 20:13:17,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=292320.0, ans=0.0 2024-09-17 20:13:18,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=292320.0, ans=0.1 2024-09-17 20:13:19,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=292320.0, ans=0.125 2024-09-17 20:13:21,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=292320.0, ans=0.125 2024-09-17 20:13:24,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=292360.0, ans=0.0 2024-09-17 20:13:30,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.20 vs. limit=15.0 2024-09-17 20:13:43,857 INFO [train.py:1198] (0/2) Epoch 17, batch 700, loss[loss=0.2393, ctc_loss=0.1393, cr_loss=0.3805, attn_decoder_loss=0.242, over 29545.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1449, cr_loss=0.3876, attn_decoder_loss=0.2547, over 5637985.84 frames. ], batch size: 76, lr: 6.65e-03, grad_scale: 8.0 2024-09-17 20:13:48,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=292400.0, ans=0.025 2024-09-17 20:14:06,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=292440.0, ans=0.025 2024-09-17 20:14:13,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. 
limit=6.0 2024-09-17 20:14:17,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=292480.0, ans=0.125 2024-09-17 20:14:29,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=292520.0, ans=0.125 2024-09-17 20:14:40,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=292520.0, ans=0.0 2024-09-17 20:14:58,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=292600.0, ans=0.0 2024-09-17 20:14:59,468 INFO [train.py:1198] (0/2) Epoch 17, batch 750, loss[loss=0.25, ctc_loss=0.1421, cr_loss=0.3897, attn_decoder_loss=0.2533, over 29710.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1449, cr_loss=0.3874, attn_decoder_loss=0.2542, over 5676910.20 frames. ], batch size: 82, lr: 6.64e-03, grad_scale: 8.0 2024-09-17 20:15:05,330 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 8.913e+01 9.464e+01 1.024e+02 2.439e+02, threshold=1.893e+02, percent-clipped=2.0 2024-09-17 20:15:05,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=292600.0, ans=0.0 2024-09-17 20:15:14,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=292640.0, ans=0.0 2024-09-17 20:15:24,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. 
limit=15.0 2024-09-17 20:15:25,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=292640.0, ans=0.125 2024-09-17 20:15:36,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.66 vs. limit=22.5 2024-09-17 20:15:49,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=292720.0, ans=0.125 2024-09-17 20:15:57,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.18 vs. limit=15.0 2024-09-17 20:16:07,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=292760.0, ans=0.0 2024-09-17 20:16:15,492 INFO [train.py:1198] (0/2) Epoch 17, batch 800, loss[loss=0.2232, ctc_loss=0.1176, cr_loss=0.3294, attn_decoder_loss=0.2276, over 29606.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1448, cr_loss=0.3874, attn_decoder_loss=0.2538, over 5707700.67 frames. ], batch size: 73, lr: 6.64e-03, grad_scale: 16.0 2024-09-17 20:16:18,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=292800.0, ans=0.0 2024-09-17 20:16:44,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.25 vs. limit=10.0 2024-09-17 20:17:07,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=292920.0, ans=0.125 2024-09-17 20:17:14,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.70 vs. 
limit=15.0 2024-09-17 20:17:17,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=292960.0, ans=0.125 2024-09-17 20:17:33,057 INFO [train.py:1198] (0/2) Epoch 17, batch 850, loss[loss=0.2586, ctc_loss=0.1485, cr_loss=0.3985, attn_decoder_loss=0.2619, over 29710.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1441, cr_loss=0.3859, attn_decoder_loss=0.2534, over 5737157.47 frames. ], batch size: 89, lr: 6.64e-03, grad_scale: 8.0 2024-09-17 20:17:33,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=293000.0, ans=0.2 2024-09-17 20:17:42,755 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 8.745e+01 9.386e+01 1.018e+02 1.977e+02, threshold=1.877e+02, percent-clipped=1.0 2024-09-17 20:17:52,575 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.48 vs. 
limit=10.0 2024-09-17 20:17:58,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=293040.0, ans=0.2 2024-09-17 20:18:16,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=293080.0, ans=0.0 2024-09-17 20:18:27,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293120.0, ans=0.1 2024-09-17 20:18:30,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=293120.0, ans=0.07 2024-09-17 20:18:46,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=293160.0, ans=0.2 2024-09-17 20:18:47,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=12.0 2024-09-17 20:18:51,106 INFO [train.py:1198] (0/2) Epoch 17, batch 900, loss[loss=0.2208, ctc_loss=0.1142, cr_loss=0.3311, attn_decoder_loss=0.2253, over 29633.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1445, cr_loss=0.3861, attn_decoder_loss=0.2536, over 5739813.89 frames. ], batch size: 73, lr: 6.64e-03, grad_scale: 8.0 2024-09-17 20:19:00,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=293200.0, ans=0.015 2024-09-17 20:19:14,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=293240.0, ans=0.0 2024-09-17 20:19:15,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=293240.0, ans=0.125 2024-09-17 20:19:52,697 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.31 vs. 
limit=22.5 2024-09-17 20:19:59,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=293360.0, ans=0.125 2024-09-17 20:20:06,843 INFO [train.py:1198] (0/2) Epoch 17, batch 950, loss[loss=0.2352, ctc_loss=0.1279, cr_loss=0.3525, attn_decoder_loss=0.2393, over 29507.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1451, cr_loss=0.387, attn_decoder_loss=0.2539, over 5741936.81 frames. ], batch size: 74, lr: 6.63e-03, grad_scale: 8.0 2024-09-17 20:20:14,254 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.889e+01 9.768e+01 1.117e+02 1.855e+02, threshold=1.954e+02, percent-clipped=0.0 2024-09-17 20:20:14,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=293400.0, ans=0.0 2024-09-17 20:20:32,625 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2024-09-17 20:21:14,161 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:21:26,829 INFO [train.py:1198] (0/2) Epoch 17, batch 1000, loss[loss=0.2437, ctc_loss=0.1376, cr_loss=0.384, attn_decoder_loss=0.2469, over 29517.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1459, cr_loss=0.3882, attn_decoder_loss=0.2548, over 5735828.37 frames. 
], batch size: 77, lr: 6.63e-03, grad_scale: 8.0 2024-09-17 20:21:28,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=293600.0, ans=0.0 2024-09-17 20:21:45,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=293640.0, ans=0.125 2024-09-17 20:21:59,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=293680.0, ans=0.0 2024-09-17 20:22:04,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.62 vs. limit=15.0 2024-09-17 20:22:10,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=293680.0, ans=0.2 2024-09-17 20:22:21,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293720.0, ans=0.1 2024-09-17 20:22:22,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=293720.0, ans=0.0 2024-09-17 20:22:34,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=293760.0, ans=0.0 2024-09-17 20:22:35,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=293760.0, ans=0.0 2024-09-17 20:22:42,665 INFO [train.py:1198] (0/2) Epoch 17, batch 1050, loss[loss=0.2505, ctc_loss=0.1366, cr_loss=0.3657, attn_decoder_loss=0.255, over 29697.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1447, cr_loss=0.3863, attn_decoder_loss=0.2537, over 5744695.28 frames. 
], batch size: 85, lr: 6.63e-03, grad_scale: 8.0 2024-09-17 20:22:44,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=293800.0, ans=0.0 2024-09-17 20:22:50,128 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.852e+01 9.385e+01 1.035e+02 1.958e+02, threshold=1.877e+02, percent-clipped=1.0 2024-09-17 20:23:01,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=293840.0, ans=0.0 2024-09-17 20:23:25,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=293880.0, ans=0.1 2024-09-17 20:23:40,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=293920.0, ans=0.0 2024-09-17 20:23:42,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=12.0 2024-09-17 20:23:43,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=293960.0, ans=0.0 2024-09-17 20:23:58,419 INFO [train.py:1198] (0/2) Epoch 17, batch 1100, loss[loss=0.2414, ctc_loss=0.1341, cr_loss=0.3684, attn_decoder_loss=0.2451, over 29458.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1447, cr_loss=0.3868, attn_decoder_loss=0.2541, over 5757298.26 frames. 
], batch size: 78, lr: 6.63e-03, grad_scale: 8.0 2024-09-17 20:24:12,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=294040.0, ans=0.125 2024-09-17 20:24:13,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=294040.0, ans=0.125 2024-09-17 20:24:28,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=294080.0, ans=0.125 2024-09-17 20:24:43,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=294080.0, ans=0.125 2024-09-17 20:24:58,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=294120.0, ans=0.1 2024-09-17 20:25:01,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294160.0, ans=0.1 2024-09-17 20:25:10,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=294160.0, ans=0.025 2024-09-17 20:25:18,662 INFO [train.py:1198] (0/2) Epoch 17, batch 1150, loss[loss=0.2391, ctc_loss=0.1368, cr_loss=0.3774, attn_decoder_loss=0.242, over 29432.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1443, cr_loss=0.3861, attn_decoder_loss=0.2537, over 5755154.96 frames. 
], batch size: 78, lr: 6.63e-03, grad_scale: 8.0 2024-09-17 20:25:26,292 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.857e+01 8.746e+01 9.258e+01 9.833e+01 4.199e+02, threshold=1.852e+02, percent-clipped=3.0 2024-09-17 20:25:38,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=294240.0, ans=0.125 2024-09-17 20:25:45,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=294240.0, ans=0.125 2024-09-17 20:26:09,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=294320.0, ans=0.2 2024-09-17 20:26:34,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.88 vs. limit=15.0 2024-09-17 20:26:34,875 INFO [train.py:1198] (0/2) Epoch 17, batch 1200, loss[loss=0.2584, ctc_loss=0.1382, cr_loss=0.3832, attn_decoder_loss=0.2632, over 29670.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1449, cr_loss=0.3872, attn_decoder_loss=0.2546, over 5748065.58 frames. 
], batch size: 85, lr: 6.62e-03, grad_scale: 16.0 2024-09-17 20:26:39,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=294400.0, ans=0.0 2024-09-17 20:26:42,909 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:26:48,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=294440.0, ans=0.125 2024-09-17 20:26:50,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=294440.0, ans=0.0 2024-09-17 20:27:20,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=294520.0, ans=0.125 2024-09-17 20:27:32,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-09-17 20:27:36,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-09-17 20:27:50,825 INFO [train.py:1198] (0/2) Epoch 17, batch 1250, loss[loss=0.2635, ctc_loss=0.1556, cr_loss=0.4229, attn_decoder_loss=0.266, over 29484.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1451, cr_loss=0.3885, attn_decoder_loss=0.2553, over 5775309.35 frames. ], batch size: 92, lr: 6.62e-03, grad_scale: 8.0 2024-09-17 20:27:54,560 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.42 vs. 
limit=15.0 2024-09-17 20:27:59,838 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 8.886e+01 9.388e+01 9.868e+01 1.541e+02, threshold=1.878e+02, percent-clipped=0.0 2024-09-17 20:28:06,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=294640.0, ans=0.125 2024-09-17 20:28:19,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=294680.0, ans=0.125 2024-09-17 20:28:19,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=294680.0, ans=0.0 2024-09-17 20:28:32,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=294680.0, ans=0.0 2024-09-17 20:29:01,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=294760.0, ans=0.2 2024-09-17 20:29:01,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=294760.0, ans=0.0 2024-09-17 20:29:07,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=294760.0, ans=0.125 2024-09-17 20:29:10,540 INFO [train.py:1198] (0/2) Epoch 17, batch 1300, loss[loss=0.2618, ctc_loss=0.1501, cr_loss=0.3928, attn_decoder_loss=0.2655, over 28248.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1445, cr_loss=0.3867, attn_decoder_loss=0.2545, over 5779838.36 frames. 
], batch size: 111, lr: 6.62e-03, grad_scale: 8.0 2024-09-17 20:29:12,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=294800.0, ans=0.1 2024-09-17 20:29:19,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.80 vs. limit=15.0 2024-09-17 20:29:31,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294840.0, ans=0.1 2024-09-17 20:29:41,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=294880.0, ans=0.125 2024-09-17 20:29:50,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=294880.0, ans=0.125 2024-09-17 20:29:50,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=294880.0, ans=0.125 2024-09-17 20:30:11,255 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:30:17,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=294960.0, ans=0.125 2024-09-17 20:30:24,218 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.15 vs. limit=15.0 2024-09-17 20:30:26,392 INFO [train.py:1198] (0/2) Epoch 17, batch 1350, loss[loss=0.2597, ctc_loss=0.1498, cr_loss=0.3808, attn_decoder_loss=0.2634, over 29762.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.144, cr_loss=0.3862, attn_decoder_loss=0.254, over 5798142.37 frames. 
], batch size: 81, lr: 6.62e-03, grad_scale: 8.0 2024-09-17 20:30:26,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=295000.0, ans=0.125 2024-09-17 20:30:29,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=295000.0, ans=0.125 2024-09-17 20:30:35,309 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.707e+01 9.188e+01 9.676e+01 1.559e+02, threshold=1.838e+02, percent-clipped=0.0 2024-09-17 20:30:43,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=295040.0, ans=0.125 2024-09-17 20:30:43,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=295040.0, ans=0.125 2024-09-17 20:30:49,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295040.0, ans=0.1 2024-09-17 20:30:53,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=295040.0, ans=0.0 2024-09-17 20:30:54,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=295080.0, ans=0.2 2024-09-17 20:31:02,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=295080.0, ans=0.125 2024-09-17 20:31:15,166 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.55 vs. 
limit=22.5 2024-09-17 20:31:31,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=295160.0, ans=0.2 2024-09-17 20:31:35,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=295160.0, ans=0.0 2024-09-17 20:31:35,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295160.0, ans=0.1 2024-09-17 20:31:41,837 INFO [train.py:1198] (0/2) Epoch 17, batch 1400, loss[loss=0.229, ctc_loss=0.1287, cr_loss=0.3675, attn_decoder_loss=0.232, over 29600.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1437, cr_loss=0.3855, attn_decoder_loss=0.2538, over 5809301.54 frames. ], batch size: 69, lr: 6.61e-03, grad_scale: 8.0 2024-09-17 20:32:03,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=295240.0, ans=0.125 2024-09-17 20:32:28,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.20 vs. limit=10.0 2024-09-17 20:32:43,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.23 vs. limit=15.0 2024-09-17 20:32:55,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-09-17 20:33:01,972 INFO [train.py:1198] (0/2) Epoch 17, batch 1450, loss[loss=0.2592, ctc_loss=0.1448, cr_loss=0.3958, attn_decoder_loss=0.2632, over 29394.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1438, cr_loss=0.3858, attn_decoder_loss=0.2543, over 5805868.02 frames. 
], batch size: 94, lr: 6.61e-03, grad_scale: 8.0 2024-09-17 20:33:10,927 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.631e+01 9.209e+01 9.989e+01 1.746e+02, threshold=1.842e+02, percent-clipped=0.0 2024-09-17 20:33:18,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0 2024-09-17 20:33:49,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295520.0, ans=0.1 2024-09-17 20:34:00,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=295560.0, ans=0.125 2024-09-17 20:34:08,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=295560.0, ans=0.125 2024-09-17 20:34:17,470 INFO [train.py:1198] (0/2) Epoch 17, batch 1500, loss[loss=0.2618, ctc_loss=0.1466, cr_loss=0.3878, attn_decoder_loss=0.266, over 29637.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1437, cr_loss=0.3863, attn_decoder_loss=0.2545, over 5807127.13 frames. ], batch size: 86, lr: 6.61e-03, grad_scale: 8.0 2024-09-17 20:34:25,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=295600.0, ans=0.125 2024-09-17 20:34:43,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=295640.0, ans=0.025 2024-09-17 20:34:55,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=295680.0, ans=0.0 2024-09-17 20:34:56,472 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.22 vs. 
limit=15.0 2024-09-17 20:34:57,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=295680.0, ans=0.5 2024-09-17 20:35:07,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.96 vs. limit=15.0 2024-09-17 20:35:14,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=295720.0, ans=15.0 2024-09-17 20:35:27,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=295760.0, ans=0.125 2024-09-17 20:35:28,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=15.0 2024-09-17 20:35:32,364 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:35:33,484 INFO [train.py:1198] (0/2) Epoch 17, batch 1550, loss[loss=0.2579, ctc_loss=0.149, cr_loss=0.3931, attn_decoder_loss=0.2612, over 29513.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1439, cr_loss=0.3858, attn_decoder_loss=0.2544, over 5782081.95 frames. ], batch size: 90, lr: 6.61e-03, grad_scale: 8.0 2024-09-17 20:35:42,525 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.733e+01 9.019e+01 9.707e+01 1.076e+02 7.268e+02, threshold=1.941e+02, percent-clipped=2.0 2024-09-17 20:35:44,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=295800.0, ans=0.125 2024-09-17 20:35:50,827 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.29 vs. 
limit=12.0 2024-09-17 20:36:02,646 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.23 vs. limit=22.5 2024-09-17 20:36:43,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=295960.0, ans=0.0 2024-09-17 20:36:43,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=295960.0, ans=0.05 2024-09-17 20:36:53,524 INFO [train.py:1198] (0/2) Epoch 17, batch 1600, loss[loss=0.2557, ctc_loss=0.1435, cr_loss=0.3949, attn_decoder_loss=0.2594, over 29682.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1449, cr_loss=0.387, attn_decoder_loss=0.2548, over 5763617.13 frames. ], batch size: 85, lr: 6.61e-03, grad_scale: 16.0 2024-09-17 20:37:43,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=296120.0, ans=0.09899494936611666 2024-09-17 20:37:51,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=296120.0, ans=10.0 2024-09-17 20:37:51,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=296120.0, ans=0.125 2024-09-17 20:38:08,991 INFO [train.py:1198] (0/2) Epoch 17, batch 1650, loss[loss=0.2647, ctc_loss=0.151, cr_loss=0.4132, attn_decoder_loss=0.2681, over 29708.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1441, cr_loss=0.3859, attn_decoder_loss=0.2544, over 5757173.09 frames. 
], batch size: 89, lr: 6.60e-03, grad_scale: 8.0 2024-09-17 20:38:19,709 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.329e+01 8.617e+01 9.352e+01 1.025e+02 5.265e+02, threshold=1.870e+02, percent-clipped=3.0 2024-09-17 20:38:22,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.71 vs. limit=15.0 2024-09-17 20:38:29,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=296240.0, ans=0.025 2024-09-17 20:39:24,821 INFO [train.py:1198] (0/2) Epoch 17, batch 1700, loss[loss=0.2251, ctc_loss=0.1258, cr_loss=0.3556, attn_decoder_loss=0.2282, over 29591.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1435, cr_loss=0.3849, attn_decoder_loss=0.2538, over 5779372.50 frames. ], batch size: 69, lr: 6.60e-03, grad_scale: 8.0 2024-09-17 20:39:46,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=296440.0, ans=0.125 2024-09-17 20:39:55,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=296480.0, ans=0.2 2024-09-17 20:40:07,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=296480.0, ans=0.0 2024-09-17 20:40:25,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=296560.0, ans=0.0 2024-09-17 20:40:27,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=296560.0, ans=0.125 2024-09-17 20:40:31,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.99 vs. 
limit=15.0 2024-09-17 20:40:33,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=296560.0, ans=0.1 2024-09-17 20:40:35,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.85 vs. limit=12.0 2024-09-17 20:40:44,386 INFO [train.py:1198] (0/2) Epoch 17, batch 1750, loss[loss=0.2244, ctc_loss=0.1243, cr_loss=0.3519, attn_decoder_loss=0.2277, over 29368.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1436, cr_loss=0.3853, attn_decoder_loss=0.2537, over 5787502.18 frames. ], batch size: 67, lr: 6.60e-03, grad_scale: 8.0 2024-09-17 20:40:54,994 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.563e+01 9.059e+01 9.719e+01 2.142e+02, threshold=1.812e+02, percent-clipped=1.0 2024-09-17 20:41:04,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=296640.0, ans=0.2 2024-09-17 20:41:13,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=296680.0, ans=0.0 2024-09-17 20:41:16,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=296680.0, ans=0.125 2024-09-17 20:41:21,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. 
limit=15.0 2024-09-17 20:41:26,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=296680.0, ans=0.025 2024-09-17 20:41:29,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=296720.0, ans=0.125 2024-09-17 20:41:35,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=296720.0, ans=0.125 2024-09-17 20:41:40,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=296720.0, ans=0.0 2024-09-17 20:41:46,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=296760.0, ans=0.02 2024-09-17 20:41:49,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=296760.0, ans=0.0 2024-09-17 20:42:00,034 INFO [train.py:1198] (0/2) Epoch 17, batch 1800, loss[loss=0.2656, ctc_loss=0.1542, cr_loss=0.414, attn_decoder_loss=0.2688, over 29715.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1439, cr_loss=0.3862, attn_decoder_loss=0.2539, over 5791716.06 frames. ], batch size: 83, lr: 6.60e-03, grad_scale: 8.0 2024-09-17 20:42:11,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=296800.0, ans=0.125 2024-09-17 20:42:29,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=296880.0, ans=0.125 2024-09-17 20:42:30,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=296880.0, ans=0.125 2024-09-17 20:42:54,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.66 vs. 
limit=15.0 2024-09-17 20:43:11,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=296960.0, ans=0.125 2024-09-17 20:43:16,002 INFO [train.py:1198] (0/2) Epoch 17, batch 1850, loss[loss=0.2499, ctc_loss=0.1376, cr_loss=0.3883, attn_decoder_loss=0.2538, over 29608.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1432, cr_loss=0.3852, attn_decoder_loss=0.2533, over 5796413.63 frames. ], batch size: 86, lr: 6.59e-03, grad_scale: 8.0 2024-09-17 20:43:24,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.75 vs. limit=12.0 2024-09-17 20:43:26,337 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 8.992e+01 9.506e+01 1.016e+02 2.077e+02, threshold=1.901e+02, percent-clipped=1.0 2024-09-17 20:43:27,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.44 vs. limit=15.0 2024-09-17 20:43:40,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=297040.0, ans=0.0 2024-09-17 20:43:48,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297080.0, ans=0.1 2024-09-17 20:43:49,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=297080.0, ans=0.2 2024-09-17 20:44:03,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.10 vs. 
limit=15.0 2024-09-17 20:44:16,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=297120.0, ans=0.125 2024-09-17 20:44:16,776 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0 2024-09-17 20:44:26,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=297160.0, ans=0.125 2024-09-17 20:44:27,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=297160.0, ans=0.125 2024-09-17 20:44:27,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=297160.0, ans=0.1 2024-09-17 20:44:31,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.09 vs. limit=15.0 2024-09-17 20:44:35,820 INFO [train.py:1198] (0/2) Epoch 17, batch 1900, loss[loss=0.2672, ctc_loss=0.161, cr_loss=0.4189, attn_decoder_loss=0.2697, over 29701.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1443, cr_loss=0.387, attn_decoder_loss=0.2546, over 5805065.92 frames. ], batch size: 89, lr: 6.59e-03, grad_scale: 8.0 2024-09-17 20:44:49,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=297240.0, ans=0.0 2024-09-17 20:45:23,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=297320.0, ans=0.125 2024-09-17 20:45:33,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.79 vs. 
limit=15.0 2024-09-17 20:45:49,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=297360.0, ans=0.125 2024-09-17 20:45:50,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=297400.0, ans=0.125 2024-09-17 20:45:52,068 INFO [train.py:1198] (0/2) Epoch 17, batch 1950, loss[loss=0.2489, ctc_loss=0.151, cr_loss=0.3893, attn_decoder_loss=0.2511, over 29441.00 frames. ], tot_loss[loss=0.2523, ctc_loss=0.1447, cr_loss=0.388, attn_decoder_loss=0.2556, over 5819226.68 frames. ], batch size: 78, lr: 6.59e-03, grad_scale: 8.0 2024-09-17 20:46:02,795 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.875e+01 9.464e+01 9.894e+01 2.247e+02, threshold=1.893e+02, percent-clipped=1.0 2024-09-17 20:46:16,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=297440.0, ans=0.125 2024-09-17 20:46:25,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297480.0, ans=0.1 2024-09-17 20:46:33,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=297480.0, ans=0.0 2024-09-17 20:47:00,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=297560.0, ans=0.125 2024-09-17 20:47:08,298 INFO [train.py:1198] (0/2) Epoch 17, batch 2000, loss[loss=0.2225, ctc_loss=0.1232, cr_loss=0.3587, attn_decoder_loss=0.2255, over 29335.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1453, cr_loss=0.3887, attn_decoder_loss=0.256, over 5795625.69 frames. 
], batch size: 67, lr: 6.59e-03, grad_scale: 16.0 2024-09-17 20:47:12,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2024-09-17 20:47:33,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.46 vs. limit=10.0 2024-09-17 20:47:34,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=297640.0, ans=0.0 2024-09-17 20:47:50,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=297680.0, ans=0.125 2024-09-17 20:48:27,879 INFO [train.py:1198] (0/2) Epoch 17, batch 2050, loss[loss=0.2231, ctc_loss=0.1224, cr_loss=0.347, attn_decoder_loss=0.2266, over 29408.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.1448, cr_loss=0.3879, attn_decoder_loss=0.2552, over 5788726.52 frames. 
], batch size: 70, lr: 6.59e-03, grad_scale: 8.0 2024-09-17 20:48:34,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=297800.0, ans=0.025 2024-09-17 20:48:40,029 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.707e+01 9.110e+01 9.757e+01 1.726e+02, threshold=1.822e+02, percent-clipped=0.0 2024-09-17 20:48:46,391 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:49:08,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=297880.0, ans=0.125 2024-09-17 20:49:14,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=297920.0, ans=0.0 2024-09-17 20:49:37,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=297960.0, ans=0.5 2024-09-17 20:49:43,052 INFO [train.py:1198] (0/2) Epoch 17, batch 2100, loss[loss=0.261, ctc_loss=0.1504, cr_loss=0.3985, attn_decoder_loss=0.2644, over 29784.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1443, cr_loss=0.3874, attn_decoder_loss=0.2545, over 5800465.65 frames. ], batch size: 81, lr: 6.58e-03, grad_scale: 8.0 2024-09-17 20:49:47,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=298000.0, ans=0.1 2024-09-17 20:49:50,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=298000.0, ans=0.125 2024-09-17 20:50:04,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.81 vs. 
limit=10.0 2024-09-17 20:50:08,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=298040.0, ans=0.2 2024-09-17 20:50:10,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=298040.0, ans=0.125 2024-09-17 20:50:34,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.18 vs. limit=22.5 2024-09-17 20:50:34,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.21 vs. limit=15.0 2024-09-17 20:50:58,161 INFO [train.py:1198] (0/2) Epoch 17, batch 2150, loss[loss=0.2437, ctc_loss=0.1341, cr_loss=0.3651, attn_decoder_loss=0.2478, over 29462.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1438, cr_loss=0.3869, attn_decoder_loss=0.2538, over 5814872.94 frames. ], batch size: 78, lr: 6.58e-03, grad_scale: 8.0 2024-09-17 20:51:01,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=298200.0, ans=0.125 2024-09-17 20:51:10,385 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 8.718e+01 9.185e+01 9.940e+01 1.615e+02, threshold=1.837e+02, percent-clipped=0.0 2024-09-17 20:51:34,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=298280.0, ans=0.125 2024-09-17 20:51:40,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=298280.0, ans=0.125 2024-09-17 20:51:54,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=298320.0, ans=0.2 2024-09-17 20:51:55,860 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=298320.0, ans=0.125 2024-09-17 20:52:10,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=298360.0, ans=0.125 2024-09-17 20:52:18,711 INFO [train.py:1198] (0/2) Epoch 17, batch 2200, loss[loss=0.2586, ctc_loss=0.1481, cr_loss=0.3894, attn_decoder_loss=0.2623, over 29626.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1443, cr_loss=0.3871, attn_decoder_loss=0.2541, over 5811417.99 frames. ], batch size: 86, lr: 6.58e-03, grad_scale: 8.0 2024-09-17 20:52:28,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=298400.0, ans=0.125 2024-09-17 20:52:40,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=22.5 2024-09-17 20:52:45,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=298440.0, ans=0.2 2024-09-17 20:53:01,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=298480.0, ans=0.1 2024-09-17 20:53:33,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2024-09-17 20:53:34,402 INFO [train.py:1198] (0/2) Epoch 17, batch 2250, loss[loss=0.2501, ctc_loss=0.1434, cr_loss=0.3989, attn_decoder_loss=0.2531, over 29721.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1439, cr_loss=0.386, attn_decoder_loss=0.2538, over 5810205.05 frames. 
], batch size: 82, lr: 6.58e-03, grad_scale: 8.0 2024-09-17 20:53:36,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=298600.0, ans=0.0 2024-09-17 20:53:37,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=298600.0, ans=0.125 2024-09-17 20:53:46,691 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.540e+01 9.223e+01 9.820e+01 2.780e+02, threshold=1.845e+02, percent-clipped=3.0 2024-09-17 20:53:56,308 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.81 vs. limit=22.5 2024-09-17 20:54:14,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0 2024-09-17 20:54:44,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=298760.0, ans=0.125 2024-09-17 20:54:50,230 INFO [train.py:1198] (0/2) Epoch 17, batch 2300, loss[loss=0.2365, ctc_loss=0.1391, cr_loss=0.3811, attn_decoder_loss=0.2389, over 29295.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1437, cr_loss=0.3859, attn_decoder_loss=0.2531, over 5797890.23 frames. ], batch size: 71, lr: 6.57e-03, grad_scale: 8.0 2024-09-17 20:54:55,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=298800.0, ans=0.125 2024-09-17 20:55:03,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.04 vs. 
limit=15.0 2024-09-17 20:55:04,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=298840.0, ans=0.125 2024-09-17 20:55:19,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298880.0, ans=0.1 2024-09-17 20:55:21,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2024-09-17 20:55:23,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=298880.0, ans=0.125 2024-09-17 20:55:26,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298880.0, ans=0.1 2024-09-17 20:55:54,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=298960.0, ans=0.125 2024-09-17 20:55:58,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=298960.0, ans=0.125 2024-09-17 20:56:03,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=298960.0, ans=0.125 2024-09-17 20:56:07,995 INFO [train.py:1198] (0/2) Epoch 17, batch 2350, loss[loss=0.262, ctc_loss=0.1494, cr_loss=0.4111, attn_decoder_loss=0.2654, over 29697.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1434, cr_loss=0.3856, attn_decoder_loss=0.2532, over 5803368.27 frames. 
], batch size: 83, lr: 6.57e-03, grad_scale: 8.0 2024-09-17 20:56:15,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=299000.0, ans=0.0 2024-09-17 20:56:21,979 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.665e+01 8.873e+01 9.644e+01 1.055e+02 1.144e+03, threshold=1.929e+02, percent-clipped=2.0 2024-09-17 20:56:49,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=299080.0, ans=0.125 2024-09-17 20:56:49,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299080.0, ans=0.1 2024-09-17 20:57:04,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299120.0, ans=0.1 2024-09-17 20:57:04,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=299120.0, ans=0.2 2024-09-17 20:57:26,179 INFO [train.py:1198] (0/2) Epoch 17, batch 2400, loss[loss=0.2393, ctc_loss=0.1356, cr_loss=0.3689, attn_decoder_loss=0.2427, over 29532.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1443, cr_loss=0.387, attn_decoder_loss=0.254, over 5807626.88 frames. ], batch size: 76, lr: 6.57e-03, grad_scale: 16.0 2024-09-17 20:57:50,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=299240.0, ans=0.2 2024-09-17 20:57:52,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=299240.0, ans=0.0 2024-09-17 20:58:02,005 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.89 vs. 
limit=15.0 2024-09-17 20:58:13,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=299320.0, ans=0.125 2024-09-17 20:58:13,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.69 vs. limit=22.5 2024-09-17 20:58:14,858 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:58:16,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-09-17 20:58:25,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=299360.0, ans=0.0 2024-09-17 20:58:34,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2024-09-17 20:58:41,909 INFO [train.py:1198] (0/2) Epoch 17, batch 2450, loss[loss=0.2594, ctc_loss=0.1495, cr_loss=0.4, attn_decoder_loss=0.2627, over 29730.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1453, cr_loss=0.3884, attn_decoder_loss=0.2552, over 5784646.38 frames. ], batch size: 82, lr: 6.57e-03, grad_scale: 8.0 2024-09-17 20:58:45,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=299400.0, ans=0.125 2024-09-17 20:58:55,497 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 9.066e+01 9.720e+01 1.171e+02 1.991e+02, threshold=1.944e+02, percent-clipped=1.0 2024-09-17 20:59:13,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. 
limit=6.0 2024-09-17 20:59:21,994 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.29 vs. limit=15.0 2024-09-17 20:59:33,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=299520.0, ans=0.0 2024-09-17 20:59:37,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=22.5 2024-09-17 20:59:41,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=299560.0, ans=0.1 2024-09-17 20:59:59,629 INFO [train.py:1198] (0/2) Epoch 17, batch 2500, loss[loss=0.2567, ctc_loss=0.1448, cr_loss=0.3626, attn_decoder_loss=0.261, over 29630.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1454, cr_loss=0.3887, attn_decoder_loss=0.2552, over 5794424.14 frames. ], batch size: 86, lr: 6.57e-03, grad_scale: 8.0 2024-09-17 21:00:01,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=299600.0, ans=0.125 2024-09-17 21:00:11,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=299600.0, ans=0.2 2024-09-17 21:00:32,725 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:00:34,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=299680.0, ans=0.1 2024-09-17 21:00:35,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=299680.0, ans=0.125 2024-09-17 21:00:58,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, 
batch_count=299720.0, ans=0.025 2024-09-17 21:01:00,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2024-09-17 21:01:18,018 INFO [train.py:1198] (0/2) Epoch 17, batch 2550, loss[loss=0.2227, ctc_loss=0.1243, cr_loss=0.3479, attn_decoder_loss=0.2259, over 29312.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1447, cr_loss=0.3875, attn_decoder_loss=0.2547, over 5797491.95 frames. ], batch size: 67, lr: 6.56e-03, grad_scale: 8.0 2024-09-17 21:01:31,611 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.868e+01 8.659e+01 9.126e+01 9.764e+01 1.342e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-17 21:01:52,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=299880.0, ans=0.125 2024-09-17 21:02:08,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=299920.0, ans=0.125 2024-09-17 21:02:22,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.51 vs. 
limit=15.0 2024-09-17 21:02:26,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=299960.0, ans=0.0 2024-09-17 21:02:27,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=299960.0, ans=0.125 2024-09-17 21:02:30,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=299960.0, ans=0.0 2024-09-17 21:02:30,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=299960.0, ans=0.125 2024-09-17 21:02:34,197 INFO [train.py:1198] (0/2) Epoch 17, batch 2600, loss[loss=0.2493, ctc_loss=0.1509, cr_loss=0.4101, attn_decoder_loss=0.2511, over 29435.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.1454, cr_loss=0.3887, attn_decoder_loss=0.2552, over 5794620.53 frames. ], batch size: 78, lr: 6.56e-03, grad_scale: 8.0 2024-09-17 21:02:36,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=300000.0, ans=0.0 2024-09-17 21:02:42,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300000.0, ans=0.1 2024-09-17 21:03:10,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=300080.0, ans=0.2 2024-09-17 21:03:33,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.81 vs. limit=15.0 2024-09-17 21:03:38,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=300160.0, ans=0.125 2024-09-17 21:03:51,168 INFO [train.py:1198] (0/2) Epoch 17, batch 2650, loss[loss=0.2624, ctc_loss=0.1546, cr_loss=0.4168, attn_decoder_loss=0.2651, over 29347.00 frames. 
], tot_loss[loss=0.2524, ctc_loss=0.1454, cr_loss=0.3893, attn_decoder_loss=0.2556, over 5800801.04 frames. ], batch size: 100, lr: 6.56e-03, grad_scale: 8.0 2024-09-17 21:04:06,989 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 8.955e+01 9.384e+01 9.945e+01 2.228e+02, threshold=1.877e+02, percent-clipped=1.0 2024-09-17 21:04:18,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=300240.0, ans=0.0 2024-09-17 21:04:22,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=300280.0, ans=0.2 2024-09-17 21:05:01,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=300360.0, ans=0.125 2024-09-17 21:05:06,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=300360.0, ans=0.125 2024-09-17 21:05:09,160 INFO [train.py:1198] (0/2) Epoch 17, batch 2700, loss[loss=0.2625, ctc_loss=0.1415, cr_loss=0.3916, attn_decoder_loss=0.2672, over 29495.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1453, cr_loss=0.3893, attn_decoder_loss=0.2556, over 5795946.09 frames. ], batch size: 87, lr: 6.56e-03, grad_scale: 8.0 2024-09-17 21:05:16,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=15.0 2024-09-17 21:06:03,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=300520.0, ans=0.1 2024-09-17 21:06:24,705 INFO [train.py:1198] (0/2) Epoch 17, batch 2750, loss[loss=0.2377, ctc_loss=0.1408, cr_loss=0.3957, attn_decoder_loss=0.2397, over 29506.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.144, cr_loss=0.3865, attn_decoder_loss=0.2541, over 5794646.24 frames. 
], batch size: 75, lr: 6.56e-03, grad_scale: 8.0 2024-09-17 21:06:38,342 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.681e+01 9.439e+01 1.052e+02 4.745e+02, threshold=1.888e+02, percent-clipped=3.0 2024-09-17 21:07:07,900 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-09-17 21:07:13,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=300720.0, ans=0.2 2024-09-17 21:07:29,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=300760.0, ans=0.125 2024-09-17 21:07:43,693 INFO [train.py:1198] (0/2) Epoch 17, batch 2800, loss[loss=0.2732, ctc_loss=0.1806, cr_loss=0.3908, attn_decoder_loss=0.2748, over 20466.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1443, cr_loss=0.3867, attn_decoder_loss=0.2542, over 5775472.51 frames. ], batch size: 209, lr: 6.55e-03, grad_scale: 16.0 2024-09-17 21:07:56,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.90 vs. limit=15.0 2024-09-17 21:08:04,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=300840.0, ans=0.125 2024-09-17 21:08:07,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=300840.0, ans=0.0 2024-09-17 21:08:07,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.23 vs. 
limit=22.5 2024-09-17 21:08:17,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=300880.0, ans=0.1 2024-09-17 21:08:23,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=300880.0, ans=0.125 2024-09-17 21:08:28,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=300880.0, ans=0.07 2024-09-17 21:08:34,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=300920.0, ans=0.125 2024-09-17 21:08:46,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=300960.0, ans=0.125 2024-09-17 21:08:54,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=300960.0, ans=0.125 2024-09-17 21:09:00,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=301000.0, ans=0.0 2024-09-17 21:09:01,392 INFO [train.py:1198] (0/2) Epoch 17, batch 2850, loss[loss=0.242, ctc_loss=0.1305, cr_loss=0.3617, attn_decoder_loss=0.2464, over 29495.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1447, cr_loss=0.3869, attn_decoder_loss=0.2548, over 5761255.37 frames. 
], batch size: 77, lr: 6.55e-03, grad_scale: 8.0 2024-09-17 21:09:01,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=301000.0, ans=0.0 2024-09-17 21:09:12,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=301000.0, ans=0.0 2024-09-17 21:09:16,420 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.947e+01 9.466e+01 1.049e+02 1.883e+02, threshold=1.893e+02, percent-clipped=0.0 2024-09-17 21:09:51,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=301120.0, ans=0.125 2024-09-17 21:09:52,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=301120.0, ans=0.125 2024-09-17 21:10:17,125 INFO [train.py:1198] (0/2) Epoch 17, batch 2900, loss[loss=0.2481, ctc_loss=0.1422, cr_loss=0.3717, attn_decoder_loss=0.2517, over 29399.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1451, cr_loss=0.3886, attn_decoder_loss=0.2557, over 5786254.18 frames. ], batch size: 79, lr: 6.55e-03, grad_scale: 8.0 2024-09-17 21:10:24,057 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.90 vs. 
limit=22.5 2024-09-17 21:10:26,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=301200.0, ans=0.125 2024-09-17 21:10:32,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301240.0, ans=0.1 2024-09-17 21:10:41,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=301240.0, ans=0.2 2024-09-17 21:10:47,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301280.0, ans=0.1 2024-09-17 21:10:52,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301280.0, ans=0.1 2024-09-17 21:10:53,144 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.44 vs. limit=22.5 2024-09-17 21:11:04,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=301320.0, ans=0.125 2024-09-17 21:11:13,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=301320.0, ans=0.2 2024-09-17 21:11:17,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.05 vs. limit=15.0 2024-09-17 21:11:35,005 INFO [train.py:1198] (0/2) Epoch 17, batch 2950, loss[loss=0.2371, ctc_loss=0.1328, cr_loss=0.36, attn_decoder_loss=0.2406, over 29502.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1443, cr_loss=0.387, attn_decoder_loss=0.2546, over 5781933.98 frames. 
], batch size: 75, lr: 6.55e-03, grad_scale: 8.0 2024-09-17 21:11:39,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=301400.0, ans=0.0 2024-09-17 21:11:52,335 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.656e+01 9.103e+01 9.738e+01 1.377e+02, threshold=1.821e+02, percent-clipped=0.0 2024-09-17 21:12:30,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=301520.0, ans=0.0 2024-09-17 21:12:33,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=301520.0, ans=0.125 2024-09-17 21:12:36,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-09-17 21:12:51,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=301600.0, ans=0.025 2024-09-17 21:12:52,858 INFO [train.py:1198] (0/2) Epoch 17, batch 3000, loss[loss=0.2565, ctc_loss=0.1498, cr_loss=0.3735, attn_decoder_loss=0.2601, over 29726.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1438, cr_loss=0.3854, attn_decoder_loss=0.2542, over 5782913.88 frames. ], batch size: 81, lr: 6.54e-03, grad_scale: 8.0 2024-09-17 21:12:52,858 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 21:13:11,357 INFO [train.py:1230] (0/2) Epoch 17, validation: loss=0.2115, ctc_loss=0.04066, cr_loss=4.995e-15, attn_decoder_loss=0.2305, over 944034.00 frames. 
2024-09-17 21:13:11,357 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 21:13:16,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=301600.0, ans=0.2 2024-09-17 21:13:25,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0 2024-09-17 21:13:31,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=301640.0, ans=0.0 2024-09-17 21:13:31,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0 2024-09-17 21:13:32,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=301640.0, ans=0.125 2024-09-17 21:13:57,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=301720.0, ans=0.07 2024-09-17 21:14:10,002 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=22.5 2024-09-17 21:14:10,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=301760.0, ans=0.2 2024-09-17 21:14:18,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=301760.0, ans=0.025 2024-09-17 21:14:20,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.37 vs. limit=5.0 2024-09-17 21:14:27,363 INFO [train.py:1198] (0/2) Epoch 17, batch 3050, loss[loss=0.2461, ctc_loss=0.1366, cr_loss=0.377, attn_decoder_loss=0.2498, over 29552.00 frames. 
], tot_loss[loss=0.2516, ctc_loss=0.1446, cr_loss=0.387, attn_decoder_loss=0.2549, over 5776293.57 frames. ], batch size: 76, lr: 6.54e-03, grad_scale: 8.0 2024-09-17 21:14:28,103 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0 2024-09-17 21:14:41,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301840.0, ans=0.1 2024-09-17 21:14:42,352 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 9.363e+01 1.016e+02 1.140e+02 2.796e+02, threshold=2.033e+02, percent-clipped=4.0 2024-09-17 21:14:47,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=301840.0, ans=0.0 2024-09-17 21:14:50,457 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.10 vs. limit=22.5 2024-09-17 21:15:02,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301880.0, ans=0.1 2024-09-17 21:15:11,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301880.0, ans=0.1 2024-09-17 21:15:15,698 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=22.5 2024-09-17 21:15:20,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=301920.0, ans=0.125 2024-09-17 21:15:46,821 INFO [train.py:1198] (0/2) Epoch 17, batch 3100, loss[loss=0.2618, ctc_loss=0.1541, cr_loss=0.4122, attn_decoder_loss=0.2646, over 29286.00 frames. 
], tot_loss[loss=0.2516, ctc_loss=0.1449, cr_loss=0.3874, attn_decoder_loss=0.2549, over 5776997.24 frames. ], batch size: 100, lr: 6.54e-03, grad_scale: 8.0 2024-09-17 21:15:50,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=302000.0, ans=0.0 2024-09-17 21:16:02,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=302040.0, ans=0.0 2024-09-17 21:16:53,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=302160.0, ans=0.07 2024-09-17 21:16:54,317 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=12.0 2024-09-17 21:16:58,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=302160.0, ans=0.125 2024-09-17 21:17:02,401 INFO [train.py:1198] (0/2) Epoch 17, batch 3150, loss[loss=0.2729, ctc_loss=0.1601, cr_loss=0.4242, attn_decoder_loss=0.276, over 28854.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1445, cr_loss=0.387, attn_decoder_loss=0.2546, over 5782652.84 frames. ], batch size: 104, lr: 6.54e-03, grad_scale: 8.0 2024-09-17 21:17:17,548 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.912e+01 9.257e+01 9.921e+01 1.761e+02, threshold=1.851e+02, percent-clipped=0.0 2024-09-17 21:17:23,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=302240.0, ans=0.125 2024-09-17 21:17:37,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.12 vs. 
limit=12.0 2024-09-17 21:18:09,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=302360.0, ans=0.125 2024-09-17 21:18:18,472 INFO [train.py:1198] (0/2) Epoch 17, batch 3200, loss[loss=0.2508, ctc_loss=0.1402, cr_loss=0.3736, attn_decoder_loss=0.2548, over 29401.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1439, cr_loss=0.3857, attn_decoder_loss=0.254, over 5792875.79 frames. ], batch size: 79, lr: 6.54e-03, grad_scale: 16.0 2024-09-17 21:18:20,788 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.03 vs. limit=22.5 2024-09-17 21:18:23,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=302400.0, ans=0.125 2024-09-17 21:18:32,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=302440.0, ans=0.125 2024-09-17 21:18:49,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.34 vs. limit=15.0 2024-09-17 21:18:50,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=302480.0, ans=0.2 2024-09-17 21:18:51,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.25 vs. 
limit=22.5 2024-09-17 21:19:01,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=302480.0, ans=0.0 2024-09-17 21:19:27,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=302560.0, ans=0.125 2024-09-17 21:19:38,257 INFO [train.py:1198] (0/2) Epoch 17, batch 3250, loss[loss=0.2546, ctc_loss=0.1408, cr_loss=0.374, attn_decoder_loss=0.259, over 29694.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.144, cr_loss=0.3864, attn_decoder_loss=0.2543, over 5798816.63 frames. ], batch size: 84, lr: 6.53e-03, grad_scale: 8.0 2024-09-17 21:19:41,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=302600.0, ans=0.125 2024-09-17 21:19:46,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=302600.0, ans=0.125 2024-09-17 21:19:54,940 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.527e+01 9.036e+01 9.665e+01 1.223e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-17 21:20:07,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=302680.0, ans=0.2 2024-09-17 21:20:49,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=302760.0, ans=0.2 2024-09-17 21:20:53,932 INFO [train.py:1198] (0/2) Epoch 17, batch 3300, loss[loss=0.2555, ctc_loss=0.1587, cr_loss=0.3821, attn_decoder_loss=0.2578, over 28177.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.143, cr_loss=0.3844, attn_decoder_loss=0.2532, over 5796316.58 frames. 
], batch size: 111, lr: 6.53e-03, grad_scale: 8.0 2024-09-17 21:21:15,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=302840.0, ans=0.0 2024-09-17 21:21:26,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302880.0, ans=0.1 2024-09-17 21:21:47,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=302920.0, ans=0.2 2024-09-17 21:22:01,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=302960.0, ans=0.05 2024-09-17 21:22:09,783 INFO [train.py:1198] (0/2) Epoch 17, batch 3350, loss[loss=0.2662, ctc_loss=0.1529, cr_loss=0.4045, attn_decoder_loss=0.2698, over 28765.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1438, cr_loss=0.3856, attn_decoder_loss=0.2541, over 5771790.42 frames. ], batch size: 104, lr: 6.53e-03, grad_scale: 4.0 2024-09-17 21:22:20,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=303000.0, ans=0.0 2024-09-17 21:22:28,069 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.973e+01 8.919e+01 9.576e+01 1.043e+02 2.558e+02, threshold=1.915e+02, percent-clipped=2.0 2024-09-17 21:22:31,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=303040.0, ans=0.125 2024-09-17 21:22:38,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=303080.0, ans=0.2 2024-09-17 21:22:50,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=303080.0, ans=0.0 2024-09-17 21:23:03,702 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=303120.0, ans=0.125 2024-09-17 21:23:23,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=303160.0, ans=0.125 2024-09-17 21:23:29,943 INFO [train.py:1198] (0/2) Epoch 17, batch 3400, loss[loss=0.22, ctc_loss=0.1214, cr_loss=0.3465, attn_decoder_loss=0.2233, over 29356.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1441, cr_loss=0.3859, attn_decoder_loss=0.2541, over 5764131.15 frames. ], batch size: 67, lr: 6.53e-03, grad_scale: 8.0 2024-09-17 21:23:44,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.38 vs. limit=15.0 2024-09-17 21:23:48,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.82 vs. limit=10.0 2024-09-17 21:23:53,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=303240.0, ans=0.125 2024-09-17 21:24:11,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=303280.0, ans=0.125 2024-09-17 21:24:19,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.37 vs. limit=12.0 2024-09-17 21:24:35,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=303360.0, ans=0.0 2024-09-17 21:24:40,227 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:24:45,977 INFO [train.py:1198] (0/2) Epoch 17, batch 3450, loss[loss=0.26, ctc_loss=0.1464, cr_loss=0.3916, attn_decoder_loss=0.2639, over 28561.00 frames. 
], tot_loss[loss=0.2509, ctc_loss=0.1437, cr_loss=0.3853, attn_decoder_loss=0.2542, over 5773406.34 frames. ], batch size: 112, lr: 6.53e-03, grad_scale: 8.0 2024-09-17 21:25:04,362 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 9.059e+01 9.380e+01 1.001e+02 2.094e+02, threshold=1.876e+02, percent-clipped=1.0 2024-09-17 21:25:22,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=303480.0, ans=0.125 2024-09-17 21:25:33,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=303520.0, ans=0.125 2024-09-17 21:25:36,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=303520.0, ans=0.125 2024-09-17 21:25:41,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=303520.0, ans=0.0 2024-09-17 21:25:47,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=303560.0, ans=0.0 2024-09-17 21:26:00,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=303600.0, ans=0.2 2024-09-17 21:26:01,989 INFO [train.py:1198] (0/2) Epoch 17, batch 3500, loss[loss=0.216, ctc_loss=0.1078, cr_loss=0.3113, attn_decoder_loss=0.2211, over 29320.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1432, cr_loss=0.3845, attn_decoder_loss=0.2535, over 5775091.45 frames. ], batch size: 71, lr: 6.52e-03, grad_scale: 8.0 2024-09-17 21:26:12,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=303600.0, ans=0.0 2024-09-17 21:26:35,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.43 vs. 
limit=12.0 2024-09-17 21:26:41,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.37 vs. limit=15.0 2024-09-17 21:27:08,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=303760.0, ans=0.95 2024-09-17 21:27:18,813 INFO [train.py:1198] (0/2) Epoch 17, batch 3550, loss[loss=0.2545, ctc_loss=0.1388, cr_loss=0.3781, attn_decoder_loss=0.259, over 29692.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1433, cr_loss=0.3852, attn_decoder_loss=0.2535, over 5782345.25 frames. ], batch size: 89, lr: 6.52e-03, grad_scale: 8.0 2024-09-17 21:27:22,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=303800.0, ans=0.2 2024-09-17 21:27:36,516 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.716e+01 9.254e+01 9.841e+01 2.209e+02, threshold=1.851e+02, percent-clipped=2.0 2024-09-17 21:27:36,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=303840.0, ans=0.0 2024-09-17 21:28:11,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=22.5 2024-09-17 21:28:17,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=303920.0, ans=0.025 2024-09-17 21:28:23,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.29 vs. 
limit=15.0 2024-09-17 21:28:25,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=303960.0, ans=0.09899494936611666 2024-09-17 21:28:25,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=303960.0, ans=0.125 2024-09-17 21:28:34,212 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-76000.pt 2024-09-17 21:28:43,008 INFO [train.py:1198] (0/2) Epoch 17, batch 3600, loss[loss=0.2422, ctc_loss=0.1365, cr_loss=0.367, attn_decoder_loss=0.2458, over 29488.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1432, cr_loss=0.3854, attn_decoder_loss=0.2536, over 5791096.16 frames. ], batch size: 77, lr: 6.52e-03, grad_scale: 16.0 2024-09-17 21:29:25,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.99 vs. 
limit=6.0 2024-09-17 21:29:31,255 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:29:37,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=304120.0, ans=0.125 2024-09-17 21:29:37,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=304120.0, ans=0.125 2024-09-17 21:29:38,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=304120.0, ans=0.0 2024-09-17 21:29:41,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=304160.0, ans=0.125 2024-09-17 21:29:57,761 INFO [train.py:1198] (0/2) Epoch 17, batch 3650, loss[loss=0.2696, ctc_loss=0.1558, cr_loss=0.4176, attn_decoder_loss=0.273, over 29501.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.143, cr_loss=0.3853, attn_decoder_loss=0.2529, over 5793711.44 frames. ], batch size: 90, lr: 6.52e-03, grad_scale: 8.0 2024-09-17 21:29:58,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.70 vs. 
limit=22.5 2024-09-17 21:30:14,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304240.0, ans=0.1 2024-09-17 21:30:17,007 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.843e+01 9.212e+01 9.798e+01 3.342e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-17 21:30:26,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=304280.0, ans=0.125 2024-09-17 21:30:39,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=304280.0, ans=0.0 2024-09-17 21:30:40,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.79 vs. limit=15.0 2024-09-17 21:30:44,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=304320.0, ans=0.0 2024-09-17 21:31:08,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=304360.0, ans=0.2 2024-09-17 21:31:12,237 INFO [train.py:1198] (0/2) Epoch 17, batch 3700, loss[loss=0.2466, ctc_loss=0.1347, cr_loss=0.381, attn_decoder_loss=0.2506, over 29704.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1429, cr_loss=0.3853, attn_decoder_loss=0.2531, over 5803591.63 frames. 
], batch size: 84, lr: 6.51e-03, grad_scale: 8.0 2024-09-17 21:31:18,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=304400.0, ans=0.125 2024-09-17 21:31:18,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=304400.0, ans=0.1 2024-09-17 21:31:42,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=304480.0, ans=0.025 2024-09-17 21:31:45,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304480.0, ans=0.1 2024-09-17 21:31:46,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=304480.0, ans=0.125 2024-09-17 21:31:50,233 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.67 vs. limit=10.0 2024-09-17 21:31:59,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=304520.0, ans=10.0 2024-09-17 21:32:07,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=304520.0, ans=0.125 2024-09-17 21:32:14,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304560.0, ans=0.1 2024-09-17 21:32:14,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=304560.0, ans=0.125 2024-09-17 21:32:16,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.79 vs. 
limit=15.0 2024-09-17 21:32:23,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=304560.0, ans=0.09899494936611666 2024-09-17 21:32:26,341 INFO [train.py:1198] (0/2) Epoch 17, batch 3750, loss[loss=0.2215, ctc_loss=0.1221, cr_loss=0.3657, attn_decoder_loss=0.2244, over 29308.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1428, cr_loss=0.3855, attn_decoder_loss=0.2531, over 5807789.49 frames. ], batch size: 67, lr: 6.51e-03, grad_scale: 8.0 2024-09-17 21:32:45,808 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.729e+01 9.186e+01 9.795e+01 2.542e+02, threshold=1.837e+02, percent-clipped=1.0 2024-09-17 21:33:43,100 INFO [train.py:1198] (0/2) Epoch 17, batch 3800, loss[loss=0.2588, ctc_loss=0.1483, cr_loss=0.4047, attn_decoder_loss=0.2621, over 29639.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1421, cr_loss=0.3843, attn_decoder_loss=0.2525, over 5798135.34 frames. ], batch size: 86, lr: 6.51e-03, grad_scale: 8.0 2024-09-17 21:33:46,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-09-17 21:33:54,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=304800.0, ans=15.0 2024-09-17 21:34:13,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=304880.0, ans=0.0 2024-09-17 21:34:23,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=304880.0, ans=0.125 2024-09-17 21:34:33,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.94 vs. 
limit=10.0 2024-09-17 21:34:42,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=15.0 2024-09-17 21:34:44,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304960.0, ans=0.1 2024-09-17 21:34:56,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=304960.0, ans=0.125 2024-09-17 21:34:59,147 INFO [train.py:1198] (0/2) Epoch 17, batch 3850, loss[loss=0.2677, ctc_loss=0.1564, cr_loss=0.4035, attn_decoder_loss=0.2711, over 29263.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1423, cr_loss=0.3848, attn_decoder_loss=0.2527, over 5812220.78 frames. ], batch size: 100, lr: 6.51e-03, grad_scale: 8.0 2024-09-17 21:35:00,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=305000.0, ans=0.2 2024-09-17 21:35:15,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=305040.0, ans=0.2 2024-09-17 21:35:18,453 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.722e+01 9.215e+01 9.828e+01 1.401e+02, threshold=1.843e+02, percent-clipped=0.0 2024-09-17 21:35:20,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=305040.0, ans=0.125 2024-09-17 21:35:21,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=305040.0, ans=0.125 2024-09-17 21:35:23,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.14 vs. 
limit=15.0 2024-09-17 21:35:30,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=305080.0, ans=0.0 2024-09-17 21:35:43,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-09-17 21:35:47,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=305120.0, ans=0.125 2024-09-17 21:35:53,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=305120.0, ans=0.2 2024-09-17 21:35:58,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=305160.0, ans=0.0 2024-09-17 21:36:13,796 INFO [train.py:1198] (0/2) Epoch 17, batch 3900, loss[loss=0.2622, ctc_loss=0.1457, cr_loss=0.4016, attn_decoder_loss=0.2663, over 29642.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1428, cr_loss=0.3854, attn_decoder_loss=0.2534, over 5816899.45 frames. ], batch size: 86, lr: 6.51e-03, grad_scale: 8.0 2024-09-17 21:36:15,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=305200.0, ans=0.025 2024-09-17 21:36:20,508 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.63 vs. 
limit=15.0 2024-09-17 21:36:22,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=305200.0, ans=0.0 2024-09-17 21:36:39,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=305240.0, ans=0.125 2024-09-17 21:36:41,257 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.97 vs. limit=10.0 2024-09-17 21:37:01,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.58 vs. limit=10.0 2024-09-17 21:37:05,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.15 vs. limit=22.5 2024-09-17 21:37:16,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=305360.0, ans=0.0 2024-09-17 21:37:26,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=305400.0, ans=0.125 2024-09-17 21:37:28,046 INFO [train.py:1198] (0/2) Epoch 17, batch 3950, loss[loss=0.2632, ctc_loss=0.156, cr_loss=0.4379, attn_decoder_loss=0.2654, over 29466.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1423, cr_loss=0.3848, attn_decoder_loss=0.2531, over 5836385.71 frames. 
], batch size: 97, lr: 6.50e-03, grad_scale: 8.0 2024-09-17 21:37:34,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=305400.0, ans=0.125 2024-09-17 21:37:47,422 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.762e+01 9.164e+01 9.964e+01 1.868e+02, threshold=1.833e+02, percent-clipped=1.0 2024-09-17 21:37:59,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=305480.0, ans=0.125 2024-09-17 21:38:15,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=305520.0, ans=0.125 2024-09-17 21:38:36,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=305560.0, ans=0.0 2024-09-17 21:38:39,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=305560.0, ans=0.0 2024-09-17 21:38:39,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=305560.0, ans=0.125 2024-09-17 21:38:44,017 INFO [train.py:1198] (0/2) Epoch 17, batch 4000, loss[loss=0.2189, ctc_loss=0.1149, cr_loss=0.3269, attn_decoder_loss=0.2232, over 29497.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1425, cr_loss=0.384, attn_decoder_loss=0.2531, over 5814099.91 frames. ], batch size: 74, lr: 6.50e-03, grad_scale: 16.0 2024-09-17 21:38:58,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=305640.0, ans=0.0 2024-09-17 21:39:02,675 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.90 vs. 
limit=15.0 2024-09-17 21:39:03,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=305640.0, ans=0.025 2024-09-17 21:39:29,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=305720.0, ans=10.0 2024-09-17 21:39:57,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.04 vs. limit=10.0 2024-09-17 21:39:59,430 INFO [train.py:1198] (0/2) Epoch 17, batch 4050, loss[loss=0.2829, ctc_loss=0.189, cr_loss=0.4341, attn_decoder_loss=0.2837, over 21019.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1424, cr_loss=0.384, attn_decoder_loss=0.2529, over 5798794.33 frames. ], batch size: 210, lr: 6.50e-03, grad_scale: 8.0 2024-09-17 21:40:02,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.59 vs. limit=5.0 2024-09-17 21:40:08,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=305800.0, ans=0.0 2024-09-17 21:40:12,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.24 vs. 
limit=15.0 2024-09-17 21:40:19,858 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.726e+01 9.314e+01 1.066e+02 2.595e+02, threshold=1.863e+02, percent-clipped=1.0 2024-09-17 21:40:31,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=305880.0, ans=0.125 2024-09-17 21:40:53,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=305920.0, ans=0.0 2024-09-17 21:40:55,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305920.0, ans=0.1 2024-09-17 21:41:07,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=305960.0, ans=0.95 2024-09-17 21:41:12,872 INFO [train.py:1198] (0/2) Epoch 17, batch 4100, loss[loss=0.2693, ctc_loss=0.1569, cr_loss=0.4108, attn_decoder_loss=0.2727, over 29503.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1429, cr_loss=0.3843, attn_decoder_loss=0.2533, over 5795021.44 frames. 
], batch size: 90, lr: 6.50e-03, grad_scale: 8.0 2024-09-17 21:41:38,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=306040.0, ans=0.0 2024-09-17 21:41:58,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=306120.0, ans=0.125 2024-09-17 21:42:16,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=306160.0, ans=0.125 2024-09-17 21:42:16,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=306160.0, ans=0.0 2024-09-17 21:42:22,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=306160.0, ans=0.0 2024-09-17 21:42:26,512 INFO [train.py:1198] (0/2) Epoch 17, batch 4150, loss[loss=0.2438, ctc_loss=0.1366, cr_loss=0.3676, attn_decoder_loss=0.2476, over 29508.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1426, cr_loss=0.384, attn_decoder_loss=0.2531, over 5800570.82 frames. ], batch size: 77, lr: 6.50e-03, grad_scale: 8.0 2024-09-17 21:42:34,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=306200.0, ans=0.125 2024-09-17 21:42:48,272 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.896e+01 9.299e+01 9.873e+01 2.442e+02, threshold=1.860e+02, percent-clipped=1.0 2024-09-17 21:43:22,575 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.10 vs. 
limit=22.5 2024-09-17 21:43:27,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=306360.0, ans=0.125 2024-09-17 21:43:38,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=306360.0, ans=0.125 2024-09-17 21:43:41,501 INFO [train.py:1198] (0/2) Epoch 17, batch 4200, loss[loss=0.2684, ctc_loss=0.1669, cr_loss=0.4313, attn_decoder_loss=0.2701, over 29520.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1429, cr_loss=0.3849, attn_decoder_loss=0.2534, over 5802492.57 frames. ], batch size: 90, lr: 6.49e-03, grad_scale: 8.0 2024-09-17 21:43:43,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=306400.0, ans=0.125 2024-09-17 21:43:59,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=306440.0, ans=0.1 2024-09-17 21:44:00,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=306440.0, ans=0.05 2024-09-17 21:44:05,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=306440.0, ans=0.0 2024-09-17 21:44:05,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306440.0, ans=0.1 2024-09-17 21:44:08,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=306440.0, ans=0.125 2024-09-17 21:44:56,159 INFO [train.py:1198] (0/2) Epoch 17, batch 4250, loss[loss=0.2343, ctc_loss=0.1221, cr_loss=0.3561, attn_decoder_loss=0.2389, over 29501.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1426, cr_loss=0.3848, attn_decoder_loss=0.2534, over 5808361.63 frames. 
], batch size: 74, lr: 6.49e-03, grad_scale: 8.0 2024-09-17 21:45:03,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=306600.0, ans=10.0 2024-09-17 21:45:04,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.65 vs. limit=15.0 2024-09-17 21:45:16,394 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.736e+01 9.267e+01 9.996e+01 5.774e+02, threshold=1.853e+02, percent-clipped=2.0 2024-09-17 21:45:35,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=306680.0, ans=0.125 2024-09-17 21:45:37,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=306680.0, ans=0.125 2024-09-17 21:45:40,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=306720.0, ans=0.125 2024-09-17 21:45:49,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=306720.0, ans=0.125 2024-09-17 21:45:52,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=306720.0, ans=0.07 2024-09-17 21:45:53,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=306760.0, ans=0.1 2024-09-17 21:45:57,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.74 vs. limit=15.0 2024-09-17 21:46:09,569 INFO [train.py:1198] (0/2) Epoch 17, batch 4300, loss[loss=0.259, ctc_loss=0.1503, cr_loss=0.3898, attn_decoder_loss=0.2624, over 29527.00 frames. 
], tot_loss[loss=0.2504, ctc_loss=0.1428, cr_loss=0.3847, attn_decoder_loss=0.2538, over 5798023.06 frames. ], batch size: 87, lr: 6.49e-03, grad_scale: 8.0 2024-09-17 21:46:30,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=306840.0, ans=0.125 2024-09-17 21:46:31,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=306840.0, ans=0.2 2024-09-17 21:46:33,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=306840.0, ans=0.125 2024-09-17 21:46:34,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306840.0, ans=0.1 2024-09-17 21:46:45,082 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:46:47,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.71 vs. limit=10.0 2024-09-17 21:46:59,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=306920.0, ans=0.09899494936611666 2024-09-17 21:47:25,582 INFO [train.py:1198] (0/2) Epoch 17, batch 4350, loss[loss=0.2744, ctc_loss=0.1654, cr_loss=0.4352, attn_decoder_loss=0.2768, over 29462.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1457, cr_loss=0.3904, attn_decoder_loss=0.2574, over 5799885.99 frames. 
], batch size: 97, lr: 6.49e-03, grad_scale: 8.0 2024-09-17 21:47:39,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=307040.0, ans=0.125 2024-09-17 21:47:46,023 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.056e+01 9.451e+01 1.005e+02 2.709e+02, threshold=1.890e+02, percent-clipped=3.0 2024-09-17 21:47:53,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=307080.0, ans=0.125 2024-09-17 21:47:56,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=307080.0, ans=0.0 2024-09-17 21:48:19,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=307120.0, ans=0.1 2024-09-17 21:48:33,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2024-09-17 21:48:39,028 INFO [train.py:1198] (0/2) Epoch 17, batch 4400, loss[loss=0.2664, ctc_loss=0.1561, cr_loss=0.4021, attn_decoder_loss=0.2697, over 27428.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1475, cr_loss=0.3927, attn_decoder_loss=0.2594, over 5769998.93 frames. 
], batch size: 124, lr: 6.49e-03, grad_scale: 16.0 2024-09-17 21:48:53,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=307240.0, ans=0.125 2024-09-17 21:49:01,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=307240.0, ans=0.125 2024-09-17 21:49:01,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=307240.0, ans=0.125 2024-09-17 21:49:09,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=307280.0, ans=0.04949747468305833 2024-09-17 21:49:36,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.01 vs. limit=15.0 2024-09-17 21:49:43,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=307360.0, ans=0.125 2024-09-17 21:49:46,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.11 vs. limit=6.0 2024-09-17 21:49:48,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=307360.0, ans=0.07 2024-09-17 21:49:54,142 INFO [train.py:1198] (0/2) Epoch 17, batch 4450, loss[loss=0.2783, ctc_loss=0.1924, cr_loss=0.432, attn_decoder_loss=0.2783, over 20154.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1525, cr_loss=0.3983, attn_decoder_loss=0.2621, over 5574068.38 frames. 
], batch size: 209, lr: 6.48e-03, grad_scale: 8.0 2024-09-17 21:50:02,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=307400.0, ans=0.0 2024-09-17 21:50:16,944 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.924e+01 9.297e+01 9.769e+01 1.205e+02 1.699e+02, threshold=1.954e+02, percent-clipped=0.0 2024-09-17 21:50:21,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=307440.0, ans=0.2 2024-09-17 21:50:31,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=307480.0, ans=0.125 2024-09-17 21:50:53,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=307560.0, ans=0.125 2024-09-17 21:51:05,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=307560.0, ans=0.0 2024-09-17 21:51:10,082 INFO [train.py:1198] (0/2) Epoch 17, batch 4500, loss[loss=0.2735, ctc_loss=0.1848, cr_loss=0.4315, attn_decoder_loss=0.2738, over 20248.00 frames. ], tot_loss[loss=0.262, ctc_loss=0.1578, cr_loss=0.4005, attn_decoder_loss=0.2647, over 5237255.77 frames. ], batch size: 210, lr: 6.48e-03, grad_scale: 8.0 2024-09-17 21:51:26,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=307640.0, ans=0.0 2024-09-17 21:51:38,163 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=10.85 vs. limit=12.0 2024-09-17 21:51:44,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.52 vs. 
limit=6.0 2024-09-17 21:51:47,560 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-17.pt 2024-09-17 21:52:34,424 INFO [train.py:1198] (0/2) Epoch 18, batch 0, loss[loss=0.2364, ctc_loss=0.1369, cr_loss=0.3919, attn_decoder_loss=0.2388, over 29594.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1369, cr_loss=0.3919, attn_decoder_loss=0.2388, over 29594.00 frames. ], batch size: 73, lr: 6.29e-03, grad_scale: 16.0 2024-09-17 21:52:34,424 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 21:52:52,855 INFO [train.py:1230] (0/2) Epoch 18, validation: loss=0.2122, ctc_loss=0.03991, cr_loss=4.926e-15, attn_decoder_loss=0.2314, over 944034.00 frames. 2024-09-17 21:52:52,855 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 21:53:02,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=307700.0, ans=0.125 2024-09-17 21:53:11,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=307740.0, ans=0.0 2024-09-17 21:53:37,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=307780.0, ans=0.125 2024-09-17 21:53:56,814 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.012e+01 9.686e+01 1.126e+02 1.212e+02 3.801e+02, threshold=2.253e+02, percent-clipped=2.0 2024-09-17 21:54:00,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=307860.0, ans=0.2 2024-09-17 21:54:10,449 INFO [train.py:1198] (0/2) Epoch 18, batch 50, loss[loss=0.2243, ctc_loss=0.1174, cr_loss=0.325, attn_decoder_loss=0.229, over 29422.00 frames. ], tot_loss[loss=0.2515, ctc_loss=0.1459, cr_loss=0.3888, attn_decoder_loss=0.2546, over 1268301.28 frames. 
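The `grad_scale` value in the batch summaries alternates between 8.0 and 16.0 because the run uses fp16 AMP (`Use AMP=True`) with dynamic loss scaling: the scale is halved when an overflow is detected and doubled again after a run of finite steps. The class below is a minimal sketch of that mechanism, modelled on the semantics of `torch.cuda.amp.GradScaler`; the parameter names are chosen for illustration and this is not icefall's exact code.

```python
class DynamicGradScaler:
    """Minimal sketch of the mechanism behind the alternating
    `grad_scale: 8.0` / `grad_scale: 16.0` values in the log
    (GradScaler-style dynamic loss scaling, simplified)."""

    def __init__(self, init_scale=8.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=1000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            # Overflow in fp16 grads: shrink the scale and restart the count.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                # A long run of finite grads: safe to grow the scale again.
                self.scale *= self.growth_factor
                self._good_steps = 0
        return self.scale
```

This explains why the scale in the log never drifts far from 8.0: each doubling to 16.0 is soon undone by the next overflow.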
], batch size: 70, lr: 6.29e-03, grad_scale: 8.0 2024-09-17 21:54:12,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=307900.0, ans=0.125 2024-09-17 21:54:15,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307900.0, ans=0.1 2024-09-17 21:54:25,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=307940.0, ans=0.0 2024-09-17 21:54:57,385 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-17 21:55:22,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=308060.0, ans=0.0 2024-09-17 21:55:26,560 INFO [train.py:1198] (0/2) Epoch 18, batch 100, loss[loss=0.2463, ctc_loss=0.1394, cr_loss=0.3702, attn_decoder_loss=0.2499, over 29545.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1468, cr_loss=0.3921, attn_decoder_loss=0.2568, over 2253368.39 frames. ], batch size: 76, lr: 6.29e-03, grad_scale: 8.0 2024-09-17 21:55:50,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=308140.0, ans=0.0 2024-09-17 21:55:52,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=308140.0, ans=0.04949747468305833 2024-09-17 21:56:27,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=308260.0, ans=0.1 2024-09-17 21:56:27,951 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.57 vs. 
limit=15.0 2024-09-17 21:56:30,073 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.618e+01 9.118e+01 9.635e+01 1.582e+02, threshold=1.824e+02, percent-clipped=0.0 2024-09-17 21:56:43,549 INFO [train.py:1198] (0/2) Epoch 18, batch 150, loss[loss=0.2306, ctc_loss=0.1251, cr_loss=0.3355, attn_decoder_loss=0.2349, over 29423.00 frames. ], tot_loss[loss=0.2513, ctc_loss=0.1442, cr_loss=0.3876, attn_decoder_loss=0.2546, over 3048627.94 frames. ], batch size: 70, lr: 6.29e-03, grad_scale: 8.0 2024-09-17 21:56:48,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=308300.0, ans=0.07 2024-09-17 21:56:53,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.01 vs. limit=22.5 2024-09-17 21:57:22,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=308380.0, ans=0.0 2024-09-17 21:57:40,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308420.0, ans=0.1 2024-09-17 21:57:43,591 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:57:49,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=308460.0, ans=0.0 2024-09-17 21:57:51,533 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.84 vs. 
limit=15.0 2024-09-17 21:57:58,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=308460.0, ans=0.0 2024-09-17 21:58:01,101 INFO [train.py:1198] (0/2) Epoch 18, batch 200, loss[loss=0.261, ctc_loss=0.1552, cr_loss=0.3994, attn_decoder_loss=0.2639, over 27295.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1438, cr_loss=0.3864, attn_decoder_loss=0.254, over 3660314.48 frames. ], batch size: 125, lr: 6.29e-03, grad_scale: 8.0 2024-09-17 21:58:15,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=308540.0, ans=0.1 2024-09-17 21:58:17,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0 2024-09-17 21:58:30,097 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:58:48,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=308620.0, ans=0.125 2024-09-17 21:58:50,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=12.0 2024-09-17 21:58:52,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=308620.0, ans=0.125 2024-09-17 21:58:53,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.94 vs. 
limit=15.0 2024-09-17 21:59:03,039 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.718e+01 9.535e+01 1.012e+02 1.370e+02, threshold=1.907e+02, percent-clipped=0.0 2024-09-17 21:59:16,551 INFO [train.py:1198] (0/2) Epoch 18, batch 250, loss[loss=0.2799, ctc_loss=0.1811, cr_loss=0.4628, attn_decoder_loss=0.2806, over 29251.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1429, cr_loss=0.3858, attn_decoder_loss=0.2535, over 4142435.30 frames. ], batch size: 100, lr: 6.28e-03, grad_scale: 8.0 2024-09-17 21:59:29,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=308700.0, ans=0.1 2024-09-17 21:59:29,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=308700.0, ans=0.0 2024-09-17 21:59:35,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=308740.0, ans=0.125 2024-09-17 21:59:36,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=308740.0, ans=0.2 2024-09-17 21:59:50,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-09-17 22:00:10,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.36 vs. limit=22.5 2024-09-17 22:00:24,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.12 vs. limit=10.0 2024-09-17 22:00:35,429 INFO [train.py:1198] (0/2) Epoch 18, batch 300, loss[loss=0.2671, ctc_loss=0.1503, cr_loss=0.3955, attn_decoder_loss=0.2713, over 29553.00 frames. 
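The most frequent entries above, `ScheduledFloat: name=…, batch_count=…, ans=…`, record hyperparameters (dropout probabilities, skip rates, balancer probs, scale minima) that are not constants but functions of the global batch count. To my understanding, the core of icefall's `ScheduledFloat` is piecewise-linear interpolation between `(batch_count, value)` breakpoints; the helper below sketches only that core, with a hypothetical `schedule` argument, and omits the class machinery of the real `scaling.py`.

```python
def scheduled_float(batch_count, schedule):
    """Sketch of the idea behind the `ScheduledFloat: name=...,
    batch_count=..., ans=...` log lines: a hyperparameter defined
    as a piecewise-linear function of the global batch count.

    `schedule` is a list of (batch_count, value) breakpoints in
    increasing batch order (an illustrative interface, not the
    real class's constructor).
    """
    b0, v0 = schedule[0]
    if batch_count <= b0:
        return v0
    for b1, v1 in schedule[1:]:
        if batch_count <= b1:
            # Linear interpolation between the surrounding breakpoints.
            t = (batch_count - b0) / (b1 - b0)
            return v0 + t * (v1 - v0)
        b0, v0 = b1, v1
    return v0  # past the last breakpoint: hold the final value
```

At `batch_count=307040` most of the schedules above have long since reached their final values, which is why the same `ans` (0.125, 0.1, 0.0, …) repeats for each name.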
], tot_loss[loss=0.2496, ctc_loss=0.142, cr_loss=0.3844, attn_decoder_loss=0.253, over 4510235.92 frames. ], batch size: 92, lr: 6.28e-03, grad_scale: 8.0 2024-09-17 22:00:40,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=308900.0, ans=0.0 2024-09-17 22:00:59,842 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2024-09-17 22:01:02,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=308940.0, ans=0.0 2024-09-17 22:01:14,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=308980.0, ans=0.0 2024-09-17 22:01:17,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=308980.0, ans=0.0 2024-09-17 22:01:39,753 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 8.552e+01 9.008e+01 9.448e+01 1.517e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-17 22:01:41,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=309060.0, ans=0.1 2024-09-17 22:01:43,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309060.0, ans=0.1 2024-09-17 22:01:53,396 INFO [train.py:1198] (0/2) Epoch 18, batch 350, loss[loss=0.2285, ctc_loss=0.1274, cr_loss=0.3454, attn_decoder_loss=0.232, over 29317.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1422, cr_loss=0.3846, attn_decoder_loss=0.2535, over 4794597.09 frames. 
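Each batch summary reports a per-batch `loss[… over N frames.]` and a running `tot_loss[… over M frames.]` with a much larger frame count. The tracker below sketches the underlying idea, a frame-weighted running average, so that long and short batches contribute in proportion to their duration. This is a plain cumulative average for illustration; icefall's actual metrics tracker also decays old batches, so the real `tot_loss` is a moving rather than strictly cumulative statistic.

```python
class FrameWeightedLoss:
    """Running loss averaged over frames, like the
    `tot_loss[... over M frames.]` entries in the log.
    A simplification: no decay of old batches is applied here."""

    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        # Weight each batch by its frame count so batches of different
        # durations contribute proportionally to the running average.
        self.loss_sum += batch_loss * batch_frames
        self.frames += batch_frames
        return self.loss_sum / self.frames
```

This weighting is why a single bad batch (e.g. the `loss=0.2783 over 20154 frames` outlier at batch 4450) barely moves `tot_loss` once millions of frames have been accumulated.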
], batch size: 71, lr: 6.28e-03, grad_scale: 8.0 2024-09-17 22:02:05,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=309100.0, ans=0.1 2024-09-17 22:02:55,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=309260.0, ans=0.125 2024-09-17 22:03:08,711 INFO [train.py:1198] (0/2) Epoch 18, batch 400, loss[loss=0.2428, ctc_loss=0.1333, cr_loss=0.3755, attn_decoder_loss=0.2466, over 29662.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1418, cr_loss=0.3844, attn_decoder_loss=0.2532, over 5024247.19 frames. ], batch size: 82, lr: 6.28e-03, grad_scale: 16.0 2024-09-17 22:03:12,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=309300.0, ans=0.025 2024-09-17 22:03:22,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309340.0, ans=0.1 2024-09-17 22:03:31,508 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2024-09-17 22:03:32,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.93 vs. limit=10.0 2024-09-17 22:03:51,095 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.43 vs. 
limit=15.0 2024-09-17 22:03:58,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=309420.0, ans=0.025 2024-09-17 22:04:15,311 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.825e+01 9.596e+01 1.056e+02 3.642e+02, threshold=1.919e+02, percent-clipped=2.0 2024-09-17 22:04:15,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309460.0, ans=0.1 2024-09-17 22:04:27,498 INFO [train.py:1198] (0/2) Epoch 18, batch 450, loss[loss=0.2574, ctc_loss=0.1423, cr_loss=0.3824, attn_decoder_loss=0.2617, over 29694.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1419, cr_loss=0.3838, attn_decoder_loss=0.2533, over 5187876.96 frames. ], batch size: 83, lr: 6.28e-03, grad_scale: 8.0 2024-09-17 22:04:32,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=309500.0, ans=0.2 2024-09-17 22:04:34,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=12.0 2024-09-17 22:04:45,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.57 vs. 
limit=15.0 2024-09-17 22:04:49,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=309540.0, ans=0.0 2024-09-17 22:05:03,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=309580.0, ans=0.125 2024-09-17 22:05:16,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=309620.0, ans=0.0 2024-09-17 22:05:17,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309620.0, ans=0.1 2024-09-17 22:05:19,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=309620.0, ans=0.125 2024-09-17 22:05:46,478 INFO [train.py:1198] (0/2) Epoch 18, batch 500, loss[loss=0.2663, ctc_loss=0.1642, cr_loss=0.4206, attn_decoder_loss=0.2683, over 29440.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1415, cr_loss=0.3834, attn_decoder_loss=0.2526, over 5330261.76 frames. ], batch size: 94, lr: 6.27e-03, grad_scale: 8.0 2024-09-17 22:05:50,320 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. 
limit=6.0 2024-09-17 22:05:52,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=309700.0, ans=0.125 2024-09-17 22:06:05,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=309740.0, ans=0.05 2024-09-17 22:06:12,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=309740.0, ans=0.1 2024-09-17 22:06:27,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=309780.0, ans=0.125 2024-09-17 22:06:36,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=309820.0, ans=0.2 2024-09-17 22:06:45,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.20 vs. limit=15.0 2024-09-17 22:06:50,089 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.497e+01 9.185e+01 1.006e+02 4.777e+02, threshold=1.837e+02, percent-clipped=3.0 2024-09-17 22:06:51,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=309860.0, ans=0.2 2024-09-17 22:07:02,206 INFO [train.py:1198] (0/2) Epoch 18, batch 550, loss[loss=0.2553, ctc_loss=0.1357, cr_loss=0.3699, attn_decoder_loss=0.2604, over 28778.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1417, cr_loss=0.3838, attn_decoder_loss=0.2527, over 5422721.09 frames. ], batch size: 104, lr: 6.27e-03, grad_scale: 8.0 2024-09-17 22:07:04,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.71 vs. 
limit=15.0 2024-09-17 22:07:16,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=309940.0, ans=0.05 2024-09-17 22:07:59,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=310020.0, ans=0.05 2024-09-17 22:08:20,549 INFO [train.py:1198] (0/2) Epoch 18, batch 600, loss[loss=0.2806, ctc_loss=0.1695, cr_loss=0.4491, attn_decoder_loss=0.2829, over 29292.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1423, cr_loss=0.3852, attn_decoder_loss=0.253, over 5509475.60 frames. ], batch size: 100, lr: 6.27e-03, grad_scale: 8.0 2024-09-17 22:08:20,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=310100.0, ans=0.125 2024-09-17 22:08:25,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=310100.0, ans=0.1 2024-09-17 22:08:33,645 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.69 vs. 
limit=15.0 2024-09-17 22:08:37,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=310140.0, ans=0.0 2024-09-17 22:08:48,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=310140.0, ans=0.2 2024-09-17 22:08:51,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=310180.0, ans=0.1 2024-09-17 22:09:12,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=310220.0, ans=0.07 2024-09-17 22:09:21,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=310260.0, ans=0.125 2024-09-17 22:09:25,966 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.500e+01 9.114e+01 9.640e+01 1.427e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-17 22:09:26,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=310260.0, ans=0.2 2024-09-17 22:09:32,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=310260.0, ans=0.0 2024-09-17 22:09:38,126 INFO [train.py:1198] (0/2) Epoch 18, batch 650, loss[loss=0.2423, ctc_loss=0.1367, cr_loss=0.3953, attn_decoder_loss=0.2452, over 29779.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.141, cr_loss=0.3832, attn_decoder_loss=0.2519, over 5585863.90 frames. ], batch size: 81, lr: 6.27e-03, grad_scale: 8.0 2024-09-17 22:09:42,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2024-09-17 22:09:43,560 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.70 vs. 
limit=15.0 2024-09-17 22:10:27,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=310420.0, ans=0.0 2024-09-17 22:10:45,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=310460.0, ans=0.125 2024-09-17 22:10:48,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=310460.0, ans=0.0 2024-09-17 22:10:49,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=310460.0, ans=0.2 2024-09-17 22:10:54,174 INFO [train.py:1198] (0/2) Epoch 18, batch 700, loss[loss=0.2405, ctc_loss=0.1357, cr_loss=0.3839, attn_decoder_loss=0.2436, over 29541.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1418, cr_loss=0.3845, attn_decoder_loss=0.2527, over 5636170.48 frames. ], batch size: 76, lr: 6.27e-03, grad_scale: 8.0 2024-09-17 22:10:57,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=310500.0, ans=0.0 2024-09-17 22:11:12,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=310540.0, ans=0.1 2024-09-17 22:11:16,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=310540.0, ans=0.0 2024-09-17 22:11:16,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=310540.0, ans=0.125 2024-09-17 22:11:51,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=310620.0, ans=0.0 2024-09-17 22:11:54,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=310660.0, ans=0.0 2024-09-17 22:11:57,451 WARNING 
[optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.847e+01 9.240e+01 9.883e+01 4.255e+02, threshold=1.848e+02, percent-clipped=1.0 2024-09-17 22:11:58,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.35 vs. limit=22.5 2024-09-17 22:12:11,912 INFO [train.py:1198] (0/2) Epoch 18, batch 750, loss[loss=0.2523, ctc_loss=0.1353, cr_loss=0.3721, attn_decoder_loss=0.257, over 29712.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1417, cr_loss=0.3844, attn_decoder_loss=0.2526, over 5675070.55 frames. ], batch size: 82, lr: 6.26e-03, grad_scale: 8.0 2024-09-17 22:12:18,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=310700.0, ans=0.0 2024-09-17 22:12:18,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=310700.0, ans=0.125 2024-09-17 22:12:31,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=310740.0, ans=0.0 2024-09-17 22:12:46,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.57 vs. limit=15.0 2024-09-17 22:12:54,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.95 vs. limit=22.5 2024-09-17 22:12:55,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=310780.0, ans=0.2 2024-09-17 22:13:10,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.25 vs. 
limit=15.0 2024-09-17 22:13:23,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=310860.0, ans=0.0 2024-09-17 22:13:25,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=310860.0, ans=0.125 2024-09-17 22:13:29,310 INFO [train.py:1198] (0/2) Epoch 18, batch 800, loss[loss=0.2198, ctc_loss=0.111, cr_loss=0.3272, attn_decoder_loss=0.2246, over 29606.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1411, cr_loss=0.3834, attn_decoder_loss=0.2521, over 5705996.53 frames. ], batch size: 73, lr: 6.26e-03, grad_scale: 16.0 2024-09-17 22:13:37,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=310900.0, ans=0.0 2024-09-17 22:14:10,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=310980.0, ans=0.0 2024-09-17 22:14:12,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=310980.0, ans=0.2 2024-09-17 22:14:28,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=311060.0, ans=0.125 2024-09-17 22:14:34,571 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 8.772e+01 9.230e+01 9.952e+01 3.129e+02, threshold=1.846e+02, percent-clipped=1.0 2024-09-17 22:14:43,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=311100.0, ans=0.125 2024-09-17 22:14:44,935 INFO [train.py:1198] (0/2) Epoch 18, batch 850, loss[loss=0.2584, ctc_loss=0.1429, cr_loss=0.3894, attn_decoder_loss=0.2626, over 29712.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1407, cr_loss=0.3826, attn_decoder_loss=0.2519, over 5734606.06 frames. 
], batch size: 89, lr: 6.26e-03, grad_scale: 8.0 2024-09-17 22:15:06,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=311140.0, ans=0.2 2024-09-17 22:15:19,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=311180.0, ans=0.1 2024-09-17 22:15:29,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=311220.0, ans=0.07 2024-09-17 22:15:41,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=311220.0, ans=0.0 2024-09-17 22:16:03,556 INFO [train.py:1198] (0/2) Epoch 18, batch 900, loss[loss=0.2217, ctc_loss=0.1174, cr_loss=0.3299, attn_decoder_loss=0.226, over 29607.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1414, cr_loss=0.3835, attn_decoder_loss=0.2523, over 5739519.84 frames. ], batch size: 73, lr: 6.26e-03, grad_scale: 8.0 2024-09-17 22:16:24,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=311340.0, ans=0.2 2024-09-17 22:16:27,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=311340.0, ans=0.125 2024-09-17 22:16:29,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=311340.0, ans=0.125 2024-09-17 22:16:39,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=311380.0, ans=0.0 2024-09-17 22:17:10,604 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.926e+01 9.503e+01 1.103e+02 2.746e+02, threshold=1.901e+02, percent-clipped=1.0 2024-09-17 22:17:21,151 INFO [train.py:1198] (0/2) Epoch 18, batch 950, loss[loss=0.2333, ctc_loss=0.1344, cr_loss=0.3847, 
attn_decoder_loss=0.2358, over 29520.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1416, cr_loss=0.3838, attn_decoder_loss=0.2526, over 5742880.60 frames. ], batch size: 74, lr: 6.26e-03, grad_scale: 8.0 2024-09-17 22:17:29,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.08 vs. limit=6.0 2024-09-17 22:17:35,574 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=12.0 2024-09-17 22:18:29,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=311660.0, ans=0.0 2024-09-17 22:18:36,543 INFO [train.py:1198] (0/2) Epoch 18, batch 1000, loss[loss=0.2267, ctc_loss=0.1224, cr_loss=0.355, attn_decoder_loss=0.2304, over 29489.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1423, cr_loss=0.3849, attn_decoder_loss=0.2533, over 5737153.99 frames. ], batch size: 77, lr: 6.25e-03, grad_scale: 8.0 2024-09-17 22:18:37,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.15 vs. limit=10.0 2024-09-17 22:18:38,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=311700.0, ans=0.1 2024-09-17 22:18:49,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.33 vs. 
limit=15.0 2024-09-17 22:19:14,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=311780.0, ans=0.125 2024-09-17 22:19:19,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=311780.0, ans=0.0 2024-09-17 22:19:28,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=311820.0, ans=0.0 2024-09-17 22:19:41,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.748e+01 9.283e+01 1.021e+02 2.281e+02, threshold=1.857e+02, percent-clipped=1.0 2024-09-17 22:19:44,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=311860.0, ans=0.2 2024-09-17 22:19:51,944 INFO [train.py:1198] (0/2) Epoch 18, batch 1050, loss[loss=0.249, ctc_loss=0.1354, cr_loss=0.3841, attn_decoder_loss=0.2531, over 29662.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1416, cr_loss=0.3831, attn_decoder_loss=0.2526, over 5744257.62 frames. ], batch size: 85, lr: 6.25e-03, grad_scale: 8.0 2024-09-17 22:20:28,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=311980.0, ans=10.0 2024-09-17 22:20:28,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=311980.0, ans=0.0 2024-09-17 22:20:28,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=311980.0, ans=0.125 2024-09-17 22:20:35,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.18 vs. 
limit=15.0 2024-09-17 22:20:36,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=311980.0, ans=0.125 2024-09-17 22:20:43,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=312020.0, ans=0.125 2024-09-17 22:20:48,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.93 vs. limit=10.0 2024-09-17 22:21:04,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=312060.0, ans=0.0 2024-09-17 22:21:10,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=312060.0, ans=0.125 2024-09-17 22:21:13,191 INFO [train.py:1198] (0/2) Epoch 18, batch 1100, loss[loss=0.2474, ctc_loss=0.1429, cr_loss=0.3897, attn_decoder_loss=0.2504, over 29458.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1416, cr_loss=0.3834, attn_decoder_loss=0.2524, over 5754970.74 frames. 
], batch size: 78, lr: 6.25e-03, grad_scale: 8.0 2024-09-17 22:21:33,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=312140.0, ans=0.0 2024-09-17 22:21:43,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=312180.0, ans=0.125 2024-09-17 22:22:18,322 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.510e+01 9.386e+01 9.841e+01 2.672e+02, threshold=1.877e+02, percent-clipped=2.0 2024-09-17 22:22:23,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=312260.0, ans=0.125 2024-09-17 22:22:28,966 INFO [train.py:1198] (0/2) Epoch 18, batch 1150, loss[loss=0.2439, ctc_loss=0.1359, cr_loss=0.3803, attn_decoder_loss=0.2474, over 29482.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1415, cr_loss=0.3839, attn_decoder_loss=0.2524, over 5753171.49 frames. ], batch size: 78, lr: 6.25e-03, grad_scale: 8.0 2024-09-17 22:22:38,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=312300.0, ans=0.125 2024-09-17 22:22:59,911 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:23:05,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=312380.0, ans=0.0 2024-09-17 22:23:17,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=312420.0, ans=0.2 2024-09-17 22:23:33,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=312460.0, ans=0.125 2024-09-17 22:23:44,704 INFO [train.py:1198] (0/2) Epoch 18, batch 1200, loss[loss=0.2548, ctc_loss=0.1461, cr_loss=0.4095, 
attn_decoder_loss=0.2577, over 29671.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1422, cr_loss=0.3843, attn_decoder_loss=0.2533, over 5745648.90 frames. ], batch size: 85, lr: 6.25e-03, grad_scale: 16.0 2024-09-17 22:23:50,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=312500.0, ans=0.035 2024-09-17 22:23:57,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=312500.0, ans=0.09899494936611666 2024-09-17 22:24:04,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.23 vs. limit=22.5 2024-09-17 22:24:05,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=312540.0, ans=0.1 2024-09-17 22:24:35,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.14 vs. limit=22.5 2024-09-17 22:24:41,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.16 vs. 
limit=22.5 2024-09-17 22:24:45,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=312620.0, ans=0.125 2024-09-17 22:24:48,704 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:24:54,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=312660.0, ans=0.125 2024-09-17 22:24:55,875 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 9.002e+01 9.543e+01 1.051e+02 1.930e+02, threshold=1.909e+02, percent-clipped=1.0 2024-09-17 22:25:04,860 INFO [train.py:1198] (0/2) Epoch 18, batch 1250, loss[loss=0.2694, ctc_loss=0.1583, cr_loss=0.4159, attn_decoder_loss=0.2725, over 29552.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1425, cr_loss=0.3853, attn_decoder_loss=0.2536, over 5773448.35 frames. ], batch size: 92, lr: 6.24e-03, grad_scale: 8.0 2024-09-17 22:25:06,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=312700.0, ans=0.07 2024-09-17 22:25:07,242 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0 2024-09-17 22:25:07,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.74 vs. 
limit=15.0 2024-09-17 22:25:08,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=312700.0, ans=0.0 2024-09-17 22:25:29,703 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:26:06,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=312860.0, ans=0.2 2024-09-17 22:26:11,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=312860.0, ans=0.95 2024-09-17 22:26:20,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-09-17 22:26:21,429 INFO [train.py:1198] (0/2) Epoch 18, batch 1300, loss[loss=0.2677, ctc_loss=0.1521, cr_loss=0.4118, attn_decoder_loss=0.2714, over 28286.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1418, cr_loss=0.3839, attn_decoder_loss=0.2529, over 5778803.67 frames. ], batch size: 111, lr: 6.24e-03, grad_scale: 8.0 2024-09-17 22:26:21,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=312900.0, ans=0.125 2024-09-17 22:26:26,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=312900.0, ans=0.0 2024-09-17 22:26:27,271 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.95 vs. limit=15.0 2024-09-17 22:26:51,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.95 vs. 
limit=15.0 2024-09-17 22:26:56,132 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2024-09-17 22:26:58,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=312980.0, ans=0.0 2024-09-17 22:27:05,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=313020.0, ans=0.04949747468305833 2024-09-17 22:27:05,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=313020.0, ans=0.125 2024-09-17 22:27:17,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.04 vs. limit=22.5 2024-09-17 22:27:28,525 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.616e+01 9.113e+01 9.632e+01 1.228e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-17 22:27:28,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=313060.0, ans=0.04949747468305833 2024-09-17 22:27:30,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=313060.0, ans=0.05 2024-09-17 22:27:32,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=313060.0, ans=0.125 2024-09-17 22:27:36,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=313100.0, ans=0.07 2024-09-17 22:27:37,658 INFO [train.py:1198] (0/2) Epoch 18, batch 1350, loss[loss=0.254, ctc_loss=0.1503, cr_loss=0.4249, attn_decoder_loss=0.2561, over 29767.00 frames. 
], tot_loss[loss=0.2489, ctc_loss=0.141, cr_loss=0.3829, attn_decoder_loss=0.2523, over 5795203.96 frames. ], batch size: 81, lr: 6.24e-03, grad_scale: 8.0 2024-09-17 22:27:42,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313100.0, ans=0.1 2024-09-17 22:27:51,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=313140.0, ans=0.125 2024-09-17 22:27:53,498 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.34 vs. limit=15.0 2024-09-17 22:28:17,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2024-09-17 22:28:30,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=313220.0, ans=0.2 2024-09-17 22:28:44,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=313260.0, ans=0.0 2024-09-17 22:28:57,963 INFO [train.py:1198] (0/2) Epoch 18, batch 1400, loss[loss=0.2209, ctc_loss=0.1236, cr_loss=0.3415, attn_decoder_loss=0.2242, over 29561.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1408, cr_loss=0.3823, attn_decoder_loss=0.2522, over 5807002.89 frames. ], batch size: 69, lr: 6.24e-03, grad_scale: 8.0 2024-09-17 22:29:33,695 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2024-09-17 22:29:50,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.97 vs. 
limit=15.0 2024-09-17 22:30:00,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=313460.0, ans=0.0 2024-09-17 22:30:04,775 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.593e+01 9.088e+01 9.649e+01 1.870e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-17 22:30:05,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=313460.0, ans=0.2 2024-09-17 22:30:11,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=313460.0, ans=0.0 2024-09-17 22:30:13,983 INFO [train.py:1198] (0/2) Epoch 18, batch 1450, loss[loss=0.2697, ctc_loss=0.1536, cr_loss=0.4106, attn_decoder_loss=0.2735, over 29481.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1417, cr_loss=0.384, attn_decoder_loss=0.2531, over 5803146.41 frames. ], batch size: 94, lr: 6.24e-03, grad_scale: 8.0 2024-09-17 22:30:34,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=313540.0, ans=0.0 2024-09-17 22:30:41,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.63 vs. limit=10.0 2024-09-17 22:30:41,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=313540.0, ans=0.125 2024-09-17 22:31:03,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=313620.0, ans=0.125 2024-09-17 22:31:28,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.42 vs. 
limit=15.0 2024-09-17 22:31:30,413 INFO [train.py:1198] (0/2) Epoch 18, batch 1500, loss[loss=0.2545, ctc_loss=0.1449, cr_loss=0.379, attn_decoder_loss=0.2583, over 29642.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1423, cr_loss=0.3852, attn_decoder_loss=0.2535, over 5804318.85 frames. ], batch size: 86, lr: 6.23e-03, grad_scale: 8.0 2024-09-17 22:31:31,291 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.05 vs. limit=6.0 2024-09-17 22:31:58,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=313740.0, ans=0.125 2024-09-17 22:32:13,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=313780.0, ans=0.2 2024-09-17 22:32:24,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=15.0 2024-09-17 22:32:33,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=313820.0, ans=0.035 2024-09-17 22:32:37,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=313860.0, ans=0.2 2024-09-17 22:32:42,099 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.582e+01 9.289e+01 9.829e+01 2.134e+02, threshold=1.858e+02, percent-clipped=2.0 2024-09-17 22:32:51,081 INFO [train.py:1198] (0/2) Epoch 18, batch 1550, loss[loss=0.2761, ctc_loss=0.1626, cr_loss=0.4474, attn_decoder_loss=0.2787, over 29477.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1422, cr_loss=0.3842, attn_decoder_loss=0.2534, over 5781931.49 frames. 
], batch size: 90, lr: 6.23e-03, grad_scale: 8.0 2024-09-17 22:33:07,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=313940.0, ans=0.0 2024-09-17 22:33:18,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.62 vs. limit=15.0 2024-09-17 22:33:24,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=313980.0, ans=0.0 2024-09-17 22:33:36,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=314020.0, ans=0.125 2024-09-17 22:33:38,213 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:33:41,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.60 vs. limit=15.0 2024-09-17 22:33:51,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314060.0, ans=0.1 2024-09-17 22:33:58,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=314060.0, ans=0.0 2024-09-17 22:33:59,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=314060.0, ans=0.0 2024-09-17 22:34:06,771 INFO [train.py:1198] (0/2) Epoch 18, batch 1600, loss[loss=0.2439, ctc_loss=0.1302, cr_loss=0.3665, attn_decoder_loss=0.2484, over 29665.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1423, cr_loss=0.384, attn_decoder_loss=0.2532, over 5765828.51 frames. 
], batch size: 85, lr: 6.23e-03, grad_scale: 16.0 2024-09-17 22:34:15,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=314100.0, ans=0.125 2024-09-17 22:34:34,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=314140.0, ans=0.0 2024-09-17 22:34:38,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=22.5 2024-09-17 22:34:43,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=314180.0, ans=0.05 2024-09-17 22:34:54,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=314220.0, ans=0.025 2024-09-17 22:35:00,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=314220.0, ans=0.1 2024-09-17 22:35:03,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=314220.0, ans=10.0 2024-09-17 22:35:14,878 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.911e+01 9.926e+01 1.138e+02 4.601e+02, threshold=1.985e+02, percent-clipped=5.0 2024-09-17 22:35:22,532 INFO [train.py:1198] (0/2) Epoch 18, batch 1650, loss[loss=0.2573, ctc_loss=0.1458, cr_loss=0.3958, attn_decoder_loss=0.2609, over 29699.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1425, cr_loss=0.3848, attn_decoder_loss=0.2532, over 5761616.25 frames. 
], batch size: 89, lr: 6.23e-03, grad_scale: 8.0 2024-09-17 22:35:47,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=314340.0, ans=0.125 2024-09-17 22:36:14,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=314420.0, ans=0.0 2024-09-17 22:36:24,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=314460.0, ans=0.125 2024-09-17 22:36:29,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=314460.0, ans=0.125 2024-09-17 22:36:29,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=314460.0, ans=0.0 2024-09-17 22:36:30,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=314460.0, ans=0.0 2024-09-17 22:36:40,864 INFO [train.py:1198] (0/2) Epoch 18, batch 1700, loss[loss=0.2249, ctc_loss=0.1213, cr_loss=0.3342, attn_decoder_loss=0.229, over 29608.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1418, cr_loss=0.3835, attn_decoder_loss=0.253, over 5781514.64 frames. ], batch size: 69, lr: 6.23e-03, grad_scale: 8.0 2024-09-17 22:36:45,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2024-09-17 22:36:54,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-09-17 22:36:54,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. 
limit=15.0 2024-09-17 22:36:56,515 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:37:04,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314540.0, ans=0.1 2024-09-17 22:37:04,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=314540.0, ans=0.07 2024-09-17 22:37:14,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=314580.0, ans=0.125 2024-09-17 22:37:20,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=314580.0, ans=0.0 2024-09-17 22:37:29,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=314620.0, ans=0.025 2024-09-17 22:37:35,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=314620.0, ans=0.0 2024-09-17 22:37:35,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=314620.0, ans=0.04949747468305833 2024-09-17 22:37:49,120 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.791e+01 8.699e+01 9.289e+01 9.898e+01 1.574e+02, threshold=1.858e+02, percent-clipped=0.0 2024-09-17 22:37:56,803 INFO [train.py:1198] (0/2) Epoch 18, batch 1750, loss[loss=0.2259, ctc_loss=0.1291, cr_loss=0.3617, attn_decoder_loss=0.2286, over 29304.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1416, cr_loss=0.3836, attn_decoder_loss=0.2526, over 5790128.61 frames. ], batch size: 67, lr: 6.23e-03, grad_scale: 8.0 2024-09-17 22:38:00,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.59 vs. 
limit=22.5 2024-09-17 22:38:03,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=314700.0, ans=0.125 2024-09-17 22:38:23,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.69 vs. limit=10.0 2024-09-17 22:38:30,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=314780.0, ans=0.125 2024-09-17 22:38:30,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=314780.0, ans=0.07 2024-09-17 22:38:35,488 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.39 vs. limit=15.0 2024-09-17 22:38:37,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.89 vs. limit=15.0 2024-09-17 22:38:42,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=314820.0, ans=0.1 2024-09-17 22:38:47,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=314820.0, ans=0.2 2024-09-17 22:39:12,153 INFO [train.py:1198] (0/2) Epoch 18, batch 1800, loss[loss=0.2589, ctc_loss=0.1414, cr_loss=0.3807, attn_decoder_loss=0.2635, over 29696.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1416, cr_loss=0.3842, attn_decoder_loss=0.2529, over 5793199.21 frames. 
], batch size: 83, lr: 6.22e-03, grad_scale: 8.0 2024-09-17 22:40:18,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=315060.0, ans=10.0 2024-09-17 22:40:24,496 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.622e+01 9.178e+01 9.904e+01 1.304e+02, threshold=1.836e+02, percent-clipped=0.0 2024-09-17 22:40:32,253 INFO [train.py:1198] (0/2) Epoch 18, batch 1850, loss[loss=0.2597, ctc_loss=0.1455, cr_loss=0.3867, attn_decoder_loss=0.2638, over 29614.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1412, cr_loss=0.384, attn_decoder_loss=0.2526, over 5795971.06 frames. ], batch size: 86, lr: 6.22e-03, grad_scale: 8.0 2024-09-17 22:40:35,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=315100.0, ans=0.09899494936611666 2024-09-17 22:40:40,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.36 vs. limit=6.0 2024-09-17 22:40:41,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=315100.0, ans=0.5 2024-09-17 22:40:57,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.36 vs. limit=15.0 2024-09-17 22:41:02,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=315180.0, ans=0.125 2024-09-17 22:41:13,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=315180.0, ans=0.07 2024-09-17 22:41:27,962 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.96 vs. 
limit=15.0 2024-09-17 22:41:33,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=315260.0, ans=0.125 2024-09-17 22:41:41,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=315260.0, ans=0.125 2024-09-17 22:41:47,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=315300.0, ans=0.0 2024-09-17 22:41:48,286 INFO [train.py:1198] (0/2) Epoch 18, batch 1900, loss[loss=0.2566, ctc_loss=0.1435, cr_loss=0.4012, attn_decoder_loss=0.2603, over 29733.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1419, cr_loss=0.3854, attn_decoder_loss=0.2533, over 5804126.64 frames. ], batch size: 89, lr: 6.22e-03, grad_scale: 8.0 2024-09-17 22:41:54,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=315300.0, ans=0.125 2024-09-17 22:42:10,457 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.60 vs. limit=15.0 2024-09-17 22:42:17,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=315380.0, ans=0.1 2024-09-17 22:42:22,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. 
limit=15.0 2024-09-17 22:42:44,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=315420.0, ans=0.0 2024-09-17 22:42:55,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=315460.0, ans=0.0 2024-09-17 22:42:56,527 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.839e+01 9.337e+01 9.876e+01 1.224e+02, threshold=1.867e+02, percent-clipped=0.0 2024-09-17 22:43:04,092 INFO [train.py:1198] (0/2) Epoch 18, batch 1950, loss[loss=0.242, ctc_loss=0.1359, cr_loss=0.3761, attn_decoder_loss=0.2454, over 29436.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1427, cr_loss=0.3867, attn_decoder_loss=0.2542, over 5818627.54 frames. ], batch size: 78, lr: 6.22e-03, grad_scale: 8.0 2024-09-17 22:43:13,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=315500.0, ans=0.1 2024-09-17 22:43:13,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=315500.0, ans=0.125 2024-09-17 22:43:20,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=315540.0, ans=0.0 2024-09-17 22:43:56,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=315620.0, ans=0.125 2024-09-17 22:44:06,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=315660.0, ans=0.025 2024-09-17 22:44:23,109 INFO [train.py:1198] (0/2) Epoch 18, batch 2000, loss[loss=0.2141, ctc_loss=0.1164, cr_loss=0.3357, attn_decoder_loss=0.2175, over 29339.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1428, cr_loss=0.3866, attn_decoder_loss=0.2543, over 5795590.60 frames. 
], batch size: 67, lr: 6.22e-03, grad_scale: 16.0 2024-09-17 22:44:23,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=315700.0, ans=0.125 2024-09-17 22:44:46,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=315740.0, ans=0.125 2024-09-17 22:44:52,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=315780.0, ans=0.1 2024-09-17 22:45:00,197 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:45:30,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=315860.0, ans=0.025 2024-09-17 22:45:31,344 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.333e+01 8.725e+01 9.284e+01 1.004e+02 4.329e+02, threshold=1.857e+02, percent-clipped=1.0 2024-09-17 22:45:38,901 INFO [train.py:1198] (0/2) Epoch 18, batch 2050, loss[loss=0.229, ctc_loss=0.1313, cr_loss=0.369, attn_decoder_loss=0.2317, over 29442.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1421, cr_loss=0.3849, attn_decoder_loss=0.2533, over 5788479.11 frames. 
], batch size: 70, lr: 6.21e-03, grad_scale: 8.0 2024-09-17 22:45:40,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=315900.0, ans=0.125 2024-09-17 22:45:44,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=315900.0, ans=0.0 2024-09-17 22:46:02,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=315940.0, ans=0.1 2024-09-17 22:46:32,934 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=15.0 2024-09-17 22:46:53,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=316100.0, ans=0.125 2024-09-17 22:46:54,761 INFO [train.py:1198] (0/2) Epoch 18, batch 2100, loss[loss=0.2576, ctc_loss=0.1393, cr_loss=0.4039, attn_decoder_loss=0.2618, over 29755.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1417, cr_loss=0.3842, attn_decoder_loss=0.2529, over 5800257.60 frames. ], batch size: 81, lr: 6.21e-03, grad_scale: 8.0 2024-09-17 22:48:02,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=316260.0, ans=0.125 2024-09-17 22:48:05,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=316260.0, ans=0.04949747468305833 2024-09-17 22:48:08,069 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.905e+01 8.595e+01 9.065e+01 9.620e+01 1.690e+02, threshold=1.813e+02, percent-clipped=0.0 2024-09-17 22:48:14,164 INFO [train.py:1198] (0/2) Epoch 18, batch 2150, loss[loss=0.2438, ctc_loss=0.141, cr_loss=0.38, attn_decoder_loss=0.2468, over 29455.00 frames. 
], tot_loss[loss=0.2486, ctc_loss=0.1409, cr_loss=0.3828, attn_decoder_loss=0.252, over 5815119.35 frames. ], batch size: 78, lr: 6.21e-03, grad_scale: 8.0 2024-09-17 22:48:19,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.17 vs. limit=15.0 2024-09-17 22:48:57,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.12 vs. limit=15.0 2024-09-17 22:49:13,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=316460.0, ans=0.1 2024-09-17 22:49:25,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=316460.0, ans=0.125 2024-09-17 22:49:25,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=316460.0, ans=0.125 2024-09-17 22:49:29,890 INFO [train.py:1198] (0/2) Epoch 18, batch 2200, loss[loss=0.256, ctc_loss=0.1436, cr_loss=0.3777, attn_decoder_loss=0.26, over 29601.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1408, cr_loss=0.3827, attn_decoder_loss=0.2521, over 5811238.50 frames. 
], batch size: 86, lr: 6.21e-03, grad_scale: 8.0 2024-09-17 22:49:30,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=316500.0, ans=0.0 2024-09-17 22:49:37,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=316500.0, ans=0.125 2024-09-17 22:49:39,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=316500.0, ans=0.2 2024-09-17 22:49:51,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=316540.0, ans=0.125 2024-09-17 22:50:07,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.46 vs. limit=15.0 2024-09-17 22:50:08,016 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:50:30,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=316660.0, ans=0.125 2024-09-17 22:50:39,196 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.790e+01 9.365e+01 1.003e+02 3.289e+02, threshold=1.873e+02, percent-clipped=2.0 2024-09-17 22:50:41,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=316660.0, ans=0.125 2024-09-17 22:50:45,434 INFO [train.py:1198] (0/2) Epoch 18, batch 2250, loss[loss=0.2474, ctc_loss=0.1311, cr_loss=0.3582, attn_decoder_loss=0.2523, over 29696.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1402, cr_loss=0.3818, attn_decoder_loss=0.2517, over 5810528.58 frames. 
], batch size: 82, lr: 6.21e-03, grad_scale: 8.0 2024-09-17 22:50:50,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=316700.0, ans=0.125 2024-09-17 22:50:50,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316700.0, ans=0.1 2024-09-17 22:51:14,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=316780.0, ans=0.125 2024-09-17 22:51:18,096 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2024-09-17 22:51:33,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=316820.0, ans=0.125 2024-09-17 22:51:37,283 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.55 vs. limit=10.0 2024-09-17 22:51:47,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=316820.0, ans=0.0 2024-09-17 22:51:55,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=316860.0, ans=0.1 2024-09-17 22:52:05,592 INFO [train.py:1198] (0/2) Epoch 18, batch 2300, loss[loss=0.2228, ctc_loss=0.1163, cr_loss=0.3352, attn_decoder_loss=0.2271, over 29301.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1399, cr_loss=0.3811, attn_decoder_loss=0.251, over 5797674.93 frames. 
], batch size: 71, lr: 6.20e-03, grad_scale: 8.0 2024-09-17 22:52:07,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=316900.0, ans=0.1 2024-09-17 22:52:20,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316940.0, ans=0.1 2024-09-17 22:52:20,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=316940.0, ans=0.1 2024-09-17 22:52:29,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0 2024-09-17 22:52:32,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=316940.0, ans=0.0 2024-09-17 22:52:57,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=317020.0, ans=0.025 2024-09-17 22:52:57,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=317020.0, ans=0.125 2024-09-17 22:53:10,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.78 vs. limit=10.0 2024-09-17 22:53:15,288 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.054e+01 8.549e+01 8.919e+01 9.975e+01 1.965e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-17 22:53:21,365 INFO [train.py:1198] (0/2) Epoch 18, batch 2350, loss[loss=0.2605, ctc_loss=0.1498, cr_loss=0.4024, attn_decoder_loss=0.2638, over 29687.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.14, cr_loss=0.3816, attn_decoder_loss=0.2512, over 5802612.47 frames. 
], batch size: 83, lr: 6.20e-03, grad_scale: 8.0 2024-09-17 22:53:30,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317100.0, ans=0.1 2024-09-17 22:53:56,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=317180.0, ans=0.125 2024-09-17 22:54:08,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=317220.0, ans=0.0 2024-09-17 22:54:11,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=317220.0, ans=0.125 2024-09-17 22:54:14,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=317220.0, ans=0.125 2024-09-17 22:54:15,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=317220.0, ans=0.2 2024-09-17 22:54:31,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=317260.0, ans=0.125 2024-09-17 22:54:35,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0 2024-09-17 22:54:37,095 INFO [train.py:1198] (0/2) Epoch 18, batch 2400, loss[loss=0.2373, ctc_loss=0.1226, cr_loss=0.3516, attn_decoder_loss=0.2422, over 29543.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1406, cr_loss=0.3826, attn_decoder_loss=0.2517, over 5805544.25 frames. 
], batch size: 76, lr: 6.20e-03, grad_scale: 16.0 2024-09-17 22:55:07,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=317380.0, ans=0.125 2024-09-17 22:55:32,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=317420.0, ans=0.1 2024-09-17 22:55:35,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=317420.0, ans=0.5 2024-09-17 22:55:52,118 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=12.0 2024-09-17 22:55:52,493 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.608e+01 9.226e+01 9.922e+01 2.120e+02, threshold=1.845e+02, percent-clipped=2.0 2024-09-17 22:55:57,153 INFO [train.py:1198] (0/2) Epoch 18, batch 2450, loss[loss=0.2535, ctc_loss=0.1465, cr_loss=0.394, attn_decoder_loss=0.2566, over 29698.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1416, cr_loss=0.3844, attn_decoder_loss=0.253, over 5782246.46 frames. ], batch size: 82, lr: 6.20e-03, grad_scale: 8.0 2024-09-17 22:57:13,353 INFO [train.py:1198] (0/2) Epoch 18, batch 2500, loss[loss=0.2613, ctc_loss=0.1472, cr_loss=0.3931, attn_decoder_loss=0.2652, over 29649.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.142, cr_loss=0.3851, attn_decoder_loss=0.2532, over 5793396.81 frames. ], batch size: 86, lr: 6.20e-03, grad_scale: 8.0 2024-09-17 22:57:27,677 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.78 vs. 
limit=15.0 2024-09-17 22:57:28,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=317740.0, ans=0.125 2024-09-17 22:57:53,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=317780.0, ans=0.0 2024-09-17 22:57:53,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=317780.0, ans=0.125 2024-09-17 22:58:09,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=317820.0, ans=0.0 2024-09-17 22:58:24,508 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.741e+01 9.201e+01 9.902e+01 1.726e+02, threshold=1.840e+02, percent-clipped=0.0 2024-09-17 22:58:29,152 INFO [train.py:1198] (0/2) Epoch 18, batch 2550, loss[loss=0.2211, ctc_loss=0.1183, cr_loss=0.3389, attn_decoder_loss=0.225, over 29304.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1423, cr_loss=0.3855, attn_decoder_loss=0.2534, over 5796707.71 frames. ], batch size: 67, lr: 6.19e-03, grad_scale: 8.0 2024-09-17 22:58:32,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=317900.0, ans=0.0 2024-09-17 22:58:41,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=317900.0, ans=0.125 2024-09-17 22:58:50,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=317940.0, ans=0.0 2024-09-17 22:58:50,876 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.92 vs. 
limit=15.0 2024-09-17 22:59:07,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=317980.0, ans=0.0 2024-09-17 22:59:43,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=318060.0, ans=0.0 2024-09-17 22:59:45,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=318060.0, ans=0.125 2024-09-17 22:59:46,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=318060.0, ans=0.1 2024-09-17 22:59:49,439 INFO [train.py:1198] (0/2) Epoch 18, batch 2600, loss[loss=0.2475, ctc_loss=0.1399, cr_loss=0.3811, attn_decoder_loss=0.251, over 29442.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1424, cr_loss=0.386, attn_decoder_loss=0.2537, over 5793517.16 frames. ], batch size: 78, lr: 6.19e-03, grad_scale: 8.0 2024-09-17 23:00:05,382 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.24 vs. limit=12.0 2024-09-17 23:00:12,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=318140.0, ans=0.125 2024-09-17 23:00:32,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=318180.0, ans=0.0 2024-09-17 23:01:00,211 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.599e+01 9.133e+01 9.930e+01 1.773e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-17 23:01:04,597 INFO [train.py:1198] (0/2) Epoch 18, batch 2650, loss[loss=0.2684, ctc_loss=0.1561, cr_loss=0.4335, attn_decoder_loss=0.2712, over 29178.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1417, cr_loss=0.3852, attn_decoder_loss=0.2536, over 5801044.93 frames. 
], batch size: 100, lr: 6.19e-03, grad_scale: 8.0 2024-09-17 23:01:09,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=318300.0, ans=0.0 2024-09-17 23:01:09,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=318300.0, ans=0.125 2024-09-17 23:01:33,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=318380.0, ans=0.125 2024-09-17 23:02:00,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=318420.0, ans=0.125 2024-09-17 23:02:10,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. limit=6.0 2024-09-17 23:02:20,279 INFO [train.py:1198] (0/2) Epoch 18, batch 2700, loss[loss=0.2578, ctc_loss=0.1462, cr_loss=0.3924, attn_decoder_loss=0.2615, over 29527.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1421, cr_loss=0.3863, attn_decoder_loss=0.2542, over 5796833.38 frames. ], batch size: 87, lr: 6.19e-03, grad_scale: 8.0 2024-09-17 23:02:24,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.76 vs. 
limit=15.0 2024-09-17 23:02:41,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=318540.0, ans=0.125 2024-09-17 23:03:11,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=318620.0, ans=0.125 2024-09-17 23:03:27,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=318660.0, ans=0.1 2024-09-17 23:03:28,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=318660.0, ans=0.015 2024-09-17 23:03:36,478 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 8.599e+01 9.139e+01 9.802e+01 1.659e+02, threshold=1.828e+02, percent-clipped=0.0 2024-09-17 23:03:41,088 INFO [train.py:1198] (0/2) Epoch 18, batch 2750, loss[loss=0.2417, ctc_loss=0.1344, cr_loss=0.3741, attn_decoder_loss=0.2453, over 29505.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1411, cr_loss=0.3841, attn_decoder_loss=0.2529, over 5795395.17 frames. 
], batch size: 75, lr: 6.19e-03, grad_scale: 8.0 2024-09-17 23:03:51,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=318700.0, ans=0.1 2024-09-17 23:03:57,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=318740.0, ans=0.0 2024-09-17 23:04:02,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=318740.0, ans=0.1 2024-09-17 23:04:21,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=318780.0, ans=0.1 2024-09-17 23:04:29,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=318820.0, ans=0.125 2024-09-17 23:04:41,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=318860.0, ans=0.0 2024-09-17 23:04:57,081 INFO [train.py:1198] (0/2) Epoch 18, batch 2800, loss[loss=0.2715, ctc_loss=0.1814, cr_loss=0.3997, attn_decoder_loss=0.2727, over 20263.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1415, cr_loss=0.3843, attn_decoder_loss=0.253, over 5775974.89 frames. ], batch size: 210, lr: 6.18e-03, grad_scale: 16.0 2024-09-17 23:05:08,905 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.83 vs. 
limit=10.0 2024-09-17 23:05:11,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=318940.0, ans=0.2 2024-09-17 23:05:23,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=318940.0, ans=0.0 2024-09-17 23:05:31,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=318980.0, ans=0.1 2024-09-17 23:05:38,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=318980.0, ans=0.125 2024-09-17 23:06:10,006 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.902e+01 8.963e+01 9.729e+01 1.060e+02 3.606e+02, threshold=1.946e+02, percent-clipped=3.0 2024-09-17 23:06:13,100 INFO [train.py:1198] (0/2) Epoch 18, batch 2850, loss[loss=0.2443, ctc_loss=0.1429, cr_loss=0.4056, attn_decoder_loss=0.2465, over 29515.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.142, cr_loss=0.3852, attn_decoder_loss=0.2534, over 5761703.10 frames. ], batch size: 77, lr: 6.18e-03, grad_scale: 8.0 2024-09-17 23:06:17,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0 2024-09-17 23:06:20,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.45 vs. limit=15.0 2024-09-17 23:06:26,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.40 vs. 
limit=15.0 2024-09-17 23:06:31,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=319140.0, ans=0.125 2024-09-17 23:06:46,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=319180.0, ans=0.125 2024-09-17 23:07:12,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.45 vs. limit=12.0 2024-09-17 23:07:26,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=12.0 2024-09-17 23:07:30,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=319260.0, ans=0.0 2024-09-17 23:07:33,503 INFO [train.py:1198] (0/2) Epoch 18, batch 2900, loss[loss=0.2375, ctc_loss=0.132, cr_loss=0.3795, attn_decoder_loss=0.2408, over 29406.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1428, cr_loss=0.3871, attn_decoder_loss=0.2545, over 5787509.68 frames. ], batch size: 79, lr: 6.18e-03, grad_scale: 8.0 2024-09-17 23:07:42,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=319300.0, ans=0.2 2024-09-17 23:07:46,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.35 vs. 
limit=22.5 2024-09-17 23:07:56,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=319340.0, ans=0.0 2024-09-17 23:08:02,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=319380.0, ans=10.0 2024-09-17 23:08:15,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=319380.0, ans=0.125 2024-09-17 23:08:46,531 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.577e+01 9.174e+01 9.696e+01 1.530e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-17 23:08:49,582 INFO [train.py:1198] (0/2) Epoch 18, batch 2950, loss[loss=0.2482, ctc_loss=0.1463, cr_loss=0.403, attn_decoder_loss=0.2506, over 29559.00 frames. ], tot_loss[loss=0.2499, ctc_loss=0.1419, cr_loss=0.385, attn_decoder_loss=0.2534, over 5780262.89 frames. ], batch size: 75, lr: 6.18e-03, grad_scale: 8.0 2024-09-17 23:09:02,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.97 vs. limit=22.5 2024-09-17 23:09:16,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2024-09-17 23:09:18,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=319580.0, ans=0.2 2024-09-17 23:09:36,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319620.0, ans=0.1 2024-09-17 23:10:05,408 INFO [train.py:1198] (0/2) Epoch 18, batch 3000, loss[loss=0.2441, ctc_loss=0.1337, cr_loss=0.3685, attn_decoder_loss=0.2481, over 29748.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1417, cr_loss=0.3838, attn_decoder_loss=0.2532, over 5781857.33 frames. 
], batch size: 81, lr: 6.18e-03, grad_scale: 8.0 2024-09-17 23:10:05,408 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 23:10:24,020 INFO [train.py:1230] (0/2) Epoch 18, validation: loss=0.211, ctc_loss=0.04071, cr_loss=4.994e-15, attn_decoder_loss=0.23, over 944034.00 frames. 2024-09-17 23:10:24,020 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 23:10:40,377 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:10:47,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=319740.0, ans=0.025 2024-09-17 23:10:57,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=319780.0, ans=0.125 2024-09-17 23:11:01,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.00 vs. limit=15.0 2024-09-17 23:11:32,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=319860.0, ans=0.0 2024-09-17 23:11:34,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=319860.0, ans=0.125 2024-09-17 23:11:41,318 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 9.075e+01 9.591e+01 1.002e+02 5.340e+02, threshold=1.918e+02, percent-clipped=4.0 2024-09-17 23:11:44,494 INFO [train.py:1198] (0/2) Epoch 18, batch 3050, loss[loss=0.2467, ctc_loss=0.1398, cr_loss=0.364, attn_decoder_loss=0.2505, over 29541.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1427, cr_loss=0.386, attn_decoder_loss=0.2541, over 5776783.09 frames. 
], batch size: 76, lr: 6.17e-03, grad_scale: 8.0 2024-09-17 23:12:21,286 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-80000.pt 2024-09-17 23:12:33,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=319980.0, ans=0.125 2024-09-17 23:12:36,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=320020.0, ans=0.125 2024-09-17 23:12:54,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=320060.0, ans=0.09899494936611666 2024-09-17 23:12:55,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320060.0, ans=0.1 2024-09-17 23:13:07,356 INFO [train.py:1198] (0/2) Epoch 18, batch 3100, loss[loss=0.2707, ctc_loss=0.1582, cr_loss=0.4045, attn_decoder_loss=0.2742, over 29226.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1422, cr_loss=0.3848, attn_decoder_loss=0.2535, over 5777066.19 frames. 
], batch size: 100, lr: 6.17e-03, grad_scale: 8.0 2024-09-17 23:13:07,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=320100.0, ans=0.0 2024-09-17 23:13:30,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=320140.0, ans=0.125 2024-09-17 23:13:54,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=320220.0, ans=0.025 2024-09-17 23:13:57,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=320220.0, ans=0.125 2024-09-17 23:13:58,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=320220.0, ans=0.125 2024-09-17 23:14:12,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=320260.0, ans=0.125 2024-09-17 23:14:13,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.17 vs. limit=15.0 2024-09-17 23:14:21,536 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.202e+01 8.516e+01 8.988e+01 9.440e+01 2.574e+02, threshold=1.798e+02, percent-clipped=3.0 2024-09-17 23:14:23,080 INFO [train.py:1198] (0/2) Epoch 18, batch 3150, loss[loss=0.2573, ctc_loss=0.1412, cr_loss=0.3744, attn_decoder_loss=0.2619, over 28743.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1421, cr_loss=0.3846, attn_decoder_loss=0.2535, over 5783800.35 frames. ], batch size: 104, lr: 6.17e-03, grad_scale: 8.0 2024-09-17 23:14:33,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.47 vs. 
limit=15.0 2024-09-17 23:14:48,255 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:14:48,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=320340.0, ans=0.0 2024-09-17 23:14:54,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=320380.0, ans=0.125 2024-09-17 23:15:03,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=320380.0, ans=0.0 2024-09-17 23:15:03,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=320380.0, ans=0.025 2024-09-17 23:15:06,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.25 vs. limit=15.0 2024-09-17 23:15:15,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. limit=10.0 2024-09-17 23:15:32,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=320460.0, ans=0.2 2024-09-17 23:15:38,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=320460.0, ans=0.0 2024-09-17 23:15:43,028 INFO [train.py:1198] (0/2) Epoch 18, batch 3200, loss[loss=0.2376, ctc_loss=0.1346, cr_loss=0.3703, attn_decoder_loss=0.2408, over 29425.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1421, cr_loss=0.3851, attn_decoder_loss=0.2534, over 5793477.74 frames. 
], batch size: 79, lr: 6.17e-03, grad_scale: 16.0 2024-09-17 23:15:49,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320500.0, ans=0.1 2024-09-17 23:16:18,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=320580.0, ans=0.125 2024-09-17 23:16:19,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=320580.0, ans=0.125 2024-09-17 23:16:21,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=320580.0, ans=0.0 2024-09-17 23:16:25,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320580.0, ans=0.1 2024-09-17 23:16:49,125 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=12.0 2024-09-17 23:16:58,812 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 8.773e+01 9.216e+01 9.587e+01 1.476e+02, threshold=1.843e+02, percent-clipped=0.0 2024-09-17 23:16:58,833 INFO [train.py:1198] (0/2) Epoch 18, batch 3250, loss[loss=0.2607, ctc_loss=0.1453, cr_loss=0.3989, attn_decoder_loss=0.2647, over 29721.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1419, cr_loss=0.3853, attn_decoder_loss=0.2535, over 5800785.54 frames. 
], batch size: 84, lr: 6.17e-03, grad_scale: 8.0 2024-09-17 23:17:05,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=320700.0, ans=0.1 2024-09-17 23:17:24,920 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:17:44,904 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:17:54,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=320820.0, ans=15.0 2024-09-17 23:18:12,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=320860.0, ans=0.125 2024-09-17 23:18:12,222 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:18:14,952 INFO [train.py:1198] (0/2) Epoch 18, batch 3300, loss[loss=0.2605, ctc_loss=0.1482, cr_loss=0.4051, attn_decoder_loss=0.264, over 28325.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1408, cr_loss=0.3836, attn_decoder_loss=0.2523, over 5797946.48 frames. ], batch size: 111, lr: 6.17e-03, grad_scale: 8.0 2024-09-17 23:18:20,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=320900.0, ans=0.1 2024-09-17 23:18:32,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=320940.0, ans=0.125 2024-09-17 23:18:35,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=320940.0, ans=0.125 2024-09-17 23:18:49,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.93 vs. 
limit=15.0 2024-09-17 23:18:57,091 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:19:01,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.83 vs. limit=15.0 2024-09-17 23:19:02,197 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=12.0 2024-09-17 23:19:14,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=321020.0, ans=0.125 2024-09-17 23:19:29,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=321060.0, ans=0.125 2024-09-17 23:19:35,431 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 8.646e+01 9.242e+01 9.946e+01 1.965e+02, threshold=1.848e+02, percent-clipped=1.0 2024-09-17 23:19:35,453 INFO [train.py:1198] (0/2) Epoch 18, batch 3350, loss[loss=0.2647, ctc_loss=0.1498, cr_loss=0.4029, attn_decoder_loss=0.2685, over 28755.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1416, cr_loss=0.3845, attn_decoder_loss=0.2531, over 5774386.44 frames. ], batch size: 104, lr: 6.16e-03, grad_scale: 8.0 2024-09-17 23:20:12,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=321180.0, ans=0.125 2024-09-17 23:20:21,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=321220.0, ans=0.125 2024-09-17 23:20:51,642 INFO [train.py:1198] (0/2) Epoch 18, batch 3400, loss[loss=0.2141, ctc_loss=0.1124, cr_loss=0.3266, attn_decoder_loss=0.2182, over 29318.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1413, cr_loss=0.3831, attn_decoder_loss=0.2525, over 5766115.50 frames. 
], batch size: 67, lr: 6.16e-03, grad_scale: 8.0 2024-09-17 23:21:03,153 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.95 vs. limit=22.5 2024-09-17 23:21:38,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=321420.0, ans=0.025 2024-09-17 23:22:05,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=321460.0, ans=0.0 2024-09-17 23:22:09,825 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.690e+01 9.213e+01 9.933e+01 2.095e+02, threshold=1.843e+02, percent-clipped=1.0 2024-09-17 23:22:09,848 INFO [train.py:1198] (0/2) Epoch 18, batch 3450, loss[loss=0.2632, ctc_loss=0.1526, cr_loss=0.3901, attn_decoder_loss=0.2668, over 28138.00 frames. ], tot_loss[loss=0.2494, ctc_loss=0.1412, cr_loss=0.3834, attn_decoder_loss=0.2529, over 5773617.76 frames. ], batch size: 111, lr: 6.16e-03, grad_scale: 8.0 2024-09-17 23:22:23,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=321540.0, ans=0.0 2024-09-17 23:22:27,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2024-09-17 23:22:52,022 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.57 vs. limit=22.5 2024-09-17 23:23:16,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=321660.0, ans=0.025 2024-09-17 23:23:28,091 INFO [train.py:1198] (0/2) Epoch 18, batch 3500, loss[loss=0.2173, ctc_loss=0.1216, cr_loss=0.3567, attn_decoder_loss=0.22, over 29310.00 frames. 
], tot_loss[loss=0.2487, ctc_loss=0.1406, cr_loss=0.3826, attn_decoder_loss=0.2522, over 5775591.99 frames. ], batch size: 71, lr: 6.16e-03, grad_scale: 8.0 2024-09-17 23:23:34,867 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.03 vs. limit=22.5 2024-09-17 23:23:45,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=321740.0, ans=0.0 2024-09-17 23:23:49,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=321740.0, ans=0.2 2024-09-17 23:23:58,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=321780.0, ans=0.0 2024-09-17 23:23:59,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=321780.0, ans=0.1 2024-09-17 23:24:01,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=321780.0, ans=0.125 2024-09-17 23:24:04,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2024-09-17 23:24:34,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=321860.0, ans=0.1 2024-09-17 23:24:42,808 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.524e+01 9.219e+01 9.966e+01 2.449e+02, threshold=1.844e+02, percent-clipped=1.0 2024-09-17 23:24:42,830 INFO [train.py:1198] (0/2) Epoch 18, batch 3550, loss[loss=0.262, ctc_loss=0.143, cr_loss=0.3928, attn_decoder_loss=0.2665, over 29704.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1406, cr_loss=0.3821, attn_decoder_loss=0.2522, over 5782377.76 frames. 
], batch size: 89, lr: 6.16e-03, grad_scale: 8.0 2024-09-17 23:24:43,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=321900.0, ans=0.0 2024-09-17 23:24:44,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=321900.0, ans=0.0 2024-09-17 23:24:53,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=321900.0, ans=0.125 2024-09-17 23:24:58,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.83 vs. limit=15.0 2024-09-17 23:25:20,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=321980.0, ans=0.0 2024-09-17 23:25:41,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=322060.0, ans=0.125 2024-09-17 23:25:41,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=322060.0, ans=0.0 2024-09-17 23:25:52,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=322060.0, ans=0.015 2024-09-17 23:25:57,195 INFO [train.py:1198] (0/2) Epoch 18, batch 3600, loss[loss=0.2454, ctc_loss=0.1371, cr_loss=0.3873, attn_decoder_loss=0.2488, over 29494.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1404, cr_loss=0.3819, attn_decoder_loss=0.2525, over 5792002.85 frames. 
], batch size: 77, lr: 6.15e-03, grad_scale: 16.0 2024-09-17 23:26:06,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=322100.0, ans=0.0 2024-09-17 23:26:13,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=322140.0, ans=0.125 2024-09-17 23:26:15,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=322140.0, ans=0.125 2024-09-17 23:26:18,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=322140.0, ans=0.2 2024-09-17 23:26:36,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322180.0, ans=0.1 2024-09-17 23:27:04,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=322260.0, ans=0.1 2024-09-17 23:27:11,828 INFO [train.py:1198] (0/2) Epoch 18, batch 3650, loss[loss=0.2789, ctc_loss=0.1764, cr_loss=0.4497, attn_decoder_loss=0.2803, over 29515.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1401, cr_loss=0.3811, attn_decoder_loss=0.2521, over 5792698.77 frames. ], batch size: 90, lr: 6.15e-03, grad_scale: 8.0 2024-09-17 23:27:13,216 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.529e+01 9.051e+01 9.513e+01 1.639e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-17 23:27:34,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.51 vs. limit=15.0 2024-09-17 23:27:35,342 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. 
limit=15.0 2024-09-17 23:27:57,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=322420.0, ans=0.1 2024-09-17 23:27:59,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=322420.0, ans=0.125 2024-09-17 23:28:13,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=322460.0, ans=0.0 2024-09-17 23:28:13,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=322460.0, ans=0.0 2024-09-17 23:28:25,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2024-09-17 23:28:29,089 INFO [train.py:1198] (0/2) Epoch 18, batch 3700, loss[loss=0.2599, ctc_loss=0.1543, cr_loss=0.4198, attn_decoder_loss=0.2623, over 29705.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1404, cr_loss=0.3822, attn_decoder_loss=0.2524, over 5803408.95 frames. ], batch size: 84, lr: 6.15e-03, grad_scale: 8.0 2024-09-17 23:28:44,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=322540.0, ans=0.125 2024-09-17 23:28:51,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=322540.0, ans=0.1 2024-09-17 23:28:56,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=322540.0, ans=0.125 2024-09-17 23:29:45,331 INFO [train.py:1198] (0/2) Epoch 18, batch 3750, loss[loss=0.2238, ctc_loss=0.1296, cr_loss=0.3542, attn_decoder_loss=0.2264, over 29348.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1403, cr_loss=0.3819, attn_decoder_loss=0.2521, over 5807019.52 frames. 
], batch size: 67, lr: 6.15e-03, grad_scale: 8.0 2024-09-17 23:29:46,815 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.942e+01 8.687e+01 9.263e+01 1.001e+02 2.346e+02, threshold=1.853e+02, percent-clipped=1.0 2024-09-17 23:29:52,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.47 vs. limit=15.0 2024-09-17 23:29:56,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2024-09-17 23:30:12,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=322740.0, ans=0.125 2024-09-17 23:30:32,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.93 vs. limit=15.0 2024-09-17 23:30:42,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=322820.0, ans=0.125 2024-09-17 23:30:43,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=322860.0, ans=0.125 2024-09-17 23:31:00,144 INFO [train.py:1198] (0/2) Epoch 18, batch 3800, loss[loss=0.2463, ctc_loss=0.1283, cr_loss=0.3586, attn_decoder_loss=0.2515, over 29643.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1399, cr_loss=0.3806, attn_decoder_loss=0.2515, over 5799056.09 frames. ], batch size: 86, lr: 6.15e-03, grad_scale: 8.0 2024-09-17 23:31:11,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.82 vs. 
limit=22.5 2024-09-17 23:31:13,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=322940.0, ans=0.07 2024-09-17 23:31:45,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=323020.0, ans=0.125 2024-09-17 23:31:53,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5 2024-09-17 23:32:14,622 INFO [train.py:1198] (0/2) Epoch 18, batch 3850, loss[loss=0.2661, ctc_loss=0.1564, cr_loss=0.4328, attn_decoder_loss=0.2687, over 29258.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1396, cr_loss=0.3808, attn_decoder_loss=0.2514, over 5811914.34 frames. ], batch size: 100, lr: 6.14e-03, grad_scale: 8.0 2024-09-17 23:32:16,117 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.799e+01 9.187e+01 9.877e+01 1.493e+02, threshold=1.837e+02, percent-clipped=0.0 2024-09-17 23:32:19,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.54 vs. 
limit=15.0 2024-09-17 23:32:29,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=323140.0, ans=0.5 2024-09-17 23:32:35,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=323140.0, ans=0.1 2024-09-17 23:32:38,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=323140.0, ans=0.025 2024-09-17 23:33:12,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=323220.0, ans=10.0 2024-09-17 23:33:15,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=323260.0, ans=0.0 2024-09-17 23:33:17,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=22.5 2024-09-17 23:33:31,053 INFO [train.py:1198] (0/2) Epoch 18, batch 3900, loss[loss=0.2609, ctc_loss=0.1387, cr_loss=0.3916, attn_decoder_loss=0.2657, over 29629.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1405, cr_loss=0.3825, attn_decoder_loss=0.2521, over 5817571.01 frames. 
], batch size: 86, lr: 6.14e-03, grad_scale: 8.0 2024-09-17 23:33:44,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=323340.0, ans=0.125 2024-09-17 23:33:59,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=323380.0, ans=0.2 2024-09-17 23:34:01,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=323380.0, ans=0.125 2024-09-17 23:34:08,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=323380.0, ans=0.0 2024-09-17 23:34:44,941 INFO [train.py:1198] (0/2) Epoch 18, batch 3950, loss[loss=0.2625, ctc_loss=0.1567, cr_loss=0.4102, attn_decoder_loss=0.2651, over 29465.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1403, cr_loss=0.3823, attn_decoder_loss=0.2522, over 5836895.65 frames. ], batch size: 97, lr: 6.14e-03, grad_scale: 8.0 2024-09-17 23:34:46,427 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.658e+01 8.710e+01 9.175e+01 9.677e+01 1.510e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-17 23:34:48,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=323500.0, ans=0.0 2024-09-17 23:34:48,252 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:34:55,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=323500.0, ans=0.0 2024-09-17 23:35:06,508 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.14 vs. 
limit=15.0 2024-09-17 23:35:13,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=323540.0, ans=0.025 2024-09-17 23:35:14,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.42 vs. limit=22.5 2024-09-17 23:35:14,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=323580.0, ans=0.0 2024-09-17 23:35:17,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=323580.0, ans=0.025 2024-09-17 23:35:19,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=323580.0, ans=0.5 2024-09-17 23:35:22,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=323580.0, ans=0.0 2024-09-17 23:35:29,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=323620.0, ans=0.0 2024-09-17 23:35:32,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.69 vs. limit=22.5 2024-09-17 23:35:47,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.89 vs. limit=6.0 2024-09-17 23:36:00,072 INFO [train.py:1198] (0/2) Epoch 18, batch 4000, loss[loss=0.2301, ctc_loss=0.1305, cr_loss=0.3667, attn_decoder_loss=0.2331, over 29506.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1407, cr_loss=0.3823, attn_decoder_loss=0.2524, over 5814696.51 frames. 
], batch size: 74, lr: 6.14e-03, grad_scale: 8.0 2024-09-17 23:36:05,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.40 vs. limit=15.0 2024-09-17 23:36:43,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=323820.0, ans=0.05 2024-09-17 23:36:48,545 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. limit=5.0 2024-09-17 23:36:50,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=323820.0, ans=0.0 2024-09-17 23:37:14,028 INFO [train.py:1198] (0/2) Epoch 18, batch 4050, loss[loss=0.2737, ctc_loss=0.1727, cr_loss=0.4204, attn_decoder_loss=0.2756, over 19843.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1405, cr_loss=0.3816, attn_decoder_loss=0.2523, over 5796895.58 frames. 
], batch size: 209, lr: 6.14e-03, grad_scale: 8.0 2024-09-17 23:37:16,868 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.693e+01 8.741e+01 9.284e+01 9.840e+01 3.533e+02, threshold=1.857e+02, percent-clipped=2.0 2024-09-17 23:37:21,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=323900.0, ans=0.125 2024-09-17 23:37:24,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=323900.0, ans=0.125 2024-09-17 23:37:25,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=323900.0, ans=0.125 2024-09-17 23:37:38,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=323940.0, ans=0.125 2024-09-17 23:37:41,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=323980.0, ans=0.95 2024-09-17 23:37:49,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.39 vs. 
limit=15.0 2024-09-17 23:37:53,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=323980.0, ans=0.2 2024-09-17 23:37:58,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=324020.0, ans=0.0 2024-09-17 23:38:15,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=324060.0, ans=0.2 2024-09-17 23:38:17,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=324060.0, ans=0.125 2024-09-17 23:38:28,776 INFO [train.py:1198] (0/2) Epoch 18, batch 4100, loss[loss=0.2617, ctc_loss=0.151, cr_loss=0.4061, attn_decoder_loss=0.2649, over 29497.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1409, cr_loss=0.3822, attn_decoder_loss=0.2526, over 5792479.79 frames. ], batch size: 90, lr: 6.13e-03, grad_scale: 8.0 2024-09-17 23:38:29,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=324100.0, ans=0.0 2024-09-17 23:38:48,395 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.92 vs. limit=10.0 2024-09-17 23:38:58,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=324180.0, ans=0.0 2024-09-17 23:39:01,928 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.05 vs. 
limit=10.0 2024-09-17 23:39:07,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=324180.0, ans=0.125 2024-09-17 23:39:07,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=324180.0, ans=0.025 2024-09-17 23:39:09,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2024-09-17 23:39:14,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=324220.0, ans=0.125 2024-09-17 23:39:20,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=324220.0, ans=0.125 2024-09-17 23:39:20,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=324220.0, ans=0.025 2024-09-17 23:39:38,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2024-09-17 23:39:40,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=324260.0, ans=0.0 2024-09-17 23:39:43,551 INFO [train.py:1198] (0/2) Epoch 18, batch 4150, loss[loss=0.2417, ctc_loss=0.1353, cr_loss=0.3681, attn_decoder_loss=0.2454, over 29490.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1408, cr_loss=0.3825, attn_decoder_loss=0.2523, over 5797291.30 frames. 
], batch size: 77, lr: 6.13e-03, grad_scale: 8.0 2024-09-17 23:39:46,518 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.386e+01 9.045e+01 9.725e+01 1.428e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-17 23:39:47,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-09-17 23:40:14,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324380.0, ans=0.1 2024-09-17 23:40:19,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=324380.0, ans=0.125 2024-09-17 23:40:29,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=324420.0, ans=0.0 2024-09-17 23:40:37,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.18 vs. limit=15.0 2024-09-17 23:40:48,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=324460.0, ans=0.0 2024-09-17 23:40:57,212 INFO [train.py:1198] (0/2) Epoch 18, batch 4200, loss[loss=0.2716, ctc_loss=0.1737, cr_loss=0.4533, attn_decoder_loss=0.2723, over 29499.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.141, cr_loss=0.3833, attn_decoder_loss=0.2527, over 5798403.16 frames. ], batch size: 90, lr: 6.13e-03, grad_scale: 8.0 2024-09-17 23:40:58,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=324500.0, ans=0.0 2024-09-17 23:41:13,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.90 vs. 
limit=10.0 2024-09-17 23:41:21,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=324540.0, ans=0.2 2024-09-17 23:41:33,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=324580.0, ans=0.0 2024-09-17 23:42:06,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=324660.0, ans=0.125 2024-09-17 23:42:06,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=324660.0, ans=0.025 2024-09-17 23:42:11,839 INFO [train.py:1198] (0/2) Epoch 18, batch 4250, loss[loss=0.2389, ctc_loss=0.1351, cr_loss=0.3871, attn_decoder_loss=0.2418, over 29504.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.141, cr_loss=0.3832, attn_decoder_loss=0.2529, over 5805148.88 frames. ], batch size: 74, lr: 6.13e-03, grad_scale: 4.0 2024-09-17 23:42:16,129 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 8.827e+01 9.431e+01 1.016e+02 4.056e+02, threshold=1.886e+02, percent-clipped=2.0 2024-09-17 23:42:16,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=324700.0, ans=0.125 2024-09-17 23:42:59,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=324820.0, ans=0.0 2024-09-17 23:43:01,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=324820.0, ans=0.125 2024-09-17 23:43:08,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=324820.0, ans=0.5 2024-09-17 23:43:12,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=324860.0, 
ans=0.125 2024-09-17 23:43:27,302 INFO [train.py:1198] (0/2) Epoch 18, batch 4300, loss[loss=0.2586, ctc_loss=0.1373, cr_loss=0.384, attn_decoder_loss=0.2635, over 29546.00 frames. ], tot_loss[loss=0.2495, ctc_loss=0.1409, cr_loss=0.3828, attn_decoder_loss=0.253, over 5794049.95 frames. ], batch size: 87, lr: 6.13e-03, grad_scale: 8.0 2024-09-17 23:43:33,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=324900.0, ans=0.0 2024-09-17 23:43:35,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324900.0, ans=0.1 2024-09-17 23:43:39,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=324900.0, ans=0.0 2024-09-17 23:44:02,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.67 vs. limit=15.0 2024-09-17 23:44:10,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=325020.0, ans=0.0 2024-09-17 23:44:41,172 INFO [train.py:1198] (0/2) Epoch 18, batch 4350, loss[loss=0.2627, ctc_loss=0.153, cr_loss=0.4239, attn_decoder_loss=0.2655, over 29500.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1432, cr_loss=0.3875, attn_decoder_loss=0.256, over 5795891.09 frames. 
], batch size: 97, lr: 6.13e-03, grad_scale: 8.0 2024-09-17 23:44:46,380 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.876e+01 8.831e+01 9.306e+01 9.822e+01 6.484e+02, threshold=1.861e+02, percent-clipped=2.0 2024-09-17 23:45:08,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=325140.0, ans=0.2 2024-09-17 23:45:11,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=325180.0, ans=0.0 2024-09-17 23:45:42,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=325260.0, ans=0.125 2024-09-17 23:45:43,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=325260.0, ans=0.1 2024-09-17 23:45:54,951 INFO [train.py:1198] (0/2) Epoch 18, batch 4400, loss[loss=0.2512, ctc_loss=0.144, cr_loss=0.3893, attn_decoder_loss=0.2545, over 27414.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1446, cr_loss=0.3897, attn_decoder_loss=0.258, over 5768819.32 frames. ], batch size: 124, lr: 6.12e-03, grad_scale: 16.0 2024-09-17 23:46:00,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=15.22 vs. limit=15.0 2024-09-17 23:46:59,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=325460.0, ans=0.125 2024-09-17 23:47:10,206 INFO [train.py:1198] (0/2) Epoch 18, batch 4450, loss[loss=0.2709, ctc_loss=0.1828, cr_loss=0.3999, attn_decoder_loss=0.2718, over 20133.00 frames. ], tot_loss[loss=0.2573, ctc_loss=0.1489, cr_loss=0.3948, attn_decoder_loss=0.2606, over 5582217.17 frames. 
], batch size: 210, lr: 6.12e-03, grad_scale: 8.0 2024-09-17 23:47:16,217 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.181e+01 9.154e+01 9.637e+01 1.052e+02 1.489e+02, threshold=1.927e+02, percent-clipped=0.0 2024-09-17 23:47:22,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=325500.0, ans=0.125 2024-09-17 23:47:25,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=325540.0, ans=0.125 2024-09-17 23:47:27,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.86 vs. limit=22.5 2024-09-17 23:47:27,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.13 vs. limit=22.5 2024-09-17 23:47:34,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=325540.0, ans=0.1 2024-09-17 23:47:59,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0 2024-09-17 23:48:04,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-09-17 23:48:26,307 INFO [train.py:1198] (0/2) Epoch 18, batch 4500, loss[loss=0.2791, ctc_loss=0.1843, cr_loss=0.4493, attn_decoder_loss=0.2796, over 19723.00 frames. ], tot_loss[loss=0.2601, ctc_loss=0.1541, cr_loss=0.3976, attn_decoder_loss=0.263, over 5240094.54 frames. ], batch size: 209, lr: 6.12e-03, grad_scale: 8.0 2024-09-17 23:48:30,084 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.87 vs. 
limit=22.5 2024-09-17 23:48:31,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=325700.0, ans=0.04949747468305833 2024-09-17 23:48:37,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=325700.0, ans=0.025 2024-09-17 23:48:46,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=325740.0, ans=0.125 2024-09-17 23:48:58,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=325780.0, ans=0.125 2024-09-17 23:48:59,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=325780.0, ans=0.125 2024-09-17 23:48:59,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=325780.0, ans=0.0 2024-09-17 23:49:03,625 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-18.pt 2024-09-17 23:49:55,588 INFO [train.py:1198] (0/2) Epoch 19, batch 0, loss[loss=0.2316, ctc_loss=0.1232, cr_loss=0.3617, attn_decoder_loss=0.2356, over 29606.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1232, cr_loss=0.3617, attn_decoder_loss=0.2356, over 29606.00 frames. ], batch size: 73, lr: 5.95e-03, grad_scale: 16.0 2024-09-17 23:49:55,589 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-17 23:50:13,880 INFO [train.py:1230] (0/2) Epoch 19, validation: loss=0.2122, ctc_loss=0.03932, cr_loss=5e-15, attn_decoder_loss=0.2315, over 944034.00 frames. 
2024-09-17 23:50:13,881 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-17 23:50:14,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=325800.0, ans=0.0 2024-09-17 23:50:30,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=325840.0, ans=0.125 2024-09-17 23:50:44,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=325880.0, ans=0.125 2024-09-17 23:50:48,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=325880.0, ans=0.125 2024-09-17 23:50:59,126 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.957e+01 1.057e+02 1.132e+02 1.239e+02 3.685e+02, threshold=2.265e+02, percent-clipped=3.0 2024-09-17 23:51:00,042 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.31 vs. limit=15.0 2024-09-17 23:51:01,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=325920.0, ans=0.125 2024-09-17 23:51:07,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=325920.0, ans=0.125 2024-09-17 23:51:31,615 INFO [train.py:1198] (0/2) Epoch 19, batch 50, loss[loss=0.2246, ctc_loss=0.1255, cr_loss=0.3655, attn_decoder_loss=0.2275, over 29487.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.146, cr_loss=0.3928, attn_decoder_loss=0.2547, over 1267258.49 frames. 
], batch size: 70, lr: 5.95e-03, grad_scale: 8.0 2024-09-17 23:51:42,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=326000.0, ans=0.125 2024-09-17 23:51:53,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=326040.0, ans=0.125 2024-09-17 23:52:05,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=326080.0, ans=0.0 2024-09-17 23:52:07,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=15.0 2024-09-17 23:52:08,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=326080.0, ans=0.0 2024-09-17 23:52:16,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=326080.0, ans=0.0 2024-09-17 23:52:28,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=326120.0, ans=0.2 2024-09-17 23:52:49,544 INFO [train.py:1198] (0/2) Epoch 19, batch 100, loss[loss=0.2375, ctc_loss=0.1282, cr_loss=0.3585, attn_decoder_loss=0.2417, over 29566.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1456, cr_loss=0.393, attn_decoder_loss=0.2561, over 2252771.49 frames. 
], batch size: 76, lr: 5.95e-03, grad_scale: 8.0 2024-09-17 23:53:18,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=326280.0, ans=0.125 2024-09-17 23:53:21,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=326280.0, ans=0.2 2024-09-17 23:53:29,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=326280.0, ans=15.0 2024-09-17 23:53:31,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=326280.0, ans=0.0 2024-09-17 23:53:34,402 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 8.614e+01 9.117e+01 9.815e+01 1.763e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-17 23:53:39,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=326320.0, ans=0.2 2024-09-17 23:53:57,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.55 vs. limit=22.5 2024-09-17 23:54:04,658 INFO [train.py:1198] (0/2) Epoch 19, batch 150, loss[loss=0.2291, ctc_loss=0.1272, cr_loss=0.3658, attn_decoder_loss=0.2323, over 29467.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1429, cr_loss=0.3875, attn_decoder_loss=0.2539, over 3047411.51 frames. 
], batch size: 70, lr: 5.95e-03, grad_scale: 8.0 2024-09-17 23:54:04,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=326400.0, ans=0.125 2024-09-17 23:54:08,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=326400.0, ans=0.2 2024-09-17 23:54:29,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=326440.0, ans=0.125 2024-09-17 23:54:35,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-09-17 23:54:40,218 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0 2024-09-17 23:54:47,633 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-09-17 23:54:57,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=326520.0, ans=0.04949747468305833 2024-09-17 23:55:05,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326560.0, ans=0.1 2024-09-17 23:55:20,168 INFO [train.py:1198] (0/2) Epoch 19, batch 200, loss[loss=0.2612, ctc_loss=0.1551, cr_loss=0.4037, attn_decoder_loss=0.264, over 27254.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1423, cr_loss=0.3865, attn_decoder_loss=0.253, over 3660252.05 frames. 
], batch size: 124, lr: 5.95e-03, grad_scale: 8.0 2024-09-17 23:55:22,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=326600.0, ans=0.025 2024-09-17 23:55:40,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=326640.0, ans=0.0 2024-09-17 23:55:59,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=326680.0, ans=0.125 2024-09-17 23:56:10,480 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.598e+01 9.185e+01 9.838e+01 1.653e+02, threshold=1.837e+02, percent-clipped=0.0 2024-09-17 23:56:17,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.02 vs. limit=15.0 2024-09-17 23:56:23,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.73 vs. limit=12.0 2024-09-17 23:56:36,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=326760.0, ans=0.025 2024-09-17 23:56:38,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=326760.0, ans=0.0 2024-09-17 23:56:40,795 INFO [train.py:1198] (0/2) Epoch 19, batch 250, loss[loss=0.2552, ctc_loss=0.1418, cr_loss=0.391, attn_decoder_loss=0.2591, over 29204.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.141, cr_loss=0.3854, attn_decoder_loss=0.2527, over 4142320.36 frames. 
], batch size: 100, lr: 5.94e-03, grad_scale: 8.0 2024-09-17 23:56:42,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=326800.0, ans=0.2 2024-09-17 23:56:51,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=326800.0, ans=0.0 2024-09-17 23:56:59,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=326840.0, ans=0.125 2024-09-17 23:57:02,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=326840.0, ans=15.0 2024-09-17 23:57:03,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=326840.0, ans=0.125 2024-09-17 23:57:08,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=326840.0, ans=0.125 2024-09-17 23:57:32,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=326920.0, ans=0.125 2024-09-17 23:57:40,988 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2024-09-17 23:57:56,491 INFO [train.py:1198] (0/2) Epoch 19, batch 300, loss[loss=0.2589, ctc_loss=0.1484, cr_loss=0.4061, attn_decoder_loss=0.2622, over 29561.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.14, cr_loss=0.3837, attn_decoder_loss=0.252, over 4509933.11 frames. 
], batch size: 92, lr: 5.94e-03, grad_scale: 8.0 2024-09-17 23:58:09,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=327000.0, ans=0.125 2024-09-17 23:58:09,705 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5 2024-09-17 23:58:10,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=327040.0, ans=0.125 2024-09-17 23:58:18,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=327040.0, ans=0.1 2024-09-17 23:58:22,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=327040.0, ans=0.2 2024-09-17 23:58:25,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=327080.0, ans=0.0 2024-09-17 23:58:36,198 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:58:37,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=327080.0, ans=0.125 2024-09-17 23:58:41,752 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.487e+01 9.041e+01 9.802e+01 3.671e+02, threshold=1.808e+02, percent-clipped=2.0 2024-09-17 23:58:49,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=327120.0, ans=0.2 2024-09-17 23:59:02,457 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.02 vs. 
limit=12.0 2024-09-17 23:59:03,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=327160.0, ans=0.0 2024-09-17 23:59:12,437 INFO [train.py:1198] (0/2) Epoch 19, batch 350, loss[loss=0.2221, ctc_loss=0.1219, cr_loss=0.3541, attn_decoder_loss=0.2254, over 29302.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1404, cr_loss=0.3844, attn_decoder_loss=0.2527, over 4795786.75 frames. ], batch size: 71, lr: 5.94e-03, grad_scale: 8.0 2024-09-17 23:59:17,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=327200.0, ans=10.0 2024-09-17 23:59:19,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.05 vs. limit=10.0 2024-09-17 23:59:28,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=327240.0, ans=0.125 2024-09-17 23:59:30,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=327240.0, ans=0.0 2024-09-17 23:59:37,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=327240.0, ans=0.125 2024-09-17 23:59:39,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=327240.0, ans=0.07 2024-09-17 23:59:40,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=327240.0, ans=0.025 2024-09-17 23:59:46,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=327280.0, ans=0.025 2024-09-17 23:59:55,156 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, 
num_groups=1, num_channels=384, metric=5.86 vs. limit=10.0 2024-09-18 00:00:15,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=327320.0, ans=0.1 2024-09-18 00:00:15,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=327320.0, ans=15.0 2024-09-18 00:00:19,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=327360.0, ans=0.1 2024-09-18 00:00:32,751 INFO [train.py:1198] (0/2) Epoch 19, batch 400, loss[loss=0.2481, ctc_loss=0.1347, cr_loss=0.3821, attn_decoder_loss=0.2522, over 29694.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1394, cr_loss=0.3829, attn_decoder_loss=0.2521, over 5024769.94 frames. ], batch size: 82, lr: 5.94e-03, grad_scale: 16.0 2024-09-18 00:00:51,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=327440.0, ans=0.0 2024-09-18 00:00:57,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=327440.0, ans=0.125 2024-09-18 00:00:58,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2024-09-18 00:01:14,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.00 vs. 
limit=22.5 2024-09-18 00:01:19,971 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.676e+01 9.493e+01 1.045e+02 1.663e+02, threshold=1.899e+02, percent-clipped=0.0 2024-09-18 00:01:20,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=327520.0, ans=0.125 2024-09-18 00:01:26,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=327520.0, ans=0.0 2024-09-18 00:01:48,883 INFO [train.py:1198] (0/2) Epoch 19, batch 450, loss[loss=0.254, ctc_loss=0.1423, cr_loss=0.3869, attn_decoder_loss=0.2578, over 29675.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1397, cr_loss=0.3827, attn_decoder_loss=0.2522, over 5187705.82 frames. ], batch size: 83, lr: 5.94e-03, grad_scale: 8.0 2024-09-18 00:02:10,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327640.0, ans=0.1 2024-09-18 00:02:19,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=327680.0, ans=0.125 2024-09-18 00:02:36,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=327720.0, ans=0.1 2024-09-18 00:02:49,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=327760.0, ans=0.0 2024-09-18 00:03:04,335 INFO [train.py:1198] (0/2) Epoch 19, batch 500, loss[loss=0.2648, ctc_loss=0.149, cr_loss=0.3976, attn_decoder_loss=0.2688, over 29450.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1393, cr_loss=0.3818, attn_decoder_loss=0.2513, over 5330119.54 frames. 
], batch size: 94, lr: 5.94e-03, grad_scale: 8.0 2024-09-18 00:03:15,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=327800.0, ans=0.95 2024-09-18 00:03:21,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=327840.0, ans=0.02 2024-09-18 00:03:30,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=327840.0, ans=12.0 2024-09-18 00:03:56,183 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 8.703e+01 9.333e+01 1.015e+02 2.225e+02, threshold=1.867e+02, percent-clipped=2.0 2024-09-18 00:04:03,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.67 vs. limit=15.0 2024-09-18 00:04:16,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=327960.0, ans=0.125 2024-09-18 00:04:17,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=327960.0, ans=0.0 2024-09-18 00:04:20,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=327960.0, ans=0.125 2024-09-18 00:04:25,793 INFO [train.py:1198] (0/2) Epoch 19, batch 550, loss[loss=0.2603, ctc_loss=0.1371, cr_loss=0.3935, attn_decoder_loss=0.2653, over 28812.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1397, cr_loss=0.3823, attn_decoder_loss=0.2517, over 5421769.02 frames. ], batch size: 104, lr: 5.93e-03, grad_scale: 8.0 2024-09-18 00:04:29,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.15 vs. 
limit=10.0 2024-09-18 00:04:37,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=15.0 2024-09-18 00:04:39,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=328040.0, ans=0.125 2024-09-18 00:04:59,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=328080.0, ans=0.125 2024-09-18 00:05:02,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=328080.0, ans=0.125 2024-09-18 00:05:11,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=328120.0, ans=0.125 2024-09-18 00:05:17,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=328120.0, ans=0.0 2024-09-18 00:05:26,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328160.0, ans=0.1 2024-09-18 00:05:41,273 INFO [train.py:1198] (0/2) Epoch 19, batch 600, loss[loss=0.2648, ctc_loss=0.1495, cr_loss=0.3818, attn_decoder_loss=0.2691, over 29259.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1399, cr_loss=0.3822, attn_decoder_loss=0.2521, over 5507333.62 frames. 
], batch size: 100, lr: 5.93e-03, grad_scale: 8.0 2024-09-18 00:05:44,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=328200.0, ans=0.0 2024-09-18 00:06:14,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328280.0, ans=0.1 2024-09-18 00:06:27,696 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 8.945e+01 9.378e+01 9.831e+01 2.043e+02, threshold=1.876e+02, percent-clipped=1.0 2024-09-18 00:06:56,861 INFO [train.py:1198] (0/2) Epoch 19, batch 650, loss[loss=0.2521, ctc_loss=0.1439, cr_loss=0.3964, attn_decoder_loss=0.2554, over 29735.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1381, cr_loss=0.3795, attn_decoder_loss=0.2508, over 5585375.31 frames. ], batch size: 81, lr: 5.93e-03, grad_scale: 8.0 2024-09-18 00:07:02,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5 2024-09-18 00:07:10,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=328440.0, ans=0.125 2024-09-18 00:07:11,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-09-18 00:07:15,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=328440.0, ans=0.125 2024-09-18 00:07:23,638 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.84 vs. limit=15.0 2024-09-18 00:07:24,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.49 vs. 
limit=15.0 2024-09-18 00:07:48,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=328520.0, ans=0.125 2024-09-18 00:08:17,458 INFO [train.py:1198] (0/2) Epoch 19, batch 700, loss[loss=0.2328, ctc_loss=0.1284, cr_loss=0.3705, attn_decoder_loss=0.2362, over 29543.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1385, cr_loss=0.3806, attn_decoder_loss=0.2513, over 5637468.76 frames. ], batch size: 76, lr: 5.93e-03, grad_scale: 8.0 2024-09-18 00:08:25,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=328600.0, ans=0.2 2024-09-18 00:08:31,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328640.0, ans=0.1 2024-09-18 00:08:41,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=328640.0, ans=0.125 2024-09-18 00:09:04,106 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.484e+01 8.986e+01 9.600e+01 2.397e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-18 00:09:33,274 INFO [train.py:1198] (0/2) Epoch 19, batch 750, loss[loss=0.26, ctc_loss=0.1508, cr_loss=0.4297, attn_decoder_loss=0.2626, over 29720.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1386, cr_loss=0.3807, attn_decoder_loss=0.2512, over 5676570.82 frames. ], batch size: 82, lr: 5.93e-03, grad_scale: 8.0 2024-09-18 00:09:54,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=328840.0, ans=0.2 2024-09-18 00:10:11,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.63 vs. 
limit=22.5 2024-09-18 00:10:14,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=328880.0, ans=0.0 2024-09-18 00:10:24,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=328920.0, ans=0.0 2024-09-18 00:10:29,378 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:10:48,710 INFO [train.py:1198] (0/2) Epoch 19, batch 800, loss[loss=0.2115, ctc_loss=0.09974, cr_loss=0.3018, attn_decoder_loss=0.2172, over 29597.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1386, cr_loss=0.3802, attn_decoder_loss=0.2511, over 5707327.14 frames. ], batch size: 73, lr: 5.92e-03, grad_scale: 16.0 2024-09-18 00:11:01,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=329000.0, ans=0.1 2024-09-18 00:11:02,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=329040.0, ans=0.125 2024-09-18 00:11:39,693 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.734e+01 9.110e+01 9.840e+01 2.381e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-18 00:11:43,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=329120.0, ans=0.0 2024-09-18 00:11:55,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=329160.0, ans=0.125 2024-09-18 00:12:09,128 INFO [train.py:1198] (0/2) Epoch 19, batch 850, loss[loss=0.2555, ctc_loss=0.1372, cr_loss=0.3835, attn_decoder_loss=0.2601, over 29686.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1383, cr_loss=0.3798, attn_decoder_loss=0.2508, over 5736441.80 frames. 
], batch size: 89, lr: 5.92e-03, grad_scale: 8.0
2024-09-18 00:12:13,123 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0
2024-09-18 00:12:30,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=329240.0, ans=0.1
2024-09-18 00:12:31,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=329240.0, ans=0.125
2024-09-18 00:12:33,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329240.0, ans=0.1
2024-09-18 00:12:41,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=329280.0, ans=0.0
2024-09-18 00:13:25,354 INFO [train.py:1198] (0/2) Epoch 19, batch 900, loss[loss=0.2307, ctc_loss=0.1291, cr_loss=0.3627, attn_decoder_loss=0.2339, over 29614.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1392, cr_loss=0.3816, attn_decoder_loss=0.2515, over 5742704.52 frames. ], batch size: 73, lr: 5.92e-03, grad_scale: 8.0
2024-09-18 00:13:48,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=329440.0, ans=0.125
2024-09-18 00:13:57,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=329480.0, ans=0.125
2024-09-18 00:14:14,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.696e+01 9.115e+01 9.955e+01 6.704e+02, threshold=1.823e+02, percent-clipped=4.0
2024-09-18 00:14:28,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.44 vs. limit=15.0
2024-09-18 00:14:34,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=329560.0, ans=0.0
2024-09-18 00:14:41,598 INFO [train.py:1198] (0/2) Epoch 19, batch 950, loss[loss=0.2365, ctc_loss=0.129, cr_loss=0.3711, attn_decoder_loss=0.2402, over 29523.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1392, cr_loss=0.3817, attn_decoder_loss=0.2518, over 5743088.42 frames. ], batch size: 74, lr: 5.92e-03, grad_scale: 8.0
2024-09-18 00:14:44,145 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0
2024-09-18 00:14:58,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329640.0, ans=0.1
2024-09-18 00:15:18,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=329680.0, ans=0.025
2024-09-18 00:15:29,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=329720.0, ans=0.0
2024-09-18 00:15:45,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=329760.0, ans=0.125
2024-09-18 00:15:50,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0
2024-09-18 00:15:59,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=329760.0, ans=0.125
2024-09-18 00:16:01,870 INFO [train.py:1198] (0/2) Epoch 19, batch 1000, loss[loss=0.2422, ctc_loss=0.1347, cr_loss=0.3759, attn_decoder_loss=0.2458, over 29481.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1398, cr_loss=0.3823, attn_decoder_loss=0.2524, over 5736374.13 frames. ], batch size: 77, lr: 5.92e-03, grad_scale: 8.0
2024-09-18 00:16:09,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=329800.0, ans=0.125
2024-09-18 00:16:17,544 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 00:16:33,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.66 vs. limit=15.0
2024-09-18 00:16:45,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.28 vs. limit=22.5
2024-09-18 00:16:50,537 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.872e+01 9.584e+01 1.048e+02 1.890e+02, threshold=1.917e+02, percent-clipped=1.0
2024-09-18 00:16:56,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=329920.0, ans=0.125
2024-09-18 00:17:10,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=329960.0, ans=0.125
2024-09-18 00:17:14,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329960.0, ans=0.1
2024-09-18 00:17:17,655 INFO [train.py:1198] (0/2) Epoch 19, batch 1050, loss[loss=0.2528, ctc_loss=0.1375, cr_loss=0.3729, attn_decoder_loss=0.2573, over 29670.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1391, cr_loss=0.3813, attn_decoder_loss=0.2516, over 5743676.91 frames. ], batch size: 85, lr: 5.92e-03, grad_scale: 8.0
2024-09-18 00:17:19,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=330000.0, ans=0.125
2024-09-18 00:17:20,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=330000.0, ans=0.125
2024-09-18 00:17:33,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=330040.0, ans=0.2
2024-09-18 00:17:39,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=330040.0, ans=0.0
2024-09-18 00:17:47,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=330080.0, ans=0.0
2024-09-18 00:17:51,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.13 vs. limit=10.0
2024-09-18 00:17:51,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330080.0, ans=0.1
2024-09-18 00:18:25,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=330160.0, ans=0.0
2024-09-18 00:18:34,359 INFO [train.py:1198] (0/2) Epoch 19, batch 1100, loss[loss=0.2472, ctc_loss=0.1359, cr_loss=0.38, attn_decoder_loss=0.2512, over 29459.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1388, cr_loss=0.3812, attn_decoder_loss=0.2515, over 5756361.94 frames. ], batch size: 78, lr: 5.91e-03, grad_scale: 8.0
2024-09-18 00:18:42,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=330200.0, ans=0.1
2024-09-18 00:18:45,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=330200.0, ans=0.0
2024-09-18 00:18:45,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=330200.0, ans=0.0
2024-09-18 00:19:05,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=330280.0, ans=0.1
2024-09-18 00:19:25,408 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.385e+01 8.690e+01 9.252e+01 1.167e+02, threshold=1.738e+02, percent-clipped=0.0
2024-09-18 00:19:35,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=330320.0, ans=0.07
2024-09-18 00:19:38,206 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=22.5
2024-09-18 00:19:46,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=330360.0, ans=0.07
2024-09-18 00:19:49,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=330360.0, ans=0.0
2024-09-18 00:19:55,677 INFO [train.py:1198] (0/2) Epoch 19, batch 1150, loss[loss=0.2388, ctc_loss=0.1318, cr_loss=0.3735, attn_decoder_loss=0.2424, over 29461.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1388, cr_loss=0.3809, attn_decoder_loss=0.2511, over 5755423.13 frames. ], batch size: 78, lr: 5.91e-03, grad_scale: 8.0
2024-09-18 00:20:26,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=330480.0, ans=0.125
2024-09-18 00:20:32,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=330480.0, ans=0.125
2024-09-18 00:20:34,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=330480.0, ans=0.2
2024-09-18 00:20:35,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=330480.0, ans=0.1
2024-09-18 00:20:37,867 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.84 vs. limit=12.0
2024-09-18 00:21:03,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=330560.0, ans=0.0
2024-09-18 00:21:09,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=330560.0, ans=0.0
2024-09-18 00:21:11,973 INFO [train.py:1198] (0/2) Epoch 19, batch 1200, loss[loss=0.2539, ctc_loss=0.1407, cr_loss=0.3857, attn_decoder_loss=0.2579, over 29672.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1395, cr_loss=0.3818, attn_decoder_loss=0.252, over 5747105.59 frames. ], batch size: 85, lr: 5.91e-03, grad_scale: 16.0
2024-09-18 00:21:12,972 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=22.5
2024-09-18 00:21:22,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=330600.0, ans=0.0
2024-09-18 00:21:30,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330640.0, ans=0.1
2024-09-18 00:21:41,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0
2024-09-18 00:22:02,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.778e+01 9.349e+01 9.833e+01 1.592e+02, threshold=1.870e+02, percent-clipped=0.0
2024-09-18 00:22:15,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=330760.0, ans=0.025
2024-09-18 00:22:28,396 INFO [train.py:1198] (0/2) Epoch 19, batch 1250, loss[loss=0.257, ctc_loss=0.1553, cr_loss=0.41, attn_decoder_loss=0.2592, over 29549.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1402, cr_loss=0.3832, attn_decoder_loss=0.2526, over 5775084.08 frames. ], batch size: 92, lr: 5.91e-03, grad_scale: 8.0
2024-09-18 00:22:40,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=330800.0, ans=0.1
2024-09-18 00:22:41,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=330800.0, ans=0.0
2024-09-18 00:23:40,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=330960.0, ans=0.125
2024-09-18 00:23:48,824 INFO [train.py:1198] (0/2) Epoch 19, batch 1300, loss[loss=0.2652, ctc_loss=0.1532, cr_loss=0.3994, attn_decoder_loss=0.2687, over 28438.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1397, cr_loss=0.3817, attn_decoder_loss=0.2518, over 5780963.77 frames. ], batch size: 111, lr: 5.91e-03, grad_scale: 8.0
2024-09-18 00:23:58,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=331000.0, ans=0.125
2024-09-18 00:24:13,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=331040.0, ans=0.125
2024-09-18 00:24:24,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=331080.0, ans=0.125
2024-09-18 00:24:39,241 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.628e+01 9.058e+01 9.767e+01 1.420e+02, threshold=1.812e+02, percent-clipped=0.0
2024-09-18 00:24:41,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.54 vs. limit=22.5
2024-09-18 00:24:42,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=331120.0, ans=0.125
2024-09-18 00:25:05,571 INFO [train.py:1198] (0/2) Epoch 19, batch 1350, loss[loss=0.2486, ctc_loss=0.1359, cr_loss=0.3745, attn_decoder_loss=0.2528, over 29770.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1388, cr_loss=0.3804, attn_decoder_loss=0.2514, over 5797060.28 frames. ], batch size: 81, lr: 5.90e-03, grad_scale: 8.0
2024-09-18 00:25:09,124 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0
2024-09-18 00:25:19,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=331240.0, ans=0.5
2024-09-18 00:25:30,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=331240.0, ans=0.1
2024-09-18 00:25:59,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=331320.0, ans=0.04949747468305833
2024-09-18 00:26:03,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=331320.0, ans=0.02
2024-09-18 00:26:19,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=331360.0, ans=0.125
2024-09-18 00:26:21,736 INFO [train.py:1198] (0/2) Epoch 19, batch 1400, loss[loss=0.2089, ctc_loss=0.1112, cr_loss=0.3159, attn_decoder_loss=0.2127, over 29597.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1384, cr_loss=0.3797, attn_decoder_loss=0.2509, over 5807996.46 frames. ], batch size: 69, lr: 5.90e-03, grad_scale: 8.0
2024-09-18 00:26:22,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331400.0, ans=0.1
2024-09-18 00:26:43,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331440.0, ans=0.1
2024-09-18 00:26:50,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=331480.0, ans=0.0
2024-09-18 00:26:55,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=331480.0, ans=0.07
2024-09-18 00:27:07,640 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 00:27:07,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0
2024-09-18 00:27:09,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=331520.0, ans=0.0
2024-09-18 00:27:11,766 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.640e+01 9.143e+01 9.808e+01 1.570e+02, threshold=1.829e+02, percent-clipped=0.0
2024-09-18 00:27:12,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=331520.0, ans=0.0
2024-09-18 00:27:12,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0
2024-09-18 00:27:13,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=331520.0, ans=0.1
2024-09-18 00:27:15,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=331520.0, ans=0.125
2024-09-18 00:27:28,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=331560.0, ans=0.125
2024-09-18 00:27:40,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=331600.0, ans=0.0
2024-09-18 00:27:42,237 INFO [train.py:1198] (0/2) Epoch 19, batch 1450, loss[loss=0.2601, ctc_loss=0.1497, cr_loss=0.4224, attn_decoder_loss=0.263, over 29422.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.139, cr_loss=0.3813, attn_decoder_loss=0.2518, over 5804523.94 frames. ], batch size: 94, lr: 5.90e-03, grad_scale: 8.0
2024-09-18 00:27:54,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=331600.0, ans=0.125
2024-09-18 00:28:03,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=331640.0, ans=0.125
2024-09-18 00:28:09,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=331640.0, ans=0.0
2024-09-18 00:28:20,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0
2024-09-18 00:28:21,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=331680.0, ans=0.2
2024-09-18 00:28:37,257 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.03 vs. limit=15.0
2024-09-18 00:28:47,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=331760.0, ans=0.0
2024-09-18 00:28:47,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331760.0, ans=0.1
2024-09-18 00:28:57,909 INFO [train.py:1198] (0/2) Epoch 19, batch 1500, loss[loss=0.2561, ctc_loss=0.1435, cr_loss=0.3788, attn_decoder_loss=0.2602, over 29623.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1391, cr_loss=0.3812, attn_decoder_loss=0.2519, over 5805904.38 frames. ], batch size: 86, lr: 5.90e-03, grad_scale: 8.0
2024-09-18 00:29:05,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=331800.0, ans=0.2
2024-09-18 00:29:15,110 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 00:29:40,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331880.0, ans=0.1
2024-09-18 00:29:41,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=331880.0, ans=0.125
2024-09-18 00:29:48,930 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 8.706e+01 9.242e+01 9.878e+01 2.158e+02, threshold=1.848e+02, percent-clipped=2.0
2024-09-18 00:29:59,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.22 vs. limit=6.0
2024-09-18 00:30:12,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=15.0
2024-09-18 00:30:15,118 INFO [train.py:1198] (0/2) Epoch 19, batch 1550, loss[loss=0.2674, ctc_loss=0.1607, cr_loss=0.4099, attn_decoder_loss=0.2702, over 29508.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1395, cr_loss=0.3814, attn_decoder_loss=0.2521, over 5781915.14 frames. ], batch size: 90, lr: 5.90e-03, grad_scale: 8.0
2024-09-18 00:30:36,528 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 00:30:57,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=332080.0, ans=0.125
2024-09-18 00:31:10,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.60 vs. limit=15.0
2024-09-18 00:31:18,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=332160.0, ans=0.025
2024-09-18 00:31:30,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=332160.0, ans=0.0
2024-09-18 00:31:35,035 INFO [train.py:1198] (0/2) Epoch 19, batch 1600, loss[loss=0.2537, ctc_loss=0.1425, cr_loss=0.3876, attn_decoder_loss=0.2574, over 29696.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1401, cr_loss=0.3819, attn_decoder_loss=0.252, over 5763907.52 frames. ], batch size: 85, lr: 5.90e-03, grad_scale: 16.0
2024-09-18 00:31:47,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=332200.0, ans=0.125
2024-09-18 00:32:03,233 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.36 vs. limit=15.0
2024-09-18 00:32:08,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=332280.0, ans=0.2
2024-09-18 00:32:10,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=332280.0, ans=0.2
2024-09-18 00:32:14,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=332280.0, ans=0.125
2024-09-18 00:32:16,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332280.0, ans=0.1
2024-09-18 00:32:25,390 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 00:32:26,522 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 8.856e+01 9.608e+01 1.051e+02 2.791e+02, threshold=1.922e+02, percent-clipped=1.0
2024-09-18 00:32:28,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.87 vs. limit=15.0
2024-09-18 00:32:49,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=332400.0, ans=0.04949747468305833
2024-09-18 00:32:50,570 INFO [train.py:1198] (0/2) Epoch 19, batch 1650, loss[loss=0.2563, ctc_loss=0.1444, cr_loss=0.374, attn_decoder_loss=0.2604, over 29701.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1403, cr_loss=0.3828, attn_decoder_loss=0.2521, over 5757422.59 frames. ], batch size: 89, lr: 5.89e-03, grad_scale: 8.0
2024-09-18 00:32:50,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=332400.0, ans=0.025
2024-09-18 00:32:54,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=332400.0, ans=0.95
2024-09-18 00:33:10,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.61 vs. limit=22.5
2024-09-18 00:33:11,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332440.0, ans=0.1
2024-09-18 00:33:30,908 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5
2024-09-18 00:33:36,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=332520.0, ans=0.125
2024-09-18 00:34:06,071 INFO [train.py:1198] (0/2) Epoch 19, batch 1700, loss[loss=0.2178, ctc_loss=0.1113, cr_loss=0.3292, attn_decoder_loss=0.2223, over 29563.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1396, cr_loss=0.3821, attn_decoder_loss=0.2517, over 5779143.30 frames. ], batch size: 69, lr: 5.89e-03, grad_scale: 8.0
2024-09-18 00:34:06,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=332600.0, ans=0.125
2024-09-18 00:34:33,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=332640.0, ans=0.125
2024-09-18 00:34:59,434 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.557e+01 9.059e+01 9.709e+01 1.358e+02, threshold=1.812e+02, percent-clipped=0.0
2024-09-18 00:35:05,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=332720.0, ans=0.125
2024-09-18 00:35:13,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332760.0, ans=0.1
2024-09-18 00:35:26,341 INFO [train.py:1198] (0/2) Epoch 19, batch 1750, loss[loss=0.219, ctc_loss=0.1185, cr_loss=0.3489, attn_decoder_loss=0.2224, over 29340.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1391, cr_loss=0.3813, attn_decoder_loss=0.2513, over 5788444.80 frames. ], batch size: 67, lr: 5.89e-03, grad_scale: 8.0
2024-09-18 00:35:29,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=332800.0, ans=0.125
2024-09-18 00:35:41,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=332840.0, ans=0.0
2024-09-18 00:35:43,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.81 vs. limit=15.0
2024-09-18 00:36:20,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.63 vs. limit=22.5
2024-09-18 00:36:25,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=332960.0, ans=0.125
2024-09-18 00:36:27,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=332960.0, ans=0.0
2024-09-18 00:36:31,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=332960.0, ans=0.2
2024-09-18 00:36:41,691 INFO [train.py:1198] (0/2) Epoch 19, batch 1800, loss[loss=0.2514, ctc_loss=0.1378, cr_loss=0.3744, attn_decoder_loss=0.2558, over 29693.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1389, cr_loss=0.3812, attn_decoder_loss=0.2515, over 5790576.93 frames. ], batch size: 83, lr: 5.89e-03, grad_scale: 8.0
2024-09-18 00:37:04,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=333040.0, ans=0.125
2024-09-18 00:37:33,205 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.534e+01 9.002e+01 9.561e+01 2.098e+02, threshold=1.800e+02, percent-clipped=1.0
2024-09-18 00:37:51,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0
2024-09-18 00:37:57,829 INFO [train.py:1198] (0/2) Epoch 19, batch 1850, loss[loss=0.2568, ctc_loss=0.1423, cr_loss=0.3895, attn_decoder_loss=0.2609, over 29633.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1386, cr_loss=0.3808, attn_decoder_loss=0.2513, over 5797633.58 frames. ], batch size: 86, lr: 5.89e-03, grad_scale: 8.0
2024-09-18 00:38:22,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=333240.0, ans=0.0
2024-09-18 00:38:31,627 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 00:38:32,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.09 vs. limit=15.0
2024-09-18 00:38:36,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=333280.0, ans=0.07
2024-09-18 00:39:02,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=333360.0, ans=0.0
2024-09-18 00:39:10,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0
2024-09-18 00:39:15,878 INFO [train.py:1198] (0/2) Epoch 19, batch 1900, loss[loss=0.2645, ctc_loss=0.1533, cr_loss=0.4106, attn_decoder_loss=0.2677, over 29711.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1389, cr_loss=0.3814, attn_decoder_loss=0.2518, over 5804780.63 frames. ], batch size: 89, lr: 5.89e-03, grad_scale: 8.0
2024-09-18 00:39:28,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=333400.0, ans=0.2
2024-09-18 00:40:10,193 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 8.878e+01 9.424e+01 1.001e+02 2.862e+02, threshold=1.885e+02, percent-clipped=2.0
2024-09-18 00:40:33,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=333600.0, ans=0.125
2024-09-18 00:40:34,855 INFO [train.py:1198] (0/2) Epoch 19, batch 1950, loss[loss=0.2442, ctc_loss=0.1398, cr_loss=0.394, attn_decoder_loss=0.247, over 29471.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1395, cr_loss=0.3828, attn_decoder_loss=0.2528, over 5819699.83 frames. ], batch size: 78, lr: 5.88e-03, grad_scale: 8.0
2024-09-18 00:40:38,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=333600.0, ans=0.125
2024-09-18 00:40:40,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=12.0
2024-09-18 00:40:56,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=333640.0, ans=0.0
2024-09-18 00:41:02,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0
2024-09-18 00:41:32,942 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0
2024-09-18 00:41:38,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=333760.0, ans=0.125
2024-09-18 00:41:38,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333760.0, ans=0.1
2024-09-18 00:41:50,438 INFO [train.py:1198] (0/2) Epoch 19, batch 2000, loss[loss=0.2171, ctc_loss=0.1207, cr_loss=0.3547, attn_decoder_loss=0.22, over 29346.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1397, cr_loss=0.3824, attn_decoder_loss=0.253, over 5797569.81 frames. ], batch size: 67, lr: 5.88e-03, grad_scale: 16.0
2024-09-18 00:42:06,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=333840.0, ans=0.0
2024-09-18 00:42:19,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=333880.0, ans=0.025
2024-09-18 00:42:31,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=333880.0, ans=0.1
2024-09-18 00:42:40,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=333920.0, ans=0.1
2024-09-18 00:42:41,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=333920.0, ans=0.125
2024-09-18 00:42:46,024 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.622e+01 8.666e+01 9.128e+01 9.687e+01 2.181e+02, threshold=1.826e+02, percent-clipped=3.0
2024-09-18 00:42:53,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=333960.0, ans=0.0
2024-09-18 00:42:55,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=333960.0, ans=0.1
2024-09-18 00:43:07,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=334000.0, ans=0.125
2024-09-18 00:43:08,937 INFO [train.py:1198] (0/2) Epoch 19, batch 2050, loss[loss=0.2298, ctc_loss=0.1297, cr_loss=0.3721, attn_decoder_loss=0.2327, over 29421.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1394, cr_loss=0.3818, attn_decoder_loss=0.2521, over 5790757.28 frames. ], batch size: 70, lr: 5.88e-03, grad_scale: 8.0
2024-09-18 00:43:17,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=334000.0, ans=0.0
2024-09-18 00:43:31,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=334040.0, ans=0.0
2024-09-18 00:43:56,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5
2024-09-18 00:43:58,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=334120.0, ans=0.2
2024-09-18 00:44:16,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=334160.0, ans=0.07
2024-09-18 00:44:27,284 INFO [train.py:1198] (0/2) Epoch 19, batch 2100, loss[loss=0.238, ctc_loss=0.1228, cr_loss=0.3591, attn_decoder_loss=0.2428, over 29744.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1387, cr_loss=0.381, attn_decoder_loss=0.2514, over 5802249.08 frames. ], batch size: 81, lr: 5.88e-03, grad_scale: 8.0
2024-09-18 00:44:58,865 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=12.0
2024-09-18 00:45:20,462 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.379e+01 9.013e+01 9.583e+01 3.257e+02, threshold=1.803e+02, percent-clipped=1.0
2024-09-18 00:45:23,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=334320.0, ans=0.025
2024-09-18 00:45:29,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=22.5
2024-09-18 00:45:42,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=334400.0, ans=0.125
2024-09-18 00:45:44,008 INFO [train.py:1198] (0/2) Epoch 19, batch 2150, loss[loss=0.2525, ctc_loss=0.1449, cr_loss=0.4004, attn_decoder_loss=0.2556, over 29470.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1382, cr_loss=0.3809, attn_decoder_loss=0.2507, over 5816748.26 frames. ], batch size: 78, lr: 5.88e-03, grad_scale: 8.0
2024-09-18 00:46:13,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=334480.0, ans=0.1
2024-09-18 00:46:29,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=334520.0, ans=0.09899494936611666
2024-09-18 00:46:36,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.48 vs. limit=15.0
2024-09-18 00:46:41,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=334520.0, ans=0.0
2024-09-18 00:46:42,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=334520.0, ans=0.2
2024-09-18 00:46:48,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=334560.0, ans=0.2
2024-09-18 00:46:49,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=334560.0, ans=0.125
2024-09-18 00:46:58,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=334560.0, ans=0.0
2024-09-18 00:47:02,261 INFO [train.py:1198] (0/2) Epoch 19, batch 2200, loss[loss=0.2507, ctc_loss=0.1387, cr_loss=0.3793, attn_decoder_loss=0.2547, over 29632.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1389, cr_loss=0.3815, attn_decoder_loss=0.2512, over 5814531.90 frames. ], batch size: 86, lr: 5.87e-03, grad_scale: 8.0
2024-09-18 00:47:26,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.80 vs.
limit=15.0 2024-09-18 00:47:36,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=334680.0, ans=0.2 2024-09-18 00:47:45,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=334680.0, ans=0.125 2024-09-18 00:47:57,756 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.512e+01 9.076e+01 9.778e+01 1.780e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-18 00:48:13,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=334760.0, ans=0.0 2024-09-18 00:48:20,719 INFO [train.py:1198] (0/2) Epoch 19, batch 2250, loss[loss=0.2496, ctc_loss=0.134, cr_loss=0.3697, attn_decoder_loss=0.2542, over 29691.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1383, cr_loss=0.3809, attn_decoder_loss=0.2508, over 5813579.94 frames. ], batch size: 82, lr: 5.87e-03, grad_scale: 8.0 2024-09-18 00:48:30,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334800.0, ans=0.1 2024-09-18 00:48:33,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=334800.0, ans=0.2 2024-09-18 00:48:42,738 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.53 vs. 
limit=15.0 2024-09-18 00:48:52,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=334880.0, ans=0.125 2024-09-18 00:48:54,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=334880.0, ans=0.1 2024-09-18 00:48:58,197 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2024-09-18 00:48:58,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=334880.0, ans=0.0 2024-09-18 00:49:24,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=334960.0, ans=0.0 2024-09-18 00:49:27,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=334960.0, ans=0.0 2024-09-18 00:49:36,427 INFO [train.py:1198] (0/2) Epoch 19, batch 2300, loss[loss=0.2236, ctc_loss=0.1191, cr_loss=0.3605, attn_decoder_loss=0.2272, over 29697.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1376, cr_loss=0.3794, attn_decoder_loss=0.2499, over 5799942.45 frames. 
], batch size: 72, lr: 5.87e-03, grad_scale: 8.0 2024-09-18 00:49:38,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=335000.0, ans=0.125 2024-09-18 00:50:05,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=335080.0, ans=0.125 2024-09-18 00:50:18,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=335080.0, ans=0.0 2024-09-18 00:50:29,716 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.590e+01 9.155e+01 9.781e+01 6.273e+02, threshold=1.831e+02, percent-clipped=2.0 2024-09-18 00:50:32,316 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=12.0 2024-09-18 00:50:42,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.43 vs. limit=15.0 2024-09-18 00:50:47,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=335160.0, ans=0.0 2024-09-18 00:50:55,497 INFO [train.py:1198] (0/2) Epoch 19, batch 2350, loss[loss=0.264, ctc_loss=0.1495, cr_loss=0.3919, attn_decoder_loss=0.268, over 29701.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1382, cr_loss=0.3804, attn_decoder_loss=0.2504, over 5805732.69 frames. 
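The `tot_loss` values in the train.py records are consistent with a weighted sum of the three component losses using the scales configured for this run (`ctc_loss_scale=0.1`, `attention_decoder_loss_scale=0.9`, `cr_loss_scale=0.02`). A sketch under that assumption (the helper name is ours, not icefall's):

```python
def combined_loss(ctc_loss, attn_decoder_loss, cr_loss,
                  ctc_scale=0.1, aed_scale=0.9, cr_scale=0.02):
    # Default scales taken from this run's hyperparameters:
    # ctc_loss_scale=0.1, attention_decoder_loss_scale=0.9, cr_loss_scale=0.02.
    return ctc_scale * ctc_loss + aed_scale * attn_decoder_loss + cr_scale * cr_loss


# Check against the tot_loss logged at epoch 19, batch 2000:
# loss=0.2493, ctc_loss=0.1397, cr_loss=0.3824, attn_decoder_loss=0.253
print(combined_loss(0.1397, 0.253, 0.3824))  # close to the logged 0.2493
```

Note the transducer scales (`simple_loss_scale`, `lm_scale`, `am_scale`) do not enter here, since the run has `use_transducer=False`.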
], batch size: 83, lr: 5.87e-03, grad_scale: 8.0 2024-09-18 00:51:01,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335200.0, ans=0.1 2024-09-18 00:51:09,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=335240.0, ans=0.125 2024-09-18 00:51:38,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=335280.0, ans=0.125 2024-09-18 00:52:13,757 INFO [train.py:1198] (0/2) Epoch 19, batch 2400, loss[loss=0.2331, ctc_loss=0.1264, cr_loss=0.363, attn_decoder_loss=0.2369, over 29541.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1391, cr_loss=0.3824, attn_decoder_loss=0.2512, over 5809545.44 frames. ], batch size: 76, lr: 5.87e-03, grad_scale: 16.0 2024-09-18 00:52:17,927 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0 2024-09-18 00:52:29,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=335440.0, ans=0.025 2024-09-18 00:52:35,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=335440.0, ans=0.0 2024-09-18 00:52:36,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.35 vs. limit=15.0 2024-09-18 00:52:55,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=335480.0, ans=0.125 2024-09-18 00:53:07,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.40 vs. 
limit=15.0 2024-09-18 00:53:08,372 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.603e+01 9.064e+01 9.775e+01 3.534e+02, threshold=1.813e+02, percent-clipped=1.0 2024-09-18 00:53:10,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=335520.0, ans=0.125 2024-09-18 00:53:11,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=335520.0, ans=0.125 2024-09-18 00:53:22,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=335560.0, ans=0.0 2024-09-18 00:53:27,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=335560.0, ans=0.025 2024-09-18 00:53:29,740 INFO [train.py:1198] (0/2) Epoch 19, batch 2450, loss[loss=0.2499, ctc_loss=0.1391, cr_loss=0.3836, attn_decoder_loss=0.2537, over 29708.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1394, cr_loss=0.3827, attn_decoder_loss=0.252, over 5785638.54 frames. ], batch size: 82, lr: 5.87e-03, grad_scale: 8.0 2024-09-18 00:53:32,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.68 vs. limit=22.5 2024-09-18 00:53:43,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=335640.0, ans=0.125 2024-09-18 00:53:45,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. 
limit=6.0 2024-09-18 00:54:09,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335680.0, ans=0.1 2024-09-18 00:54:12,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=335680.0, ans=15.0 2024-09-18 00:54:23,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=335720.0, ans=0.0 2024-09-18 00:54:23,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=335720.0, ans=0.1 2024-09-18 00:54:31,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=335760.0, ans=0.0 2024-09-18 00:54:32,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=335760.0, ans=0.125 2024-09-18 00:54:47,583 INFO [train.py:1198] (0/2) Epoch 19, batch 2500, loss[loss=0.2548, ctc_loss=0.1453, cr_loss=0.3942, attn_decoder_loss=0.2582, over 29636.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1395, cr_loss=0.3832, attn_decoder_loss=0.2521, over 5795828.19 frames. 
], batch size: 86, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 00:54:54,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=335800.0, ans=0.125 2024-09-18 00:55:22,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=335880.0, ans=0.125 2024-09-18 00:55:44,877 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.526e+01 9.010e+01 9.846e+01 5.892e+02, threshold=1.802e+02, percent-clipped=2.0 2024-09-18 00:55:51,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=335960.0, ans=0.125 2024-09-18 00:55:51,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335960.0, ans=0.1 2024-09-18 00:56:05,423 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-84000.pt 2024-09-18 00:56:13,768 INFO [train.py:1198] (0/2) Epoch 19, batch 2550, loss[loss=0.215, ctc_loss=0.1143, cr_loss=0.3409, attn_decoder_loss=0.2187, over 29327.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1391, cr_loss=0.3826, attn_decoder_loss=0.2519, over 5799470.69 frames. ], batch size: 67, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 00:56:35,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.44 vs. 
limit=12.0 2024-09-18 00:56:42,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=336080.0, ans=0.2 2024-09-18 00:56:47,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=336080.0, ans=0.025 2024-09-18 00:56:53,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=336080.0, ans=0.0 2024-09-18 00:56:58,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=336120.0, ans=0.125 2024-09-18 00:57:04,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=336120.0, ans=0.0 2024-09-18 00:57:05,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=336120.0, ans=0.05 2024-09-18 00:57:14,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=336160.0, ans=0.125 2024-09-18 00:57:19,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.36 vs. limit=15.0 2024-09-18 00:57:29,655 INFO [train.py:1198] (0/2) Epoch 19, batch 2600, loss[loss=0.2402, ctc_loss=0.1287, cr_loss=0.3751, attn_decoder_loss=0.2442, over 29426.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1396, cr_loss=0.3834, attn_decoder_loss=0.2526, over 5795382.32 frames. 
], batch size: 78, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 00:57:58,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=336280.0, ans=0.125 2024-09-18 00:58:00,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=336280.0, ans=0.125 2024-09-18 00:58:01,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=336280.0, ans=0.125 2024-09-18 00:58:03,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=336280.0, ans=0.125 2024-09-18 00:58:26,740 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.566e+01 8.963e+01 9.636e+01 1.354e+02, threshold=1.793e+02, percent-clipped=0.0 2024-09-18 00:58:33,112 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:58:43,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=336360.0, ans=0.07 2024-09-18 00:58:47,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=22.5 2024-09-18 00:58:47,576 INFO [train.py:1198] (0/2) Epoch 19, batch 2650, loss[loss=0.2657, ctc_loss=0.1562, cr_loss=0.4106, attn_decoder_loss=0.2687, over 29261.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1396, cr_loss=0.3836, attn_decoder_loss=0.2527, over 5800960.15 frames. 
], batch size: 100, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 00:58:54,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=336400.0, ans=0.125 2024-09-18 00:58:58,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=336400.0, ans=0.125 2024-09-18 00:59:29,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=336480.0, ans=0.125 2024-09-18 00:59:29,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=336480.0, ans=0.0 2024-09-18 00:59:58,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=336560.0, ans=0.0 2024-09-18 01:00:00,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=336560.0, ans=0.125 2024-09-18 01:00:05,872 INFO [train.py:1198] (0/2) Epoch 19, batch 2700, loss[loss=0.2489, ctc_loss=0.1279, cr_loss=0.3681, attn_decoder_loss=0.2541, over 29534.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1399, cr_loss=0.3844, attn_decoder_loss=0.253, over 5796997.58 frames. ], batch size: 87, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 01:00:06,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=336600.0, ans=0.0 2024-09-18 01:00:15,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.84 vs. 
limit=15.0 2024-09-18 01:00:25,711 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:00:28,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=15.0 2024-09-18 01:00:28,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=336640.0, ans=0.125 2024-09-18 01:00:43,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=336680.0, ans=0.0 2024-09-18 01:01:00,333 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 8.475e+01 9.059e+01 9.583e+01 2.142e+02, threshold=1.812e+02, percent-clipped=1.0 2024-09-18 01:01:22,325 INFO [train.py:1198] (0/2) Epoch 19, batch 2750, loss[loss=0.2437, ctc_loss=0.1493, cr_loss=0.4151, attn_decoder_loss=0.2449, over 29532.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.139, cr_loss=0.3824, attn_decoder_loss=0.2517, over 5795108.48 frames. 
], batch size: 75, lr: 5.86e-03, grad_scale: 8.0 2024-09-18 01:01:48,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=336840.0, ans=0.125 2024-09-18 01:01:49,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=336840.0, ans=0.2 2024-09-18 01:01:51,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=336880.0, ans=0.125 2024-09-18 01:01:51,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=336880.0, ans=0.0 2024-09-18 01:01:53,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=336880.0, ans=0.04949747468305833 2024-09-18 01:02:00,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=336880.0, ans=0.2 2024-09-18 01:02:02,095 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:02:02,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.58 vs. limit=22.5 2024-09-18 01:02:04,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.62 vs. 
limit=15.0 2024-09-18 01:02:09,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=336920.0, ans=0.025 2024-09-18 01:02:19,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336920.0, ans=0.1 2024-09-18 01:02:24,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=336960.0, ans=0.07 2024-09-18 01:02:24,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=336960.0, ans=0.125 2024-09-18 01:02:36,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.73 vs. limit=15.0 2024-09-18 01:02:39,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=337000.0, ans=0.125 2024-09-18 01:02:40,762 INFO [train.py:1198] (0/2) Epoch 19, batch 2800, loss[loss=0.2764, ctc_loss=0.1691, cr_loss=0.3678, attn_decoder_loss=0.2801, over 20147.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1395, cr_loss=0.3822, attn_decoder_loss=0.2519, over 5776815.19 frames. ], batch size: 210, lr: 5.85e-03, grad_scale: 16.0 2024-09-18 01:02:40,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=337000.0, ans=0.125 2024-09-18 01:02:43,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.66 vs. 
limit=15.0 2024-09-18 01:02:51,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=337000.0, ans=0.125 2024-09-18 01:03:02,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=337040.0, ans=0.0 2024-09-18 01:03:11,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0 2024-09-18 01:03:12,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=337080.0, ans=0.0 2024-09-18 01:03:36,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=337120.0, ans=0.0 2024-09-18 01:03:38,942 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.828e+01 9.014e+01 9.335e+01 1.020e+02 1.618e+02, threshold=1.867e+02, percent-clipped=0.0 2024-09-18 01:03:58,617 INFO [train.py:1198] (0/2) Epoch 19, batch 2850, loss[loss=0.2326, ctc_loss=0.1264, cr_loss=0.3672, attn_decoder_loss=0.2363, over 29492.00 frames. ], tot_loss[loss=0.2487, ctc_loss=0.1398, cr_loss=0.3827, attn_decoder_loss=0.2523, over 5762014.80 frames. ], batch size: 77, lr: 5.85e-03, grad_scale: 8.0 2024-09-18 01:04:04,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=337200.0, ans=0.0 2024-09-18 01:04:11,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.83 vs. 
limit=12.0 2024-09-18 01:04:17,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=337240.0, ans=0.0 2024-09-18 01:04:20,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=337240.0, ans=0.125 2024-09-18 01:04:21,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=337240.0, ans=0.2 2024-09-18 01:04:34,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=337280.0, ans=0.0 2024-09-18 01:04:46,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=337320.0, ans=0.125 2024-09-18 01:04:57,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=337320.0, ans=0.125 2024-09-18 01:05:06,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337360.0, ans=0.1 2024-09-18 01:05:15,134 INFO [train.py:1198] (0/2) Epoch 19, batch 2900, loss[loss=0.2403, ctc_loss=0.1338, cr_loss=0.3688, attn_decoder_loss=0.2439, over 29439.00 frames. ], tot_loss[loss=0.2496, ctc_loss=0.1404, cr_loss=0.384, attn_decoder_loss=0.2532, over 5787343.58 frames. 
], batch size: 79, lr: 5.85e-03, grad_scale: 8.0 2024-09-18 01:05:25,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=337400.0, ans=0.07 2024-09-18 01:05:51,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=337480.0, ans=0.0 2024-09-18 01:06:05,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-09-18 01:06:13,398 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.650e+01 9.061e+01 9.798e+01 5.022e+02, threshold=1.812e+02, percent-clipped=2.0 2024-09-18 01:06:15,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=337520.0, ans=0.125 2024-09-18 01:06:18,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=337560.0, ans=0.025 2024-09-18 01:06:23,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.66 vs. limit=15.0 2024-09-18 01:06:25,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.30 vs. limit=15.0 2024-09-18 01:06:33,635 INFO [train.py:1198] (0/2) Epoch 19, batch 2950, loss[loss=0.2434, ctc_loss=0.1444, cr_loss=0.3962, attn_decoder_loss=0.2456, over 29519.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1396, cr_loss=0.3821, attn_decoder_loss=0.2521, over 5781292.61 frames. 
], batch size: 75, lr: 5.85e-03, grad_scale: 8.0 2024-09-18 01:06:44,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=337600.0, ans=0.025 2024-09-18 01:06:47,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=337640.0, ans=0.2 2024-09-18 01:07:14,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=337680.0, ans=0.125 2024-09-18 01:07:52,306 INFO [train.py:1198] (0/2) Epoch 19, batch 3000, loss[loss=0.2509, ctc_loss=0.1408, cr_loss=0.3941, attn_decoder_loss=0.2543, over 29774.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1395, cr_loss=0.3818, attn_decoder_loss=0.252, over 5781734.16 frames. ], batch size: 81, lr: 5.85e-03, grad_scale: 8.0 2024-09-18 01:07:52,307 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 01:08:10,715 INFO [train.py:1230] (0/2) Epoch 19, validation: loss=0.2115, ctc_loss=0.0393, cr_loss=5.039e-15, attn_decoder_loss=0.2306, over 944034.00 frames. 
2024-09-18 01:08:10,715 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-18 01:08:46,147 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:08:47,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=337880.0, ans=0.125
2024-09-18 01:08:53,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=337880.0, ans=0.0
2024-09-18 01:08:55,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337920.0, ans=0.1
2024-09-18 01:08:59,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=337920.0, ans=0.025
2024-09-18 01:09:07,057 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.664e+01 9.190e+01 9.808e+01 2.398e+02, threshold=1.838e+02, percent-clipped=1.0
2024-09-18 01:09:08,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337920.0, ans=0.1
2024-09-18 01:09:09,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0
2024-09-18 01:09:20,097 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.99 vs. limit=12.0
2024-09-18 01:09:26,831 INFO [train.py:1198] (0/2) Epoch 19, batch 3050, loss[loss=0.2406, ctc_loss=0.1381, cr_loss=0.3899, attn_decoder_loss=0.2434, over 29517.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1402, cr_loss=0.3835, attn_decoder_loss=0.2528, over 5776057.49 frames. ], batch size: 76, lr: 5.85e-03, grad_scale: 8.0
2024-09-18 01:09:53,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=338040.0, ans=0.125
2024-09-18 01:10:27,664 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.23 vs. limit=15.0
2024-09-18 01:10:27,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0
2024-09-18 01:10:31,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=338160.0, ans=0.025
2024-09-18 01:10:34,810 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:10:45,194 INFO [train.py:1198] (0/2) Epoch 19, batch 3100, loss[loss=0.2581, ctc_loss=0.1463, cr_loss=0.4052, attn_decoder_loss=0.2615, over 29293.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1397, cr_loss=0.3831, attn_decoder_loss=0.2524, over 5776086.36 frames. ], batch size: 100, lr: 5.84e-03, grad_scale: 8.0
2024-09-18 01:11:03,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=12.0
2024-09-18 01:11:06,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=338240.0, ans=15.0
2024-09-18 01:11:14,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=338240.0, ans=0.07
2024-09-18 01:11:19,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=338280.0, ans=0.125
2024-09-18 01:11:27,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.67 vs. limit=15.0
2024-09-18 01:11:28,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=338280.0, ans=0.0
2024-09-18 01:11:43,973 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.533e+01 9.118e+01 9.870e+01 1.992e+02, threshold=1.824e+02, percent-clipped=1.0
2024-09-18 01:11:50,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=338360.0, ans=0.125
2024-09-18 01:11:54,823 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:12:04,280 INFO [train.py:1198] (0/2) Epoch 19, batch 3150, loss[loss=0.2544, ctc_loss=0.1313, cr_loss=0.354, attn_decoder_loss=0.2602, over 28794.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1399, cr_loss=0.3834, attn_decoder_loss=0.2525, over 5782716.24 frames. ], batch size: 104, lr: 5.84e-03, grad_scale: 8.0
2024-09-18 01:12:22,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.37 vs. limit=15.0
2024-09-18 01:12:33,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=338480.0, ans=0.0
2024-09-18 01:12:54,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=338520.0, ans=0.1
2024-09-18 01:12:56,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=338520.0, ans=0.125
2024-09-18 01:12:59,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=338520.0, ans=0.2
2024-09-18 01:13:03,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.90 vs. limit=10.0
2024-09-18 01:13:08,567 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:13:17,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=338560.0, ans=0.07
2024-09-18 01:13:20,357 INFO [train.py:1198] (0/2) Epoch 19, batch 3200, loss[loss=0.2421, ctc_loss=0.1328, cr_loss=0.3794, attn_decoder_loss=0.2458, over 29393.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1395, cr_loss=0.3828, attn_decoder_loss=0.2519, over 5793521.08 frames. ], batch size: 79, lr: 5.84e-03, grad_scale: 16.0
2024-09-18 01:13:35,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=338640.0, ans=0.0
2024-09-18 01:13:36,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=22.5
2024-09-18 01:13:55,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.80 vs. limit=15.0
2024-09-18 01:14:00,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=338680.0, ans=0.2
2024-09-18 01:14:00,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=338680.0, ans=0.0
2024-09-18 01:14:14,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=338720.0, ans=0.025
2024-09-18 01:14:20,072 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.580e+01 9.076e+01 9.687e+01 2.351e+02, threshold=1.815e+02, percent-clipped=1.0
2024-09-18 01:14:31,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=338760.0, ans=0.125
2024-09-18 01:14:38,513 INFO [train.py:1198] (0/2) Epoch 19, batch 3250, loss[loss=0.2539, ctc_loss=0.134, cr_loss=0.3724, attn_decoder_loss=0.2589, over 29689.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1393, cr_loss=0.3831, attn_decoder_loss=0.2521, over 5799907.71 frames. ], batch size: 84, lr: 5.84e-03, grad_scale: 8.0
2024-09-18 01:14:42,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.86 vs. limit=22.5
2024-09-18 01:14:46,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=338800.0, ans=0.0
2024-09-18 01:14:47,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=338800.0, ans=0.125
2024-09-18 01:15:06,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=338840.0, ans=0.0
2024-09-18 01:15:29,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=338920.0, ans=0.125
2024-09-18 01:15:30,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=338920.0, ans=10.0
2024-09-18 01:15:32,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=338920.0, ans=0.0
2024-09-18 01:15:33,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=338920.0, ans=0.125
2024-09-18 01:15:33,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=338920.0, ans=0.125
2024-09-18 01:15:41,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338960.0, ans=0.1
2024-09-18 01:15:46,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.92 vs. limit=15.0
2024-09-18 01:15:56,362 INFO [train.py:1198] (0/2) Epoch 19, batch 3300, loss[loss=0.2517, ctc_loss=0.1394, cr_loss=0.3689, attn_decoder_loss=0.256, over 28336.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1382, cr_loss=0.381, attn_decoder_loss=0.2509, over 5797066.12 frames. ], batch size: 111, lr: 5.84e-03, grad_scale: 8.0
2024-09-18 01:15:58,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=339000.0, ans=0.125
2024-09-18 01:16:01,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=339000.0, ans=0.0
2024-09-18 01:16:32,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=339080.0, ans=0.0
2024-09-18 01:16:53,881 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.663e+01 9.126e+01 9.763e+01 2.623e+02, threshold=1.825e+02, percent-clipped=1.0
2024-09-18 01:17:12,571 INFO [train.py:1198] (0/2) Epoch 19, batch 3350, loss[loss=0.2746, ctc_loss=0.1638, cr_loss=0.4372, attn_decoder_loss=0.2772, over 28800.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1394, cr_loss=0.3825, attn_decoder_loss=0.2519, over 5774027.54 frames. ], batch size: 104, lr: 5.84e-03, grad_scale: 8.0
2024-09-18 01:17:30,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=339240.0, ans=0.125
2024-09-18 01:17:32,580 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.88 vs. limit=15.0
2024-09-18 01:17:42,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=339240.0, ans=0.05
2024-09-18 01:17:42,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=339240.0, ans=0.125
2024-09-18 01:17:44,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=339280.0, ans=0.0
2024-09-18 01:18:03,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=339320.0, ans=0.0
2024-09-18 01:18:14,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=339360.0, ans=0.125
2024-09-18 01:18:22,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=339360.0, ans=0.125
2024-09-18 01:18:23,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=339360.0, ans=0.125
2024-09-18 01:18:25,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=339360.0, ans=0.0
2024-09-18 01:18:28,663 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5
2024-09-18 01:18:30,833 INFO [train.py:1198] (0/2) Epoch 19, batch 3400, loss[loss=0.2195, ctc_loss=0.1257, cr_loss=0.3674, attn_decoder_loss=0.2218, over 29339.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1394, cr_loss=0.3822, attn_decoder_loss=0.2518, over 5766468.41 frames. ], batch size: 67, lr: 5.83e-03, grad_scale: 8.0
2024-09-18 01:18:47,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=339440.0, ans=0.125
2024-09-18 01:19:10,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.68 vs. limit=10.0
2024-09-18 01:19:15,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=339480.0, ans=0.125
2024-09-18 01:19:27,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=339520.0, ans=0.125
2024-09-18 01:19:30,515 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.511e+01 9.195e+01 9.878e+01 2.681e+02, threshold=1.839e+02, percent-clipped=1.0
2024-09-18 01:19:44,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=339560.0, ans=0.0
2024-09-18 01:19:45,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=339560.0, ans=0.125
2024-09-18 01:19:48,775 INFO [train.py:1198] (0/2) Epoch 19, batch 3450, loss[loss=0.2656, ctc_loss=0.155, cr_loss=0.3924, attn_decoder_loss=0.2691, over 28328.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1394, cr_loss=0.3827, attn_decoder_loss=0.2521, over 5774687.01 frames. ], batch size: 111, lr: 5.83e-03, grad_scale: 8.0
2024-09-18 01:19:48,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=339600.0, ans=0.0
2024-09-18 01:20:17,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=339680.0, ans=0.2
2024-09-18 01:20:28,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=339680.0, ans=0.125
2024-09-18 01:20:49,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=339760.0, ans=0.0
2024-09-18 01:20:54,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=339760.0, ans=0.1
2024-09-18 01:20:55,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=339760.0, ans=0.125
2024-09-18 01:21:04,601 INFO [train.py:1198] (0/2) Epoch 19, batch 3500, loss[loss=0.2133, ctc_loss=0.1045, cr_loss=0.3062, attn_decoder_loss=0.2186, over 29332.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.139, cr_loss=0.3816, attn_decoder_loss=0.2513, over 5775661.99 frames. ], batch size: 71, lr: 5.83e-03, grad_scale: 8.0
2024-09-18 01:21:08,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0
2024-09-18 01:21:46,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=339880.0, ans=0.2
2024-09-18 01:22:04,060 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.228e+01 8.478e+01 8.934e+01 9.584e+01 2.565e+02, threshold=1.787e+02, percent-clipped=1.0
2024-09-18 01:22:16,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.23 vs. limit=15.0
2024-09-18 01:22:22,259 INFO [train.py:1198] (0/2) Epoch 19, batch 3550, loss[loss=0.2606, ctc_loss=0.1401, cr_loss=0.3966, attn_decoder_loss=0.2651, over 29731.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.139, cr_loss=0.3819, attn_decoder_loss=0.2514, over 5782115.28 frames. ], batch size: 89, lr: 5.83e-03, grad_scale: 8.0
2024-09-18 01:22:22,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=340000.0, ans=0.0
2024-09-18 01:22:38,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=340040.0, ans=0.05
2024-09-18 01:22:44,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=340040.0, ans=0.0
2024-09-18 01:22:46,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=340040.0, ans=0.2
2024-09-18 01:22:59,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=340080.0, ans=0.0
2024-09-18 01:23:04,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=340080.0, ans=0.0
2024-09-18 01:23:09,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=15.0
2024-09-18 01:23:19,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=340120.0, ans=0.0
2024-09-18 01:23:28,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=340160.0, ans=0.025
2024-09-18 01:23:38,726 INFO [train.py:1198] (0/2) Epoch 19, batch 3600, loss[loss=0.2448, ctc_loss=0.1339, cr_loss=0.3589, attn_decoder_loss=0.2492, over 29532.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1391, cr_loss=0.3817, attn_decoder_loss=0.2518, over 5791026.17 frames. ], batch size: 77, lr: 5.83e-03, grad_scale: 16.0
2024-09-18 01:23:43,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=340200.0, ans=0.0
2024-09-18 01:23:46,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=340200.0, ans=0.2
2024-09-18 01:24:12,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=340280.0, ans=0.025
2024-09-18 01:24:31,988 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.83 vs. limit=10.0
2024-09-18 01:24:37,099 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.610e+01 9.225e+01 9.925e+01 8.683e+02, threshold=1.845e+02, percent-clipped=1.0
2024-09-18 01:24:40,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=340360.0, ans=0.125
2024-09-18 01:24:42,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.52 vs. limit=12.0
2024-09-18 01:24:53,605 INFO [train.py:1198] (0/2) Epoch 19, batch 3650, loss[loss=0.2655, ctc_loss=0.1451, cr_loss=0.4055, attn_decoder_loss=0.2699, over 29514.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1381, cr_loss=0.3803, attn_decoder_loss=0.2508, over 5791858.87 frames. ], batch size: 90, lr: 5.83e-03, grad_scale: 8.0
2024-09-18 01:25:14,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=340440.0, ans=0.125
2024-09-18 01:25:37,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=340520.0, ans=0.125
2024-09-18 01:25:44,472 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.63 vs. limit=15.0
2024-09-18 01:25:46,900 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0
2024-09-18 01:25:52,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=340560.0, ans=0.2
2024-09-18 01:26:00,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=340560.0, ans=0.0
2024-09-18 01:26:08,880 INFO [train.py:1198] (0/2) Epoch 19, batch 3700, loss[loss=0.2593, ctc_loss=0.1501, cr_loss=0.3898, attn_decoder_loss=0.2628, over 29696.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1381, cr_loss=0.3802, attn_decoder_loss=0.2509, over 5800885.86 frames. ], batch size: 84, lr: 5.82e-03, grad_scale: 8.0
2024-09-18 01:26:10,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340600.0, ans=0.1
2024-09-18 01:26:15,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=340600.0, ans=0.125
2024-09-18 01:26:22,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340640.0, ans=0.1
2024-09-18 01:26:39,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.64 vs. limit=10.0
2024-09-18 01:27:07,341 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.588e+01 9.238e+01 9.671e+01 4.711e+02, threshold=1.848e+02, percent-clipped=1.0
2024-09-18 01:27:09,758 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.14 vs. limit=15.0
2024-09-18 01:27:24,464 INFO [train.py:1198] (0/2) Epoch 19, batch 3750, loss[loss=0.2182, ctc_loss=0.1168, cr_loss=0.3422, attn_decoder_loss=0.2218, over 29356.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1384, cr_loss=0.381, attn_decoder_loss=0.251, over 5805059.54 frames. ], batch size: 67, lr: 5.82e-03, grad_scale: 8.0
2024-09-18 01:27:27,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=340800.0, ans=0.125
2024-09-18 01:27:29,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=340800.0, ans=0.0
2024-09-18 01:27:44,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=340840.0, ans=0.125
2024-09-18 01:27:59,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=340880.0, ans=0.125
2024-09-18 01:28:01,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=340880.0, ans=0.125
2024-09-18 01:28:17,096 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=15.0
2024-09-18 01:28:30,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0
2024-09-18 01:28:41,212 INFO [train.py:1198] (0/2) Epoch 19, batch 3800, loss[loss=0.2665, ctc_loss=0.1505, cr_loss=0.4011, attn_decoder_loss=0.2705, over 29634.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1383, cr_loss=0.3801, attn_decoder_loss=0.2508, over 5797336.56 frames. ], batch size: 86, lr: 5.82e-03, grad_scale: 8.0
2024-09-18 01:28:41,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=341000.0, ans=0.125
2024-09-18 01:29:05,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=341040.0, ans=0.125
2024-09-18 01:29:16,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=341080.0, ans=0.0
2024-09-18 01:29:25,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=341120.0, ans=0.0
2024-09-18 01:29:39,592 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.913e+01 9.389e+01 9.954e+01 1.370e+02, threshold=1.878e+02, percent-clipped=0.0
2024-09-18 01:29:45,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0
2024-09-18 01:29:49,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=341160.0, ans=0.125
2024-09-18 01:29:51,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=341160.0, ans=0.125
2024-09-18 01:29:53,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=341160.0, ans=0.0
2024-09-18 01:29:57,755 INFO [train.py:1198] (0/2) Epoch 19, batch 3850, loss[loss=0.2724, ctc_loss=0.1619, cr_loss=0.4002, attn_decoder_loss=0.2758, over 29297.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1381, cr_loss=0.3799, attn_decoder_loss=0.2506, over 5811062.32 frames. ], batch size: 100, lr: 5.82e-03, grad_scale: 8.0
2024-09-18 01:30:01,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=341200.0, ans=0.125
2024-09-18 01:30:11,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=341240.0, ans=0.125
2024-09-18 01:30:21,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=341240.0, ans=0.125
2024-09-18 01:30:35,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=341280.0, ans=0.1
2024-09-18 01:30:59,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=341360.0, ans=0.0
2024-09-18 01:31:12,335 INFO [train.py:1198] (0/2) Epoch 19, batch 3900, loss[loss=0.2652, ctc_loss=0.1505, cr_loss=0.4217, attn_decoder_loss=0.2686, over 29628.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1385, cr_loss=0.3813, attn_decoder_loss=0.2512, over 5815180.10 frames. ], batch size: 86, lr: 5.82e-03, grad_scale: 8.0
2024-09-18 01:31:21,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=341400.0, ans=0.0
2024-09-18 01:31:29,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0
2024-09-18 01:31:37,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=341440.0, ans=0.125
2024-09-18 01:31:39,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=341440.0, ans=0.0
2024-09-18 01:31:58,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=341520.0, ans=0.05
2024-09-18 01:32:10,246 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.574e+01 8.925e+01 9.348e+01 1.659e+02, threshold=1.785e+02, percent-clipped=0.0
2024-09-18 01:32:12,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=341560.0, ans=0.95
2024-09-18 01:32:13,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=341560.0, ans=0.0
2024-09-18 01:32:18,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=341560.0, ans=0.2
2024-09-18 01:32:18,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=341560.0, ans=0.125
2024-09-18 01:32:27,248 INFO [train.py:1198] (0/2) Epoch 19, batch 3950, loss[loss=0.2606, ctc_loss=0.1425, cr_loss=0.3952, attn_decoder_loss=0.2649, over 29450.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1378, cr_loss=0.3811, attn_decoder_loss=0.2512, over 5834807.87 frames. ], batch size: 97, lr: 5.81e-03, grad_scale: 8.0
2024-09-18 01:32:40,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=341640.0, ans=0.2
2024-09-18 01:32:54,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=341640.0, ans=0.0
2024-09-18 01:32:58,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=341680.0, ans=0.025
2024-09-18 01:33:34,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=341760.0, ans=0.0
2024-09-18 01:33:40,201 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:33:42,799 INFO [train.py:1198] (0/2) Epoch 19, batch 4000, loss[loss=0.2372, ctc_loss=0.1347, cr_loss=0.382, attn_decoder_loss=0.2401, over 29499.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1385, cr_loss=0.3814, attn_decoder_loss=0.2514, over 5811936.68 frames. ], batch size: 74, lr: 5.81e-03, grad_scale: 16.0
2024-09-18 01:33:43,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=341800.0, ans=0.125
2024-09-18 01:33:45,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=341800.0, ans=0.0
2024-09-18 01:33:53,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=341800.0, ans=0.035
2024-09-18 01:33:56,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=341840.0, ans=0.125
2024-09-18 01:33:59,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=341840.0, ans=0.0
2024-09-18 01:34:06,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=341840.0, ans=0.125
2024-09-18 01:34:26,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=15.0
2024-09-18 01:34:41,825 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.874e+01 9.386e+01 1.032e+02 2.674e+02, threshold=1.877e+02, percent-clipped=1.0
2024-09-18 01:34:50,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=341960.0, ans=0.125
2024-09-18 01:34:57,877 INFO [train.py:1198] (0/2) Epoch 19, batch 4050, loss[loss=0.2775, ctc_loss=0.1817, cr_loss=0.4095, attn_decoder_loss=0.2791, over 20582.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1387, cr_loss=0.381, attn_decoder_loss=0.2513, over 5795892.34 frames. ], batch size: 209, lr: 5.81e-03, grad_scale: 8.0
2024-09-18 01:35:00,242 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0
2024-09-18 01:35:06,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=342000.0, ans=0.125
2024-09-18 01:35:11,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=342040.0, ans=0.125
2024-09-18 01:35:11,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342040.0, ans=0.1
2024-09-18 01:35:46,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=342120.0, ans=0.125
2024-09-18 01:35:52,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=342120.0, ans=0.0
2024-09-18 01:35:59,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.14 vs. limit=22.5
2024-09-18 01:36:05,907 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:36:06,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0
2024-09-18 01:36:11,638 INFO [train.py:1198] (0/2) Epoch 19, batch 4100, loss[loss=0.2688, ctc_loss=0.1603, cr_loss=0.4115, attn_decoder_loss=0.2718, over 29534.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1387, cr_loss=0.3809, attn_decoder_loss=0.2514, over 5791409.23 frames. ], batch size: 90, lr: 5.81e-03, grad_scale: 8.0
2024-09-18 01:36:16,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=342200.0, ans=0.125
2024-09-18 01:36:41,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=342280.0, ans=0.04949747468305833
2024-09-18 01:36:47,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_na.min_abs, batch_count=342280.0, ans=0.02
2024-09-18 01:37:02,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.43 vs. limit=15.0
2024-09-18 01:37:06,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=342320.0, ans=0.125
2024-09-18 01:37:11,591 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.625e+01 9.215e+01 9.767e+01 2.484e+02, threshold=1.843e+02, percent-clipped=3.0
2024-09-18 01:37:12,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=342360.0, ans=0.125
2024-09-18 01:37:27,164 INFO [train.py:1198] (0/2) Epoch 19, batch 4150, loss[loss=0.2428, ctc_loss=0.1369, cr_loss=0.3842, attn_decoder_loss=0.246, over 29471.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1385, cr_loss=0.3801, attn_decoder_loss=0.2509, over 5797376.13 frames. ], batch size: 77, lr: 5.81e-03, grad_scale: 8.0
2024-09-18 01:37:27,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=342400.0, ans=0.125
2024-09-18 01:37:37,102 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0
2024-09-18 01:37:55,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=342480.0, ans=0.125
2024-09-18 01:38:23,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.82 vs. limit=22.5
2024-09-18 01:38:28,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.98 vs. limit=15.0
2024-09-18 01:38:39,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=342600.0, ans=0.1
2024-09-18 01:38:40,957 INFO [train.py:1198] (0/2) Epoch 19, batch 4200, loss[loss=0.2702, ctc_loss=0.1547, cr_loss=0.4237, attn_decoder_loss=0.2737, over 29534.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1381, cr_loss=0.3799, attn_decoder_loss=0.251, over 5799443.61 frames. ], batch size: 90, lr: 5.81e-03, grad_scale: 8.0
2024-09-18 01:38:42,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=342600.0, ans=0.1
2024-09-18 01:38:51,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=342600.0, ans=0.125
2024-09-18 01:39:01,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342640.0, ans=0.1
2024-09-18 01:39:13,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=342680.0, ans=10.0
2024-09-18 01:39:41,093 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.502e+01 9.115e+01 9.695e+01 2.005e+02, threshold=1.823e+02, percent-clipped=1.0
2024-09-18 01:39:55,927 INFO [train.py:1198] (0/2) Epoch 19, batch 4250, loss[loss=0.2382, ctc_loss=0.126, cr_loss=0.3465, attn_decoder_loss=0.243, over 29515.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1386, cr_loss=0.3805, attn_decoder_loss=0.2514, over 5805464.58 frames. ], batch size: 74, lr: 5.80e-03, grad_scale: 8.0
2024-09-18 01:39:58,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.64 vs. 
limit=10.0 2024-09-18 01:40:03,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=342800.0, ans=0.125 2024-09-18 01:40:27,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=342880.0, ans=0.5 2024-09-18 01:40:52,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=342920.0, ans=0.0 2024-09-18 01:41:02,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=342960.0, ans=0.125 2024-09-18 01:41:03,141 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.19 vs. limit=22.5 2024-09-18 01:41:11,123 INFO [train.py:1198] (0/2) Epoch 19, batch 4300, loss[loss=0.2618, ctc_loss=0.1465, cr_loss=0.3836, attn_decoder_loss=0.2661, over 29574.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1382, cr_loss=0.3798, attn_decoder_loss=0.2515, over 5795035.70 frames. 
], batch size: 87, lr: 5.80e-03, grad_scale: 8.0 2024-09-18 01:41:27,950 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:41:41,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=343080.0, ans=0.125 2024-09-18 01:41:53,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=343080.0, ans=0.0 2024-09-18 01:41:54,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=343120.0, ans=0.025 2024-09-18 01:41:56,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=343120.0, ans=0.0 2024-09-18 01:41:57,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=343120.0, ans=0.0 2024-09-18 01:42:10,630 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.890e+01 9.360e+01 1.027e+02 1.828e+02, threshold=1.872e+02, percent-clipped=1.0 2024-09-18 01:42:10,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=343160.0, ans=10.0 2024-09-18 01:42:18,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=343160.0, ans=0.0 2024-09-18 01:42:27,030 INFO [train.py:1198] (0/2) Epoch 19, batch 4350, loss[loss=0.2769, ctc_loss=0.1679, cr_loss=0.4427, attn_decoder_loss=0.2791, over 29539.00 frames. ], tot_loss[loss=0.2508, ctc_loss=0.1406, cr_loss=0.3843, attn_decoder_loss=0.2545, over 5797664.17 frames. 
], batch size: 97, lr: 5.80e-03, grad_scale: 8.0 2024-09-18 01:42:30,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=343200.0, ans=0.125 2024-09-18 01:42:37,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=343200.0, ans=0.125 2024-09-18 01:42:48,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=343240.0, ans=0.125 2024-09-18 01:42:56,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=343280.0, ans=0.0 2024-09-18 01:43:01,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=343280.0, ans=0.125 2024-09-18 01:43:16,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2024-09-18 01:43:34,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.22 vs. limit=22.5 2024-09-18 01:43:38,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=343360.0, ans=0.2 2024-09-18 01:43:40,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0 2024-09-18 01:43:41,024 INFO [train.py:1198] (0/2) Epoch 19, batch 4400, loss[loss=0.2699, ctc_loss=0.1691, cr_loss=0.4207, attn_decoder_loss=0.2718, over 27098.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1422, cr_loss=0.3874, attn_decoder_loss=0.2567, over 5769029.54 frames. 
], batch size: 124, lr: 5.80e-03, grad_scale: 16.0 2024-09-18 01:44:04,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.81 vs. limit=15.0 2024-09-18 01:44:27,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=343520.0, ans=0.125 2024-09-18 01:44:28,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=343520.0, ans=0.0 2024-09-18 01:44:28,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=343520.0, ans=0.025 2024-09-18 01:44:32,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=343520.0, ans=0.125 2024-09-18 01:44:34,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=343520.0, ans=0.125 2024-09-18 01:44:35,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343520.0, ans=0.1 2024-09-18 01:44:41,230 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.011e+01 9.170e+01 9.647e+01 1.019e+02 1.899e+02, threshold=1.929e+02, percent-clipped=1.0 2024-09-18 01:44:55,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=15.0 2024-09-18 01:44:55,330 INFO [train.py:1198] (0/2) Epoch 19, batch 4450, loss[loss=0.2763, ctc_loss=0.1854, cr_loss=0.4312, attn_decoder_loss=0.2768, over 20408.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1469, cr_loss=0.3928, attn_decoder_loss=0.2596, over 5585000.48 frames. 
], batch size: 209, lr: 5.80e-03, grad_scale: 8.0 2024-09-18 01:45:07,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=343600.0, ans=0.125 2024-09-18 01:45:20,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=343640.0, ans=0.04949747468305833 2024-09-18 01:45:20,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=343640.0, ans=0.125 2024-09-18 01:45:23,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=343640.0, ans=0.125 2024-09-18 01:45:24,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=343680.0, ans=0.2 2024-09-18 01:45:34,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=343680.0, ans=0.2 2024-09-18 01:45:54,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343760.0, ans=0.1 2024-09-18 01:46:00,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=343760.0, ans=0.125 2024-09-18 01:46:11,295 INFO [train.py:1198] (0/2) Epoch 19, batch 4500, loss[loss=0.2821, ctc_loss=0.194, cr_loss=0.4192, attn_decoder_loss=0.2826, over 19830.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1518, cr_loss=0.3955, attn_decoder_loss=0.262, over 5240343.05 frames. 
], batch size: 209, lr: 5.80e-03, grad_scale: 8.0 2024-09-18 01:46:14,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=343800.0, ans=0.0 2024-09-18 01:46:20,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=343800.0, ans=0.0 2024-09-18 01:46:29,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=343840.0, ans=0.0 2024-09-18 01:46:31,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=343840.0, ans=0.2 2024-09-18 01:46:41,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343880.0, ans=0.1 2024-09-18 01:46:43,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=343880.0, ans=0.025 2024-09-18 01:46:45,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=343880.0, ans=15.0 2024-09-18 01:46:48,754 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-19.pt 2024-09-18 01:47:35,798 INFO [train.py:1198] (0/2) Epoch 20, batch 0, loss[loss=0.2242, ctc_loss=0.1163, cr_loss=0.3464, attn_decoder_loss=0.2285, over 29626.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1163, cr_loss=0.3464, attn_decoder_loss=0.2285, over 29626.00 frames. 
], batch size: 73, lr: 5.65e-03, grad_scale: 16.0 2024-09-18 01:47:35,799 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 01:47:41,085 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.0105, 2.9573, 3.0339, 3.3353], device='cuda:0') 2024-09-18 01:47:52,984 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7348, 4.6365, 4.4636, 4.2583], device='cuda:0') 2024-09-18 01:47:54,257 INFO [train.py:1230] (0/2) Epoch 20, validation: loss=0.2118, ctc_loss=0.0395, cr_loss=4.878e-15, attn_decoder_loss=0.2309, over 944034.00 frames. 2024-09-18 01:47:54,257 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 01:48:23,230 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 9.011e+01 1.094e+02 1.165e+02 1.257e+02 3.397e+02, threshold=2.331e+02, percent-clipped=2.0 2024-09-18 01:48:26,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=343980.0, ans=0.125 2024-09-18 01:48:31,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=343980.0, ans=0.07 2024-09-18 01:48:37,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343980.0, ans=0.1 2024-09-18 01:49:12,431 INFO [train.py:1198] (0/2) Epoch 20, batch 50, loss[loss=0.2231, ctc_loss=0.1199, cr_loss=0.3623, attn_decoder_loss=0.2265, over 29427.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1408, cr_loss=0.3845, attn_decoder_loss=0.2527, over 1267027.01 frames. ], batch size: 70, lr: 5.64e-03, grad_scale: 4.0 2024-09-18 01:49:19,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.10 vs. 
limit=15.0 2024-09-18 01:49:31,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=22.5 2024-09-18 01:49:40,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2024-09-18 01:49:44,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=344180.0, ans=0.125 2024-09-18 01:49:50,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=344180.0, ans=0.0 2024-09-18 01:50:09,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=344220.0, ans=0.0 2024-09-18 01:50:09,553 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2024-09-18 01:50:10,597 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 01:50:11,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.16 vs. limit=15.0 2024-09-18 01:50:24,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=344260.0, ans=0.09899494936611666 2024-09-18 01:50:24,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=344260.0, ans=0.125 2024-09-18 01:50:28,289 INFO [train.py:1198] (0/2) Epoch 20, batch 100, loss[loss=0.2462, ctc_loss=0.1381, cr_loss=0.3805, attn_decoder_loss=0.2498, over 29537.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1428, cr_loss=0.3888, attn_decoder_loss=0.255, over 2252985.71 frames. 
], batch size: 76, lr: 5.64e-03, grad_scale: 8.0 2024-09-18 01:50:51,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=344340.0, ans=0.5 2024-09-18 01:50:55,267 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.781e+01 9.298e+01 1.012e+02 1.493e+02, threshold=1.860e+02, percent-clipped=0.0 2024-09-18 01:51:03,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=344380.0, ans=0.0 2024-09-18 01:51:10,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.92 vs. limit=8.0 2024-09-18 01:51:24,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=344420.0, ans=0.1 2024-09-18 01:51:35,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=344460.0, ans=0.0 2024-09-18 01:51:42,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=344460.0, ans=0.125 2024-09-18 01:51:44,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=344500.0, ans=0.125 2024-09-18 01:51:45,547 INFO [train.py:1198] (0/2) Epoch 20, batch 150, loss[loss=0.2302, ctc_loss=0.1297, cr_loss=0.3543, attn_decoder_loss=0.2335, over 29436.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1393, cr_loss=0.3824, attn_decoder_loss=0.2522, over 3048574.84 frames. 
], batch size: 70, lr: 5.64e-03, grad_scale: 8.0 2024-09-18 01:51:48,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=344500.0, ans=0.0 2024-09-18 01:51:57,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=344500.0, ans=0.04949747468305833 2024-09-18 01:52:09,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=344540.0, ans=0.125 2024-09-18 01:52:19,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=344580.0, ans=0.125 2024-09-18 01:52:30,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=344580.0, ans=0.125 2024-09-18 01:52:57,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.66 vs. limit=22.5 2024-09-18 01:53:00,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=12.0 2024-09-18 01:53:03,337 INFO [train.py:1198] (0/2) Epoch 20, batch 200, loss[loss=0.2557, ctc_loss=0.1437, cr_loss=0.3875, attn_decoder_loss=0.2595, over 27342.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.138, cr_loss=0.3807, attn_decoder_loss=0.2509, over 3660866.33 frames. 
], batch size: 124, lr: 5.64e-03, grad_scale: 8.0 2024-09-18 01:53:12,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=344700.0, ans=0.125 2024-09-18 01:53:30,636 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.380e+01 8.894e+01 9.610e+01 1.111e+02, threshold=1.779e+02, percent-clipped=0.0 2024-09-18 01:53:35,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=344780.0, ans=0.125 2024-09-18 01:53:42,813 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0 2024-09-18 01:53:50,043 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.22 vs. limit=22.5 2024-09-18 01:53:54,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=344820.0, ans=0.0 2024-09-18 01:54:19,408 INFO [train.py:1198] (0/2) Epoch 20, batch 250, loss[loss=0.2619, ctc_loss=0.1474, cr_loss=0.4084, attn_decoder_loss=0.2656, over 29233.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1381, cr_loss=0.3816, attn_decoder_loss=0.2509, over 4143634.82 frames. 
], batch size: 100, lr: 5.64e-03, grad_scale: 8.0 2024-09-18 01:54:19,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=344900.0, ans=0.0 2024-09-18 01:54:21,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=344900.0, ans=0.05 2024-09-18 01:54:36,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=344940.0, ans=0.125 2024-09-18 01:54:49,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=344980.0, ans=0.125 2024-09-18 01:54:54,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344980.0, ans=0.1 2024-09-18 01:54:54,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=344980.0, ans=0.2 2024-09-18 01:55:06,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=345020.0, ans=0.05 2024-09-18 01:55:14,384 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.37 vs. limit=15.0 2024-09-18 01:55:34,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=345060.0, ans=0.0 2024-09-18 01:55:37,650 INFO [train.py:1198] (0/2) Epoch 20, batch 300, loss[loss=0.2725, ctc_loss=0.1647, cr_loss=0.451, attn_decoder_loss=0.2745, over 29532.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1373, cr_loss=0.3804, attn_decoder_loss=0.2504, over 4511045.12 frames. 
], batch size: 92, lr: 5.64e-03, grad_scale: 8.0 2024-09-18 01:55:45,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=345100.0, ans=0.125 2024-09-18 01:56:07,394 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.480e+01 8.946e+01 9.469e+01 2.628e+02, threshold=1.789e+02, percent-clipped=1.0 2024-09-18 01:56:17,428 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2024-09-18 01:56:56,017 INFO [train.py:1198] (0/2) Epoch 20, batch 350, loss[loss=0.2194, ctc_loss=0.1089, cr_loss=0.3294, attn_decoder_loss=0.2244, over 29343.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1375, cr_loss=0.3806, attn_decoder_loss=0.2509, over 4795450.48 frames. ], batch size: 71, lr: 5.63e-03, grad_scale: 8.0 2024-09-18 01:56:58,421 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-09-18 01:57:14,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2024-09-18 01:57:24,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=345380.0, ans=0.0 2024-09-18 01:57:50,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=345420.0, ans=0.125 2024-09-18 01:57:53,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=345420.0, ans=0.0 2024-09-18 01:58:04,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. 
limit=15.0 2024-09-18 01:58:11,300 INFO [train.py:1198] (0/2) Epoch 20, batch 400, loss[loss=0.2529, ctc_loss=0.1453, cr_loss=0.3959, attn_decoder_loss=0.2561, over 29713.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1364, cr_loss=0.3782, attn_decoder_loss=0.2502, over 5026525.13 frames. ], batch size: 82, lr: 5.63e-03, grad_scale: 16.0 2024-09-18 01:58:13,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=345500.0, ans=0.125 2024-09-18 01:58:27,543 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=10.68 vs. limit=12.0 2024-09-18 01:58:40,352 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.703e+01 9.237e+01 1.010e+02 2.283e+02, threshold=1.847e+02, percent-clipped=1.0 2024-09-18 01:58:48,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=345580.0, ans=0.0 2024-09-18 01:59:20,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=345660.0, ans=0.09899494936611666 2024-09-18 01:59:30,512 INFO [train.py:1198] (0/2) Epoch 20, batch 450, loss[loss=0.2557, ctc_loss=0.1478, cr_loss=0.3998, attn_decoder_loss=0.2588, over 29684.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1368, cr_loss=0.3783, attn_decoder_loss=0.2504, over 5187987.42 frames. 
], batch size: 83, lr: 5.63e-03, grad_scale: 8.0 2024-09-18 02:00:05,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=345780.0, ans=0.0 2024-09-18 02:00:05,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=345780.0, ans=0.125 2024-09-18 02:00:14,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=345780.0, ans=0.125 2024-09-18 02:00:20,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=345820.0, ans=0.0 2024-09-18 02:00:34,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-09-18 02:00:43,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=345860.0, ans=0.025 2024-09-18 02:00:48,894 INFO [train.py:1198] (0/2) Epoch 20, batch 500, loss[loss=0.2682, ctc_loss=0.155, cr_loss=0.4101, attn_decoder_loss=0.2716, over 29458.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1367, cr_loss=0.3784, attn_decoder_loss=0.2498, over 5331128.41 frames. 
], batch size: 94, lr: 5.63e-03, grad_scale: 8.0 2024-09-18 02:01:13,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=345940.0, ans=0.09899494936611666 2024-09-18 02:01:17,859 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.440e+01 8.932e+01 9.633e+01 1.955e+02, threshold=1.786e+02, percent-clipped=1.0 2024-09-18 02:01:21,264 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:01:23,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.89 vs. limit=10.0 2024-09-18 02:02:05,130 INFO [train.py:1198] (0/2) Epoch 20, batch 550, loss[loss=0.2524, ctc_loss=0.142, cr_loss=0.402, attn_decoder_loss=0.2557, over 28883.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1373, cr_loss=0.3793, attn_decoder_loss=0.2503, over 5421758.17 frames. ], batch size: 104, lr: 5.63e-03, grad_scale: 8.0 2024-09-18 02:02:09,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.97 vs. limit=22.5 2024-09-18 02:02:14,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=346100.0, ans=0.0 2024-09-18 02:02:22,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346140.0, ans=0.1 2024-09-18 02:02:29,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=346140.0, ans=0.125 2024-09-18 02:02:31,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.98 vs. 
limit=15.0
2024-09-18 02:02:49,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=346220.0, ans=0.125
2024-09-18 02:03:16,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=346260.0, ans=0.125
2024-09-18 02:03:23,206 INFO [train.py:1198] (0/2) Epoch 20, batch 600, loss[loss=0.2664, ctc_loss=0.1538, cr_loss=0.4188, attn_decoder_loss=0.2696, over 29172.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1377, cr_loss=0.381, attn_decoder_loss=0.2504, over 5508352.30 frames. ], batch size: 100, lr: 5.63e-03, grad_scale: 8.0
2024-09-18 02:03:25,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=346300.0, ans=0.0
2024-09-18 02:03:54,168 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 8.611e+01 9.331e+01 1.005e+02 2.865e+02, threshold=1.866e+02, percent-clipped=3.0
2024-09-18 02:03:56,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.85 vs. limit=10.0
2024-09-18 02:04:17,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=346420.0, ans=0.0
2024-09-18 02:04:18,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=12.0
2024-09-18 02:04:25,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346460.0, ans=0.1
2024-09-18 02:04:26,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=346460.0, ans=0.1
2024-09-18 02:04:41,259 INFO [train.py:1198] (0/2) Epoch 20, batch 650, loss[loss=0.2446, ctc_loss=0.1308, cr_loss=0.3677, attn_decoder_loss=0.249, over 29782.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1371, cr_loss=0.3802, attn_decoder_loss=0.2501, over 5585563.32 frames. ], batch size: 81, lr: 5.63e-03, grad_scale: 8.0
2024-09-18 02:04:44,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=346500.0, ans=0.0
2024-09-18 02:04:44,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=346500.0, ans=0.1
2024-09-18 02:04:46,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=346500.0, ans=0.2
2024-09-18 02:04:49,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=346500.0, ans=0.0
2024-09-18 02:04:49,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=346500.0, ans=0.125
2024-09-18 02:05:45,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0
2024-09-18 02:05:47,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=346660.0, ans=0.125
2024-09-18 02:05:56,825 INFO [train.py:1198] (0/2) Epoch 20, batch 700, loss[loss=0.2573, ctc_loss=0.1475, cr_loss=0.4012, attn_decoder_loss=0.2606, over 29562.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1376, cr_loss=0.3814, attn_decoder_loss=0.2509, over 5636756.44 frames. ], batch size: 76, lr: 5.62e-03, grad_scale: 8.0
2024-09-18 02:06:12,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=346740.0, ans=0.2
2024-09-18 02:06:15,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=346740.0, ans=0.0
2024-09-18 02:06:25,603 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.542e+01 8.952e+01 9.567e+01 1.859e+02, threshold=1.790e+02, percent-clipped=0.0
2024-09-18 02:06:32,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=346780.0, ans=0.125
2024-09-18 02:06:45,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=346820.0, ans=0.05
2024-09-18 02:06:56,249 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-18 02:07:14,613 INFO [train.py:1198] (0/2) Epoch 20, batch 750, loss[loss=0.2431, ctc_loss=0.1367, cr_loss=0.3882, attn_decoder_loss=0.2463, over 29723.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1376, cr_loss=0.381, attn_decoder_loss=0.2507, over 5677044.51 frames. ], batch size: 82, lr: 5.62e-03, grad_scale: 8.0
2024-09-18 02:07:30,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=346940.0, ans=0.125
2024-09-18 02:07:31,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=346940.0, ans=0.1
2024-09-18 02:07:32,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0
2024-09-18 02:07:54,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=346980.0, ans=0.0
2024-09-18 02:08:02,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.70 vs. limit=15.0
2024-09-18 02:08:20,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=347060.0, ans=0.0
2024-09-18 02:08:22,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0
2024-09-18 02:08:25,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=347060.0, ans=0.1
2024-09-18 02:08:29,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=347060.0, ans=0.125
2024-09-18 02:08:32,357 INFO [train.py:1198] (0/2) Epoch 20, batch 800, loss[loss=0.2268, ctc_loss=0.1204, cr_loss=0.3471, attn_decoder_loss=0.2309, over 29621.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1373, cr_loss=0.3801, attn_decoder_loss=0.2504, over 5707625.72 frames. ], batch size: 73, lr: 5.62e-03, grad_scale: 16.0
2024-09-18 02:09:02,551 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.412e+01 8.904e+01 9.473e+01 1.507e+02, threshold=1.781e+02, percent-clipped=0.0
2024-09-18 02:09:12,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=347180.0, ans=0.125
2024-09-18 02:09:32,575 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0
2024-09-18 02:09:48,236 INFO [train.py:1198] (0/2) Epoch 20, batch 850, loss[loss=0.2491, ctc_loss=0.1286, cr_loss=0.3516, attn_decoder_loss=0.2546, over 29693.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1368, cr_loss=0.3792, attn_decoder_loss=0.25, over 5736558.62 frames. ], batch size: 89, lr: 5.62e-03, grad_scale: 8.0
2024-09-18 02:10:03,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=347340.0, ans=0.2
2024-09-18 02:10:09,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=347340.0, ans=0.125
2024-09-18 02:10:25,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=347380.0, ans=0.125
2024-09-18 02:10:31,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=347420.0, ans=0.2
2024-09-18 02:10:31,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=347420.0, ans=0.125
2024-09-18 02:10:37,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.97 vs. limit=12.0
2024-09-18 02:10:47,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=347460.0, ans=0.0
2024-09-18 02:10:59,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=347460.0, ans=0.1
2024-09-18 02:11:03,669 INFO [train.py:1198] (0/2) Epoch 20, batch 900, loss[loss=0.2238, ctc_loss=0.1201, cr_loss=0.3563, attn_decoder_loss=0.2274, over 29627.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1372, cr_loss=0.3802, attn_decoder_loss=0.2506, over 5740436.80 frames. ], batch size: 73, lr: 5.62e-03, grad_scale: 8.0
2024-09-18 02:11:09,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=347500.0, ans=0.1
2024-09-18 02:11:33,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.02 vs. limit=12.0
2024-09-18 02:11:35,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=347540.0, ans=0.125
2024-09-18 02:11:38,371 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.746e+01 8.646e+01 9.308e+01 1.001e+02 2.040e+02, threshold=1.862e+02, percent-clipped=1.0
2024-09-18 02:11:40,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=347580.0, ans=0.1
2024-09-18 02:11:43,854 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=9.91 vs. limit=12.0
2024-09-18 02:11:56,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=347620.0, ans=0.125
2024-09-18 02:11:59,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=347620.0, ans=0.125
2024-09-18 02:12:01,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=347620.0, ans=0.125
2024-09-18 02:12:11,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=347660.0, ans=0.125
2024-09-18 02:12:14,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347660.0, ans=0.1
2024-09-18 02:12:22,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=347700.0, ans=15.0
2024-09-18 02:12:23,500 INFO [train.py:1198] (0/2) Epoch 20, batch 950, loss[loss=0.2211, ctc_loss=0.1124, cr_loss=0.353, attn_decoder_loss=0.2253, over 29521.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1372, cr_loss=0.3797, attn_decoder_loss=0.2507, over 5740257.45 frames. ], batch size: 74, lr: 5.62e-03, grad_scale: 8.0
2024-09-18 02:12:37,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=347740.0, ans=0.0
2024-09-18 02:13:04,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=347780.0, ans=0.1
2024-09-18 02:13:06,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=347780.0, ans=0.05
2024-09-18 02:13:38,986 INFO [train.py:1198] (0/2) Epoch 20, batch 1000, loss[loss=0.2371, ctc_loss=0.1289, cr_loss=0.3719, attn_decoder_loss=0.2408, over 29496.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1385, cr_loss=0.3814, attn_decoder_loss=0.2517, over 5734693.99 frames. ], batch size: 77, lr: 5.61e-03, grad_scale: 8.0
2024-09-18 02:13:42,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=347900.0, ans=0.0
2024-09-18 02:14:07,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=347980.0, ans=0.125
2024-09-18 02:14:09,210 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.704e+01 9.397e+01 1.040e+02 1.771e+02, threshold=1.879e+02, percent-clipped=0.0
2024-09-18 02:14:11,568 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.97 vs. limit=12.0
2024-09-18 02:14:17,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=347980.0, ans=0.0
2024-09-18 02:14:41,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=348060.0, ans=0.2
2024-09-18 02:14:50,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=348060.0, ans=0.05
2024-09-18 02:14:54,854 INFO [train.py:1198] (0/2) Epoch 20, batch 1050, loss[loss=0.2633, ctc_loss=0.1501, cr_loss=0.3983, attn_decoder_loss=0.267, over 29670.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1378, cr_loss=0.3805, attn_decoder_loss=0.251, over 5744286.71 frames. ], batch size: 85, lr: 5.61e-03, grad_scale: 8.0
2024-09-18 02:14:56,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=348100.0, ans=0.125
2024-09-18 02:15:19,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=348140.0, ans=10.0
2024-09-18 02:15:24,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=348140.0, ans=0.0
2024-09-18 02:15:28,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=348180.0, ans=0.125
2024-09-18 02:15:33,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=348180.0, ans=0.125
2024-09-18 02:15:47,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=15.0
2024-09-18 02:16:08,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=348260.0, ans=0.125
2024-09-18 02:16:14,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=348300.0, ans=0.125
2024-09-18 02:16:15,295 INFO [train.py:1198] (0/2) Epoch 20, batch 1100, loss[loss=0.241, ctc_loss=0.1369, cr_loss=0.375, attn_decoder_loss=0.2442, over 29458.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1376, cr_loss=0.3802, attn_decoder_loss=0.2506, over 5756047.67 frames. ], batch size: 78, lr: 5.61e-03, grad_scale: 8.0
2024-09-18 02:16:21,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=348300.0, ans=0.0
2024-09-18 02:16:32,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=348340.0, ans=0.2
2024-09-18 02:16:38,738 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.33 vs. limit=15.0
2024-09-18 02:16:45,744 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.537e+01 9.169e+01 9.929e+01 2.148e+02, threshold=1.834e+02, percent-clipped=1.0
2024-09-18 02:17:01,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=348420.0, ans=0.09899494936611666
2024-09-18 02:17:05,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=348420.0, ans=0.0
2024-09-18 02:17:07,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=348420.0, ans=0.125
2024-09-18 02:17:11,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=348420.0, ans=0.125
2024-09-18 02:17:19,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=348460.0, ans=0.07
2024-09-18 02:17:31,318 INFO [train.py:1198] (0/2) Epoch 20, batch 1150, loss[loss=0.2363, ctc_loss=0.1297, cr_loss=0.3513, attn_decoder_loss=0.2404, over 29466.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1376, cr_loss=0.3798, attn_decoder_loss=0.2506, over 5753317.00 frames. ], batch size: 78, lr: 5.61e-03, grad_scale: 8.0
2024-09-18 02:17:43,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=348500.0, ans=0.0
2024-09-18 02:18:24,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=348620.0, ans=0.125
2024-09-18 02:18:46,993 INFO [train.py:1198] (0/2) Epoch 20, batch 1200, loss[loss=0.257, ctc_loss=0.1449, cr_loss=0.3909, attn_decoder_loss=0.2608, over 29683.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1385, cr_loss=0.3811, attn_decoder_loss=0.2514, over 5747066.43 frames. ], batch size: 85, lr: 5.61e-03, grad_scale: 16.0
2024-09-18 02:18:47,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=348700.0, ans=0.0
2024-09-18 02:18:47,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=348700.0, ans=0.1
2024-09-18 02:19:03,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=348700.0, ans=0.125
2024-09-18 02:19:08,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=348740.0, ans=0.125
2024-09-18 02:19:21,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=348780.0, ans=0.025
2024-09-18 02:19:23,034 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.725e+01 9.303e+01 1.008e+02 1.601e+02, threshold=1.861e+02, percent-clipped=0.0
2024-09-18 02:19:33,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=348780.0, ans=0.05
2024-09-18 02:19:33,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=348780.0, ans=0.2
2024-09-18 02:19:45,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=348820.0, ans=0.125
2024-09-18 02:20:07,688 INFO [train.py:1198] (0/2) Epoch 20, batch 1250, loss[loss=0.2565, ctc_loss=0.1465, cr_loss=0.3961, attn_decoder_loss=0.2599, over 29550.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1385, cr_loss=0.3815, attn_decoder_loss=0.2518, over 5773851.17 frames. ], batch size: 92, lr: 5.61e-03, grad_scale: 8.0
2024-09-18 02:20:13,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=348900.0, ans=0.0
2024-09-18 02:20:31,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0
2024-09-18 02:20:41,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=348980.0, ans=0.1
2024-09-18 02:20:50,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=348980.0, ans=0.1
2024-09-18 02:20:51,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=349020.0, ans=0.125
2024-09-18 02:21:23,272 INFO [train.py:1198] (0/2) Epoch 20, batch 1300, loss[loss=0.256, ctc_loss=0.1429, cr_loss=0.3877, attn_decoder_loss=0.26, over 28663.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1381, cr_loss=0.3811, attn_decoder_loss=0.2512, over 5778210.46 frames. ], batch size: 112, lr: 5.60e-03, grad_scale: 8.0
2024-09-18 02:21:28,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=349100.0, ans=0.0
2024-09-18 02:21:29,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=349100.0, ans=0.125
2024-09-18 02:21:55,242 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.542e+01 9.047e+01 9.656e+01 1.934e+02, threshold=1.809e+02, percent-clipped=1.0
2024-09-18 02:22:00,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=349180.0, ans=0.95
2024-09-18 02:22:06,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=349180.0, ans=0.2
2024-09-18 02:22:10,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=349220.0, ans=0.125
2024-09-18 02:22:38,975 INFO [train.py:1198] (0/2) Epoch 20, batch 1350, loss[loss=0.254, ctc_loss=0.142, cr_loss=0.3818, attn_decoder_loss=0.258, over 29784.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1373, cr_loss=0.3799, attn_decoder_loss=0.2508, over 5797128.66 frames. ], batch size: 81, lr: 5.60e-03, grad_scale: 8.0
2024-09-18 02:22:52,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=349340.0, ans=0.2
2024-09-18 02:22:55,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=349340.0, ans=0.2
2024-09-18 02:23:46,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=349460.0, ans=0.0
2024-09-18 02:23:56,657 INFO [train.py:1198] (0/2) Epoch 20, batch 1400, loss[loss=0.2125, ctc_loss=0.1113, cr_loss=0.3166, attn_decoder_loss=0.2167, over 29606.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1369, cr_loss=0.3794, attn_decoder_loss=0.2506, over 5808266.14 frames. ], batch size: 69, lr: 5.60e-03, grad_scale: 8.0
2024-09-18 02:24:28,081 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.400e+01 8.906e+01 9.445e+01 1.188e+02, threshold=1.781e+02, percent-clipped=0.0
2024-09-18 02:24:40,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=349620.0, ans=0.1
2024-09-18 02:24:48,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0
2024-09-18 02:24:53,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=15.0
2024-09-18 02:24:56,167 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.77 vs. limit=15.0
2024-09-18 02:24:58,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=349660.0, ans=0.09899494936611666
2024-09-18 02:25:12,149 INFO [train.py:1198] (0/2) Epoch 20, batch 1450, loss[loss=0.2591, ctc_loss=0.1436, cr_loss=0.3949, attn_decoder_loss=0.2631, over 29465.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1372, cr_loss=0.3805, attn_decoder_loss=0.2509, over 5804520.07 frames. ], batch size: 94, lr: 5.60e-03, grad_scale: 8.0
2024-09-18 02:25:16,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=349700.0, ans=0.125
2024-09-18 02:25:37,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=349740.0, ans=0.125
2024-09-18 02:25:39,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=349740.0, ans=0.2
2024-09-18 02:25:41,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=349780.0, ans=0.125
2024-09-18 02:25:44,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=349780.0, ans=0.125
2024-09-18 02:25:57,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=349820.0, ans=0.125
2024-09-18 02:26:05,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.45 vs. limit=15.0
2024-09-18 02:26:16,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.50 vs. limit=15.0
2024-09-18 02:26:27,545 INFO [train.py:1198] (0/2) Epoch 20, batch 1500, loss[loss=0.2584, ctc_loss=0.1493, cr_loss=0.4002, attn_decoder_loss=0.2616, over 29643.00 frames. ], tot_loss[loss=0.248, ctc_loss=0.1379, cr_loss=0.3822, attn_decoder_loss=0.2518, over 5804815.17 frames. ], batch size: 86, lr: 5.60e-03, grad_scale: 8.0
2024-09-18 02:26:58,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0
2024-09-18 02:27:04,147 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.814e+01 9.450e+01 1.000e+02 1.461e+02, threshold=1.890e+02, percent-clipped=0.0
2024-09-18 02:27:06,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=349980.0, ans=0.125
2024-09-18 02:27:13,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=349980.0, ans=0.09899494936611666
2024-09-18 02:27:33,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=350060.0, ans=0.0
2024-09-18 02:27:41,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=350060.0, ans=0.125
2024-09-18 02:27:48,324 INFO [train.py:1198] (0/2) Epoch 20, batch 1550, loss[loss=0.2528, ctc_loss=0.141, cr_loss=0.3985, attn_decoder_loss=0.2563, over 29528.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.138, cr_loss=0.3822, attn_decoder_loss=0.2516, over 5781896.51 frames. ], batch size: 90, lr: 5.60e-03, grad_scale: 8.0
2024-09-18 02:27:54,005 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0
2024-09-18 02:27:57,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.70 vs. limit=15.0
2024-09-18 02:28:02,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=350140.0, ans=0.07
2024-09-18 02:28:12,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=350140.0, ans=0.1
2024-09-18 02:28:22,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0
2024-09-18 02:28:39,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=350220.0, ans=0.125
2024-09-18 02:28:52,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=350260.0, ans=0.125
2024-09-18 02:28:58,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=350260.0, ans=0.125
2024-09-18 02:29:01,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=350260.0, ans=0.125
2024-09-18 02:29:03,923 INFO [train.py:1198] (0/2) Epoch 20, batch 1600, loss[loss=0.2547, ctc_loss=0.1387, cr_loss=0.367, attn_decoder_loss=0.2594, over 29673.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1376, cr_loss=0.3808, attn_decoder_loss=0.2509, over 5765078.39 frames. ], batch size: 85, lr: 5.59e-03, grad_scale: 16.0
2024-09-18 02:29:12,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.35 vs. limit=22.5
2024-09-18 02:29:16,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=350300.0, ans=0.125
2024-09-18 02:29:17,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=350340.0, ans=0.1
2024-09-18 02:29:30,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=350340.0, ans=0.025
2024-09-18 02:29:37,442 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.748e+01 9.299e+01 1.007e+02 2.517e+02, threshold=1.860e+02, percent-clipped=3.0
2024-09-18 02:29:44,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=350380.0, ans=0.1
2024-09-18 02:30:15,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=350460.0, ans=0.125
2024-09-18 02:30:16,513 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0
2024-09-18 02:30:17,552 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=12.0
2024-09-18 02:30:19,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.20 vs. limit=12.0
2024-09-18 02:30:19,926 INFO [train.py:1198] (0/2) Epoch 20, batch 1650, loss[loss=0.2573, ctc_loss=0.1378, cr_loss=0.3867, attn_decoder_loss=0.262, over 29701.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1376, cr_loss=0.3808, attn_decoder_loss=0.2508, over 5761142.65 frames. ], batch size: 89, lr: 5.59e-03, grad_scale: 8.0
2024-09-18 02:30:20,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=350500.0, ans=0.125
2024-09-18 02:30:36,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.19 vs. limit=22.5
2024-09-18 02:31:13,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=350620.0, ans=0.1
2024-09-18 02:31:16,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=350620.0, ans=0.125
2024-09-18 02:31:17,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=350620.0, ans=0.125
2024-09-18 02:31:28,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=350660.0, ans=0.025
2024-09-18 02:31:39,888 INFO [train.py:1198] (0/2) Epoch 20, batch 1700, loss[loss=0.2193, ctc_loss=0.1115, cr_loss=0.3329, attn_decoder_loss=0.2239, over 29589.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1369, cr_loss=0.38, attn_decoder_loss=0.2504, over 5782815.44 frames. ], batch size: 69, lr: 5.59e-03, grad_scale: 8.0
2024-09-18 02:31:44,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=350700.0, ans=0.1
2024-09-18 02:31:57,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=15.0
2024-09-18 02:32:13,142 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.537e+01 9.114e+01 9.746e+01 1.208e+02, threshold=1.823e+02, percent-clipped=1.0
2024-09-18 02:32:13,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=350780.0, ans=0.025
2024-09-18 02:32:16,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=350780.0, ans=0.1
2024-09-18 02:32:53,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=350860.0, ans=0.125
2024-09-18 02:32:55,910 INFO [train.py:1198] (0/2) Epoch 20, batch 1750, loss[loss=0.2183, ctc_loss=0.1095, cr_loss=0.324, attn_decoder_loss=0.2231, over 29358.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1367, cr_loss=0.3798, attn_decoder_loss=0.25, over 5790414.98 frames. ], batch size: 67, lr: 5.59e-03, grad_scale: 8.0
2024-09-18 02:33:09,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=350940.0, ans=0.025
2024-09-18 02:33:09,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=350940.0, ans=0.125
2024-09-18 02:33:11,835 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.10 vs. limit=15.0
2024-09-18 02:33:14,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=350940.0, ans=0.125
2024-09-18 02:33:18,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=350940.0, ans=0.125
2024-09-18 02:33:32,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=350980.0, ans=0.025
2024-09-18 02:33:56,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=351060.0, ans=0.125
2024-09-18 02:34:11,443 INFO [train.py:1198] (0/2) Epoch 20, batch 1800, loss[loss=0.252, ctc_loss=0.1427, cr_loss=0.3947, attn_decoder_loss=0.2553, over 29697.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1368, cr_loss=0.3802, attn_decoder_loss=0.2504, over 5792789.38 frames. ], batch size: 83, lr: 5.59e-03, grad_scale: 8.0
2024-09-18 02:34:16,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=351100.0, ans=0.125
2024-09-18 02:34:33,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=351140.0, ans=10.0
2024-09-18 02:34:48,919 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.074e+01 8.564e+01 9.228e+01 9.746e+01 1.564e+02, threshold=1.846e+02, percent-clipped=0.0
2024-09-18 02:35:00,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=351220.0, ans=0.125
2024-09-18 02:35:01,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=351220.0, ans=0.125
2024-09-18 02:35:32,098 INFO [train.py:1198] (0/2) Epoch 20, batch 1850, loss[loss=0.2563, ctc_loss=0.1397, cr_loss=0.3806, attn_decoder_loss=0.2608, over 29627.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1367, cr_loss=0.3799, attn_decoder_loss=0.2504, over 5798053.69 frames. ], batch size: 86, lr: 5.59e-03, grad_scale: 8.0
2024-09-18 02:36:23,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=351420.0, ans=0.0
2024-09-18 02:36:27,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.23 vs. limit=15.0
2024-09-18 02:36:28,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=351420.0, ans=0.2
2024-09-18 02:36:32,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=351460.0, ans=0.125
2024-09-18 02:36:32,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=351460.0, ans=0.0
2024-09-18 02:36:47,372 INFO [train.py:1198] (0/2) Epoch 20, batch 1900, loss[loss=0.2528, ctc_loss=0.1383, cr_loss=0.3826, attn_decoder_loss=0.257, over 29729.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1372, cr_loss=0.3809, attn_decoder_loss=0.251, over 5805997.81 frames.
], batch size: 89, lr: 5.59e-03, grad_scale: 8.0 2024-09-18 02:36:49,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=351500.0, ans=0.125 2024-09-18 02:37:02,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351540.0, ans=0.1 2024-09-18 02:37:13,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=351540.0, ans=0.0 2024-09-18 02:37:17,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=351580.0, ans=0.125 2024-09-18 02:37:18,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=351580.0, ans=10.0 2024-09-18 02:37:20,731 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.814e+01 8.754e+01 9.062e+01 9.837e+01 1.384e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-18 02:37:22,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351580.0, ans=0.1 2024-09-18 02:37:30,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=351580.0, ans=0.125 2024-09-18 02:37:43,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=351620.0, ans=0.125 2024-09-18 02:37:52,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=351660.0, ans=0.0 2024-09-18 02:38:00,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=351660.0, ans=0.0 2024-09-18 02:38:03,054 INFO [train.py:1198] (0/2) Epoch 20, batch 
1950, loss[loss=0.2454, ctc_loss=0.1412, cr_loss=0.3794, attn_decoder_loss=0.2486, over 29451.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1379, cr_loss=0.382, attn_decoder_loss=0.2521, over 5820208.86 frames. ], batch size: 78, lr: 5.58e-03, grad_scale: 8.0 2024-09-18 02:38:53,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=351820.0, ans=0.0 2024-09-18 02:39:23,351 INFO [train.py:1198] (0/2) Epoch 20, batch 2000, loss[loss=0.218, ctc_loss=0.1063, cr_loss=0.325, attn_decoder_loss=0.2232, over 29379.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1386, cr_loss=0.383, attn_decoder_loss=0.253, over 5796734.97 frames. ], batch size: 67, lr: 5.58e-03, grad_scale: 16.0 2024-09-18 02:39:33,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2024-09-18 02:39:56,873 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.619e+01 9.159e+01 9.729e+01 7.125e+02, threshold=1.832e+02, percent-clipped=2.0 2024-09-18 02:40:00,364 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-88000.pt 2024-09-18 02:40:09,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351980.0, ans=0.1 2024-09-18 02:40:20,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352020.0, ans=0.1 2024-09-18 02:40:24,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=352020.0, ans=0.0 2024-09-18 02:40:25,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=352020.0, 
ans=0.05 2024-09-18 02:40:27,101 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:40:28,614 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:40:43,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=352060.0, ans=0.125 2024-09-18 02:40:46,374 INFO [train.py:1198] (0/2) Epoch 20, batch 2050, loss[loss=0.2206, ctc_loss=0.1145, cr_loss=0.3312, attn_decoder_loss=0.225, over 29423.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1381, cr_loss=0.3816, attn_decoder_loss=0.252, over 5787097.09 frames. ], batch size: 70, lr: 5.58e-03, grad_scale: 8.0 2024-09-18 02:40:46,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=352100.0, ans=0.95 2024-09-18 02:40:48,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2024-09-18 02:40:57,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=352100.0, ans=0.0 2024-09-18 02:40:59,405 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2024-09-18 02:41:04,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. 
limit=6.0 2024-09-18 02:41:17,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=352180.0, ans=0.125 2024-09-18 02:41:26,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0 2024-09-18 02:41:37,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=352220.0, ans=0.025 2024-09-18 02:41:39,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=352220.0, ans=0.125 2024-09-18 02:41:39,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=352220.0, ans=0.0 2024-09-18 02:41:48,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352260.0, ans=0.1 2024-09-18 02:41:54,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=352260.0, ans=0.125 2024-09-18 02:42:01,889 INFO [train.py:1198] (0/2) Epoch 20, batch 2100, loss[loss=0.2535, ctc_loss=0.144, cr_loss=0.404, attn_decoder_loss=0.2567, over 29745.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1373, cr_loss=0.3804, attn_decoder_loss=0.251, over 5798375.86 frames. 
], batch size: 81, lr: 5.58e-03, grad_scale: 8.0 2024-09-18 02:42:12,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=352300.0, ans=0.125 2024-09-18 02:42:22,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=352340.0, ans=0.015 2024-09-18 02:42:22,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=352340.0, ans=0.04949747468305833 2024-09-18 02:42:32,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=352380.0, ans=0.5 2024-09-18 02:42:38,581 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.424e+01 8.970e+01 9.709e+01 1.410e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-18 02:43:01,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.67 vs. limit=22.5 2024-09-18 02:43:18,689 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:43:21,525 INFO [train.py:1198] (0/2) Epoch 20, batch 2150, loss[loss=0.2437, ctc_loss=0.1295, cr_loss=0.3659, attn_decoder_loss=0.2483, over 29438.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1365, cr_loss=0.3792, attn_decoder_loss=0.2504, over 5813614.13 frames. ], batch size: 78, lr: 5.58e-03, grad_scale: 8.0 2024-09-18 02:43:23,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=352500.0, ans=0.0 2024-09-18 02:43:25,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.17 vs. 
limit=10.0 2024-09-18 02:43:26,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.71 vs. limit=22.5 2024-09-18 02:43:35,690 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:43:36,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2024-09-18 02:43:41,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=352540.0, ans=0.2 2024-09-18 02:43:47,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=352540.0, ans=0.0 2024-09-18 02:43:50,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352580.0, ans=0.1 2024-09-18 02:44:05,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=352620.0, ans=0.025 2024-09-18 02:44:08,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=352620.0, ans=0.125 2024-09-18 02:44:13,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=352620.0, ans=0.0 2024-09-18 02:44:31,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352660.0, ans=0.1 2024-09-18 02:44:37,544 INFO [train.py:1198] (0/2) Epoch 20, batch 2200, loss[loss=0.2543, ctc_loss=0.1411, cr_loss=0.4038, attn_decoder_loss=0.2579, over 29608.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1368, cr_loss=0.3798, attn_decoder_loss=0.2504, over 5810636.82 frames. 
], batch size: 86, lr: 5.58e-03, grad_scale: 8.0 2024-09-18 02:44:42,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=352700.0, ans=0.125 2024-09-18 02:45:08,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=352780.0, ans=0.125 2024-09-18 02:45:12,311 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.647e+01 9.174e+01 9.915e+01 1.896e+02, threshold=1.835e+02, percent-clipped=1.0 2024-09-18 02:45:17,794 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:45:22,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=352820.0, ans=0.125 2024-09-18 02:45:25,472 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=12.0 2024-09-18 02:45:38,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=352860.0, ans=0.125 2024-09-18 02:45:53,595 INFO [train.py:1198] (0/2) Epoch 20, batch 2250, loss[loss=0.2503, ctc_loss=0.1398, cr_loss=0.3811, attn_decoder_loss=0.2541, over 29726.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1364, cr_loss=0.3791, attn_decoder_loss=0.2502, over 5810498.26 frames. 
], batch size: 82, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:45:56,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=352900.0, ans=0.125 2024-09-18 02:46:55,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=353020.0, ans=0.0 2024-09-18 02:47:12,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=353100.0, ans=0.0 2024-09-18 02:47:13,751 INFO [train.py:1198] (0/2) Epoch 20, batch 2300, loss[loss=0.2074, ctc_loss=0.1006, cr_loss=0.3184, attn_decoder_loss=0.2122, over 29315.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1358, cr_loss=0.3774, attn_decoder_loss=0.2492, over 5798414.10 frames. ], batch size: 71, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:47:15,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=353100.0, ans=0.0 2024-09-18 02:47:46,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.95 vs. 
limit=15.0 2024-09-18 02:47:48,557 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.594e+01 9.374e+01 1.007e+02 2.489e+02, threshold=1.875e+02, percent-clipped=2.0 2024-09-18 02:47:59,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=353220.0, ans=0.0 2024-09-18 02:48:08,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=353220.0, ans=0.125 2024-09-18 02:48:13,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=353260.0, ans=0.2 2024-09-18 02:48:29,461 INFO [train.py:1198] (0/2) Epoch 20, batch 2350, loss[loss=0.2681, ctc_loss=0.1525, cr_loss=0.4275, attn_decoder_loss=0.2714, over 29682.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1356, cr_loss=0.3774, attn_decoder_loss=0.2493, over 5803855.60 frames. ], batch size: 83, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:48:43,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=353340.0, ans=0.125 2024-09-18 02:48:49,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=353340.0, ans=0.125 2024-09-18 02:48:52,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=353340.0, ans=0.125 2024-09-18 02:48:54,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.89 vs. 
limit=15.0 2024-09-18 02:48:56,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=353340.0, ans=0.025 2024-09-18 02:49:06,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=353380.0, ans=0.0 2024-09-18 02:49:07,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353380.0, ans=0.1 2024-09-18 02:49:30,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=353460.0, ans=0.2 2024-09-18 02:49:45,311 INFO [train.py:1198] (0/2) Epoch 20, batch 2400, loss[loss=0.2375, ctc_loss=0.1323, cr_loss=0.3744, attn_decoder_loss=0.2408, over 29520.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1362, cr_loss=0.3784, attn_decoder_loss=0.2498, over 5807549.93 frames. ], batch size: 76, lr: 5.57e-03, grad_scale: 16.0 2024-09-18 02:49:59,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=353540.0, ans=0.0 2024-09-18 02:50:12,532 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.52 vs. 
limit=15.0 2024-09-18 02:50:20,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=353580.0, ans=0.1 2024-09-18 02:50:23,694 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.660e+01 9.243e+01 9.853e+01 2.252e+02, threshold=1.849e+02, percent-clipped=1.0 2024-09-18 02:50:54,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=353660.0, ans=0.09899494936611666 2024-09-18 02:50:54,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=353660.0, ans=0.0 2024-09-18 02:51:05,906 INFO [train.py:1198] (0/2) Epoch 20, batch 2450, loss[loss=0.2478, ctc_loss=0.1341, cr_loss=0.3916, attn_decoder_loss=0.2518, over 29719.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1367, cr_loss=0.3794, attn_decoder_loss=0.2506, over 5784039.22 frames. ], batch size: 82, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:51:13,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=353700.0, ans=0.2 2024-09-18 02:51:53,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=353820.0, ans=0.1 2024-09-18 02:52:09,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.97 vs. 
limit=15.0 2024-09-18 02:52:11,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=353860.0, ans=0.05 2024-09-18 02:52:17,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=353860.0, ans=0.2 2024-09-18 02:52:21,951 INFO [train.py:1198] (0/2) Epoch 20, batch 2500, loss[loss=0.2647, ctc_loss=0.1478, cr_loss=0.4101, attn_decoder_loss=0.2686, over 29648.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1367, cr_loss=0.3802, attn_decoder_loss=0.2508, over 5794405.68 frames. ], batch size: 86, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:52:42,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.29 vs. limit=10.0 2024-09-18 02:52:55,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=353980.0, ans=0.0 2024-09-18 02:52:58,484 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 8.592e+01 8.974e+01 9.558e+01 1.231e+02, threshold=1.795e+02, percent-clipped=0.0 2024-09-18 02:53:18,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=354020.0, ans=0.0 2024-09-18 02:53:38,056 INFO [train.py:1198] (0/2) Epoch 20, batch 2550, loss[loss=0.2302, ctc_loss=0.1233, cr_loss=0.3521, attn_decoder_loss=0.2343, over 29334.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1368, cr_loss=0.3805, attn_decoder_loss=0.2509, over 5797502.10 frames. ], batch size: 67, lr: 5.57e-03, grad_scale: 8.0 2024-09-18 02:53:47,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. 
limit=6.0 2024-09-18 02:54:16,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=354180.0, ans=0.125 2024-09-18 02:54:20,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=354180.0, ans=0.125 2024-09-18 02:54:26,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2024-09-18 02:54:37,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=354220.0, ans=0.125 2024-09-18 02:54:58,067 INFO [train.py:1198] (0/2) Epoch 20, batch 2600, loss[loss=0.2464, ctc_loss=0.1373, cr_loss=0.3978, attn_decoder_loss=0.2497, over 29422.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1369, cr_loss=0.3804, attn_decoder_loss=0.2513, over 5794595.17 frames. ], batch size: 78, lr: 5.56e-03, grad_scale: 8.0 2024-09-18 02:55:04,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=22.5 2024-09-18 02:55:28,931 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-09-18 02:55:34,145 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.665e+01 9.316e+01 9.977e+01 1.565e+02, threshold=1.863e+02, percent-clipped=0.0 2024-09-18 02:55:46,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.72 vs. 
limit=15.0 2024-09-18 02:55:47,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=354420.0, ans=0.0 2024-09-18 02:55:57,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=354460.0, ans=0.2 2024-09-18 02:55:59,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=354460.0, ans=0.09899494936611666 2024-09-18 02:56:13,768 INFO [train.py:1198] (0/2) Epoch 20, batch 2650, loss[loss=0.2594, ctc_loss=0.1408, cr_loss=0.3804, attn_decoder_loss=0.2642, over 29201.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.137, cr_loss=0.3809, attn_decoder_loss=0.2517, over 5800268.29 frames. ], batch size: 100, lr: 5.56e-03, grad_scale: 8.0 2024-09-18 02:56:27,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=354540.0, ans=0.125 2024-09-18 02:56:50,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=354580.0, ans=0.125 2024-09-18 02:57:03,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=354620.0, ans=0.025 2024-09-18 02:57:29,350 INFO [train.py:1198] (0/2) Epoch 20, batch 2700, loss[loss=0.2617, ctc_loss=0.1464, cr_loss=0.3939, attn_decoder_loss=0.2658, over 29546.00 frames. ], tot_loss[loss=0.2483, ctc_loss=0.1375, cr_loss=0.3813, attn_decoder_loss=0.2521, over 5794869.41 frames. 
], batch size: 87, lr: 5.56e-03, grad_scale: 8.0 2024-09-18 02:57:34,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=354700.0, ans=0.07 2024-09-18 02:57:35,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=354700.0, ans=0.0 2024-09-18 02:57:35,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=354700.0, ans=0.2 2024-09-18 02:57:38,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=354700.0, ans=15.0 2024-09-18 02:57:51,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=354740.0, ans=0.125 2024-09-18 02:58:07,619 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 8.506e+01 9.049e+01 9.472e+01 1.287e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-18 02:58:15,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=354820.0, ans=0.2 2024-09-18 02:58:20,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.67 vs. 
limit=6.0 2024-09-18 02:58:24,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=354820.0, ans=0.0 2024-09-18 02:58:27,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=354820.0, ans=0.125 2024-09-18 02:58:33,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=354860.0, ans=0.0 2024-09-18 02:58:45,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=354860.0, ans=0.0 2024-09-18 02:58:49,462 INFO [train.py:1198] (0/2) Epoch 20, batch 2750, loss[loss=0.2412, ctc_loss=0.1359, cr_loss=0.387, attn_decoder_loss=0.2443, over 29524.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1366, cr_loss=0.3797, attn_decoder_loss=0.2508, over 5793027.81 frames. ], batch size: 75, lr: 5.56e-03, grad_scale: 8.0 2024-09-18 02:58:54,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=354900.0, ans=0.125 2024-09-18 02:59:04,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=354940.0, ans=0.2 2024-09-18 02:59:12,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=354940.0, ans=0.0 2024-09-18 02:59:14,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.62 vs. 
limit=15.0
2024-09-18 02:59:21,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=354980.0, ans=0.125
2024-09-18 02:59:32,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=354980.0, ans=0.1
2024-09-18 02:59:41,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=355020.0, ans=0.0
2024-09-18 02:59:49,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=355060.0, ans=0.125
2024-09-18 03:00:05,928 INFO [train.py:1198] (0/2) Epoch 20, batch 2800, loss[loss=0.2833, ctc_loss=0.1919, cr_loss=0.4092, attn_decoder_loss=0.2843, over 20086.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1373, cr_loss=0.3802, attn_decoder_loss=0.2511, over 5774720.83 frames. ], batch size: 209, lr: 5.56e-03, grad_scale: 16.0
2024-09-18 03:00:22,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=355140.0, ans=0.1
2024-09-18 03:00:22,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=355140.0, ans=0.125
2024-09-18 03:00:31,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355140.0, ans=0.1
2024-09-18 03:00:36,962 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.74 vs. limit=15.0
2024-09-18 03:00:44,107 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.732e+01 9.172e+01 1.024e+02 2.809e+02, threshold=1.834e+02, percent-clipped=3.0
2024-09-18 03:00:46,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=355180.0, ans=0.2
2024-09-18 03:00:56,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=355220.0, ans=0.07
2024-09-18 03:01:21,683 INFO [train.py:1198] (0/2) Epoch 20, batch 2850, loss[loss=0.2331, ctc_loss=0.1213, cr_loss=0.3478, attn_decoder_loss=0.2378, over 29488.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1378, cr_loss=0.3806, attn_decoder_loss=0.2514, over 5760308.24 frames. ], batch size: 77, lr: 5.56e-03, grad_scale: 8.0
2024-09-18 03:01:24,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.66 vs. limit=15.0
2024-09-18 03:01:45,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=355340.0, ans=0.025
2024-09-18 03:01:57,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=355380.0, ans=0.0
2024-09-18 03:01:59,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=355380.0, ans=0.025
2024-09-18 03:02:17,216 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 03:02:24,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=355460.0, ans=0.125
2024-09-18 03:02:37,547 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 03:02:41,736 INFO [train.py:1198] (0/2) Epoch 20, batch 2900, loss[loss=0.2375, ctc_loss=0.1219, cr_loss=0.3527, attn_decoder_loss=0.2425, over 29440.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1381, cr_loss=0.3824, attn_decoder_loss=0.2524, over 5786180.06 frames. ], batch size: 79, lr: 5.55e-03, grad_scale: 8.0
2024-09-18 03:03:11,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=12.0
2024-09-18 03:03:19,773 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.493e+01 9.196e+01 9.952e+01 2.490e+02, threshold=1.839e+02, percent-clipped=1.0
2024-09-18 03:03:51,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=355660.0, ans=0.125
2024-09-18 03:03:57,577 INFO [train.py:1198] (0/2) Epoch 20, batch 2950, loss[loss=0.2269, ctc_loss=0.1298, cr_loss=0.3745, attn_decoder_loss=0.2294, over 29533.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.137, cr_loss=0.3805, attn_decoder_loss=0.2508, over 5781980.62 frames. ], batch size: 75, lr: 5.55e-03, grad_scale: 8.0
2024-09-18 03:04:02,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=355700.0, ans=0.1
2024-09-18 03:04:05,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=355700.0, ans=0.1
2024-09-18 03:04:08,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=355700.0, ans=0.125
2024-09-18 03:04:19,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.99 vs. limit=10.0
2024-09-18 03:04:27,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=355780.0, ans=0.0
2024-09-18 03:04:31,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=355780.0, ans=0.1
2024-09-18 03:04:35,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=355780.0, ans=0.125
2024-09-18 03:04:52,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=355820.0, ans=0.2
2024-09-18 03:05:05,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355860.0, ans=0.1
2024-09-18 03:05:13,041 INFO [train.py:1198] (0/2) Epoch 20, batch 3000, loss[loss=0.2591, ctc_loss=0.1553, cr_loss=0.4206, attn_decoder_loss=0.2612, over 29757.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1372, cr_loss=0.3807, attn_decoder_loss=0.251, over 5783366.14 frames. ], batch size: 81, lr: 5.55e-03, grad_scale: 8.0
2024-09-18 03:05:13,042 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 03:05:32,384 INFO [train.py:1230] (0/2) Epoch 20, validation: loss=0.2111, ctc_loss=0.03914, cr_loss=5.228e-15, attn_decoder_loss=0.2302, over 944034.00 frames.
2024-09-18 03:05:32,384 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-18 03:05:41,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=355900.0, ans=0.125
2024-09-18 03:05:53,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=355940.0, ans=0.2
2024-09-18 03:06:10,670 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.598e+01 9.158e+01 9.918e+01 2.557e+02, threshold=1.832e+02, percent-clipped=1.0
2024-09-18 03:06:44,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=356060.0, ans=0.2
2024-09-18 03:06:47,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=356060.0, ans=0.125
2024-09-18 03:06:50,699 INFO [train.py:1198] (0/2) Epoch 20, batch 3050, loss[loss=0.2354, ctc_loss=0.1342, cr_loss=0.3808, attn_decoder_loss=0.2382, over 29515.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1381, cr_loss=0.3822, attn_decoder_loss=0.2519, over 5776682.07 frames. ], batch size: 76, lr: 5.55e-03, grad_scale: 8.0
2024-09-18 03:07:16,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=356140.0, ans=0.125
2024-09-18 03:07:17,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5
2024-09-18 03:07:36,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=356220.0, ans=0.125
2024-09-18 03:07:46,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=356220.0, ans=0.0
2024-09-18 03:08:00,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=356260.0, ans=0.0
2024-09-18 03:08:05,864 INFO [train.py:1198] (0/2) Epoch 20, batch 3100, loss[loss=0.2727, ctc_loss=0.1509, cr_loss=0.4011, attn_decoder_loss=0.2773, over 29233.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1377, cr_loss=0.3815, attn_decoder_loss=0.2513, over 5777445.85 frames. ], batch size: 100, lr: 5.55e-03, grad_scale: 8.0
2024-09-18 03:08:07,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=356300.0, ans=0.5
2024-09-18 03:08:08,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=10.0
2024-09-18 03:08:12,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=356300.0, ans=0.125
2024-09-18 03:08:18,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=356300.0, ans=0.125
2024-09-18 03:08:30,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=356340.0, ans=0.125
2024-09-18 03:08:33,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=356340.0, ans=0.125
2024-09-18 03:08:38,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.61 vs. limit=22.5
2024-09-18 03:08:43,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=356380.0, ans=15.0
2024-09-18 03:08:43,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.464e+01 9.160e+01 9.747e+01 2.632e+02, threshold=1.832e+02, percent-clipped=3.0
2024-09-18 03:08:49,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.06 vs. limit=10.0
2024-09-18 03:08:51,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=356420.0, ans=0.09899494936611666
2024-09-18 03:09:23,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=356500.0, ans=0.025
2024-09-18 03:09:24,110 INFO [train.py:1198] (0/2) Epoch 20, batch 3150, loss[loss=0.2668, ctc_loss=0.1513, cr_loss=0.3997, attn_decoder_loss=0.2707, over 28761.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1377, cr_loss=0.3814, attn_decoder_loss=0.2515, over 5782549.93 frames. ], batch size: 104, lr: 5.55e-03, grad_scale: 8.0
2024-09-18 03:09:31,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=356500.0, ans=0.0
2024-09-18 03:09:59,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=356580.0, ans=0.125
2024-09-18 03:10:00,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356580.0, ans=0.1
2024-09-18 03:10:02,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=356580.0, ans=0.125
2024-09-18 03:10:02,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.26 vs. limit=10.0
2024-09-18 03:10:42,208 INFO [train.py:1198] (0/2) Epoch 20, batch 3200, loss[loss=0.2458, ctc_loss=0.1394, cr_loss=0.3796, attn_decoder_loss=0.2492, over 29422.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1376, cr_loss=0.3811, attn_decoder_loss=0.2512, over 5792079.01 frames. ], batch size: 79, lr: 5.54e-03, grad_scale: 16.0
2024-09-18 03:10:59,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=356740.0, ans=0.2
2024-09-18 03:11:00,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=356740.0, ans=0.2
2024-09-18 03:11:08,928 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.27 vs. limit=22.5
2024-09-18 03:11:20,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=356780.0, ans=0.125
2024-09-18 03:11:21,933 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 8.428e+01 9.069e+01 9.579e+01 2.573e+02, threshold=1.814e+02, percent-clipped=1.0
2024-09-18 03:11:29,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=356820.0, ans=0.125
2024-09-18 03:11:42,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356860.0, ans=0.1
2024-09-18 03:11:58,533 INFO [train.py:1198] (0/2) Epoch 20, batch 3250, loss[loss=0.2544, ctc_loss=0.1412, cr_loss=0.3622, attn_decoder_loss=0.2589, over 29727.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.1371, cr_loss=0.3804, attn_decoder_loss=0.2511, over 5799014.64 frames. ], batch size: 84, lr: 5.54e-03, grad_scale: 8.0
2024-09-18 03:12:13,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=356940.0, ans=0.125
2024-09-18 03:12:21,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=356940.0, ans=0.125
2024-09-18 03:12:42,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=357020.0, ans=0.125
2024-09-18 03:13:15,875 INFO [train.py:1198] (0/2) Epoch 20, batch 3300, loss[loss=0.2605, ctc_loss=0.1531, cr_loss=0.3989, attn_decoder_loss=0.2636, over 28537.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1362, cr_loss=0.3787, attn_decoder_loss=0.2498, over 5797535.21 frames. ], batch size: 112, lr: 5.54e-03, grad_scale: 8.0
2024-09-18 03:13:37,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=357140.0, ans=0.05
2024-09-18 03:13:44,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=357180.0, ans=0.025
2024-09-18 03:13:55,065 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.624e+01 9.196e+01 9.884e+01 4.402e+02, threshold=1.839e+02, percent-clipped=2.0
2024-09-18 03:14:09,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=357220.0, ans=0.0
2024-09-18 03:14:32,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=357300.0, ans=0.2
2024-09-18 03:14:33,205 INFO [train.py:1198] (0/2) Epoch 20, batch 3350, loss[loss=0.2598, ctc_loss=0.1449, cr_loss=0.3894, attn_decoder_loss=0.264, over 28851.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1372, cr_loss=0.38, attn_decoder_loss=0.2508, over 5772753.96 frames. ], batch size: 104, lr: 5.54e-03, grad_scale: 8.0
2024-09-18 03:14:33,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=357300.0, ans=0.125
2024-09-18 03:14:45,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=357300.0, ans=0.025
2024-09-18 03:14:47,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=357340.0, ans=0.2
2024-09-18 03:15:00,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=357340.0, ans=0.125
2024-09-18 03:15:30,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0
2024-09-18 03:15:47,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=357500.0, ans=0.125
2024-09-18 03:15:48,959 INFO [train.py:1198] (0/2) Epoch 20, batch 3400, loss[loss=0.2174, ctc_loss=0.1151, cr_loss=0.3433, attn_decoder_loss=0.2211, over 29348.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1373, cr_loss=0.3795, attn_decoder_loss=0.2507, over 5763884.14 frames. ], batch size: 67, lr: 5.54e-03, grad_scale: 8.0
2024-09-18 03:16:28,758 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.602e+01 9.311e+01 9.873e+01 3.083e+02, threshold=1.862e+02, percent-clipped=1.0
2024-09-18 03:16:35,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=357620.0, ans=0.0
2024-09-18 03:16:36,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=357620.0, ans=0.125
2024-09-18 03:16:38,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=357620.0, ans=0.1
2024-09-18 03:16:44,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=357620.0, ans=0.2
2024-09-18 03:16:47,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=357620.0, ans=0.125
2024-09-18 03:16:57,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=357660.0, ans=0.125
2024-09-18 03:17:00,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=357660.0, ans=0.125
2024-09-18 03:17:07,410 INFO [train.py:1198] (0/2) Epoch 20, batch 3450, loss[loss=0.2503, ctc_loss=0.1392, cr_loss=0.3575, attn_decoder_loss=0.2548, over 28163.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1376, cr_loss=0.3805, attn_decoder_loss=0.2512, over 5772855.07 frames. ], batch size: 111, lr: 5.54e-03, grad_scale: 8.0
2024-09-18 03:17:15,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=357700.0, ans=0.125
2024-09-18 03:17:24,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=357740.0, ans=0.125
2024-09-18 03:17:32,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=357740.0, ans=0.09899494936611666
2024-09-18 03:17:35,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0
2024-09-18 03:17:36,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=357780.0, ans=0.2
2024-09-18 03:17:48,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5
2024-09-18 03:18:09,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0
2024-09-18 03:18:17,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=357860.0, ans=0.125
2024-09-18 03:18:25,103 INFO [train.py:1198] (0/2) Epoch 20, batch 3500, loss[loss=0.2212, ctc_loss=0.1248, cr_loss=0.3493, attn_decoder_loss=0.2241, over 29280.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1372, cr_loss=0.3796, attn_decoder_loss=0.2506, over 5775548.01 frames. ], batch size: 71, lr: 5.54e-03, grad_scale: 8.0
2024-09-18 03:18:27,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=357900.0, ans=0.0
2024-09-18 03:18:39,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=357940.0, ans=0.0
2024-09-18 03:18:45,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=357940.0, ans=15.0
2024-09-18 03:19:03,916 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.576e+01 9.185e+01 9.795e+01 1.651e+02, threshold=1.837e+02, percent-clipped=0.0
2024-09-18 03:19:10,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=358020.0, ans=0.05
2024-09-18 03:19:11,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=358020.0, ans=0.125
2024-09-18 03:19:13,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=358020.0, ans=0.2
2024-09-18 03:19:28,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358060.0, ans=0.1
2024-09-18 03:19:34,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=358060.0, ans=0.0
2024-09-18 03:19:35,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=358060.0, ans=0.125
2024-09-18 03:19:39,927 INFO [train.py:1198] (0/2) Epoch 20, batch 3550, loss[loss=0.2591, ctc_loss=0.1456, cr_loss=0.3872, attn_decoder_loss=0.2631, over 29741.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.1371, cr_loss=0.3792, attn_decoder_loss=0.2506, over 5781627.15 frames. ], batch size: 89, lr: 5.53e-03, grad_scale: 8.0
2024-09-18 03:19:43,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=358100.0, ans=0.125
2024-09-18 03:19:47,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=358100.0, ans=0.1
2024-09-18 03:19:50,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=358100.0, ans=0.125
2024-09-18 03:19:59,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=358140.0, ans=0.2
2024-09-18 03:20:03,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=358140.0, ans=0.0
2024-09-18 03:20:09,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=358180.0, ans=0.1
2024-09-18 03:20:21,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=358180.0, ans=0.125
2024-09-18 03:20:21,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=358180.0, ans=0.0
2024-09-18 03:20:21,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=358180.0, ans=0.1
2024-09-18 03:20:23,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=358220.0, ans=0.0
2024-09-18 03:20:24,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=358220.0, ans=0.125
2024-09-18 03:20:29,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=358220.0, ans=0.125
2024-09-18 03:20:34,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=358220.0, ans=0.0
2024-09-18 03:20:53,756 INFO [train.py:1198] (0/2) Epoch 20, batch 3600, loss[loss=0.2345, ctc_loss=0.1255, cr_loss=0.3581, attn_decoder_loss=0.2387, over 29500.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1368, cr_loss=0.379, attn_decoder_loss=0.2505, over 5790735.34 frames. ], batch size: 77, lr: 5.53e-03, grad_scale: 16.0
2024-09-18 03:21:18,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=358340.0, ans=0.125
2024-09-18 03:21:33,254 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.608e+01 9.165e+01 9.950e+01 3.634e+02, threshold=1.833e+02, percent-clipped=2.0
2024-09-18 03:21:48,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=358420.0, ans=0.0
2024-09-18 03:21:52,781 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 03:22:08,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=358460.0, ans=0.0
2024-09-18 03:22:08,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=358460.0, ans=0.0
2024-09-18 03:22:10,953 INFO [train.py:1198] (0/2) Epoch 20, batch 3650, loss[loss=0.2608, ctc_loss=0.143, cr_loss=0.4143, attn_decoder_loss=0.2646, over 29515.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1365, cr_loss=0.3781, attn_decoder_loss=0.2501, over 5792247.20 frames. ], batch size: 90, lr: 5.53e-03, grad_scale: 8.0
2024-09-18 03:22:38,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=358540.0, ans=0.05
2024-09-18 03:22:41,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=358580.0, ans=0.0
2024-09-18 03:22:45,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=358580.0, ans=0.0
2024-09-18 03:22:48,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=358580.0, ans=0.1
2024-09-18 03:23:10,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=358660.0, ans=0.125
2024-09-18 03:23:12,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=358660.0, ans=0.125
2024-09-18 03:23:21,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=358660.0, ans=0.0
2024-09-18 03:23:24,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=358700.0, ans=0.125
2024-09-18 03:23:25,633 INFO [train.py:1198] (0/2) Epoch 20, batch 3700, loss[loss=0.2404, ctc_loss=0.1214, cr_loss=0.358, attn_decoder_loss=0.2457, over 29714.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1362, cr_loss=0.3773, attn_decoder_loss=0.2502, over 5802277.12 frames. ], batch size: 84, lr: 5.53e-03, grad_scale: 8.0
2024-09-18 03:23:32,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.74 vs. limit=10.0
2024-09-18 03:23:52,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=358740.0, ans=0.1
2024-09-18 03:24:05,520 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.568e+01 9.154e+01 9.793e+01 1.686e+02, threshold=1.831e+02, percent-clipped=0.0
2024-09-18 03:24:34,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=358860.0, ans=0.125
2024-09-18 03:24:41,704 INFO [train.py:1198] (0/2) Epoch 20, batch 3750, loss[loss=0.2244, ctc_loss=0.1174, cr_loss=0.3346, attn_decoder_loss=0.2289, over 29336.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1358, cr_loss=0.3767, attn_decoder_loss=0.2497, over 5806094.86 frames. ], batch size: 67, lr: 5.53e-03, grad_scale: 8.0
2024-09-18 03:24:43,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=358900.0, ans=0.05
2024-09-18 03:24:43,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=358900.0, ans=0.05
2024-09-18 03:24:57,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=358940.0, ans=0.125
2024-09-18 03:25:01,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=358940.0, ans=0.1
2024-09-18 03:25:07,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=358940.0, ans=0.04949747468305833
2024-09-18 03:25:37,518 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.56 vs. limit=15.0
2024-09-18 03:25:50,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=359060.0, ans=0.05
2024-09-18 03:25:56,007 INFO [train.py:1198] (0/2) Epoch 20, batch 3800, loss[loss=0.2457, ctc_loss=0.1365, cr_loss=0.3981, attn_decoder_loss=0.249, over 29627.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1358, cr_loss=0.3766, attn_decoder_loss=0.2492, over 5796796.54 frames. ], batch size: 86, lr: 5.53e-03, grad_scale: 8.0
2024-09-18 03:26:20,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=359140.0, ans=0.125
2024-09-18 03:26:20,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.61 vs. limit=15.0
2024-09-18 03:26:36,575 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.575e+01 9.018e+01 9.556e+01 1.555e+02, threshold=1.804e+02, percent-clipped=0.0
2024-09-18 03:26:38,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359180.0, ans=0.1
2024-09-18 03:26:41,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=359220.0, ans=0.125
2024-09-18 03:26:50,925 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0
2024-09-18 03:26:59,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=359260.0, ans=0.125
2024-09-18 03:27:01,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359260.0, ans=0.1
2024-09-18 03:27:10,603 INFO [train.py:1198] (0/2) Epoch 20, batch 3850, loss[loss=0.2557, ctc_loss=0.1392, cr_loss=0.3737, attn_decoder_loss=0.2603, over 29243.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1355, cr_loss=0.3763, attn_decoder_loss=0.2492, over 5812225.67 frames. ], batch size: 100, lr: 5.52e-03, grad_scale: 8.0
2024-09-18 03:27:10,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=359300.0, ans=0.125
2024-09-18 03:27:28,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=359340.0, ans=0.035
2024-09-18 03:27:43,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=359380.0, ans=0.0
2024-09-18 03:27:52,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=359380.0, ans=0.125
2024-09-18 03:28:00,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.20 vs. limit=15.0
2024-09-18 03:28:22,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=359460.0, ans=0.2
2024-09-18 03:28:22,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=359460.0, ans=0.0
2024-09-18 03:28:22,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0
2024-09-18 03:28:26,449 INFO [train.py:1198] (0/2) Epoch 20, batch 3900, loss[loss=0.2468, ctc_loss=0.139, cr_loss=0.394, attn_decoder_loss=0.2501, over 29612.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.136, cr_loss=0.3775, attn_decoder_loss=0.2498, over 5816826.54 frames. ], batch size: 86, lr: 5.52e-03, grad_scale: 8.0
2024-09-18 03:28:26,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=359500.0, ans=0.125
2024-09-18 03:28:37,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=359500.0, ans=0.125
2024-09-18 03:28:59,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=359580.0, ans=0.07
2024-09-18 03:29:06,327 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.505e+01 9.019e+01 9.664e+01 2.565e+02, threshold=1.804e+02, percent-clipped=1.0
2024-09-18 03:29:28,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=359660.0, ans=0.125
2024-09-18 03:29:37,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=359660.0, ans=0.125
2024-09-18 03:29:40,549 INFO [train.py:1198] (0/2) Epoch 20, batch 3950, loss[loss=0.2593, ctc_loss=0.1406, cr_loss=0.413, attn_decoder_loss=0.2633, over 29509.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1354, cr_loss=0.3767, attn_decoder_loss=0.2497, over 5836225.77 frames. ], batch size: 97, lr: 5.52e-03, grad_scale: 8.0
2024-09-18 03:29:51,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.18 vs. limit=15.0
2024-09-18 03:29:54,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359700.0, ans=0.1
2024-09-18 03:30:01,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=359740.0, ans=0.125
2024-09-18 03:30:01,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=359740.0, ans=0.1
2024-09-18 03:30:03,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=22.5
2024-09-18 03:30:25,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=359820.0, ans=0.0
2024-09-18 03:30:56,095 INFO [train.py:1198] (0/2) Epoch 20, batch 4000, loss[loss=0.2305, ctc_loss=0.1213, cr_loss=0.3394, attn_decoder_loss=0.2351, over 29523.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1357, cr_loss=0.3767, attn_decoder_loss=0.2498, over 5814596.54 frames. ], batch size: 74, lr: 5.52e-03, grad_scale: 16.0
2024-09-18 03:31:21,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.45 vs. limit=15.0
2024-09-18 03:31:25,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359980.0, ans=0.1
2024-09-18 03:31:38,100 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.709e+01 8.706e+01 9.188e+01 9.943e+01 2.259e+02, threshold=1.838e+02, percent-clipped=3.0
2024-09-18 03:31:51,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=360020.0, ans=0.035
2024-09-18 03:32:05,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.82 vs. limit=15.0
2024-09-18 03:32:10,749 INFO [train.py:1198] (0/2) Epoch 20, batch 4050, loss[loss=0.2802, ctc_loss=0.1897, cr_loss=0.4252, attn_decoder_loss=0.2808, over 20119.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1362, cr_loss=0.3773, attn_decoder_loss=0.2499, over 5797871.26 frames. ], batch size: 209, lr: 5.52e-03, grad_scale: 8.0
2024-09-18 03:32:11,659 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0
2024-09-18 03:32:32,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=360140.0, ans=0.125
2024-09-18 03:32:33,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=360140.0, ans=0.0
2024-09-18 03:32:36,244 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs.
limit=15.0 2024-09-18 03:33:03,569 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:33:25,676 INFO [train.py:1198] (0/2) Epoch 20, batch 4100, loss[loss=0.2731, ctc_loss=0.1605, cr_loss=0.4499, attn_decoder_loss=0.2756, over 29492.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1366, cr_loss=0.3778, attn_decoder_loss=0.2502, over 5792640.59 frames. ], batch size: 90, lr: 5.52e-03, grad_scale: 8.0 2024-09-18 03:33:47,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=360340.0, ans=0.125 2024-09-18 03:33:48,215 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-18 03:34:06,721 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.554e+01 9.204e+01 1.015e+02 1.958e+02, threshold=1.841e+02, percent-clipped=1.0 2024-09-18 03:34:07,181 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:34:27,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=360460.0, ans=0.025 2024-09-18 03:34:28,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=360460.0, ans=0.2 2024-09-18 03:34:29,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0 2024-09-18 03:34:40,437 INFO [train.py:1198] (0/2) Epoch 20, batch 4150, loss[loss=0.2358, ctc_loss=0.1315, cr_loss=0.3555, attn_decoder_loss=0.2395, over 29503.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1365, cr_loss=0.3775, attn_decoder_loss=0.2501, over 5798400.21 frames. 
], batch size: 77, lr: 5.52e-03, grad_scale: 8.0 2024-09-18 03:35:08,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=360580.0, ans=0.1 2024-09-18 03:35:08,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.04 vs. limit=10.0 2024-09-18 03:35:29,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.07 vs. limit=22.5 2024-09-18 03:35:53,872 INFO [train.py:1198] (0/2) Epoch 20, batch 4200, loss[loss=0.2629, ctc_loss=0.147, cr_loss=0.4052, attn_decoder_loss=0.2668, over 29471.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.137, cr_loss=0.3795, attn_decoder_loss=0.2506, over 5800654.87 frames. ], batch size: 90, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:36:19,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=360740.0, ans=0.125 2024-09-18 03:36:21,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=360780.0, ans=0.125 2024-09-18 03:36:36,269 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.598e+01 9.049e+01 1.004e+02 1.437e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-18 03:36:48,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=360820.0, ans=0.0 2024-09-18 03:36:49,428 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.53 vs. 
limit=15.0 2024-09-18 03:36:58,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=360860.0, ans=0.125 2024-09-18 03:37:09,129 INFO [train.py:1198] (0/2) Epoch 20, batch 4250, loss[loss=0.2341, ctc_loss=0.1201, cr_loss=0.3408, attn_decoder_loss=0.2391, over 29528.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1367, cr_loss=0.379, attn_decoder_loss=0.2506, over 5805582.43 frames. ], batch size: 74, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:37:10,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=360900.0, ans=0.125 2024-09-18 03:37:12,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=360900.0, ans=0.2 2024-09-18 03:37:28,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=360940.0, ans=0.2 2024-09-18 03:37:30,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.26 vs. 
limit=22.5 2024-09-18 03:37:37,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=360980.0, ans=0.0 2024-09-18 03:37:43,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=360980.0, ans=0.125 2024-09-18 03:38:09,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=361060.0, ans=0.125 2024-09-18 03:38:19,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=361060.0, ans=0.125 2024-09-18 03:38:19,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361060.0, ans=0.1 2024-09-18 03:38:22,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=361100.0, ans=0.0 2024-09-18 03:38:23,849 INFO [train.py:1198] (0/2) Epoch 20, batch 4300, loss[loss=0.2596, ctc_loss=0.1425, cr_loss=0.4027, attn_decoder_loss=0.2637, over 29554.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1364, cr_loss=0.3783, attn_decoder_loss=0.2504, over 5793494.58 frames. ], batch size: 87, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:38:27,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.52 vs. 
limit=12.0 2024-09-18 03:38:30,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=361100.0, ans=0.125 2024-09-18 03:38:34,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=361100.0, ans=0.025 2024-09-18 03:38:42,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=361140.0, ans=0.125 2024-09-18 03:38:56,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=361180.0, ans=0.125 2024-09-18 03:39:01,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361180.0, ans=0.1 2024-09-18 03:39:01,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=361180.0, ans=0.125 2024-09-18 03:39:05,351 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 8.736e+01 9.238e+01 9.877e+01 2.557e+02, threshold=1.848e+02, percent-clipped=2.0 2024-09-18 03:39:07,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=361220.0, ans=0.125 2024-09-18 03:39:08,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=361220.0, ans=0.125 2024-09-18 03:39:11,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=361220.0, ans=0.125 2024-09-18 03:39:23,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=361260.0, ans=0.125 2024-09-18 03:39:38,013 INFO [train.py:1198] (0/2) Epoch 20, batch 4350, loss[loss=0.2572, 
ctc_loss=0.1445, cr_loss=0.3813, attn_decoder_loss=0.2612, over 29463.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1386, cr_loss=0.3831, attn_decoder_loss=0.2536, over 5795513.96 frames. ], batch size: 97, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:39:40,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=361300.0, ans=0.0 2024-09-18 03:39:42,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.74 vs. limit=12.0 2024-09-18 03:39:55,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=361340.0, ans=0.125 2024-09-18 03:40:05,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=361340.0, ans=0.125 2024-09-18 03:40:09,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=361380.0, ans=0.035 2024-09-18 03:40:17,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=361380.0, ans=0.0 2024-09-18 03:40:27,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=361420.0, ans=0.125 2024-09-18 03:40:29,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361420.0, ans=0.1 2024-09-18 03:40:51,533 INFO [train.py:1198] (0/2) Epoch 20, batch 4400, loss[loss=0.2524, ctc_loss=0.146, cr_loss=0.3801, attn_decoder_loss=0.2558, over 27153.00 frames. ], tot_loss[loss=0.2518, ctc_loss=0.14, cr_loss=0.3854, attn_decoder_loss=0.2557, over 5766620.01 frames. 
], batch size: 124, lr: 5.51e-03, grad_scale: 16.0 2024-09-18 03:41:19,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361540.0, ans=0.1 2024-09-18 03:41:34,939 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.117e+01 8.833e+01 9.166e+01 9.784e+01 1.631e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-18 03:41:58,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=361660.0, ans=0.125 2024-09-18 03:42:01,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=361660.0, ans=0.0 2024-09-18 03:42:05,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=361700.0, ans=0.125 2024-09-18 03:42:06,985 INFO [train.py:1198] (0/2) Epoch 20, batch 4450, loss[loss=0.2757, ctc_loss=0.1804, cr_loss=0.4216, attn_decoder_loss=0.2769, over 20068.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1442, cr_loss=0.3904, attn_decoder_loss=0.2581, over 5575625.71 frames. 
], batch size: 211, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:42:07,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=361700.0, ans=6.0 2024-09-18 03:42:59,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=361820.0, ans=0.125 2024-09-18 03:43:00,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=361820.0, ans=0.125 2024-09-18 03:43:05,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=361820.0, ans=0.125 2024-09-18 03:43:22,881 INFO [train.py:1198] (0/2) Epoch 20, batch 4500, loss[loss=0.267, ctc_loss=0.1702, cr_loss=0.403, attn_decoder_loss=0.2688, over 20674.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1492, cr_loss=0.393, attn_decoder_loss=0.2605, over 5234870.80 frames. ], batch size: 210, lr: 5.51e-03, grad_scale: 8.0 2024-09-18 03:43:35,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=361900.0, ans=0.0 2024-09-18 03:44:00,126 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-20.pt 2024-09-18 03:44:52,337 INFO [train.py:1198] (0/2) Epoch 21, batch 0, loss[loss=0.2207, ctc_loss=0.1079, cr_loss=0.3422, attn_decoder_loss=0.2256, over 29626.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1079, cr_loss=0.3422, attn_decoder_loss=0.2256, over 29626.00 frames. ], batch size: 73, lr: 5.37e-03, grad_scale: 16.0 2024-09-18 03:44:52,338 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 03:45:10,776 INFO [train.py:1230] (0/2) Epoch 21, validation: loss=0.2126, ctc_loss=0.0391, cr_loss=5.275e-15, attn_decoder_loss=0.2319, over 944034.00 frames. 
2024-09-18 03:45:10,776 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 03:45:19,729 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 1.076e+02 1.145e+02 1.241e+02 1.705e+02, threshold=2.291e+02, percent-clipped=0.0 2024-09-18 03:45:29,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=362040.0, ans=0.0 2024-09-18 03:45:31,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=12.0 2024-09-18 03:45:42,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=362080.0, ans=0.015 2024-09-18 03:46:00,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=362120.0, ans=0.1 2024-09-18 03:46:02,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=362120.0, ans=0.0 2024-09-18 03:46:04,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=362120.0, ans=0.0 2024-09-18 03:46:23,291 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2024-09-18 03:46:24,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=362160.0, ans=0.125 2024-09-18 03:46:28,349 INFO [train.py:1198] (0/2) Epoch 21, batch 50, loss[loss=0.2237, ctc_loss=0.1233, cr_loss=0.3545, attn_decoder_loss=0.227, over 29401.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1376, cr_loss=0.3786, attn_decoder_loss=0.251, over 1268268.75 frames. 
], batch size: 70, lr: 5.37e-03, grad_scale: 8.0 2024-09-18 03:46:31,905 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:46:39,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=362200.0, ans=0.1 2024-09-18 03:47:00,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2024-09-18 03:47:22,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=362320.0, ans=0.0 2024-09-18 03:47:23,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=362320.0, ans=0.125 2024-09-18 03:47:25,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=362320.0, ans=0.07 2024-09-18 03:47:46,702 INFO [train.py:1198] (0/2) Epoch 21, batch 100, loss[loss=0.2303, ctc_loss=0.1205, cr_loss=0.342, attn_decoder_loss=0.2349, over 29553.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1379, cr_loss=0.3814, attn_decoder_loss=0.2527, over 2253007.91 frames. 
], batch size: 76, lr: 5.37e-03, grad_scale: 8.0 2024-09-18 03:47:55,562 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.793e+01 9.358e+01 9.884e+01 2.727e+02, threshold=1.872e+02, percent-clipped=1.0 2024-09-18 03:47:57,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=362400.0, ans=0.125 2024-09-18 03:48:13,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=362440.0, ans=0.125 2024-09-18 03:48:46,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=362560.0, ans=0.025 2024-09-18 03:48:49,452 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:48:50,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5 2024-09-18 03:49:01,016 INFO [train.py:1198] (0/2) Epoch 21, batch 150, loss[loss=0.2134, ctc_loss=0.104, cr_loss=0.317, attn_decoder_loss=0.2185, over 29427.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1363, cr_loss=0.3803, attn_decoder_loss=0.2509, over 3048374.36 frames. 
], batch size: 70, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:49:10,456 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:49:21,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=362640.0, ans=0.125 2024-09-18 03:49:25,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=362640.0, ans=0.125 2024-09-18 03:49:54,548 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.65 vs. limit=15.0 2024-09-18 03:50:18,547 INFO [train.py:1198] (0/2) Epoch 21, batch 200, loss[loss=0.2622, ctc_loss=0.153, cr_loss=0.4061, attn_decoder_loss=0.2653, over 27535.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1361, cr_loss=0.3799, attn_decoder_loss=0.2502, over 3660214.80 frames. ], batch size: 124, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:50:21,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=362800.0, ans=0.09899494936611666 2024-09-18 03:50:27,602 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.461e+01 9.001e+01 9.601e+01 1.394e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-18 03:50:28,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=8.0 2024-09-18 03:50:29,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=362800.0, ans=0.125 2024-09-18 03:50:29,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. 
limit=6.0 2024-09-18 03:50:32,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=362840.0, ans=0.0 2024-09-18 03:50:37,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=362840.0, ans=0.125 2024-09-18 03:50:38,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=362840.0, ans=0.04949747468305833 2024-09-18 03:51:34,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=362960.0, ans=0.0 2024-09-18 03:51:37,236 INFO [train.py:1198] (0/2) Epoch 21, batch 250, loss[loss=0.2607, ctc_loss=0.1476, cr_loss=0.3991, attn_decoder_loss=0.2643, over 29256.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1355, cr_loss=0.3787, attn_decoder_loss=0.2499, over 4142547.94 frames. ], batch size: 100, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:51:39,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=363000.0, ans=0.07 2024-09-18 03:51:46,462 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:51:47,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363000.0, ans=0.1 2024-09-18 03:51:55,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=363040.0, ans=10.0 2024-09-18 03:52:02,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.53 vs. 
limit=15.0 2024-09-18 03:52:04,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=363040.0, ans=0.025 2024-09-18 03:52:04,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=363040.0, ans=0.125 2024-09-18 03:52:19,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=363080.0, ans=0.5 2024-09-18 03:52:29,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363120.0, ans=0.1 2024-09-18 03:52:50,429 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:52:52,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=363200.0, ans=0.125 2024-09-18 03:52:53,505 INFO [train.py:1198] (0/2) Epoch 21, batch 300, loss[loss=0.2644, ctc_loss=0.155, cr_loss=0.4306, attn_decoder_loss=0.267, over 29512.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1356, cr_loss=0.3789, attn_decoder_loss=0.2499, over 4511800.60 frames. 
], batch size: 92, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:53:02,585 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.788e+01 8.424e+01 9.085e+01 9.553e+01 2.134e+02, threshold=1.817e+02, percent-clipped=1.0 2024-09-18 03:53:10,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=363240.0, ans=0.2 2024-09-18 03:53:15,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=363240.0, ans=0.0 2024-09-18 03:53:30,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363280.0, ans=0.1 2024-09-18 03:53:36,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=363280.0, ans=0.125 2024-09-18 03:53:43,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=363320.0, ans=0.0 2024-09-18 03:53:56,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=363360.0, ans=0.125 2024-09-18 03:54:11,641 INFO [train.py:1198] (0/2) Epoch 21, batch 350, loss[loss=0.216, ctc_loss=0.1061, cr_loss=0.3084, attn_decoder_loss=0.2213, over 29324.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.136, cr_loss=0.3797, attn_decoder_loss=0.2503, over 4797821.13 frames. 
], batch size: 71, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:54:17,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=363400.0, ans=0.125 2024-09-18 03:54:28,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=363440.0, ans=0.0 2024-09-18 03:54:37,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=363440.0, ans=0.125 2024-09-18 03:54:56,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=363480.0, ans=0.04949747468305833 2024-09-18 03:55:23,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2024-09-18 03:55:25,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=363560.0, ans=0.0 2024-09-18 03:55:29,551 INFO [train.py:1198] (0/2) Epoch 21, batch 400, loss[loss=0.2536, ctc_loss=0.1421, cr_loss=0.3881, attn_decoder_loss=0.2573, over 29692.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1356, cr_loss=0.3789, attn_decoder_loss=0.2497, over 5026006.51 frames. 
], batch size: 82, lr: 5.36e-03, grad_scale: 16.0 2024-09-18 03:55:38,689 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.497e+01 9.045e+01 9.813e+01 2.448e+02, threshold=1.809e+02, percent-clipped=2.0 2024-09-18 03:55:57,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=363640.0, ans=0.0 2024-09-18 03:56:15,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=363720.0, ans=0.0 2024-09-18 03:56:39,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=363760.0, ans=0.0 2024-09-18 03:56:45,199 INFO [train.py:1198] (0/2) Epoch 21, batch 450, loss[loss=0.2692, ctc_loss=0.1566, cr_loss=0.4096, attn_decoder_loss=0.2726, over 29684.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1359, cr_loss=0.379, attn_decoder_loss=0.25, over 5189002.55 frames. 
], batch size: 83, lr: 5.36e-03, grad_scale: 8.0 2024-09-18 03:56:45,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=363800.0, ans=0.025 2024-09-18 03:56:50,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=363800.0, ans=0.0 2024-09-18 03:56:57,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=363800.0, ans=0.125 2024-09-18 03:56:57,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=363800.0, ans=0.125 2024-09-18 03:57:00,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=363840.0, ans=0.2 2024-09-18 03:57:12,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=363840.0, ans=0.125 2024-09-18 03:57:45,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=363960.0, ans=0.125 2024-09-18 03:58:01,733 INFO [train.py:1198] (0/2) Epoch 21, batch 500, loss[loss=0.2593, ctc_loss=0.1432, cr_loss=0.3987, attn_decoder_loss=0.2633, over 29420.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1351, cr_loss=0.3777, attn_decoder_loss=0.2492, over 5331923.08 frames. 
], batch size: 94, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 03:58:08,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=364000.0, ans=0.0 2024-09-18 03:58:14,711 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.455e+01 8.968e+01 9.588e+01 2.224e+02, threshold=1.794e+02, percent-clipped=1.0 2024-09-18 03:58:21,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=364040.0, ans=0.0 2024-09-18 03:58:37,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=364080.0, ans=0.025 2024-09-18 03:59:07,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=364160.0, ans=0.07 2024-09-18 03:59:14,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=364160.0, ans=0.125 2024-09-18 03:59:22,169 INFO [train.py:1198] (0/2) Epoch 21, batch 550, loss[loss=0.2494, ctc_loss=0.1394, cr_loss=0.3659, attn_decoder_loss=0.2535, over 28779.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1351, cr_loss=0.3775, attn_decoder_loss=0.2491, over 5424965.66 frames. 
], batch size: 104, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 03:59:23,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=364200.0, ans=22.5 2024-09-18 03:59:28,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=364200.0, ans=0.1 2024-09-18 03:59:33,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=364200.0, ans=0.025 2024-09-18 04:00:06,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364320.0, ans=0.1 2024-09-18 04:00:11,733 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.87 vs. limit=15.0 2024-09-18 04:00:12,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=364320.0, ans=0.125 2024-09-18 04:00:31,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=364360.0, ans=0.05 2024-09-18 04:00:31,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-09-18 04:00:38,401 INFO [train.py:1198] (0/2) Epoch 21, batch 600, loss[loss=0.2581, ctc_loss=0.1482, cr_loss=0.3993, attn_decoder_loss=0.2615, over 29200.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1355, cr_loss=0.3784, attn_decoder_loss=0.2496, over 5509218.45 frames. 
], batch size: 100, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 04:00:43,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=364400.0, ans=0.2 2024-09-18 04:00:48,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.547e+01 9.115e+01 9.764e+01 2.691e+02, threshold=1.823e+02, percent-clipped=3.0 2024-09-18 04:01:10,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=364480.0, ans=0.2 2024-09-18 04:01:19,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364480.0, ans=0.1 2024-09-18 04:01:22,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=364520.0, ans=0.0 2024-09-18 04:01:28,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=364520.0, ans=0.125 2024-09-18 04:01:34,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=364520.0, ans=0.1 2024-09-18 04:01:53,835 INFO [train.py:1198] (0/2) Epoch 21, batch 650, loss[loss=0.2513, ctc_loss=0.1373, cr_loss=0.3884, attn_decoder_loss=0.2554, over 29762.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1343, cr_loss=0.3768, attn_decoder_loss=0.2488, over 5586571.95 frames. 
], batch size: 81, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 04:02:01,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=364600.0, ans=0.0 2024-09-18 04:02:06,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=364600.0, ans=0.125 2024-09-18 04:02:07,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364640.0, ans=0.1 2024-09-18 04:02:11,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=364640.0, ans=0.025 2024-09-18 04:02:13,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=364640.0, ans=0.0 2024-09-18 04:02:13,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=364640.0, ans=0.025 2024-09-18 04:02:29,205 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.46 vs. limit=15.0 2024-09-18 04:03:14,773 INFO [train.py:1198] (0/2) Epoch 21, batch 700, loss[loss=0.2437, ctc_loss=0.137, cr_loss=0.3927, attn_decoder_loss=0.2469, over 29544.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1349, cr_loss=0.3776, attn_decoder_loss=0.2495, over 5636277.38 frames. 
], batch size: 76, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 04:03:21,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=364800.0, ans=0.125 2024-09-18 04:03:25,121 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.553e+01 9.088e+01 9.665e+01 1.426e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-18 04:03:28,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=364840.0, ans=0.025 2024-09-18 04:04:03,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=364920.0, ans=0.125 2024-09-18 04:04:04,286 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=12.0 2024-09-18 04:04:11,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.88 vs. limit=15.0 2024-09-18 04:04:18,730 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:04:26,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=364960.0, ans=0.2 2024-09-18 04:04:26,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=364960.0, ans=0.125 2024-09-18 04:04:30,490 INFO [train.py:1198] (0/2) Epoch 21, batch 750, loss[loss=0.2441, ctc_loss=0.135, cr_loss=0.3783, attn_decoder_loss=0.2478, over 29712.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1347, cr_loss=0.3769, attn_decoder_loss=0.2492, over 5675264.33 frames. 
], batch size: 82, lr: 5.35e-03, grad_scale: 8.0 2024-09-18 04:04:32,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=365000.0, ans=0.125 2024-09-18 04:04:37,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0 2024-09-18 04:04:41,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=365000.0, ans=0.2 2024-09-18 04:04:46,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0 2024-09-18 04:04:56,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=365040.0, ans=0.125 2024-09-18 04:04:57,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=365040.0, ans=0.125 2024-09-18 04:05:03,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=365080.0, ans=0.125 2024-09-18 04:05:03,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=365080.0, ans=0.1 2024-09-18 04:05:13,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=365080.0, ans=0.125 2024-09-18 04:05:14,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=365120.0, ans=0.0 2024-09-18 04:05:16,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. 
limit=15.0 2024-09-18 04:05:24,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.81 vs. limit=22.5 2024-09-18 04:05:25,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=365120.0, ans=0.2 2024-09-18 04:05:29,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365160.0, ans=0.1 2024-09-18 04:05:31,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=365160.0, ans=0.1 2024-09-18 04:05:46,211 INFO [train.py:1198] (0/2) Epoch 21, batch 800, loss[loss=0.2328, ctc_loss=0.1244, cr_loss=0.3493, attn_decoder_loss=0.237, over 29624.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1342, cr_loss=0.3764, attn_decoder_loss=0.2489, over 5706105.37 frames. ], batch size: 73, lr: 5.35e-03, grad_scale: 16.0 2024-09-18 04:05:51,887 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.62 vs. limit=15.0 2024-09-18 04:05:56,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.575e+01 9.275e+01 9.797e+01 6.839e+02, threshold=1.855e+02, percent-clipped=2.0 2024-09-18 04:06:03,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=365240.0, ans=0.2 2024-09-18 04:06:09,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.64 vs. 
limit=12.0 2024-09-18 04:06:14,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=365240.0, ans=0.1 2024-09-18 04:06:16,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0 2024-09-18 04:06:44,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.09 vs. limit=22.5 2024-09-18 04:07:06,319 INFO [train.py:1198] (0/2) Epoch 21, batch 850, loss[loss=0.2523, ctc_loss=0.1347, cr_loss=0.3792, attn_decoder_loss=0.2569, over 29725.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1341, cr_loss=0.3759, attn_decoder_loss=0.2488, over 5736131.57 frames. ], batch size: 89, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:07:12,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=365400.0, ans=0.0 2024-09-18 04:07:49,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-09-18 04:07:50,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=365520.0, ans=0.0 2024-09-18 04:08:04,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=365520.0, ans=0.125 2024-09-18 04:08:18,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=365560.0, ans=0.0 2024-09-18 04:08:22,631 INFO [train.py:1198] (0/2) Epoch 21, batch 900, loss[loss=0.2274, ctc_loss=0.1249, cr_loss=0.3463, attn_decoder_loss=0.231, over 29586.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1341, cr_loss=0.3759, attn_decoder_loss=0.2491, over 5739978.17 frames. 
], batch size: 73, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:08:34,648 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.573e+01 9.119e+01 9.639e+01 3.066e+02, threshold=1.824e+02, percent-clipped=3.0 2024-09-18 04:08:48,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=365640.0, ans=0.0 2024-09-18 04:08:51,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=365680.0, ans=0.1 2024-09-18 04:08:53,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=365680.0, ans=0.125 2024-09-18 04:08:57,127 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.13 vs. limit=22.5 2024-09-18 04:09:23,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=365760.0, ans=0.125 2024-09-18 04:09:35,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=365760.0, ans=0.95 2024-09-18 04:09:38,096 INFO [train.py:1198] (0/2) Epoch 21, batch 950, loss[loss=0.2284, ctc_loss=0.121, cr_loss=0.3508, attn_decoder_loss=0.2325, over 29523.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1344, cr_loss=0.3761, attn_decoder_loss=0.2493, over 5741870.51 frames. 
], batch size: 74, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:10:10,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365880.0, ans=0.1 2024-09-18 04:10:26,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=365920.0, ans=0.015 2024-09-18 04:10:47,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=365960.0, ans=0.125 2024-09-18 04:10:53,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2024-09-18 04:10:58,218 INFO [train.py:1198] (0/2) Epoch 21, batch 1000, loss[loss=0.237, ctc_loss=0.1223, cr_loss=0.3637, attn_decoder_loss=0.2417, over 29525.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1355, cr_loss=0.3775, attn_decoder_loss=0.2502, over 5737720.63 frames. ], batch size: 77, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:11:10,224 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 8.708e+01 9.150e+01 9.911e+01 2.107e+02, threshold=1.830e+02, percent-clipped=1.0 2024-09-18 04:11:25,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=366040.0, ans=0.125 2024-09-18 04:11:39,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=366080.0, ans=0.95 2024-09-18 04:11:44,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=366120.0, ans=0.0 2024-09-18 04:12:13,877 INFO [train.py:1198] (0/2) Epoch 21, batch 1050, loss[loss=0.2422, ctc_loss=0.1187, cr_loss=0.3332, attn_decoder_loss=0.2485, over 29680.00 frames. 
], tot_loss[loss=0.2457, ctc_loss=0.1351, cr_loss=0.3768, attn_decoder_loss=0.2496, over 5744884.25 frames. ], batch size: 85, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:12:49,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=366280.0, ans=0.125 2024-09-18 04:13:30,146 INFO [train.py:1198] (0/2) Epoch 21, batch 1100, loss[loss=0.2345, ctc_loss=0.1221, cr_loss=0.3579, attn_decoder_loss=0.239, over 29428.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1347, cr_loss=0.3768, attn_decoder_loss=0.2493, over 5756321.89 frames. ], batch size: 78, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:13:37,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=366400.0, ans=0.1 2024-09-18 04:13:42,196 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.489e+01 9.148e+01 9.741e+01 7.755e+02, threshold=1.830e+02, percent-clipped=3.0 2024-09-18 04:13:53,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.64 vs. limit=8.0 2024-09-18 04:14:12,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=366480.0, ans=0.2 2024-09-18 04:14:35,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=366560.0, ans=0.125 2024-09-18 04:14:50,298 INFO [train.py:1198] (0/2) Epoch 21, batch 1150, loss[loss=0.2448, ctc_loss=0.1362, cr_loss=0.3895, attn_decoder_loss=0.2483, over 29437.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1346, cr_loss=0.3765, attn_decoder_loss=0.2492, over 5754288.63 frames. 
], batch size: 78, lr: 5.34e-03, grad_scale: 8.0 2024-09-18 04:15:34,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=366720.0, ans=0.125 2024-09-18 04:15:58,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=366760.0, ans=0.125 2024-09-18 04:16:05,938 INFO [train.py:1198] (0/2) Epoch 21, batch 1200, loss[loss=0.2609, ctc_loss=0.1439, cr_loss=0.389, attn_decoder_loss=0.2653, over 29670.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1352, cr_loss=0.3776, attn_decoder_loss=0.25, over 5746495.98 frames. ], batch size: 85, lr: 5.33e-03, grad_scale: 16.0 2024-09-18 04:16:06,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=366800.0, ans=0.125 2024-09-18 04:16:10,097 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.21 vs. 
limit=15.0 2024-09-18 04:16:13,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=366800.0, ans=0.5 2024-09-18 04:16:13,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=366800.0, ans=0.125 2024-09-18 04:16:19,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.603e+01 9.203e+01 9.910e+01 1.694e+02, threshold=1.841e+02, percent-clipped=0.0 2024-09-18 04:16:20,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=366840.0, ans=0.125 2024-09-18 04:16:20,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=366840.0, ans=0.125 2024-09-18 04:16:21,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=366840.0, ans=0.025 2024-09-18 04:16:24,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=366840.0, ans=0.1 2024-09-18 04:16:27,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=366840.0, ans=0.125 2024-09-18 04:17:20,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=367000.0, ans=0.125 2024-09-18 04:17:22,070 INFO [train.py:1198] (0/2) Epoch 21, batch 1250, loss[loss=0.2676, ctc_loss=0.1571, cr_loss=0.4356, attn_decoder_loss=0.2702, over 29544.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1355, cr_loss=0.3787, attn_decoder_loss=0.2504, over 5773152.46 frames. 
], batch size: 92, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:17:52,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367080.0, ans=0.1 2024-09-18 04:17:56,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.10 vs. limit=22.5 2024-09-18 04:18:34,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=367160.0, ans=0.125 2024-09-18 04:18:40,395 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0 2024-09-18 04:18:41,094 INFO [train.py:1198] (0/2) Epoch 21, batch 1300, loss[loss=0.2528, ctc_loss=0.1407, cr_loss=0.3992, attn_decoder_loss=0.2563, over 28318.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1352, cr_loss=0.3776, attn_decoder_loss=0.2499, over 5777995.78 frames. ], batch size: 112, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:18:52,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=367200.0, ans=0.125 2024-09-18 04:18:54,777 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.475e+01 9.131e+01 9.688e+01 1.292e+02, threshold=1.826e+02, percent-clipped=0.0 2024-09-18 04:19:03,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2024-09-18 04:19:23,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=367280.0, ans=0.0 2024-09-18 04:19:26,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.19 vs. 
limit=15.0 2024-09-18 04:19:29,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.42 vs. limit=15.0 2024-09-18 04:19:30,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=367320.0, ans=0.0 2024-09-18 04:19:40,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=367360.0, ans=0.0 2024-09-18 04:19:57,012 INFO [train.py:1198] (0/2) Epoch 21, batch 1350, loss[loss=0.2492, ctc_loss=0.1382, cr_loss=0.4028, attn_decoder_loss=0.2525, over 29746.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1348, cr_loss=0.3772, attn_decoder_loss=0.2496, over 5795295.43 frames. ], batch size: 81, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:20:18,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=367440.0, ans=0.035 2024-09-18 04:20:36,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=367480.0, ans=0.0 2024-09-18 04:20:45,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=367520.0, ans=0.125 2024-09-18 04:20:45,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=367520.0, ans=0.125 2024-09-18 04:20:55,373 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.20 vs. limit=12.0 2024-09-18 04:21:12,480 INFO [train.py:1198] (0/2) Epoch 21, batch 1400, loss[loss=0.2159, ctc_loss=0.1177, cr_loss=0.346, attn_decoder_loss=0.2192, over 29596.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1348, cr_loss=0.3773, attn_decoder_loss=0.2494, over 5807043.58 frames. 
], batch size: 69, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:21:24,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=367600.0, ans=0.2 2024-09-18 04:21:25,902 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.438e+01 9.001e+01 9.853e+01 2.309e+02, threshold=1.800e+02, percent-clipped=1.0 2024-09-18 04:21:29,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=367640.0, ans=0.2 2024-09-18 04:21:56,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=367720.0, ans=0.025 2024-09-18 04:21:59,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=367720.0, ans=0.0 2024-09-18 04:22:14,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=367720.0, ans=0.125 2024-09-18 04:22:31,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=367800.0, ans=0.0 2024-09-18 04:22:32,120 INFO [train.py:1198] (0/2) Epoch 21, batch 1450, loss[loss=0.2623, ctc_loss=0.1504, cr_loss=0.4141, attn_decoder_loss=0.2656, over 29418.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1348, cr_loss=0.3777, attn_decoder_loss=0.2496, over 5803048.04 frames. ], batch size: 94, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:22:52,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=367840.0, ans=0.05 2024-09-18 04:23:07,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.99 vs. 
limit=15.0 2024-09-18 04:23:17,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367920.0, ans=0.1 2024-09-18 04:23:19,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=367920.0, ans=0.2 2024-09-18 04:23:37,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=367960.0, ans=0.125 2024-09-18 04:23:46,784 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-92000.pt 2024-09-18 04:23:55,039 INFO [train.py:1198] (0/2) Epoch 21, batch 1500, loss[loss=0.251, ctc_loss=0.1369, cr_loss=0.3836, attn_decoder_loss=0.2552, over 29608.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1352, cr_loss=0.3784, attn_decoder_loss=0.2501, over 5804032.41 frames. ], batch size: 86, lr: 5.33e-03, grad_scale: 8.0 2024-09-18 04:24:08,787 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.602e+01 9.157e+01 9.632e+01 2.068e+02, threshold=1.831e+02, percent-clipped=2.0 2024-09-18 04:24:27,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=368080.0, ans=0.1 2024-09-18 04:24:28,048 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.93 vs. limit=15.0 2024-09-18 04:24:32,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=368080.0, ans=0.07 2024-09-18 04:24:34,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.75 vs. 
limit=10.0 2024-09-18 04:24:47,494 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:24:55,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=368160.0, ans=0.0 2024-09-18 04:24:56,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=368160.0, ans=0.0 2024-09-18 04:25:11,419 INFO [train.py:1198] (0/2) Epoch 21, batch 1550, loss[loss=0.2681, ctc_loss=0.1524, cr_loss=0.4118, attn_decoder_loss=0.2718, over 29504.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1358, cr_loss=0.3792, attn_decoder_loss=0.2503, over 5780367.76 frames. ], batch size: 90, lr: 5.32e-03, grad_scale: 8.0 2024-09-18 04:25:16,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=368200.0, ans=0.0 2024-09-18 04:25:21,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0 2024-09-18 04:25:28,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=368240.0, ans=0.125 2024-09-18 04:25:31,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.69 vs. 
limit=15.0 2024-09-18 04:25:40,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=368280.0, ans=0.2 2024-09-18 04:25:57,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=368320.0, ans=0.125 2024-09-18 04:26:14,036 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.01 vs. limit=15.0 2024-09-18 04:26:17,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=12.0 2024-09-18 04:26:24,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2024-09-18 04:26:31,469 INFO [train.py:1198] (0/2) Epoch 21, batch 1600, loss[loss=0.2542, ctc_loss=0.1386, cr_loss=0.3837, attn_decoder_loss=0.2585, over 29670.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1358, cr_loss=0.3792, attn_decoder_loss=0.25, over 5764326.21 frames. 
], batch size: 85, lr: 5.32e-03, grad_scale: 16.0 2024-09-18 04:26:33,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=368400.0, ans=0.125 2024-09-18 04:26:39,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=368400.0, ans=0.125 2024-09-18 04:26:39,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=368400.0, ans=0.025 2024-09-18 04:26:46,820 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.333e+01 8.519e+01 9.030e+01 9.960e+01 2.636e+02, threshold=1.806e+02, percent-clipped=1.0 2024-09-18 04:27:23,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=368520.0, ans=0.0 2024-09-18 04:27:29,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=368520.0, ans=0.05 2024-09-18 04:27:41,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=368560.0, ans=0.04949747468305833 2024-09-18 04:27:47,304 INFO [train.py:1198] (0/2) Epoch 21, batch 1650, loss[loss=0.2437, ctc_loss=0.121, cr_loss=0.3539, attn_decoder_loss=0.2494, over 29704.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1354, cr_loss=0.3786, attn_decoder_loss=0.2495, over 5758713.34 frames. 
], batch size: 89, lr: 5.32e-03, grad_scale: 8.0 2024-09-18 04:27:53,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=368600.0, ans=0.125 2024-09-18 04:28:02,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=368640.0, ans=0.125 2024-09-18 04:28:33,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=368720.0, ans=0.125 2024-09-18 04:28:47,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-09-18 04:28:50,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.37 vs. limit=15.0 2024-09-18 04:29:03,548 INFO [train.py:1198] (0/2) Epoch 21, batch 1700, loss[loss=0.2152, ctc_loss=0.1165, cr_loss=0.3589, attn_decoder_loss=0.2182, over 29575.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1346, cr_loss=0.3774, attn_decoder_loss=0.2493, over 5782511.51 frames. 
], batch size: 69, lr: 5.32e-03, grad_scale: 8.0 2024-09-18 04:29:18,920 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.456e+01 9.072e+01 9.555e+01 1.411e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-18 04:30:08,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=368960.0, ans=0.025 2024-09-18 04:30:10,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=368960.0, ans=0.0 2024-09-18 04:30:13,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=368960.0, ans=0.125 2024-09-18 04:30:19,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=368960.0, ans=0.0 2024-09-18 04:30:23,632 INFO [train.py:1198] (0/2) Epoch 21, batch 1750, loss[loss=0.2159, ctc_loss=0.111, cr_loss=0.3279, attn_decoder_loss=0.2203, over 29384.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1346, cr_loss=0.3782, attn_decoder_loss=0.2494, over 5789846.95 frames. 
], batch size: 67, lr: 5.32e-03, grad_scale: 8.0 2024-09-18 04:30:24,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=369000.0, ans=0.125 2024-09-18 04:30:35,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=369000.0, ans=0.0 2024-09-18 04:30:51,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=369040.0, ans=0.125 2024-09-18 04:30:55,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=369080.0, ans=0.125 2024-09-18 04:30:55,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=369080.0, ans=0.05 2024-09-18 04:31:32,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=15.0 2024-09-18 04:31:37,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=369200.0, ans=0.1 2024-09-18 04:31:38,782 INFO [train.py:1198] (0/2) Epoch 21, batch 1800, loss[loss=0.2666, ctc_loss=0.1566, cr_loss=0.4359, attn_decoder_loss=0.2692, over 29683.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1344, cr_loss=0.3773, attn_decoder_loss=0.2492, over 5792322.69 frames. ], batch size: 83, lr: 5.32e-03, grad_scale: 8.0 2024-09-18 04:31:47,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.47 vs. 
limit=15.0 2024-09-18 04:31:49,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=369200.0, ans=0.125 2024-09-18 04:31:51,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=369200.0, ans=0.0 2024-09-18 04:31:54,056 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 8.570e+01 9.201e+01 9.986e+01 1.467e+02, threshold=1.840e+02, percent-clipped=0.0 2024-09-18 04:32:11,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.87 vs. limit=15.0 2024-09-18 04:32:20,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=369280.0, ans=0.1 2024-09-18 04:32:54,989 INFO [train.py:1198] (0/2) Epoch 21, batch 1850, loss[loss=0.2527, ctc_loss=0.1371, cr_loss=0.3844, attn_decoder_loss=0.257, over 29628.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1346, cr_loss=0.3777, attn_decoder_loss=0.2492, over 5797270.39 frames. ], batch size: 86, lr: 5.32e-03, grad_scale: 8.0 2024-09-18 04:33:11,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=369440.0, ans=0.125 2024-09-18 04:33:21,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=369440.0, ans=0.0 2024-09-18 04:33:26,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0 2024-09-18 04:33:46,617 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. 
limit=15.0 2024-09-18 04:34:12,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=369560.0, ans=0.125 2024-09-18 04:34:12,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=369560.0, ans=0.125 2024-09-18 04:34:15,227 INFO [train.py:1198] (0/2) Epoch 21, batch 1900, loss[loss=0.2619, ctc_loss=0.143, cr_loss=0.4013, attn_decoder_loss=0.2662, over 29718.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1352, cr_loss=0.3784, attn_decoder_loss=0.2501, over 5805525.55 frames. ], batch size: 89, lr: 5.31e-03, grad_scale: 8.0 2024-09-18 04:34:17,742 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=15.0 2024-09-18 04:34:30,317 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.618e+01 9.006e+01 9.728e+01 3.211e+02, threshold=1.801e+02, percent-clipped=2.0 2024-09-18 04:35:13,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=369720.0, ans=0.125 2024-09-18 04:35:25,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=369760.0, ans=0.2 2024-09-18 04:35:31,126 INFO [train.py:1198] (0/2) Epoch 21, batch 1950, loss[loss=0.2425, ctc_loss=0.1335, cr_loss=0.3849, attn_decoder_loss=0.2461, over 29434.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.136, cr_loss=0.3802, attn_decoder_loss=0.251, over 5820038.99 frames. 
], batch size: 78, lr: 5.31e-03, grad_scale: 8.0 2024-09-18 04:35:40,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=369800.0, ans=0.125 2024-09-18 04:36:00,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=369880.0, ans=0.07 2024-09-18 04:36:05,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.04 vs. limit=10.0 2024-09-18 04:36:45,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=370000.0, ans=0.125 2024-09-18 04:36:46,465 INFO [train.py:1198] (0/2) Epoch 21, batch 2000, loss[loss=0.2208, ctc_loss=0.1189, cr_loss=0.3429, attn_decoder_loss=0.2245, over 29340.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1362, cr_loss=0.3802, attn_decoder_loss=0.2511, over 5798736.27 frames. ], batch size: 67, lr: 5.31e-03, grad_scale: 16.0 2024-09-18 04:37:01,560 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.001e+01 8.831e+01 9.227e+01 9.765e+01 5.439e+02, threshold=1.845e+02, percent-clipped=1.0 2024-09-18 04:37:04,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=370040.0, ans=0.0 2024-09-18 04:37:49,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=12.0 2024-09-18 04:38:05,839 INFO [train.py:1198] (0/2) Epoch 21, batch 2050, loss[loss=0.2112, ctc_loss=0.1036, cr_loss=0.3155, attn_decoder_loss=0.2161, over 29405.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1352, cr_loss=0.378, attn_decoder_loss=0.2499, over 5788255.34 frames. 
], batch size: 70, lr: 5.31e-03, grad_scale: 8.0 2024-09-18 04:38:12,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=370200.0, ans=0.125 2024-09-18 04:38:33,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=370240.0, ans=0.0 2024-09-18 04:38:39,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=370280.0, ans=0.125 2024-09-18 04:38:44,072 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:38:44,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=22.5 2024-09-18 04:39:21,719 INFO [train.py:1198] (0/2) Epoch 21, batch 2100, loss[loss=0.239, ctc_loss=0.1282, cr_loss=0.35, attn_decoder_loss=0.2435, over 29756.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1351, cr_loss=0.3776, attn_decoder_loss=0.2497, over 5800597.69 frames. ], batch size: 81, lr: 5.31e-03, grad_scale: 8.0 2024-09-18 04:39:38,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.297e+01 8.835e+01 9.326e+01 1.551e+02, threshold=1.767e+02, percent-clipped=0.0 2024-09-18 04:40:10,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=370520.0, ans=0.125 2024-09-18 04:40:11,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=370520.0, ans=0.1 2024-09-18 04:40:33,421 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.34 vs. 
limit=15.0 2024-09-18 04:40:37,246 INFO [train.py:1198] (0/2) Epoch 21, batch 2150, loss[loss=0.237, ctc_loss=0.1317, cr_loss=0.3855, attn_decoder_loss=0.2402, over 29463.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1344, cr_loss=0.3765, attn_decoder_loss=0.2491, over 5815216.16 frames. ], batch size: 78, lr: 5.31e-03, grad_scale: 8.0 2024-09-18 04:41:00,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=370640.0, ans=0.0 2024-09-18 04:41:03,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=370640.0, ans=0.125 2024-09-18 04:41:11,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2024-09-18 04:41:20,504 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-09-18 04:41:54,703 INFO [train.py:1198] (0/2) Epoch 21, batch 2200, loss[loss=0.252, ctc_loss=0.1367, cr_loss=0.364, attn_decoder_loss=0.2567, over 29640.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.135, cr_loss=0.3775, attn_decoder_loss=0.2494, over 5811852.56 frames. ], batch size: 86, lr: 5.31e-03, grad_scale: 8.0 2024-09-18 04:42:02,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. 
limit=6.0 2024-09-18 04:42:13,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.585e+01 9.031e+01 9.683e+01 2.928e+02, threshold=1.806e+02, percent-clipped=3.0 2024-09-18 04:42:27,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=370880.0, ans=0.025 2024-09-18 04:42:50,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=370920.0, ans=0.125 2024-09-18 04:42:52,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=15.0 2024-09-18 04:43:12,504 INFO [train.py:1198] (0/2) Epoch 21, batch 2250, loss[loss=0.2617, ctc_loss=0.149, cr_loss=0.4164, attn_decoder_loss=0.265, over 29713.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1347, cr_loss=0.3768, attn_decoder_loss=0.2494, over 5810428.16 frames. ], batch size: 82, lr: 5.30e-03, grad_scale: 8.0 2024-09-18 04:43:20,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371000.0, ans=0.1 2024-09-18 04:43:24,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=371000.0, ans=0.125 2024-09-18 04:43:33,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=371040.0, ans=0.0 2024-09-18 04:43:35,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=371040.0, ans=0.125 2024-09-18 04:43:54,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=371080.0, ans=0.1 2024-09-18 04:44:06,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.14 vs. 
limit=22.5 2024-09-18 04:44:28,725 INFO [train.py:1198] (0/2) Epoch 21, batch 2300, loss[loss=0.2097, ctc_loss=0.1065, cr_loss=0.3099, attn_decoder_loss=0.2142, over 29308.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1339, cr_loss=0.3749, attn_decoder_loss=0.2483, over 5799808.97 frames. ], batch size: 71, lr: 5.30e-03, grad_scale: 8.0 2024-09-18 04:44:33,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=371200.0, ans=0.125 2024-09-18 04:44:40,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-09-18 04:44:45,172 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.450e+01 8.935e+01 9.776e+01 2.210e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-18 04:45:10,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=22.5 2024-09-18 04:45:12,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=371320.0, ans=0.0 2024-09-18 04:45:31,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=371360.0, ans=0.125 2024-09-18 04:45:46,540 INFO [train.py:1198] (0/2) Epoch 21, batch 2350, loss[loss=0.2625, ctc_loss=0.1527, cr_loss=0.4052, attn_decoder_loss=0.2657, over 29697.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1339, cr_loss=0.3755, attn_decoder_loss=0.2485, over 5805105.83 frames. 
], batch size: 83, lr: 5.30e-03, grad_scale: 8.0 2024-09-18 04:45:54,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=371400.0, ans=0.1 2024-09-18 04:45:55,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=371400.0, ans=0.2 2024-09-18 04:45:58,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=371400.0, ans=0.04949747468305833 2024-09-18 04:46:02,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2024-09-18 04:46:08,880 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:46:22,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=371480.0, ans=0.0 2024-09-18 04:46:36,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=371520.0, ans=0.05 2024-09-18 04:47:00,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=371560.0, ans=0.125 2024-09-18 04:47:04,812 INFO [train.py:1198] (0/2) Epoch 21, batch 2400, loss[loss=0.2303, ctc_loss=0.1245, cr_loss=0.3388, attn_decoder_loss=0.2346, over 29556.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1346, cr_loss=0.3772, attn_decoder_loss=0.2491, over 5808906.62 frames. ], batch size: 76, lr: 5.30e-03, grad_scale: 16.0 2024-09-18 04:47:11,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.05 vs. 
limit=12.0 2024-09-18 04:47:21,500 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.473e+01 9.186e+01 9.665e+01 3.026e+02, threshold=1.837e+02, percent-clipped=1.0 2024-09-18 04:47:37,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=371680.0, ans=0.125 2024-09-18 04:47:37,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=371680.0, ans=0.125 2024-09-18 04:47:40,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=371680.0, ans=0.07 2024-09-18 04:47:55,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=371720.0, ans=0.125 2024-09-18 04:48:04,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=371760.0, ans=0.125 2024-09-18 04:48:20,901 INFO [train.py:1198] (0/2) Epoch 21, batch 2450, loss[loss=0.2486, ctc_loss=0.1296, cr_loss=0.3744, attn_decoder_loss=0.2535, over 29699.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1352, cr_loss=0.3777, attn_decoder_loss=0.25, over 5786183.88 frames. ], batch size: 82, lr: 5.30e-03, grad_scale: 8.0 2024-09-18 04:48:27,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2024-09-18 04:48:33,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=371800.0, ans=0.125 2024-09-18 04:49:14,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.36 vs. 
limit=22.5 2024-09-18 04:49:19,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=371920.0, ans=0.125 2024-09-18 04:49:38,820 INFO [train.py:1198] (0/2) Epoch 21, batch 2500, loss[loss=0.2543, ctc_loss=0.1367, cr_loss=0.3625, attn_decoder_loss=0.2593, over 29627.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1349, cr_loss=0.3775, attn_decoder_loss=0.2497, over 5795549.00 frames. ], batch size: 86, lr: 5.30e-03, grad_scale: 8.0 2024-09-18 04:49:45,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=372000.0, ans=0.125 2024-09-18 04:49:55,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=372040.0, ans=0.1 2024-09-18 04:49:59,136 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.495e+01 9.101e+01 9.738e+01 1.875e+02, threshold=1.820e+02, percent-clipped=1.0 2024-09-18 04:50:07,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=372040.0, ans=0.0 2024-09-18 04:50:19,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=372080.0, ans=0.125 2024-09-18 04:50:57,254 INFO [train.py:1198] (0/2) Epoch 21, batch 2550, loss[loss=0.2214, ctc_loss=0.1122, cr_loss=0.357, attn_decoder_loss=0.2256, over 29308.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1343, cr_loss=0.3767, attn_decoder_loss=0.2496, over 5797809.81 frames. 
], batch size: 67, lr: 5.30e-03, grad_scale: 8.0 2024-09-18 04:51:10,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=372240.0, ans=0.0 2024-09-18 04:51:15,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=372240.0, ans=0.125 2024-09-18 04:51:22,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=372240.0, ans=0.125 2024-09-18 04:51:23,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=372240.0, ans=0.2 2024-09-18 04:51:30,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372280.0, ans=0.1 2024-09-18 04:52:04,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=372360.0, ans=0.125 2024-09-18 04:52:09,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-09-18 04:52:13,127 INFO [train.py:1198] (0/2) Epoch 21, batch 2600, loss[loss=0.2404, ctc_loss=0.1301, cr_loss=0.3978, attn_decoder_loss=0.2439, over 29467.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1345, cr_loss=0.3773, attn_decoder_loss=0.2498, over 5794428.99 frames. 
], batch size: 78, lr: 5.29e-03, grad_scale: 8.0 2024-09-18 04:52:16,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=372400.0, ans=0.2 2024-09-18 04:52:28,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=372440.0, ans=0.125 2024-09-18 04:52:31,053 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 8.657e+01 9.187e+01 9.794e+01 2.069e+02, threshold=1.837e+02, percent-clipped=1.0 2024-09-18 04:52:34,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=372440.0, ans=0.125 2024-09-18 04:52:44,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=372480.0, ans=0.0 2024-09-18 04:52:54,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372480.0, ans=0.1 2024-09-18 04:53:00,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=372520.0, ans=0.125 2024-09-18 04:53:02,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=372520.0, ans=0.025 2024-09-18 04:53:02,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. 
limit=15.0 2024-09-18 04:53:08,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=372520.0, ans=0.1 2024-09-18 04:53:18,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=372560.0, ans=0.125 2024-09-18 04:53:30,485 INFO [train.py:1198] (0/2) Epoch 21, batch 2650, loss[loss=0.2641, ctc_loss=0.1499, cr_loss=0.4085, attn_decoder_loss=0.2678, over 29302.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1349, cr_loss=0.378, attn_decoder_loss=0.2502, over 5801669.82 frames. ], batch size: 100, lr: 5.29e-03, grad_scale: 8.0 2024-09-18 04:53:43,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=372600.0, ans=0.05 2024-09-18 04:53:56,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.47 vs. limit=10.0 2024-09-18 04:54:01,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=372680.0, ans=0.0 2024-09-18 04:54:08,256 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.86 vs. limit=15.0 2024-09-18 04:54:12,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=372680.0, ans=0.125 2024-09-18 04:54:15,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.22 vs. 
limit=22.5 2024-09-18 04:54:22,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=372720.0, ans=0.125 2024-09-18 04:54:29,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.88 vs. limit=15.0 2024-09-18 04:54:31,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=372760.0, ans=0.125 2024-09-18 04:54:42,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=372760.0, ans=0.0 2024-09-18 04:54:48,398 INFO [train.py:1198] (0/2) Epoch 21, batch 2700, loss[loss=0.2517, ctc_loss=0.1462, cr_loss=0.3928, attn_decoder_loss=0.2547, over 29544.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1351, cr_loss=0.3785, attn_decoder_loss=0.2506, over 5798070.67 frames. ], batch size: 87, lr: 5.29e-03, grad_scale: 8.0 2024-09-18 04:54:51,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=372800.0, ans=0.07 2024-09-18 04:55:06,490 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 8.585e+01 9.069e+01 9.661e+01 1.375e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-18 04:55:12,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=372840.0, ans=0.125 2024-09-18 04:55:16,594 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. 
limit=6.0 2024-09-18 04:55:36,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=372920.0, ans=0.125 2024-09-18 04:55:55,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=372960.0, ans=0.2 2024-09-18 04:56:04,540 INFO [train.py:1198] (0/2) Epoch 21, batch 2750, loss[loss=0.2408, ctc_loss=0.136, cr_loss=0.385, attn_decoder_loss=0.2439, over 29493.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1342, cr_loss=0.3768, attn_decoder_loss=0.2493, over 5795978.54 frames. ], batch size: 75, lr: 5.29e-03, grad_scale: 8.0 2024-09-18 04:56:06,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=373000.0, ans=0.1 2024-09-18 04:56:06,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=373000.0, ans=0.1 2024-09-18 04:56:29,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=373040.0, ans=0.0 2024-09-18 04:56:32,360 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.32 vs. 
limit=22.5
2024-09-18 04:56:48,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=373120.0, ans=0.125
2024-09-18 04:57:17,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=373160.0, ans=0.0
2024-09-18 04:57:19,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=373160.0, ans=0.125
2024-09-18 04:57:22,200 INFO [train.py:1198] (0/2) Epoch 21, batch 2800, loss[loss=0.2648, ctc_loss=0.1701, cr_loss=0.3938, attn_decoder_loss=0.2666, over 20423.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1345, cr_loss=0.3771, attn_decoder_loss=0.2495, over 5776161.96 frames. ], batch size: 209, lr: 5.29e-03, grad_scale: 16.0
2024-09-18 04:57:22,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=373200.0, ans=0.125
2024-09-18 04:57:22,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=373200.0, ans=0.0
2024-09-18 04:57:24,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=373200.0, ans=0.0
2024-09-18 04:57:26,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=373200.0, ans=0.0
2024-09-18 04:57:43,971 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.642e+01 9.187e+01 1.013e+02 2.371e+02, threshold=1.837e+02, percent-clipped=3.0
2024-09-18 04:57:54,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=373280.0, ans=0.125
2024-09-18 04:58:01,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=373280.0, ans=0.0
2024-09-18 04:58:10,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=373320.0, ans=0.125
2024-09-18 04:58:19,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=373320.0, ans=0.2
2024-09-18 04:58:25,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.54 vs. limit=15.0
2024-09-18 04:58:29,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=373360.0, ans=0.125
2024-09-18 04:58:35,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=373360.0, ans=0.0
2024-09-18 04:58:40,012 INFO [train.py:1198] (0/2) Epoch 21, batch 2850, loss[loss=0.2373, ctc_loss=0.1332, cr_loss=0.3872, attn_decoder_loss=0.2403, over 29500.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1353, cr_loss=0.3784, attn_decoder_loss=0.2502, over 5763090.41 frames. ], batch size: 77, lr: 5.29e-03, grad_scale: 8.0
2024-09-18 04:58:52,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=373400.0, ans=0.1
2024-09-18 04:59:01,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=373440.0, ans=0.125
2024-09-18 04:59:21,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=373480.0, ans=0.0
2024-09-18 04:59:21,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=373480.0, ans=0.125
2024-09-18 04:59:24,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=373520.0, ans=0.1
2024-09-18 04:59:36,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.73 vs. limit=15.0
2024-09-18 04:59:56,343 INFO [train.py:1198] (0/2) Epoch 21, batch 2900, loss[loss=0.2397, ctc_loss=0.1203, cr_loss=0.3394, attn_decoder_loss=0.2454, over 29448.00 frames. ], tot_loss[loss=0.2476, ctc_loss=0.1362, cr_loss=0.3803, attn_decoder_loss=0.2515, over 5787794.53 frames. ], batch size: 79, lr: 5.29e-03, grad_scale: 8.0
2024-09-18 04:59:58,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=373600.0, ans=0.125
2024-09-18 05:00:07,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0
2024-09-18 05:00:15,830 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.588e+01 9.125e+01 9.672e+01 3.101e+02, threshold=1.825e+02, percent-clipped=2.0
2024-09-18 05:00:23,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=373640.0, ans=0.0
2024-09-18 05:00:36,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=373680.0, ans=0.5
2024-09-18 05:00:39,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=373680.0, ans=0.0
2024-09-18 05:01:05,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=373760.0, ans=0.125
2024-09-18 05:01:11,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=373760.0, ans=0.125
2024-09-18 05:01:13,987 INFO [train.py:1198] (0/2) Epoch 21, batch 2950, loss[loss=0.2414, ctc_loss=0.1381, cr_loss=0.3949, attn_decoder_loss=0.2441, over 29520.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1349, cr_loss=0.3776, attn_decoder_loss=0.2501, over 5782732.05 frames. ], batch size: 75, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:01:17,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=373800.0, ans=0.025
2024-09-18 05:01:26,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=373800.0, ans=0.125
2024-09-18 05:01:30,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=373840.0, ans=0.125
2024-09-18 05:01:54,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=373880.0, ans=0.07
2024-09-18 05:02:00,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=373920.0, ans=0.125
2024-09-18 05:02:06,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=373920.0, ans=0.0
2024-09-18 05:02:06,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.34 vs. limit=12.0
2024-09-18 05:02:10,898 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 05:02:14,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.74 vs. limit=15.0
2024-09-18 05:02:31,643 INFO [train.py:1198] (0/2) Epoch 21, batch 3000, loss[loss=0.2381, ctc_loss=0.1232, cr_loss=0.3447, attn_decoder_loss=0.2433, over 29763.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1347, cr_loss=0.3776, attn_decoder_loss=0.2498, over 5784376.39 frames. ], batch size: 81, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:02:31,643 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 05:02:50,165 INFO [train.py:1230] (0/2) Epoch 21, validation: loss=0.2116, ctc_loss=0.03952, cr_loss=5.001e-15, attn_decoder_loss=0.2307, over 944034.00 frames.
2024-09-18 05:02:50,165 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-18 05:02:56,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=374000.0, ans=0.0
2024-09-18 05:03:07,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=374040.0, ans=0.0
2024-09-18 05:03:10,259 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.620e+01 9.343e+01 9.937e+01 2.049e+02, threshold=1.869e+02, percent-clipped=2.0
2024-09-18 05:03:23,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.53 vs. limit=15.0
2024-09-18 05:03:24,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=374080.0, ans=0.125
2024-09-18 05:03:36,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=374120.0, ans=0.125
2024-09-18 05:04:06,327 INFO [train.py:1198] (0/2) Epoch 21, batch 3050, loss[loss=0.2358, ctc_loss=0.1261, cr_loss=0.3658, attn_decoder_loss=0.2399, over 29547.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1352, cr_loss=0.3785, attn_decoder_loss=0.2502, over 5778337.82 frames. ], batch size: 76, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:04:14,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=374200.0, ans=0.125
2024-09-18 05:04:29,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=374240.0, ans=0.0
2024-09-18 05:04:32,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=374240.0, ans=0.0
2024-09-18 05:04:50,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=374280.0, ans=0.0
2024-09-18 05:05:07,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=374360.0, ans=0.0
2024-09-18 05:05:15,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=374360.0, ans=0.0
2024-09-18 05:05:26,567 INFO [train.py:1198] (0/2) Epoch 21, batch 3100, loss[loss=0.2656, ctc_loss=0.152, cr_loss=0.4122, attn_decoder_loss=0.2691, over 29273.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1355, cr_loss=0.3795, attn_decoder_loss=0.2503, over 5776717.52 frames. ], batch size: 100, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:05:26,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=374400.0, ans=0.2
2024-09-18 05:05:45,909 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.504e+01 9.125e+01 9.577e+01 2.431e+02, threshold=1.825e+02, percent-clipped=1.0
2024-09-18 05:05:48,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0
2024-09-18 05:05:50,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=374440.0, ans=0.1
2024-09-18 05:06:03,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=15.0
2024-09-18 05:06:03,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.35 vs. limit=15.0
2024-09-18 05:06:21,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.24 vs. limit=15.0
2024-09-18 05:06:35,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0
2024-09-18 05:06:36,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=374560.0, ans=0.0
2024-09-18 05:06:36,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=374560.0, ans=0.0
2024-09-18 05:06:41,991 INFO [train.py:1198] (0/2) Epoch 21, batch 3150, loss[loss=0.2636, ctc_loss=0.1513, cr_loss=0.4108, attn_decoder_loss=0.267, over 28842.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1355, cr_loss=0.379, attn_decoder_loss=0.2503, over 5782803.36 frames. ], batch size: 104, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:06:49,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=374600.0, ans=0.0
2024-09-18 05:06:58,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.71 vs. limit=22.5
2024-09-18 05:07:23,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=374680.0, ans=0.5
2024-09-18 05:07:44,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=374760.0, ans=0.125
2024-09-18 05:07:47,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=374760.0, ans=0.1
2024-09-18 05:07:57,657 INFO [train.py:1198] (0/2) Epoch 21, batch 3200, loss[loss=0.2377, ctc_loss=0.1256, cr_loss=0.3625, attn_decoder_loss=0.2421, over 29412.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1348, cr_loss=0.3777, attn_decoder_loss=0.2496, over 5792494.88 frames. ], batch size: 79, lr: 5.28e-03, grad_scale: 16.0
2024-09-18 05:08:00,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=374800.0, ans=0.125
2024-09-18 05:08:04,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0
2024-09-18 05:08:06,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374800.0, ans=0.1
2024-09-18 05:08:20,849 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.671e+01 9.297e+01 1.015e+02 2.448e+02, threshold=1.859e+02, percent-clipped=1.0
2024-09-18 05:08:35,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=374880.0, ans=0.125
2024-09-18 05:08:38,677 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0
2024-09-18 05:08:50,257 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.98 vs. limit=10.0
2024-09-18 05:09:15,156 INFO [train.py:1198] (0/2) Epoch 21, batch 3250, loss[loss=0.2509, ctc_loss=0.13, cr_loss=0.3631, attn_decoder_loss=0.2563, over 29694.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.135, cr_loss=0.3784, attn_decoder_loss=0.2502, over 5799255.11 frames. ], batch size: 84, lr: 5.28e-03, grad_scale: 8.0
2024-09-18 05:09:33,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=375040.0, ans=0.05
2024-09-18 05:09:36,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=375040.0, ans=0.0
2024-09-18 05:09:46,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375080.0, ans=0.1
2024-09-18 05:09:46,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=375080.0, ans=0.025
2024-09-18 05:09:49,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=375080.0, ans=0.125
2024-09-18 05:10:06,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=375120.0, ans=0.125
2024-09-18 05:10:26,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=375160.0, ans=0.2
2024-09-18 05:10:33,521 INFO [train.py:1198] (0/2) Epoch 21, batch 3300, loss[loss=0.2513, ctc_loss=0.1335, cr_loss=0.3561, attn_decoder_loss=0.2565, over 28294.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1342, cr_loss=0.3767, attn_decoder_loss=0.2491, over 5797491.37 frames. ], batch size: 111, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:10:52,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=375240.0, ans=0.0
2024-09-18 05:10:54,883 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.586e+01 9.172e+01 9.727e+01 2.274e+02, threshold=1.834e+02, percent-clipped=1.0
2024-09-18 05:10:58,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.34 vs. limit=15.0
2024-09-18 05:11:37,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375360.0, ans=0.0
2024-09-18 05:11:48,833 INFO [train.py:1198] (0/2) Epoch 21, batch 3350, loss[loss=0.2599, ctc_loss=0.1377, cr_loss=0.3905, attn_decoder_loss=0.2648, over 28890.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1349, cr_loss=0.3779, attn_decoder_loss=0.2498, over 5773235.32 frames. ], batch size: 104, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:12:04,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=375440.0, ans=0.5
2024-09-18 05:12:12,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=375440.0, ans=0.1
2024-09-18 05:12:36,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=375520.0, ans=0.125
2024-09-18 05:12:54,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=375560.0, ans=0.0
2024-09-18 05:13:00,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=375560.0, ans=0.0
2024-09-18 05:13:05,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=375600.0, ans=0.0
2024-09-18 05:13:06,609 INFO [train.py:1198] (0/2) Epoch 21, batch 3400, loss[loss=0.2189, ctc_loss=0.1085, cr_loss=0.3326, attn_decoder_loss=0.2238, over 29337.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1349, cr_loss=0.3778, attn_decoder_loss=0.2499, over 5766623.43 frames. ], batch size: 67, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:13:25,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=375640.0, ans=0.125
2024-09-18 05:13:29,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.485e+01 9.062e+01 9.587e+01 1.561e+02, threshold=1.812e+02, percent-clipped=0.0
2024-09-18 05:13:38,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.51 vs. limit=10.0
2024-09-18 05:13:45,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=375680.0, ans=0.125
2024-09-18 05:14:02,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=375720.0, ans=0.1
2024-09-18 05:14:08,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=375760.0, ans=0.025
2024-09-18 05:14:24,505 INFO [train.py:1198] (0/2) Epoch 21, batch 3450, loss[loss=0.2485, ctc_loss=0.1359, cr_loss=0.3723, attn_decoder_loss=0.2527, over 28304.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1352, cr_loss=0.3787, attn_decoder_loss=0.2502, over 5774468.58 frames. ], batch size: 111, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:14:32,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=375800.0, ans=0.0
2024-09-18 05:14:57,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=375880.0, ans=0.125
2024-09-18 05:15:20,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=375920.0, ans=0.1
2024-09-18 05:15:21,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.00 vs. limit=6.0
2024-09-18 05:15:37,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=375960.0, ans=0.125
2024-09-18 05:15:40,542 INFO [train.py:1198] (0/2) Epoch 21, batch 3500, loss[loss=0.2194, ctc_loss=0.1096, cr_loss=0.3308, attn_decoder_loss=0.2242, over 29299.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1348, cr_loss=0.378, attn_decoder_loss=0.2494, over 5776519.19 frames. ], batch size: 71, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:16:03,720 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.729e+01 9.303e+01 9.808e+01 4.681e+02, threshold=1.861e+02, percent-clipped=2.0
2024-09-18 05:16:05,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=376040.0, ans=0.05
2024-09-18 05:16:26,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=376120.0, ans=0.1
2024-09-18 05:16:27,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.61 vs. limit=12.0
2024-09-18 05:16:34,566 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0
2024-09-18 05:16:57,298 INFO [train.py:1198] (0/2) Epoch 21, batch 3550, loss[loss=0.2606, ctc_loss=0.1394, cr_loss=0.39, attn_decoder_loss=0.2654, over 29715.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1344, cr_loss=0.3773, attn_decoder_loss=0.2493, over 5783278.99 frames. ], batch size: 89, lr: 5.27e-03, grad_scale: 8.0
2024-09-18 05:16:57,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=376200.0, ans=0.1
2024-09-18 05:17:02,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=376200.0, ans=0.125
2024-09-18 05:17:04,298 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0
2024-09-18 05:17:19,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=376240.0, ans=0.2
2024-09-18 05:17:34,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=376280.0, ans=0.05
2024-09-18 05:18:00,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=376360.0, ans=0.125
2024-09-18 05:18:13,899 INFO [train.py:1198] (0/2) Epoch 21, batch 3600, loss[loss=0.2313, ctc_loss=0.1187, cr_loss=0.3314, attn_decoder_loss=0.2365, over 29508.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1344, cr_loss=0.3773, attn_decoder_loss=0.2495, over 5793314.89 frames. ], batch size: 77, lr: 5.27e-03, grad_scale: 16.0
2024-09-18 05:18:29,821 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0
2024-09-18 05:18:34,845 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.337e+01 8.787e+01 9.364e+01 1.302e+02, threshold=1.757e+02, percent-clipped=0.0
2024-09-18 05:18:47,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=12.0
2024-09-18 05:19:10,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=376520.0, ans=0.0
2024-09-18 05:19:28,201 INFO [train.py:1198] (0/2) Epoch 21, batch 3650, loss[loss=0.259, ctc_loss=0.143, cr_loss=0.4022, attn_decoder_loss=0.263, over 29497.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.134, cr_loss=0.3765, attn_decoder_loss=0.249, over 5794509.55 frames. ], batch size: 90, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:19:36,021 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 05:19:38,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=376600.0, ans=0.025
2024-09-18 05:19:56,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=376680.0, ans=0.0
2024-09-18 05:20:01,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=376680.0, ans=0.2
2024-09-18 05:20:20,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=376720.0, ans=0.2
2024-09-18 05:20:28,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=376760.0, ans=0.2
2024-09-18 05:20:42,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=376800.0, ans=0.125
2024-09-18 05:20:43,489 INFO [train.py:1198] (0/2) Epoch 21, batch 3700, loss[loss=0.2511, ctc_loss=0.1319, cr_loss=0.3778, attn_decoder_loss=0.2559, over 29697.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1341, cr_loss=0.3766, attn_decoder_loss=0.2491, over 5804887.64 frames. ], batch size: 84, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:21:05,914 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.517e+01 9.022e+01 9.849e+01 1.949e+02, threshold=1.804e+02, percent-clipped=2.0
2024-09-18 05:21:16,657 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 05:21:23,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=376880.0, ans=0.09899494936611666
2024-09-18 05:21:33,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=22.5
2024-09-18 05:21:57,754 INFO [train.py:1198] (0/2) Epoch 21, batch 3750, loss[loss=0.2197, ctc_loss=0.1098, cr_loss=0.3279, attn_decoder_loss=0.2246, over 29330.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1342, cr_loss=0.3766, attn_decoder_loss=0.2487, over 5808621.54 frames. ], batch size: 67, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:21:58,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=377000.0, ans=0.025
2024-09-18 05:22:07,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=377000.0, ans=0.125
2024-09-18 05:22:07,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=377000.0, ans=0.025
2024-09-18 05:22:20,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=377040.0, ans=0.0
2024-09-18 05:22:35,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=377080.0, ans=0.125
2024-09-18 05:22:40,419 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0
2024-09-18 05:22:43,845 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0
2024-09-18 05:23:14,090 INFO [train.py:1198] (0/2) Epoch 21, batch 3800, loss[loss=0.2527, ctc_loss=0.1427, cr_loss=0.3801, attn_decoder_loss=0.2565, over 29638.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.134, cr_loss=0.3761, attn_decoder_loss=0.2484, over 5798593.74 frames. ], batch size: 86, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:23:36,536 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.558e+01 9.240e+01 9.922e+01 2.766e+02, threshold=1.848e+02, percent-clipped=2.0
2024-09-18 05:23:38,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0
2024-09-18 05:23:39,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=377240.0, ans=0.04949747468305833
2024-09-18 05:23:45,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=377280.0, ans=0.125
2024-09-18 05:23:48,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=377280.0, ans=0.125
2024-09-18 05:23:59,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=377320.0, ans=22.5
2024-09-18 05:24:00,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=377320.0, ans=0.0
2024-09-18 05:24:18,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=377360.0, ans=0.125
2024-09-18 05:24:20,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0
2024-09-18 05:24:30,327 INFO [train.py:1198] (0/2) Epoch 21, batch 3850, loss[loss=0.2609, ctc_loss=0.1464, cr_loss=0.3972, attn_decoder_loss=0.2649, over 29293.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1338, cr_loss=0.3762, attn_decoder_loss=0.2484, over 5811831.24 frames. ], batch size: 100, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:24:30,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377400.0, ans=0.1
2024-09-18 05:24:36,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=377400.0, ans=0.0
2024-09-18 05:24:42,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=377400.0, ans=0.09899494936611666
2024-09-18 05:24:49,946 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 05:25:01,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=377480.0, ans=0.125
2024-09-18 05:25:03,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.58 vs. limit=15.0
2024-09-18 05:25:09,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.94 vs. limit=22.5
2024-09-18 05:25:36,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=377560.0, ans=0.025
2024-09-18 05:25:43,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=377600.0, ans=0.035
2024-09-18 05:25:45,158 INFO [train.py:1198] (0/2) Epoch 21, batch 3900, loss[loss=0.2651, ctc_loss=0.1464, cr_loss=0.4038, attn_decoder_loss=0.2693, over 29647.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1343, cr_loss=0.3769, attn_decoder_loss=0.2489, over 5816091.42 frames. ], batch size: 86, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:25:55,030 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.10 vs. limit=10.0
2024-09-18 05:26:07,259 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 8.671e+01 9.111e+01 9.603e+01 1.300e+02, threshold=1.822e+02, percent-clipped=0.0
2024-09-18 05:26:10,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.68 vs. limit=12.0
2024-09-18 05:26:34,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=377720.0, ans=0.125
2024-09-18 05:26:51,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=377760.0, ans=0.0
2024-09-18 05:26:57,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.75 vs. limit=22.5
2024-09-18 05:26:59,572 INFO [train.py:1198] (0/2) Epoch 21, batch 3950, loss[loss=0.2454, ctc_loss=0.1282, cr_loss=0.3519, attn_decoder_loss=0.2506, over 29494.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1339, cr_loss=0.3759, attn_decoder_loss=0.2491, over 5835666.96 frames. ], batch size: 97, lr: 5.26e-03, grad_scale: 8.0
2024-09-18 05:27:52,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=377920.0, ans=0.0
2024-09-18 05:28:09,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5
2024-09-18 05:28:14,618 INFO [train.py:1198] (0/2) Epoch 21, batch 4000, loss[loss=0.228, ctc_loss=0.1166, cr_loss=0.3577, attn_decoder_loss=0.2324, over 29499.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1343, cr_loss=0.3763, attn_decoder_loss=0.2494, over 5813263.39 frames. ], batch size: 74, lr: 5.26e-03, grad_scale: 16.0
2024-09-18 05:28:26,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=378000.0, ans=0.0
2024-09-18 05:28:38,247 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.637e+01 9.105e+01 9.736e+01 3.809e+02, threshold=1.821e+02, percent-clipped=2.0
2024-09-18 05:28:40,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0
2024-09-18 05:28:56,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=378080.0, ans=0.125
2024-09-18 05:29:03,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=378120.0, ans=0.2
2024-09-18 05:29:06,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.94 vs. limit=12.0
2024-09-18 05:29:17,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378160.0, ans=0.1
2024-09-18 05:29:18,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=378160.0, ans=0.125
2024-09-18 05:29:29,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=378200.0, ans=0.2
2024-09-18 05:29:30,122 INFO [train.py:1198] (0/2) Epoch 21, batch 4050, loss[loss=0.2778, ctc_loss=0.1843, cr_loss=0.4239, attn_decoder_loss=0.2788, over 20109.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.134, cr_loss=0.3754, attn_decoder_loss=0.249, over 5795487.91 frames. ], batch size: 209, lr: 5.25e-03, grad_scale: 8.0
2024-09-18 05:29:47,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=378240.0, ans=0.025
2024-09-18 05:29:49,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=378240.0, ans=0.125
2024-09-18 05:29:55,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=378240.0, ans=0.125
2024-09-18 05:29:58,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=378280.0, ans=0.0
2024-09-18 05:30:28,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=378360.0, ans=0.0
2024-09-18 05:30:39,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=378360.0, ans=0.025
2024-09-18 05:30:44,000 INFO [train.py:1198] (0/2) Epoch 21, batch 4100, loss[loss=0.2565, ctc_loss=0.1427, cr_loss=0.4077, attn_decoder_loss=0.2601, over 29501.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1341, cr_loss=0.3755, attn_decoder_loss=0.249, over 5790750.47 frames. ], batch size: 90, lr: 5.25e-03, grad_scale: 8.0
2024-09-18 05:31:07,487 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.642e+01 9.337e+01 1.033e+02 5.468e+02, threshold=1.867e+02, percent-clipped=3.0
2024-09-18 05:31:11,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.34 vs.
limit=22.5 2024-09-18 05:31:32,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378520.0, ans=0.1 2024-09-18 05:31:38,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2024-09-18 05:31:40,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=378520.0, ans=0.0 2024-09-18 05:31:47,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0 2024-09-18 05:31:50,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=378560.0, ans=0.1 2024-09-18 05:31:58,985 INFO [train.py:1198] (0/2) Epoch 21, batch 4150, loss[loss=0.2401, ctc_loss=0.1358, cr_loss=0.3545, attn_decoder_loss=0.2438, over 29488.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1339, cr_loss=0.3755, attn_decoder_loss=0.2488, over 5796929.89 frames. ], batch size: 77, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:32:09,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=378600.0, ans=0.1 2024-09-18 05:32:12,929 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2024-09-18 05:32:37,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.79 vs. 
limit=22.5 2024-09-18 05:32:58,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=378760.0, ans=0.125 2024-09-18 05:32:59,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=378760.0, ans=0.1 2024-09-18 05:33:11,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=378800.0, ans=0.05 2024-09-18 05:33:12,774 INFO [train.py:1198] (0/2) Epoch 21, batch 4200, loss[loss=0.2593, ctc_loss=0.1498, cr_loss=0.4038, attn_decoder_loss=0.2625, over 29517.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1341, cr_loss=0.3759, attn_decoder_loss=0.249, over 5798684.74 frames. ], batch size: 90, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:33:18,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=378800.0, ans=0.125 2024-09-18 05:33:36,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378840.0, ans=0.1 2024-09-18 05:33:37,365 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.402e+01 9.063e+01 9.513e+01 1.420e+02, threshold=1.813e+02, percent-clipped=0.0 2024-09-18 05:33:43,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=378880.0, ans=0.2 2024-09-18 05:34:03,570 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.58 vs. 
limit=15.0 2024-09-18 05:34:07,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=378920.0, ans=0.2 2024-09-18 05:34:27,309 INFO [train.py:1198] (0/2) Epoch 21, batch 4250, loss[loss=0.232, ctc_loss=0.1254, cr_loss=0.3643, attn_decoder_loss=0.2357, over 29515.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1339, cr_loss=0.3763, attn_decoder_loss=0.2492, over 5805102.14 frames. ], batch size: 74, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:34:28,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2024-09-18 05:34:33,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=379000.0, ans=0.09899494936611666 2024-09-18 05:34:37,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=379000.0, ans=0.125 2024-09-18 05:34:56,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.17 vs. 
limit=15.0 2024-09-18 05:35:01,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=379080.0, ans=0.125 2024-09-18 05:35:04,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=379080.0, ans=0.09899494936611666 2024-09-18 05:35:17,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=379120.0, ans=0.1 2024-09-18 05:35:41,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=379200.0, ans=0.1 2024-09-18 05:35:42,510 INFO [train.py:1198] (0/2) Epoch 21, batch 4300, loss[loss=0.2547, ctc_loss=0.1359, cr_loss=0.3904, attn_decoder_loss=0.2592, over 29523.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1338, cr_loss=0.3756, attn_decoder_loss=0.2495, over 5794897.86 frames. ], batch size: 87, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:35:54,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=379200.0, ans=0.1 2024-09-18 05:36:05,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=379240.0, ans=0.125 2024-09-18 05:36:06,493 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.631e+01 9.482e+01 1.010e+02 4.284e+02, threshold=1.896e+02, percent-clipped=4.0 2024-09-18 05:36:16,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.58 vs. 
limit=22.5 2024-09-18 05:36:45,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=379360.0, ans=0.025 2024-09-18 05:36:51,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=379360.0, ans=0.0 2024-09-18 05:36:57,603 INFO [train.py:1198] (0/2) Epoch 21, batch 4350, loss[loss=0.2615, ctc_loss=0.156, cr_loss=0.4167, attn_decoder_loss=0.2639, over 29429.00 frames. ], tot_loss[loss=0.2486, ctc_loss=0.1365, cr_loss=0.3811, attn_decoder_loss=0.2526, over 5797367.74 frames. ], batch size: 97, lr: 5.25e-03, grad_scale: 8.0 2024-09-18 05:37:02,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.82 vs. limit=22.5 2024-09-18 05:37:05,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.77 vs. limit=22.5 2024-09-18 05:37:06,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=379400.0, ans=0.1 2024-09-18 05:37:24,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=379440.0, ans=0.125 2024-09-18 05:37:27,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. 
limit=6.0 2024-09-18 05:37:30,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=379480.0, ans=0.125 2024-09-18 05:37:58,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=379560.0, ans=0.0 2024-09-18 05:38:06,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=379560.0, ans=0.125 2024-09-18 05:38:08,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.02 vs. limit=15.0 2024-09-18 05:38:11,715 INFO [train.py:1198] (0/2) Epoch 21, batch 4400, loss[loss=0.2535, ctc_loss=0.1549, cr_loss=0.4119, attn_decoder_loss=0.2554, over 27265.00 frames. ], tot_loss[loss=0.2505, ctc_loss=0.1379, cr_loss=0.3831, attn_decoder_loss=0.2545, over 5767837.93 frames. ], batch size: 124, lr: 5.24e-03, grad_scale: 16.0 2024-09-18 05:38:29,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=379640.0, ans=0.0 2024-09-18 05:38:34,953 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.049e+01 8.987e+01 9.326e+01 1.008e+02 3.021e+02, threshold=1.865e+02, percent-clipped=1.0 2024-09-18 05:38:35,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=379640.0, ans=0.025 2024-09-18 05:38:39,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=379680.0, ans=0.0 2024-09-18 05:38:41,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=379680.0, ans=0.2 2024-09-18 05:38:54,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, 
batch_count=379720.0, ans=0.2 2024-09-18 05:39:17,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=379760.0, ans=0.07 2024-09-18 05:39:18,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=379760.0, ans=0.0 2024-09-18 05:39:25,856 INFO [train.py:1198] (0/2) Epoch 21, batch 4450, loss[loss=0.2695, ctc_loss=0.1702, cr_loss=0.4001, attn_decoder_loss=0.2716, over 19956.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1423, cr_loss=0.3887, attn_decoder_loss=0.2571, over 5579452.04 frames. ], batch size: 210, lr: 5.24e-03, grad_scale: 8.0 2024-09-18 05:39:26,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=379800.0, ans=0.04949747468305833 2024-09-18 05:39:29,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=379800.0, ans=0.1 2024-09-18 05:40:01,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=379880.0, ans=0.0 2024-09-18 05:40:30,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=379960.0, ans=0.125 2024-09-18 05:40:30,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=379960.0, ans=0.0 2024-09-18 05:40:41,877 INFO [train.py:1198] (0/2) Epoch 21, batch 4500, loss[loss=0.2713, ctc_loss=0.174, cr_loss=0.403, attn_decoder_loss=0.2732, over 19971.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1471, cr_loss=0.3914, attn_decoder_loss=0.2595, over 5237584.87 frames. 
], batch size: 209, lr: 5.24e-03, grad_scale: 8.0 2024-09-18 05:40:57,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=380040.0, ans=0.125 2024-09-18 05:41:00,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=380040.0, ans=0.0 2024-09-18 05:41:07,254 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.842e+01 1.023e+02 1.116e+02 1.184e+02 1.723e+02, threshold=2.233e+02, percent-clipped=0.0 2024-09-18 05:41:18,838 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-21.pt 2024-09-18 05:42:06,205 INFO [train.py:1198] (0/2) Epoch 22, batch 0, loss[loss=0.2248, ctc_loss=0.1267, cr_loss=0.3587, attn_decoder_loss=0.2278, over 29623.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1267, cr_loss=0.3587, attn_decoder_loss=0.2278, over 29623.00 frames. ], batch size: 73, lr: 5.12e-03, grad_scale: 16.0 2024-09-18 05:42:06,206 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 05:42:13,684 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0586, 3.5827, 3.9292, 3.5171], device='cuda:0') 2024-09-18 05:42:24,648 INFO [train.py:1230] (0/2) Epoch 22, validation: loss=0.212, ctc_loss=0.0382, cr_loss=5.087e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-18 05:42:24,649 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 05:42:26,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=380100.0, ans=0.0 2024-09-18 05:43:12,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.18 vs. 
limit=15.0 2024-09-18 05:43:35,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-09-18 05:43:38,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=380260.0, ans=0.125 2024-09-18 05:43:42,237 INFO [train.py:1198] (0/2) Epoch 22, batch 50, loss[loss=0.2047, ctc_loss=0.1007, cr_loss=0.3103, attn_decoder_loss=0.2094, over 29422.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1376, cr_loss=0.3838, attn_decoder_loss=0.2508, over 1268560.95 frames. ], batch size: 70, lr: 5.12e-03, grad_scale: 8.0 2024-09-18 05:43:47,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=380300.0, ans=0.0 2024-09-18 05:44:12,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.47 vs. 
limit=22.5 2024-09-18 05:44:14,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=380380.0, ans=0.125 2024-09-18 05:44:15,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=380380.0, ans=15.0 2024-09-18 05:44:28,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380420.0, ans=0.1 2024-09-18 05:44:29,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=380420.0, ans=0.2 2024-09-18 05:44:41,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=380460.0, ans=0.125 2024-09-18 05:44:47,549 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.759e+01 9.355e+01 1.030e+02 2.527e+02, threshold=1.871e+02, percent-clipped=1.0 2024-09-18 05:44:52,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2024-09-18 05:44:56,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380500.0, ans=0.1 2024-09-18 05:44:57,980 INFO [train.py:1198] (0/2) Epoch 22, batch 100, loss[loss=0.2304, ctc_loss=0.1275, cr_loss=0.3432, attn_decoder_loss=0.2342, over 29540.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1379, cr_loss=0.3852, attn_decoder_loss=0.2528, over 2253339.33 frames. 
], batch size: 76, lr: 5.12e-03, grad_scale: 8.0 2024-09-18 05:45:14,840 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:45:19,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=380540.0, ans=0.0 2024-09-18 05:45:40,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.87 vs. limit=22.5 2024-09-18 05:45:58,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380660.0, ans=0.1 2024-09-18 05:46:09,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=380660.0, ans=0.0 2024-09-18 05:46:12,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=380660.0, ans=0.2 2024-09-18 05:46:12,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=380660.0, ans=0.0 2024-09-18 05:46:17,498 INFO [train.py:1198] (0/2) Epoch 22, batch 150, loss[loss=0.2182, ctc_loss=0.1151, cr_loss=0.3388, attn_decoder_loss=0.2222, over 29425.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1352, cr_loss=0.3797, attn_decoder_loss=0.2502, over 3048314.36 frames. ], batch size: 70, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:46:49,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=380780.0, ans=0.1 2024-09-18 05:47:17,398 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.13 vs. 
limit=22.5 2024-09-18 05:47:22,580 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.751e+01 8.602e+01 9.163e+01 9.915e+01 1.341e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-18 05:47:32,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=380900.0, ans=0.025 2024-09-18 05:47:33,203 INFO [train.py:1198] (0/2) Epoch 22, batch 200, loss[loss=0.258, ctc_loss=0.1491, cr_loss=0.4133, attn_decoder_loss=0.2609, over 27451.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1342, cr_loss=0.3776, attn_decoder_loss=0.249, over 3660568.58 frames. ], batch size: 124, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:47:39,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=380900.0, ans=0.2 2024-09-18 05:48:01,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-09-18 05:48:11,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=380980.0, ans=0.125 2024-09-18 05:48:12,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.92 vs. 
limit=15.0 2024-09-18 05:48:12,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=380980.0, ans=0.0 2024-09-18 05:48:21,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=381020.0, ans=0.1 2024-09-18 05:48:39,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=381060.0, ans=0.125 2024-09-18 05:48:47,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=381100.0, ans=0.125 2024-09-18 05:48:48,679 INFO [train.py:1198] (0/2) Epoch 22, batch 250, loss[loss=0.2719, ctc_loss=0.149, cr_loss=0.4063, attn_decoder_loss=0.2765, over 29251.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1333, cr_loss=0.3756, attn_decoder_loss=0.2487, over 4142327.25 frames. ], batch size: 100, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:49:01,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=381100.0, ans=0.0 2024-09-18 05:49:12,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=381140.0, ans=0.125 2024-09-18 05:49:19,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.93 vs. 
limit=12.0 2024-09-18 05:49:37,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=381220.0, ans=0.125 2024-09-18 05:49:52,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=381260.0, ans=0.0 2024-09-18 05:49:56,360 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.584e+01 8.533e+01 8.896e+01 9.505e+01 2.232e+02, threshold=1.779e+02, percent-clipped=1.0 2024-09-18 05:50:06,898 INFO [train.py:1198] (0/2) Epoch 22, batch 300, loss[loss=0.2616, ctc_loss=0.1495, cr_loss=0.4099, attn_decoder_loss=0.265, over 29500.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1331, cr_loss=0.3754, attn_decoder_loss=0.2486, over 4510616.87 frames. ], batch size: 92, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:50:26,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=381340.0, ans=0.2 2024-09-18 05:50:38,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=381380.0, ans=0.2 2024-09-18 05:50:51,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=381380.0, ans=0.125 2024-09-18 05:50:53,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=381420.0, ans=0.5 2024-09-18 05:51:08,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=381460.0, ans=0.1 2024-09-18 05:51:14,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=381460.0, ans=0.0 2024-09-18 05:51:18,206 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, 
num_groups=1, num_channels=192, metric=4.49 vs. limit=10.0 2024-09-18 05:51:24,701 INFO [train.py:1198] (0/2) Epoch 22, batch 350, loss[loss=0.2179, ctc_loss=0.1063, cr_loss=0.3224, attn_decoder_loss=0.2231, over 29312.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1335, cr_loss=0.3764, attn_decoder_loss=0.2492, over 4795628.38 frames. ], batch size: 71, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:51:27,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=381500.0, ans=0.1 2024-09-18 05:51:50,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=381540.0, ans=0.125 2024-09-18 05:51:51,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=381540.0, ans=0.125 2024-09-18 05:51:53,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=381580.0, ans=0.125 2024-09-18 05:52:29,879 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.396e+01 8.841e+01 9.371e+01 8.849e+02, threshold=1.768e+02, percent-clipped=1.0 2024-09-18 05:52:33,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.77 vs. limit=15.0 2024-09-18 05:52:40,327 INFO [train.py:1198] (0/2) Epoch 22, batch 400, loss[loss=0.2444, ctc_loss=0.1316, cr_loss=0.3775, attn_decoder_loss=0.2486, over 29707.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1334, cr_loss=0.3762, attn_decoder_loss=0.2489, over 5025787.21 frames. ], batch size: 82, lr: 5.11e-03, grad_scale: 16.0 2024-09-18 05:52:57,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.86 vs. 
limit=15.0 2024-09-18 05:53:08,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=381740.0, ans=0.1 2024-09-18 05:53:09,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=381780.0, ans=0.125 2024-09-18 05:53:14,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=381780.0, ans=0.125 2024-09-18 05:53:17,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=381780.0, ans=0.0 2024-09-18 05:53:19,625 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0 2024-09-18 05:53:59,069 INFO [train.py:1198] (0/2) Epoch 22, batch 450, loss[loss=0.2563, ctc_loss=0.1371, cr_loss=0.3925, attn_decoder_loss=0.2608, over 29697.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1337, cr_loss=0.3769, attn_decoder_loss=0.2493, over 5188038.91 frames. ], batch size: 83, lr: 5.11e-03, grad_scale: 8.0 2024-09-18 05:54:29,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=381940.0, ans=0.125 2024-09-18 05:54:33,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.63 vs. 
limit=22.5 2024-09-18 05:54:44,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=381980.0, ans=0.125 2024-09-18 05:54:58,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=382020.0, ans=0.125 2024-09-18 05:55:08,440 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.472e+01 8.899e+01 9.397e+01 1.729e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-18 05:55:14,837 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:55:17,489 INFO [train.py:1198] (0/2) Epoch 22, batch 500, loss[loss=0.2632, ctc_loss=0.1546, cr_loss=0.4179, attn_decoder_loss=0.266, over 29471.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1333, cr_loss=0.3766, attn_decoder_loss=0.2487, over 5329891.18 frames. ], batch size: 94, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 05:55:30,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=382100.0, ans=0.125 2024-09-18 05:55:47,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.96 vs. 
limit=22.5 2024-09-18 05:55:49,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382180.0, ans=0.1 2024-09-18 05:55:52,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=382180.0, ans=0.1 2024-09-18 05:56:01,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=382220.0, ans=0.0 2024-09-18 05:56:15,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=382220.0, ans=0.0 2024-09-18 05:56:15,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=382220.0, ans=0.125 2024-09-18 05:56:33,391 INFO [train.py:1198] (0/2) Epoch 22, batch 550, loss[loss=0.2651, ctc_loss=0.1454, cr_loss=0.3908, attn_decoder_loss=0.2697, over 28750.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1331, cr_loss=0.3762, attn_decoder_loss=0.2486, over 5423352.79 frames. 
], batch size: 104, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 05:56:56,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=382340.0, ans=0.0 2024-09-18 05:56:58,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=382340.0, ans=0.125 2024-09-18 05:57:39,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382460.0, ans=0.1 2024-09-18 05:57:39,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=382460.0, ans=0.025 2024-09-18 05:57:40,969 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.705e+01 9.082e+01 9.823e+01 4.645e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-18 05:57:44,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=382460.0, ans=0.125 2024-09-18 05:57:52,558 INFO [train.py:1198] (0/2) Epoch 22, batch 600, loss[loss=0.2575, ctc_loss=0.1498, cr_loss=0.4075, attn_decoder_loss=0.2604, over 29282.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1335, cr_loss=0.3769, attn_decoder_loss=0.2492, over 5510802.67 frames. 
], batch size: 100, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 05:58:41,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=382620.0, ans=0.1 2024-09-18 05:58:50,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=382620.0, ans=0.2 2024-09-18 05:59:03,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=382660.0, ans=0.1 2024-09-18 05:59:09,506 INFO [train.py:1198] (0/2) Epoch 22, batch 650, loss[loss=0.2487, ctc_loss=0.1342, cr_loss=0.392, attn_decoder_loss=0.2527, over 29754.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1327, cr_loss=0.3761, attn_decoder_loss=0.2486, over 5587773.30 frames. ], batch size: 81, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 05:59:21,071 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.31 vs. limit=15.0 2024-09-18 05:59:34,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=382740.0, ans=0.0 2024-09-18 05:59:58,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=382820.0, ans=0.125 2024-09-18 05:59:59,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=382820.0, ans=0.125 2024-09-18 06:00:15,695 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 8.434e+01 8.895e+01 9.353e+01 1.142e+02, threshold=1.779e+02, percent-clipped=0.0 2024-09-18 06:00:19,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.31 vs. 
limit=15.0 2024-09-18 06:00:24,705 INFO [train.py:1198] (0/2) Epoch 22, batch 700, loss[loss=0.2339, ctc_loss=0.1233, cr_loss=0.3688, attn_decoder_loss=0.238, over 29520.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.133, cr_loss=0.377, attn_decoder_loss=0.249, over 5638919.75 frames. ], batch size: 76, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 06:00:32,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=382900.0, ans=0.125 2024-09-18 06:00:43,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.23 vs. limit=22.5 2024-09-18 06:00:44,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=382940.0, ans=0.1 2024-09-18 06:00:47,511 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:00:52,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=382940.0, ans=0.05 2024-09-18 06:00:58,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=382980.0, ans=0.125 2024-09-18 06:00:58,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=382980.0, ans=0.1 2024-09-18 06:01:01,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=382980.0, ans=0.125 2024-09-18 06:01:11,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. 
limit=10.0 2024-09-18 06:01:40,815 INFO [train.py:1198] (0/2) Epoch 22, batch 750, loss[loss=0.2497, ctc_loss=0.1323, cr_loss=0.3811, attn_decoder_loss=0.2543, over 29691.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1327, cr_loss=0.3762, attn_decoder_loss=0.2484, over 5678131.40 frames. ], batch size: 82, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 06:01:44,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=383100.0, ans=0.125 2024-09-18 06:01:55,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=383100.0, ans=0.125 2024-09-18 06:01:58,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=383140.0, ans=0.1 2024-09-18 06:02:18,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=383180.0, ans=0.125 2024-09-18 06:02:34,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=383220.0, ans=0.0 2024-09-18 06:02:34,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=383220.0, ans=0.125 2024-09-18 06:02:38,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=383220.0, ans=0.125 2024-09-18 06:02:52,143 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.347e+01 8.642e+01 9.168e+01 9.743e+01 1.816e+02, threshold=1.834e+02, percent-clipped=1.0 2024-09-18 06:03:01,139 INFO [train.py:1198] (0/2) Epoch 22, batch 800, loss[loss=0.2318, ctc_loss=0.1259, cr_loss=0.3564, attn_decoder_loss=0.2356, over 29606.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1328, cr_loss=0.3765, attn_decoder_loss=0.2483, over 5707925.00 frames. 
], batch size: 73, lr: 5.10e-03, grad_scale: 16.0 2024-09-18 06:03:19,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=383340.0, ans=0.125 2024-09-18 06:03:21,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=383340.0, ans=0.1 2024-09-18 06:03:24,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=383340.0, ans=0.04949747468305833 2024-09-18 06:03:30,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=383380.0, ans=0.025 2024-09-18 06:03:36,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=383380.0, ans=0.125 2024-09-18 06:03:52,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=383420.0, ans=0.0 2024-09-18 06:04:12,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=383460.0, ans=0.1 2024-09-18 06:04:16,285 INFO [train.py:1198] (0/2) Epoch 22, batch 850, loss[loss=0.2439, ctc_loss=0.1249, cr_loss=0.3494, attn_decoder_loss=0.2493, over 29713.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1326, cr_loss=0.376, attn_decoder_loss=0.248, over 5736615.10 frames. 
], batch size: 89, lr: 5.10e-03, grad_scale: 8.0 2024-09-18 06:04:35,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=383540.0, ans=0.125 2024-09-18 06:04:38,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=383540.0, ans=0.125 2024-09-18 06:04:54,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.01 vs. limit=15.0 2024-09-18 06:05:24,336 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 8.714e+01 9.138e+01 9.767e+01 2.023e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-18 06:05:32,035 INFO [train.py:1198] (0/2) Epoch 22, batch 900, loss[loss=0.2244, ctc_loss=0.118, cr_loss=0.3512, attn_decoder_loss=0.2284, over 29592.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1328, cr_loss=0.3758, attn_decoder_loss=0.248, over 5742101.92 frames. ], batch size: 73, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:05:39,124 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-09-18 06:05:53,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0 2024-09-18 06:06:01,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. 
limit=6.0 2024-09-18 06:06:05,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383780.0, ans=0.1 2024-09-18 06:06:38,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=383860.0, ans=0.125 2024-09-18 06:06:52,150 INFO [train.py:1198] (0/2) Epoch 22, batch 950, loss[loss=0.2231, ctc_loss=0.1143, cr_loss=0.343, attn_decoder_loss=0.2276, over 29509.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1328, cr_loss=0.3755, attn_decoder_loss=0.2483, over 5744501.83 frames. ], batch size: 74, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:07:28,873 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-96000.pt 2024-09-18 06:08:01,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=384060.0, ans=0.125 2024-09-18 06:08:06,681 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.991e+01 9.492e+01 1.022e+02 3.198e+02, threshold=1.898e+02, percent-clipped=2.0 2024-09-18 06:08:13,999 INFO [train.py:1198] (0/2) Epoch 22, batch 1000, loss[loss=0.2424, ctc_loss=0.1312, cr_loss=0.3629, attn_decoder_loss=0.2467, over 29497.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1338, cr_loss=0.3767, attn_decoder_loss=0.2491, over 5738641.10 frames. 
], batch size: 77, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:08:17,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=384100.0, ans=0.125 2024-09-18 06:08:24,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=384100.0, ans=0.0 2024-09-18 06:08:32,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=384140.0, ans=0.1 2024-09-18 06:09:29,787 INFO [train.py:1198] (0/2) Epoch 22, batch 1050, loss[loss=0.255, ctc_loss=0.1335, cr_loss=0.3714, attn_decoder_loss=0.2603, over 29684.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1332, cr_loss=0.3757, attn_decoder_loss=0.2484, over 5745869.57 frames. ], batch size: 85, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:09:44,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=384300.0, ans=0.125 2024-09-18 06:09:55,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=384340.0, ans=0.09899494936611666 2024-09-18 06:09:57,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=384340.0, ans=0.0 2024-09-18 06:10:03,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=384380.0, ans=0.0 2024-09-18 06:10:14,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=384380.0, ans=0.1 2024-09-18 06:10:17,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=384380.0, ans=0.0 2024-09-18 06:10:19,315 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=384420.0, ans=0.125 2024-09-18 06:10:42,980 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.419e+01 8.971e+01 9.530e+01 1.277e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-18 06:10:50,699 INFO [train.py:1198] (0/2) Epoch 22, batch 1100, loss[loss=0.2396, ctc_loss=0.1321, cr_loss=0.3851, attn_decoder_loss=0.243, over 29446.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1329, cr_loss=0.3748, attn_decoder_loss=0.2482, over 5757292.82 frames. ], batch size: 78, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:11:10,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=384540.0, ans=0.2 2024-09-18 06:11:16,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=384540.0, ans=0.0 2024-09-18 06:11:27,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=384580.0, ans=0.125 2024-09-18 06:11:42,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=384620.0, ans=0.0 2024-09-18 06:11:50,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384660.0, ans=0.1 2024-09-18 06:11:54,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=384660.0, ans=0.125 2024-09-18 06:11:54,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=384660.0, ans=0.2 2024-09-18 06:12:06,664 INFO [train.py:1198] (0/2) Epoch 22, batch 1150, loss[loss=0.238, ctc_loss=0.1294, cr_loss=0.3586, attn_decoder_loss=0.2421, over 29461.00 frames. 
], tot_loss[loss=0.2441, ctc_loss=0.1329, cr_loss=0.3746, attn_decoder_loss=0.2481, over 5757336.96 frames. ], batch size: 78, lr: 5.09e-03, grad_scale: 8.0 2024-09-18 06:12:11,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=384700.0, ans=0.2 2024-09-18 06:12:19,712 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.21 vs. limit=15.0 2024-09-18 06:12:28,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=384740.0, ans=0.5 2024-09-18 06:12:32,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.33 vs. limit=12.0 2024-09-18 06:12:42,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=384780.0, ans=0.125 2024-09-18 06:12:45,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=384780.0, ans=0.125 2024-09-18 06:12:50,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=384780.0, ans=0.07 2024-09-18 06:13:02,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=384820.0, ans=0.125 2024-09-18 06:13:15,274 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.605e+01 9.127e+01 9.575e+01 1.863e+02, threshold=1.825e+02, percent-clipped=1.0 2024-09-18 06:13:22,817 INFO [train.py:1198] (0/2) Epoch 22, batch 1200, loss[loss=0.261, ctc_loss=0.1455, cr_loss=0.3981, attn_decoder_loss=0.2649, over 29657.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1336, cr_loss=0.3758, attn_decoder_loss=0.2488, over 5749327.97 frames. 
], batch size: 85, lr: 5.09e-03, grad_scale: 16.0 2024-09-18 06:13:58,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.81 vs. limit=22.5 2024-09-18 06:14:14,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=385020.0, ans=0.0 2024-09-18 06:14:21,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=385020.0, ans=0.2 2024-09-18 06:14:42,679 INFO [train.py:1198] (0/2) Epoch 22, batch 1250, loss[loss=0.2564, ctc_loss=0.1361, cr_loss=0.4004, attn_decoder_loss=0.2609, over 29552.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1339, cr_loss=0.3773, attn_decoder_loss=0.2494, over 5775330.58 frames. ], batch size: 92, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:14:48,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.42 vs. 
limit=6.0 2024-09-18 06:14:52,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=385100.0, ans=0.2 2024-09-18 06:14:53,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=385100.0, ans=0.125 2024-09-18 06:15:18,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385180.0, ans=0.1 2024-09-18 06:15:40,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=385220.0, ans=0.0 2024-09-18 06:15:42,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=385260.0, ans=0.125 2024-09-18 06:15:43,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=385260.0, ans=0.125 2024-09-18 06:15:47,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=385260.0, ans=0.125 2024-09-18 06:15:52,594 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.234e+01 8.912e+01 9.418e+01 2.045e+02, threshold=1.782e+02, percent-clipped=1.0 2024-09-18 06:15:58,556 INFO [train.py:1198] (0/2) Epoch 22, batch 1300, loss[loss=0.2524, ctc_loss=0.1344, cr_loss=0.3623, attn_decoder_loss=0.2574, over 28150.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1333, cr_loss=0.3762, attn_decoder_loss=0.2488, over 5780813.43 frames. 
], batch size: 111, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:16:00,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=385300.0, ans=0.125 2024-09-18 06:16:38,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=385380.0, ans=0.04949747468305833 2024-09-18 06:16:48,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=385420.0, ans=0.0 2024-09-18 06:17:11,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=385460.0, ans=0.2 2024-09-18 06:17:14,132 INFO [train.py:1198] (0/2) Epoch 22, batch 1350, loss[loss=0.2414, ctc_loss=0.1244, cr_loss=0.3505, attn_decoder_loss=0.2466, over 29749.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1325, cr_loss=0.3746, attn_decoder_loss=0.2482, over 5798098.58 frames. ], batch size: 81, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:17:15,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-09-18 06:17:15,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=385500.0, ans=0.07 2024-09-18 06:17:16,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2024-09-18 06:17:17,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=385500.0, ans=0.125 2024-09-18 06:17:21,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. 
limit=15.0 2024-09-18 06:17:29,465 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:17:35,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=22.5 2024-09-18 06:17:56,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=12.0 2024-09-18 06:17:57,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=385580.0, ans=0.125 2024-09-18 06:18:06,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=385620.0, ans=0.09899494936611666 2024-09-18 06:18:13,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.25 vs. limit=22.5 2024-09-18 06:18:20,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=385660.0, ans=0.125 2024-09-18 06:18:24,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=385660.0, ans=0.1 2024-09-18 06:18:27,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.459e+01 9.043e+01 9.728e+01 1.319e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-18 06:18:29,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=385660.0, ans=0.125 2024-09-18 06:18:33,610 INFO [train.py:1198] (0/2) Epoch 22, batch 1400, loss[loss=0.2198, ctc_loss=0.1146, cr_loss=0.3512, attn_decoder_loss=0.2237, over 29597.00 frames. 
], tot_loss[loss=0.2441, ctc_loss=0.1325, cr_loss=0.3749, attn_decoder_loss=0.2481, over 5808648.42 frames. ], batch size: 69, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:18:44,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=385700.0, ans=0.0 2024-09-18 06:18:55,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385740.0, ans=0.1 2024-09-18 06:19:04,225 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:19:10,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2024-09-18 06:19:22,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=385820.0, ans=0.2 2024-09-18 06:19:22,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=385820.0, ans=0.125 2024-09-18 06:19:23,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.50 vs. limit=15.0 2024-09-18 06:19:25,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385820.0, ans=0.1 2024-09-18 06:19:26,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=385820.0, ans=0.125 2024-09-18 06:19:37,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=385860.0, ans=0.07 2024-09-18 06:19:49,086 INFO [train.py:1198] (0/2) Epoch 22, batch 1450, loss[loss=0.2558, ctc_loss=0.1414, cr_loss=0.3781, attn_decoder_loss=0.2601, over 29435.00 frames. 
], tot_loss[loss=0.2446, ctc_loss=0.1329, cr_loss=0.3752, attn_decoder_loss=0.2487, over 5804554.94 frames. ], batch size: 94, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:19:59,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.88 vs. limit=15.0 2024-09-18 06:20:27,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=385980.0, ans=0.0 2024-09-18 06:20:33,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=386020.0, ans=0.0 2024-09-18 06:20:47,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=386020.0, ans=0.0 2024-09-18 06:20:50,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-09-18 06:20:51,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=386060.0, ans=0.2 2024-09-18 06:20:58,987 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.476e+01 9.077e+01 9.872e+01 2.572e+02, threshold=1.815e+02, percent-clipped=2.0 2024-09-18 06:20:59,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=386060.0, ans=0.0 2024-09-18 06:21:05,105 INFO [train.py:1198] (0/2) Epoch 22, batch 1500, loss[loss=0.2403, ctc_loss=0.1224, cr_loss=0.357, attn_decoder_loss=0.2455, over 29617.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1332, cr_loss=0.3765, attn_decoder_loss=0.2492, over 5805570.34 frames. 
], batch size: 86, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:21:41,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=386180.0, ans=0.2 2024-09-18 06:21:56,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=386220.0, ans=6.0 2024-09-18 06:22:04,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=386220.0, ans=0.125 2024-09-18 06:22:05,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=386220.0, ans=0.125 2024-09-18 06:22:17,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=386260.0, ans=0.1 2024-09-18 06:22:17,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=22.5 2024-09-18 06:22:25,762 INFO [train.py:1198] (0/2) Epoch 22, batch 1550, loss[loss=0.2508, ctc_loss=0.1458, cr_loss=0.4013, attn_decoder_loss=0.2535, over 29517.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1339, cr_loss=0.3777, attn_decoder_loss=0.2494, over 5781311.36 frames. 
], batch size: 90, lr: 5.08e-03, grad_scale: 8.0 2024-09-18 06:22:27,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=386300.0, ans=0.125 2024-09-18 06:22:47,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=386340.0, ans=0.0 2024-09-18 06:23:09,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=386380.0, ans=0.07 2024-09-18 06:23:25,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=386460.0, ans=0.125 2024-09-18 06:23:35,905 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.667e+01 9.294e+01 9.875e+01 4.781e+02, threshold=1.859e+02, percent-clipped=2.0 2024-09-18 06:23:41,955 INFO [train.py:1198] (0/2) Epoch 22, batch 1600, loss[loss=0.2488, ctc_loss=0.1306, cr_loss=0.3735, attn_decoder_loss=0.2536, over 29672.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1342, cr_loss=0.3777, attn_decoder_loss=0.2493, over 5764196.48 frames. 
], batch size: 85, lr: 5.08e-03, grad_scale: 16.0 2024-09-18 06:23:43,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=386500.0, ans=0.05 2024-09-18 06:23:51,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=386500.0, ans=0.125 2024-09-18 06:24:01,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=386540.0, ans=0.0 2024-09-18 06:24:03,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=386540.0, ans=0.1 2024-09-18 06:24:06,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=386540.0, ans=0.07 2024-09-18 06:24:29,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386620.0, ans=0.1 2024-09-18 06:24:32,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386620.0, ans=0.1 2024-09-18 06:24:51,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=386660.0, ans=0.2 2024-09-18 06:24:57,626 INFO [train.py:1198] (0/2) Epoch 22, batch 1650, loss[loss=0.2536, ctc_loss=0.1292, cr_loss=0.3773, attn_decoder_loss=0.259, over 29712.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1337, cr_loss=0.3765, attn_decoder_loss=0.2489, over 5759738.61 frames. 
], batch size: 89, lr: 5.07e-03, grad_scale: 8.0 2024-09-18 06:25:16,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=386740.0, ans=0.125 2024-09-18 06:25:19,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=386740.0, ans=0.025 2024-09-18 06:25:28,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=386780.0, ans=0.125 2024-09-18 06:25:30,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2024-09-18 06:25:40,631 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=22.5 2024-09-18 06:25:46,385 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0 2024-09-18 06:26:12,695 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.418e+01 9.168e+01 9.653e+01 1.530e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-18 06:26:17,120 INFO [train.py:1198] (0/2) Epoch 22, batch 1700, loss[loss=0.2158, ctc_loss=0.1149, cr_loss=0.3439, attn_decoder_loss=0.2194, over 29609.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1333, cr_loss=0.3762, attn_decoder_loss=0.2487, over 5781193.14 frames. ], batch size: 69, lr: 5.07e-03, grad_scale: 8.0 2024-09-18 06:26:21,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. 
limit=8.0 2024-09-18 06:26:49,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=386980.0, ans=0.125 2024-09-18 06:27:02,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=387020.0, ans=0.125 2024-09-18 06:27:07,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=387020.0, ans=0.0 2024-09-18 06:27:13,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=387020.0, ans=0.125 2024-09-18 06:27:19,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=387060.0, ans=0.0 2024-09-18 06:27:32,820 INFO [train.py:1198] (0/2) Epoch 22, batch 1750, loss[loss=0.2174, ctc_loss=0.1168, cr_loss=0.3587, attn_decoder_loss=0.2206, over 29319.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1326, cr_loss=0.3753, attn_decoder_loss=0.2481, over 5788432.75 frames. ], batch size: 67, lr: 5.07e-03, grad_scale: 8.0 2024-09-18 06:28:03,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.61 vs. 
limit=15.0 2024-09-18 06:28:05,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=387180.0, ans=0.125 2024-09-18 06:28:05,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=387180.0, ans=0.0 2024-09-18 06:28:19,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=387220.0, ans=0.0 2024-09-18 06:28:24,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=387220.0, ans=15.0 2024-09-18 06:28:31,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=387220.0, ans=0.0 2024-09-18 06:28:42,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2024-09-18 06:28:44,543 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.338e+01 8.803e+01 9.481e+01 6.567e+02, threshold=1.761e+02, percent-clipped=1.0 2024-09-18 06:28:49,102 INFO [train.py:1198] (0/2) Epoch 22, batch 1800, loss[loss=0.2387, ctc_loss=0.1187, cr_loss=0.3492, attn_decoder_loss=0.2443, over 29675.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1328, cr_loss=0.3756, attn_decoder_loss=0.2484, over 5791347.65 frames. ], batch size: 83, lr: 5.07e-03, grad_scale: 8.0 2024-09-18 06:28:54,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=387300.0, ans=0.0 2024-09-18 06:29:19,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.47 vs. 
limit=22.5 2024-09-18 06:29:26,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=387380.0, ans=0.125 2024-09-18 06:29:49,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.12 vs. limit=10.0 2024-09-18 06:29:59,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=387460.0, ans=0.125 2024-09-18 06:30:09,420 INFO [train.py:1198] (0/2) Epoch 22, batch 1850, loss[loss=0.2603, ctc_loss=0.1413, cr_loss=0.3846, attn_decoder_loss=0.265, over 29639.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1327, cr_loss=0.3752, attn_decoder_loss=0.2485, over 5798387.37 frames. ], batch size: 86, lr: 5.07e-03, grad_scale: 8.0 2024-09-18 06:30:14,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=387500.0, ans=0.1 2024-09-18 06:30:20,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=387500.0, ans=0.1 2024-09-18 06:30:24,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=387540.0, ans=0.0 2024-09-18 06:30:31,363 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.60 vs. 
limit=15.0 2024-09-18 06:30:32,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=387540.0, ans=0.5 2024-09-18 06:30:34,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=387540.0, ans=0.2 2024-09-18 06:30:34,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.78 vs. limit=22.5 2024-09-18 06:30:38,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=387580.0, ans=0.95 2024-09-18 06:30:39,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=387580.0, ans=0.125 2024-09-18 06:30:50,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=387580.0, ans=0.035 2024-09-18 06:31:11,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=387660.0, ans=0.125 2024-09-18 06:31:19,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.84 vs. limit=10.0 2024-09-18 06:31:20,354 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.630e+01 9.053e+01 9.518e+01 1.576e+02, threshold=1.811e+02, percent-clipped=0.0 2024-09-18 06:31:24,769 INFO [train.py:1198] (0/2) Epoch 22, batch 1900, loss[loss=0.2592, ctc_loss=0.1388, cr_loss=0.3918, attn_decoder_loss=0.2638, over 29719.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1332, cr_loss=0.3766, attn_decoder_loss=0.2492, over 5806194.55 frames. 
], batch size: 89, lr: 5.07e-03, grad_scale: 8.0 2024-09-18 06:31:36,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=22.5 2024-09-18 06:31:41,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=387740.0, ans=0.0 2024-09-18 06:31:52,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=387740.0, ans=0.1 2024-09-18 06:32:00,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=387780.0, ans=0.0 2024-09-18 06:32:18,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=387820.0, ans=0.125 2024-09-18 06:32:30,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=387860.0, ans=0.125 2024-09-18 06:32:31,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=387860.0, ans=0.0 2024-09-18 06:32:32,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.81 vs. limit=22.5 2024-09-18 06:32:33,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=387860.0, ans=0.025 2024-09-18 06:32:40,942 INFO [train.py:1198] (0/2) Epoch 22, batch 1950, loss[loss=0.234, ctc_loss=0.1244, cr_loss=0.3527, attn_decoder_loss=0.2384, over 29435.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1337, cr_loss=0.3776, attn_decoder_loss=0.2502, over 5820237.69 frames. 
], batch size: 78, lr: 5.07e-03, grad_scale: 8.0 2024-09-18 06:32:50,501 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:33:45,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-09-18 06:33:56,805 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 8.668e+01 9.187e+01 9.705e+01 3.737e+02, threshold=1.837e+02, percent-clipped=2.0 2024-09-18 06:33:57,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=15.0 2024-09-18 06:34:01,454 INFO [train.py:1198] (0/2) Epoch 22, batch 2000, loss[loss=0.2214, ctc_loss=0.1202, cr_loss=0.3654, attn_decoder_loss=0.2246, over 29311.00 frames. ], tot_loss[loss=0.2467, ctc_loss=0.1343, cr_loss=0.3787, attn_decoder_loss=0.2508, over 5796983.45 frames. ], batch size: 67, lr: 5.07e-03, grad_scale: 16.0 2024-09-18 06:34:06,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=388100.0, ans=0.2 2024-09-18 06:34:32,130 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:34:33,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=388180.0, ans=0.2 2024-09-18 06:34:46,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. 
limit=15.0 2024-09-18 06:34:56,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=388220.0, ans=0.125 2024-09-18 06:35:14,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=388260.0, ans=0.125 2024-09-18 06:35:16,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=388300.0, ans=0.125 2024-09-18 06:35:17,274 INFO [train.py:1198] (0/2) Epoch 22, batch 2050, loss[loss=0.226, ctc_loss=0.1206, cr_loss=0.3611, attn_decoder_loss=0.2297, over 29438.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1335, cr_loss=0.3767, attn_decoder_loss=0.2497, over 5788976.32 frames. ], batch size: 70, lr: 5.06e-03, grad_scale: 8.0 2024-09-18 06:35:23,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=388300.0, ans=0.1 2024-09-18 06:35:57,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=388380.0, ans=0.0 2024-09-18 06:36:13,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=388420.0, ans=0.125 2024-09-18 06:36:15,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=388420.0, ans=0.125 2024-09-18 06:36:30,011 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.598e+01 9.133e+01 9.835e+01 1.696e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-18 06:36:33,137 INFO [train.py:1198] (0/2) Epoch 22, batch 2100, loss[loss=0.2327, ctc_loss=0.1225, cr_loss=0.3626, attn_decoder_loss=0.2368, over 29750.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1327, cr_loss=0.3756, attn_decoder_loss=0.2487, over 5800850.84 frames. 
], batch size: 81, lr: 5.06e-03, grad_scale: 8.0 2024-09-18 06:36:44,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=388500.0, ans=0.125 2024-09-18 06:36:55,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=388540.0, ans=15.0 2024-09-18 06:37:05,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=388580.0, ans=0.1 2024-09-18 06:37:31,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=388620.0, ans=0.125 2024-09-18 06:37:52,733 INFO [train.py:1198] (0/2) Epoch 22, batch 2150, loss[loss=0.2394, ctc_loss=0.1236, cr_loss=0.3565, attn_decoder_loss=0.2444, over 29442.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1319, cr_loss=0.3747, attn_decoder_loss=0.2481, over 5815393.06 frames. 
], batch size: 78, lr: 5.06e-03, grad_scale: 8.0 2024-09-18 06:38:02,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=388700.0, ans=0.125 2024-09-18 06:38:05,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=388700.0, ans=0.125 2024-09-18 06:38:23,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=388780.0, ans=0.0 2024-09-18 06:38:32,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=388780.0, ans=0.125 2024-09-18 06:38:34,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=388780.0, ans=0.125 2024-09-18 06:38:40,611 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.83 vs. limit=10.0 2024-09-18 06:38:55,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=388860.0, ans=0.125 2024-09-18 06:39:02,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=388860.0, ans=0.0 2024-09-18 06:39:04,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=388860.0, ans=0.2 2024-09-18 06:39:04,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.77 vs. 
limit=6.0 2024-09-18 06:39:05,560 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.588e+01 8.944e+01 9.592e+01 1.412e+02, threshold=1.789e+02, percent-clipped=0.0 2024-09-18 06:39:08,645 INFO [train.py:1198] (0/2) Epoch 22, batch 2200, loss[loss=0.2525, ctc_loss=0.1379, cr_loss=0.4001, attn_decoder_loss=0.2563, over 29628.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1322, cr_loss=0.375, attn_decoder_loss=0.2483, over 5811304.21 frames. ], batch size: 86, lr: 5.06e-03, grad_scale: 8.0 2024-09-18 06:39:22,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388940.0, ans=0.1 2024-09-18 06:39:25,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=388940.0, ans=0.07 2024-09-18 06:39:28,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=388940.0, ans=0.0 2024-09-18 06:39:47,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.66 vs. limit=15.0 2024-09-18 06:40:08,718 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.10 vs. limit=12.0 2024-09-18 06:40:23,970 INFO [train.py:1198] (0/2) Epoch 22, batch 2250, loss[loss=0.2415, ctc_loss=0.1227, cr_loss=0.3459, attn_decoder_loss=0.247, over 29687.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.132, cr_loss=0.374, attn_decoder_loss=0.2481, over 5810993.19 frames. 
], batch size: 82, lr: 5.06e-03, grad_scale: 8.0 2024-09-18 06:40:27,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=389100.0, ans=0.2 2024-09-18 06:40:36,553 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:41:14,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=389220.0, ans=0.125 2024-09-18 06:41:39,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=389260.0, ans=0.2 2024-09-18 06:41:41,017 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.612e+01 9.109e+01 9.746e+01 4.316e+02, threshold=1.822e+02, percent-clipped=5.0 2024-09-18 06:41:44,068 INFO [train.py:1198] (0/2) Epoch 22, batch 2300, loss[loss=0.2127, ctc_loss=0.1095, cr_loss=0.3204, attn_decoder_loss=0.217, over 29332.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1316, cr_loss=0.3735, attn_decoder_loss=0.2472, over 5798665.08 frames. ], batch size: 71, lr: 5.06e-03, grad_scale: 8.0 2024-09-18 06:41:52,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.30 vs. 
limit=12.0 2024-09-18 06:42:11,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=389340.0, ans=10.0 2024-09-18 06:42:12,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=389380.0, ans=0.0 2024-09-18 06:42:14,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=389380.0, ans=0.125 2024-09-18 06:42:47,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=389460.0, ans=0.125 2024-09-18 06:42:59,650 INFO [train.py:1198] (0/2) Epoch 22, batch 2350, loss[loss=0.2553, ctc_loss=0.1414, cr_loss=0.3865, attn_decoder_loss=0.2594, over 29675.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1314, cr_loss=0.3734, attn_decoder_loss=0.2474, over 5803853.39 frames. ], batch size: 83, lr: 5.06e-03, grad_scale: 8.0 2024-09-18 06:44:04,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=389660.0, ans=0.125 2024-09-18 06:44:13,259 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.909e+01 8.749e+01 9.346e+01 1.024e+02 1.570e+02, threshold=1.869e+02, percent-clipped=0.0 2024-09-18 06:44:16,230 INFO [train.py:1198] (0/2) Epoch 22, batch 2400, loss[loss=0.2342, ctc_loss=0.1256, cr_loss=0.3746, attn_decoder_loss=0.238, over 29527.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1322, cr_loss=0.3744, attn_decoder_loss=0.2482, over 5807491.14 frames. ], batch size: 76, lr: 5.05e-03, grad_scale: 16.0 2024-09-18 06:44:18,568 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.98 vs. 
limit=22.5 2024-09-18 06:44:39,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=389740.0, ans=0.125 2024-09-18 06:44:46,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=389740.0, ans=0.125 2024-09-18 06:44:50,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=389780.0, ans=0.1 2024-09-18 06:45:03,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=389820.0, ans=0.1 2024-09-18 06:45:14,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=389820.0, ans=0.2 2024-09-18 06:45:17,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=389860.0, ans=0.125 2024-09-18 06:45:19,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=389860.0, ans=0.125 2024-09-18 06:45:27,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=389860.0, ans=0.2 2024-09-18 06:45:32,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=389860.0, ans=0.0 2024-09-18 06:45:36,554 INFO [train.py:1198] (0/2) Epoch 22, batch 2450, loss[loss=0.2442, ctc_loss=0.1266, cr_loss=0.3622, attn_decoder_loss=0.2492, over 29727.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1329, cr_loss=0.3757, attn_decoder_loss=0.2491, over 5784180.74 frames. 
], batch size: 82, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:45:50,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=389940.0, ans=0.125 2024-09-18 06:46:16,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.72 vs. limit=15.0 2024-09-18 06:46:50,165 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.749e+01 9.206e+01 9.779e+01 5.372e+02, threshold=1.841e+02, percent-clipped=2.0 2024-09-18 06:46:51,739 INFO [train.py:1198] (0/2) Epoch 22, batch 2500, loss[loss=0.2512, ctc_loss=0.1339, cr_loss=0.3699, attn_decoder_loss=0.2561, over 29611.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1332, cr_loss=0.3762, attn_decoder_loss=0.249, over 5793778.15 frames. ], batch size: 86, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:46:54,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-09-18 06:46:58,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=390100.0, ans=0.125 2024-09-18 06:47:04,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=390100.0, ans=0.025 2024-09-18 06:47:07,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.81 vs. limit=15.0 2024-09-18 06:47:13,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.26 vs. 
limit=15.0 2024-09-18 06:47:28,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=390180.0, ans=0.125 2024-09-18 06:47:49,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390220.0, ans=0.1 2024-09-18 06:48:07,756 INFO [train.py:1198] (0/2) Epoch 22, batch 2550, loss[loss=0.2181, ctc_loss=0.1115, cr_loss=0.3292, attn_decoder_loss=0.2226, over 29388.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1333, cr_loss=0.3764, attn_decoder_loss=0.2494, over 5797056.18 frames. ], batch size: 67, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:48:09,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=390300.0, ans=0.0 2024-09-18 06:48:30,324 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-09-18 06:48:32,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=390340.0, ans=0.07 2024-09-18 06:48:40,440 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:48:50,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.87 vs. 
limit=15.0 2024-09-18 06:49:02,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=390420.0, ans=0.0 2024-09-18 06:49:23,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=390460.0, ans=0.125 2024-09-18 06:49:24,706 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.271e+01 8.515e+01 9.035e+01 9.623e+01 2.254e+02, threshold=1.807e+02, percent-clipped=1.0 2024-09-18 06:49:26,249 INFO [train.py:1198] (0/2) Epoch 22, batch 2600, loss[loss=0.236, ctc_loss=0.1329, cr_loss=0.4046, attn_decoder_loss=0.2385, over 29450.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1337, cr_loss=0.377, attn_decoder_loss=0.2499, over 5792976.39 frames. ], batch size: 78, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:49:30,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=390500.0, ans=0.0 2024-09-18 06:49:46,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=390540.0, ans=0.0 2024-09-18 06:49:54,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390540.0, ans=0.1 2024-09-18 06:50:20,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=390620.0, ans=0.0 2024-09-18 06:50:24,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390620.0, ans=0.1 2024-09-18 06:50:43,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.72 vs. 
limit=15.0 2024-09-18 06:50:43,871 INFO [train.py:1198] (0/2) Epoch 22, batch 2650, loss[loss=0.2547, ctc_loss=0.1385, cr_loss=0.3707, attn_decoder_loss=0.2594, over 29256.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1336, cr_loss=0.377, attn_decoder_loss=0.2499, over 5799180.96 frames. ], batch size: 100, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:50:54,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=390700.0, ans=0.025 2024-09-18 06:51:08,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0 2024-09-18 06:51:18,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=390780.0, ans=0.125 2024-09-18 06:51:32,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=390820.0, ans=0.125 2024-09-18 06:51:40,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=390820.0, ans=0.0 2024-09-18 06:51:41,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=390820.0, ans=0.025 2024-09-18 06:51:51,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=390860.0, ans=0.0 2024-09-18 06:51:51,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=390860.0, ans=0.125 2024-09-18 06:51:57,484 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.432e+01 9.013e+01 9.580e+01 2.667e+02, threshold=1.803e+02, percent-clipped=2.0 2024-09-18 06:51:59,086 INFO [train.py:1198] (0/2) Epoch 22, batch 2700, loss[loss=0.2532, ctc_loss=0.1333, cr_loss=0.3584, 
attn_decoder_loss=0.2585, over 29536.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1337, cr_loss=0.3774, attn_decoder_loss=0.2502, over 5794354.89 frames. ], batch size: 87, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:52:08,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=390900.0, ans=0.05 2024-09-18 06:52:10,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=390900.0, ans=0.025 2024-09-18 06:52:28,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=390940.0, ans=0.0 2024-09-18 06:52:29,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.04 vs. limit=22.5 2024-09-18 06:53:06,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=391060.0, ans=0.0 2024-09-18 06:53:17,094 INFO [train.py:1198] (0/2) Epoch 22, batch 2750, loss[loss=0.2333, ctc_loss=0.1312, cr_loss=0.367, attn_decoder_loss=0.2365, over 29531.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.133, cr_loss=0.3761, attn_decoder_loss=0.2489, over 5793863.10 frames. 
], batch size: 75, lr: 5.05e-03, grad_scale: 8.0 2024-09-18 06:53:37,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=391140.0, ans=0.0 2024-09-18 06:53:49,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=391180.0, ans=0.0 2024-09-18 06:53:54,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=391180.0, ans=0.1 2024-09-18 06:54:26,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=391260.0, ans=0.1 2024-09-18 06:54:29,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=391260.0, ans=0.125 2024-09-18 06:54:34,213 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.791e+01 9.407e+01 1.009e+02 2.763e+02, threshold=1.881e+02, percent-clipped=2.0 2024-09-18 06:54:35,685 INFO [train.py:1198] (0/2) Epoch 22, batch 2800, loss[loss=0.2779, ctc_loss=0.1767, cr_loss=0.4262, attn_decoder_loss=0.2797, over 20368.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1332, cr_loss=0.3761, attn_decoder_loss=0.249, over 5775664.38 frames. 
], batch size: 211, lr: 5.04e-03, grad_scale: 16.0 2024-09-18 06:54:57,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=391340.0, ans=0.1 2024-09-18 06:55:00,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=391340.0, ans=0.0 2024-09-18 06:55:30,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=391420.0, ans=0.125 2024-09-18 06:55:35,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=391460.0, ans=0.07 2024-09-18 06:55:45,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=391460.0, ans=0.0 2024-09-18 06:55:51,525 INFO [train.py:1198] (0/2) Epoch 22, batch 2850, loss[loss=0.2442, ctc_loss=0.1359, cr_loss=0.3949, attn_decoder_loss=0.2474, over 29520.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1333, cr_loss=0.3762, attn_decoder_loss=0.2494, over 5761859.23 frames. 
], batch size: 77, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 06:56:13,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=391540.0, ans=0.125 2024-09-18 06:56:35,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=391580.0, ans=0.0 2024-09-18 06:56:42,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=391620.0, ans=0.0 2024-09-18 06:56:51,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=391620.0, ans=0.125 2024-09-18 06:56:57,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=391660.0, ans=0.0 2024-09-18 06:57:03,895 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.22 vs. limit=22.5 2024-09-18 06:57:06,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=391660.0, ans=0.125 2024-09-18 06:57:09,049 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.734e+01 8.934e+01 9.738e+01 1.096e+02 2.741e+02, threshold=1.948e+02, percent-clipped=1.0 2024-09-18 06:57:09,071 INFO [train.py:1198] (0/2) Epoch 22, batch 2900, loss[loss=0.2354, ctc_loss=0.1238, cr_loss=0.374, attn_decoder_loss=0.2395, over 29427.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1336, cr_loss=0.3776, attn_decoder_loss=0.2502, over 5787904.04 frames. 
], batch size: 79, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 06:57:40,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=391780.0, ans=0.125 2024-09-18 06:57:42,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=391780.0, ans=0.025 2024-09-18 06:57:52,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten.whitening_limit, batch_count=391780.0, ans=15.0 2024-09-18 06:57:55,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=391820.0, ans=0.125 2024-09-18 06:58:22,722 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:58:27,107 INFO [train.py:1198] (0/2) Epoch 22, batch 2950, loss[loss=0.2407, ctc_loss=0.1296, cr_loss=0.3705, attn_decoder_loss=0.2448, over 29546.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1326, cr_loss=0.3755, attn_decoder_loss=0.2487, over 5782191.72 frames. ], batch size: 75, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 06:58:30,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=391900.0, ans=0.015 2024-09-18 06:58:47,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-09-18 06:59:06,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.49 vs. 
limit=10.0 2024-09-18 06:59:15,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=392020.0, ans=0.2 2024-09-18 06:59:31,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=392060.0, ans=0.0 2024-09-18 06:59:43,571 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.515e+01 8.512e+01 8.926e+01 9.722e+01 3.359e+02, threshold=1.785e+02, percent-clipped=2.0 2024-09-18 06:59:43,593 INFO [train.py:1198] (0/2) Epoch 22, batch 3000, loss[loss=0.2505, ctc_loss=0.1362, cr_loss=0.4077, attn_decoder_loss=0.2542, over 29769.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1332, cr_loss=0.3767, attn_decoder_loss=0.2491, over 5783346.98 frames. ], batch size: 81, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 06:59:43,594 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 06:59:51,655 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3017, 4.2715, 3.7968, 4.2151], device='cuda:0') 2024-09-18 07:00:03,080 INFO [train.py:1230] (0/2) Epoch 22, validation: loss=0.2118, ctc_loss=0.03901, cr_loss=5.241e-15, attn_decoder_loss=0.231, over 944034.00 frames. 
2024-09-18 07:00:03,080 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 07:00:14,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=392100.0, ans=0.125 2024-09-18 07:00:30,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=392140.0, ans=0.125 2024-09-18 07:00:31,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=392140.0, ans=0.0 2024-09-18 07:01:06,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=392260.0, ans=0.125 2024-09-18 07:01:21,517 INFO [train.py:1198] (0/2) Epoch 22, batch 3050, loss[loss=0.236, ctc_loss=0.1256, cr_loss=0.3484, attn_decoder_loss=0.2405, over 29537.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1337, cr_loss=0.3773, attn_decoder_loss=0.2497, over 5776725.81 frames. ], batch size: 76, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 07:01:50,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=392380.0, ans=0.0 2024-09-18 07:01:52,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=392380.0, ans=0.2 2024-09-18 07:02:01,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.07 vs. limit=12.0 2024-09-18 07:02:12,253 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.80 vs. 
limit=15.0 2024-09-18 07:02:14,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=392420.0, ans=0.125 2024-09-18 07:02:17,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=392420.0, ans=0.0 2024-09-18 07:02:20,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=392460.0, ans=0.5 2024-09-18 07:02:21,259 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.91 vs. limit=15.0 2024-09-18 07:02:22,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=392460.0, ans=0.0 2024-09-18 07:02:29,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=392460.0, ans=0.125 2024-09-18 07:02:31,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=392460.0, ans=0.2 2024-09-18 07:02:37,019 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 8.738e+01 9.227e+01 9.918e+01 5.288e+02, threshold=1.845e+02, percent-clipped=2.0 2024-09-18 07:02:37,041 INFO [train.py:1198] (0/2) Epoch 22, batch 3100, loss[loss=0.266, ctc_loss=0.1519, cr_loss=0.4075, attn_decoder_loss=0.2696, over 29299.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1339, cr_loss=0.3775, attn_decoder_loss=0.2495, over 5777160.55 frames. 
], batch size: 100, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 07:02:57,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=392540.0, ans=0.05 2024-09-18 07:02:58,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=392540.0, ans=0.125 2024-09-18 07:03:00,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.75 vs. limit=10.0 2024-09-18 07:03:10,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=392580.0, ans=0.2 2024-09-18 07:03:46,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=392660.0, ans=0.0 2024-09-18 07:03:55,333 INFO [train.py:1198] (0/2) Epoch 22, batch 3150, loss[loss=0.2581, ctc_loss=0.1336, cr_loss=0.3768, attn_decoder_loss=0.2635, over 28794.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1339, cr_loss=0.3779, attn_decoder_loss=0.2493, over 5783534.18 frames. ], batch size: 104, lr: 5.04e-03, grad_scale: 8.0 2024-09-18 07:04:10,977 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:04:12,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=392740.0, ans=0.125 2024-09-18 07:04:20,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=392740.0, ans=0.2 2024-09-18 07:04:42,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.94 vs. 
limit=15.0 2024-09-18 07:04:44,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=392820.0, ans=0.025 2024-09-18 07:04:50,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=392820.0, ans=0.09899494936611666 2024-09-18 07:05:09,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=392860.0, ans=0.125 2024-09-18 07:05:13,334 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.635e+01 9.167e+01 9.821e+01 1.751e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-18 07:05:13,356 INFO [train.py:1198] (0/2) Epoch 22, batch 3200, loss[loss=0.2448, ctc_loss=0.1305, cr_loss=0.3869, attn_decoder_loss=0.2489, over 29430.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1331, cr_loss=0.3763, attn_decoder_loss=0.2485, over 5793742.32 frames. ], batch size: 79, lr: 5.03e-03, grad_scale: 16.0 2024-09-18 07:05:30,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.60 vs. limit=15.0 2024-09-18 07:05:38,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=392940.0, ans=0.0 2024-09-18 07:05:43,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=392980.0, ans=0.025 2024-09-18 07:05:44,638 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.98 vs. 
limit=22.5 2024-09-18 07:05:47,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=392980.0, ans=0.125 2024-09-18 07:05:49,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=392980.0, ans=0.07 2024-09-18 07:06:15,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=393060.0, ans=0.0 2024-09-18 07:06:24,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=15.0 2024-09-18 07:06:28,649 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-09-18 07:06:29,127 INFO [train.py:1198] (0/2) Epoch 22, batch 3250, loss[loss=0.2494, ctc_loss=0.1282, cr_loss=0.3808, attn_decoder_loss=0.2544, over 29685.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1332, cr_loss=0.3766, attn_decoder_loss=0.2488, over 5799315.35 frames. 
], batch size: 84, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:06:44,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=393140.0, ans=0.0 2024-09-18 07:06:52,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=393140.0, ans=0.0 2024-09-18 07:07:05,635 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:07:07,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=393180.0, ans=0.125 2024-09-18 07:07:17,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=393220.0, ans=0.125 2024-09-18 07:07:17,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=393220.0, ans=0.125 2024-09-18 07:07:26,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.05 vs. limit=15.0 2024-09-18 07:07:26,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=393220.0, ans=0.0 2024-09-18 07:07:31,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=393260.0, ans=0.125 2024-09-18 07:07:33,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=393260.0, ans=0.0 2024-09-18 07:07:47,130 INFO [train.py:1198] (0/2) Epoch 22, batch 3300, loss[loss=0.2536, ctc_loss=0.1432, cr_loss=0.3902, attn_decoder_loss=0.2572, over 28543.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1324, cr_loss=0.3748, attn_decoder_loss=0.2474, over 5797414.32 frames. 
], batch size: 112, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:07:48,691 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.576e+01 9.104e+01 9.607e+01 2.025e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-18 07:07:54,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=393300.0, ans=0.125 2024-09-18 07:07:56,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393300.0, ans=0.1 2024-09-18 07:08:07,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=393340.0, ans=0.125 2024-09-18 07:08:15,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=393380.0, ans=0.125 2024-09-18 07:08:19,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=393380.0, ans=0.125 2024-09-18 07:08:44,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.87 vs. limit=22.5 2024-09-18 07:08:46,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=393460.0, ans=0.125 2024-09-18 07:08:46,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=393460.0, ans=0.125 2024-09-18 07:09:04,459 INFO [train.py:1198] (0/2) Epoch 22, batch 3350, loss[loss=0.2585, ctc_loss=0.1399, cr_loss=0.3875, attn_decoder_loss=0.2631, over 28779.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.133, cr_loss=0.3752, attn_decoder_loss=0.2482, over 5775421.47 frames. 
], batch size: 104, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:09:21,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=393540.0, ans=0.05 2024-09-18 07:09:30,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=393540.0, ans=0.0 2024-09-18 07:09:33,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=393580.0, ans=0.09899494936611666 2024-09-18 07:09:40,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=22.5 2024-09-18 07:09:41,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=393580.0, ans=0.125 2024-09-18 07:09:49,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=393620.0, ans=0.0 2024-09-18 07:09:53,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=393620.0, ans=0.0 2024-09-18 07:10:20,923 INFO [train.py:1198] (0/2) Epoch 22, batch 3400, loss[loss=0.2113, ctc_loss=0.1101, cr_loss=0.3183, attn_decoder_loss=0.2155, over 29342.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.133, cr_loss=0.3748, attn_decoder_loss=0.2481, over 5766593.11 frames. 
], batch size: 67, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:10:22,302 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.673e+01 9.256e+01 9.754e+01 2.312e+02, threshold=1.851e+02, percent-clipped=1.0 2024-09-18 07:10:30,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=393700.0, ans=0.125 2024-09-18 07:10:31,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=393700.0, ans=0.1 2024-09-18 07:11:07,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=393820.0, ans=0.035 2024-09-18 07:11:08,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=393820.0, ans=0.0 2024-09-18 07:11:14,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=393820.0, ans=0.2 2024-09-18 07:11:18,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.37 vs. limit=10.0 2024-09-18 07:11:37,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393900.0, ans=0.1 2024-09-18 07:11:38,638 INFO [train.py:1198] (0/2) Epoch 22, batch 3450, loss[loss=0.25, ctc_loss=0.1316, cr_loss=0.3732, attn_decoder_loss=0.2549, over 28321.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.133, cr_loss=0.3748, attn_decoder_loss=0.2484, over 5774615.23 frames. 
], batch size: 111, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:12:00,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=393940.0, ans=0.0 2024-09-18 07:12:05,638 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.52 vs. limit=22.5 2024-09-18 07:12:08,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.44 vs. limit=15.0 2024-09-18 07:12:17,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=393980.0, ans=0.125 2024-09-18 07:12:36,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=394020.0, ans=0.025 2024-09-18 07:12:56,515 INFO [train.py:1198] (0/2) Epoch 22, batch 3500, loss[loss=0.2222, ctc_loss=0.1188, cr_loss=0.3558, attn_decoder_loss=0.2258, over 29318.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1329, cr_loss=0.3749, attn_decoder_loss=0.2482, over 5777030.80 frames. 
], batch size: 71, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:12:58,047 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.509e+01 8.992e+01 9.710e+01 6.035e+02, threshold=1.798e+02, percent-clipped=1.0 2024-09-18 07:12:58,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=394100.0, ans=0.1 2024-09-18 07:13:04,582 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:13:07,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=394100.0, ans=0.125 2024-09-18 07:13:31,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=394180.0, ans=0.2 2024-09-18 07:13:53,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=394220.0, ans=0.0 2024-09-18 07:13:55,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=394260.0, ans=0.125 2024-09-18 07:14:11,193 INFO [train.py:1198] (0/2) Epoch 22, batch 3550, loss[loss=0.2501, ctc_loss=0.1277, cr_loss=0.3708, attn_decoder_loss=0.2555, over 29703.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1323, cr_loss=0.3743, attn_decoder_loss=0.2481, over 5782829.52 frames. 
], batch size: 89, lr: 5.03e-03, grad_scale: 8.0 2024-09-18 07:14:12,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=394300.0, ans=0.125 2024-09-18 07:14:18,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394300.0, ans=0.1 2024-09-18 07:14:39,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=394380.0, ans=0.125 2024-09-18 07:14:48,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.34 vs. limit=22.5 2024-09-18 07:14:59,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=394420.0, ans=0.0 2024-09-18 07:15:11,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=394460.0, ans=0.0 2024-09-18 07:15:24,735 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:15:25,910 INFO [train.py:1198] (0/2) Epoch 22, batch 3600, loss[loss=0.2359, ctc_loss=0.1305, cr_loss=0.3794, attn_decoder_loss=0.2392, over 29501.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1323, cr_loss=0.3747, attn_decoder_loss=0.2481, over 5791977.58 frames. 
], batch size: 77, lr: 5.02e-03, grad_scale: 16.0 2024-09-18 07:15:27,409 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.441e+01 8.945e+01 9.412e+01 1.487e+02, threshold=1.789e+02, percent-clipped=0.0 2024-09-18 07:16:08,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=394580.0, ans=0.0 2024-09-18 07:16:40,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=394700.0, ans=0.125 2024-09-18 07:16:42,262 INFO [train.py:1198] (0/2) Epoch 22, batch 3650, loss[loss=0.2608, ctc_loss=0.1449, cr_loss=0.4032, attn_decoder_loss=0.2647, over 29522.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1322, cr_loss=0.3745, attn_decoder_loss=0.2478, over 5793274.24 frames. ], batch size: 90, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:17:39,645 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=15.0 2024-09-18 07:17:40,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=394860.0, ans=0.125 2024-09-18 07:17:46,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=394860.0, ans=0.0 2024-09-18 07:17:56,477 INFO [train.py:1198] (0/2) Epoch 22, batch 3700, loss[loss=0.2552, ctc_loss=0.1397, cr_loss=0.3929, attn_decoder_loss=0.2593, over 29714.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1316, cr_loss=0.3742, attn_decoder_loss=0.2476, over 5803053.40 frames. 
], batch size: 84, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:17:59,516 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.466e+01 8.986e+01 9.824e+01 1.367e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-18 07:18:10,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=394940.0, ans=0.025 2024-09-18 07:18:11,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=394940.0, ans=0.125 2024-09-18 07:18:22,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=394940.0, ans=0.1 2024-09-18 07:18:22,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=394940.0, ans=0.2 2024-09-18 07:18:35,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=394980.0, ans=0.2 2024-09-18 07:19:12,625 INFO [train.py:1198] (0/2) Epoch 22, batch 3750, loss[loss=0.2206, ctc_loss=0.1252, cr_loss=0.3682, attn_decoder_loss=0.223, over 29339.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1322, cr_loss=0.3752, attn_decoder_loss=0.2478, over 5806760.96 frames. 
], batch size: 67, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:19:26,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=395140.0, ans=0.125 2024-09-18 07:19:54,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=395180.0, ans=0.125 2024-09-18 07:19:56,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=395220.0, ans=0.1 2024-09-18 07:20:05,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=395220.0, ans=0.2 2024-09-18 07:20:07,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=395220.0, ans=0.125 2024-09-18 07:20:08,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=395220.0, ans=0.025 2024-09-18 07:20:19,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=395260.0, ans=0.125 2024-09-18 07:20:20,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=395260.0, ans=0.05 2024-09-18 07:20:26,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=395300.0, ans=0.125 2024-09-18 07:20:27,749 INFO [train.py:1198] (0/2) Epoch 22, batch 3800, loss[loss=0.2571, ctc_loss=0.1359, cr_loss=0.3944, attn_decoder_loss=0.2618, over 29636.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1316, cr_loss=0.3739, attn_decoder_loss=0.2472, over 5797279.93 frames. 
], batch size: 86, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:20:30,688 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.441e+01 9.008e+01 9.541e+01 1.561e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-18 07:20:59,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=395380.0, ans=10.0 2024-09-18 07:20:59,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=395380.0, ans=0.0 2024-09-18 07:21:18,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=395420.0, ans=0.0 2024-09-18 07:21:25,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395460.0, ans=0.1 2024-09-18 07:21:41,882 INFO [train.py:1198] (0/2) Epoch 22, batch 3850, loss[loss=0.2642, ctc_loss=0.1515, cr_loss=0.4257, attn_decoder_loss=0.2673, over 29305.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1316, cr_loss=0.3741, attn_decoder_loss=0.2475, over 5812245.84 frames. ], batch size: 100, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:21:49,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=395500.0, ans=0.015 2024-09-18 07:21:55,781 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.57 vs. limit=15.0 2024-09-18 07:21:58,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=395540.0, ans=0.0 2024-09-18 07:22:27,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.24 vs. 
limit=22.5 2024-09-18 07:22:49,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=395660.0, ans=0.125 2024-09-18 07:22:57,687 INFO [train.py:1198] (0/2) Epoch 22, batch 3900, loss[loss=0.2478, ctc_loss=0.1298, cr_loss=0.3667, attn_decoder_loss=0.2527, over 29615.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1318, cr_loss=0.3743, attn_decoder_loss=0.2479, over 5816353.60 frames. ], batch size: 86, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:23:00,754 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.669e+01 9.089e+01 9.620e+01 1.531e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-18 07:23:02,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=395700.0, ans=0.025 2024-09-18 07:23:14,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=395740.0, ans=0.025 2024-09-18 07:23:39,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=395780.0, ans=0.07 2024-09-18 07:23:39,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=395780.0, ans=0.125 2024-09-18 07:23:57,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.42 vs. limit=15.0 2024-09-18 07:24:04,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=395860.0, ans=0.09899494936611666 2024-09-18 07:24:11,541 INFO [train.py:1198] (0/2) Epoch 22, batch 3950, loss[loss=0.2579, ctc_loss=0.1431, cr_loss=0.3862, attn_decoder_loss=0.262, over 29497.00 frames. 
], tot_loss[loss=0.2439, ctc_loss=0.1316, cr_loss=0.3748, attn_decoder_loss=0.248, over 5835824.65 frames. ], batch size: 97, lr: 5.02e-03, grad_scale: 8.0 2024-09-18 07:24:15,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=395900.0, ans=0.125 2024-09-18 07:25:01,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396020.0, ans=0.1 2024-09-18 07:25:02,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=396020.0, ans=0.125 2024-09-18 07:25:06,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=396020.0, ans=0.035 2024-09-18 07:25:15,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=396060.0, ans=0.125 2024-09-18 07:25:17,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=396060.0, ans=0.2 2024-09-18 07:25:27,277 INFO [train.py:1198] (0/2) Epoch 22, batch 4000, loss[loss=0.2289, ctc_loss=0.1226, cr_loss=0.3475, attn_decoder_loss=0.233, over 29553.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1324, cr_loss=0.3757, attn_decoder_loss=0.2483, over 5813026.98 frames. 
], batch size: 74, lr: 5.01e-03, grad_scale: 16.0 2024-09-18 07:25:30,130 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.472e+01 8.530e+01 8.952e+01 9.583e+01 2.635e+02, threshold=1.790e+02, percent-clipped=1.0 2024-09-18 07:25:46,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=396140.0, ans=0.05 2024-09-18 07:25:58,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=396180.0, ans=0.1 2024-09-18 07:26:06,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=396180.0, ans=0.0 2024-09-18 07:26:10,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=396220.0, ans=0.125 2024-09-18 07:26:20,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=396220.0, ans=0.125 2024-09-18 07:26:23,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.15 vs. limit=15.0 2024-09-18 07:26:29,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.77 vs. limit=15.0 2024-09-18 07:26:32,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=396260.0, ans=0.2 2024-09-18 07:26:34,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=396260.0, ans=0.125 2024-09-18 07:26:41,541 INFO [train.py:1198] (0/2) Epoch 22, batch 4050, loss[loss=0.2746, ctc_loss=0.1727, cr_loss=0.4072, attn_decoder_loss=0.2769, over 19797.00 frames. 
], tot_loss[loss=0.244, ctc_loss=0.1321, cr_loss=0.3747, attn_decoder_loss=0.2481, over 5796783.07 frames. ], batch size: 209, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:26:51,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=396300.0, ans=0.125 2024-09-18 07:26:52,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=396300.0, ans=0.125 2024-09-18 07:27:10,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=396380.0, ans=0.125 2024-09-18 07:27:35,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=396420.0, ans=0.0 2024-09-18 07:27:46,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396460.0, ans=0.1 2024-09-18 07:27:56,314 INFO [train.py:1198] (0/2) Epoch 22, batch 4100, loss[loss=0.2613, ctc_loss=0.1447, cr_loss=0.3919, attn_decoder_loss=0.2656, over 29469.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1326, cr_loss=0.3756, attn_decoder_loss=0.2484, over 5792163.96 frames. 
], batch size: 90, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:28:00,764 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.697e+01 9.214e+01 1.008e+02 3.653e+02, threshold=1.843e+02, percent-clipped=2.0 2024-09-18 07:28:03,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=396500.0, ans=0.0 2024-09-18 07:28:04,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=396500.0, ans=0.0 2024-09-18 07:28:34,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=396580.0, ans=0.09899494936611666 2024-09-18 07:28:39,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=396620.0, ans=0.125 2024-09-18 07:28:42,908 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.28 vs. limit=22.5 2024-09-18 07:28:45,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.86 vs. limit=15.0 2024-09-18 07:28:53,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2024-09-18 07:29:11,101 INFO [train.py:1198] (0/2) Epoch 22, batch 4150, loss[loss=0.2399, ctc_loss=0.1331, cr_loss=0.3785, attn_decoder_loss=0.2434, over 29501.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1323, cr_loss=0.3754, attn_decoder_loss=0.2479, over 5797913.39 frames. 
], batch size: 77, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:29:11,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=396700.0, ans=0.1 2024-09-18 07:29:15,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=396700.0, ans=0.5 2024-09-18 07:29:32,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=396740.0, ans=15.0 2024-09-18 07:29:42,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396780.0, ans=0.1 2024-09-18 07:29:52,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2024-09-18 07:30:25,223 INFO [train.py:1198] (0/2) Epoch 22, batch 4200, loss[loss=0.2596, ctc_loss=0.1416, cr_loss=0.3861, attn_decoder_loss=0.2642, over 29523.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1326, cr_loss=0.376, attn_decoder_loss=0.2484, over 5799342.44 frames. ], batch size: 90, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:30:29,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 8.447e+01 9.085e+01 9.593e+01 1.747e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-18 07:30:31,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=396900.0, ans=0.125 2024-09-18 07:30:43,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=396940.0, ans=0.5 2024-09-18 07:31:07,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.01 vs. 
limit=15.0 2024-09-18 07:31:32,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=397060.0, ans=0.0 2024-09-18 07:31:39,886 INFO [train.py:1198] (0/2) Epoch 22, batch 4250, loss[loss=0.2233, ctc_loss=0.1108, cr_loss=0.3368, attn_decoder_loss=0.2283, over 29485.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1322, cr_loss=0.3754, attn_decoder_loss=0.2485, over 5805171.56 frames. ], batch size: 74, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:31:48,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=397100.0, ans=0.125 2024-09-18 07:31:57,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=397140.0, ans=0.0 2024-09-18 07:32:21,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=397180.0, ans=0.0 2024-09-18 07:32:22,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=397220.0, ans=0.2 2024-09-18 07:32:53,988 INFO [train.py:1198] (0/2) Epoch 22, batch 4300, loss[loss=0.2593, ctc_loss=0.1448, cr_loss=0.3904, attn_decoder_loss=0.2634, over 29560.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1318, cr_loss=0.3742, attn_decoder_loss=0.2485, over 5794848.07 frames. ], batch size: 87, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:32:58,452 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 8.737e+01 9.479e+01 1.036e+02 1.602e+02, threshold=1.896e+02, percent-clipped=0.0 2024-09-18 07:32:59,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.14 vs. 
limit=22.5 2024-09-18 07:33:03,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=397300.0, ans=0.025 2024-09-18 07:33:09,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=397340.0, ans=0.0 2024-09-18 07:33:17,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=397340.0, ans=0.0 2024-09-18 07:33:45,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=397420.0, ans=0.0 2024-09-18 07:33:47,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=397420.0, ans=0.0 2024-09-18 07:33:57,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.92 vs. limit=22.5 2024-09-18 07:33:58,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=397460.0, ans=0.0 2024-09-18 07:34:03,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=397460.0, ans=0.0 2024-09-18 07:34:07,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=397500.0, ans=0.125 2024-09-18 07:34:08,149 INFO [train.py:1198] (0/2) Epoch 22, batch 4350, loss[loss=0.2643, ctc_loss=0.147, cr_loss=0.4032, attn_decoder_loss=0.2684, over 29424.00 frames. ], tot_loss[loss=0.2478, ctc_loss=0.1346, cr_loss=0.3798, attn_decoder_loss=0.2519, over 5797519.36 frames. 
], batch size: 97, lr: 5.01e-03, grad_scale: 8.0 2024-09-18 07:34:11,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=397500.0, ans=0.2 2024-09-18 07:34:17,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=397500.0, ans=0.0 2024-09-18 07:34:26,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.51 vs. limit=15.0 2024-09-18 07:34:33,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=397540.0, ans=0.125 2024-09-18 07:34:50,542 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=22.5 2024-09-18 07:35:03,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2024-09-18 07:35:21,398 INFO [train.py:1198] (0/2) Epoch 22, batch 4400, loss[loss=0.2665, ctc_loss=0.1553, cr_loss=0.4323, attn_decoder_loss=0.2693, over 27482.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1367, cr_loss=0.384, attn_decoder_loss=0.2544, over 5768049.45 frames. ], batch size: 124, lr: 5.00e-03, grad_scale: 16.0 2024-09-18 07:35:25,704 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.295e+01 9.019e+01 9.432e+01 1.021e+02 4.096e+02, threshold=1.886e+02, percent-clipped=2.0 2024-09-18 07:35:35,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.14 vs. 
limit=6.0 2024-09-18 07:35:48,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=397740.0, ans=0.015 2024-09-18 07:36:04,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=397820.0, ans=0.0 2024-09-18 07:36:07,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=397820.0, ans=0.0 2024-09-18 07:36:18,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=397820.0, ans=0.125 2024-09-18 07:36:36,342 INFO [train.py:1198] (0/2) Epoch 22, batch 4450, loss[loss=0.2671, ctc_loss=0.1668, cr_loss=0.3972, attn_decoder_loss=0.2695, over 20594.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.1409, cr_loss=0.389, attn_decoder_loss=0.2567, over 5572868.78 frames. ], batch size: 209, lr: 5.00e-03, grad_scale: 8.0 2024-09-18 07:36:36,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=397900.0, ans=0.125 2024-09-18 07:36:41,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=397900.0, ans=0.0 2024-09-18 07:36:45,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=397900.0, ans=0.025 2024-09-18 07:36:51,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=397940.0, ans=0.025 2024-09-18 07:37:19,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=397980.0, ans=0.125 2024-09-18 07:37:20,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=398020.0, ans=0.2 2024-09-18 07:37:44,647 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=398060.0, ans=0.0 2024-09-18 07:37:51,783 INFO [train.py:1198] (0/2) Epoch 22, batch 4500, loss[loss=0.2693, ctc_loss=0.1663, cr_loss=0.4054, attn_decoder_loss=0.2717, over 19576.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1453, cr_loss=0.3912, attn_decoder_loss=0.259, over 5233750.18 frames. ], batch size: 210, lr: 5.00e-03, grad_scale: 8.0 2024-09-18 07:37:53,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=398100.0, ans=0.125 2024-09-18 07:37:57,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.580e+01 1.014e+02 1.103e+02 1.223e+02 2.065e+02, threshold=2.205e+02, percent-clipped=1.0 2024-09-18 07:37:59,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=398100.0, ans=0.02 2024-09-18 07:38:01,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.69 vs. limit=22.5 2024-09-18 07:38:08,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2024-09-18 07:38:10,881 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.81 vs. 
limit=8.0 2024-09-18 07:38:20,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=398180.0, ans=0.125 2024-09-18 07:38:29,076 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-22.pt 2024-09-18 07:39:14,512 INFO [train.py:1198] (0/2) Epoch 23, batch 0, loss[loss=0.225, ctc_loss=0.1159, cr_loss=0.3369, attn_decoder_loss=0.2296, over 29596.00 frames. ], tot_loss[loss=0.225, ctc_loss=0.1159, cr_loss=0.3369, attn_decoder_loss=0.2296, over 29596.00 frames. ], batch size: 73, lr: 4.89e-03, grad_scale: 16.0 2024-09-18 07:39:14,513 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 07:39:33,043 INFO [train.py:1230] (0/2) Epoch 23, validation: loss=0.212, ctc_loss=0.03823, cr_loss=5.578e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-18 07:39:33,044 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 07:39:39,675 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.46 vs. limit=15.0 2024-09-18 07:39:58,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398240.0, ans=0.1 2024-09-18 07:40:27,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=398320.0, ans=0.0 2024-09-18 07:40:32,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=398360.0, ans=0.125 2024-09-18 07:40:49,048 INFO [train.py:1198] (0/2) Epoch 23, batch 50, loss[loss=0.2233, ctc_loss=0.1242, cr_loss=0.3615, attn_decoder_loss=0.2262, over 29432.00 frames. 
], tot_loss[loss=0.2461, ctc_loss=0.1355, cr_loss=0.3819, attn_decoder_loss=0.25, over 1266805.41 frames. ], batch size: 70, lr: 4.89e-03, grad_scale: 8.0 2024-09-18 07:41:05,119 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:41:38,768 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 8.809e+01 9.782e+01 1.101e+02 2.337e+02, threshold=1.956e+02, percent-clipped=1.0 2024-09-18 07:41:52,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2024-09-18 07:41:54,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=398560.0, ans=0.2 2024-09-18 07:41:57,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=398560.0, ans=0.125 2024-09-18 07:42:06,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=398560.0, ans=0.0 2024-09-18 07:42:08,982 INFO [train.py:1198] (0/2) Epoch 23, batch 100, loss[loss=0.2427, ctc_loss=0.1377, cr_loss=0.3851, attn_decoder_loss=0.2458, over 29526.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1366, cr_loss=0.3824, attn_decoder_loss=0.2515, over 2252552.34 frames. ], batch size: 76, lr: 4.89e-03, grad_scale: 8.0 2024-09-18 07:42:36,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=398640.0, ans=0.125 2024-09-18 07:43:08,046 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. 
limit=15.0 2024-09-18 07:43:19,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=398760.0, ans=0.2 2024-09-18 07:43:22,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=398800.0, ans=0.0 2024-09-18 07:43:23,648 INFO [train.py:1198] (0/2) Epoch 23, batch 150, loss[loss=0.2189, ctc_loss=0.1141, cr_loss=0.3509, attn_decoder_loss=0.2228, over 29417.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1336, cr_loss=0.3773, attn_decoder_loss=0.2492, over 3047785.53 frames. ], batch size: 70, lr: 4.89e-03, grad_scale: 8.0 2024-09-18 07:43:23,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=398800.0, ans=0.0 2024-09-18 07:43:34,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=398800.0, ans=0.0 2024-09-18 07:43:46,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=398840.0, ans=0.125 2024-09-18 07:43:56,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.88 vs. limit=22.5 2024-09-18 07:44:08,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. 
limit=6.0 2024-09-18 07:44:08,890 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.443e+01 9.031e+01 9.523e+01 1.308e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-18 07:44:12,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=398920.0, ans=0.1 2024-09-18 07:44:21,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=398920.0, ans=0.0 2024-09-18 07:44:31,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=12.0 2024-09-18 07:44:38,861 INFO [train.py:1198] (0/2) Epoch 23, batch 200, loss[loss=0.2604, ctc_loss=0.1421, cr_loss=0.3946, attn_decoder_loss=0.2648, over 27267.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1323, cr_loss=0.3762, attn_decoder_loss=0.2482, over 3658162.74 frames. ], batch size: 125, lr: 4.88e-03, grad_scale: 8.0 2024-09-18 07:45:05,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=399040.0, ans=10.0 2024-09-18 07:45:07,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=399040.0, ans=0.125 2024-09-18 07:45:18,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=399080.0, ans=0.125 2024-09-18 07:45:27,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399120.0, ans=0.1 2024-09-18 07:45:43,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.92 vs. 
limit=15.0 2024-09-18 07:45:45,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=399160.0, ans=0.2 2024-09-18 07:45:59,853 INFO [train.py:1198] (0/2) Epoch 23, batch 250, loss[loss=0.2441, ctc_loss=0.1306, cr_loss=0.3807, attn_decoder_loss=0.2483, over 29216.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.132, cr_loss=0.3759, attn_decoder_loss=0.2478, over 4140112.02 frames. ], batch size: 100, lr: 4.88e-03, grad_scale: 8.0 2024-09-18 07:46:01,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=399200.0, ans=0.5 2024-09-18 07:46:03,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=399200.0, ans=0.125 2024-09-18 07:46:05,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=8.0 2024-09-18 07:46:18,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=399240.0, ans=0.125 2024-09-18 07:46:21,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=399240.0, ans=0.0 2024-09-18 07:46:22,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399240.0, ans=0.1 2024-09-18 07:46:32,521 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.28 vs. 
limit=15.0 2024-09-18 07:46:45,299 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.537e+01 9.009e+01 9.547e+01 2.225e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-18 07:46:56,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399320.0, ans=0.1 2024-09-18 07:47:03,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=399360.0, ans=0.0 2024-09-18 07:47:15,459 INFO [train.py:1198] (0/2) Epoch 23, batch 300, loss[loss=0.2628, ctc_loss=0.1405, cr_loss=0.4001, attn_decoder_loss=0.2674, over 29520.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1313, cr_loss=0.3743, attn_decoder_loss=0.2475, over 4509692.85 frames. ], batch size: 92, lr: 4.88e-03, grad_scale: 8.0 2024-09-18 07:47:23,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=399400.0, ans=0.125 2024-09-18 07:47:25,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=15.0 2024-09-18 07:47:46,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=399480.0, ans=0.2 2024-09-18 07:47:46,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=399480.0, ans=0.2 2024-09-18 07:48:29,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=399600.0, ans=0.025 2024-09-18 07:48:31,060 INFO [train.py:1198] (0/2) Epoch 23, batch 350, loss[loss=0.2212, ctc_loss=0.1132, cr_loss=0.3324, attn_decoder_loss=0.2258, over 29328.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.132, cr_loss=0.375, attn_decoder_loss=0.2482, over 4794541.37 frames. 
], batch size: 71, lr: 4.88e-03, grad_scale: 8.0 2024-09-18 07:48:37,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=399600.0, ans=0.025 2024-09-18 07:49:01,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-09-18 07:49:19,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=399720.0, ans=0.1 2024-09-18 07:49:20,802 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.416e+01 8.727e+01 9.232e+01 2.116e+02, threshold=1.745e+02, percent-clipped=2.0 2024-09-18 07:49:22,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=399720.0, ans=0.125 2024-09-18 07:49:27,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=399720.0, ans=0.125 2024-09-18 07:49:28,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=399720.0, ans=0.09899494936611666 2024-09-18 07:49:43,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=399760.0, ans=0.125 2024-09-18 07:49:48,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=399760.0, ans=0.0 2024-09-18 07:49:50,841 INFO [train.py:1198] (0/2) Epoch 23, batch 400, loss[loss=0.2498, ctc_loss=0.1312, cr_loss=0.3826, attn_decoder_loss=0.2544, over 29696.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.131, cr_loss=0.3733, attn_decoder_loss=0.2475, over 5024319.81 frames. 
], batch size: 82, lr: 4.88e-03, grad_scale: 16.0
2024-09-18 07:49:52,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399800.0, ans=0.1
2024-09-18 07:50:57,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0
2024-09-18 07:51:05,994 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-100000.pt
2024-09-18 07:51:14,608 INFO [train.py:1198] (0/2) Epoch 23, batch 450, loss[loss=0.2568, ctc_loss=0.1334, cr_loss=0.3993, attn_decoder_loss=0.2616, over 29690.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1314, cr_loss=0.3741, attn_decoder_loss=0.2481, over 5187304.74 frames. ], batch size: 83, lr: 4.88e-03, grad_scale: 8.0
2024-09-18 07:51:17,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=400000.0, ans=0.125
2024-09-18 07:51:17,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=400000.0, ans=0.125
2024-09-18 07:51:34,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=400040.0, ans=0.0
2024-09-18 07:51:58,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=400120.0, ans=0.125
2024-09-18 07:52:01,483 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.450e+01 8.997e+01 9.501e+01 2.678e+02, threshold=1.799e+02, percent-clipped=1.0
2024-09-18 07:52:21,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=400160.0, ans=0.1
2024-09-18 07:52:30,172 INFO [train.py:1198] (0/2) Epoch 23, batch 500, loss[loss=0.2565, ctc_loss=0.1384, cr_loss=0.3835, attn_decoder_loss=0.2611, over 29461.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.131, cr_loss=0.3736, attn_decoder_loss=0.2473, over 5330331.38 frames. ], batch size: 94, lr: 4.88e-03, grad_scale: 8.0
2024-09-18 07:52:45,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=400240.0, ans=0.125
2024-09-18 07:52:55,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=400240.0, ans=0.125
2024-09-18 07:52:57,693 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=15.0
2024-09-18 07:53:19,845 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.38 vs. limit=15.0
2024-09-18 07:53:25,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=400320.0, ans=0.125
2024-09-18 07:53:44,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=400360.0, ans=0.2
2024-09-18 07:53:50,362 INFO [train.py:1198] (0/2) Epoch 23, batch 550, loss[loss=0.257, ctc_loss=0.1387, cr_loss=0.4059, attn_decoder_loss=0.2612, over 28763.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.131, cr_loss=0.3735, attn_decoder_loss=0.2473, over 5422777.38 frames. ], batch size: 104, lr: 4.88e-03, grad_scale: 8.0
2024-09-18 07:53:58,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=400400.0, ans=0.0
2024-09-18 07:54:05,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=400440.0, ans=0.125
2024-09-18 07:54:10,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=400440.0, ans=0.0
2024-09-18 07:54:37,034 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.525e+01 9.043e+01 9.907e+01 2.945e+02, threshold=1.809e+02, percent-clipped=3.0
2024-09-18 07:55:05,779 INFO [train.py:1198] (0/2) Epoch 23, batch 600, loss[loss=0.2589, ctc_loss=0.141, cr_loss=0.3726, attn_decoder_loss=0.2637, over 29269.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1311, cr_loss=0.3742, attn_decoder_loss=0.2476, over 5510368.62 frames. ], batch size: 100, lr: 4.87e-03, grad_scale: 8.0
2024-09-18 07:55:31,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=400640.0, ans=0.125
2024-09-18 07:55:34,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=400680.0, ans=0.1
2024-09-18 07:55:36,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=400680.0, ans=0.0
2024-09-18 07:55:36,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=400680.0, ans=0.125
2024-09-18 07:55:39,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=400680.0, ans=0.5
2024-09-18 07:56:21,501 INFO [train.py:1198] (0/2) Epoch 23, batch 650, loss[loss=0.2479, ctc_loss=0.1275, cr_loss=0.3694, attn_decoder_loss=0.253, over 29778.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1304, cr_loss=0.373, attn_decoder_loss=0.247, over 5587446.78 frames. ], batch size: 81, lr: 4.87e-03, grad_scale: 8.0
2024-09-18 07:56:32,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=400800.0, ans=0.125
2024-09-18 07:56:37,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0
2024-09-18 07:56:47,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=400840.0, ans=0.0
2024-09-18 07:56:48,084 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.48 vs. limit=22.5
2024-09-18 07:57:07,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400880.0, ans=0.1
2024-09-18 07:57:10,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0
2024-09-18 07:57:12,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0
2024-09-18 07:57:12,930 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 8.571e+01 9.065e+01 9.710e+01 2.691e+02, threshold=1.813e+02, percent-clipped=1.0
2024-09-18 07:57:30,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.39 vs. limit=15.0
2024-09-18 07:57:41,626 INFO [train.py:1198] (0/2) Epoch 23, batch 700, loss[loss=0.2354, ctc_loss=0.1244, cr_loss=0.3698, attn_decoder_loss=0.2395, over 29518.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1307, cr_loss=0.3734, attn_decoder_loss=0.2475, over 5637807.99 frames. ], batch size: 76, lr: 4.87e-03, grad_scale: 8.0
2024-09-18 07:57:55,454 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 07:57:58,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=401040.0, ans=0.125
2024-09-18 07:58:27,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=401120.0, ans=0.07
2024-09-18 07:58:33,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=401120.0, ans=0.0
2024-09-18 07:58:35,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=401120.0, ans=0.125
2024-09-18 07:58:36,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=401120.0, ans=0.125
2024-09-18 07:58:47,806 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.99 vs. limit=15.0
2024-09-18 07:58:52,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=401160.0, ans=10.0
2024-09-18 07:58:54,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=401160.0, ans=0.125
2024-09-18 07:58:57,478 INFO [train.py:1198] (0/2) Epoch 23, batch 750, loss[loss=0.2552, ctc_loss=0.1385, cr_loss=0.3796, attn_decoder_loss=0.2598, over 29723.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1307, cr_loss=0.3729, attn_decoder_loss=0.2472, over 5676771.75 frames. ], batch size: 82, lr: 4.87e-03, grad_scale: 8.0
2024-09-18 07:59:05,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=401200.0, ans=0.125
2024-09-18 07:59:05,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.52 vs. limit=22.5
2024-09-18 07:59:06,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=401200.0, ans=0.0
2024-09-18 07:59:35,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=401280.0, ans=0.0
2024-09-18 07:59:43,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=401320.0, ans=0.125
2024-09-18 07:59:44,084 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.454e+01 8.911e+01 9.640e+01 3.418e+02, threshold=1.782e+02, percent-clipped=1.0
2024-09-18 08:00:04,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=401360.0, ans=0.125
2024-09-18 08:00:12,898 INFO [train.py:1198] (0/2) Epoch 23, batch 800, loss[loss=0.2255, ctc_loss=0.1183, cr_loss=0.3594, attn_decoder_loss=0.2295, over 29639.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1308, cr_loss=0.3734, attn_decoder_loss=0.2472, over 5705433.12 frames. ], batch size: 73, lr: 4.87e-03, grad_scale: 16.0
2024-09-18 08:00:17,675 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 08:00:21,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.45 vs. limit=10.0
2024-09-18 08:01:09,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=401520.0, ans=0.07
2024-09-18 08:01:15,198 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0
2024-09-18 08:01:18,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=401560.0, ans=0.0
2024-09-18 08:01:30,985 INFO [train.py:1198] (0/2) Epoch 23, batch 850, loss[loss=0.2436, ctc_loss=0.1227, cr_loss=0.3555, attn_decoder_loss=0.2492, over 29720.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1303, cr_loss=0.3725, attn_decoder_loss=0.2469, over 5734237.32 frames. ], batch size: 89, lr: 4.87e-03, grad_scale: 8.0
2024-09-18 08:01:32,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=401600.0, ans=0.125
2024-09-18 08:01:34,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=401600.0, ans=0.125
2024-09-18 08:01:41,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=401600.0, ans=0.1
2024-09-18 08:01:56,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=401640.0, ans=0.125
2024-09-18 08:02:18,871 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.350e+01 8.947e+01 9.398e+01 1.136e+02, threshold=1.789e+02, percent-clipped=0.0
2024-09-18 08:02:31,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=401760.0, ans=0.0
2024-09-18 08:02:40,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=401760.0, ans=0.0
2024-09-18 08:02:46,367 INFO [train.py:1198] (0/2) Epoch 23, batch 900, loss[loss=0.2228, ctc_loss=0.1146, cr_loss=0.3316, attn_decoder_loss=0.2274, over 29598.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1312, cr_loss=0.3742, attn_decoder_loss=0.2474, over 5739863.91 frames. ], batch size: 73, lr: 4.87e-03, grad_scale: 8.0
2024-09-18 08:02:48,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=401800.0, ans=0.5
2024-09-18 08:03:31,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=401920.0, ans=0.125
2024-09-18 08:03:37,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=401920.0, ans=0.2
2024-09-18 08:03:39,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=401920.0, ans=0.125
2024-09-18 08:03:46,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=401960.0, ans=0.125
2024-09-18 08:04:01,327 INFO [train.py:1198] (0/2) Epoch 23, batch 950, loss[loss=0.2313, ctc_loss=0.1181, cr_loss=0.334, attn_decoder_loss=0.2365, over 29513.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1314, cr_loss=0.3737, attn_decoder_loss=0.2476, over 5741611.14 frames. ], batch size: 74, lr: 4.87e-03, grad_scale: 8.0
2024-09-18 08:04:19,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=402040.0, ans=0.2
2024-09-18 08:04:30,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=402080.0, ans=0.1
2024-09-18 08:04:33,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.25 vs. limit=15.0
2024-09-18 08:04:48,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff3.min_abs, batch_count=402080.0, ans=0.2
2024-09-18 08:04:54,188 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.814e+01 9.447e+01 1.062e+02 2.466e+02, threshold=1.889e+02, percent-clipped=1.0
2024-09-18 08:05:00,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=402120.0, ans=0.2
2024-09-18 08:05:09,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=402160.0, ans=0.125
2024-09-18 08:05:21,287 INFO [train.py:1198] (0/2) Epoch 23, batch 1000, loss[loss=0.235, ctc_loss=0.1299, cr_loss=0.3715, attn_decoder_loss=0.2385, over 29490.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1323, cr_loss=0.3755, attn_decoder_loss=0.2485, over 5735108.81 frames. ], batch size: 77, lr: 4.86e-03, grad_scale: 8.0
2024-09-18 08:05:23,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=402200.0, ans=0.125
2024-09-18 08:05:43,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=402240.0, ans=0.0
2024-09-18 08:05:50,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=402280.0, ans=0.125
2024-09-18 08:06:06,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=402320.0, ans=0.125
2024-09-18 08:06:06,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=15.0
2024-09-18 08:06:31,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=402360.0, ans=0.025
2024-09-18 08:06:31,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0
2024-09-18 08:06:33,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=402360.0, ans=0.125
2024-09-18 08:06:37,691 INFO [train.py:1198] (0/2) Epoch 23, batch 1050, loss[loss=0.2476, ctc_loss=0.1306, cr_loss=0.3837, attn_decoder_loss=0.2521, over 29663.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1317, cr_loss=0.3747, attn_decoder_loss=0.2478, over 5743650.84 frames. ], batch size: 85, lr: 4.86e-03, grad_scale: 8.0
2024-09-18 08:06:51,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=402440.0, ans=0.2
2024-09-18 08:07:19,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402480.0, ans=0.1
2024-09-18 08:07:19,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=402480.0, ans=0.025
2024-09-18 08:07:26,535 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.610e+01 8.303e+01 8.731e+01 9.470e+01 1.420e+02, threshold=1.746e+02, percent-clipped=0.0
2024-09-18 08:07:49,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. limit=6.0
2024-09-18 08:07:54,156 INFO [train.py:1198] (0/2) Epoch 23, batch 1100, loss[loss=0.2407, ctc_loss=0.1329, cr_loss=0.3728, attn_decoder_loss=0.2443, over 29462.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1312, cr_loss=0.3737, attn_decoder_loss=0.2474, over 5756094.58 frames. ], batch size: 78, lr: 4.86e-03, grad_scale: 8.0
2024-09-18 08:08:01,854 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 08:08:13,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.51 vs. limit=22.5
2024-09-18 08:08:27,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0
2024-09-18 08:08:32,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=402680.0, ans=0.2
2024-09-18 08:08:56,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=402720.0, ans=0.0
2024-09-18 08:09:11,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=402760.0, ans=0.125
2024-09-18 08:09:14,480 INFO [train.py:1198] (0/2) Epoch 23, batch 1150, loss[loss=0.2311, ctc_loss=0.1164, cr_loss=0.3555, attn_decoder_loss=0.236, over 29471.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1313, cr_loss=0.3737, attn_decoder_loss=0.2474, over 5755449.67 frames. ], batch size: 78, lr: 4.86e-03, grad_scale: 8.0
2024-09-18 08:09:21,398 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0
2024-09-18 08:09:40,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=402840.0, ans=0.125
2024-09-18 08:10:03,216 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.173e+01 8.564e+01 9.109e+01 9.682e+01 1.953e+02, threshold=1.822e+02, percent-clipped=1.0
2024-09-18 08:10:03,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=402920.0, ans=0.0
2024-09-18 08:10:17,202 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 08:10:30,401 INFO [train.py:1198] (0/2) Epoch 23, batch 1200, loss[loss=0.2643, ctc_loss=0.145, cr_loss=0.4138, attn_decoder_loss=0.2683, over 29683.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1318, cr_loss=0.3749, attn_decoder_loss=0.2483, over 5747716.77 frames. ], batch size: 85, lr: 4.86e-03, grad_scale: 16.0
2024-09-18 08:10:44,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=403040.0, ans=0.0
2024-09-18 08:10:51,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=403040.0, ans=0.125
2024-09-18 08:11:02,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=403080.0, ans=0.125
2024-09-18 08:11:16,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=403120.0, ans=0.125
2024-09-18 08:11:20,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=403120.0, ans=0.025
2024-09-18 08:11:23,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0
2024-09-18 08:11:33,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.47 vs. limit=10.0
2024-09-18 08:11:43,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=403160.0, ans=0.1
2024-09-18 08:11:46,535 INFO [train.py:1198] (0/2) Epoch 23, batch 1250, loss[loss=0.2563, ctc_loss=0.1429, cr_loss=0.3926, attn_decoder_loss=0.2601, over 29523.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1325, cr_loss=0.3766, attn_decoder_loss=0.2489, over 5775177.51 frames. ], batch size: 92, lr: 4.86e-03, grad_scale: 8.0
2024-09-18 08:12:11,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=403240.0, ans=0.125
2024-09-18 08:12:24,781 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.96 vs. limit=15.0
2024-09-18 08:12:35,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0
2024-09-18 08:12:39,079 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.243e+01 8.772e+01 9.696e+01 1.858e+02, threshold=1.754e+02, percent-clipped=1.0
2024-09-18 08:12:53,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=403360.0, ans=0.09899494936611666
2024-09-18 08:13:06,960 INFO [train.py:1198] (0/2) Epoch 23, batch 1300, loss[loss=0.2516, ctc_loss=0.1297, cr_loss=0.3569, attn_decoder_loss=0.2572, over 28160.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.132, cr_loss=0.3757, attn_decoder_loss=0.2482, over 5778215.38 frames. ], batch size: 111, lr: 4.86e-03, grad_scale: 8.0
2024-09-18 08:13:14,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=403400.0, ans=0.0
2024-09-18 08:13:16,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=403400.0, ans=0.125
2024-09-18 08:13:32,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.74 vs. limit=22.5
2024-09-18 08:13:36,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=403480.0, ans=0.125
2024-09-18 08:13:43,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403480.0, ans=0.1
2024-09-18 08:13:48,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=403480.0, ans=0.125
2024-09-18 08:13:51,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff2.min_abs, batch_count=403520.0, ans=0.1
2024-09-18 08:13:56,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0
2024-09-18 08:13:59,929 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.08 vs. limit=12.0
2024-09-18 08:14:22,782 INFO [train.py:1198] (0/2) Epoch 23, batch 1350, loss[loss=0.2387, ctc_loss=0.1159, cr_loss=0.3508, attn_decoder_loss=0.2446, over 29736.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1316, cr_loss=0.3753, attn_decoder_loss=0.2481, over 5796398.52 frames. ], batch size: 81, lr: 4.86e-03, grad_scale: 8.0
2024-09-18 08:14:26,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.75 vs. limit=15.0
2024-09-18 08:14:55,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=403680.0, ans=0.2
2024-09-18 08:15:04,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=403680.0, ans=0.1
2024-09-18 08:15:12,105 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.370e+01 8.788e+01 9.254e+01 1.206e+02, threshold=1.758e+02, percent-clipped=0.0
2024-09-18 08:15:13,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=403720.0, ans=0.125
2024-09-18 08:15:14,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=403720.0, ans=0.1
2024-09-18 08:15:15,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=403720.0, ans=0.2
2024-09-18 08:15:37,899 INFO [train.py:1198] (0/2) Epoch 23, batch 1400, loss[loss=0.2094, ctc_loss=0.1077, cr_loss=0.3285, attn_decoder_loss=0.2134, over 29575.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1314, cr_loss=0.3748, attn_decoder_loss=0.2481, over 5807181.51 frames. ], batch size: 69, lr: 4.86e-03, grad_scale: 8.0
2024-09-18 08:15:38,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=403800.0, ans=0.0
2024-09-18 08:15:45,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=403800.0, ans=0.125
2024-09-18 08:15:51,777 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 08:16:52,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.38 vs. limit=22.5
2024-09-18 08:16:58,245 INFO [train.py:1198] (0/2) Epoch 23, batch 1450, loss[loss=0.2496, ctc_loss=0.1291, cr_loss=0.3717, attn_decoder_loss=0.2547, over 29417.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1315, cr_loss=0.3742, attn_decoder_loss=0.2483, over 5802430.50 frames. ], batch size: 94, lr: 4.85e-03, grad_scale: 8.0
2024-09-18 08:17:25,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=404040.0, ans=0.125
2024-09-18 08:17:47,799 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.721e+01 9.213e+01 9.736e+01 2.438e+02, threshold=1.843e+02, percent-clipped=1.0
2024-09-18 08:18:13,670 INFO [train.py:1198] (0/2) Epoch 23, batch 1500, loss[loss=0.2465, ctc_loss=0.133, cr_loss=0.3858, attn_decoder_loss=0.2505, over 29626.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1318, cr_loss=0.3749, attn_decoder_loss=0.2486, over 5802638.98 frames. ], batch size: 86, lr: 4.85e-03, grad_scale: 8.0
2024-09-18 08:18:17,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=404200.0, ans=0.0
2024-09-18 08:18:52,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=404280.0, ans=0.125
2024-09-18 08:19:29,876 INFO [train.py:1198] (0/2) Epoch 23, batch 1550, loss[loss=0.2626, ctc_loss=0.1461, cr_loss=0.417, attn_decoder_loss=0.2663, over 29491.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1325, cr_loss=0.376, attn_decoder_loss=0.2487, over 5778428.34 frames. ], batch size: 90, lr: 4.85e-03, grad_scale: 8.0
2024-09-18 08:19:33,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404400.0, ans=0.1
2024-09-18 08:19:54,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404440.0, ans=0.1
2024-09-18 08:20:22,179 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.646e+01 9.169e+01 9.938e+01 3.341e+02, threshold=1.834e+02, percent-clipped=2.0
2024-09-18 08:20:41,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=404560.0, ans=0.125
2024-09-18 08:20:50,182 INFO [train.py:1198] (0/2) Epoch 23, batch 1600, loss[loss=0.2484, ctc_loss=0.1263, cr_loss=0.3638, attn_decoder_loss=0.2538, over 29677.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1322, cr_loss=0.3754, attn_decoder_loss=0.2484, over 5761868.24 frames. ], batch size: 85, lr: 4.85e-03, grad_scale: 16.0
2024-09-18 08:21:04,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=404640.0, ans=0.0
2024-09-18 08:21:28,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=404680.0, ans=0.0
2024-09-18 08:21:34,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=404720.0, ans=0.0
2024-09-18 08:21:57,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=404760.0, ans=0.2
2024-09-18 08:22:02,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=22.5
2024-09-18 08:22:06,223 INFO [train.py:1198] (0/2) Epoch 23, batch 1650, loss[loss=0.2581, ctc_loss=0.1343, cr_loss=0.3922, attn_decoder_loss=0.2631, over 29723.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1319, cr_loss=0.3751, attn_decoder_loss=0.2482, over 5756361.75 frames. ], batch size: 89, lr: 4.85e-03, grad_scale: 8.0
2024-09-18 08:22:30,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=404840.0, ans=0.07
2024-09-18 08:22:30,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=404840.0, ans=0.1
2024-09-18 08:22:32,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=404840.0, ans=0.025
2024-09-18 08:22:37,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=404880.0, ans=0.0
2024-09-18 08:22:53,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0
2024-09-18 08:22:58,234 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.577e+01 9.100e+01 9.886e+01 2.579e+02, threshold=1.820e+02, percent-clipped=1.0
2024-09-18 08:22:58,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=404920.0, ans=0.125
2024-09-18 08:23:07,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=404960.0, ans=0.125
2024-09-18 08:23:22,335 INFO [train.py:1198] (0/2) Epoch 23, batch 1700, loss[loss=0.2004, ctc_loss=0.09551, cr_loss=0.306, attn_decoder_loss=0.2053, over 29584.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1315, cr_loss=0.3749, attn_decoder_loss=0.248, over 5778634.77 frames. ], batch size: 69, lr: 4.85e-03, grad_scale: 8.0
2024-09-18 08:23:37,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=405040.0, ans=0.125
2024-09-18 08:23:39,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=405040.0, ans=0.125
2024-09-18 08:23:40,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=405040.0, ans=0.0
2024-09-18 08:23:47,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=405040.0, ans=0.125
2024-09-18 08:23:52,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=405080.0, ans=0.025
2024-09-18 08:23:54,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=405080.0, ans=0.1
2024-09-18 08:24:25,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=405160.0, ans=0.1
2024-09-18 08:24:31,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=405160.0, ans=0.0
2024-09-18 08:24:42,501 INFO [train.py:1198] (0/2) Epoch 23, batch 1750, loss[loss=0.2138, ctc_loss=0.1119, cr_loss=0.3477, attn_decoder_loss=0.2174, over 29329.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1312, cr_loss=0.3743, attn_decoder_loss=0.2478, over 5788260.60 frames. ], batch size: 67, lr: 4.85e-03, grad_scale: 8.0
2024-09-18 08:25:18,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.76 vs. limit=10.0
2024-09-18 08:25:33,638 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.508e+01 9.190e+01 9.615e+01 2.377e+02, threshold=1.838e+02, percent-clipped=1.0
2024-09-18 08:25:57,522 INFO [train.py:1198] (0/2) Epoch 23, batch 1800, loss[loss=0.2455, ctc_loss=0.1352, cr_loss=0.3994, attn_decoder_loss=0.2489, over 29690.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1317, cr_loss=0.3748, attn_decoder_loss=0.2482, over 5790778.10 frames. ], batch size: 83, lr: 4.85e-03, grad_scale: 8.0
2024-09-18 08:26:14,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=405440.0, ans=0.0
2024-09-18 08:26:18,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=405440.0, ans=0.125
2024-09-18 08:26:59,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=405560.0, ans=15.0
2024-09-18 08:27:13,787 INFO [train.py:1198] (0/2) Epoch 23, batch 1850, loss[loss=0.259, ctc_loss=0.1395, cr_loss=0.3881, attn_decoder_loss=0.2636, over 29635.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1317, cr_loss=0.3748, attn_decoder_loss=0.2483, over 5795255.43 frames. ], batch size: 86, lr: 4.84e-03, grad_scale: 8.0
2024-09-18 08:27:23,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.26 vs. limit=15.0
2024-09-18 08:27:24,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=405600.0, ans=0.025
2024-09-18 08:27:29,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=405640.0, ans=0.07
2024-09-18 08:27:32,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=405640.0, ans=0.125
2024-09-18 08:27:39,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=405640.0, ans=0.125
2024-09-18 08:27:45,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.25 vs. limit=15.0
2024-09-18 08:28:02,237 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 08:28:07,771 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.499e+01 9.020e+01 9.564e+01 1.401e+02, threshold=1.804e+02, percent-clipped=0.0
2024-09-18 08:28:11,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=405720.0, ans=0.125
2024-09-18 08:28:14,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=405720.0, ans=0.025
2024-09-18 08:28:23,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=405760.0, ans=0.0
2024-09-18 08:28:31,649 INFO [train.py:1198] (0/2) Epoch 23, batch 1900, loss[loss=0.2555, ctc_loss=0.1372, cr_loss=0.3868, attn_decoder_loss=0.26, over 29707.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1319, cr_loss=0.3752, attn_decoder_loss=0.2489, over 5803028.32 frames.
], batch size: 89, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:28:40,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=405800.0, ans=0.125 2024-09-18 08:29:17,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.90 vs. limit=10.0 2024-09-18 08:29:20,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=405920.0, ans=0.125 2024-09-18 08:29:22,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0 2024-09-18 08:29:33,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=405960.0, ans=0.0 2024-09-18 08:29:38,840 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.16 vs. limit=22.5 2024-09-18 08:29:50,195 INFO [train.py:1198] (0/2) Epoch 23, batch 1950, loss[loss=0.2382, ctc_loss=0.1384, cr_loss=0.389, attn_decoder_loss=0.2407, over 29471.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1328, cr_loss=0.3768, attn_decoder_loss=0.25, over 5817734.88 frames. 
], batch size: 78, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:29:58,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=406000.0, ans=0.0 2024-09-18 08:29:59,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=406000.0, ans=0.2 2024-09-18 08:30:05,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=406040.0, ans=0.2 2024-09-18 08:30:29,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=406080.0, ans=0.125 2024-09-18 08:30:34,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=406120.0, ans=0.0 2024-09-18 08:30:39,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=406120.0, ans=0.025 2024-09-18 08:30:41,519 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.634e+01 9.173e+01 9.833e+01 1.215e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-18 08:31:05,558 INFO [train.py:1198] (0/2) Epoch 23, batch 2000, loss[loss=0.2204, ctc_loss=0.1203, cr_loss=0.3797, attn_decoder_loss=0.2231, over 29328.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1327, cr_loss=0.3765, attn_decoder_loss=0.25, over 5797292.88 frames. 
], batch size: 67, lr: 4.84e-03, grad_scale: 16.0 2024-09-18 08:31:13,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=406200.0, ans=0.125 2024-09-18 08:31:19,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=406240.0, ans=0.0 2024-09-18 08:31:22,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=406240.0, ans=0.125 2024-09-18 08:31:25,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=406240.0, ans=0.2 2024-09-18 08:31:48,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=406280.0, ans=0.0 2024-09-18 08:31:59,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=406320.0, ans=0.125 2024-09-18 08:32:20,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=406360.0, ans=0.125 2024-09-18 08:32:20,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=406360.0, ans=0.125 2024-09-18 08:32:23,716 INFO [train.py:1198] (0/2) Epoch 23, batch 2050, loss[loss=0.2091, ctc_loss=0.1098, cr_loss=0.3292, attn_decoder_loss=0.2128, over 29435.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1321, cr_loss=0.3749, attn_decoder_loss=0.2488, over 5788082.02 frames. ], batch size: 70, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:32:49,640 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.37 vs. 
limit=15.0 2024-09-18 08:33:05,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=406480.0, ans=0.125 2024-09-18 08:33:13,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=406520.0, ans=0.0 2024-09-18 08:33:18,756 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.624e+01 8.483e+01 9.027e+01 9.590e+01 1.679e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-18 08:33:22,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=406520.0, ans=0.0 2024-09-18 08:33:25,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=406560.0, ans=0.125 2024-09-18 08:33:37,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=406560.0, ans=0.125 2024-09-18 08:33:37,775 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.38 vs. limit=15.0 2024-09-18 08:33:41,476 INFO [train.py:1198] (0/2) Epoch 23, batch 2100, loss[loss=0.2487, ctc_loss=0.1306, cr_loss=0.3807, attn_decoder_loss=0.2534, over 29744.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1315, cr_loss=0.3742, attn_decoder_loss=0.2483, over 5798905.85 frames. 
], batch size: 81, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:34:04,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=406640.0, ans=0.1 2024-09-18 08:34:11,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=406680.0, ans=0.125 2024-09-18 08:34:17,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=406680.0, ans=0.0 2024-09-18 08:34:35,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=406720.0, ans=0.0 2024-09-18 08:34:43,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=406760.0, ans=0.07 2024-09-18 08:34:50,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.61 vs. limit=15.0 2024-09-18 08:34:56,596 INFO [train.py:1198] (0/2) Epoch 23, batch 2150, loss[loss=0.2444, ctc_loss=0.1329, cr_loss=0.3763, attn_decoder_loss=0.2484, over 29453.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1307, cr_loss=0.3729, attn_decoder_loss=0.2476, over 5813727.82 frames. 
], batch size: 78, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:34:56,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406800.0, ans=0.1 2024-09-18 08:35:04,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=406800.0, ans=0.0 2024-09-18 08:35:05,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=406800.0, ans=0.5 2024-09-18 08:35:12,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=406840.0, ans=0.07 2024-09-18 08:35:28,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=406880.0, ans=0.125 2024-09-18 08:35:44,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=406920.0, ans=0.0 2024-09-18 08:35:51,679 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.476e+01 8.832e+01 9.481e+01 1.697e+02, threshold=1.766e+02, percent-clipped=0.0 2024-09-18 08:35:55,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.80 vs. limit=10.0 2024-09-18 08:36:14,434 INFO [train.py:1198] (0/2) Epoch 23, batch 2200, loss[loss=0.2534, ctc_loss=0.1296, cr_loss=0.3764, attn_decoder_loss=0.2588, over 29628.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1311, cr_loss=0.3738, attn_decoder_loss=0.2477, over 5810070.72 frames. 
], batch size: 86, lr: 4.84e-03, grad_scale: 8.0 2024-09-18 08:37:06,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=407120.0, ans=0.125 2024-09-18 08:37:09,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=407120.0, ans=0.0 2024-09-18 08:37:11,575 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:37:32,957 INFO [train.py:1198] (0/2) Epoch 23, batch 2250, loss[loss=0.2511, ctc_loss=0.1276, cr_loss=0.3606, attn_decoder_loss=0.2569, over 29705.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1307, cr_loss=0.3731, attn_decoder_loss=0.2476, over 5810142.32 frames. ], batch size: 82, lr: 4.83e-03, grad_scale: 8.0 2024-09-18 08:37:39,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=407200.0, ans=0.0 2024-09-18 08:37:40,258 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0 2024-09-18 08:37:51,604 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:38:03,721 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.00 vs. limit=22.5 2024-09-18 08:38:09,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.20 vs. 
limit=15.0 2024-09-18 08:38:25,782 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.566e+01 9.041e+01 9.811e+01 1.660e+02, threshold=1.808e+02, percent-clipped=0.0 2024-09-18 08:38:46,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0 2024-09-18 08:38:48,510 INFO [train.py:1198] (0/2) Epoch 23, batch 2300, loss[loss=0.2163, ctc_loss=0.1051, cr_loss=0.34, attn_decoder_loss=0.2211, over 29304.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1302, cr_loss=0.3716, attn_decoder_loss=0.2466, over 5797904.98 frames. ], batch size: 71, lr: 4.83e-03, grad_scale: 8.0 2024-09-18 08:38:50,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=407400.0, ans=0.125 2024-09-18 08:39:03,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=407440.0, ans=0.0 2024-09-18 08:39:08,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=407440.0, ans=0.125 2024-09-18 08:39:09,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=407440.0, ans=0.125 2024-09-18 08:39:11,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=407440.0, ans=0.09899494936611666 2024-09-18 08:39:11,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=407440.0, ans=0.2 2024-09-18 08:39:18,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=407480.0, ans=0.0 2024-09-18 08:39:25,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, 
num_channels=768, metric=14.94 vs. limit=22.5 2024-09-18 08:39:29,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=407480.0, ans=0.125 2024-09-18 08:39:37,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=407520.0, ans=0.025 2024-09-18 08:39:39,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=407520.0, ans=0.0 2024-09-18 08:39:45,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=407520.0, ans=0.2 2024-09-18 08:40:03,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=407560.0, ans=0.1 2024-09-18 08:40:06,190 INFO [train.py:1198] (0/2) Epoch 23, batch 2350, loss[loss=0.2538, ctc_loss=0.1381, cr_loss=0.3932, attn_decoder_loss=0.2579, over 29685.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1306, cr_loss=0.3727, attn_decoder_loss=0.2468, over 5803846.35 frames. ], batch size: 83, lr: 4.83e-03, grad_scale: 8.0 2024-09-18 08:40:35,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.45 vs. limit=15.0 2024-09-18 08:41:01,418 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.649e+01 9.243e+01 9.923e+01 8.680e+02, threshold=1.849e+02, percent-clipped=2.0 2024-09-18 08:41:08,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. 
limit=6.0 2024-09-18 08:41:15,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=407760.0, ans=0.0 2024-09-18 08:41:24,574 INFO [train.py:1198] (0/2) Epoch 23, batch 2400, loss[loss=0.2454, ctc_loss=0.1349, cr_loss=0.3834, attn_decoder_loss=0.2492, over 29548.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.131, cr_loss=0.3737, attn_decoder_loss=0.2475, over 5807232.08 frames. ], batch size: 76, lr: 4.83e-03, grad_scale: 16.0 2024-09-18 08:41:30,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=407800.0, ans=0.125 2024-09-18 08:41:39,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=407840.0, ans=0.1 2024-09-18 08:41:45,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=407840.0, ans=0.0 2024-09-18 08:41:45,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=407840.0, ans=0.2 2024-09-18 08:42:05,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=407880.0, ans=0.0 2024-09-18 08:42:08,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=407920.0, ans=0.2 2024-09-18 08:42:28,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=407960.0, ans=0.025 2024-09-18 08:42:40,652 INFO [train.py:1198] (0/2) Epoch 23, batch 2450, loss[loss=0.2441, ctc_loss=0.1295, cr_loss=0.3825, attn_decoder_loss=0.2483, over 29701.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1323, cr_loss=0.376, attn_decoder_loss=0.2487, over 5784250.71 frames. 
], batch size: 82, lr: 4.83e-03, grad_scale: 8.0 2024-09-18 08:42:41,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=408000.0, ans=0.1 2024-09-18 08:42:51,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=408000.0, ans=0.0 2024-09-18 08:42:55,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=408040.0, ans=0.125 2024-09-18 08:43:03,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=408040.0, ans=0.0 2024-09-18 08:43:11,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408080.0, ans=0.1 2024-09-18 08:43:12,218 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.83 vs. limit=10.0 2024-09-18 08:43:37,145 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.992e+01 9.709e+01 1.062e+02 3.982e+02, threshold=1.942e+02, percent-clipped=1.0 2024-09-18 08:43:52,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=408160.0, ans=0.0 2024-09-18 08:43:58,461 INFO [train.py:1198] (0/2) Epoch 23, batch 2500, loss[loss=0.2537, ctc_loss=0.1339, cr_loss=0.3849, attn_decoder_loss=0.2585, over 29632.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1319, cr_loss=0.3756, attn_decoder_loss=0.2485, over 5794697.06 frames. ], batch size: 86, lr: 4.83e-03, grad_scale: 8.0 2024-09-18 08:45:06,988 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.43 vs. 
limit=15.0 2024-09-18 08:45:12,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.49 vs. limit=15.0 2024-09-18 08:45:16,708 INFO [train.py:1198] (0/2) Epoch 23, batch 2550, loss[loss=0.2104, ctc_loss=0.108, cr_loss=0.314, attn_decoder_loss=0.2148, over 29312.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1316, cr_loss=0.3753, attn_decoder_loss=0.2482, over 5798278.41 frames. ], batch size: 67, lr: 4.83e-03, grad_scale: 8.0 2024-09-18 08:45:26,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=408400.0, ans=0.125 2024-09-18 08:45:43,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2024-09-18 08:46:11,117 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.424e+01 8.872e+01 9.650e+01 4.846e+02, threshold=1.774e+02, percent-clipped=2.0 2024-09-18 08:46:19,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=408560.0, ans=0.2 2024-09-18 08:46:22,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=408560.0, ans=0.05 2024-09-18 08:46:32,480 INFO [train.py:1198] (0/2) Epoch 23, batch 2600, loss[loss=0.2318, ctc_loss=0.1254, cr_loss=0.3634, attn_decoder_loss=0.2355, over 29443.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1316, cr_loss=0.3748, attn_decoder_loss=0.2485, over 5794908.46 frames. 
], batch size: 78, lr: 4.83e-03, grad_scale: 8.0 2024-09-18 08:46:32,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408600.0, ans=0.1 2024-09-18 08:46:46,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408640.0, ans=0.1 2024-09-18 08:46:59,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=408640.0, ans=0.1 2024-09-18 08:47:16,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=408680.0, ans=0.0 2024-09-18 08:47:40,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=408760.0, ans=0.125 2024-09-18 08:47:43,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=408760.0, ans=0.05 2024-09-18 08:47:45,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=408760.0, ans=0.125 2024-09-18 08:47:50,070 INFO [train.py:1198] (0/2) Epoch 23, batch 2650, loss[loss=0.2595, ctc_loss=0.1456, cr_loss=0.415, attn_decoder_loss=0.2629, over 29249.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1313, cr_loss=0.3749, attn_decoder_loss=0.2486, over 5801011.13 frames. 
], batch size: 100, lr: 4.83e-03, grad_scale: 8.0 2024-09-18 08:47:58,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=408800.0, ans=0.025 2024-09-18 08:48:02,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=408800.0, ans=0.125 2024-09-18 08:48:19,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=408880.0, ans=0.09899494936611666 2024-09-18 08:48:31,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408880.0, ans=0.1 2024-09-18 08:48:32,387 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.65 vs. limit=15.0 2024-09-18 08:48:37,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408920.0, ans=0.1 2024-09-18 08:48:46,441 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.322e+01 8.846e+01 9.392e+01 1.397e+02, threshold=1.769e+02, percent-clipped=0.0 2024-09-18 08:48:48,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=408920.0, ans=0.2 2024-09-18 08:48:59,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=408960.0, ans=0.125 2024-09-18 08:49:07,646 INFO [train.py:1198] (0/2) Epoch 23, batch 2700, loss[loss=0.2411, ctc_loss=0.1223, cr_loss=0.3727, attn_decoder_loss=0.2461, over 29511.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1317, cr_loss=0.3755, attn_decoder_loss=0.2489, over 5796099.06 frames. 
], batch size: 87, lr: 4.82e-03, grad_scale: 8.0 2024-09-18 08:49:07,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=409000.0, ans=0.125 2024-09-18 08:49:10,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=409000.0, ans=0.125 2024-09-18 08:49:16,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=409000.0, ans=0.035 2024-09-18 08:49:25,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.79 vs. limit=10.0 2024-09-18 08:49:53,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=409120.0, ans=0.0 2024-09-18 08:50:16,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2024-09-18 08:50:23,173 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.18 vs. limit=22.5 2024-09-18 08:50:23,524 INFO [train.py:1198] (0/2) Epoch 23, batch 2750, loss[loss=0.2388, ctc_loss=0.1321, cr_loss=0.3847, attn_decoder_loss=0.2421, over 29506.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1305, cr_loss=0.3731, attn_decoder_loss=0.2476, over 5793806.72 frames. 
], batch size: 75, lr: 4.82e-03, grad_scale: 8.0 2024-09-18 08:50:31,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=409200.0, ans=0.2 2024-09-18 08:50:37,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=409240.0, ans=0.0 2024-09-18 08:51:00,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=12.0 2024-09-18 08:51:19,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=409320.0, ans=0.2 2024-09-18 08:51:20,299 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 8.421e+01 8.872e+01 9.349e+01 6.581e+02, threshold=1.774e+02, percent-clipped=3.0 2024-09-18 08:51:37,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=409360.0, ans=0.0 2024-09-18 08:51:41,683 INFO [train.py:1198] (0/2) Epoch 23, batch 2800, loss[loss=0.2619, ctc_loss=0.163, cr_loss=0.3741, attn_decoder_loss=0.2645, over 20507.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1314, cr_loss=0.3748, attn_decoder_loss=0.2483, over 5775033.13 frames. 
], batch size: 210, lr: 4.82e-03, grad_scale: 16.0 2024-09-18 08:52:00,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=409440.0, ans=0.0 2024-09-18 08:52:03,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=409440.0, ans=0.125 2024-09-18 08:52:06,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=409440.0, ans=0.1 2024-09-18 08:52:10,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=409480.0, ans=0.0 2024-09-18 08:52:13,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=409480.0, ans=0.0 2024-09-18 08:52:59,484 INFO [train.py:1198] (0/2) Epoch 23, batch 2850, loss[loss=0.2241, ctc_loss=0.1112, cr_loss=0.3334, attn_decoder_loss=0.2293, over 29543.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.132, cr_loss=0.3756, attn_decoder_loss=0.2487, over 5759859.22 frames. 
], batch size: 77, lr: 4.82e-03, grad_scale: 8.0 2024-09-18 08:53:02,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=409600.0, ans=0.1 2024-09-18 08:53:19,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=409640.0, ans=0.125 2024-09-18 08:53:21,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=409640.0, ans=0.1 2024-09-18 08:53:30,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=409680.0, ans=0.125 2024-09-18 08:53:43,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2024-09-18 08:53:51,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=409720.0, ans=0.125 2024-09-18 08:53:55,294 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.592e+01 9.017e+01 9.666e+01 1.557e+02, threshold=1.803e+02, percent-clipped=0.0 2024-09-18 08:54:15,015 INFO [train.py:1198] (0/2) Epoch 23, batch 2900, loss[loss=0.2418, ctc_loss=0.129, cr_loss=0.3758, attn_decoder_loss=0.2459, over 29410.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1322, cr_loss=0.3762, attn_decoder_loss=0.2495, over 5786269.68 frames. ], batch size: 79, lr: 4.82e-03, grad_scale: 8.0 2024-09-18 08:54:58,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.30 vs. 
limit=12.0 2024-09-18 08:55:09,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=409920.0, ans=0.125 2024-09-18 08:55:15,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=409920.0, ans=0.0 2024-09-18 08:55:33,242 INFO [train.py:1198] (0/2) Epoch 23, batch 2950, loss[loss=0.2413, ctc_loss=0.1334, cr_loss=0.3929, attn_decoder_loss=0.2445, over 29532.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1314, cr_loss=0.3748, attn_decoder_loss=0.2481, over 5781818.95 frames. ], batch size: 75, lr: 4.82e-03, grad_scale: 8.0 2024-09-18 08:55:56,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=410040.0, ans=0.0 2024-09-18 08:56:25,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=410120.0, ans=0.04949747468305833 2024-09-18 08:56:27,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=410120.0, ans=0.2 2024-09-18 08:56:31,679 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.552e+01 9.299e+01 9.927e+01 2.795e+02, threshold=1.860e+02, percent-clipped=1.0 2024-09-18 08:56:38,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=410160.0, ans=0.1 2024-09-18 08:56:51,633 INFO [train.py:1198] (0/2) Epoch 23, batch 3000, loss[loss=0.2494, ctc_loss=0.1384, cr_loss=0.387, attn_decoder_loss=0.2532, over 29751.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1315, cr_loss=0.3753, attn_decoder_loss=0.2483, over 5783184.32 frames. 
], batch size: 81, lr: 4.82e-03, grad_scale: 8.0 2024-09-18 08:56:51,634 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 08:57:10,057 INFO [train.py:1230] (0/2) Epoch 23, validation: loss=0.2116, ctc_loss=0.03932, cr_loss=5.516e-15, attn_decoder_loss=0.2308, over 944034.00 frames. 2024-09-18 08:57:10,058 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 08:57:17,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=410200.0, ans=0.0 2024-09-18 08:57:53,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.46 vs. limit=22.5 2024-09-18 08:58:03,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=410320.0, ans=0.95 2024-09-18 08:58:25,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=410400.0, ans=0.04949747468305833 2024-09-18 08:58:26,566 INFO [train.py:1198] (0/2) Epoch 23, batch 3050, loss[loss=0.2436, ctc_loss=0.1288, cr_loss=0.3888, attn_decoder_loss=0.2477, over 29541.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1318, cr_loss=0.3759, attn_decoder_loss=0.2488, over 5777593.71 frames. ], batch size: 76, lr: 4.82e-03, grad_scale: 8.0 2024-09-18 08:58:34,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=15.0 2024-09-18 08:58:40,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.94 vs. 
limit=15.0 2024-09-18 08:58:53,668 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:59:20,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=410520.0, ans=0.0 2024-09-18 08:59:24,740 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.808e+01 9.332e+01 1.013e+02 4.220e+02, threshold=1.866e+02, percent-clipped=2.0 2024-09-18 08:59:25,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410520.0, ans=0.1 2024-09-18 08:59:32,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=410560.0, ans=0.125 2024-09-18 08:59:35,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=410560.0, ans=0.125 2024-09-18 08:59:40,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.42 vs. limit=22.5 2024-09-18 08:59:44,268 INFO [train.py:1198] (0/2) Epoch 23, batch 3100, loss[loss=0.2677, ctc_loss=0.1514, cr_loss=0.4049, attn_decoder_loss=0.2716, over 29253.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1319, cr_loss=0.3763, attn_decoder_loss=0.2485, over 5777820.02 frames. 
], batch size: 100, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 08:59:54,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=410600.0, ans=0.125 2024-09-18 09:00:15,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410680.0, ans=0.1 2024-09-18 09:00:42,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=410720.0, ans=0.125 2024-09-18 09:01:02,026 INFO [train.py:1198] (0/2) Epoch 23, batch 3150, loss[loss=0.2613, ctc_loss=0.1395, cr_loss=0.3946, attn_decoder_loss=0.266, over 28822.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1316, cr_loss=0.3754, attn_decoder_loss=0.2485, over 5784242.33 frames. ], batch size: 104, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 09:01:26,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=410840.0, ans=0.125 2024-09-18 09:01:35,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=410880.0, ans=0.0 2024-09-18 09:01:57,780 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 8.637e+01 9.168e+01 9.786e+01 2.272e+02, threshold=1.834e+02, percent-clipped=1.0 2024-09-18 09:02:17,396 INFO [train.py:1198] (0/2) Epoch 23, batch 3200, loss[loss=0.2372, ctc_loss=0.122, cr_loss=0.3753, attn_decoder_loss=0.2416, over 29430.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1309, cr_loss=0.3738, attn_decoder_loss=0.248, over 5794132.34 frames. 
], batch size: 79, lr: 4.81e-03, grad_scale: 16.0 2024-09-18 09:02:20,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=411000.0, ans=0.2 2024-09-18 09:02:28,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411000.0, ans=0.1 2024-09-18 09:02:28,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=411000.0, ans=0.0 2024-09-18 09:02:36,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=411040.0, ans=0.0 2024-09-18 09:02:50,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=411080.0, ans=0.0 2024-09-18 09:02:59,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=411080.0, ans=0.1 2024-09-18 09:03:15,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2024-09-18 09:03:35,720 INFO [train.py:1198] (0/2) Epoch 23, batch 3250, loss[loss=0.2553, ctc_loss=0.133, cr_loss=0.378, attn_decoder_loss=0.2605, over 29712.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1314, cr_loss=0.375, attn_decoder_loss=0.2484, over 5800240.47 frames. 
], batch size: 84, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 09:03:38,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411200.0, ans=0.1 2024-09-18 09:03:45,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=411200.0, ans=0.2 2024-09-18 09:03:45,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=411200.0, ans=0.0 2024-09-18 09:03:46,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=411200.0, ans=0.09899494936611666 2024-09-18 09:04:34,986 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.655e+01 9.272e+01 9.823e+01 1.322e+02, threshold=1.854e+02, percent-clipped=0.0 2024-09-18 09:04:49,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.42 vs. limit=15.0 2024-09-18 09:04:53,301 INFO [train.py:1198] (0/2) Epoch 23, batch 3300, loss[loss=0.2501, ctc_loss=0.1236, cr_loss=0.3621, attn_decoder_loss=0.2561, over 28405.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1306, cr_loss=0.3731, attn_decoder_loss=0.2473, over 5796544.97 frames. ], batch size: 111, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 09:05:01,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=411400.0, ans=0.125 2024-09-18 09:06:04,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=411560.0, ans=0.0 2024-09-18 09:06:08,663 INFO [train.py:1198] (0/2) Epoch 23, batch 3350, loss[loss=0.2553, ctc_loss=0.147, cr_loss=0.4158, attn_decoder_loss=0.2581, over 28863.00 frames. 
], tot_loss[loss=0.2439, ctc_loss=0.1314, cr_loss=0.3744, attn_decoder_loss=0.2481, over 5773198.81 frames. ], batch size: 104, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 09:06:21,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=411600.0, ans=0.0 2024-09-18 09:06:24,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=411640.0, ans=0.0 2024-09-18 09:06:31,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=411640.0, ans=0.07 2024-09-18 09:06:44,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=411680.0, ans=0.125 2024-09-18 09:06:55,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=411720.0, ans=0.125 2024-09-18 09:07:08,457 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.663e+01 9.206e+01 9.789e+01 2.075e+02, threshold=1.841e+02, percent-clipped=1.0 2024-09-18 09:07:10,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=411760.0, ans=0.0 2024-09-18 09:07:19,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=411760.0, ans=0.125 2024-09-18 09:07:23,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=411760.0, ans=0.125 2024-09-18 09:07:26,467 INFO [train.py:1198] (0/2) Epoch 23, batch 3400, loss[loss=0.2197, ctc_loss=0.117, cr_loss=0.3496, attn_decoder_loss=0.2234, over 29364.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1316, cr_loss=0.3742, attn_decoder_loss=0.248, over 5766013.70 frames. 
], batch size: 67, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 09:07:30,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.10 vs. limit=15.0 2024-09-18 09:07:32,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=411800.0, ans=0.125 2024-09-18 09:08:06,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=15.0 2024-09-18 09:08:44,752 INFO [train.py:1198] (0/2) Epoch 23, batch 3450, loss[loss=0.2521, ctc_loss=0.1318, cr_loss=0.3774, attn_decoder_loss=0.2571, over 28191.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1313, cr_loss=0.3737, attn_decoder_loss=0.2482, over 5774069.30 frames. ], batch size: 111, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 09:08:51,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=412000.0, ans=0.125 2024-09-18 09:09:06,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=412040.0, ans=0.125 2024-09-18 09:09:09,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=412040.0, ans=0.5 2024-09-18 09:09:22,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=412080.0, ans=0.025 2024-09-18 09:09:22,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=412080.0, ans=0.0 2024-09-18 09:09:29,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=412120.0, ans=0.125 2024-09-18 09:09:35,298 INFO [scaling.py:214] (0/2) 
ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=412120.0, ans=0.125 2024-09-18 09:09:42,461 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.670e+01 9.056e+01 9.530e+01 1.937e+02, threshold=1.811e+02, percent-clipped=1.0 2024-09-18 09:09:47,514 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:10:02,667 INFO [train.py:1198] (0/2) Epoch 23, batch 3500, loss[loss=0.2213, ctc_loss=0.1147, cr_loss=0.3444, attn_decoder_loss=0.2255, over 29341.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1312, cr_loss=0.374, attn_decoder_loss=0.2477, over 5776840.32 frames. ], batch size: 71, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 09:10:41,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.08 vs. limit=12.0 2024-09-18 09:10:46,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=412320.0, ans=0.5 2024-09-18 09:10:47,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=412320.0, ans=0.0 2024-09-18 09:10:53,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=412320.0, ans=0.0 2024-09-18 09:11:17,256 INFO [train.py:1198] (0/2) Epoch 23, batch 3550, loss[loss=0.251, ctc_loss=0.1347, cr_loss=0.385, attn_decoder_loss=0.2554, over 29705.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1309, cr_loss=0.3738, attn_decoder_loss=0.2476, over 5783488.07 frames. 
], batch size: 89, lr: 4.80e-03, grad_scale: 8.0 2024-09-18 09:11:54,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=412480.0, ans=0.125 2024-09-18 09:12:06,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=412520.0, ans=0.1 2024-09-18 09:12:09,840 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:12:14,057 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.214e+01 8.498e+01 8.967e+01 9.754e+01 1.546e+02, threshold=1.793e+02, percent-clipped=1.0 2024-09-18 09:12:31,829 INFO [train.py:1198] (0/2) Epoch 23, batch 3600, loss[loss=0.2346, ctc_loss=0.1217, cr_loss=0.3633, attn_decoder_loss=0.2391, over 29497.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1307, cr_loss=0.3732, attn_decoder_loss=0.2474, over 5791557.08 frames. ], batch size: 77, lr: 4.80e-03, grad_scale: 16.0 2024-09-18 09:13:17,587 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:13:26,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2024-09-18 09:13:38,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=412760.0, ans=0.0 2024-09-18 09:13:48,738 INFO [train.py:1198] (0/2) Epoch 23, batch 3650, loss[loss=0.2463, ctc_loss=0.1334, cr_loss=0.3894, attn_decoder_loss=0.2502, over 29518.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1305, cr_loss=0.3727, attn_decoder_loss=0.2471, over 5793234.61 frames. 
], batch size: 90, lr: 4.80e-03, grad_scale: 8.0 2024-09-18 09:13:53,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=412800.0, ans=0.0 2024-09-18 09:14:06,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=412840.0, ans=0.0 2024-09-18 09:14:14,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=412840.0, ans=10.0 2024-09-18 09:14:15,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=412840.0, ans=0.0 2024-09-18 09:14:17,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=412880.0, ans=0.2 2024-09-18 09:14:38,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=412920.0, ans=0.0 2024-09-18 09:14:39,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=412920.0, ans=0.95 2024-09-18 09:14:46,461 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.375e+01 8.989e+01 9.606e+01 2.045e+02, threshold=1.798e+02, percent-clipped=1.0 2024-09-18 09:14:51,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412960.0, ans=0.1 2024-09-18 09:15:02,931 INFO [train.py:1198] (0/2) Epoch 23, batch 3700, loss[loss=0.2542, ctc_loss=0.1409, cr_loss=0.4041, attn_decoder_loss=0.2578, over 29704.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1302, cr_loss=0.3724, attn_decoder_loss=0.2471, over 5803249.18 frames. 
], batch size: 84, lr: 4.80e-03, grad_scale: 8.0 2024-09-18 09:15:22,251 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0 2024-09-18 09:15:41,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=413080.0, ans=0.125 2024-09-18 09:15:42,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=413080.0, ans=0.125 2024-09-18 09:15:57,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0 2024-09-18 09:16:01,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=413160.0, ans=0.0 2024-09-18 09:16:16,983 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:16:19,392 INFO [train.py:1198] (0/2) Epoch 23, batch 3750, loss[loss=0.2178, ctc_loss=0.1182, cr_loss=0.3385, attn_decoder_loss=0.2214, over 29335.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1299, cr_loss=0.3715, attn_decoder_loss=0.2466, over 5807302.11 frames. ], batch size: 67, lr: 4.80e-03, grad_scale: 8.0 2024-09-18 09:16:31,587 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:16:45,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.56 vs. limit=22.5 2024-09-18 09:16:46,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.84 vs. 
limit=22.5 2024-09-18 09:16:59,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=413280.0, ans=0.125 2024-09-18 09:17:13,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=413320.0, ans=0.125 2024-09-18 09:17:17,040 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.618e+01 9.156e+01 9.859e+01 5.134e+02, threshold=1.831e+02, percent-clipped=3.0 2024-09-18 09:17:19,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-09-18 09:17:33,575 INFO [train.py:1198] (0/2) Epoch 23, batch 3800, loss[loss=0.2488, ctc_loss=0.1259, cr_loss=0.357, attn_decoder_loss=0.2545, over 29624.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1297, cr_loss=0.3711, attn_decoder_loss=0.2463, over 5797274.70 frames. ], batch size: 86, lr: 4.80e-03, grad_scale: 8.0 2024-09-18 09:17:41,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=413400.0, ans=0.2 2024-09-18 09:17:43,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.30 vs. limit=15.0 2024-09-18 09:17:50,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.73 vs. limit=22.5 2024-09-18 09:17:54,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. 
limit=6.0 2024-09-18 09:18:15,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=413480.0, ans=0.125 2024-09-18 09:18:15,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=413480.0, ans=0.125 2024-09-18 09:18:23,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=413520.0, ans=0.125 2024-09-18 09:18:35,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=413560.0, ans=0.1 2024-09-18 09:18:36,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=413560.0, ans=0.025 2024-09-18 09:18:39,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=413560.0, ans=0.0 2024-09-18 09:18:41,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=413560.0, ans=0.125 2024-09-18 09:18:44,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=413560.0, ans=0.2 2024-09-18 09:18:45,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=413560.0, ans=0.2 2024-09-18 09:18:48,727 INFO [train.py:1198] (0/2) Epoch 23, batch 3850, loss[loss=0.2579, ctc_loss=0.1366, cr_loss=0.4009, attn_decoder_loss=0.2624, over 29265.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1298, cr_loss=0.3715, attn_decoder_loss=0.2465, over 5811821.05 frames. 
], batch size: 100, lr: 4.80e-03, grad_scale: 8.0 2024-09-18 09:18:56,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=413600.0, ans=0.125 2024-09-18 09:19:04,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-09-18 09:19:48,638 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.512e+01 9.090e+01 9.629e+01 1.233e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-18 09:19:51,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=413760.0, ans=0.0 2024-09-18 09:19:53,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=413760.0, ans=0.0 2024-09-18 09:20:05,023 INFO [train.py:1198] (0/2) Epoch 23, batch 3900, loss[loss=0.258, ctc_loss=0.1349, cr_loss=0.3876, attn_decoder_loss=0.263, over 29610.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1303, cr_loss=0.3723, attn_decoder_loss=0.2471, over 5815852.05 frames. ], batch size: 86, lr: 4.80e-03, grad_scale: 8.0 2024-09-18 09:20:18,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=413840.0, ans=0.0 2024-09-18 09:20:18,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=413840.0, ans=0.125 2024-09-18 09:20:30,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=413840.0, ans=0.1 2024-09-18 09:20:32,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.95 vs. 
limit=15.0 2024-09-18 09:20:34,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=413880.0, ans=0.125 2024-09-18 09:20:37,745 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:20:47,140 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.60 vs. limit=10.0 2024-09-18 09:20:53,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.04 vs. limit=22.5 2024-09-18 09:20:55,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=413920.0, ans=0.125 2024-09-18 09:21:19,073 INFO [train.py:1198] (0/2) Epoch 23, batch 3950, loss[loss=0.2523, ctc_loss=0.1345, cr_loss=0.3897, attn_decoder_loss=0.2567, over 29481.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.13, cr_loss=0.3731, attn_decoder_loss=0.247, over 5835374.69 frames. ], batch size: 97, lr: 4.80e-03, grad_scale: 8.0 2024-09-18 09:21:44,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=414040.0, ans=0.025 2024-09-18 09:21:57,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=414080.0, ans=0.1 2024-09-18 09:22:04,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.25 vs. 
limit=15.0 2024-09-18 09:22:18,242 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.566e+01 9.101e+01 9.931e+01 2.734e+02, threshold=1.820e+02, percent-clipped=1.0 2024-09-18 09:22:24,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=414160.0, ans=0.125 2024-09-18 09:22:30,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.33 vs. limit=22.5 2024-09-18 09:22:34,496 INFO [train.py:1198] (0/2) Epoch 23, batch 4000, loss[loss=0.2329, ctc_loss=0.1245, cr_loss=0.3646, attn_decoder_loss=0.2368, over 29522.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1304, cr_loss=0.3729, attn_decoder_loss=0.2471, over 5814414.70 frames. ], batch size: 74, lr: 4.79e-03, grad_scale: 16.0 2024-09-18 09:22:36,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=414200.0, ans=0.125 2024-09-18 09:23:04,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=414280.0, ans=0.1 2024-09-18 09:23:26,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2024-09-18 09:23:43,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.63 vs. limit=10.0 2024-09-18 09:23:48,958 INFO [train.py:1198] (0/2) Epoch 23, batch 4050, loss[loss=0.2726, ctc_loss=0.1757, cr_loss=0.4093, attn_decoder_loss=0.2743, over 20008.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1307, cr_loss=0.3725, attn_decoder_loss=0.247, over 5797963.48 frames. 
], batch size: 210, lr: 4.79e-03, grad_scale: 8.0 2024-09-18 09:24:03,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=414440.0, ans=0.125 2024-09-18 09:24:08,091 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:24:42,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=414520.0, ans=0.125 2024-09-18 09:24:49,138 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.682e+01 9.236e+01 9.757e+01 1.586e+02, threshold=1.847e+02, percent-clipped=0.0 2024-09-18 09:25:03,928 INFO [train.py:1198] (0/2) Epoch 23, batch 4100, loss[loss=0.2592, ctc_loss=0.147, cr_loss=0.3935, attn_decoder_loss=0.2629, over 29495.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1309, cr_loss=0.3732, attn_decoder_loss=0.2473, over 5793134.35 frames. ], batch size: 90, lr: 4.79e-03, grad_scale: 8.0 2024-09-18 09:25:26,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=414640.0, ans=0.125 2024-09-18 09:25:29,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=414640.0, ans=0.125 2024-09-18 09:25:58,900 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.07 vs. limit=10.0 2024-09-18 09:26:02,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=414760.0, ans=0.025 2024-09-18 09:26:18,785 INFO [train.py:1198] (0/2) Epoch 23, batch 4150, loss[loss=0.2343, ctc_loss=0.1283, cr_loss=0.3739, attn_decoder_loss=0.2377, over 29491.00 frames. 
], tot_loss[loss=0.2429, ctc_loss=0.1308, cr_loss=0.3734, attn_decoder_loss=0.2471, over 5798404.60 frames. ], batch size: 77, lr: 4.79e-03, grad_scale: 8.0 2024-09-18 09:26:48,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=414880.0, ans=0.125 2024-09-18 09:27:16,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414960.0, ans=0.1 2024-09-18 09:27:17,891 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.495e+01 8.961e+01 9.619e+01 1.585e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-18 09:27:19,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414960.0, ans=0.1 2024-09-18 09:27:25,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.91 vs. limit=15.0 2024-09-18 09:27:32,591 INFO [train.py:1198] (0/2) Epoch 23, batch 4200, loss[loss=0.2682, ctc_loss=0.1535, cr_loss=0.411, attn_decoder_loss=0.2718, over 29486.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1311, cr_loss=0.3737, attn_decoder_loss=0.2473, over 5800598.11 frames. 
], batch size: 90, lr: 4.79e-03, grad_scale: 8.0 2024-09-18 09:27:45,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=415000.0, ans=0.125 2024-09-18 09:28:18,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=415120.0, ans=0.125 2024-09-18 09:28:18,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=415120.0, ans=0.125 2024-09-18 09:28:20,786 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2024-09-18 09:28:43,089 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.74 vs. limit=15.0 2024-09-18 09:28:48,193 INFO [train.py:1198] (0/2) Epoch 23, batch 4250, loss[loss=0.2262, ctc_loss=0.117, cr_loss=0.3413, attn_decoder_loss=0.2308, over 29522.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1308, cr_loss=0.3733, attn_decoder_loss=0.2476, over 5805553.05 frames. ], batch size: 74, lr: 4.79e-03, grad_scale: 8.0 2024-09-18 09:29:04,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=415240.0, ans=0.07 2024-09-18 09:29:07,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=415240.0, ans=0.2 2024-09-18 09:29:19,677 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2024-09-18 09:29:21,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.09 vs. 
limit=15.0 2024-09-18 09:29:24,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415280.0, ans=0.1 2024-09-18 09:29:28,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=415280.0, ans=0.2 2024-09-18 09:29:38,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=415320.0, ans=0.0 2024-09-18 09:29:47,861 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.728e+01 9.274e+01 9.904e+01 2.860e+02, threshold=1.855e+02, percent-clipped=1.0 2024-09-18 09:30:02,719 INFO [train.py:1198] (0/2) Epoch 23, batch 4300, loss[loss=0.2576, ctc_loss=0.1452, cr_loss=0.3932, attn_decoder_loss=0.2613, over 29531.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1307, cr_loss=0.3734, attn_decoder_loss=0.2478, over 5794632.24 frames. ], batch size: 87, lr: 4.79e-03, grad_scale: 8.0 2024-09-18 09:30:13,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=415400.0, ans=0.125 2024-09-18 09:30:19,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=415440.0, ans=0.125 2024-09-18 09:30:26,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=415440.0, ans=0.125 2024-09-18 09:30:28,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415440.0, ans=0.1 2024-09-18 09:30:44,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=415480.0, ans=0.125 2024-09-18 09:31:16,768 INFO [train.py:1198] (0/2) Epoch 23, batch 4350, loss[loss=0.2623, ctc_loss=0.146, cr_loss=0.4098, 
attn_decoder_loss=0.2661, over 29478.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1334, cr_loss=0.3795, attn_decoder_loss=0.2512, over 5797691.14 frames. ], batch size: 97, lr: 4.79e-03, grad_scale: 8.0 2024-09-18 09:31:33,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=415640.0, ans=0.125 2024-09-18 09:31:36,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=415640.0, ans=0.125 2024-09-18 09:31:54,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.51 vs. limit=12.0 2024-09-18 09:32:16,354 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.901e+01 9.212e+01 9.767e+01 1.363e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-18 09:32:20,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-09-18 09:32:30,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.63 vs. limit=10.0 2024-09-18 09:32:31,048 INFO [train.py:1198] (0/2) Epoch 23, batch 4400, loss[loss=0.2544, ctc_loss=0.1347, cr_loss=0.3792, attn_decoder_loss=0.2593, over 27084.00 frames. ], tot_loss[loss=0.249, ctc_loss=0.1349, cr_loss=0.382, attn_decoder_loss=0.2532, over 5767420.41 frames. ], batch size: 124, lr: 4.78e-03, grad_scale: 16.0 2024-09-18 09:32:46,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.78 vs. 
limit=10.0 2024-09-18 09:33:15,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=415920.0, ans=0.025 2024-09-18 09:33:34,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=415960.0, ans=0.125 2024-09-18 09:33:44,565 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-104000.pt 2024-09-18 09:33:53,116 INFO [train.py:1198] (0/2) Epoch 23, batch 4450, loss[loss=0.2671, ctc_loss=0.166, cr_loss=0.4067, attn_decoder_loss=0.2694, over 20485.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1388, cr_loss=0.3865, attn_decoder_loss=0.2555, over 5576389.12 frames. ], batch size: 210, lr: 4.78e-03, grad_scale: 16.0 2024-09-18 09:34:48,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=416120.0, ans=10.0 2024-09-18 09:34:52,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=416160.0, ans=0.0 2024-09-18 09:34:54,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=416160.0, ans=0.0 2024-09-18 09:34:55,164 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.365e+01 9.450e+01 1.070e+02 1.179e+02 4.631e+02, threshold=2.141e+02, percent-clipped=3.0 2024-09-18 09:35:06,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=416160.0, ans=0.025 2024-09-18 09:35:09,063 INFO [train.py:1198] (0/2) Epoch 23, batch 4500, loss[loss=0.2635, ctc_loss=0.1602, cr_loss=0.3813, attn_decoder_loss=0.2665, over 20071.00 frames. 
], tot_loss[loss=0.254, ctc_loss=0.143, cr_loss=0.3891, attn_decoder_loss=0.2577, over 5237689.90 frames. ], batch size: 210, lr: 4.78e-03, grad_scale: 8.0 2024-09-18 09:35:11,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.61 vs. limit=22.5 2024-09-18 09:35:11,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2024-09-18 09:35:22,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=416240.0, ans=0.09899494936611666 2024-09-18 09:35:31,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=416240.0, ans=0.1 2024-09-18 09:35:46,056 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-23.pt 2024-09-18 09:36:38,043 INFO [train.py:1198] (0/2) Epoch 24, batch 0, loss[loss=0.2248, ctc_loss=0.113, cr_loss=0.3443, attn_decoder_loss=0.2295, over 29608.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.113, cr_loss=0.3443, attn_decoder_loss=0.2295, over 29608.00 frames. ], batch size: 73, lr: 4.68e-03, grad_scale: 16.0 2024-09-18 09:36:38,044 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 09:36:58,732 INFO [train.py:1230] (0/2) Epoch 24, validation: loss=0.2127, ctc_loss=0.03777, cr_loss=4.976e-15, attn_decoder_loss=0.2321, over 944034.00 frames. 
2024-09-18 09:36:58,732 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 09:37:11,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=416300.0, ans=0.2 2024-09-18 09:37:14,382 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:37:32,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=416380.0, ans=0.0 2024-09-18 09:38:14,645 INFO [train.py:1198] (0/2) Epoch 24, batch 50, loss[loss=0.2245, ctc_loss=0.1171, cr_loss=0.3634, attn_decoder_loss=0.2284, over 29414.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1326, cr_loss=0.378, attn_decoder_loss=0.2492, over 1267441.45 frames. ], batch size: 70, lr: 4.68e-03, grad_scale: 8.0 2024-09-18 09:38:40,703 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.838e+01 9.694e+01 1.103e+02 3.363e+02, threshold=1.939e+02, percent-clipped=1.0 2024-09-18 09:39:30,895 INFO [train.py:1198] (0/2) Epoch 24, batch 100, loss[loss=0.2225, ctc_loss=0.1116, cr_loss=0.3367, attn_decoder_loss=0.2273, over 29528.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1333, cr_loss=0.3791, attn_decoder_loss=0.2505, over 2250592.16 frames. ], batch size: 76, lr: 4.68e-03, grad_scale: 8.0 2024-09-18 09:39:55,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=416740.0, ans=0.125 2024-09-18 09:39:57,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.26 vs. 
limit=15.0 2024-09-18 09:40:09,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=416780.0, ans=0.2 2024-09-18 09:40:11,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=416780.0, ans=0.1 2024-09-18 09:40:15,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=416780.0, ans=0.05 2024-09-18 09:40:21,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=416820.0, ans=0.125 2024-09-18 09:40:23,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=416820.0, ans=0.125 2024-09-18 09:40:50,682 INFO [train.py:1198] (0/2) Epoch 24, batch 150, loss[loss=0.222, ctc_loss=0.1179, cr_loss=0.3604, attn_decoder_loss=0.2256, over 29435.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1306, cr_loss=0.3747, attn_decoder_loss=0.2476, over 3046501.94 frames. ], batch size: 70, lr: 4.68e-03, grad_scale: 8.0 2024-09-18 09:41:16,582 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.498e+01 9.006e+01 9.810e+01 1.466e+02, threshold=1.801e+02, percent-clipped=0.0 2024-09-18 09:41:31,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=416980.0, ans=0.0 2024-09-18 09:41:50,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=417060.0, ans=0.125 2024-09-18 09:42:06,295 INFO [train.py:1198] (0/2) Epoch 24, batch 200, loss[loss=0.2539, ctc_loss=0.1453, cr_loss=0.3997, attn_decoder_loss=0.257, over 27222.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1301, cr_loss=0.3739, attn_decoder_loss=0.2471, over 3659772.09 frames. 
], batch size: 124, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:42:06,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=417100.0, ans=0.0 2024-09-18 09:42:38,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=417180.0, ans=0.0 2024-09-18 09:42:47,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=417180.0, ans=0.05 2024-09-18 09:42:52,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2024-09-18 09:42:56,697 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:43:00,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.38 vs. limit=15.0 2024-09-18 09:43:13,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=417260.0, ans=0.0 2024-09-18 09:43:22,141 INFO [train.py:1198] (0/2) Epoch 24, batch 250, loss[loss=0.263, ctc_loss=0.1455, cr_loss=0.4044, attn_decoder_loss=0.267, over 29326.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1301, cr_loss=0.3738, attn_decoder_loss=0.2471, over 4142618.33 frames. ], batch size: 100, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:43:40,119 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.20 vs. 
limit=15.0 2024-09-18 09:43:47,873 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.833e+01 9.396e+01 1.002e+02 2.195e+02, threshold=1.879e+02, percent-clipped=2.0 2024-09-18 09:44:01,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.86 vs. limit=15.0 2024-09-18 09:44:08,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=417420.0, ans=0.125 2024-09-18 09:44:23,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=417460.0, ans=0.025 2024-09-18 09:44:42,567 INFO [train.py:1198] (0/2) Epoch 24, batch 300, loss[loss=0.2694, ctc_loss=0.1517, cr_loss=0.4356, attn_decoder_loss=0.2728, over 29501.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1299, cr_loss=0.3737, attn_decoder_loss=0.2472, over 4511016.49 frames. ], batch size: 92, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:44:42,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=417500.0, ans=0.035 2024-09-18 09:44:43,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.82 vs. 
limit=15.0 2024-09-18 09:44:52,119 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:45:17,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=417580.0, ans=0.125 2024-09-18 09:45:34,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=417620.0, ans=0.2 2024-09-18 09:45:36,962 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.46 vs. limit=15.0 2024-09-18 09:45:58,770 INFO [train.py:1198] (0/2) Epoch 24, batch 350, loss[loss=0.2165, ctc_loss=0.103, cr_loss=0.3045, attn_decoder_loss=0.2223, over 29326.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1301, cr_loss=0.3742, attn_decoder_loss=0.2476, over 4796809.02 frames. ], batch size: 71, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:46:04,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.29 vs. limit=10.0 2024-09-18 09:46:25,838 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.494e+01 8.951e+01 9.745e+01 1.329e+02, threshold=1.790e+02, percent-clipped=0.0 2024-09-18 09:46:37,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=15.0 2024-09-18 09:46:42,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=417820.0, ans=0.125 2024-09-18 09:46:53,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.10 vs. 
limit=15.0 2024-09-18 09:46:58,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=417860.0, ans=0.1 2024-09-18 09:47:14,296 INFO [train.py:1198] (0/2) Epoch 24, batch 400, loss[loss=0.2546, ctc_loss=0.14, cr_loss=0.3908, attn_decoder_loss=0.2587, over 29716.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1294, cr_loss=0.3725, attn_decoder_loss=0.247, over 5026817.28 frames. ], batch size: 82, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:47:25,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=417900.0, ans=10.0 2024-09-18 09:47:58,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=418020.0, ans=0.0 2024-09-18 09:48:00,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=418020.0, ans=0.125 2024-09-18 09:48:24,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=418060.0, ans=0.125 2024-09-18 09:48:32,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=418060.0, ans=0.0 2024-09-18 09:48:35,438 INFO [train.py:1198] (0/2) Epoch 24, batch 450, loss[loss=0.2491, ctc_loss=0.129, cr_loss=0.3675, attn_decoder_loss=0.2542, over 29706.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1294, cr_loss=0.3729, attn_decoder_loss=0.247, over 5189502.84 frames. 
], batch size: 83, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:48:37,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=418100.0, ans=0.125 2024-09-18 09:49:02,619 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.530e+01 9.135e+01 9.796e+01 4.658e+02, threshold=1.827e+02, percent-clipped=1.0 2024-09-18 09:49:24,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=418220.0, ans=0.05 2024-09-18 09:49:37,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=418260.0, ans=0.125 2024-09-18 09:49:50,901 INFO [train.py:1198] (0/2) Epoch 24, batch 500, loss[loss=0.2534, ctc_loss=0.1356, cr_loss=0.3822, attn_decoder_loss=0.258, over 29472.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1288, cr_loss=0.3717, attn_decoder_loss=0.2463, over 5332204.13 frames. 
], batch size: 94, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:50:00,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=418300.0, ans=0.0 2024-09-18 09:50:06,745 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:50:11,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=418340.0, ans=0.125 2024-09-18 09:50:18,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=418340.0, ans=0.125 2024-09-18 09:50:34,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=418380.0, ans=0.2 2024-09-18 09:50:41,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=418420.0, ans=0.125 2024-09-18 09:50:49,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=418420.0, ans=0.2 2024-09-18 09:51:05,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=418500.0, ans=0.125 2024-09-18 09:51:07,118 INFO [train.py:1198] (0/2) Epoch 24, batch 550, loss[loss=0.2666, ctc_loss=0.1456, cr_loss=0.4107, attn_decoder_loss=0.2709, over 28906.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1288, cr_loss=0.3713, attn_decoder_loss=0.2463, over 5424609.82 frames. 
], batch size: 104, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:51:07,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=418500.0, ans=0.2 2024-09-18 09:51:10,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=418500.0, ans=0.1 2024-09-18 09:51:34,481 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.569e+01 9.031e+01 9.630e+01 1.358e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-18 09:51:43,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=418580.0, ans=0.125 2024-09-18 09:51:59,547 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0 2024-09-18 09:52:03,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.43 vs. limit=22.5 2024-09-18 09:52:14,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-09-18 09:52:16,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=418660.0, ans=0.125 2024-09-18 09:52:27,743 INFO [train.py:1198] (0/2) Epoch 24, batch 600, loss[loss=0.2494, ctc_loss=0.1327, cr_loss=0.3802, attn_decoder_loss=0.2539, over 29268.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1293, cr_loss=0.372, attn_decoder_loss=0.2467, over 5512126.11 frames. 
], batch size: 100, lr: 4.67e-03, grad_scale: 8.0 2024-09-18 09:52:54,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=418740.0, ans=0.0 2024-09-18 09:53:29,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=418860.0, ans=0.0 2024-09-18 09:53:34,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=418860.0, ans=0.0 2024-09-18 09:53:36,317 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=12.0 2024-09-18 09:53:42,743 INFO [train.py:1198] (0/2) Epoch 24, batch 650, loss[loss=0.2549, ctc_loss=0.1413, cr_loss=0.4025, attn_decoder_loss=0.2586, over 29771.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1285, cr_loss=0.3705, attn_decoder_loss=0.2461, over 5588441.15 frames. ], batch size: 81, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 09:53:43,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.65 vs. limit=15.0 2024-09-18 09:53:49,512 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.02 vs. 
limit=10.0 2024-09-18 09:54:10,121 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.656e+01 8.941e+01 9.589e+01 2.067e+02, threshold=1.788e+02, percent-clipped=1.0 2024-09-18 09:54:16,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=418980.0, ans=0.125 2024-09-18 09:54:37,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=419020.0, ans=0.0 2024-09-18 09:54:42,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=419060.0, ans=0.0 2024-09-18 09:54:47,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=12.0 2024-09-18 09:54:49,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=419060.0, ans=0.02 2024-09-18 09:54:51,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=419060.0, ans=0.0 2024-09-18 09:54:58,685 INFO [train.py:1198] (0/2) Epoch 24, batch 700, loss[loss=0.2262, ctc_loss=0.1121, cr_loss=0.3439, attn_decoder_loss=0.2312, over 29538.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1297, cr_loss=0.3727, attn_decoder_loss=0.247, over 5638803.96 frames. 
], batch size: 76, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 09:55:02,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=419100.0, ans=0.025 2024-09-18 09:55:03,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=419100.0, ans=0.125 2024-09-18 09:55:33,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=419180.0, ans=0.125 2024-09-18 09:55:53,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.98 vs. limit=10.0 2024-09-18 09:56:17,350 INFO [train.py:1198] (0/2) Epoch 24, batch 750, loss[loss=0.2403, ctc_loss=0.1267, cr_loss=0.3668, attn_decoder_loss=0.2447, over 29717.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1297, cr_loss=0.3724, attn_decoder_loss=0.2464, over 5677730.35 frames. ], batch size: 82, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 09:56:17,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=419300.0, ans=0.125 2024-09-18 09:56:17,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419300.0, ans=0.125 2024-09-18 09:56:20,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=419300.0, ans=0.0 2024-09-18 09:56:37,588 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.62 vs. 
limit=22.5 2024-09-18 09:56:39,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=419340.0, ans=0.0 2024-09-18 09:56:46,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.693e+01 9.112e+01 9.779e+01 2.514e+02, threshold=1.822e+02, percent-clipped=3.0 2024-09-18 09:56:49,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=22.5 2024-09-18 09:57:22,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=419460.0, ans=0.125 2024-09-18 09:57:36,009 INFO [train.py:1198] (0/2) Epoch 24, batch 800, loss[loss=0.2246, ctc_loss=0.1178, cr_loss=0.3603, attn_decoder_loss=0.2284, over 29593.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1291, cr_loss=0.3718, attn_decoder_loss=0.2462, over 5708557.20 frames. ], batch size: 73, lr: 4.66e-03, grad_scale: 16.0 2024-09-18 09:57:45,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=419500.0, ans=0.125 2024-09-18 09:57:49,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=419540.0, ans=0.025 2024-09-18 09:58:52,056 INFO [train.py:1198] (0/2) Epoch 24, batch 850, loss[loss=0.246, ctc_loss=0.125, cr_loss=0.3583, attn_decoder_loss=0.2515, over 29687.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1288, cr_loss=0.3711, attn_decoder_loss=0.246, over 5738208.29 frames. 
], batch size: 89, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 09:59:21,101 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.516e+01 9.029e+01 9.587e+01 2.043e+02, threshold=1.806e+02, percent-clipped=2.0 2024-09-18 09:59:36,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=419820.0, ans=0.025 2024-09-18 09:59:37,488 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-09-18 09:59:40,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=419820.0, ans=0.125 2024-09-18 09:59:52,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=419860.0, ans=0.035 2024-09-18 10:00:00,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=419860.0, ans=0.1 2024-09-18 10:00:03,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=419860.0, ans=0.125 2024-09-18 10:00:08,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=419900.0, ans=0.125 2024-09-18 10:00:09,271 INFO [train.py:1198] (0/2) Epoch 24, batch 900, loss[loss=0.2297, ctc_loss=0.1175, cr_loss=0.3547, attn_decoder_loss=0.2343, over 29587.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1289, cr_loss=0.3711, attn_decoder_loss=0.2463, over 5742960.81 frames. 
], batch size: 73, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 10:00:22,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=419900.0, ans=0.0 2024-09-18 10:00:27,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=419900.0, ans=0.125 2024-09-18 10:00:32,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-18 10:00:35,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5 2024-09-18 10:00:37,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=419940.0, ans=0.0 2024-09-18 10:00:41,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=419940.0, ans=0.0 2024-09-18 10:00:47,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=419980.0, ans=0.2 2024-09-18 10:00:47,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=419980.0, ans=0.125 2024-09-18 10:00:55,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=419980.0, ans=0.0 2024-09-18 10:01:31,745 INFO [train.py:1198] (0/2) Epoch 24, batch 950, loss[loss=0.2171, ctc_loss=0.0974, cr_loss=0.3009, attn_decoder_loss=0.2237, over 29526.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1291, cr_loss=0.3711, attn_decoder_loss=0.2466, over 5743560.79 frames. 
], batch size: 74, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 10:01:32,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.32 vs. limit=12.0 2024-09-18 10:01:33,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=420100.0, ans=0.2 2024-09-18 10:01:50,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420140.0, ans=0.1 2024-09-18 10:02:00,914 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.680e+01 8.541e+01 9.200e+01 9.747e+01 2.326e+02, threshold=1.840e+02, percent-clipped=1.0 2024-09-18 10:02:18,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=420220.0, ans=0.125 2024-09-18 10:02:36,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=420260.0, ans=0.125 2024-09-18 10:02:47,906 INFO [train.py:1198] (0/2) Epoch 24, batch 1000, loss[loss=0.2398, ctc_loss=0.1268, cr_loss=0.3646, attn_decoder_loss=0.2442, over 29487.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1303, cr_loss=0.3731, attn_decoder_loss=0.2474, over 5738176.24 frames. ], batch size: 77, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 10:03:02,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=420340.0, ans=0.125 2024-09-18 10:03:07,251 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.47 vs. limit=15.0 2024-09-18 10:03:12,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.80 vs. 
limit=15.0 2024-09-18 10:03:15,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=420340.0, ans=0.125 2024-09-18 10:03:20,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=420380.0, ans=0.0 2024-09-18 10:03:20,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=420380.0, ans=0.0 2024-09-18 10:03:25,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=420380.0, ans=0.125 2024-09-18 10:03:44,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=420420.0, ans=0.125 2024-09-18 10:04:02,062 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=12.0 2024-09-18 10:04:03,941 INFO [train.py:1198] (0/2) Epoch 24, batch 1050, loss[loss=0.2483, ctc_loss=0.1352, cr_loss=0.3756, attn_decoder_loss=0.2525, over 29664.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1298, cr_loss=0.3721, attn_decoder_loss=0.2466, over 5745384.06 frames. ], batch size: 85, lr: 4.66e-03, grad_scale: 8.0 2024-09-18 10:04:21,166 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.91 vs. 
limit=15.0 2024-09-18 10:04:23,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=420540.0, ans=0.0 2024-09-18 10:04:35,588 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.418e+01 8.849e+01 9.632e+01 1.961e+02, threshold=1.770e+02, percent-clipped=1.0 2024-09-18 10:04:45,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=420580.0, ans=0.125 2024-09-18 10:04:53,560 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.36 vs. limit=22.5 2024-09-18 10:04:58,069 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.38 vs. limit=15.0 2024-09-18 10:05:12,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2024-09-18 10:05:22,855 INFO [train.py:1198] (0/2) Epoch 24, batch 1100, loss[loss=0.2269, ctc_loss=0.1185, cr_loss=0.3492, attn_decoder_loss=0.2312, over 29479.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1297, cr_loss=0.372, attn_decoder_loss=0.2465, over 5757411.86 frames. ], batch size: 78, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:05:33,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=420700.0, ans=0.2 2024-09-18 10:05:49,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.59 vs. 
limit=22.5 2024-09-18 10:06:08,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=420820.0, ans=0.125 2024-09-18 10:06:20,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=420820.0, ans=0.0 2024-09-18 10:06:39,953 INFO [train.py:1198] (0/2) Epoch 24, batch 1150, loss[loss=0.2401, ctc_loss=0.1312, cr_loss=0.3648, attn_decoder_loss=0.2441, over 29452.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1299, cr_loss=0.3722, attn_decoder_loss=0.2467, over 5755628.32 frames. ], batch size: 78, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:06:44,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=420900.0, ans=0.125 2024-09-18 10:07:08,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=420940.0, ans=10.0 2024-09-18 10:07:08,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=420940.0, ans=15.0 2024-09-18 10:07:09,200 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 8.363e+01 8.865e+01 9.557e+01 3.982e+02, threshold=1.773e+02, percent-clipped=2.0 2024-09-18 10:07:42,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=421060.0, ans=0.125 2024-09-18 10:07:50,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421060.0, ans=0.1 2024-09-18 10:07:58,432 INFO [train.py:1198] (0/2) Epoch 24, batch 1200, loss[loss=0.2321, ctc_loss=0.1111, cr_loss=0.3149, attn_decoder_loss=0.2386, over 29680.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1304, cr_loss=0.3732, attn_decoder_loss=0.2476, over 5748607.40 frames. 
], batch size: 85, lr: 4.65e-03, grad_scale: 16.0 2024-09-18 10:08:19,904 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-09-18 10:08:37,806 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0 2024-09-18 10:08:52,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=421220.0, ans=0.125 2024-09-18 10:09:00,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=421260.0, ans=0.125 2024-09-18 10:09:00,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=421260.0, ans=0.125 2024-09-18 10:09:10,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=421260.0, ans=0.0 2024-09-18 10:09:13,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=421260.0, ans=0.125 2024-09-18 10:09:16,336 INFO [train.py:1198] (0/2) Epoch 24, batch 1250, loss[loss=0.2584, ctc_loss=0.1443, cr_loss=0.3926, attn_decoder_loss=0.2624, over 29508.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1312, cr_loss=0.3755, attn_decoder_loss=0.2482, over 5776089.99 frames. 
], batch size: 92, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:09:21,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=421300.0, ans=0.125 2024-09-18 10:09:25,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421300.0, ans=0.1 2024-09-18 10:09:46,763 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.716e+01 9.105e+01 9.689e+01 1.606e+02, threshold=1.821e+02, percent-clipped=0.0 2024-09-18 10:10:08,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=421420.0, ans=15.0 2024-09-18 10:10:10,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=421420.0, ans=22.5 2024-09-18 10:10:19,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=421460.0, ans=0.125 2024-09-18 10:10:25,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421460.0, ans=0.1 2024-09-18 10:10:31,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=421500.0, ans=0.2 2024-09-18 10:10:32,048 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0 2024-09-18 10:10:32,443 INFO [train.py:1198] (0/2) Epoch 24, batch 1300, loss[loss=0.2524, ctc_loss=0.1363, cr_loss=0.3874, attn_decoder_loss=0.2566, over 28516.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1305, cr_loss=0.3743, attn_decoder_loss=0.2476, over 5779994.26 frames. 
], batch size: 112, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:10:49,968 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.70 vs. limit=22.5 2024-09-18 10:11:01,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=22.5 2024-09-18 10:11:17,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=421620.0, ans=0.025 2024-09-18 10:11:23,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=421620.0, ans=0.2 2024-09-18 10:11:31,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=421620.0, ans=0.1 2024-09-18 10:11:39,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2024-09-18 10:11:49,283 INFO [train.py:1198] (0/2) Epoch 24, batch 1350, loss[loss=0.2343, ctc_loss=0.1272, cr_loss=0.3766, attn_decoder_loss=0.2378, over 29770.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.13, cr_loss=0.3734, attn_decoder_loss=0.247, over 5798677.17 frames. 
], batch size: 81, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:11:59,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=421700.0, ans=0.2 2024-09-18 10:12:24,088 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.854e+01 9.285e+01 9.935e+01 1.189e+02, threshold=1.857e+02, percent-clipped=0.0 2024-09-18 10:12:24,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=421780.0, ans=0.125 2024-09-18 10:12:42,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=421820.0, ans=0.1 2024-09-18 10:12:50,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=421820.0, ans=0.125 2024-09-18 10:13:09,369 INFO [train.py:1198] (0/2) Epoch 24, batch 1400, loss[loss=0.2137, ctc_loss=0.111, cr_loss=0.3455, attn_decoder_loss=0.2175, over 29591.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1298, cr_loss=0.3733, attn_decoder_loss=0.2468, over 5809033.63 frames. 
], batch size: 69, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:13:12,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=421900.0, ans=0.0 2024-09-18 10:13:21,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=421900.0, ans=0.125 2024-09-18 10:13:31,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=421940.0, ans=0.125 2024-09-18 10:13:38,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=421980.0, ans=0.125 2024-09-18 10:13:51,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=421980.0, ans=0.125 2024-09-18 10:14:16,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=422060.0, ans=0.125 2024-09-18 10:14:24,951 INFO [train.py:1198] (0/2) Epoch 24, batch 1450, loss[loss=0.2598, ctc_loss=0.1433, cr_loss=0.4112, attn_decoder_loss=0.2636, over 29399.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1298, cr_loss=0.373, attn_decoder_loss=0.2473, over 5805151.75 frames. 
], batch size: 94, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:14:26,817 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:14:32,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=422100.0, ans=0.125 2024-09-18 10:14:38,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=422140.0, ans=0.0 2024-09-18 10:14:54,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=422180.0, ans=0.0 2024-09-18 10:14:55,265 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.531e+01 9.051e+01 9.633e+01 1.306e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-18 10:15:03,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=422180.0, ans=0.0 2024-09-18 10:15:06,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=422180.0, ans=0.125 2024-09-18 10:15:19,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=422220.0, ans=0.125 2024-09-18 10:15:39,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=422300.0, ans=0.125 2024-09-18 10:15:40,538 INFO [train.py:1198] (0/2) Epoch 24, batch 1500, loss[loss=0.252, ctc_loss=0.1379, cr_loss=0.3867, attn_decoder_loss=0.256, over 29644.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.13, cr_loss=0.3736, attn_decoder_loss=0.2477, over 5804915.64 frames. 
], batch size: 86, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:16:17,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=422380.0, ans=0.125 2024-09-18 10:16:36,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=422420.0, ans=0.05 2024-09-18 10:16:43,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=422420.0, ans=0.125 2024-09-18 10:16:52,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=22.5 2024-09-18 10:17:01,784 INFO [train.py:1198] (0/2) Epoch 24, batch 1550, loss[loss=0.2561, ctc_loss=0.143, cr_loss=0.4161, attn_decoder_loss=0.2595, over 29502.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1302, cr_loss=0.3734, attn_decoder_loss=0.2478, over 5780208.38 frames. ], batch size: 90, lr: 4.65e-03, grad_scale: 8.0 2024-09-18 10:17:19,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=9.14 vs. limit=15.0 2024-09-18 10:17:29,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2024-09-18 10:17:32,008 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 8.697e+01 9.200e+01 9.648e+01 4.928e+02, threshold=1.840e+02, percent-clipped=2.0 2024-09-18 10:17:36,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422580.0, ans=0.1 2024-09-18 10:17:54,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.99 vs. 
limit=22.5 2024-09-18 10:18:17,225 INFO [train.py:1198] (0/2) Epoch 24, batch 1600, loss[loss=0.2472, ctc_loss=0.1257, cr_loss=0.3581, attn_decoder_loss=0.2527, over 29665.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1307, cr_loss=0.374, attn_decoder_loss=0.2477, over 5763021.28 frames. ], batch size: 85, lr: 4.64e-03, grad_scale: 16.0 2024-09-18 10:18:18,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.13 vs. limit=12.0 2024-09-18 10:18:38,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=422740.0, ans=0.0 2024-09-18 10:18:38,644 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:18:46,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=12.0 2024-09-18 10:18:53,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=422780.0, ans=0.025 2024-09-18 10:19:15,546 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.89 vs. 
limit=22.5 2024-09-18 10:19:17,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422860.0, ans=0.1 2024-09-18 10:19:21,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422860.0, ans=0.1 2024-09-18 10:19:21,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=422860.0, ans=0.125 2024-09-18 10:19:28,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=422860.0, ans=0.0 2024-09-18 10:19:35,112 INFO [train.py:1198] (0/2) Epoch 24, batch 1650, loss[loss=0.2575, ctc_loss=0.1365, cr_loss=0.3915, attn_decoder_loss=0.2622, over 29725.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1303, cr_loss=0.3729, attn_decoder_loss=0.2474, over 5759119.10 frames. ], batch size: 89, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:19:38,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-09-18 10:19:42,128 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. 
limit=22.5 2024-09-18 10:19:53,590 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:20:00,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=422940.0, ans=0.125 2024-09-18 10:20:08,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.940e+01 8.438e+01 9.287e+01 9.952e+01 1.595e+02, threshold=1.857e+02, percent-clipped=0.0 2024-09-18 10:20:12,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.73 vs. limit=15.0 2024-09-18 10:20:26,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=423020.0, ans=0.0 2024-09-18 10:20:27,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=423020.0, ans=0.025 2024-09-18 10:20:32,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.70 vs. limit=10.0 2024-09-18 10:20:35,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=423020.0, ans=0.125 2024-09-18 10:20:50,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=423060.0, ans=0.125 2024-09-18 10:20:52,744 INFO [train.py:1198] (0/2) Epoch 24, batch 1700, loss[loss=0.2165, ctc_loss=0.1123, cr_loss=0.3359, attn_decoder_loss=0.2206, over 29575.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1298, cr_loss=0.3718, attn_decoder_loss=0.247, over 5780925.31 frames. 
], batch size: 69, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:20:56,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=423100.0, ans=0.125 2024-09-18 10:20:59,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=423100.0, ans=0.125 2024-09-18 10:21:21,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=423180.0, ans=0.025 2024-09-18 10:21:26,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=423180.0, ans=0.025 2024-09-18 10:21:31,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=423180.0, ans=0.025 2024-09-18 10:21:40,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=423220.0, ans=0.125 2024-09-18 10:22:08,371 INFO [train.py:1198] (0/2) Epoch 24, batch 1750, loss[loss=0.2197, ctc_loss=0.1127, cr_loss=0.3226, attn_decoder_loss=0.2244, over 29302.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1294, cr_loss=0.3713, attn_decoder_loss=0.2467, over 5788848.91 frames. 
], batch size: 67, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:22:13,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=423300.0, ans=0.125 2024-09-18 10:22:33,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=423340.0, ans=0.0 2024-09-18 10:22:39,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=423380.0, ans=0.125 2024-09-18 10:22:40,206 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.086e+01 8.547e+01 8.974e+01 9.351e+01 1.739e+02, threshold=1.795e+02, percent-clipped=0.0 2024-09-18 10:22:57,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=423420.0, ans=0.1 2024-09-18 10:23:13,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2024-09-18 10:23:19,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=423460.0, ans=0.0 2024-09-18 10:23:25,962 INFO [train.py:1198] (0/2) Epoch 24, batch 1800, loss[loss=0.2484, ctc_loss=0.1286, cr_loss=0.3747, attn_decoder_loss=0.2534, over 29682.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1295, cr_loss=0.3716, attn_decoder_loss=0.2468, over 5792001.08 frames. ], batch size: 83, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:24:20,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0 2024-09-18 10:24:43,910 INFO [train.py:1198] (0/2) Epoch 24, batch 1850, loss[loss=0.2576, ctc_loss=0.1377, cr_loss=0.3617, attn_decoder_loss=0.2628, over 29631.00 frames. 
], tot_loss[loss=0.2424, ctc_loss=0.1294, cr_loss=0.3713, attn_decoder_loss=0.2467, over 5796273.76 frames. ], batch size: 86, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:25:15,527 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.229e+01 8.631e+01 9.392e+01 8.263e+02, threshold=1.726e+02, percent-clipped=1.0 2024-09-18 10:25:17,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=423780.0, ans=0.125 2024-09-18 10:25:23,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=423780.0, ans=0.0 2024-09-18 10:25:24,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=423780.0, ans=0.125 2024-09-18 10:25:38,005 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.62 vs. limit=15.0 2024-09-18 10:25:56,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=423860.0, ans=0.0 2024-09-18 10:25:59,342 INFO [train.py:1198] (0/2) Epoch 24, batch 1900, loss[loss=0.2521, ctc_loss=0.1304, cr_loss=0.3786, attn_decoder_loss=0.2572, over 29716.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1298, cr_loss=0.3725, attn_decoder_loss=0.2474, over 5803860.40 frames. 
], batch size: 89, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:26:24,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=423940.0, ans=0.125 2024-09-18 10:26:36,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=423980.0, ans=0.025 2024-09-18 10:26:52,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten.whitening_limit, batch_count=424020.0, ans=15.0 2024-09-18 10:26:56,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=424020.0, ans=0.05 2024-09-18 10:27:11,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=424060.0, ans=0.07 2024-09-18 10:27:15,260 INFO [train.py:1198] (0/2) Epoch 24, batch 1950, loss[loss=0.2406, ctc_loss=0.1266, cr_loss=0.3796, attn_decoder_loss=0.2448, over 29448.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1306, cr_loss=0.3744, attn_decoder_loss=0.2486, over 5818807.57 frames. ], batch size: 78, lr: 4.64e-03, grad_scale: 8.0 2024-09-18 10:27:19,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=424100.0, ans=0.2 2024-09-18 10:27:27,611 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.83 vs. 
limit=15.0 2024-09-18 10:27:34,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=424140.0, ans=0.125 2024-09-18 10:27:34,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=424140.0, ans=0.125 2024-09-18 10:27:38,948 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:27:42,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0 2024-09-18 10:27:49,222 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 8.703e+01 9.158e+01 9.577e+01 1.650e+02, threshold=1.832e+02, percent-clipped=0.0 2024-09-18 10:27:58,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=15.0 2024-09-18 10:28:04,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.77 vs. limit=12.0 2024-09-18 10:28:21,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=424260.0, ans=0.05 2024-09-18 10:28:21,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=424260.0, ans=0.125 2024-09-18 10:28:34,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0 2024-09-18 10:28:35,225 INFO [train.py:1198] (0/2) Epoch 24, batch 2000, loss[loss=0.2216, ctc_loss=0.1126, cr_loss=0.3554, attn_decoder_loss=0.2258, over 29364.00 frames. 
], tot_loss[loss=0.2446, ctc_loss=0.1309, cr_loss=0.3749, attn_decoder_loss=0.2489, over 5795650.51 frames. ], batch size: 67, lr: 4.64e-03, grad_scale: 16.0 2024-09-18 10:28:43,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-09-18 10:29:17,260 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-09-18 10:29:22,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=424420.0, ans=0.0 2024-09-18 10:29:51,157 INFO [train.py:1198] (0/2) Epoch 24, batch 2050, loss[loss=0.2107, ctc_loss=0.1024, cr_loss=0.3271, attn_decoder_loss=0.2154, over 29439.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1301, cr_loss=0.3733, attn_decoder_loss=0.2477, over 5788134.05 frames. ], batch size: 70, lr: 4.63e-03, grad_scale: 8.0 2024-09-18 10:29:54,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=424500.0, ans=0.04949747468305833 2024-09-18 10:29:55,112 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.63 vs. 
limit=12.0 2024-09-18 10:30:08,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424540.0, ans=0.1 2024-09-18 10:30:24,692 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.469e+01 9.021e+01 9.794e+01 2.013e+02, threshold=1.804e+02, percent-clipped=1.0 2024-09-18 10:30:29,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=424580.0, ans=0.0 2024-09-18 10:30:50,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424660.0, ans=0.1 2024-09-18 10:31:01,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=424660.0, ans=0.125 2024-09-18 10:31:08,887 INFO [train.py:1198] (0/2) Epoch 24, batch 2100, loss[loss=0.2504, ctc_loss=0.1315, cr_loss=0.3965, attn_decoder_loss=0.2548, over 29761.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1296, cr_loss=0.373, attn_decoder_loss=0.2472, over 5800045.68 frames. 
], batch size: 81, lr: 4.63e-03, grad_scale: 8.0 2024-09-18 10:31:25,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424740.0, ans=0.1 2024-09-18 10:31:27,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424740.0, ans=0.1 2024-09-18 10:31:40,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=424780.0, ans=0.0 2024-09-18 10:31:50,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=424780.0, ans=0.0 2024-09-18 10:31:53,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=424820.0, ans=0.125 2024-09-18 10:32:05,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=424820.0, ans=0.125 2024-09-18 10:32:16,619 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:32:19,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424860.0, ans=0.1 2024-09-18 10:32:22,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=424860.0, ans=0.0 2024-09-18 10:32:25,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=424900.0, ans=0.2 2024-09-18 10:32:26,893 INFO [train.py:1198] (0/2) Epoch 24, batch 2150, loss[loss=0.2406, ctc_loss=0.1292, cr_loss=0.391, attn_decoder_loss=0.2443, over 29462.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1289, cr_loss=0.3721, attn_decoder_loss=0.2466, over 5815500.55 frames. 
], batch size: 78, lr: 4.63e-03, grad_scale: 8.0 2024-09-18 10:32:29,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-09-18 10:32:57,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=424980.0, ans=0.2 2024-09-18 10:33:00,317 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.375e+01 8.762e+01 9.510e+01 1.706e+02, threshold=1.752e+02, percent-clipped=0.0 2024-09-18 10:33:05,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424980.0, ans=0.1 2024-09-18 10:33:22,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=15.0 2024-09-18 10:33:27,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=425060.0, ans=0.0 2024-09-18 10:33:42,641 INFO [train.py:1198] (0/2) Epoch 24, batch 2200, loss[loss=0.2507, ctc_loss=0.1308, cr_loss=0.3765, attn_decoder_loss=0.2556, over 29627.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.129, cr_loss=0.372, attn_decoder_loss=0.2467, over 5811841.05 frames. ], batch size: 86, lr: 4.63e-03, grad_scale: 8.0 2024-09-18 10:33:46,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=425100.0, ans=0.05 2024-09-18 10:33:57,230 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=10.24 vs. 
limit=12.0 2024-09-18 10:34:16,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=425180.0, ans=0.0 2024-09-18 10:34:17,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=425180.0, ans=0.125 2024-09-18 10:34:19,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=425180.0, ans=0.125 2024-09-18 10:34:24,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=425180.0, ans=0.025 2024-09-18 10:34:52,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=425260.0, ans=0.125 2024-09-18 10:34:58,654 INFO [train.py:1198] (0/2) Epoch 24, batch 2250, loss[loss=0.2397, ctc_loss=0.1171, cr_loss=0.3319, attn_decoder_loss=0.246, over 29725.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1283, cr_loss=0.3709, attn_decoder_loss=0.2463, over 5811897.67 frames. ], batch size: 82, lr: 4.63e-03, grad_scale: 8.0 2024-09-18 10:35:30,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.51 vs. 
limit=5.0 2024-09-18 10:35:33,884 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.517e+01 9.003e+01 9.651e+01 2.176e+02, threshold=1.801e+02, percent-clipped=2.0 2024-09-18 10:35:43,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=425380.0, ans=0.125 2024-09-18 10:35:46,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=425420.0, ans=0.125 2024-09-18 10:35:52,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.69 vs. limit=10.0 2024-09-18 10:36:10,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=425460.0, ans=0.125 2024-09-18 10:36:12,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=425460.0, ans=0.2 2024-09-18 10:36:18,241 INFO [train.py:1198] (0/2) Epoch 24, batch 2300, loss[loss=0.2155, ctc_loss=0.111, cr_loss=0.3466, attn_decoder_loss=0.2194, over 29344.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1277, cr_loss=0.3696, attn_decoder_loss=0.2452, over 5797584.95 frames. 
], batch size: 71, lr: 4.63e-03, grad_scale: 8.0 2024-09-18 10:36:53,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=425580.0, ans=0.125 2024-09-18 10:36:55,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=425580.0, ans=0.125 2024-09-18 10:37:00,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=425580.0, ans=0.125 2024-09-18 10:37:34,641 INFO [train.py:1198] (0/2) Epoch 24, batch 2350, loss[loss=0.2527, ctc_loss=0.1366, cr_loss=0.388, attn_decoder_loss=0.257, over 29682.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1282, cr_loss=0.3706, attn_decoder_loss=0.2455, over 5803412.52 frames. ], batch size: 83, lr: 4.63e-03, grad_scale: 8.0 2024-09-18 10:37:51,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=425740.0, ans=0.125 2024-09-18 10:38:01,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=425740.0, ans=0.125 2024-09-18 10:38:07,919 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.476e+01 9.011e+01 9.684e+01 2.166e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-18 10:38:20,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=425820.0, ans=0.0 2024-09-18 10:38:29,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=425820.0, ans=0.0 2024-09-18 10:38:41,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=425860.0, ans=0.125 2024-09-18 10:38:50,527 INFO [train.py:1198] (0/2) Epoch 24, batch 2400, loss[loss=0.2341, ctc_loss=0.1169, 
cr_loss=0.3593, attn_decoder_loss=0.2391, over 29555.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1288, cr_loss=0.3714, attn_decoder_loss=0.2461, over 5807652.88 frames. ], batch size: 76, lr: 4.63e-03, grad_scale: 16.0 2024-09-18 10:39:38,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426020.0, ans=0.1 2024-09-18 10:39:52,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=426020.0, ans=0.125 2024-09-18 10:39:58,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426060.0, ans=0.1 2024-09-18 10:40:02,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. limit=6.0 2024-09-18 10:40:10,574 INFO [train.py:1198] (0/2) Epoch 24, batch 2450, loss[loss=0.239, ctc_loss=0.1186, cr_loss=0.3404, attn_decoder_loss=0.2448, over 29713.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1298, cr_loss=0.3734, attn_decoder_loss=0.2472, over 5785688.83 frames. ], batch size: 82, lr: 4.63e-03, grad_scale: 8.0 2024-09-18 10:40:11,372 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-18 10:40:14,308 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.99 vs. 
limit=15.0 2024-09-18 10:40:28,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=426140.0, ans=0.1 2024-09-18 10:40:28,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=426140.0, ans=0.5 2024-09-18 10:40:30,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426140.0, ans=0.1 2024-09-18 10:40:38,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=426140.0, ans=0.125 2024-09-18 10:40:45,374 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.992e+01 9.865e+01 1.103e+02 3.120e+02, threshold=1.973e+02, percent-clipped=1.0 2024-09-18 10:40:50,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426180.0, ans=0.1 2024-09-18 10:40:56,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=426220.0, ans=0.1 2024-09-18 10:41:26,495 INFO [train.py:1198] (0/2) Epoch 24, batch 2500, loss[loss=0.2507, ctc_loss=0.1403, cr_loss=0.394, attn_decoder_loss=0.2542, over 29622.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1303, cr_loss=0.375, attn_decoder_loss=0.2476, over 5795107.56 frames. ], batch size: 86, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:41:29,002 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.62 vs. 
limit=22.5 2024-09-18 10:41:40,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=426340.0, ans=0.0 2024-09-18 10:41:46,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=426340.0, ans=0.125 2024-09-18 10:41:47,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0 2024-09-18 10:42:35,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2024-09-18 10:42:41,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=426500.0, ans=0.09899494936611666 2024-09-18 10:42:45,179 INFO [train.py:1198] (0/2) Epoch 24, batch 2550, loss[loss=0.2164, ctc_loss=0.1073, cr_loss=0.3341, attn_decoder_loss=0.2211, over 29317.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1305, cr_loss=0.3753, attn_decoder_loss=0.2474, over 5797145.47 frames. 
], batch size: 67, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:42:48,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=426500.0, ans=0.125 2024-09-18 10:43:00,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=426540.0, ans=0.125 2024-09-18 10:43:19,607 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.669e+01 9.245e+01 9.655e+01 1.436e+02, threshold=1.849e+02, percent-clipped=0.0 2024-09-18 10:43:20,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=426580.0, ans=0.0 2024-09-18 10:43:32,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=426620.0, ans=0.125 2024-09-18 10:43:36,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=426620.0, ans=0.5 2024-09-18 10:44:01,779 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:44:03,056 INFO [train.py:1198] (0/2) Epoch 24, batch 2600, loss[loss=0.2464, ctc_loss=0.1288, cr_loss=0.3539, attn_decoder_loss=0.2516, over 29426.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1304, cr_loss=0.375, attn_decoder_loss=0.2475, over 5793207.93 frames. 
], batch size: 78, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:44:09,370 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:44:13,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=426700.0, ans=0.0 2024-09-18 10:44:15,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=426700.0, ans=0.125 2024-09-18 10:44:21,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=426740.0, ans=0.0 2024-09-18 10:44:24,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=426740.0, ans=0.035 2024-09-18 10:44:39,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=426780.0, ans=0.0 2024-09-18 10:44:59,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=426820.0, ans=0.0 2024-09-18 10:45:05,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=426860.0, ans=0.125 2024-09-18 10:45:08,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=426860.0, ans=0.125 2024-09-18 10:45:11,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=426860.0, ans=0.125 2024-09-18 10:45:18,307 INFO [train.py:1198] (0/2) Epoch 24, batch 2650, loss[loss=0.2706, ctc_loss=0.1545, cr_loss=0.4093, attn_decoder_loss=0.2744, over 29221.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1303, cr_loss=0.3749, attn_decoder_loss=0.2477, over 5799137.57 frames. 
], batch size: 100, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:45:29,915 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.94 vs. limit=15.0 2024-09-18 10:45:32,528 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2024-09-18 10:45:49,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=426980.0, ans=0.0 2024-09-18 10:45:53,135 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 8.422e+01 8.884e+01 9.489e+01 2.051e+02, threshold=1.777e+02, percent-clipped=1.0 2024-09-18 10:46:24,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=427060.0, ans=0.0 2024-09-18 10:46:35,889 INFO [train.py:1198] (0/2) Epoch 24, batch 2700, loss[loss=0.2431, ctc_loss=0.1179, cr_loss=0.3313, attn_decoder_loss=0.2496, over 29515.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1308, cr_loss=0.3752, attn_decoder_loss=0.2483, over 5795158.44 frames. 
], batch size: 87, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:46:52,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=427140.0, ans=0.2 2024-09-18 10:47:03,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=427140.0, ans=0.125 2024-09-18 10:47:29,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=427220.0, ans=0.0 2024-09-18 10:47:35,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=427260.0, ans=0.0 2024-09-18 10:47:38,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=427260.0, ans=0.0 2024-09-18 10:47:43,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=427260.0, ans=0.0 2024-09-18 10:47:54,353 INFO [train.py:1198] (0/2) Epoch 24, batch 2750, loss[loss=0.2427, ctc_loss=0.1337, cr_loss=0.3823, attn_decoder_loss=0.2463, over 29522.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1301, cr_loss=0.373, attn_decoder_loss=0.2471, over 5793811.76 frames. ], batch size: 75, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:48:02,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. 
limit=15.0 2024-09-18 10:48:03,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=427300.0, ans=0.125 2024-09-18 10:48:28,938 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.612e+01 9.140e+01 9.786e+01 3.109e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-18 10:48:33,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=427380.0, ans=0.0 2024-09-18 10:48:35,242 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:48:38,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=427420.0, ans=0.125 2024-09-18 10:48:48,141 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0 2024-09-18 10:49:02,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=427460.0, ans=0.95 2024-09-18 10:49:07,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=427460.0, ans=0.125 2024-09-18 10:49:10,160 INFO [train.py:1198] (0/2) Epoch 24, batch 2800, loss[loss=0.2697, ctc_loss=0.161, cr_loss=0.4071, attn_decoder_loss=0.2727, over 20269.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1308, cr_loss=0.3742, attn_decoder_loss=0.2475, over 5773574.38 frames. 
], batch size: 210, lr: 4.62e-03, grad_scale: 16.0 2024-09-18 10:49:13,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=427500.0, ans=0.125 2024-09-18 10:49:42,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=427580.0, ans=0.125 2024-09-18 10:49:52,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427580.0, ans=0.1 2024-09-18 10:49:55,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=427620.0, ans=0.125 2024-09-18 10:50:06,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=427620.0, ans=0.0 2024-09-18 10:50:17,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=427660.0, ans=0.125 2024-09-18 10:50:27,678 INFO [train.py:1198] (0/2) Epoch 24, batch 2850, loss[loss=0.2362, ctc_loss=0.1232, cr_loss=0.3846, attn_decoder_loss=0.2402, over 29486.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1309, cr_loss=0.3742, attn_decoder_loss=0.2477, over 5760524.14 frames. ], batch size: 77, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:50:57,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.86 vs. limit=22.5 2024-09-18 10:51:00,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.45 vs. 
limit=22.5 2024-09-18 10:51:04,121 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.609e+01 8.759e+01 9.407e+01 9.943e+01 3.710e+02, threshold=1.881e+02, percent-clipped=1.0 2024-09-18 10:51:04,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-09-18 10:51:10,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=427780.0, ans=0.025 2024-09-18 10:51:13,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=427820.0, ans=0.0 2024-09-18 10:51:40,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2024-09-18 10:51:42,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=427860.0, ans=0.0 2024-09-18 10:51:45,750 INFO [train.py:1198] (0/2) Epoch 24, batch 2900, loss[loss=0.2386, ctc_loss=0.121, cr_loss=0.3622, attn_decoder_loss=0.2437, over 29426.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1312, cr_loss=0.3754, attn_decoder_loss=0.2487, over 5786616.71 frames. 
], batch size: 79, lr: 4.62e-03, grad_scale: 8.0 2024-09-18 10:51:53,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=427900.0, ans=0.125 2024-09-18 10:52:24,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=427980.0, ans=0.0 2024-09-18 10:52:39,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=428020.0, ans=0.0 2024-09-18 10:52:45,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=428060.0, ans=0.125 2024-09-18 10:53:02,261 INFO [train.py:1198] (0/2) Epoch 24, batch 2950, loss[loss=0.2355, ctc_loss=0.1286, cr_loss=0.3676, attn_decoder_loss=0.2392, over 29522.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.13, cr_loss=0.373, attn_decoder_loss=0.2472, over 5782077.49 frames. ], batch size: 75, lr: 4.61e-03, grad_scale: 8.0 2024-09-18 10:53:07,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=428100.0, ans=0.2 2024-09-18 10:53:38,589 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.398e+01 8.942e+01 9.654e+01 3.446e+02, threshold=1.788e+02, percent-clipped=1.0 2024-09-18 10:53:42,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. 
limit=15.0 2024-09-18 10:54:09,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=428260.0, ans=0.0 2024-09-18 10:54:19,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=428300.0, ans=0.09899494936611666 2024-09-18 10:54:20,525 INFO [train.py:1198] (0/2) Epoch 24, batch 3000, loss[loss=0.2432, ctc_loss=0.1258, cr_loss=0.362, attn_decoder_loss=0.2482, over 29770.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1301, cr_loss=0.3733, attn_decoder_loss=0.2475, over 5782659.19 frames. ], batch size: 81, lr: 4.61e-03, grad_scale: 8.0 2024-09-18 10:54:20,526 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 10:54:39,002 INFO [train.py:1230] (0/2) Epoch 24, validation: loss=0.2118, ctc_loss=0.03891, cr_loss=5.525e-15, attn_decoder_loss=0.231, over 944034.00 frames. 2024-09-18 10:54:39,002 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 10:54:40,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=428300.0, ans=0.125 2024-09-18 10:54:50,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=428300.0, ans=0.125 2024-09-18 10:54:54,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.91 vs. 
limit=15.0 2024-09-18 10:55:11,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=428380.0, ans=0.0 2024-09-18 10:55:20,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=428380.0, ans=0.125 2024-09-18 10:55:24,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=428420.0, ans=0.125 2024-09-18 10:55:33,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=428420.0, ans=0.125 2024-09-18 10:55:51,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=428460.0, ans=0.2 2024-09-18 10:55:56,752 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.72 vs. limit=15.0 2024-09-18 10:55:57,318 INFO [train.py:1198] (0/2) Epoch 24, batch 3050, loss[loss=0.2443, ctc_loss=0.1301, cr_loss=0.3718, attn_decoder_loss=0.2488, over 29539.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1307, cr_loss=0.3744, attn_decoder_loss=0.2483, over 5775731.11 frames. 
], batch size: 76, lr: 4.61e-03, grad_scale: 8.0
2024-09-18 10:56:08,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=428500.0, ans=0.5
2024-09-18 10:56:21,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=428540.0, ans=0.0
2024-09-18 10:56:23,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=428540.0, ans=0.1
2024-09-18 10:56:33,592 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.271e+01 8.657e+01 9.220e+01 9.690e+01 1.587e+02, threshold=1.844e+02, percent-clipped=0.0
2024-09-18 10:56:40,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=428580.0, ans=0.125
2024-09-18 10:56:49,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=428620.0, ans=0.2
2024-09-18 10:57:13,018 INFO [train.py:1198] (0/2) Epoch 24, batch 3100, loss[loss=0.2591, ctc_loss=0.1414, cr_loss=0.4077, attn_decoder_loss=0.2631, over 29236.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1306, cr_loss=0.3746, attn_decoder_loss=0.248, over 5776390.23 frames. ], batch size: 100, lr: 4.61e-03, grad_scale: 8.0
2024-09-18 10:57:16,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=428700.0, ans=0.125
2024-09-18 10:57:40,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428740.0, ans=0.1
2024-09-18 10:58:04,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=428820.0, ans=0.0
2024-09-18 10:58:31,280 INFO [train.py:1198] (0/2) Epoch 24, batch 3150, loss[loss=0.2577, ctc_loss=0.1392, cr_loss=0.396, attn_decoder_loss=0.262, over 28925.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1306, cr_loss=0.3742, attn_decoder_loss=0.2482, over 5782684.56 frames. ], batch size: 104, lr: 4.61e-03, grad_scale: 8.0
2024-09-18 10:58:34,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428900.0, ans=0.1
2024-09-18 10:58:50,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.21 vs. limit=15.0
2024-09-18 10:58:56,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.80 vs. limit=10.0
2024-09-18 10:59:07,549 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.611e+01 9.043e+01 9.612e+01 2.237e+02, threshold=1.809e+02, percent-clipped=2.0
2024-09-18 10:59:28,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=429020.0, ans=0.125
2024-09-18 10:59:49,143 INFO [train.py:1198] (0/2) Epoch 24, batch 3200, loss[loss=0.2332, ctc_loss=0.1217, cr_loss=0.3486, attn_decoder_loss=0.2378, over 29758.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1301, cr_loss=0.3731, attn_decoder_loss=0.2476, over 5793189.46 frames. ], batch size: 80, lr: 4.61e-03, grad_scale: 16.0
2024-09-18 10:59:53,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=429100.0, ans=0.0
2024-09-18 10:59:56,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=429100.0, ans=0.07
2024-09-18 10:59:59,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=429100.0, ans=0.125
2024-09-18 11:00:01,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=12.0
2024-09-18 11:00:19,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=429180.0, ans=0.5
2024-09-18 11:00:35,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=429220.0, ans=0.0
2024-09-18 11:00:36,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=429220.0, ans=0.125
2024-09-18 11:00:48,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=429260.0, ans=0.05
2024-09-18 11:01:00,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=429260.0, ans=0.1
2024-09-18 11:01:04,861 INFO [train.py:1198] (0/2) Epoch 24, batch 3250, loss[loss=0.2392, ctc_loss=0.1206, cr_loss=0.3554, attn_decoder_loss=0.2445, over 29728.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1302, cr_loss=0.3737, attn_decoder_loss=0.2477, over 5800023.58 frames. ], batch size: 84, lr: 4.61e-03, grad_scale: 8.0
2024-09-18 11:01:06,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=429300.0, ans=0.025
2024-09-18 11:01:06,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429300.0, ans=0.1
2024-09-18 11:01:42,458 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.528e+01 8.996e+01 9.575e+01 1.279e+02, threshold=1.799e+02, percent-clipped=0.0
2024-09-18 11:02:12,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=429460.0, ans=0.2
2024-09-18 11:02:22,378 INFO [train.py:1198] (0/2) Epoch 24, batch 3300, loss[loss=0.2558, ctc_loss=0.1388, cr_loss=0.3839, attn_decoder_loss=0.2603, over 28349.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1297, cr_loss=0.3722, attn_decoder_loss=0.2466, over 5797962.65 frames. ], batch size: 112, lr: 4.61e-03, grad_scale: 8.0
2024-09-18 11:02:25,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.06 vs. limit=15.0
2024-09-18 11:02:45,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=429540.0, ans=0.125
2024-09-18 11:02:53,887 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.91 vs. limit=15.0
2024-09-18 11:02:54,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=429580.0, ans=0.05
2024-09-18 11:02:56,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=429580.0, ans=0.0
2024-09-18 11:02:57,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=429580.0, ans=0.125
2024-09-18 11:03:24,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=429660.0, ans=0.025
2024-09-18 11:03:34,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=429660.0, ans=0.2
2024-09-18 11:03:40,490 INFO [train.py:1198] (0/2) Epoch 24, batch 3350, loss[loss=0.2495, ctc_loss=0.1285, cr_loss=0.3667, attn_decoder_loss=0.2548, over 28837.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1302, cr_loss=0.373, attn_decoder_loss=0.2473, over 5774319.05 frames. ], batch size: 104, lr: 4.61e-03, grad_scale: 8.0
2024-09-18 11:03:53,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=429700.0, ans=0.025
2024-09-18 11:03:57,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=429740.0, ans=0.125
2024-09-18 11:04:03,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429740.0, ans=0.1
2024-09-18 11:04:10,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=429780.0, ans=0.0
2024-09-18 11:04:10,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0
2024-09-18 11:04:16,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=429780.0, ans=0.0
2024-09-18 11:04:16,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=429780.0, ans=0.125
2024-09-18 11:04:16,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.72 vs. limit=15.0
2024-09-18 11:04:18,811 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.492e+01 9.207e+01 9.979e+01 1.773e+02, threshold=1.841e+02, percent-clipped=0.0
2024-09-18 11:04:29,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=429820.0, ans=0.07
2024-09-18 11:04:30,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0
2024-09-18 11:04:35,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=429820.0, ans=0.125
2024-09-18 11:04:51,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.40 vs. limit=22.5
2024-09-18 11:04:55,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=429900.0, ans=0.125
2024-09-18 11:04:56,631 INFO [train.py:1198] (0/2) Epoch 24, batch 3400, loss[loss=0.2166, ctc_loss=0.1146, cr_loss=0.3412, attn_decoder_loss=0.2204, over 29313.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1303, cr_loss=0.3735, attn_decoder_loss=0.2475, over 5766875.38 frames. ], batch size: 67, lr: 4.61e-03, grad_scale: 8.0
2024-09-18 11:04:56,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=429900.0, ans=0.0
2024-09-18 11:05:02,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=429900.0, ans=0.1
2024-09-18 11:05:27,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=429980.0, ans=0.0
2024-09-18 11:05:40,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=429980.0, ans=0.07
2024-09-18 11:05:40,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=429980.0, ans=0.125
2024-09-18 11:05:41,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=429980.0, ans=0.5
2024-09-18 11:05:44,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=430020.0, ans=0.125
2024-09-18 11:05:46,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=430020.0, ans=0.2
2024-09-18 11:05:50,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=430020.0, ans=0.125
2024-09-18 11:05:52,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.63 vs. limit=15.0
2024-09-18 11:05:58,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430060.0, ans=0.1
2024-09-18 11:06:14,382 INFO [train.py:1198] (0/2) Epoch 24, batch 3450, loss[loss=0.2568, ctc_loss=0.1348, cr_loss=0.3872, attn_decoder_loss=0.2617, over 28221.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1306, cr_loss=0.3746, attn_decoder_loss=0.2482, over 5775274.65 frames. ], batch size: 111, lr: 4.60e-03, grad_scale: 8.0
2024-09-18 11:06:35,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=430140.0, ans=0.1
2024-09-18 11:06:47,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=430180.0, ans=0.2
2024-09-18 11:06:51,990 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.937e+01 8.449e+01 8.954e+01 9.468e+01 1.386e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-18 11:07:13,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=430220.0, ans=0.025
2024-09-18 11:07:19,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=430260.0, ans=0.0
2024-09-18 11:07:32,348 INFO [train.py:1198] (0/2) Epoch 24, batch 3500, loss[loss=0.2168, ctc_loss=0.1139, cr_loss=0.3357, attn_decoder_loss=0.2207, over 29329.00 frames. ], tot_loss[loss=0.2436, ctc_loss=0.1307, cr_loss=0.3748, attn_decoder_loss=0.2478, over 5777206.17 frames. ], batch size: 71, lr: 4.60e-03, grad_scale: 8.0
2024-09-18 11:07:46,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=430340.0, ans=0.1
2024-09-18 11:08:20,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=430420.0, ans=0.125
2024-09-18 11:08:28,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0
2024-09-18 11:08:31,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=430460.0, ans=0.1
2024-09-18 11:08:47,394 INFO [train.py:1198] (0/2) Epoch 24, batch 3550, loss[loss=0.2562, ctc_loss=0.1375, cr_loss=0.3997, attn_decoder_loss=0.2605, over 29725.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1302, cr_loss=0.3739, attn_decoder_loss=0.2474, over 5783532.74 frames. ], batch size: 89, lr: 4.60e-03, grad_scale: 8.0
2024-09-18 11:09:01,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=430540.0, ans=0.1
2024-09-18 11:09:09,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=430540.0, ans=0.0
2024-09-18 11:09:24,137 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 8.484e+01 9.073e+01 9.801e+01 1.561e+02, threshold=1.815e+02, percent-clipped=0.0
2024-09-18 11:09:24,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=430580.0, ans=0.0
2024-09-18 11:09:48,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=430660.0, ans=0.0
2024-09-18 11:10:00,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=22.5
2024-09-18 11:10:01,186 INFO [train.py:1198] (0/2) Epoch 24, batch 3600, loss[loss=0.2404, ctc_loss=0.1307, cr_loss=0.3777, attn_decoder_loss=0.2442, over 29504.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1296, cr_loss=0.3732, attn_decoder_loss=0.2471, over 5792393.72 frames. ], batch size: 77, lr: 4.60e-03, grad_scale: 16.0
2024-09-18 11:10:07,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=430700.0, ans=0.0
2024-09-18 11:10:08,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0
2024-09-18 11:10:20,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430740.0, ans=0.1
2024-09-18 11:10:25,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=430740.0, ans=0.2
2024-09-18 11:10:28,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=430740.0, ans=0.0
2024-09-18 11:10:33,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=430780.0, ans=0.125
2024-09-18 11:10:39,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=430780.0, ans=0.2
2024-09-18 11:10:43,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=430780.0, ans=0.125
2024-09-18 11:10:52,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=430820.0, ans=0.125
2024-09-18 11:11:16,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430900.0, ans=0.1
2024-09-18 11:11:17,346 INFO [train.py:1198] (0/2) Epoch 24, batch 3650, loss[loss=0.2535, ctc_loss=0.1461, cr_loss=0.3955, attn_decoder_loss=0.2566, over 29507.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.129, cr_loss=0.3716, attn_decoder_loss=0.2464, over 5793870.69 frames. ], batch size: 90, lr: 4.60e-03, grad_scale: 16.0
2024-09-18 11:11:29,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=430900.0, ans=0.0
2024-09-18 11:11:44,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=430940.0, ans=0.0
2024-09-18 11:11:56,436 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.409e+01 9.046e+01 9.842e+01 1.750e+02, threshold=1.809e+02, percent-clipped=0.0
2024-09-18 11:12:03,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=431020.0, ans=0.07
2024-09-18 11:12:15,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=431060.0, ans=0.0
2024-09-18 11:12:24,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=431060.0, ans=0.125
2024-09-18 11:12:32,073 INFO [train.py:1198] (0/2) Epoch 24, batch 3700, loss[loss=0.2445, ctc_loss=0.1311, cr_loss=0.3836, attn_decoder_loss=0.2486, over 29697.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1287, cr_loss=0.3711, attn_decoder_loss=0.2464, over 5804203.92 frames. ], batch size: 84, lr: 4.60e-03, grad_scale: 8.0
2024-09-18 11:12:53,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.76 vs. limit=15.0
2024-09-18 11:13:20,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=431220.0, ans=0.0
2024-09-18 11:13:48,722 INFO [train.py:1198] (0/2) Epoch 24, batch 3750, loss[loss=0.217, ctc_loss=0.1101, cr_loss=0.3528, attn_decoder_loss=0.221, over 29362.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.129, cr_loss=0.3714, attn_decoder_loss=0.2464, over 5807245.38 frames. ], batch size: 67, lr: 4.60e-03, grad_scale: 8.0
2024-09-18 11:13:51,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=431300.0, ans=0.125
2024-09-18 11:13:53,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=431300.0, ans=0.0
2024-09-18 11:13:55,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.68 vs. limit=22.5
2024-09-18 11:13:55,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=15.0
2024-09-18 11:13:57,044 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.04 vs. limit=15.0
2024-09-18 11:14:12,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=431340.0, ans=0.125
2024-09-18 11:14:20,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=431380.0, ans=0.125
2024-09-18 11:14:21,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=431380.0, ans=0.125
2024-09-18 11:14:27,338 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.348e+01 8.770e+01 9.473e+01 2.105e+02, threshold=1.754e+02, percent-clipped=1.0
2024-09-18 11:14:30,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=431380.0, ans=0.025
2024-09-18 11:15:03,203 INFO [train.py:1198] (0/2) Epoch 24, batch 3800, loss[loss=0.2574, ctc_loss=0.1386, cr_loss=0.4126, attn_decoder_loss=0.2615, over 29617.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1287, cr_loss=0.3714, attn_decoder_loss=0.2462, over 5798321.18 frames. ], batch size: 86, lr: 4.60e-03, grad_scale: 8.0
2024-09-18 11:15:21,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=431540.0, ans=0.125
2024-09-18 11:15:25,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=431540.0, ans=0.015
2024-09-18 11:15:32,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.66 vs. limit=22.5
2024-09-18 11:15:33,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=431580.0, ans=0.1
2024-09-18 11:16:14,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=431660.0, ans=0.2
2024-09-18 11:16:19,235 INFO [train.py:1198] (0/2) Epoch 24, batch 3850, loss[loss=0.262, ctc_loss=0.1457, cr_loss=0.4267, attn_decoder_loss=0.2654, over 29221.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1284, cr_loss=0.3709, attn_decoder_loss=0.246, over 5811907.24 frames. ], batch size: 100, lr: 4.60e-03, grad_scale: 8.0
2024-09-18 11:16:37,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=431740.0, ans=0.0
2024-09-18 11:16:46,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=431740.0, ans=0.125
2024-09-18 11:16:49,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=431780.0, ans=0.5
2024-09-18 11:16:52,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=431780.0, ans=0.0
2024-09-18 11:16:52,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=431780.0, ans=0.0
2024-09-18 11:16:57,861 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.427e+01 9.024e+01 9.626e+01 1.408e+02, threshold=1.805e+02, percent-clipped=0.0
2024-09-18 11:17:01,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=431780.0, ans=0.0
2024-09-18 11:17:21,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.51 vs. limit=15.0
2024-09-18 11:17:33,516 INFO [train.py:1198] (0/2) Epoch 24, batch 3900, loss[loss=0.2556, ctc_loss=0.1319, cr_loss=0.3758, attn_decoder_loss=0.2609, over 29619.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1289, cr_loss=0.3718, attn_decoder_loss=0.2468, over 5815656.59 frames. ], batch size: 86, lr: 4.59e-03, grad_scale: 8.0
2024-09-18 11:17:41,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431900.0, ans=0.1
2024-09-18 11:17:50,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=431940.0, ans=0.2
2024-09-18 11:17:57,421 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:18:00,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431940.0, ans=0.1
2024-09-18 11:18:03,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=431980.0, ans=0.2
2024-09-18 11:18:09,529 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-108000.pt
2024-09-18 11:18:19,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431980.0, ans=0.1
2024-09-18 11:18:20,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.34 vs. limit=10.0
2024-09-18 11:18:42,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.34 vs. limit=15.0
2024-09-18 11:18:52,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=432060.0, ans=0.2
2024-09-18 11:18:55,351 INFO [train.py:1198] (0/2) Epoch 24, batch 3950, loss[loss=0.2556, ctc_loss=0.146, cr_loss=0.4153, attn_decoder_loss=0.2585, over 29548.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1285, cr_loss=0.3715, attn_decoder_loss=0.2466, over 5835301.94 frames. ], batch size: 97, lr: 4.59e-03, grad_scale: 8.0
2024-09-18 11:19:20,693 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:19:28,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=432180.0, ans=0.0
2024-09-18 11:19:35,356 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.348e+01 8.902e+01 9.353e+01 3.258e+02, threshold=1.780e+02, percent-clipped=1.0
2024-09-18 11:19:40,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.43 vs. limit=15.0
2024-09-18 11:19:42,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=432220.0, ans=0.2
2024-09-18 11:20:08,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=432260.0, ans=0.125
2024-09-18 11:20:10,533 INFO [train.py:1198] (0/2) Epoch 24, batch 4000, loss[loss=0.232, ctc_loss=0.1221, cr_loss=0.3545, attn_decoder_loss=0.2364, over 29509.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.129, cr_loss=0.3718, attn_decoder_loss=0.2467, over 5813476.72 frames. ], batch size: 74, lr: 4.59e-03, grad_scale: 16.0
2024-09-18 11:20:17,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=22.5
2024-09-18 11:20:22,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.38 vs. limit=15.0
2024-09-18 11:20:29,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=432340.0, ans=0.09899494936611666
2024-09-18 11:20:38,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=432380.0, ans=0.0
2024-09-18 11:20:52,249 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:20:59,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=432420.0, ans=0.2
2024-09-18 11:21:21,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=432460.0, ans=0.125
2024-09-18 11:21:25,719 INFO [train.py:1198] (0/2) Epoch 24, batch 4050, loss[loss=0.2571, ctc_loss=0.1487, cr_loss=0.3752, attn_decoder_loss=0.2608, over 20136.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1286, cr_loss=0.3709, attn_decoder_loss=0.2463, over 5797914.52 frames. ], batch size: 210, lr: 4.59e-03, grad_scale: 8.0
2024-09-18 11:21:30,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0
2024-09-18 11:21:43,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=432540.0, ans=0.125
2024-09-18 11:21:45,411 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0
2024-09-18 11:21:49,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=432540.0, ans=0.125
2024-09-18 11:21:52,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=432540.0, ans=0.125
2024-09-18 11:22:05,482 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.379e+01 9.029e+01 9.565e+01 1.787e+02, threshold=1.806e+02, percent-clipped=1.0
2024-09-18 11:22:10,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=432620.0, ans=0.09899494936611666
2024-09-18 11:22:22,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432620.0, ans=0.1
2024-09-18 11:22:24,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0
2024-09-18 11:22:30,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=432660.0, ans=0.0
2024-09-18 11:22:39,423 INFO [train.py:1198] (0/2) Epoch 24, batch 4100, loss[loss=0.267, ctc_loss=0.1535, cr_loss=0.4323, attn_decoder_loss=0.27, over 29483.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.129, cr_loss=0.3719, attn_decoder_loss=0.2468, over 5792472.61 frames. ], batch size: 90, lr: 4.59e-03, grad_scale: 8.0
2024-09-18 11:22:47,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=432700.0, ans=0.07
2024-09-18 11:23:07,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=432780.0, ans=0.0
2024-09-18 11:23:15,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=432780.0, ans=0.125
2024-09-18 11:23:18,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=432780.0, ans=0.125
2024-09-18 11:23:38,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=432860.0, ans=0.125
2024-09-18 11:23:49,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=432860.0, ans=0.125
2024-09-18 11:23:54,706 INFO [train.py:1198] (0/2) Epoch 24, batch 4150, loss[loss=0.2352, ctc_loss=0.1294, cr_loss=0.3681, attn_decoder_loss=0.2387, over 29488.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1296, cr_loss=0.3724, attn_decoder_loss=0.247, over 5798126.50 frames. ], batch size: 77, lr: 4.59e-03, grad_scale: 8.0
2024-09-18 11:24:06,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=432900.0, ans=0.125
2024-09-18 11:24:14,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=432940.0, ans=0.0
2024-09-18 11:24:23,549 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.43 vs. limit=10.0
2024-09-18 11:24:34,387 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.706e+01 9.310e+01 1.000e+02 1.548e+02, threshold=1.862e+02, percent-clipped=0.0
2024-09-18 11:25:00,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.21 vs. limit=22.5
2024-09-18 11:25:04,426 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:25:04,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=433060.0, ans=0.125
2024-09-18 11:25:08,561 INFO [train.py:1198] (0/2) Epoch 24, batch 4200, loss[loss=0.2667, ctc_loss=0.1462, cr_loss=0.4112, attn_decoder_loss=0.2709, over 29507.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1298, cr_loss=0.373, attn_decoder_loss=0.2472, over 5801208.39 frames. ], batch size: 90, lr: 4.59e-03, grad_scale: 8.0
2024-09-18 11:25:42,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433180.0, ans=0.1
2024-09-18 11:25:48,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=433180.0, ans=0.0
2024-09-18 11:25:52,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=433220.0, ans=0.125
2024-09-18 11:26:12,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=433260.0, ans=0.2
2024-09-18 11:26:17,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=433260.0, ans=0.0
2024-09-18 11:26:21,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0
2024-09-18 11:26:23,505 INFO [train.py:1198] (0/2) Epoch 24, batch 4250, loss[loss=0.2259, ctc_loss=0.1151, cr_loss=0.3489, attn_decoder_loss=0.2305, over 29511.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1295, cr_loss=0.3728, attn_decoder_loss=0.2474, over 5806496.94 frames. ], batch size: 74, lr: 4.59e-03, grad_scale: 8.0
2024-09-18 11:26:35,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=433300.0, ans=0.0
2024-09-18 11:26:57,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=433380.0, ans=0.125
2024-09-18 11:27:03,541 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.615e+01 9.136e+01 9.637e+01 1.647e+02, threshold=1.827e+02, percent-clipped=0.0
2024-09-18 11:27:05,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=433380.0, ans=0.0
2024-09-18 11:27:27,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=433460.0, ans=0.125
2024-09-18 11:27:38,619 INFO [train.py:1198] (0/2) Epoch 24, batch 4300, loss[loss=0.2514, ctc_loss=0.1339, cr_loss=0.386, attn_decoder_loss=0.2559, over 29518.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1292, cr_loss=0.3719, attn_decoder_loss=0.2474, over 5795361.45 frames. ], batch size: 87, lr: 4.59e-03, grad_scale: 8.0
2024-09-18 11:27:41,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=433500.0, ans=0.125
2024-09-18 11:27:48,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0
2024-09-18 11:27:55,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=433540.0, ans=0.2
2024-09-18 11:28:04,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=433540.0, ans=0.0
2024-09-18 11:28:19,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=433580.0, ans=0.0
2024-09-18 11:28:19,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=433580.0, ans=0.125
2024-09-18 11:28:21,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=433580.0, ans=0.07
2024-09-18 11:28:44,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433660.0, ans=0.1
2024-09-18 11:28:54,031 INFO [train.py:1198] (0/2) Epoch 24, batch 4350, loss[loss=0.2623, ctc_loss=0.1493, cr_loss=0.4152, attn_decoder_loss=0.2656, over 29484.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1312, cr_loss=0.3764, attn_decoder_loss=0.2504, over 5798060.25 frames. ], batch size: 97, lr: 4.59e-03, grad_scale: 8.0
2024-09-18 11:29:03,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=433700.0, ans=0.0
2024-09-18 11:29:03,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.80 vs. limit=10.0
2024-09-18 11:29:08,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.86 vs.
limit=15.0 2024-09-18 11:29:26,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=433780.0, ans=0.0 2024-09-18 11:29:33,322 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.959e+01 8.974e+01 9.434e+01 1.011e+02 1.996e+02, threshold=1.887e+02, percent-clipped=1.0 2024-09-18 11:29:47,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-09-18 11:29:55,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=433860.0, ans=0.035 2024-09-18 11:30:05,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=433900.0, ans=0.0 2024-09-18 11:30:07,005 INFO [train.py:1198] (0/2) Epoch 24, batch 4400, loss[loss=0.2518, ctc_loss=0.141, cr_loss=0.4097, attn_decoder_loss=0.255, over 27469.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1323, cr_loss=0.3783, attn_decoder_loss=0.2521, over 5769411.00 frames. 
], batch size: 125, lr: 4.58e-03, grad_scale: 16.0 2024-09-18 11:30:15,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=433900.0, ans=0.125 2024-09-18 11:30:27,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=433940.0, ans=22.5 2024-09-18 11:30:28,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=433940.0, ans=0.04949747468305833 2024-09-18 11:30:38,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=433980.0, ans=0.0 2024-09-18 11:30:44,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=433980.0, ans=0.125 2024-09-18 11:31:00,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=434020.0, ans=0.04949747468305833 2024-09-18 11:31:21,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.14 vs. limit=10.0 2024-09-18 11:31:21,665 INFO [train.py:1198] (0/2) Epoch 24, batch 4450, loss[loss=0.2658, ctc_loss=0.1598, cr_loss=0.3752, attn_decoder_loss=0.2692, over 20223.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1364, cr_loss=0.3835, attn_decoder_loss=0.2544, over 5584158.32 frames. ], batch size: 210, lr: 4.58e-03, grad_scale: 8.0 2024-09-18 11:31:47,788 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. 
limit=15.0 2024-09-18 11:31:48,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=434140.0, ans=0.09899494936611666 2024-09-18 11:31:50,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=434180.0, ans=0.125 2024-09-18 11:32:04,072 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 9.021e+01 9.778e+01 1.211e+02 1.854e+02, threshold=1.956e+02, percent-clipped=0.0 2024-09-18 11:32:07,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=434220.0, ans=0.05 2024-09-18 11:32:15,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=434220.0, ans=0.2 2024-09-18 11:32:18,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=434220.0, ans=0.2 2024-09-18 11:32:21,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=434260.0, ans=0.125 2024-09-18 11:32:30,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=434260.0, ans=0.125 2024-09-18 11:32:37,517 INFO [train.py:1198] (0/2) Epoch 24, batch 4500, loss[loss=0.2705, ctc_loss=0.1671, cr_loss=0.4141, attn_decoder_loss=0.2728, over 20188.00 frames. ], tot_loss[loss=0.2531, ctc_loss=0.1411, cr_loss=0.3875, attn_decoder_loss=0.2569, over 5240242.20 frames. 
], batch size: 211, lr: 4.58e-03, grad_scale: 8.0 2024-09-18 11:33:14,949 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-24.pt 2024-09-18 11:34:00,534 INFO [train.py:1198] (0/2) Epoch 25, batch 0, loss[loss=0.2124, ctc_loss=0.09974, cr_loss=0.3218, attn_decoder_loss=0.2178, over 29624.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.09974, cr_loss=0.3218, attn_decoder_loss=0.2178, over 29624.00 frames. ], batch size: 73, lr: 4.49e-03, grad_scale: 16.0 2024-09-18 11:34:00,535 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 11:34:18,958 INFO [train.py:1230] (0/2) Epoch 25, validation: loss=0.2119, ctc_loss=0.03765, cr_loss=5.538e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-18 11:34:18,958 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 11:34:53,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=434480.0, ans=0.0 2024-09-18 11:34:56,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=434480.0, ans=0.125 2024-09-18 11:35:08,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.85 vs. limit=10.0 2024-09-18 11:35:18,762 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2024-09-18 11:35:36,620 INFO [train.py:1198] (0/2) Epoch 25, batch 50, loss[loss=0.222, ctc_loss=0.1096, cr_loss=0.3395, attn_decoder_loss=0.2269, over 29453.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1314, cr_loss=0.3783, attn_decoder_loss=0.2479, over 1268992.76 frames. 
], batch size: 70, lr: 4.49e-03, grad_scale: 8.0 2024-09-18 11:35:37,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.64 vs. limit=15.0 2024-09-18 11:35:41,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=434600.0, ans=0.0 2024-09-18 11:35:42,739 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.952e+01 1.043e+02 1.177e+02 2.373e+02, threshold=2.086e+02, percent-clipped=2.0 2024-09-18 11:35:55,826 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0 2024-09-18 11:36:23,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=434720.0, ans=0.0 2024-09-18 11:36:53,484 INFO [train.py:1198] (0/2) Epoch 25, batch 100, loss[loss=0.2381, ctc_loss=0.1196, cr_loss=0.3604, attn_decoder_loss=0.2432, over 29523.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1327, cr_loss=0.3801, attn_decoder_loss=0.2503, over 2253706.56 frames. 
], batch size: 76, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:37:01,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=434800.0, ans=0.1 2024-09-18 11:37:01,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=434800.0, ans=10.0 2024-09-18 11:37:08,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=434840.0, ans=0.125 2024-09-18 11:37:31,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=434880.0, ans=0.2 2024-09-18 11:37:44,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=434920.0, ans=0.09899494936611666 2024-09-18 11:37:55,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=434960.0, ans=0.125 2024-09-18 11:37:55,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=434960.0, ans=0.0 2024-09-18 11:37:58,189 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:38:02,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=434960.0, ans=0.0 2024-09-18 11:38:08,137 INFO [train.py:1198] (0/2) Epoch 25, batch 150, loss[loss=0.2211, ctc_loss=0.1186, cr_loss=0.3383, attn_decoder_loss=0.225, over 29418.00 frames. ], tot_loss[loss=0.2439, ctc_loss=0.1301, cr_loss=0.3745, attn_decoder_loss=0.2483, over 3048095.50 frames. 
], batch size: 70, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:38:08,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=435000.0, ans=0.0 2024-09-18 11:38:14,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.648e+01 9.269e+01 9.917e+01 1.697e+02, threshold=1.854e+02, percent-clipped=0.0 2024-09-18 11:38:14,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=435000.0, ans=0.1 2024-09-18 11:38:34,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=435040.0, ans=0.0 2024-09-18 11:38:41,780 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:39:05,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=435120.0, ans=0.125 2024-09-18 11:39:21,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=435160.0, ans=0.125 2024-09-18 11:39:24,084 INFO [train.py:1198] (0/2) Epoch 25, batch 200, loss[loss=0.2536, ctc_loss=0.138, cr_loss=0.3873, attn_decoder_loss=0.2578, over 27556.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1294, cr_loss=0.3731, attn_decoder_loss=0.247, over 3659578.51 frames. ], batch size: 125, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:39:31,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.52 vs. 
limit=15.0 2024-09-18 11:39:34,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435200.0, ans=0.1 2024-09-18 11:39:54,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=435240.0, ans=0.0 2024-09-18 11:40:00,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435280.0, ans=0.1 2024-09-18 11:40:15,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0 2024-09-18 11:40:37,418 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:40:42,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=22.5 2024-09-18 11:40:44,516 INFO [train.py:1198] (0/2) Epoch 25, batch 250, loss[loss=0.2486, ctc_loss=0.1294, cr_loss=0.3634, attn_decoder_loss=0.2538, over 29228.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1288, cr_loss=0.3722, attn_decoder_loss=0.2466, over 4142482.82 frames. 
], batch size: 100, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:40:46,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=435400.0, ans=0.2 2024-09-18 11:40:50,550 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.660e+01 8.459e+01 8.857e+01 9.365e+01 1.077e+02, threshold=1.771e+02, percent-clipped=0.0 2024-09-18 11:41:30,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=435520.0, ans=0.07 2024-09-18 11:41:38,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=435520.0, ans=0.0 2024-09-18 11:41:43,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.25 vs. limit=15.0 2024-09-18 11:41:49,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.87 vs. limit=15.0 2024-09-18 11:41:50,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=435560.0, ans=15.0 2024-09-18 11:41:56,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=435560.0, ans=0.125 2024-09-18 11:41:59,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.95 vs. limit=10.0 2024-09-18 11:42:00,695 INFO [train.py:1198] (0/2) Epoch 25, batch 300, loss[loss=0.2482, ctc_loss=0.1299, cr_loss=0.3776, attn_decoder_loss=0.2529, over 29507.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1289, cr_loss=0.3721, attn_decoder_loss=0.2466, over 4511644.29 frames. 
], batch size: 92, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:42:23,579 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:42:31,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.02 vs. limit=10.0 2024-09-18 11:42:36,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.06 vs. limit=12.0 2024-09-18 11:42:37,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=435680.0, ans=0.125 2024-09-18 11:43:08,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=435760.0, ans=0.2 2024-09-18 11:43:09,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.75 vs. limit=22.5 2024-09-18 11:43:15,964 INFO [train.py:1198] (0/2) Epoch 25, batch 350, loss[loss=0.2232, ctc_loss=0.111, cr_loss=0.3191, attn_decoder_loss=0.2286, over 29332.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1285, cr_loss=0.3712, attn_decoder_loss=0.2465, over 4796369.32 frames. 
], batch size: 71, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:43:21,920 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.434e+01 8.932e+01 9.530e+01 2.745e+02, threshold=1.786e+02, percent-clipped=1.0 2024-09-18 11:43:30,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=435800.0, ans=0.125 2024-09-18 11:43:39,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=435840.0, ans=0.0 2024-09-18 11:43:57,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.43 vs. limit=15.0 2024-09-18 11:44:27,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=435960.0, ans=0.125 2024-09-18 11:44:36,416 INFO [train.py:1198] (0/2) Epoch 25, batch 400, loss[loss=0.2586, ctc_loss=0.142, cr_loss=0.4138, attn_decoder_loss=0.2624, over 29720.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1282, cr_loss=0.3709, attn_decoder_loss=0.2463, over 5027159.78 frames. ], batch size: 82, lr: 4.48e-03, grad_scale: 16.0 2024-09-18 11:45:06,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.80 vs. limit=15.0 2024-09-18 11:45:26,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.70 vs. limit=8.0 2024-09-18 11:45:29,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=436120.0, ans=0.125 2024-09-18 11:45:32,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.17 vs. 
limit=22.5 2024-09-18 11:45:52,038 INFO [train.py:1198] (0/2) Epoch 25, batch 450, loss[loss=0.2438, ctc_loss=0.1333, cr_loss=0.3816, attn_decoder_loss=0.2476, over 29679.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1286, cr_loss=0.3716, attn_decoder_loss=0.2463, over 5188697.01 frames. ], batch size: 83, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:45:52,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=436200.0, ans=0.0 2024-09-18 11:45:59,436 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.625e+01 9.050e+01 9.660e+01 1.722e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-18 11:46:48,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=436320.0, ans=0.1 2024-09-18 11:47:07,835 INFO [train.py:1198] (0/2) Epoch 25, batch 500, loss[loss=0.2706, ctc_loss=0.1583, cr_loss=0.427, attn_decoder_loss=0.2736, over 29437.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1282, cr_loss=0.3712, attn_decoder_loss=0.2459, over 5330368.47 frames. ], batch size: 94, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:47:29,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.04 vs. limit=15.0 2024-09-18 11:48:08,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=436520.0, ans=0.0 2024-09-18 11:48:25,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=436560.0, ans=0.125 2024-09-18 11:48:28,229 INFO [train.py:1198] (0/2) Epoch 25, batch 550, loss[loss=0.2568, ctc_loss=0.1361, cr_loss=0.3926, attn_decoder_loss=0.2615, over 28858.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1277, cr_loss=0.3702, attn_decoder_loss=0.246, over 5422095.63 frames. 
], batch size: 104, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 11:48:33,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=436600.0, ans=0.2 2024-09-18 11:48:35,874 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.562e+01 9.108e+01 9.510e+01 4.336e+02, threshold=1.822e+02, percent-clipped=3.0 2024-09-18 11:48:56,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=436640.0, ans=0.125 2024-09-18 11:48:56,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.85 vs. limit=15.0 2024-09-18 11:49:05,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=436680.0, ans=0.0 2024-09-18 11:49:11,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=436680.0, ans=0.125 2024-09-18 11:49:14,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=436720.0, ans=0.125 2024-09-18 11:49:16,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=436720.0, ans=0.2 2024-09-18 11:49:27,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=436720.0, ans=15.0 2024-09-18 11:49:34,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=436760.0, ans=10.0 2024-09-18 11:49:45,358 INFO [train.py:1198] (0/2) Epoch 25, batch 600, loss[loss=0.2558, ctc_loss=0.1416, cr_loss=0.3816, attn_decoder_loss=0.26, over 29262.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1274, cr_loss=0.3699, attn_decoder_loss=0.2463, over 5509206.82 frames. 
], batch size: 100, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 11:49:48,849 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:49:56,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=436800.0, ans=0.125 2024-09-18 11:49:58,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=436840.0, ans=0.025 2024-09-18 11:50:13,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=436880.0, ans=0.125 2024-09-18 11:50:18,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=436880.0, ans=0.125 2024-09-18 11:50:37,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=436920.0, ans=0.2 2024-09-18 11:50:53,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=436960.0, ans=0.0 2024-09-18 11:50:56,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=436960.0, ans=0.05 2024-09-18 11:51:00,491 INFO [train.py:1198] (0/2) Epoch 25, batch 650, loss[loss=0.234, ctc_loss=0.1205, cr_loss=0.3645, attn_decoder_loss=0.2385, over 29772.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1267, cr_loss=0.3684, attn_decoder_loss=0.2456, over 5587379.03 frames. 
], batch size: 81, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 11:51:05,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=437000.0, ans=0.0 2024-09-18 11:51:08,131 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.416e+01 8.904e+01 9.509e+01 2.097e+02, threshold=1.781e+02, percent-clipped=1.0 2024-09-18 11:51:16,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=437040.0, ans=0.125 2024-09-18 11:51:22,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=437040.0, ans=0.125 2024-09-18 11:51:26,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=437040.0, ans=0.125 2024-09-18 11:51:31,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=437080.0, ans=0.125 2024-09-18 11:51:49,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=437120.0, ans=0.125 2024-09-18 11:51:55,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=437120.0, ans=0.125 2024-09-18 11:52:03,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=437120.0, ans=0.125 2024-09-18 11:52:05,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.54 vs. 
limit=15.0 2024-09-18 11:52:09,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=437160.0, ans=0.025 2024-09-18 11:52:21,114 INFO [train.py:1198] (0/2) Epoch 25, batch 700, loss[loss=0.2384, ctc_loss=0.1324, cr_loss=0.3713, attn_decoder_loss=0.2419, over 29536.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1275, cr_loss=0.3693, attn_decoder_loss=0.2461, over 5636449.29 frames. ], batch size: 76, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 11:52:27,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=437200.0, ans=0.0 2024-09-18 11:52:28,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=437200.0, ans=0.125 2024-09-18 11:52:43,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.77 vs. limit=15.0 2024-09-18 11:53:30,645 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.86 vs. limit=22.5 2024-09-18 11:53:33,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2024-09-18 11:53:37,357 INFO [train.py:1198] (0/2) Epoch 25, batch 750, loss[loss=0.2291, ctc_loss=0.1112, cr_loss=0.3316, attn_decoder_loss=0.2349, over 29710.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1274, cr_loss=0.369, attn_decoder_loss=0.2457, over 5674865.06 frames. ], batch size: 82, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 11:53:43,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. 
limit=15.0
2024-09-18 11:53:44,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.436e+01 8.901e+01 9.527e+01 2.571e+02, threshold=1.780e+02, percent-clipped=1.0
2024-09-18 11:54:07,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=437480.0, ans=0.125
2024-09-18 11:54:33,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0
2024-09-18 11:54:35,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=437520.0, ans=0.1
2024-09-18 11:54:53,473 INFO [train.py:1198] (0/2) Epoch 25, batch 800, loss[loss=0.2197, ctc_loss=0.1064, cr_loss=0.3124, attn_decoder_loss=0.2253, over 29597.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1274, cr_loss=0.3688, attn_decoder_loss=0.2457, over 5705268.78 frames. ], batch size: 73, lr: 4.47e-03, grad_scale: 16.0
2024-09-18 11:55:03,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=437600.0, ans=0.0
2024-09-18 11:55:04,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0
2024-09-18 11:55:06,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=437600.0, ans=0.125
2024-09-18 11:55:09,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=437640.0, ans=0.025
2024-09-18 11:55:35,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=437680.0, ans=0.0
2024-09-18 11:55:45,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=437720.0, ans=0.0
2024-09-18 11:55:52,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.38 vs. limit=22.5
2024-09-18 11:56:13,703 INFO [train.py:1198] (0/2) Epoch 25, batch 850, loss[loss=0.2416, ctc_loss=0.1165, cr_loss=0.3412, attn_decoder_loss=0.2479, over 29696.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1271, cr_loss=0.3686, attn_decoder_loss=0.2454, over 5734676.18 frames. ], batch size: 89, lr: 4.47e-03, grad_scale: 8.0
2024-09-18 11:56:22,488 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.420e+01 8.934e+01 9.567e+01 3.952e+02, threshold=1.787e+02, percent-clipped=1.0
2024-09-18 11:56:28,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=437840.0, ans=0.0
2024-09-18 11:56:36,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=437840.0, ans=0.0
2024-09-18 11:56:40,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=437840.0, ans=0.2
2024-09-18 11:57:00,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=437920.0, ans=0.125
2024-09-18 11:57:23,981 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.72 vs. limit=15.0
2024-09-18 11:57:29,246 INFO [train.py:1198] (0/2) Epoch 25, batch 900, loss[loss=0.2224, ctc_loss=0.1097, cr_loss=0.3312, attn_decoder_loss=0.2275, over 29645.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1278, cr_loss=0.3695, attn_decoder_loss=0.2459, over 5740099.17 frames. ], batch size: 73, lr: 4.47e-03, grad_scale: 8.0
2024-09-18 11:57:41,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=438000.0, ans=0.125
2024-09-18 11:57:52,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=438040.0, ans=0.125
2024-09-18 11:57:55,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=438040.0, ans=0.125
2024-09-18 11:58:01,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=438080.0, ans=0.0
2024-09-18 11:58:01,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=438080.0, ans=0.0
2024-09-18 11:58:11,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=438080.0, ans=0.125
2024-09-18 11:58:14,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=438120.0, ans=0.125
2024-09-18 11:58:22,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=438120.0, ans=0.025
2024-09-18 11:58:44,546 INFO [train.py:1198] (0/2) Epoch 25, batch 950, loss[loss=0.2245, ctc_loss=0.1153, cr_loss=0.342, attn_decoder_loss=0.229, over 29522.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1274, cr_loss=0.3685, attn_decoder_loss=0.2457, over 5743839.52 frames. ], batch size: 74, lr: 4.47e-03, grad_scale: 8.0
2024-09-18 11:58:53,518 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.515e+01 8.540e+01 9.168e+01 9.959e+01 1.680e+02, threshold=1.834e+02, percent-clipped=0.0
2024-09-18 11:59:13,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=438280.0, ans=0.2
2024-09-18 11:59:18,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=438280.0, ans=0.125
2024-09-18 11:59:54,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.69 vs. limit=10.0
2024-09-18 12:00:04,898 INFO [train.py:1198] (0/2) Epoch 25, batch 1000, loss[loss=0.2332, ctc_loss=0.1218, cr_loss=0.3643, attn_decoder_loss=0.2374, over 29494.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1284, cr_loss=0.3713, attn_decoder_loss=0.2467, over 5739432.10 frames. ], batch size: 77, lr: 4.47e-03, grad_scale: 8.0
2024-09-18 12:00:08,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=438400.0, ans=0.2
2024-09-18 12:00:09,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2024-09-18 12:00:15,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=438400.0, ans=0.125
2024-09-18 12:00:18,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=438440.0, ans=0.125
2024-09-18 12:00:34,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=438480.0, ans=0.95
2024-09-18 12:00:51,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.20 vs. limit=22.5
2024-09-18 12:00:52,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=438520.0, ans=0.125
2024-09-18 12:00:58,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=438520.0, ans=0.0
2024-09-18 12:01:20,738 INFO [train.py:1198] (0/2) Epoch 25, batch 1050, loss[loss=0.2428, ctc_loss=0.1231, cr_loss=0.3473, attn_decoder_loss=0.2484, over 29671.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1281, cr_loss=0.3707, attn_decoder_loss=0.2461, over 5746860.94 frames. ], batch size: 85, lr: 4.47e-03, grad_scale: 8.0
2024-09-18 12:01:25,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=438600.0, ans=0.1
2024-09-18 12:01:29,762 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.550e+01 9.112e+01 9.812e+01 2.455e+02, threshold=1.822e+02, percent-clipped=1.0
2024-09-18 12:01:30,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=438600.0, ans=0.125
2024-09-18 12:01:44,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.38 vs. limit=15.0
2024-09-18 12:01:49,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=438680.0, ans=0.125
2024-09-18 12:01:57,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=438680.0, ans=0.125
2024-09-18 12:02:17,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=438720.0, ans=0.1
2024-09-18 12:02:20,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=438760.0, ans=0.125
2024-09-18 12:02:23,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=438760.0, ans=0.025
2024-09-18 12:02:35,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=438800.0, ans=0.1
2024-09-18 12:02:36,548 INFO [train.py:1198] (0/2) Epoch 25, batch 1100, loss[loss=0.2395, ctc_loss=0.1227, cr_loss=0.3704, attn_decoder_loss=0.2442, over 29475.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1282, cr_loss=0.3712, attn_decoder_loss=0.2462, over 5759597.11 frames. ], batch size: 78, lr: 4.46e-03, grad_scale: 8.0
2024-09-18 12:02:36,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=438800.0, ans=0.125
2024-09-18 12:02:41,780 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.81 vs. limit=15.0
2024-09-18 12:02:53,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0
2024-09-18 12:02:55,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=438840.0, ans=0.125
2024-09-18 12:02:59,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=438840.0, ans=0.035
2024-09-18 12:02:59,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=438840.0, ans=0.025
2024-09-18 12:03:03,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=438840.0, ans=0.125
2024-09-18 12:03:08,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=438880.0, ans=0.125
2024-09-18 12:03:13,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=438880.0, ans=0.2
2024-09-18 12:03:13,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=438880.0, ans=0.0
2024-09-18 12:03:28,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=438920.0, ans=0.125
2024-09-18 12:03:50,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=438960.0, ans=0.0
2024-09-18 12:03:56,686 INFO [train.py:1198] (0/2) Epoch 25, batch 1150, loss[loss=0.2296, ctc_loss=0.1196, cr_loss=0.3646, attn_decoder_loss=0.2337, over 29435.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1282, cr_loss=0.3711, attn_decoder_loss=0.2458, over 5755390.58 frames. ], batch size: 78, lr: 4.46e-03, grad_scale: 8.0
2024-09-18 12:04:05,924 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.574e+01 9.064e+01 9.855e+01 2.778e+02, threshold=1.813e+02, percent-clipped=2.0
2024-09-18 12:04:08,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.61 vs. limit=12.0
2024-09-18 12:04:13,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=439040.0, ans=0.0
2024-09-18 12:04:43,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=439120.0, ans=0.0
2024-09-18 12:04:56,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=439160.0, ans=0.025
2024-09-18 12:05:13,553 INFO [train.py:1198] (0/2) Epoch 25, batch 1200, loss[loss=0.2502, ctc_loss=0.1347, cr_loss=0.384, attn_decoder_loss=0.2545, over 29672.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.129, cr_loss=0.3726, attn_decoder_loss=0.2466, over 5747811.89 frames. ], batch size: 85, lr: 4.46e-03, grad_scale: 16.0
2024-09-18 12:05:13,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=439200.0, ans=0.125
2024-09-18 12:05:48,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0
2024-09-18 12:06:01,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=439320.0, ans=0.125
2024-09-18 12:06:16,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=439360.0, ans=0.0
2024-09-18 12:06:24,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=439360.0, ans=0.0
2024-09-18 12:06:29,669 INFO [train.py:1198] (0/2) Epoch 25, batch 1250, loss[loss=0.2472, ctc_loss=0.1232, cr_loss=0.3687, attn_decoder_loss=0.2528, over 29536.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1294, cr_loss=0.3734, attn_decoder_loss=0.2473, over 5775494.57 frames. ], batch size: 92, lr: 4.46e-03, grad_scale: 8.0
2024-09-18 12:06:31,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=439400.0, ans=0.125
2024-09-18 12:06:40,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.697e+01 9.266e+01 9.820e+01 4.128e+02, threshold=1.853e+02, percent-clipped=2.0
2024-09-18 12:06:55,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=439440.0, ans=0.125
2024-09-18 12:07:03,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0
2024-09-18 12:07:04,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=439480.0, ans=0.125
2024-09-18 12:07:30,211 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.41 vs. limit=15.0
2024-09-18 12:07:46,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=439560.0, ans=0.2
2024-09-18 12:07:50,372 INFO [train.py:1198] (0/2) Epoch 25, batch 1300, loss[loss=0.2526, ctc_loss=0.1329, cr_loss=0.3888, attn_decoder_loss=0.2572, over 28238.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1291, cr_loss=0.3734, attn_decoder_loss=0.2468, over 5780532.04 frames. ], batch size: 111, lr: 4.46e-03, grad_scale: 8.0
2024-09-18 12:07:50,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=439600.0, ans=0.125
2024-09-18 12:08:07,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.20 vs. limit=22.5
2024-09-18 12:08:42,781 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.07 vs. limit=22.5
2024-09-18 12:09:06,235 INFO [train.py:1198] (0/2) Epoch 25, batch 1350, loss[loss=0.2508, ctc_loss=0.1352, cr_loss=0.3904, attn_decoder_loss=0.255, over 29755.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1289, cr_loss=0.3735, attn_decoder_loss=0.2466, over 5797388.83 frames. ], batch size: 81, lr: 4.46e-03, grad_scale: 8.0
2024-09-18 12:09:16,805 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.561e+01 9.293e+01 1.003e+02 2.081e+02, threshold=1.859e+02, percent-clipped=1.0
2024-09-18 12:09:28,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=439840.0, ans=0.125
2024-09-18 12:10:03,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=439920.0, ans=0.125
2024-09-18 12:10:21,586 INFO [train.py:1198] (0/2) Epoch 25, batch 1400, loss[loss=0.2128, ctc_loss=0.1093, cr_loss=0.3223, attn_decoder_loss=0.2171, over 29573.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1285, cr_loss=0.372, attn_decoder_loss=0.2463, over 5808018.07 frames. ], batch size: 69, lr: 4.46e-03, grad_scale: 8.0
2024-09-18 12:10:29,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=440000.0, ans=0.125
2024-09-18 12:10:32,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=440000.0, ans=10.0
2024-09-18 12:10:53,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=440080.0, ans=0.125
2024-09-18 12:11:12,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=440120.0, ans=0.1
2024-09-18 12:11:29,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=440160.0, ans=0.2
2024-09-18 12:11:30,383 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.24 vs. limit=15.0
2024-09-18 12:11:34,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=440160.0, ans=0.125
2024-09-18 12:11:38,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440160.0, ans=0.1
2024-09-18 12:11:38,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=440160.0, ans=0.2
2024-09-18 12:11:41,708 INFO [train.py:1198] (0/2) Epoch 25, batch 1450, loss[loss=0.2658, ctc_loss=0.1523, cr_loss=0.4289, attn_decoder_loss=0.2688, over 29428.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1288, cr_loss=0.3726, attn_decoder_loss=0.2468, over 5804225.06 frames. ], batch size: 94, lr: 4.46e-03, grad_scale: 8.0
2024-09-18 12:11:42,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0
2024-09-18 12:11:52,214 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.790e+01 8.701e+01 9.305e+01 9.884e+01 1.753e+02, threshold=1.861e+02, percent-clipped=0.0
2024-09-18 12:11:57,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440240.0, ans=0.1
2024-09-18 12:12:01,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=440240.0, ans=0.125
2024-09-18 12:12:08,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.40 vs. limit=22.5
2024-09-18 12:12:19,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=440280.0, ans=0.2
2024-09-18 12:12:21,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=440280.0, ans=0.02
2024-09-18 12:12:34,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=440320.0, ans=0.125
2024-09-18 12:12:43,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=440360.0, ans=10.0
2024-09-18 12:12:53,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0
2024-09-18 12:12:57,321 INFO [train.py:1198] (0/2) Epoch 25, batch 1500, loss[loss=0.2475, ctc_loss=0.1252, cr_loss=0.3688, attn_decoder_loss=0.2529, over 29630.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.129, cr_loss=0.3732, attn_decoder_loss=0.2474, over 5806151.31 frames. ], batch size: 86, lr: 4.46e-03, grad_scale: 8.0
2024-09-18 12:13:23,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=440440.0, ans=0.125
2024-09-18 12:13:46,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.77 vs. limit=15.0
2024-09-18 12:13:47,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=440520.0, ans=0.1
2024-09-18 12:13:49,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440520.0, ans=0.1
2024-09-18 12:13:50,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=440520.0, ans=0.0
2024-09-18 12:13:52,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=440520.0, ans=0.125
2024-09-18 12:13:55,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440520.0, ans=0.1
2024-09-18 12:13:57,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440560.0, ans=0.1
2024-09-18 12:14:10,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=440560.0, ans=0.0
2024-09-18 12:14:11,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=440600.0, ans=0.0
2024-09-18 12:14:13,207 INFO [train.py:1198] (0/2) Epoch 25, batch 1550, loss[loss=0.254, ctc_loss=0.1365, cr_loss=0.3838, attn_decoder_loss=0.2586, over 29535.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1292, cr_loss=0.373, attn_decoder_loss=0.2471, over 5782306.12 frames. ], batch size: 90, lr: 4.46e-03, grad_scale: 8.0
2024-09-18 12:14:16,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=440600.0, ans=0.0
2024-09-18 12:14:23,710 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.497e+01 9.186e+01 9.794e+01 2.835e+02, threshold=1.837e+02, percent-clipped=2.0
2024-09-18 12:14:35,291 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0
2024-09-18 12:14:41,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=440680.0, ans=0.0
2024-09-18 12:15:33,647 INFO [train.py:1198] (0/2) Epoch 25, batch 1600, loss[loss=0.2497, ctc_loss=0.1272, cr_loss=0.3527, attn_decoder_loss=0.2555, over 29689.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1298, cr_loss=0.3738, attn_decoder_loss=0.2474, over 5765450.97 frames. ], batch size: 85, lr: 4.45e-03, grad_scale: 16.0
2024-09-18 12:15:36,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0
2024-09-18 12:15:43,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=440800.0, ans=0.07
2024-09-18 12:15:53,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=440840.0, ans=0.125
2024-09-18 12:15:59,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=440840.0, ans=0.125
2024-09-18 12:16:07,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=440880.0, ans=0.125
2024-09-18 12:16:08,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=440880.0, ans=0.125
2024-09-18 12:16:33,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0
2024-09-18 12:16:36,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0
2024-09-18 12:16:49,308 INFO [train.py:1198] (0/2) Epoch 25, batch 1650, loss[loss=0.2504, ctc_loss=0.1352, cr_loss=0.3896, attn_decoder_loss=0.2546, over 29728.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1295, cr_loss=0.3733, attn_decoder_loss=0.2471, over 5758936.36 frames. ], batch size: 89, lr: 4.45e-03, grad_scale: 8.0
2024-09-18 12:17:01,297 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.636e+01 9.380e+01 1.005e+02 4.034e+02, threshold=1.876e+02, percent-clipped=3.0
2024-09-18 12:17:13,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441040.0, ans=0.1
2024-09-18 12:17:30,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=441080.0, ans=0.2
2024-09-18 12:17:36,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=441120.0, ans=0.0
2024-09-18 12:18:03,084 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=22.5
2024-09-18 12:18:04,970 INFO [train.py:1198] (0/2) Epoch 25, batch 1700, loss[loss=0.2153, ctc_loss=0.1143, cr_loss=0.3431, attn_decoder_loss=0.2189, over 29565.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1287, cr_loss=0.3719, attn_decoder_loss=0.2466, over 5780835.66 frames. ], batch size: 69, lr: 4.45e-03, grad_scale: 8.0
2024-09-18 12:18:20,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=441240.0, ans=0.125
2024-09-18 12:18:39,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441280.0, ans=0.1
2024-09-18 12:18:45,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=441280.0, ans=0.025
2024-09-18 12:18:48,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=441280.0, ans=0.125
2024-09-18 12:19:04,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=441320.0, ans=0.125
2024-09-18 12:19:25,315 INFO [train.py:1198] (0/2) Epoch 25, batch 1750, loss[loss=0.2121, ctc_loss=0.1113, cr_loss=0.3436, attn_decoder_loss=0.2157, over 29333.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1284, cr_loss=0.3716, attn_decoder_loss=0.2463, over 5789593.89 frames. ], batch size: 67, lr: 4.45e-03, grad_scale: 8.0
2024-09-18 12:19:37,503 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.435e+01 8.870e+01 9.715e+01 1.342e+02, threshold=1.774e+02, percent-clipped=0.0
2024-09-18 12:19:41,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0
2024-09-18 12:19:42,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=441440.0, ans=0.2
2024-09-18 12:20:14,396 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 12:20:20,633 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0
2024-09-18 12:20:41,483 INFO [train.py:1198] (0/2) Epoch 25, batch 1800, loss[loss=0.2574, ctc_loss=0.1394, cr_loss=0.4003, attn_decoder_loss=0.2616, over 29692.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1284, cr_loss=0.371, attn_decoder_loss=0.2464, over 5792895.72 frames. ], batch size: 83, lr: 4.45e-03, grad_scale: 8.0
2024-09-18 12:20:49,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=441600.0, ans=0.0
2024-09-18 12:21:06,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=22.5
2024-09-18 12:21:15,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0
2024-09-18 12:21:25,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=441720.0, ans=0.2
2024-09-18 12:21:27,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=441720.0, ans=0.0
2024-09-18 12:21:36,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=441720.0, ans=0.5
2024-09-18 12:21:45,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=441760.0, ans=0.125
2024-09-18 12:21:56,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=441800.0, ans=0.125
2024-09-18 12:21:57,727 INFO [train.py:1198] (0/2) Epoch 25, batch 1850, loss[loss=0.2457, ctc_loss=0.1278, cr_loss=0.3651, attn_decoder_loss=0.2507, over 29607.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1278, cr_loss=0.3703, attn_decoder_loss=0.2462, over 5798928.66 frames. ], batch size: 86, lr: 4.45e-03, grad_scale: 8.0
2024-09-18 12:22:06,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=441800.0, ans=0.1
2024-09-18 12:22:09,694 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 8.488e+01 8.939e+01 9.551e+01 1.184e+02, threshold=1.788e+02, percent-clipped=0.0
2024-09-18 12:22:34,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.77 vs. limit=10.0
2024-09-18 12:22:45,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=441920.0, ans=0.0
2024-09-18 12:22:52,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=12.0
2024-09-18 12:22:59,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0
2024-09-18 12:23:06,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=441960.0, ans=0.2
2024-09-18 12:23:11,411 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0
2024-09-18 12:23:12,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=441960.0, ans=0.125
2024-09-18 12:23:14,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=442000.0, ans=0.025
2024-09-18 12:23:15,236 INFO [train.py:1198] (0/2) Epoch 25, batch 1900, loss[loss=0.2552, ctc_loss=0.1347, cr_loss=0.3823, attn_decoder_loss=0.26, over 29717.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.128, cr_loss=0.3712, attn_decoder_loss=0.2467, over 5806835.88 frames. ], batch size: 89, lr: 4.45e-03, grad_scale: 8.0
2024-09-18 12:23:25,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=442000.0, ans=0.05
2024-09-18 12:23:33,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=442040.0, ans=0.125
2024-09-18 12:23:39,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=442040.0, ans=10.0
2024-09-18 12:23:52,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.73 vs. limit=15.0
2024-09-18 12:24:04,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.37 vs. limit=15.0
2024-09-18 12:24:21,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=442160.0, ans=0.0
2024-09-18 12:24:33,478 INFO [train.py:1198] (0/2) Epoch 25, batch 1950, loss[loss=0.2379, ctc_loss=0.1313, cr_loss=0.3931, attn_decoder_loss=0.2411, over 29447.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1287, cr_loss=0.3725, attn_decoder_loss=0.2479, over 5820961.61 frames. ], batch size: 78, lr: 4.45e-03, grad_scale: 8.0
2024-09-18 12:24:45,600 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 8.609e+01 9.254e+01 9.710e+01 4.424e+02, threshold=1.851e+02, percent-clipped=1.0
2024-09-18 12:24:47,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=442240.0, ans=0.125
2024-09-18 12:24:54,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=442240.0, ans=0.2
2024-09-18 12:25:01,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=442240.0, ans=0.2
2024-09-18 12:25:39,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=442360.0, ans=0.125
2024-09-18 12:25:41,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=442360.0, ans=22.5
2024-09-18 12:25:46,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=442360.0, ans=0.2
2024-09-18 12:25:49,611 INFO [train.py:1198] (0/2) Epoch 25, batch 2000, loss[loss=0.2191, ctc_loss=0.1159, cr_loss=0.3499, attn_decoder_loss=0.2228, over 29340.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1291, cr_loss=0.373, attn_decoder_loss=0.2481, over 5796983.38 frames. ], batch size: 67, lr: 4.45e-03, grad_scale: 16.0
2024-09-18 12:25:57,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=442400.0, ans=10.0
2024-09-18 12:26:18,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=442440.0, ans=0.0
2024-09-18 12:26:30,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=442480.0, ans=0.2
2024-09-18 12:26:32,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=442480.0, ans=0.0
2024-09-18 12:26:34,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=442480.0, ans=0.0
2024-09-18 12:26:34,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=442480.0, ans=0.125
2024-09-18 12:26:48,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442520.0, ans=0.1
2024-09-18 12:26:49,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=442520.0, ans=0.07
2024-09-18 12:26:54,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=442560.0, ans=0.0
2024-09-18 12:27:03,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=442560.0, ans=0.125
2024-09-18 12:27:03,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=442560.0, ans=0.0
2024-09-18 12:27:07,724 INFO [train.py:1198] (0/2) Epoch 25, batch 2050, loss[loss=0.22, ctc_loss=0.1139, cr_loss=0.3447, attn_decoder_loss=0.2242,
over 29420.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1288, cr_loss=0.3722, attn_decoder_loss=0.2474, over 5789234.25 frames. ], batch size: 70, lr: 4.45e-03, grad_scale: 8.0 2024-09-18 12:27:19,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=442600.0, ans=0.125 2024-09-18 12:27:23,554 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.313e+01 8.436e+01 8.905e+01 9.396e+01 1.982e+02, threshold=1.781e+02, percent-clipped=1.0 2024-09-18 12:27:29,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=442640.0, ans=0.0 2024-09-18 12:27:58,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=442720.0, ans=0.125 2024-09-18 12:28:12,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.54 vs. limit=22.5 2024-09-18 12:28:16,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.47 vs. limit=15.0 2024-09-18 12:28:25,816 INFO [train.py:1198] (0/2) Epoch 25, batch 2100, loss[loss=0.2481, ctc_loss=0.1348, cr_loss=0.3834, attn_decoder_loss=0.2521, over 29779.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1285, cr_loss=0.3713, attn_decoder_loss=0.2468, over 5801232.31 frames. ], batch size: 81, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:28:26,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.76 vs. 
limit=6.0 2024-09-18 12:28:35,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=442800.0, ans=0.95 2024-09-18 12:28:36,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=442800.0, ans=0.2 2024-09-18 12:28:44,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0 2024-09-18 12:28:47,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=442840.0, ans=0.125 2024-09-18 12:29:09,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=442920.0, ans=0.0 2024-09-18 12:29:26,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=442960.0, ans=0.125 2024-09-18 12:29:29,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=442960.0, ans=0.2 2024-09-18 12:29:32,382 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:29:38,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=442960.0, ans=0.2 2024-09-18 12:29:39,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=443000.0, ans=0.2 2024-09-18 12:29:41,142 INFO [train.py:1198] (0/2) Epoch 25, batch 2150, loss[loss=0.2368, ctc_loss=0.13, cr_loss=0.3788, attn_decoder_loss=0.2402, over 29457.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1274, cr_loss=0.3695, attn_decoder_loss=0.246, over 5816463.15 frames. 
], batch size: 78, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:29:41,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.94 vs. limit=10.0 2024-09-18 12:29:43,108 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:29:52,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=443000.0, ans=0.125 2024-09-18 12:29:54,895 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.437e+01 8.935e+01 9.622e+01 1.303e+02, threshold=1.787e+02, percent-clipped=0.0 2024-09-18 12:30:07,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=443040.0, ans=0.2 2024-09-18 12:30:18,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=443080.0, ans=0.1 2024-09-18 12:30:20,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=443080.0, ans=0.0 2024-09-18 12:30:35,082 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:30:53,790 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-09-18 12:30:59,574 INFO [train.py:1198] (0/2) Epoch 25, batch 2200, loss[loss=0.2508, ctc_loss=0.127, cr_loss=0.3521, attn_decoder_loss=0.2567, over 29628.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.128, cr_loss=0.3706, attn_decoder_loss=0.2463, over 5812665.67 frames. 
], batch size: 86, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:31:09,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=443200.0, ans=0.0 2024-09-18 12:31:54,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2024-09-18 12:31:59,815 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:32:16,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.95 vs. limit=22.5 2024-09-18 12:32:17,484 INFO [train.py:1198] (0/2) Epoch 25, batch 2250, loss[loss=0.2516, ctc_loss=0.131, cr_loss=0.3862, attn_decoder_loss=0.2565, over 29689.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1279, cr_loss=0.3707, attn_decoder_loss=0.2463, over 5811821.71 frames. ], batch size: 82, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:32:31,059 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 8.316e+01 8.883e+01 9.424e+01 4.658e+02, threshold=1.777e+02, percent-clipped=2.0 2024-09-18 12:33:03,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=12.0 2024-09-18 12:33:07,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=443520.0, ans=0.2 2024-09-18 12:33:15,990 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.85 vs. 
limit=15.0 2024-09-18 12:33:22,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=443560.0, ans=0.05 2024-09-18 12:33:33,023 INFO [train.py:1198] (0/2) Epoch 25, batch 2300, loss[loss=0.2241, ctc_loss=0.1141, cr_loss=0.3552, attn_decoder_loss=0.2284, over 29315.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.127, cr_loss=0.369, attn_decoder_loss=0.2454, over 5799523.79 frames. ], batch size: 71, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:33:46,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=443640.0, ans=0.0 2024-09-18 12:34:09,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=443680.0, ans=0.0 2024-09-18 12:34:28,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=443720.0, ans=0.125 2024-09-18 12:34:51,415 INFO [train.py:1198] (0/2) Epoch 25, batch 2350, loss[loss=0.2534, ctc_loss=0.1338, cr_loss=0.3852, attn_decoder_loss=0.2581, over 29691.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1273, cr_loss=0.3695, attn_decoder_loss=0.2456, over 5804728.64 frames. 
], batch size: 83, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:34:53,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=443800.0, ans=0.125 2024-09-18 12:35:04,930 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.160e+01 8.571e+01 9.088e+01 9.554e+01 1.522e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-18 12:35:27,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=443880.0, ans=0.125 2024-09-18 12:35:38,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-09-18 12:35:43,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.37 vs. limit=15.0 2024-09-18 12:35:59,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=443960.0, ans=0.125 2024-09-18 12:36:02,793 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:36:10,419 INFO [train.py:1198] (0/2) Epoch 25, batch 2400, loss[loss=0.2296, ctc_loss=0.1194, cr_loss=0.3475, attn_decoder_loss=0.2341, over 29518.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1277, cr_loss=0.3701, attn_decoder_loss=0.2458, over 5808367.93 frames. 
], batch size: 76, lr: 4.44e-03, grad_scale: 16.0 2024-09-18 12:36:13,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=444000.0, ans=0.05 2024-09-18 12:36:19,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=444000.0, ans=0.125 2024-09-18 12:36:43,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.17 vs. limit=22.5 2024-09-18 12:36:45,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=444080.0, ans=0.125 2024-09-18 12:36:59,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=444120.0, ans=0.0 2024-09-18 12:37:08,432 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:37:26,070 INFO [train.py:1198] (0/2) Epoch 25, batch 2450, loss[loss=0.2454, ctc_loss=0.1256, cr_loss=0.3734, attn_decoder_loss=0.2504, over 29720.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1283, cr_loss=0.3714, attn_decoder_loss=0.2468, over 5786358.89 frames. ], batch size: 82, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:37:34,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.83 vs. 
limit=15.0 2024-09-18 12:37:40,940 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.902e+01 9.594e+01 1.053e+02 2.320e+02, threshold=1.919e+02, percent-clipped=3.0 2024-09-18 12:38:07,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=444280.0, ans=0.125 2024-09-18 12:38:36,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=444360.0, ans=0.125 2024-09-18 12:38:43,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.10 vs. limit=15.0 2024-09-18 12:38:43,749 INFO [train.py:1198] (0/2) Epoch 25, batch 2500, loss[loss=0.2446, ctc_loss=0.1192, cr_loss=0.3593, attn_decoder_loss=0.2506, over 29614.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1285, cr_loss=0.3721, attn_decoder_loss=0.2468, over 5796785.94 frames. ], batch size: 86, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:39:19,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=444480.0, ans=0.125 2024-09-18 12:39:34,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=444520.0, ans=0.025 2024-09-18 12:39:35,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=444520.0, ans=0.125 2024-09-18 12:39:41,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=444520.0, ans=0.95 2024-09-18 12:39:45,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=444560.0, ans=0.1 2024-09-18 12:40:02,052 INFO [train.py:1198] (0/2) Epoch 25, batch 2550, loss[loss=0.2273, ctc_loss=0.1176, 
cr_loss=0.3568, attn_decoder_loss=0.2316, over 29355.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1281, cr_loss=0.3714, attn_decoder_loss=0.2466, over 5798937.52 frames. ], batch size: 67, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 12:40:17,178 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.248e+01 8.745e+01 9.244e+01 1.627e+02, threshold=1.749e+02, percent-clipped=0.0 2024-09-18 12:41:18,551 INFO [train.py:1198] (0/2) Epoch 25, batch 2600, loss[loss=0.2345, ctc_loss=0.1232, cr_loss=0.3658, attn_decoder_loss=0.2387, over 29449.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1282, cr_loss=0.3711, attn_decoder_loss=0.2469, over 5794332.44 frames. ], batch size: 78, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:41:22,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.20 vs. limit=15.0 2024-09-18 12:41:27,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=444800.0, ans=0.0 2024-09-18 12:41:44,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=444840.0, ans=0.125 2024-09-18 12:42:03,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=444880.0, ans=0.1 2024-09-18 12:42:15,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. 
limit=6.0 2024-09-18 12:42:24,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=444960.0, ans=0.95 2024-09-18 12:42:25,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=444960.0, ans=0.125 2024-09-18 12:42:36,073 INFO [train.py:1198] (0/2) Epoch 25, batch 2650, loss[loss=0.2559, ctc_loss=0.1412, cr_loss=0.4102, attn_decoder_loss=0.2595, over 29261.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1279, cr_loss=0.3711, attn_decoder_loss=0.2467, over 5801391.12 frames. ], batch size: 100, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:42:42,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=12.0 2024-09-18 12:42:48,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=445000.0, ans=0.1 2024-09-18 12:42:51,111 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.486e+01 8.913e+01 9.474e+01 1.768e+02, threshold=1.783e+02, percent-clipped=1.0 2024-09-18 12:43:02,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=445040.0, ans=15.0 2024-09-18 12:43:03,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=445040.0, ans=0.2 2024-09-18 12:43:04,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=445080.0, ans=0.0 2024-09-18 12:43:34,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=445120.0, ans=0.125 2024-09-18 12:43:38,724 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=445160.0, ans=0.025 2024-09-18 12:43:53,472 INFO [train.py:1198] (0/2) Epoch 25, batch 2700, loss[loss=0.2535, ctc_loss=0.1305, cr_loss=0.3848, attn_decoder_loss=0.2586, over 29517.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1288, cr_loss=0.3725, attn_decoder_loss=0.2474, over 5795328.70 frames. ], batch size: 87, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:44:26,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=445280.0, ans=0.0 2024-09-18 12:44:36,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=445280.0, ans=0.125 2024-09-18 12:44:43,070 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2024-09-18 12:45:02,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=445360.0, ans=0.1 2024-09-18 12:45:09,283 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=8.0 2024-09-18 12:45:09,582 INFO [train.py:1198] (0/2) Epoch 25, batch 2750, loss[loss=0.2308, ctc_loss=0.1182, cr_loss=0.3492, attn_decoder_loss=0.2355, over 29517.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1281, cr_loss=0.3713, attn_decoder_loss=0.2463, over 5793317.83 frames. 
], batch size: 75, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:45:13,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=445400.0, ans=0.0 2024-09-18 12:45:17,536 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:45:24,761 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.540e+01 9.041e+01 9.626e+01 3.086e+02, threshold=1.808e+02, percent-clipped=2.0 2024-09-18 12:45:35,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=445440.0, ans=0.0 2024-09-18 12:45:41,371 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=22.5 2024-09-18 12:45:49,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2024-09-18 12:45:58,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.67 vs. limit=22.5 2024-09-18 12:45:59,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=445520.0, ans=0.125 2024-09-18 12:46:04,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.37 vs. limit=22.5 2024-09-18 12:46:28,563 INFO [train.py:1198] (0/2) Epoch 25, batch 2800, loss[loss=0.2624, ctc_loss=0.1542, cr_loss=0.3821, attn_decoder_loss=0.266, over 19898.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1286, cr_loss=0.3719, attn_decoder_loss=0.2465, over 5772895.37 frames. 
], batch size: 210, lr: 4.43e-03, grad_scale: 16.0 2024-09-18 12:46:28,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=445600.0, ans=0.125 2024-09-18 12:46:33,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=445600.0, ans=0.125 2024-09-18 12:46:52,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=445640.0, ans=0.125 2024-09-18 12:47:10,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=445680.0, ans=0.07 2024-09-18 12:47:14,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=445720.0, ans=0.125 2024-09-18 12:47:17,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=445720.0, ans=0.125 2024-09-18 12:47:28,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=445720.0, ans=0.0 2024-09-18 12:47:46,147 INFO [train.py:1198] (0/2) Epoch 25, batch 2850, loss[loss=0.2244, ctc_loss=0.1178, cr_loss=0.356, attn_decoder_loss=0.2284, over 29505.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1287, cr_loss=0.3724, attn_decoder_loss=0.2467, over 5758423.92 frames. 
], batch size: 77, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:47:49,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=445800.0, ans=0.125 2024-09-18 12:47:50,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=445800.0, ans=0.2 2024-09-18 12:47:53,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445800.0, ans=0.1 2024-09-18 12:47:57,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=445800.0, ans=0.125 2024-09-18 12:48:02,747 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.566e+01 9.294e+01 9.797e+01 4.897e+02, threshold=1.859e+02, percent-clipped=4.0 2024-09-18 12:48:12,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=445840.0, ans=0.2 2024-09-18 12:48:35,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=445920.0, ans=0.025 2024-09-18 12:49:01,845 INFO [train.py:1198] (0/2) Epoch 25, batch 2900, loss[loss=0.243, ctc_loss=0.1356, cr_loss=0.3923, attn_decoder_loss=0.2462, over 29412.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1294, cr_loss=0.3744, attn_decoder_loss=0.248, over 5784675.55 frames. ], batch size: 79, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:49:03,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=446000.0, ans=0.0 2024-09-18 12:49:04,501 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.38 vs. 
limit=15.0 2024-09-18 12:49:09,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=446000.0, ans=0.2 2024-09-18 12:49:23,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=446040.0, ans=0.1 2024-09-18 12:49:45,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=446080.0, ans=0.125 2024-09-18 12:50:03,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=446160.0, ans=0.1 2024-09-18 12:50:06,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=446160.0, ans=0.125 2024-09-18 12:50:19,439 INFO [train.py:1198] (0/2) Epoch 25, batch 2950, loss[loss=0.2314, ctc_loss=0.1189, cr_loss=0.3632, attn_decoder_loss=0.2358, over 29501.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.128, cr_loss=0.3716, attn_decoder_loss=0.2465, over 5780423.70 frames. 
], batch size: 75, lr: 4.43e-03, grad_scale: 8.0 2024-09-18 12:50:19,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446200.0, ans=0.1 2024-09-18 12:50:22,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=446200.0, ans=0.0 2024-09-18 12:50:25,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=446200.0, ans=0.125 2024-09-18 12:50:25,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446200.0, ans=0.1 2024-09-18 12:50:34,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=446240.0, ans=0.125 2024-09-18 12:50:36,057 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.376e+01 8.898e+01 9.637e+01 1.288e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-18 12:50:41,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=446240.0, ans=0.0 2024-09-18 12:50:45,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=446240.0, ans=0.0 2024-09-18 12:50:52,030 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.75 vs. 
limit=15.0
2024-09-18 12:50:59,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=446280.0, ans=0.125
2024-09-18 12:51:07,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=446320.0, ans=0.95
2024-09-18 12:51:38,084 INFO [train.py:1198] (0/2) Epoch 25, batch 3000, loss[loss=0.2422, ctc_loss=0.1289, cr_loss=0.3786, attn_decoder_loss=0.2464, over 29730.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1278, cr_loss=0.3711, attn_decoder_loss=0.2464, over 5780386.62 frames. ], batch size: 81, lr: 4.43e-03, grad_scale: 8.0
2024-09-18 12:51:38,085 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 12:51:56,649 INFO [train.py:1230] (0/2) Epoch 25, validation: loss=0.2113, ctc_loss=0.03809, cr_loss=5.582e-15, attn_decoder_loss=0.2305, over 944034.00 frames.
2024-09-18 12:51:56,650 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-18 12:51:58,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=446400.0, ans=0.0
2024-09-18 12:52:01,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=446400.0, ans=0.125
2024-09-18 12:52:22,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=446440.0, ans=0.125
2024-09-18 12:52:24,352 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 12:52:34,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=446480.0, ans=0.125
2024-09-18 12:53:05,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.22 vs. limit=22.5
2024-09-18 12:53:12,532 INFO [train.py:1198] (0/2) Epoch 25, batch 3050, loss[loss=0.231, ctc_loss=0.119, cr_loss=0.3568, attn_decoder_loss=0.2355, over 29529.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1283, cr_loss=0.3727, attn_decoder_loss=0.2473, over 5774679.08 frames. ], batch size: 76, lr: 4.43e-03, grad_scale: 8.0
2024-09-18 12:53:18,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=446600.0, ans=0.125
2024-09-18 12:53:19,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=446600.0, ans=0.0
2024-09-18 12:53:31,788 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.617e+01 9.221e+01 9.973e+01 3.035e+02, threshold=1.844e+02, percent-clipped=2.0
2024-09-18 12:53:42,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=446640.0, ans=0.2
2024-09-18 12:53:54,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=446680.0, ans=0.125
2024-09-18 12:54:08,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=446720.0, ans=0.0
2024-09-18 12:54:10,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=446720.0, ans=0.035
2024-09-18 12:54:30,187 INFO [train.py:1198] (0/2) Epoch 25, batch 3100, loss[loss=0.2542, ctc_loss=0.1379, cr_loss=0.3973, attn_decoder_loss=0.2583, over 29311.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1283, cr_loss=0.3725, attn_decoder_loss=0.2469, over 5774939.09 frames. ], batch size: 100, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 12:54:36,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=446800.0, ans=0.125
2024-09-18 12:54:46,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=446840.0, ans=0.125
2024-09-18 12:55:01,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446880.0, ans=0.1
2024-09-18 12:55:05,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=446880.0, ans=0.125
2024-09-18 12:55:16,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=446920.0, ans=0.0
2024-09-18 12:55:21,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=446920.0, ans=0.125
2024-09-18 12:55:27,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=446920.0, ans=0.125
2024-09-18 12:55:28,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=446920.0, ans=0.125
2024-09-18 12:55:48,326 INFO [train.py:1198] (0/2) Epoch 25, batch 3150, loss[loss=0.2533, ctc_loss=0.1388, cr_loss=0.3788, attn_decoder_loss=0.2576, over 28882.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1285, cr_loss=0.3731, attn_decoder_loss=0.2472, over 5781386.18 frames. ], batch size: 104, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 12:56:05,069 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.834e+01 8.618e+01 9.043e+01 9.824e+01 1.542e+02, threshold=1.809e+02, percent-clipped=0.0
2024-09-18 12:56:22,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=447080.0, ans=0.025
2024-09-18 12:56:32,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=447120.0, ans=0.125
2024-09-18 12:56:32,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=447120.0, ans=0.0
2024-09-18 12:56:42,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=447120.0, ans=0.125
2024-09-18 12:56:46,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=447120.0, ans=0.125
2024-09-18 12:56:47,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447160.0, ans=0.1
2024-09-18 12:56:52,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0
2024-09-18 12:57:04,425 INFO [train.py:1198] (0/2) Epoch 25, batch 3200, loss[loss=0.2463, ctc_loss=0.1307, cr_loss=0.3501, attn_decoder_loss=0.2514, over 29392.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1284, cr_loss=0.3719, attn_decoder_loss=0.2468, over 5791620.51 frames. ], batch size: 79, lr: 4.42e-03, grad_scale: 16.0
2024-09-18 12:57:17,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=447200.0, ans=0.05
2024-09-18 12:57:35,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=447280.0, ans=0.025
2024-09-18 12:57:58,756 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 12:58:18,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=447360.0, ans=0.125
2024-09-18 12:58:22,740 INFO [train.py:1198] (0/2) Epoch 25, batch 3250, loss[loss=0.2478, ctc_loss=0.1275, cr_loss=0.3754, attn_decoder_loss=0.2528, over 29720.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1286, cr_loss=0.3721, attn_decoder_loss=0.247, over 5798919.73 frames. ], batch size: 84, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 12:58:23,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=447400.0, ans=0.0
2024-09-18 12:58:40,941 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.602e+01 9.212e+01 9.778e+01 1.600e+02, threshold=1.842e+02, percent-clipped=0.0
2024-09-18 12:58:47,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447440.0, ans=0.1
2024-09-18 12:58:49,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=447440.0, ans=0.2
2024-09-18 12:58:55,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=12.0
2024-09-18 12:58:56,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.56 vs. limit=15.0
2024-09-18 12:58:58,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447480.0, ans=0.1
2024-09-18 12:59:01,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=447480.0, ans=0.1
2024-09-18 12:59:10,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=447520.0, ans=0.0
2024-09-18 12:59:13,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=447520.0, ans=0.125
2024-09-18 12:59:21,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=447520.0, ans=0.125
2024-09-18 12:59:30,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447560.0, ans=0.1
2024-09-18 12:59:33,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=447560.0, ans=0.05
2024-09-18 12:59:40,969 INFO [train.py:1198] (0/2) Epoch 25, batch 3300, loss[loss=0.2429, ctc_loss=0.1221, cr_loss=0.3657, attn_decoder_loss=0.2482, over 28419.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1274, cr_loss=0.3699, attn_decoder_loss=0.2455, over 5797153.84 frames. ], batch size: 111, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 12:59:43,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=447600.0, ans=10.0
2024-09-18 13:00:05,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=447640.0, ans=0.0
2024-09-18 13:00:28,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=447720.0, ans=0.0
2024-09-18 13:00:34,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447720.0, ans=0.1
2024-09-18 13:00:37,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=447720.0, ans=0.0
2024-09-18 13:00:48,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=447760.0, ans=0.125
2024-09-18 13:00:58,871 INFO [train.py:1198] (0/2) Epoch 25, batch 3350, loss[loss=0.2529, ctc_loss=0.1391, cr_loss=0.3976, attn_decoder_loss=0.2567, over 28864.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.128, cr_loss=0.3713, attn_decoder_loss=0.2462, over 5773403.49 frames. ], batch size: 104, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 13:01:17,273 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.840e+01 9.298e+01 1.002e+02 3.178e+02, threshold=1.860e+02, percent-clipped=4.0
2024-09-18 13:01:19,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=447840.0, ans=22.5
2024-09-18 13:01:25,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=447840.0, ans=0.125
2024-09-18 13:01:27,458 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=12.0
2024-09-18 13:01:28,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=447880.0, ans=0.0
2024-09-18 13:01:35,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=447880.0, ans=0.125
2024-09-18 13:01:48,003 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:02:04,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=447960.0, ans=0.125
2024-09-18 13:02:13,999 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-112000.pt
2024-09-18 13:02:22,682 INFO [train.py:1198] (0/2) Epoch 25, batch 3400, loss[loss=0.2113, ctc_loss=0.1139, cr_loss=0.3381, attn_decoder_loss=0.2146, over 29380.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1284, cr_loss=0.3719, attn_decoder_loss=0.2463, over 5766485.63 frames. ], batch size: 67, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 13:02:22,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=448000.0, ans=0.125
2024-09-18 13:02:33,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=448000.0, ans=0.125
2024-09-18 13:02:46,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.10 vs. limit=15.0
2024-09-18 13:02:59,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=448080.0, ans=0.125
2024-09-18 13:03:02,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.67 vs. limit=15.0
2024-09-18 13:03:10,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=448120.0, ans=0.0
2024-09-18 13:03:17,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.18 vs. limit=15.0
2024-09-18 13:03:18,755 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:03:27,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=448160.0, ans=0.1
2024-09-18 13:03:30,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=448160.0, ans=0.2
2024-09-18 13:03:38,933 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.36 vs. limit=15.0
2024-09-18 13:03:40,799 INFO [train.py:1198] (0/2) Epoch 25, batch 3450, loss[loss=0.2586, ctc_loss=0.1349, cr_loss=0.3881, attn_decoder_loss=0.2637, over 28121.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1286, cr_loss=0.3723, attn_decoder_loss=0.2469, over 5773500.77 frames. ], batch size: 111, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 13:03:44,284 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:03:58,830 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.450e+01 9.075e+01 9.587e+01 1.383e+02, threshold=1.815e+02, percent-clipped=0.0
2024-09-18 13:04:06,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=22.5
2024-09-18 13:04:08,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=448240.0, ans=0.1
2024-09-18 13:04:31,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=448320.0, ans=0.035
2024-09-18 13:04:37,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=448320.0, ans=10.0
2024-09-18 13:04:47,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=448360.0, ans=0.125
2024-09-18 13:04:57,768 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:04:58,899 INFO [train.py:1198] (0/2) Epoch 25, batch 3500, loss[loss=0.2171, ctc_loss=0.1025, cr_loss=0.3181, attn_decoder_loss=0.2228, over 29333.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1281, cr_loss=0.3711, attn_decoder_loss=0.2462, over 5776705.94 frames. ], batch size: 71, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 13:05:14,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=448440.0, ans=0.125
2024-09-18 13:05:14,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=448440.0, ans=0.04949747468305833
2024-09-18 13:05:30,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5
2024-09-18 13:05:48,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=448520.0, ans=0.07
2024-09-18 13:05:54,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=448520.0, ans=0.125
2024-09-18 13:05:55,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=448520.0, ans=0.1
2024-09-18 13:06:13,960 INFO [train.py:1198] (0/2) Epoch 25, batch 3550, loss[loss=0.2577, ctc_loss=0.1353, cr_loss=0.3942, attn_decoder_loss=0.2625, over 29706.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1279, cr_loss=0.371, attn_decoder_loss=0.2462, over 5784502.79 frames. ], batch size: 89, lr: 4.42e-03, grad_scale: 8.0
2024-09-18 13:06:15,725 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:06:21,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=448600.0, ans=0.0
2024-09-18 13:06:28,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=448640.0, ans=0.125
2024-09-18 13:06:31,448 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 8.657e+01 9.167e+01 9.744e+01 2.782e+02, threshold=1.833e+02, percent-clipped=1.0
2024-09-18 13:06:34,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=448640.0, ans=0.0
2024-09-18 13:07:04,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=448720.0, ans=0.0
2024-09-18 13:07:07,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=448720.0, ans=0.025
2024-09-18 13:07:16,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=448760.0, ans=0.2
2024-09-18 13:07:24,533 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.07 vs. limit=15.0
2024-09-18 13:07:28,554 INFO [train.py:1198] (0/2) Epoch 25, batch 3600, loss[loss=0.2367, ctc_loss=0.123, cr_loss=0.3573, attn_decoder_loss=0.2414, over 29478.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1276, cr_loss=0.3709, attn_decoder_loss=0.2462, over 5793848.34 frames. ], batch size: 77, lr: 4.41e-03, grad_scale: 16.0
2024-09-18 13:07:41,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=448800.0, ans=0.0
2024-09-18 13:08:03,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=448880.0, ans=0.0
2024-09-18 13:08:05,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=448880.0, ans=0.125
2024-09-18 13:08:18,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=448920.0, ans=0.125
2024-09-18 13:08:21,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=448920.0, ans=0.125
2024-09-18 13:08:23,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=448920.0, ans=0.5
2024-09-18 13:08:30,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=448960.0, ans=0.0
2024-09-18 13:08:33,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=448960.0, ans=0.125
2024-09-18 13:08:39,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=448960.0, ans=0.2
2024-09-18 13:08:44,785 INFO [train.py:1198] (0/2) Epoch 25, batch 3650, loss[loss=0.2662, ctc_loss=0.1461, cr_loss=0.4305, attn_decoder_loss=0.27, over 29509.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1274, cr_loss=0.3698, attn_decoder_loss=0.2459, over 5795317.69 frames. ], batch size: 90, lr: 4.41e-03, grad_scale: 16.0
2024-09-18 13:08:53,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=449000.0, ans=22.5
2024-09-18 13:08:55,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=449000.0, ans=0.07
2024-09-18 13:09:02,518 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.539e+01 8.955e+01 9.424e+01 1.447e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-18 13:09:18,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.11 vs. limit=22.5
2024-09-18 13:09:41,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=449120.0, ans=0.2
2024-09-18 13:09:44,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=449160.0, ans=0.125
2024-09-18 13:09:59,592 INFO [train.py:1198] (0/2) Epoch 25, batch 3700, loss[loss=0.261, ctc_loss=0.1398, cr_loss=0.3904, attn_decoder_loss=0.2658, over 29712.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1275, cr_loss=0.3703, attn_decoder_loss=0.2462, over 5805356.94 frames. ], batch size: 84, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:09:59,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=449200.0, ans=0.0
2024-09-18 13:10:16,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=449240.0, ans=0.025
2024-09-18 13:10:16,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=449240.0, ans=0.0
2024-09-18 13:10:18,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=449240.0, ans=0.1
2024-09-18 13:10:22,927 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.24 vs. limit=15.0
2024-09-18 13:10:31,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=449280.0, ans=0.125
2024-09-18 13:10:44,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449320.0, ans=0.1
2024-09-18 13:10:56,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=449320.0, ans=0.2
2024-09-18 13:11:15,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=449400.0, ans=0.0
2024-09-18 13:11:16,099 INFO [train.py:1198] (0/2) Epoch 25, batch 3750, loss[loss=0.2194, ctc_loss=0.1201, cr_loss=0.3559, attn_decoder_loss=0.2225, over 29358.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1278, cr_loss=0.3709, attn_decoder_loss=0.2461, over 5808679.54 frames. ], batch size: 67, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:11:25,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=449400.0, ans=0.125
2024-09-18 13:11:25,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=449400.0, ans=0.125
2024-09-18 13:11:26,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=449400.0, ans=0.0
2024-09-18 13:11:31,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=449440.0, ans=0.1
2024-09-18 13:11:35,578 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.392e+01 8.983e+01 9.467e+01 5.174e+02, threshold=1.797e+02, percent-clipped=1.0
2024-09-18 13:11:37,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.08 vs. limit=22.5
2024-09-18 13:11:43,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=449440.0, ans=0.125
2024-09-18 13:11:58,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=15.0
2024-09-18 13:12:20,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=449560.0, ans=0.125
2024-09-18 13:12:22,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=449560.0, ans=0.0
2024-09-18 13:12:23,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=449560.0, ans=0.2
2024-09-18 13:12:26,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=449560.0, ans=0.125
2024-09-18 13:12:28,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=449560.0, ans=0.95
2024-09-18 13:12:31,189 INFO [train.py:1198] (0/2) Epoch 25, batch 3800, loss[loss=0.246, ctc_loss=0.124, cr_loss=0.3757, attn_decoder_loss=0.2512, over 29606.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1274, cr_loss=0.3699, attn_decoder_loss=0.2457, over 5800381.58 frames. ], batch size: 86, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:12:34,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=449600.0, ans=0.125
2024-09-18 13:12:36,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=449600.0, ans=0.125
2024-09-18 13:12:58,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.08 vs. limit=10.0
2024-09-18 13:13:19,498 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.80 vs. limit=10.0
2024-09-18 13:13:35,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=449760.0, ans=0.125
2024-09-18 13:13:47,363 INFO [train.py:1198] (0/2) Epoch 25, batch 3850, loss[loss=0.2638, ctc_loss=0.1488, cr_loss=0.4071, attn_decoder_loss=0.2675, over 29272.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1272, cr_loss=0.3698, attn_decoder_loss=0.2454, over 5815269.13 frames. ], batch size: 100, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:13:47,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=449800.0, ans=0.125
2024-09-18 13:13:53,896 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=15.0
2024-09-18 13:14:06,411 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.692e+01 9.184e+01 9.971e+01 1.957e+02, threshold=1.837e+02, percent-clipped=1.0
2024-09-18 13:14:17,358 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 13:14:24,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=449880.0, ans=0.125
2024-09-18 13:14:31,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=449920.0, ans=0.125
2024-09-18 13:14:33,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=12.0
2024-09-18 13:14:37,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.18 vs. limit=10.0
2024-09-18 13:15:02,115 INFO [train.py:1198] (0/2) Epoch 25, batch 3900, loss[loss=0.2513, ctc_loss=0.1365, cr_loss=0.3823, attn_decoder_loss=0.2556, over 29631.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1276, cr_loss=0.3707, attn_decoder_loss=0.246, over 5818984.38 frames. ], batch size: 86, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:15:05,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=450000.0, ans=0.0
2024-09-18 13:15:15,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=450040.0, ans=0.125
2024-09-18 13:15:17,892 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.23 vs. limit=22.5
2024-09-18 13:15:29,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=450040.0, ans=0.2
2024-09-18 13:15:37,041 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=12.0
2024-09-18 13:15:49,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=450120.0, ans=0.125
2024-09-18 13:15:51,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=450120.0, ans=0.0
2024-09-18 13:16:10,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=450160.0, ans=0.95
2024-09-18 13:16:16,554 INFO [train.py:1198] (0/2) Epoch 25, batch 3950, loss[loss=0.2576, ctc_loss=0.1443, cr_loss=0.4002, attn_decoder_loss=0.2613, over 29447.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1273, cr_loss=0.3702, attn_decoder_loss=0.246, over 5837898.56 frames. ], batch size: 97, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:16:32,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.60 vs. limit=22.5
2024-09-18 13:16:37,511 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.544e+01 9.055e+01 9.627e+01 1.387e+02, threshold=1.811e+02, percent-clipped=0.0
2024-09-18 13:16:48,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=22.5
2024-09-18 13:16:54,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=450280.0, ans=0.0
2024-09-18 13:17:32,514 INFO [train.py:1198] (0/2) Epoch 25, batch 4000, loss[loss=0.2277, ctc_loss=0.1119, cr_loss=0.329, attn_decoder_loss=0.2333, over 29510.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1276, cr_loss=0.3702, attn_decoder_loss=0.246, over 5814711.41 frames. ], batch size: 74, lr: 4.41e-03, grad_scale: 16.0
2024-09-18 13:17:40,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=450400.0, ans=0.0
2024-09-18 13:17:45,298 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.91 vs. limit=22.5
2024-09-18 13:18:00,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=450480.0, ans=0.0
2024-09-18 13:18:15,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=450520.0, ans=0.2
2024-09-18 13:18:48,056 INFO [train.py:1198] (0/2) Epoch 25, batch 4050, loss[loss=0.2633, ctc_loss=0.1592, cr_loss=0.3871, attn_decoder_loss=0.2663, over 20209.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1279, cr_loss=0.3708, attn_decoder_loss=0.2461, over 5798214.66 frames. ], batch size: 210, lr: 4.41e-03, grad_scale: 8.0
2024-09-18 13:18:51,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450600.0, ans=0.1
2024-09-18 13:18:54,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=15.0
2024-09-18 13:18:57,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=450600.0, ans=0.125
2024-09-18 13:19:05,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=15.0
2024-09-18 13:19:05,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450640.0, ans=0.1
2024-09-18 13:19:08,360 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.851e+01 9.697e+01 1.095e+02 3.076e+02, threshold=1.939e+02, percent-clipped=2.0
2024-09-18 13:19:28,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.90 vs. limit=22.5
2024-09-18 13:19:31,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=450720.0, ans=0.125
2024-09-18 13:19:37,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=450720.0, ans=0.0
2024-09-18 13:19:37,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=450720.0, ans=0.125
2024-09-18 13:19:54,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450760.0, ans=0.1
2024-09-18 13:20:01,374 INFO [train.py:1198] (0/2) Epoch 25, batch 4100, loss[loss=0.2624, ctc_loss=0.1437, cr_loss=0.396, attn_decoder_loss=0.2668, over 29522.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1284, cr_loss=0.3716, attn_decoder_loss=0.2463, over 5793822.08 frames. ], batch size: 90, lr: 4.40e-03, grad_scale: 8.0
2024-09-18 13:20:06,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.89 vs. limit=22.5
2024-09-18 13:20:08,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=450800.0, ans=0.0
2024-09-18 13:20:15,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs.
limit=6.0 2024-09-18 13:20:23,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=450840.0, ans=0.125 2024-09-18 13:20:32,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=450880.0, ans=0.125 2024-09-18 13:20:37,628 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=22.5 2024-09-18 13:20:39,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=450880.0, ans=0.0 2024-09-18 13:20:40,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5 2024-09-18 13:20:51,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=450920.0, ans=0.025 2024-09-18 13:21:00,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=450960.0, ans=0.1 2024-09-18 13:21:01,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=22.5 2024-09-18 13:21:01,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=450960.0, ans=0.025 2024-09-18 13:21:14,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=451000.0, ans=0.125 2024-09-18 13:21:16,040 INFO [train.py:1198] (0/2) Epoch 25, batch 4150, loss[loss=0.2338, ctc_loss=0.1288, cr_loss=0.3636, attn_decoder_loss=0.2374, over 29495.00 frames. 
], tot_loss[loss=0.2416, ctc_loss=0.1281, cr_loss=0.3715, attn_decoder_loss=0.246, over 5799471.11 frames. ], batch size: 77, lr: 4.40e-03, grad_scale: 8.0 2024-09-18 13:21:36,737 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.352e+01 8.963e+01 9.819e+01 3.617e+02, threshold=1.793e+02, percent-clipped=2.0 2024-09-18 13:21:44,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=451080.0, ans=0.125 2024-09-18 13:21:47,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=451080.0, ans=0.0 2024-09-18 13:21:47,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=451080.0, ans=0.2 2024-09-18 13:22:02,560 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2024-09-18 13:22:19,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=451160.0, ans=0.125 2024-09-18 13:22:30,265 INFO [train.py:1198] (0/2) Epoch 25, batch 4200, loss[loss=0.2655, ctc_loss=0.152, cr_loss=0.4201, attn_decoder_loss=0.2688, over 29506.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1283, cr_loss=0.372, attn_decoder_loss=0.2463, over 5801636.99 frames. ], batch size: 90, lr: 4.40e-03, grad_scale: 8.0 2024-09-18 13:22:32,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=451200.0, ans=0.025 2024-09-18 13:23:15,501 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.87 vs. 
limit=22.5 2024-09-18 13:23:35,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=451360.0, ans=0.2 2024-09-18 13:23:36,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=451360.0, ans=0.125 2024-09-18 13:23:45,266 INFO [train.py:1198] (0/2) Epoch 25, batch 4250, loss[loss=0.2316, ctc_loss=0.1156, cr_loss=0.3418, attn_decoder_loss=0.2369, over 29511.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1278, cr_loss=0.3713, attn_decoder_loss=0.2464, over 5807612.38 frames. ], batch size: 74, lr: 4.40e-03, grad_scale: 8.0 2024-09-18 13:23:45,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=451400.0, ans=0.5 2024-09-18 13:24:03,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=451440.0, ans=0.0 2024-09-18 13:24:04,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=451440.0, ans=0.125 2024-09-18 13:24:05,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.84 vs. 
limit=22.5 2024-09-18 13:24:05,762 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.410e+01 8.848e+01 9.485e+01 3.555e+02, threshold=1.770e+02, percent-clipped=1.0 2024-09-18 13:24:17,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=451480.0, ans=0.0 2024-09-18 13:24:25,221 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:24:26,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=451480.0, ans=0.1 2024-09-18 13:24:36,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=451520.0, ans=0.0 2024-09-18 13:24:51,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=451560.0, ans=0.0 2024-09-18 13:24:57,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=451560.0, ans=0.1 2024-09-18 13:24:58,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=451600.0, ans=0.0 2024-09-18 13:24:59,841 INFO [train.py:1198] (0/2) Epoch 25, batch 4300, loss[loss=0.2433, ctc_loss=0.1265, cr_loss=0.3685, attn_decoder_loss=0.2481, over 29549.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1275, cr_loss=0.3705, attn_decoder_loss=0.2464, over 5796859.01 frames. ], batch size: 87, lr: 4.40e-03, grad_scale: 8.0 2024-09-18 13:25:04,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=451600.0, ans=0.0 2024-09-18 13:25:05,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. 
limit=6.0 2024-09-18 13:25:18,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.46 vs. limit=15.0 2024-09-18 13:25:35,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=451680.0, ans=0.2 2024-09-18 13:25:44,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=451720.0, ans=0.125 2024-09-18 13:25:49,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=451720.0, ans=0.125 2024-09-18 13:26:00,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=451760.0, ans=0.125 2024-09-18 13:26:13,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2024-09-18 13:26:15,314 INFO [train.py:1198] (0/2) Epoch 25, batch 4350, loss[loss=0.2521, ctc_loss=0.1367, cr_loss=0.3751, attn_decoder_loss=0.2566, over 29518.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1306, cr_loss=0.3768, attn_decoder_loss=0.2498, over 5799080.20 frames. 
], batch size: 97, lr: 4.40e-03, grad_scale: 8.0 2024-09-18 13:26:20,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451800.0, ans=0.1 2024-09-18 13:26:24,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=451800.0, ans=0.0 2024-09-18 13:26:29,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=451840.0, ans=0.1 2024-09-18 13:26:30,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=451840.0, ans=0.0 2024-09-18 13:26:36,119 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 8.772e+01 9.206e+01 9.719e+01 3.076e+02, threshold=1.841e+02, percent-clipped=2.0 2024-09-18 13:26:45,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451880.0, ans=0.1 2024-09-18 13:27:00,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=451920.0, ans=0.125 2024-09-18 13:27:04,141 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:27:18,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=451960.0, ans=0.125 2024-09-18 13:27:21,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=451960.0, ans=0.1 2024-09-18 13:27:29,898 INFO [train.py:1198] (0/2) Epoch 25, batch 4400, loss[loss=0.2537, ctc_loss=0.1395, cr_loss=0.3822, attn_decoder_loss=0.2579, over 27221.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1322, cr_loss=0.3801, attn_decoder_loss=0.2519, over 5769166.07 frames. 
], batch size: 124, lr: 4.40e-03, grad_scale: 16.0 2024-09-18 13:27:37,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=452000.0, ans=0.07 2024-09-18 13:27:40,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=452000.0, ans=0.125 2024-09-18 13:27:45,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=12.0 2024-09-18 13:27:50,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=452040.0, ans=0.125 2024-09-18 13:27:52,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=452040.0, ans=0.1 2024-09-18 13:27:54,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2024-09-18 13:28:09,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=452080.0, ans=0.09899494936611666 2024-09-18 13:28:11,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=452080.0, ans=0.2 2024-09-18 13:28:44,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.98 vs. limit=22.5 2024-09-18 13:28:44,345 INFO [train.py:1198] (0/2) Epoch 25, batch 4450, loss[loss=0.2669, ctc_loss=0.158, cr_loss=0.3915, attn_decoder_loss=0.2703, over 20212.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1364, cr_loss=0.385, attn_decoder_loss=0.2545, over 5575090.92 frames. 
], batch size: 209, lr: 4.40e-03, grad_scale: 8.0 2024-09-18 13:28:44,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=452200.0, ans=0.025 2024-09-18 13:28:49,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452200.0, ans=0.1 2024-09-18 13:28:50,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=452200.0, ans=0.0 2024-09-18 13:29:07,252 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.138e+01 9.104e+01 9.870e+01 1.187e+02 3.111e+02, threshold=1.974e+02, percent-clipped=3.0 2024-09-18 13:29:10,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=452240.0, ans=0.2 2024-09-18 13:29:31,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=15.0 2024-09-18 13:29:34,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=452320.0, ans=0.125 2024-09-18 13:29:42,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=452320.0, ans=0.0 2024-09-18 13:30:00,168 INFO [train.py:1198] (0/2) Epoch 25, batch 4500, loss[loss=0.2546, ctc_loss=0.1459, cr_loss=0.3798, attn_decoder_loss=0.2582, over 19677.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1404, cr_loss=0.3873, attn_decoder_loss=0.2566, over 5235547.44 frames. 
], batch size: 209, lr: 4.40e-03, grad_scale: 8.0 2024-09-18 13:30:20,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=452440.0, ans=0.125 2024-09-18 13:30:23,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452440.0, ans=0.1 2024-09-18 13:30:30,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=452480.0, ans=0.0 2024-09-18 13:30:37,649 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-25.pt 2024-09-18 13:31:31,899 INFO [train.py:1198] (0/2) Epoch 26, batch 0, loss[loss=0.2292, ctc_loss=0.1129, cr_loss=0.3446, attn_decoder_loss=0.2344, over 29599.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1129, cr_loss=0.3446, attn_decoder_loss=0.2344, over 29599.00 frames. ], batch size: 73, lr: 4.31e-03, grad_scale: 16.0 2024-09-18 13:31:31,900 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 13:31:52,393 INFO [train.py:1230] (0/2) Epoch 26, validation: loss=0.2126, ctc_loss=0.03779, cr_loss=5.994e-15, attn_decoder_loss=0.232, over 944034.00 frames. 
2024-09-18 13:31:52,394 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 13:31:58,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=452500.0, ans=0.0 2024-09-18 13:32:06,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452540.0, ans=0.1 2024-09-18 13:32:15,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452540.0, ans=0.1 2024-09-18 13:32:39,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=452620.0, ans=0.2 2024-09-18 13:32:45,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452620.0, ans=0.1 2024-09-18 13:32:52,739 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.299e+01 1.068e+02 1.174e+02 2.339e+02, threshold=2.135e+02, percent-clipped=1.0 2024-09-18 13:33:07,811 INFO [train.py:1198] (0/2) Epoch 26, batch 50, loss[loss=0.2047, ctc_loss=0.09705, cr_loss=0.3127, attn_decoder_loss=0.2097, over 29467.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1302, cr_loss=0.3758, attn_decoder_loss=0.2481, over 1267884.01 frames. ], batch size: 70, lr: 4.31e-03, grad_scale: 16.0 2024-09-18 13:33:08,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=452700.0, ans=0.0 2024-09-18 13:33:53,156 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. 
limit=6.0 2024-09-18 13:34:18,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=452860.0, ans=0.0 2024-09-18 13:34:24,115 INFO [train.py:1198] (0/2) Epoch 26, batch 100, loss[loss=0.231, ctc_loss=0.1259, cr_loss=0.3824, attn_decoder_loss=0.2341, over 29546.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1302, cr_loss=0.3753, attn_decoder_loss=0.2494, over 2252493.90 frames. ], batch size: 76, lr: 4.31e-03, grad_scale: 8.0 2024-09-18 13:34:42,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=452940.0, ans=0.0 2024-09-18 13:34:57,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=452980.0, ans=0.2 2024-09-18 13:35:27,635 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.544e+01 8.982e+01 9.348e+01 1.241e+02, threshold=1.796e+02, percent-clipped=0.0 2024-09-18 13:35:43,420 INFO [train.py:1198] (0/2) Epoch 26, batch 150, loss[loss=0.2223, ctc_loss=0.1209, cr_loss=0.3668, attn_decoder_loss=0.2254, over 29457.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1283, cr_loss=0.3717, attn_decoder_loss=0.2473, over 3048354.46 frames. 
], batch size: 70, lr: 4.31e-03, grad_scale: 8.0 2024-09-18 13:35:55,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=453100.0, ans=0.1 2024-09-18 13:35:58,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=453140.0, ans=0.1 2024-09-18 13:36:02,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453140.0, ans=0.1 2024-09-18 13:36:22,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=453180.0, ans=10.0 2024-09-18 13:36:29,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=453220.0, ans=0.125 2024-09-18 13:36:31,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.02 vs. limit=15.0 2024-09-18 13:36:50,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=453260.0, ans=0.0 2024-09-18 13:36:58,956 INFO [train.py:1198] (0/2) Epoch 26, batch 200, loss[loss=0.2519, ctc_loss=0.1368, cr_loss=0.3876, attn_decoder_loss=0.2561, over 27426.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.128, cr_loss=0.371, attn_decoder_loss=0.2467, over 3660028.03 frames. 
], batch size: 124, lr: 4.31e-03, grad_scale: 8.0 2024-09-18 13:37:27,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=453380.0, ans=0.0 2024-09-18 13:37:36,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=453380.0, ans=0.125 2024-09-18 13:37:36,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=453380.0, ans=0.125 2024-09-18 13:37:55,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=453420.0, ans=0.2 2024-09-18 13:38:00,824 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.772e+01 8.385e+01 8.934e+01 9.482e+01 1.708e+02, threshold=1.787e+02, percent-clipped=0.0 2024-09-18 13:38:14,315 INFO [train.py:1198] (0/2) Epoch 26, batch 250, loss[loss=0.2556, ctc_loss=0.1406, cr_loss=0.3865, attn_decoder_loss=0.2597, over 29276.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1271, cr_loss=0.3695, attn_decoder_loss=0.2462, over 4142769.39 frames. 
], batch size: 100, lr: 4.30e-03, grad_scale: 8.0 2024-09-18 13:38:28,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=453540.0, ans=0.125 2024-09-18 13:38:46,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=453580.0, ans=0.2 2024-09-18 13:38:57,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=453580.0, ans=0.125 2024-09-18 13:39:03,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=453620.0, ans=0.1 2024-09-18 13:39:08,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=453620.0, ans=0.125 2024-09-18 13:39:17,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=453660.0, ans=0.1 2024-09-18 13:39:34,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=453700.0, ans=0.2 2024-09-18 13:39:35,292 INFO [train.py:1198] (0/2) Epoch 26, batch 300, loss[loss=0.2462, ctc_loss=0.1332, cr_loss=0.3864, attn_decoder_loss=0.2501, over 29494.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.127, cr_loss=0.3695, attn_decoder_loss=0.2459, over 4510845.22 frames. 
], batch size: 92, lr: 4.30e-03, grad_scale: 8.0 2024-09-18 13:40:01,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=453740.0, ans=0.0 2024-09-18 13:40:04,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=453780.0, ans=0.1 2024-09-18 13:40:22,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=453820.0, ans=0.0 2024-09-18 13:40:29,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=453820.0, ans=0.025 2024-09-18 13:40:37,316 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.450e+01 8.925e+01 9.500e+01 1.325e+02, threshold=1.785e+02, percent-clipped=0.0 2024-09-18 13:40:50,747 INFO [train.py:1198] (0/2) Epoch 26, batch 350, loss[loss=0.2134, ctc_loss=0.1044, cr_loss=0.3349, attn_decoder_loss=0.218, over 29337.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1272, cr_loss=0.3698, attn_decoder_loss=0.2461, over 4796190.69 frames. ], batch size: 71, lr: 4.30e-03, grad_scale: 8.0 2024-09-18 13:41:09,167 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:41:43,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=454020.0, ans=0.125 2024-09-18 13:41:47,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454020.0, ans=0.1 2024-09-18 13:42:05,698 INFO [train.py:1198] (0/2) Epoch 26, batch 400, loss[loss=0.2451, ctc_loss=0.1314, cr_loss=0.3728, attn_decoder_loss=0.2495, over 29724.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1265, cr_loss=0.3688, attn_decoder_loss=0.2455, over 5025975.27 frames. 
], batch size: 82, lr: 4.30e-03, grad_scale: 16.0 2024-09-18 13:42:24,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=454140.0, ans=0.125 2024-09-18 13:42:50,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=454220.0, ans=0.0 2024-09-18 13:43:07,925 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 8.396e+01 8.968e+01 9.786e+01 1.327e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-18 13:43:26,151 INFO [train.py:1198] (0/2) Epoch 26, batch 450, loss[loss=0.2478, ctc_loss=0.1286, cr_loss=0.3747, attn_decoder_loss=0.2527, over 29682.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1269, cr_loss=0.37, attn_decoder_loss=0.2456, over 5187358.23 frames. ], batch size: 83, lr: 4.30e-03, grad_scale: 16.0 2024-09-18 13:44:12,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=454420.0, ans=0.125 2024-09-18 13:44:28,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-09-18 13:44:32,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=454460.0, ans=0.125 2024-09-18 13:44:42,486 INFO [train.py:1198] (0/2) Epoch 26, batch 500, loss[loss=0.2564, ctc_loss=0.13, cr_loss=0.3716, attn_decoder_loss=0.2621, over 29486.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1265, cr_loss=0.3693, attn_decoder_loss=0.2451, over 5330875.21 frames. 
], batch size: 94, lr: 4.30e-03, grad_scale: 8.0 2024-09-18 13:44:42,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=454500.0, ans=0.125 2024-09-18 13:44:45,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=454500.0, ans=0.0 2024-09-18 13:44:53,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=454500.0, ans=0.0 2024-09-18 13:44:55,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=454500.0, ans=0.2 2024-09-18 13:45:07,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=454540.0, ans=0.04949747468305833 2024-09-18 13:45:24,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=454580.0, ans=0.0 2024-09-18 13:45:40,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=454620.0, ans=0.0 2024-09-18 13:45:46,394 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.427e+01 8.869e+01 9.503e+01 2.659e+02, threshold=1.774e+02, percent-clipped=2.0 2024-09-18 13:45:48,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=454660.0, ans=0.125 2024-09-18 13:45:58,555 INFO [train.py:1198] (0/2) Epoch 26, batch 550, loss[loss=0.2658, ctc_loss=0.1479, cr_loss=0.4119, attn_decoder_loss=0.2698, over 28867.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1267, cr_loss=0.3698, attn_decoder_loss=0.2453, over 5422545.43 frames. 
], batch size: 104, lr: 4.30e-03, grad_scale: 8.0 2024-09-18 13:46:09,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=454700.0, ans=0.125 2024-09-18 13:46:38,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=454780.0, ans=0.125 2024-09-18 13:46:47,726 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:46:49,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.92 vs. limit=10.0 2024-09-18 13:46:51,125 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0 2024-09-18 13:47:04,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=454860.0, ans=0.0 2024-09-18 13:47:16,789 INFO [train.py:1198] (0/2) Epoch 26, batch 600, loss[loss=0.2472, ctc_loss=0.1307, cr_loss=0.3698, attn_decoder_loss=0.2519, over 29314.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1266, cr_loss=0.3696, attn_decoder_loss=0.2454, over 5509413.92 frames. 
], batch size: 100, lr: 4.30e-03, grad_scale: 8.0 2024-09-18 13:47:21,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=454900.0, ans=0.1 2024-09-18 13:47:21,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=454900.0, ans=0.125 2024-09-18 13:47:23,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=454900.0, ans=0.04949747468305833 2024-09-18 13:47:24,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.95 vs. limit=22.5 2024-09-18 13:47:31,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=454900.0, ans=0.2 2024-09-18 13:47:31,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=454900.0, ans=0.0 2024-09-18 13:47:48,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.68 vs. 
limit=15.0 2024-09-18 13:48:22,378 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.526e+01 8.982e+01 9.575e+01 5.252e+02, threshold=1.796e+02, percent-clipped=1.0 2024-09-18 13:48:24,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455060.0, ans=0.1 2024-09-18 13:48:30,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=455060.0, ans=0.125 2024-09-18 13:48:31,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=455060.0, ans=0.0 2024-09-18 13:48:34,586 INFO [train.py:1198] (0/2) Epoch 26, batch 650, loss[loss=0.2438, ctc_loss=0.1331, cr_loss=0.3941, attn_decoder_loss=0.2474, over 29772.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1258, cr_loss=0.3681, attn_decoder_loss=0.2447, over 5587581.50 frames. ], batch size: 81, lr: 4.30e-03, grad_scale: 8.0 2024-09-18 13:48:34,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=455100.0, ans=0.1 2024-09-18 13:48:50,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.24 vs. 
limit=15.0 2024-09-18 13:49:05,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=455180.0, ans=0.1 2024-09-18 13:49:16,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=455180.0, ans=0.0 2024-09-18 13:49:27,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=455220.0, ans=0.1 2024-09-18 13:49:36,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2024-09-18 13:49:40,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=455260.0, ans=0.2 2024-09-18 13:49:43,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=455260.0, ans=0.125 2024-09-18 13:49:50,939 INFO [train.py:1198] (0/2) Epoch 26, batch 700, loss[loss=0.2233, ctc_loss=0.1142, cr_loss=0.3488, attn_decoder_loss=0.2277, over 29537.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.126, cr_loss=0.3686, attn_decoder_loss=0.2449, over 5637348.01 frames. ], batch size: 76, lr: 4.30e-03, grad_scale: 8.0 2024-09-18 13:50:07,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=455340.0, ans=0.125 2024-09-18 13:50:08,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2024-09-18 13:50:08,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.47 vs. 
limit=15.0 2024-09-18 13:50:27,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=455380.0, ans=0.0 2024-09-18 13:50:44,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=455420.0, ans=0.125 2024-09-18 13:50:44,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=455420.0, ans=0.1 2024-09-18 13:50:45,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=455420.0, ans=0.125 2024-09-18 13:50:51,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=455460.0, ans=0.125 2024-09-18 13:50:54,637 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.356e+01 8.785e+01 9.330e+01 1.328e+02, threshold=1.757e+02, percent-clipped=0.0 2024-09-18 13:51:02,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=455460.0, ans=0.0 2024-09-18 13:51:06,759 INFO [train.py:1198] (0/2) Epoch 26, batch 750, loss[loss=0.2347, ctc_loss=0.1209, cr_loss=0.3634, attn_decoder_loss=0.2393, over 29702.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.126, cr_loss=0.3686, attn_decoder_loss=0.2448, over 5676949.46 frames. ], batch size: 82, lr: 4.30e-03, grad_scale: 8.0 2024-09-18 13:51:13,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0 2024-09-18 13:51:23,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.50 vs. 
limit=15.0 2024-09-18 13:51:35,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=455540.0, ans=0.2 2024-09-18 13:51:37,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=455540.0, ans=0.125 2024-09-18 13:51:44,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=455580.0, ans=0.125 2024-09-18 13:51:48,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.08 vs. limit=15.0 2024-09-18 13:51:52,551 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:52:10,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=455660.0, ans=0.0 2024-09-18 13:52:10,952 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.43 vs. limit=15.0 2024-09-18 13:52:18,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.18 vs. limit=10.0 2024-09-18 13:52:26,845 INFO [train.py:1198] (0/2) Epoch 26, batch 800, loss[loss=0.2172, ctc_loss=0.1095, cr_loss=0.344, attn_decoder_loss=0.2215, over 29607.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1264, cr_loss=0.3691, attn_decoder_loss=0.245, over 5707558.59 frames. ], batch size: 73, lr: 4.29e-03, grad_scale: 16.0 2024-09-18 13:52:36,838 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.80 vs. 
limit=12.0 2024-09-18 13:52:55,111 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-09-18 13:52:55,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=455780.0, ans=0.1 2024-09-18 13:53:01,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=455780.0, ans=0.125 2024-09-18 13:53:13,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=455820.0, ans=0.125 2024-09-18 13:53:18,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-09-18 13:53:27,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2024-09-18 13:53:29,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-09-18 13:53:31,736 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.438e+01 9.008e+01 9.520e+01 4.430e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-18 13:53:38,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.59 vs. limit=15.0 2024-09-18 13:53:42,296 INFO [train.py:1198] (0/2) Epoch 26, batch 850, loss[loss=0.2536, ctc_loss=0.1305, cr_loss=0.3908, attn_decoder_loss=0.2586, over 29708.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1259, cr_loss=0.3681, attn_decoder_loss=0.2446, over 5736023.77 frames. 
], batch size: 89, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 13:53:44,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=12.0 2024-09-18 13:54:08,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=455940.0, ans=0.0 2024-09-18 13:54:14,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=455980.0, ans=0.125 2024-09-18 13:54:36,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=456020.0, ans=0.1 2024-09-18 13:54:58,548 INFO [train.py:1198] (0/2) Epoch 26, batch 900, loss[loss=0.2163, ctc_loss=0.09761, cr_loss=0.3113, attn_decoder_loss=0.2226, over 29604.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1262, cr_loss=0.3684, attn_decoder_loss=0.2449, over 5741124.95 frames. ], batch size: 73, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 13:55:03,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=456100.0, ans=0.1 2024-09-18 13:55:26,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=456140.0, ans=0.025 2024-09-18 13:55:42,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=456180.0, ans=0.125 2024-09-18 13:56:07,883 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.650e+01 9.071e+01 9.568e+01 1.657e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-18 13:56:18,483 INFO [train.py:1198] (0/2) Epoch 26, batch 950, loss[loss=0.2306, ctc_loss=0.1161, cr_loss=0.3401, attn_decoder_loss=0.2358, over 29509.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1265, cr_loss=0.3693, attn_decoder_loss=0.2451, over 5742303.03 frames. 
], batch size: 74, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 13:56:29,602 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2024-09-18 13:56:38,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=456340.0, ans=0.1 2024-09-18 13:56:55,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=456380.0, ans=0.125 2024-09-18 13:56:56,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=456380.0, ans=0.07 2024-09-18 13:57:01,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=456380.0, ans=0.0 2024-09-18 13:57:14,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=456420.0, ans=0.125 2024-09-18 13:57:29,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=456460.0, ans=0.2 2024-09-18 13:57:33,408 INFO [train.py:1198] (0/2) Epoch 26, batch 1000, loss[loss=0.2248, ctc_loss=0.1107, cr_loss=0.3187, attn_decoder_loss=0.2304, over 29494.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1271, cr_loss=0.3702, attn_decoder_loss=0.2459, over 5735852.97 frames. ], batch size: 77, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 13:57:38,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=456500.0, ans=0.1 2024-09-18 13:57:44,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.38 vs. 
limit=15.0 2024-09-18 13:57:51,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=456540.0, ans=0.0 2024-09-18 13:57:55,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=456540.0, ans=0.0 2024-09-18 13:57:59,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=456540.0, ans=0.0 2024-09-18 13:58:01,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=456540.0, ans=0.2 2024-09-18 13:58:31,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=456620.0, ans=0.0 2024-09-18 13:58:33,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=456660.0, ans=0.0 2024-09-18 13:58:38,766 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.379e+01 9.044e+01 9.595e+01 2.964e+02, threshold=1.809e+02, percent-clipped=3.0 2024-09-18 13:58:48,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=456700.0, ans=0.0 2024-09-18 13:58:49,374 INFO [train.py:1198] (0/2) Epoch 26, batch 1050, loss[loss=0.25, ctc_loss=0.1365, cr_loss=0.3944, attn_decoder_loss=0.2539, over 29700.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1268, cr_loss=0.3695, attn_decoder_loss=0.2454, over 5745278.78 frames. 
], batch size: 85, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 13:59:10,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=456740.0, ans=15.0 2024-09-18 13:59:47,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.32 vs. limit=15.0 2024-09-18 13:59:49,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.66 vs. limit=10.0 2024-09-18 13:59:59,406 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.45 vs. limit=12.0 2024-09-18 14:00:10,565 INFO [train.py:1198] (0/2) Epoch 26, batch 1100, loss[loss=0.2398, ctc_loss=0.1234, cr_loss=0.3844, attn_decoder_loss=0.2442, over 29444.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.127, cr_loss=0.3697, attn_decoder_loss=0.2455, over 5757979.78 frames. ], batch size: 78, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 14:00:18,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=456900.0, ans=0.0 2024-09-18 14:00:26,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.03 vs. 
limit=15.0 2024-09-18 14:00:30,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=456940.0, ans=0.125 2024-09-18 14:00:39,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=456980.0, ans=0.0 2024-09-18 14:00:39,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=456980.0, ans=0.125 2024-09-18 14:00:56,536 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:01:07,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.09 vs. limit=22.5 2024-09-18 14:01:10,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=457060.0, ans=0.125 2024-09-18 14:01:15,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.599e+01 9.010e+01 9.619e+01 1.920e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-18 14:01:20,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457060.0, ans=0.1 2024-09-18 14:01:26,671 INFO [train.py:1198] (0/2) Epoch 26, batch 1150, loss[loss=0.2267, ctc_loss=0.111, cr_loss=0.3348, attn_decoder_loss=0.2321, over 29448.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1267, cr_loss=0.369, attn_decoder_loss=0.2453, over 5756301.35 frames. 
], batch size: 78, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 14:01:30,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=457100.0, ans=0.2 2024-09-18 14:01:48,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=457140.0, ans=0.0 2024-09-18 14:02:05,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=457180.0, ans=0.125 2024-09-18 14:02:44,780 INFO [train.py:1198] (0/2) Epoch 26, batch 1200, loss[loss=0.2553, ctc_loss=0.1397, cr_loss=0.3926, attn_decoder_loss=0.2595, over 29673.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1274, cr_loss=0.3701, attn_decoder_loss=0.2461, over 5748206.62 frames. ], batch size: 85, lr: 4.29e-03, grad_scale: 16.0 2024-09-18 14:02:49,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=457300.0, ans=0.125 2024-09-18 14:02:50,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.46 vs. limit=10.0 2024-09-18 14:02:55,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=457300.0, ans=0.025 2024-09-18 14:03:07,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=457340.0, ans=0.0 2024-09-18 14:03:10,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. 
limit=15.0 2024-09-18 14:03:10,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=457340.0, ans=0.125 2024-09-18 14:03:38,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=457420.0, ans=15.0 2024-09-18 14:03:40,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=457420.0, ans=0.95 2024-09-18 14:03:52,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=457460.0, ans=0.125 2024-09-18 14:03:53,662 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.704e+01 9.142e+01 9.758e+01 1.993e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-18 14:03:57,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=457460.0, ans=0.125 2024-09-18 14:03:58,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457460.0, ans=0.1 2024-09-18 14:04:02,760 INFO [train.py:1198] (0/2) Epoch 26, batch 1250, loss[loss=0.2602, ctc_loss=0.1421, cr_loss=0.4031, attn_decoder_loss=0.2644, over 29556.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1277, cr_loss=0.3708, attn_decoder_loss=0.2465, over 5775552.15 frames. ], batch size: 92, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 14:04:23,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.77 vs. 
limit=22.5 2024-09-18 14:04:31,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=457580.0, ans=0.125 2024-09-18 14:04:50,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=457620.0, ans=0.0 2024-09-18 14:05:15,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=457660.0, ans=0.0 2024-09-18 14:05:18,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=457700.0, ans=0.125 2024-09-18 14:05:19,427 INFO [train.py:1198] (0/2) Epoch 26, batch 1300, loss[loss=0.2669, ctc_loss=0.144, cr_loss=0.41, attn_decoder_loss=0.2715, over 28304.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1272, cr_loss=0.3697, attn_decoder_loss=0.2461, over 5780223.15 frames. ], batch size: 111, lr: 4.29e-03, grad_scale: 8.0 2024-09-18 14:05:22,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457700.0, ans=0.1 2024-09-18 14:05:42,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=457740.0, ans=0.0 2024-09-18 14:06:02,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=457780.0, ans=0.1 2024-09-18 14:06:25,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.454e+01 9.061e+01 9.465e+01 1.475e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-18 14:06:26,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=457860.0, ans=0.125 2024-09-18 14:06:35,253 INFO [train.py:1198] (0/2) Epoch 26, batch 1350, loss[loss=0.2438, ctc_loss=0.1307, cr_loss=0.3685, 
attn_decoder_loss=0.2482, over 29742.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1263, cr_loss=0.3682, attn_decoder_loss=0.2455, over 5797226.96 frames. ], batch size: 81, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:06:46,052 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.91 vs. limit=22.5 2024-09-18 14:06:53,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=457940.0, ans=0.0 2024-09-18 14:06:54,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=457940.0, ans=0.0 2024-09-18 14:07:45,125 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=15.0 2024-09-18 14:07:55,073 INFO [train.py:1198] (0/2) Epoch 26, batch 1400, loss[loss=0.2098, ctc_loss=0.1029, cr_loss=0.3118, attn_decoder_loss=0.2148, over 29593.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.126, cr_loss=0.3682, attn_decoder_loss=0.2452, over 5807541.58 frames. ], batch size: 69, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:08:14,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=458140.0, ans=0.125 2024-09-18 14:09:01,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.404e+01 8.959e+01 9.350e+01 1.926e+02, threshold=1.792e+02, percent-clipped=1.0 2024-09-18 14:09:10,655 INFO [train.py:1198] (0/2) Epoch 26, batch 1450, loss[loss=0.2654, ctc_loss=0.1464, cr_loss=0.4052, attn_decoder_loss=0.2696, over 29448.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1261, cr_loss=0.3681, attn_decoder_loss=0.2454, over 5805120.99 frames. 
], batch size: 94, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:09:20,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=458300.0, ans=0.125 2024-09-18 14:09:32,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=458340.0, ans=0.125 2024-09-18 14:10:16,041 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2024-09-18 14:10:27,089 INFO [train.py:1198] (0/2) Epoch 26, batch 1500, loss[loss=0.2539, ctc_loss=0.1281, cr_loss=0.3826, attn_decoder_loss=0.2594, over 29636.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1266, cr_loss=0.3689, attn_decoder_loss=0.246, over 5806856.28 frames. ], batch size: 86, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:10:27,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=458500.0, ans=0.2 2024-09-18 14:10:28,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2024-09-18 14:10:31,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.70 vs. 
limit=12.0 2024-09-18 14:10:34,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=458500.0, ans=0.0 2024-09-18 14:10:43,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=458540.0, ans=0.125 2024-09-18 14:10:45,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=458540.0, ans=0.125 2024-09-18 14:11:38,955 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.611e+01 9.025e+01 9.915e+01 2.823e+02, threshold=1.805e+02, percent-clipped=2.0 2024-09-18 14:11:48,168 INFO [train.py:1198] (0/2) Epoch 26, batch 1550, loss[loss=0.2558, ctc_loss=0.1349, cr_loss=0.3963, attn_decoder_loss=0.2604, over 29476.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1268, cr_loss=0.369, attn_decoder_loss=0.246, over 5782401.85 frames. ], batch size: 90, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:12:12,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=458740.0, ans=0.125 2024-09-18 14:12:19,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0 2024-09-18 14:12:22,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0 2024-09-18 14:12:36,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=458820.0, ans=0.125 2024-09-18 14:12:43,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.47 vs. 
limit=15.0 2024-09-18 14:12:46,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.31 vs. limit=15.0 2024-09-18 14:12:46,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2024-09-18 14:13:03,831 INFO [train.py:1198] (0/2) Epoch 26, batch 1600, loss[loss=0.2443, ctc_loss=0.1289, cr_loss=0.3573, attn_decoder_loss=0.2492, over 29660.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1271, cr_loss=0.3691, attn_decoder_loss=0.2459, over 5764546.68 frames. ], batch size: 85, lr: 4.28e-03, grad_scale: 16.0 2024-09-18 14:13:05,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=458900.0, ans=0.0 2024-09-18 14:13:14,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.17 vs. limit=22.5 2024-09-18 14:13:22,348 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:13:26,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=458940.0, ans=0.2 2024-09-18 14:13:42,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.28 vs. 
limit=15.0 2024-09-18 14:13:43,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=458980.0, ans=0.125 2024-09-18 14:13:46,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=458980.0, ans=0.125 2024-09-18 14:13:52,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=459020.0, ans=0.125 2024-09-18 14:13:54,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=459020.0, ans=0.1 2024-09-18 14:14:09,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=459060.0, ans=0.125 2024-09-18 14:14:12,105 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 8.327e+01 8.927e+01 9.503e+01 2.372e+02, threshold=1.785e+02, percent-clipped=2.0 2024-09-18 14:14:19,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.84 vs. limit=15.0 2024-09-18 14:14:21,599 INFO [train.py:1198] (0/2) Epoch 26, batch 1650, loss[loss=0.2556, ctc_loss=0.1319, cr_loss=0.384, attn_decoder_loss=0.2608, over 29684.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1272, cr_loss=0.3692, attn_decoder_loss=0.2458, over 5759985.89 frames. 
], batch size: 89, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:15:22,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=459220.0, ans=0.0 2024-09-18 14:15:23,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=459260.0, ans=0.125 2024-09-18 14:15:25,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.88 vs. limit=10.0 2024-09-18 14:15:28,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459260.0, ans=0.125 2024-09-18 14:15:29,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=459260.0, ans=0.1 2024-09-18 14:15:39,936 INFO [train.py:1198] (0/2) Epoch 26, batch 1700, loss[loss=0.2074, ctc_loss=0.09927, cr_loss=0.3267, attn_decoder_loss=0.2122, over 29583.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1267, cr_loss=0.369, attn_decoder_loss=0.2454, over 5781630.93 frames. ], batch size: 69, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:15:40,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=459300.0, ans=0.02 2024-09-18 14:15:50,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2024-09-18 14:15:54,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-09-18 14:15:56,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.28 vs. 
limit=22.5 2024-09-18 14:16:34,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=459420.0, ans=0.0 2024-09-18 14:16:37,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=459420.0, ans=0.025 2024-09-18 14:16:48,024 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.426e+01 8.833e+01 9.428e+01 1.268e+02, threshold=1.767e+02, percent-clipped=0.0 2024-09-18 14:16:48,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=459460.0, ans=0.0 2024-09-18 14:16:51,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=459460.0, ans=0.2 2024-09-18 14:16:52,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=459460.0, ans=0.0 2024-09-18 14:16:55,735 INFO [train.py:1198] (0/2) Epoch 26, batch 1750, loss[loss=0.2131, ctc_loss=0.1123, cr_loss=0.3382, attn_decoder_loss=0.2168, over 29326.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1264, cr_loss=0.3681, attn_decoder_loss=0.2449, over 5789604.65 frames. 
], batch size: 67, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:17:03,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=459500.0, ans=0.125 2024-09-18 14:17:09,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=459540.0, ans=0.125 2024-09-18 14:17:11,451 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:17:28,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=459580.0, ans=0.1 2024-09-18 14:17:30,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.whiten.whitening_limit, batch_count=459580.0, ans=12.0 2024-09-18 14:17:50,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=459620.0, ans=0.125 2024-09-18 14:17:57,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-09-18 14:18:02,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=459660.0, ans=0.125 2024-09-18 14:18:02,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=459660.0, ans=0.1 2024-09-18 14:18:07,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=459660.0, ans=0.0 2024-09-18 14:18:11,488 INFO [train.py:1198] (0/2) Epoch 26, batch 1800, loss[loss=0.2547, ctc_loss=0.1335, cr_loss=0.3726, attn_decoder_loss=0.2599, over 29695.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1266, cr_loss=0.3689, attn_decoder_loss=0.2455, over 5792854.94 frames. 
], batch size: 83, lr: 4.28e-03, grad_scale: 8.0 2024-09-18 14:18:23,649 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.03 vs. limit=10.0 2024-09-18 14:19:00,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=459820.0, ans=0.125 2024-09-18 14:19:23,995 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.569e+01 8.931e+01 9.347e+01 1.247e+02, threshold=1.786e+02, percent-clipped=0.0 2024-09-18 14:19:31,602 INFO [train.py:1198] (0/2) Epoch 26, batch 1850, loss[loss=0.2455, ctc_loss=0.1305, cr_loss=0.373, attn_decoder_loss=0.25, over 29633.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1266, cr_loss=0.3689, attn_decoder_loss=0.2454, over 5799651.61 frames. ], batch size: 86, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:19:34,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=459900.0, ans=0.025 2024-09-18 14:19:37,103 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2024-09-18 14:19:42,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=459900.0, ans=0.1 2024-09-18 14:19:48,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=459940.0, ans=0.0 2024-09-18 14:19:52,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.15 vs. 
limit=10.0 2024-09-18 14:20:10,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=459980.0, ans=0.1 2024-09-18 14:20:45,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2024-09-18 14:20:47,440 INFO [train.py:1198] (0/2) Epoch 26, batch 1900, loss[loss=0.2476, ctc_loss=0.1257, cr_loss=0.3749, attn_decoder_loss=0.2528, over 29679.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1269, cr_loss=0.3698, attn_decoder_loss=0.246, over 5805868.17 frames. ], batch size: 89, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:20:58,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=460100.0, ans=0.1 2024-09-18 14:21:09,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=460140.0, ans=0.2 2024-09-18 14:21:10,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=460140.0, ans=0.1 2024-09-18 14:21:19,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=460180.0, ans=0.2 2024-09-18 14:21:26,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=460180.0, ans=0.1 2024-09-18 14:21:38,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=460220.0, ans=0.0 2024-09-18 14:21:44,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=460220.0, ans=0.125 2024-09-18 14:21:56,071 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 
8.718e+01 9.273e+01 9.664e+01 1.625e+02, threshold=1.855e+02, percent-clipped=0.0 2024-09-18 14:22:03,703 INFO [train.py:1198] (0/2) Epoch 26, batch 1950, loss[loss=0.2272, ctc_loss=0.116, cr_loss=0.3666, attn_decoder_loss=0.2314, over 29449.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1277, cr_loss=0.3719, attn_decoder_loss=0.2471, over 5820949.63 frames. ], batch size: 78, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:22:31,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=460340.0, ans=0.125 2024-09-18 14:22:45,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=460380.0, ans=0.125 2024-09-18 14:22:48,990 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.37 vs. limit=6.0 2024-09-18 14:23:11,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=460460.0, ans=0.125 2024-09-18 14:23:14,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=460460.0, ans=0.125 2024-09-18 14:23:16,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=460460.0, ans=0.0 2024-09-18 14:23:21,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.28 vs. limit=15.0 2024-09-18 14:23:23,251 INFO [train.py:1198] (0/2) Epoch 26, batch 2000, loss[loss=0.2142, ctc_loss=0.1087, cr_loss=0.3372, attn_decoder_loss=0.2184, over 29361.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1277, cr_loss=0.3721, attn_decoder_loss=0.2476, over 5798444.38 frames. 
], batch size: 67, lr: 4.27e-03, grad_scale: 16.0 2024-09-18 14:23:34,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=460500.0, ans=0.0 2024-09-18 14:23:37,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=460540.0, ans=0.1 2024-09-18 14:24:06,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=460580.0, ans=0.07 2024-09-18 14:24:09,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=460620.0, ans=0.125 2024-09-18 14:24:33,229 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.701e+01 9.104e+01 9.478e+01 2.564e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-18 14:24:38,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=460700.0, ans=0.2 2024-09-18 14:24:39,263 INFO [train.py:1198] (0/2) Epoch 26, batch 2050, loss[loss=0.2154, ctc_loss=0.1144, cr_loss=0.35, attn_decoder_loss=0.2189, over 29471.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1272, cr_loss=0.3709, attn_decoder_loss=0.2465, over 5791073.23 frames. ], batch size: 70, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:25:27,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.64 vs. 
limit=15.0 2024-09-18 14:25:31,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=460820.0, ans=0.0 2024-09-18 14:25:43,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=460860.0, ans=0.0 2024-09-18 14:25:55,184 INFO [train.py:1198] (0/2) Epoch 26, batch 2100, loss[loss=0.2358, ctc_loss=0.1216, cr_loss=0.3531, attn_decoder_loss=0.2407, over 29772.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1264, cr_loss=0.3694, attn_decoder_loss=0.2456, over 5803097.02 frames. ], batch size: 81, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:26:05,283 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:26:06,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=460900.0, ans=0.07 2024-09-18 14:26:10,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.03 vs. 
limit=22.5 2024-09-18 14:26:19,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=460940.0, ans=15.0 2024-09-18 14:26:39,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=460980.0, ans=0.2 2024-09-18 14:26:44,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=461020.0, ans=0.0 2024-09-18 14:26:56,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=461060.0, ans=0.125 2024-09-18 14:27:08,387 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.265e+01 8.897e+01 9.459e+01 1.093e+02, threshold=1.779e+02, percent-clipped=0.0 2024-09-18 14:27:10,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=461060.0, ans=0.1 2024-09-18 14:27:10,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=461060.0, ans=0.025 2024-09-18 14:27:14,484 INFO [train.py:1198] (0/2) Epoch 26, batch 2150, loss[loss=0.2351, ctc_loss=0.1264, cr_loss=0.3799, attn_decoder_loss=0.2387, over 29461.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1259, cr_loss=0.3683, attn_decoder_loss=0.245, over 5816756.37 frames. 
], batch size: 78, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:27:24,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=461100.0, ans=0.1 2024-09-18 14:27:46,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=461180.0, ans=0.125 2024-09-18 14:27:52,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=461180.0, ans=0.0 2024-09-18 14:27:52,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=461180.0, ans=0.05 2024-09-18 14:28:01,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=461220.0, ans=0.0 2024-09-18 14:28:03,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=461220.0, ans=0.0 2024-09-18 14:28:14,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=461260.0, ans=0.0 2024-09-18 14:28:27,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=461260.0, ans=0.025 2024-09-18 14:28:30,689 INFO [train.py:1198] (0/2) Epoch 26, batch 2200, loss[loss=0.2557, ctc_loss=0.1365, cr_loss=0.3971, attn_decoder_loss=0.2601, over 29646.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1265, cr_loss=0.3695, attn_decoder_loss=0.2455, over 5814432.55 frames. 
], batch size: 86, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:28:40,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=461300.0, ans=0.125 2024-09-18 14:28:46,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=461340.0, ans=0.125 2024-09-18 14:28:48,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=22.5 2024-09-18 14:29:10,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=15.0 2024-09-18 14:29:22,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=461420.0, ans=0.025 2024-09-18 14:29:31,812 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:29:33,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=461460.0, ans=0.125 2024-09-18 14:29:36,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=461460.0, ans=0.0 2024-09-18 14:29:40,322 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.743e+01 9.086e+01 9.862e+01 3.457e+02, threshold=1.817e+02, percent-clipped=3.0 2024-09-18 14:29:40,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=461460.0, ans=0.125 2024-09-18 14:29:46,507 INFO [train.py:1198] (0/2) Epoch 26, batch 2250, loss[loss=0.2422, ctc_loss=0.1226, cr_loss=0.3527, attn_decoder_loss=0.2477, over 29678.00 frames. 
], tot_loss[loss=0.2408, ctc_loss=0.1262, cr_loss=0.3687, attn_decoder_loss=0.2454, over 5812523.50 frames. ], batch size: 82, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:29:58,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=461500.0, ans=0.0 2024-09-18 14:30:01,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=461500.0, ans=0.0 2024-09-18 14:30:31,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=461580.0, ans=0.0 2024-09-18 14:30:45,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.88 vs. limit=22.5 2024-09-18 14:30:54,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=461660.0, ans=0.125 2024-09-18 14:31:07,030 INFO [train.py:1198] (0/2) Epoch 26, batch 2300, loss[loss=0.2097, ctc_loss=0.1004, cr_loss=0.3222, attn_decoder_loss=0.2147, over 29341.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1257, cr_loss=0.3675, attn_decoder_loss=0.2444, over 5798382.48 frames. ], batch size: 71, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:31:08,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=461700.0, ans=0.0 2024-09-18 14:31:12,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. 
limit=15.0 2024-09-18 14:31:17,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=461700.0, ans=0.125 2024-09-18 14:31:22,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=461740.0, ans=0.125 2024-09-18 14:31:28,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2024-09-18 14:31:33,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=461740.0, ans=0.0 2024-09-18 14:31:48,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=461780.0, ans=0.1 2024-09-18 14:31:50,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2024-09-18 14:32:09,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=461860.0, ans=0.2 2024-09-18 14:32:16,746 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.397e+01 8.981e+01 9.856e+01 3.624e+02, threshold=1.796e+02, percent-clipped=1.0 2024-09-18 14:32:17,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=461860.0, ans=0.1 2024-09-18 14:32:21,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=461900.0, ans=0.125 2024-09-18 14:32:22,734 INFO [train.py:1198] (0/2) Epoch 26, batch 2350, loss[loss=0.2384, ctc_loss=0.1282, cr_loss=0.373, attn_decoder_loss=0.2423, over 29683.00 frames. 
], tot_loss[loss=0.2402, ctc_loss=0.1262, cr_loss=0.3685, attn_decoder_loss=0.2446, over 5803578.79 frames. ], batch size: 83, lr: 4.27e-03, grad_scale: 8.0 2024-09-18 14:32:24,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=461900.0, ans=0.1 2024-09-18 14:32:27,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=461900.0, ans=0.125 2024-09-18 14:32:36,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=461940.0, ans=0.2 2024-09-18 14:33:02,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=461980.0, ans=0.125 2024-09-18 14:33:38,602 INFO [train.py:1198] (0/2) Epoch 26, batch 2400, loss[loss=0.23, ctc_loss=0.1193, cr_loss=0.3569, attn_decoder_loss=0.2344, over 29527.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.127, cr_loss=0.3699, attn_decoder_loss=0.2453, over 5807680.64 frames. ], batch size: 76, lr: 4.26e-03, grad_scale: 16.0 2024-09-18 14:33:58,118 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.91 vs. 
limit=10.0 2024-09-18 14:34:49,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=462260.0, ans=0.0 2024-09-18 14:34:51,669 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.579e+01 9.212e+01 9.914e+01 2.760e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-18 14:34:57,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=462300.0, ans=0.125 2024-09-18 14:34:58,432 INFO [train.py:1198] (0/2) Epoch 26, batch 2450, loss[loss=0.2419, ctc_loss=0.1365, cr_loss=0.3994, attn_decoder_loss=0.2447, over 29702.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1275, cr_loss=0.3708, attn_decoder_loss=0.2462, over 5784374.25 frames. ], batch size: 82, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:34:58,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=462300.0, ans=0.125 2024-09-18 14:35:07,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=462300.0, ans=0.125 2024-09-18 14:35:35,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=462380.0, ans=0.0 2024-09-18 14:36:13,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=462500.0, ans=0.125 2024-09-18 14:36:14,445 INFO [train.py:1198] (0/2) Epoch 26, batch 2500, loss[loss=0.2463, ctc_loss=0.1278, cr_loss=0.3684, attn_decoder_loss=0.2513, over 29651.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1272, cr_loss=0.3704, attn_decoder_loss=0.2462, over 5795372.21 frames. 
], batch size: 86, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:36:19,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=462500.0, ans=0.125 2024-09-18 14:36:30,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=462540.0, ans=0.125 2024-09-18 14:36:34,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=462540.0, ans=0.125 2024-09-18 14:36:52,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=462580.0, ans=0.95 2024-09-18 14:37:14,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=462660.0, ans=0.0 2024-09-18 14:37:24,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=462660.0, ans=0.2 2024-09-18 14:37:25,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.473e+01 8.987e+01 9.500e+01 1.769e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-18 14:37:26,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=462660.0, ans=0.125 2024-09-18 14:37:30,365 INFO [train.py:1198] (0/2) Epoch 26, batch 2550, loss[loss=0.2187, ctc_loss=0.1091, cr_loss=0.3347, attn_decoder_loss=0.2235, over 29340.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1271, cr_loss=0.3706, attn_decoder_loss=0.2462, over 5798500.58 frames. ], batch size: 67, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:37:37,955 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.98 vs. 
limit=15.0 2024-09-18 14:38:05,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=462780.0, ans=0.125 2024-09-18 14:38:45,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=462860.0, ans=0.125 2024-09-18 14:38:48,021 INFO [train.py:1198] (0/2) Epoch 26, batch 2600, loss[loss=0.2279, ctc_loss=0.1123, cr_loss=0.3424, attn_decoder_loss=0.2331, over 29435.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1271, cr_loss=0.3711, attn_decoder_loss=0.2465, over 5794113.55 frames. ], batch size: 78, lr: 4.26e-03, grad_scale: 8.0 2024-09-18 14:38:48,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=462900.0, ans=0.0 2024-09-18 14:38:55,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=462900.0, ans=0.0 2024-09-18 14:39:02,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=462900.0, ans=0.2 2024-09-18 14:39:05,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=462940.0, ans=0.125 2024-09-18 14:39:08,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=462940.0, ans=0.1 2024-09-18 14:39:14,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=462940.0, ans=0.1 2024-09-18 14:39:14,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=462940.0, ans=0.125 2024-09-18 14:39:47,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=463020.0, 
ans=0.2
2024-09-18 14:39:58,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=463060.0, ans=0.125
2024-09-18 14:40:01,215 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.375e+01 8.942e+01 9.564e+01 2.475e+02, threshold=1.788e+02, percent-clipped=1.0
2024-09-18 14:40:05,706 INFO [train.py:1198] (0/2) Epoch 26, batch 2650, loss[loss=0.252, ctc_loss=0.1313, cr_loss=0.3862, attn_decoder_loss=0.2568, over 29232.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1271, cr_loss=0.3711, attn_decoder_loss=0.2467, over 5800730.06 frames. ], batch size: 100, lr: 4.26e-03, grad_scale: 8.0
2024-09-18 14:40:16,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=463100.0, ans=0.1
2024-09-18 14:40:37,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0
2024-09-18 14:40:43,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.23 vs. limit=15.0
2024-09-18 14:41:07,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=463260.0, ans=0.0
2024-09-18 14:41:10,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.21 vs. limit=15.0
2024-09-18 14:41:11,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=463260.0, ans=0.0
2024-09-18 14:41:18,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=463260.0, ans=0.2
2024-09-18 14:41:19,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0
2024-09-18 14:41:23,924 INFO [train.py:1198] (0/2) Epoch 26, batch 2700, loss[loss=0.2388, ctc_loss=0.1166, cr_loss=0.3453, attn_decoder_loss=0.2447, over 29555.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1277, cr_loss=0.3723, attn_decoder_loss=0.2473, over 5796330.18 frames. ], batch size: 87, lr: 4.26e-03, grad_scale: 8.0
2024-09-18 14:41:55,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=463380.0, ans=0.0
2024-09-18 14:41:57,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=463380.0, ans=0.0
2024-09-18 14:42:00,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=463380.0, ans=0.2
2024-09-18 14:42:12,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=463420.0, ans=0.125
2024-09-18 14:42:35,434 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.497e+01 8.933e+01 9.409e+01 1.999e+02, threshold=1.787e+02, percent-clipped=1.0
2024-09-18 14:42:37,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=463460.0, ans=0.125
2024-09-18 14:42:40,225 INFO [train.py:1198] (0/2) Epoch 26, batch 2750, loss[loss=0.2263, ctc_loss=0.1115, cr_loss=0.3233, attn_decoder_loss=0.2319, over 29519.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.127, cr_loss=0.3707, attn_decoder_loss=0.2463, over 5795594.39 frames. ], batch size: 75, lr: 4.26e-03, grad_scale: 8.0
2024-09-18 14:42:52,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.89 vs. limit=22.5
2024-09-18 14:42:59,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=463540.0, ans=0.0
2024-09-18 14:42:59,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0
2024-09-18 14:43:23,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=463580.0, ans=0.0
2024-09-18 14:43:26,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=463620.0, ans=0.125
2024-09-18 14:43:28,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=463620.0, ans=0.125
2024-09-18 14:43:36,265 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.39 vs. limit=22.5
2024-09-18 14:43:58,318 INFO [train.py:1198] (0/2) Epoch 26, batch 2800, loss[loss=0.259, ctc_loss=0.1529, cr_loss=0.4057, attn_decoder_loss=0.2618, over 20060.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1273, cr_loss=0.3717, attn_decoder_loss=0.2465, over 5777221.82 frames. ], batch size: 209, lr: 4.26e-03, grad_scale: 16.0
2024-09-18 14:44:00,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=463700.0, ans=0.0
2024-09-18 14:44:01,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=463700.0, ans=0.125
2024-09-18 14:44:06,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=463700.0, ans=0.0
2024-09-18 14:44:10,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=463700.0, ans=0.125
2024-09-18 14:44:18,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=463740.0, ans=0.125
2024-09-18 14:44:36,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=463780.0, ans=0.125
2024-09-18 14:44:46,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=463820.0, ans=0.2
2024-09-18 14:44:54,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=463820.0, ans=0.125
2024-09-18 14:45:01,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=463860.0, ans=0.2
2024-09-18 14:45:06,125 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:45:10,998 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.701e+01 9.139e+01 9.864e+01 2.017e+02, threshold=1.828e+02, percent-clipped=1.0
2024-09-18 14:45:15,526 INFO [train.py:1198] (0/2) Epoch 26, batch 2850, loss[loss=0.2414, ctc_loss=0.1359, cr_loss=0.3929, attn_decoder_loss=0.2443, over 29480.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.128, cr_loss=0.3725, attn_decoder_loss=0.2471, over 5762069.85 frames. ], batch size: 77, lr: 4.26e-03, grad_scale: 16.0
2024-09-18 14:45:29,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=463940.0, ans=0.125
2024-09-18 14:45:47,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=463980.0, ans=0.0
2024-09-18 14:45:52,413 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-116000.pt
2024-09-18 14:46:06,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.83 vs. limit=6.0
2024-09-18 14:46:10,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.16 vs. limit=15.0
2024-09-18 14:46:23,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=464060.0, ans=0.09899494936611666
2024-09-18 14:46:26,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=464060.0, ans=0.0
2024-09-18 14:46:36,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=464060.0, ans=0.2
2024-09-18 14:46:37,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=464100.0, ans=0.0
2024-09-18 14:46:40,983 INFO [train.py:1198] (0/2) Epoch 26, batch 2900, loss[loss=0.231, ctc_loss=0.1226, cr_loss=0.3473, attn_decoder_loss=0.2353, over 29423.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1284, cr_loss=0.3734, attn_decoder_loss=0.2477, over 5787462.71 frames. ], batch size: 79, lr: 4.26e-03, grad_scale: 8.0
2024-09-18 14:46:56,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=464140.0, ans=0.125
2024-09-18 14:47:13,179 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:47:23,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=464180.0, ans=0.125
2024-09-18 14:47:52,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=464260.0, ans=0.125
2024-09-18 14:47:53,453 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.512e+01 9.090e+01 9.867e+01 2.207e+02, threshold=1.818e+02, percent-clipped=1.0
2024-09-18 14:47:56,514 INFO [train.py:1198] (0/2) Epoch 26, batch 2950, loss[loss=0.2317, ctc_loss=0.1237, cr_loss=0.3678, attn_decoder_loss=0.2356, over 29518.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1273, cr_loss=0.3708, attn_decoder_loss=0.2462, over 5781604.82 frames. ], batch size: 75, lr: 4.25e-03, grad_scale: 8.0
2024-09-18 14:48:02,864 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:48:03,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.82 vs. limit=15.0
2024-09-18 14:48:25,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464380.0, ans=0.1
2024-09-18 14:48:33,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=464380.0, ans=0.0
2024-09-18 14:48:50,487 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0
2024-09-18 14:48:54,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=464420.0, ans=0.125
2024-09-18 14:49:02,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=464460.0, ans=0.015
2024-09-18 14:49:15,210 INFO [train.py:1198] (0/2) Epoch 26, batch 3000, loss[loss=0.2394, ctc_loss=0.1272, cr_loss=0.384, attn_decoder_loss=0.2434, over 29750.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1269, cr_loss=0.3701, attn_decoder_loss=0.2458, over 5783244.11 frames. ], batch size: 81, lr: 4.25e-03, grad_scale: 8.0
2024-09-18 14:49:15,210 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 14:49:33,743 INFO [train.py:1230] (0/2) Epoch 26, validation: loss=0.2113, ctc_loss=0.03775, cr_loss=5.571e-15, attn_decoder_loss=0.2305, over 944034.00 frames.
2024-09-18 14:49:33,743 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-18 14:49:40,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=464500.0, ans=0.125
2024-09-18 14:49:56,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5
2024-09-18 14:50:15,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=464580.0, ans=0.1
2024-09-18 14:50:40,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=464660.0, ans=0.025
2024-09-18 14:50:46,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=464660.0, ans=0.04949747468305833
2024-09-18 14:50:49,033 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.963e+01 8.504e+01 9.014e+01 9.631e+01 1.549e+02, threshold=1.803e+02, percent-clipped=0.0
2024-09-18 14:50:52,203 INFO [train.py:1198] (0/2) Epoch 26, batch 3050, loss[loss=0.2343, ctc_loss=0.1182, cr_loss=0.3484, attn_decoder_loss=0.2394, over 29521.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.127, cr_loss=0.3703, attn_decoder_loss=0.2464, over 5778111.13 frames. ], batch size: 76, lr: 4.25e-03, grad_scale: 8.0
2024-09-18 14:50:52,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=464700.0, ans=0.125
2024-09-18 14:50:54,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=464700.0, ans=0.2
2024-09-18 14:50:57,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.64 vs. limit=22.5
2024-09-18 14:51:06,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=464740.0, ans=0.1
2024-09-18 14:51:06,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0
2024-09-18 14:51:09,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=464740.0, ans=0.1
2024-09-18 14:51:18,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=464740.0, ans=0.025
2024-09-18 14:51:54,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=464860.0, ans=0.2
2024-09-18 14:51:55,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.77 vs. limit=15.0
2024-09-18 14:52:08,312 INFO [train.py:1198] (0/2) Epoch 26, batch 3100, loss[loss=0.2603, ctc_loss=0.1394, cr_loss=0.3839, attn_decoder_loss=0.2652, over 29234.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1269, cr_loss=0.3699, attn_decoder_loss=0.2461, over 5777581.80 frames. ], batch size: 100, lr: 4.25e-03, grad_scale: 8.0
2024-09-18 14:52:13,336 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:52:29,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=464940.0, ans=0.025
2024-09-18 14:53:09,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=12.0
2024-09-18 14:53:09,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.43 vs. limit=15.0
2024-09-18 14:53:23,452 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.570e+01 9.069e+01 9.533e+01 2.948e+02, threshold=1.814e+02, percent-clipped=1.0
2024-09-18 14:53:26,520 INFO [train.py:1198] (0/2) Epoch 26, batch 3150, loss[loss=0.2551, ctc_loss=0.1364, cr_loss=0.4029, attn_decoder_loss=0.2593, over 28849.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1267, cr_loss=0.3692, attn_decoder_loss=0.246, over 5784088.12 frames. ], batch size: 104, lr: 4.25e-03, grad_scale: 8.0
2024-09-18 14:53:26,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=465100.0, ans=0.2
2024-09-18 14:53:39,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.43 vs. limit=22.5
2024-09-18 14:53:49,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=465140.0, ans=0.2
2024-09-18 14:54:19,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=465220.0, ans=0.125
2024-09-18 14:54:21,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=465220.0, ans=0.2
2024-09-18 14:54:32,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=465260.0, ans=0.2
2024-09-18 14:54:44,452 INFO [train.py:1198] (0/2) Epoch 26, batch 3200, loss[loss=0.2302, ctc_loss=0.1203, cr_loss=0.3585, attn_decoder_loss=0.2344, over 29420.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1264, cr_loss=0.3687, attn_decoder_loss=0.2457, over 5793399.83 frames. ], batch size: 79, lr: 4.25e-03, grad_scale: 16.0
2024-09-18 14:54:54,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=465300.0, ans=0.1
2024-09-18 14:55:03,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0
2024-09-18 14:55:03,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.96 vs. limit=12.0
2024-09-18 14:55:25,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465380.0, ans=0.1
2024-09-18 14:55:25,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=465380.0, ans=0.125
2024-09-18 14:55:33,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_positive, batch_count=465420.0, ans=0.05
2024-09-18 14:55:35,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=465420.0, ans=0.025
2024-09-18 14:55:39,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=465420.0, ans=0.025
2024-09-18 14:55:46,324 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.29 vs. limit=15.0
2024-09-18 14:55:53,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=465460.0, ans=0.125
2024-09-18 14:55:58,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=465460.0, ans=0.0
2024-09-18 14:55:59,027 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.879e+01 8.413e+01 8.869e+01 9.551e+01 1.271e+02, threshold=1.774e+02, percent-clipped=0.0
2024-09-18 14:56:00,551 INFO [train.py:1198] (0/2) Epoch 26, batch 3250, loss[loss=0.2544, ctc_loss=0.1369, cr_loss=0.3886, attn_decoder_loss=0.2588, over 29716.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1265, cr_loss=0.3689, attn_decoder_loss=0.2461, over 5800623.61 frames. ], batch size: 84, lr: 4.25e-03, grad_scale: 8.0
2024-09-18 14:56:04,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=465500.0, ans=0.0
2024-09-18 14:56:26,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=465540.0, ans=0.125
2024-09-18 14:57:01,304 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.25 vs. limit=15.0
2024-09-18 14:57:05,267 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 14:57:06,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=465660.0, ans=0.0
2024-09-18 14:57:18,690 INFO [train.py:1198] (0/2) Epoch 26, batch 3300, loss[loss=0.2441, ctc_loss=0.1233, cr_loss=0.3499, attn_decoder_loss=0.2497, over 28658.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1259, cr_loss=0.3673, attn_decoder_loss=0.2449, over 5796952.11 frames. ], batch size: 112, lr: 4.25e-03, grad_scale: 8.0
2024-09-18 14:57:47,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=465780.0, ans=0.035
2024-09-18 14:57:53,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=465780.0, ans=0.125
2024-09-18 14:57:53,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=465780.0, ans=10.0
2024-09-18 14:57:55,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=465780.0, ans=0.125
2024-09-18 14:58:02,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.58 vs. limit=15.0
2024-09-18 14:58:14,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=465820.0, ans=0.2
2024-09-18 14:58:20,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.61 vs. limit=10.0
2024-09-18 14:58:34,538 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.655e+01 9.163e+01 9.654e+01 2.275e+02, threshold=1.833e+02, percent-clipped=2.0
2024-09-18 14:58:36,151 INFO [train.py:1198] (0/2) Epoch 26, batch 3350, loss[loss=0.2551, ctc_loss=0.1336, cr_loss=0.3968, attn_decoder_loss=0.2598, over 28898.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.127, cr_loss=0.3693, attn_decoder_loss=0.246, over 5774070.84 frames. ], batch size: 104, lr: 4.25e-03, grad_scale: 8.0
2024-09-18 14:59:03,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=465940.0, ans=0.0
2024-09-18 14:59:50,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=466100.0, ans=0.0
2024-09-18 14:59:51,879 INFO [train.py:1198] (0/2) Epoch 26, batch 3400, loss[loss=0.2096, ctc_loss=0.1033, cr_loss=0.3325, attn_decoder_loss=0.214, over 29333.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1272, cr_loss=0.3699, attn_decoder_loss=0.2459, over 5766835.19 frames. ], batch size: 67, lr: 4.25e-03, grad_scale: 8.0
2024-09-18 15:00:02,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=466100.0, ans=0.0
2024-09-18 15:00:10,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=466140.0, ans=0.125
2024-09-18 15:00:19,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=466140.0, ans=0.125
2024-09-18 15:00:22,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=466180.0, ans=0.125
2024-09-18 15:00:25,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=466180.0, ans=0.5
2024-09-18 15:01:08,280 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.379e+01 8.854e+01 9.422e+01 2.123e+02, threshold=1.771e+02, percent-clipped=1.0
2024-09-18 15:01:09,876 INFO [train.py:1198] (0/2) Epoch 26, batch 3450, loss[loss=0.2414, ctc_loss=0.1196, cr_loss=0.3621, attn_decoder_loss=0.2469, over 28401.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.127, cr_loss=0.3698, attn_decoder_loss=0.246, over 5774994.57 frames. ], batch size: 111, lr: 4.25e-03, grad_scale: 8.0
2024-09-18 15:01:31,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=466340.0, ans=0.125
2024-09-18 15:01:40,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=466380.0, ans=0.125
2024-09-18 15:02:05,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=466420.0, ans=0.0
2024-09-18 15:02:15,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=466460.0, ans=0.0
2024-09-18 15:02:23,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=466460.0, ans=0.0
2024-09-18 15:02:28,044 INFO [train.py:1198] (0/2) Epoch 26, batch 3500, loss[loss=0.2309, ctc_loss=0.1209, cr_loss=0.3608, attn_decoder_loss=0.2351, over 29326.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1271, cr_loss=0.3697, attn_decoder_loss=0.2459, over 5775493.93 frames. ], batch size: 71, lr: 4.24e-03, grad_scale: 8.0
2024-09-18 15:02:28,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=466500.0, ans=0.125
2024-09-18 15:02:56,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466580.0, ans=0.1
2024-09-18 15:03:32,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=466660.0, ans=0.125
2024-09-18 15:03:32,372 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:03:40,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.52 vs. limit=10.0
2024-09-18 15:03:40,928 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.579e+01 9.256e+01 9.884e+01 2.781e+02, threshold=1.851e+02, percent-clipped=2.0
2024-09-18 15:03:42,420 INFO [train.py:1198] (0/2) Epoch 26, batch 3550, loss[loss=0.2596, ctc_loss=0.1395, cr_loss=0.3918, attn_decoder_loss=0.2642, over 29711.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1271, cr_loss=0.3698, attn_decoder_loss=0.2461, over 5784115.72 frames. ], batch size: 89, lr: 4.24e-03, grad_scale: 8.0
2024-09-18 15:03:50,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=466700.0, ans=0.125
2024-09-18 15:04:07,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=466740.0, ans=0.0
2024-09-18 15:04:09,895 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs. limit=6.0
2024-09-18 15:04:18,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=466780.0, ans=0.0
2024-09-18 15:04:34,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=466820.0, ans=0.125
2024-09-18 15:04:56,568 INFO [train.py:1198] (0/2) Epoch 26, batch 3600, loss[loss=0.2394, ctc_loss=0.1314, cr_loss=0.3753, attn_decoder_loss=0.2431, over 29502.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1273, cr_loss=0.3706, attn_decoder_loss=0.2462, over 5793401.10 frames. ], batch size: 77, lr: 4.24e-03, grad_scale: 16.0
2024-09-18 15:05:07,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466900.0, ans=0.1
2024-09-18 15:05:08,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=466900.0, ans=0.1
2024-09-18 15:05:25,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=466980.0, ans=0.125
2024-09-18 15:05:28,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=466980.0, ans=0.125
2024-09-18 15:05:28,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=466980.0, ans=0.125
2024-09-18 15:05:43,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467020.0, ans=0.1
2024-09-18 15:05:58,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=467060.0, ans=0.125
2024-09-18 15:06:12,878 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 8.525e+01 9.113e+01 9.643e+01 7.477e+02, threshold=1.823e+02, percent-clipped=1.0
2024-09-18 15:06:12,899 INFO [train.py:1198] (0/2) Epoch 26, batch 3650, loss[loss=0.2579, ctc_loss=0.1378, cr_loss=0.3848, attn_decoder_loss=0.2627, over 29502.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1263, cr_loss=0.3688, attn_decoder_loss=0.2454, over 5794468.21 frames. ], batch size: 90, lr: 4.24e-03, grad_scale: 8.0
2024-09-18 15:06:15,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467100.0, ans=0.1
2024-09-18 15:06:23,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=467100.0, ans=0.09899494936611666
2024-09-18 15:06:36,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=467140.0, ans=0.125
2024-09-18 15:07:07,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=467220.0, ans=0.95
2024-09-18 15:07:09,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=467220.0, ans=0.025
2024-09-18 15:07:24,934 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:07:27,490 INFO [train.py:1198] (0/2) Epoch 26, batch 3700, loss[loss=0.2471, ctc_loss=0.1237, cr_loss=0.3631, attn_decoder_loss=0.2528, over 29698.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.126, cr_loss=0.3683, attn_decoder_loss=0.2454, over 5803164.54 frames. ], batch size: 84, lr: 4.24e-03, grad_scale: 8.0
2024-09-18 15:07:27,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=467300.0, ans=0.0
2024-09-18 15:07:32,449 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:07:35,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=467300.0, ans=0.125
2024-09-18 15:07:38,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=467300.0, ans=0.125
2024-09-18 15:08:21,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=467420.0, ans=0.2
2024-09-18 15:08:21,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=467420.0, ans=0.09899494936611666
2024-09-18 15:08:25,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=467460.0, ans=0.125
2024-09-18 15:08:26,208 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.00 vs. limit=10.0
2024-09-18 15:08:28,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=467460.0, ans=0.0
2024-09-18 15:08:35,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=467460.0, ans=0.125
2024-09-18 15:08:39,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=467460.0, ans=0.5
2024-09-18 15:08:43,719 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.301e+01 8.766e+01 9.365e+01 1.565e+02, threshold=1.753e+02, percent-clipped=0.0
2024-09-18 15:08:43,746 INFO [train.py:1198] (0/2) Epoch 26, batch 3750, loss[loss=0.2179, ctc_loss=0.1166, cr_loss=0.345, attn_decoder_loss=0.2215, over 29334.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1261, cr_loss=0.3688, attn_decoder_loss=0.2451, over 5806394.34 frames. ], batch size: 67, lr: 4.24e-03, grad_scale: 8.0
2024-09-18 15:08:47,038 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:08:52,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=467500.0, ans=0.125
2024-09-18 15:09:04,969 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:09:15,865 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0
2024-09-18 15:09:45,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5
2024-09-18 15:09:49,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=467660.0, ans=0.125
2024-09-18 15:09:58,322 INFO [train.py:1198] (0/2) Epoch 26, batch 3800, loss[loss=0.245, ctc_loss=0.1203, cr_loss=0.3533, attn_decoder_loss=0.251, over 29630.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1257, cr_loss=0.3678, attn_decoder_loss=0.2447, over 5797275.57 frames. ], batch size: 86, lr: 4.24e-03, grad_scale: 8.0
2024-09-18 15:10:00,247 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:10:00,838 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0
2024-09-18 15:10:21,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=467740.0, ans=0.125
2024-09-18 15:10:27,030 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:10:27,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0
2024-09-18 15:10:31,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=467780.0, ans=0.0
2024-09-18 15:10:35,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467780.0, ans=0.1
2024-09-18 15:10:44,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=467820.0, ans=0.025
2024-09-18 15:11:00,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0
2024-09-18 15:11:12,680 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.723e+01 9.262e+01 9.820e+01 3.411e+02, threshold=1.852e+02, percent-clipped=3.0
2024-09-18 15:11:12,706 INFO [train.py:1198] (0/2) Epoch 26, batch 3850, loss[loss=0.2581, ctc_loss=0.1438, cr_loss=0.4129, attn_decoder_loss=0.2616, over 29250.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1259, cr_loss=0.3681, attn_decoder_loss=0.2448, over 5811310.59 frames. ], batch size: 100, lr: 4.24e-03, grad_scale: 8.0
2024-09-18 15:11:15,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.64 vs. limit=15.0
2024-09-18 15:11:16,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=467900.0, ans=0.125
2024-09-18 15:11:27,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=467940.0, ans=0.125
2024-09-18 15:11:30,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=467940.0, ans=0.125
2024-09-18 15:11:32,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=467940.0, ans=0.125
2024-09-18 15:11:38,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=467940.0, ans=0.2
2024-09-18 15:11:44,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=467980.0, ans=0.025
2024-09-18 15:12:08,939 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:12:20,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=468060.0, ans=0.125
2024-09-18 15:12:29,508 INFO [train.py:1198] (0/2) Epoch 26, batch 3900, loss[loss=0.2526, ctc_loss=0.1325, cr_loss=0.3785, attn_decoder_loss=0.2575, over 29632.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1259, cr_loss=0.3678, attn_decoder_loss=0.2453, over 5815012.37 frames. ], batch size: 86, lr: 4.24e-03, grad_scale: 8.0
2024-09-18 15:12:36,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.20 vs. limit=22.5
2024-09-18 15:12:48,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.73 vs.
limit=15.0 2024-09-18 15:12:57,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=468180.0, ans=0.1 2024-09-18 15:13:11,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=468180.0, ans=0.0 2024-09-18 15:13:12,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=468220.0, ans=0.125 2024-09-18 15:13:15,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=468220.0, ans=0.125 2024-09-18 15:13:26,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.10 vs. limit=12.0 2024-09-18 15:13:43,704 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.582e+01 9.076e+01 9.520e+01 1.404e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-18 15:13:43,725 INFO [train.py:1198] (0/2) Epoch 26, batch 3950, loss[loss=0.2597, ctc_loss=0.1409, cr_loss=0.4061, attn_decoder_loss=0.2639, over 29444.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1259, cr_loss=0.3684, attn_decoder_loss=0.2453, over 5834648.99 frames. ], batch size: 97, lr: 4.24e-03, grad_scale: 8.0 2024-09-18 15:13:53,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=468300.0, ans=0.125 2024-09-18 15:14:26,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=468380.0, ans=0.125 2024-09-18 15:14:31,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.33 vs. 
limit=12.0 2024-09-18 15:14:41,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=468420.0, ans=0.0 2024-09-18 15:14:54,218 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. limit=5.0 2024-09-18 15:14:58,757 INFO [train.py:1198] (0/2) Epoch 26, batch 4000, loss[loss=0.2152, ctc_loss=0.1032, cr_loss=0.3282, attn_decoder_loss=0.2204, over 29524.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1257, cr_loss=0.3671, attn_decoder_loss=0.245, over 5812346.31 frames. ], batch size: 74, lr: 4.24e-03, grad_scale: 16.0 2024-09-18 15:15:07,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=468500.0, ans=0.2 2024-09-18 15:15:10,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=468500.0, ans=0.125 2024-09-18 15:15:29,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=468580.0, ans=0.125 2024-09-18 15:15:31,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=468580.0, ans=0.125 2024-09-18 15:15:45,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.51 vs. limit=12.0 2024-09-18 15:15:53,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=468620.0, ans=0.05 2024-09-18 15:16:14,193 INFO [train.py:1198] (0/2) Epoch 26, batch 4050, loss[loss=0.2614, ctc_loss=0.1525, cr_loss=0.3686, attn_decoder_loss=0.2653, over 20875.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1256, cr_loss=0.3665, attn_decoder_loss=0.2448, over 5796882.15 frames. 
], batch size: 209, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:16:15,589 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.012e+01 8.606e+01 9.122e+01 9.849e+01 6.037e+02, threshold=1.824e+02, percent-clipped=3.0 2024-09-18 15:16:18,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=468700.0, ans=0.2 2024-09-18 15:16:23,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=468700.0, ans=0.125 2024-09-18 15:16:59,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=468820.0, ans=0.2 2024-09-18 15:17:28,159 INFO [train.py:1198] (0/2) Epoch 26, batch 4100, loss[loss=0.2502, ctc_loss=0.1318, cr_loss=0.3941, attn_decoder_loss=0.2546, over 29530.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1263, cr_loss=0.3679, attn_decoder_loss=0.245, over 5791512.62 frames. ], batch size: 90, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:17:32,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=468900.0, ans=0.125 2024-09-18 15:17:34,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=468900.0, ans=0.125 2024-09-18 15:17:55,501 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.36 vs. limit=15.0 2024-09-18 15:18:05,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. 
limit=6.0 2024-09-18 15:18:09,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=468980.0, ans=0.0 2024-09-18 15:18:11,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=469020.0, ans=0.0 2024-09-18 15:18:12,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=469020.0, ans=0.1 2024-09-18 15:18:29,928 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:18:40,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=469060.0, ans=0.125 2024-09-18 15:18:42,763 INFO [train.py:1198] (0/2) Epoch 26, batch 4150, loss[loss=0.2325, ctc_loss=0.1228, cr_loss=0.3861, attn_decoder_loss=0.2361, over 29503.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1261, cr_loss=0.368, attn_decoder_loss=0.2447, over 5798104.96 frames. 
], batch size: 77, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:18:44,182 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.459e+01 8.973e+01 9.469e+01 6.878e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-18 15:18:53,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=469100.0, ans=0.2 2024-09-18 15:19:00,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=469140.0, ans=0.2 2024-09-18 15:19:09,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469140.0, ans=0.1 2024-09-18 15:19:22,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=469180.0, ans=0.1 2024-09-18 15:19:27,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=469220.0, ans=0.2 2024-09-18 15:19:43,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=469260.0, ans=0.125 2024-09-18 15:19:56,319 INFO [train.py:1198] (0/2) Epoch 26, batch 4200, loss[loss=0.2666, ctc_loss=0.1534, cr_loss=0.4232, attn_decoder_loss=0.2698, over 29512.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1264, cr_loss=0.3687, attn_decoder_loss=0.2451, over 5799729.12 frames. 
], batch size: 90, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:20:00,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469300.0, ans=0.1 2024-09-18 15:20:08,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=469300.0, ans=0.0 2024-09-18 15:20:10,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.31 vs. limit=15.0 2024-09-18 15:20:33,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=469380.0, ans=0.125 2024-09-18 15:20:38,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=469380.0, ans=0.035 2024-09-18 15:20:46,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=469420.0, ans=0.0 2024-09-18 15:21:02,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=469460.0, ans=0.125 2024-09-18 15:21:10,803 INFO [train.py:1198] (0/2) Epoch 26, batch 4250, loss[loss=0.235, ctc_loss=0.1264, cr_loss=0.3844, attn_decoder_loss=0.2386, over 29529.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1262, cr_loss=0.3683, attn_decoder_loss=0.2453, over 5805367.35 frames. 
], batch size: 74, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:21:12,222 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.717e+01 9.053e+01 9.730e+01 2.394e+02, threshold=1.811e+02, percent-clipped=1.0 2024-09-18 15:21:34,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=469540.0, ans=0.125 2024-09-18 15:22:06,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=469620.0, ans=0.07 2024-09-18 15:22:07,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5 2024-09-18 15:22:25,794 INFO [train.py:1198] (0/2) Epoch 26, batch 4300, loss[loss=0.2492, ctc_loss=0.1262, cr_loss=0.3686, attn_decoder_loss=0.2547, over 29500.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.126, cr_loss=0.3683, attn_decoder_loss=0.2455, over 5794747.27 frames. ], batch size: 87, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:22:35,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=469700.0, ans=0.0 2024-09-18 15:22:36,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=469700.0, ans=0.125 2024-09-18 15:22:41,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.08 vs. 
limit=15.0 2024-09-18 15:22:48,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=469740.0, ans=0.125 2024-09-18 15:22:55,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=469780.0, ans=0.125 2024-09-18 15:23:10,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=469820.0, ans=0.125 2024-09-18 15:23:17,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=469820.0, ans=0.1 2024-09-18 15:23:30,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=469860.0, ans=10.0 2024-09-18 15:23:40,793 INFO [train.py:1198] (0/2) Epoch 26, batch 4350, loss[loss=0.2465, ctc_loss=0.1258, cr_loss=0.3609, attn_decoder_loss=0.2519, over 29533.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1285, cr_loss=0.3741, attn_decoder_loss=0.2487, over 5796754.15 frames. 
], batch size: 97, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:23:42,286 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.612e+01 9.127e+01 9.671e+01 1.308e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-18 15:23:48,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=469900.0, ans=0.0 2024-09-18 15:24:09,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=469980.0, ans=0.125 2024-09-18 15:24:10,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=469980.0, ans=0.0 2024-09-18 15:24:23,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470020.0, ans=0.1 2024-09-18 15:24:38,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=470060.0, ans=0.125 2024-09-18 15:24:44,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=470060.0, ans=0.125 2024-09-18 15:24:47,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=470060.0, ans=0.1 2024-09-18 15:24:54,005 INFO [train.py:1198] (0/2) Epoch 26, batch 4400, loss[loss=0.2563, ctc_loss=0.1456, cr_loss=0.4085, attn_decoder_loss=0.2596, over 27246.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1303, cr_loss=0.3769, attn_decoder_loss=0.2509, over 5767840.50 frames. 
], batch size: 124, lr: 4.23e-03, grad_scale: 16.0 2024-09-18 15:24:55,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=470100.0, ans=0.1 2024-09-18 15:24:57,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470100.0, ans=0.1 2024-09-18 15:25:00,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.49 vs. limit=6.0 2024-09-18 15:25:20,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=470140.0, ans=0.2 2024-09-18 15:25:22,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=470180.0, ans=0.125 2024-09-18 15:25:37,767 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:26:06,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=470260.0, ans=0.025 2024-09-18 15:26:09,418 INFO [train.py:1198] (0/2) Epoch 26, batch 4450, loss[loss=0.267, ctc_loss=0.1538, cr_loss=0.3973, attn_decoder_loss=0.2707, over 20287.00 frames. ], tot_loss[loss=0.2492, ctc_loss=0.1345, cr_loss=0.3825, attn_decoder_loss=0.2534, over 5573461.36 frames. ], batch size: 210, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:26:12,367 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.301e+01 9.167e+01 9.608e+01 1.048e+02 2.652e+02, threshold=1.922e+02, percent-clipped=1.0 2024-09-18 15:26:26,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. 
limit=15.0 2024-09-18 15:27:07,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=470420.0, ans=0.1 2024-09-18 15:27:25,445 INFO [train.py:1198] (0/2) Epoch 26, batch 4500, loss[loss=0.2607, ctc_loss=0.1547, cr_loss=0.3889, attn_decoder_loss=0.2639, over 19626.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1386, cr_loss=0.3851, attn_decoder_loss=0.2557, over 5233363.15 frames. ], batch size: 210, lr: 4.23e-03, grad_scale: 8.0 2024-09-18 15:27:44,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.90 vs. limit=15.0 2024-09-18 15:28:02,400 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-26.pt 2024-09-18 15:28:54,202 INFO [train.py:1198] (0/2) Epoch 27, batch 0, loss[loss=0.2194, ctc_loss=0.104, cr_loss=0.3239, attn_decoder_loss=0.225, over 29570.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.104, cr_loss=0.3239, attn_decoder_loss=0.225, over 29570.00 frames. ], batch size: 73, lr: 4.15e-03, grad_scale: 16.0 2024-09-18 15:28:54,202 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 15:29:01,849 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.4264, 3.6627, 3.6778, 3.6779], device='cuda:0') 2024-09-18 15:29:10,365 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.1557, 4.4262, 3.8561, 4.0711], device='cuda:0') 2024-09-18 15:29:12,732 INFO [train.py:1230] (0/2) Epoch 27, validation: loss=0.2127, ctc_loss=0.03797, cr_loss=5.907e-15, attn_decoder_loss=0.2322, over 944034.00 frames. 
2024-09-18 15:29:12,733 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 15:29:46,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=470680.0, ans=0.0 2024-09-18 15:29:53,212 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 1.034e+02 1.128e+02 1.240e+02 3.218e+02, threshold=2.256e+02, percent-clipped=3.0 2024-09-18 15:30:04,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=470720.0, ans=0.125 2024-09-18 15:30:05,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=470720.0, ans=0.0 2024-09-18 15:30:28,359 INFO [train.py:1198] (0/2) Epoch 27, batch 50, loss[loss=0.2172, ctc_loss=0.1071, cr_loss=0.3277, attn_decoder_loss=0.2221, over 29435.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1271, cr_loss=0.3695, attn_decoder_loss=0.2455, over 1269115.75 frames. ], batch size: 70, lr: 4.14e-03, grad_scale: 16.0 2024-09-18 15:30:30,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=470800.0, ans=0.0 2024-09-18 15:30:55,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=470840.0, ans=0.2 2024-09-18 15:31:04,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.13 vs. 
limit=15.0 2024-09-18 15:31:07,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=470880.0, ans=0.0 2024-09-18 15:31:08,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=470880.0, ans=0.125 2024-09-18 15:31:37,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=470960.0, ans=0.125 2024-09-18 15:31:47,308 INFO [train.py:1198] (0/2) Epoch 27, batch 100, loss[loss=0.2239, ctc_loss=0.1107, cr_loss=0.332, attn_decoder_loss=0.2291, over 29529.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.128, cr_loss=0.3726, attn_decoder_loss=0.2474, over 2253429.13 frames. ], batch size: 76, lr: 4.14e-03, grad_scale: 16.0 2024-09-18 15:31:52,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.60 vs. limit=15.0 2024-09-18 15:32:07,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=471040.0, ans=0.1 2024-09-18 15:32:12,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=471040.0, ans=0.07 2024-09-18 15:32:14,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=471040.0, ans=0.125 2024-09-18 15:32:28,931 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.524e+01 9.170e+01 9.614e+01 1.417e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-18 15:32:40,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.52 vs. 
limit=22.5 2024-09-18 15:32:45,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471160.0, ans=0.1 2024-09-18 15:32:58,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.13 vs. limit=10.0 2024-09-18 15:33:02,264 INFO [train.py:1198] (0/2) Epoch 27, batch 150, loss[loss=0.2149, ctc_loss=0.1051, cr_loss=0.3327, attn_decoder_loss=0.2197, over 29417.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1259, cr_loss=0.3683, attn_decoder_loss=0.2454, over 3048209.46 frames. ], batch size: 70, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:33:13,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=471200.0, ans=0.125 2024-09-18 15:33:28,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=471240.0, ans=0.125 2024-09-18 15:33:44,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=471280.0, ans=0.2 2024-09-18 15:33:47,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=471320.0, ans=0.0 2024-09-18 15:33:52,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.60 vs. limit=22.5 2024-09-18 15:33:52,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. 
limit=6.0 2024-09-18 15:33:55,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=471320.0, ans=0.125 2024-09-18 15:34:17,455 INFO [train.py:1198] (0/2) Epoch 27, batch 200, loss[loss=0.2534, ctc_loss=0.1325, cr_loss=0.3964, attn_decoder_loss=0.2581, over 27183.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1252, cr_loss=0.3677, attn_decoder_loss=0.2444, over 3659797.50 frames. ], batch size: 124, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:34:23,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471400.0, ans=0.1 2024-09-18 15:34:29,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=471400.0, ans=0.1 2024-09-18 15:34:33,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.45 vs. 
limit=22.5 2024-09-18 15:34:59,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471480.0, ans=0.1 2024-09-18 15:35:04,162 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.473e+01 8.928e+01 9.557e+01 1.148e+02, threshold=1.786e+02, percent-clipped=0.0 2024-09-18 15:35:10,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=471520.0, ans=0.1 2024-09-18 15:35:18,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=471520.0, ans=0.09899494936611666 2024-09-18 15:35:18,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=471520.0, ans=0.2 2024-09-18 15:35:28,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=471560.0, ans=0.125 2024-09-18 15:35:37,439 INFO [train.py:1198] (0/2) Epoch 27, batch 250, loss[loss=0.2577, ctc_loss=0.1411, cr_loss=0.4084, attn_decoder_loss=0.2615, over 29269.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1255, cr_loss=0.3676, attn_decoder_loss=0.2445, over 4141133.67 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 8.0 2024-09-18 15:35:40,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=471600.0, ans=0.125 2024-09-18 15:36:13,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=471680.0, ans=0.0 2024-09-18 15:36:38,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. 
limit=15.0
2024-09-18 15:36:53,042 INFO [train.py:1198] (0/2) Epoch 27, batch 300, loss[loss=0.2508, ctc_loss=0.1355, cr_loss=0.3946, attn_decoder_loss=0.2548, over 29537.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.125, cr_loss=0.366, attn_decoder_loss=0.2442, over 4508899.19 frames. ], batch size: 92, lr: 4.14e-03, grad_scale: 8.0
2024-09-18 15:36:56,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=471800.0, ans=0.125
2024-09-18 15:37:11,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=471840.0, ans=0.125
2024-09-18 15:37:19,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=471840.0, ans=0.2
2024-09-18 15:37:31,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=471880.0, ans=0.125
2024-09-18 15:37:35,276 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.584e+01 8.118e+01 8.847e+01 9.359e+01 3.678e+02, threshold=1.769e+02, percent-clipped=1.0
2024-09-18 15:37:44,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=471920.0, ans=0.125
2024-09-18 15:37:48,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=471920.0, ans=0.125
2024-09-18 15:37:52,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=471960.0, ans=0.05
2024-09-18 15:37:54,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=471960.0, ans=0.025
2024-09-18 15:37:57,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=471960.0, ans=0.09899494936611666
2024-09-18 15:38:06,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471960.0, ans=0.1
2024-09-18 15:38:08,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=472000.0, ans=0.125
2024-09-18 15:38:09,252 INFO [train.py:1198] (0/2) Epoch 27, batch 350, loss[loss=0.2154, ctc_loss=0.1084, cr_loss=0.3306, attn_decoder_loss=0.2199, over 29314.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1256, cr_loss=0.3669, attn_decoder_loss=0.2449, over 4794809.73 frames. ], batch size: 71, lr: 4.14e-03, grad_scale: 8.0
2024-09-18 15:38:35,194 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:38:43,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=472080.0, ans=0.125
2024-09-18 15:38:59,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=472120.0, ans=0.0
2024-09-18 15:39:07,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=472120.0, ans=0.1
2024-09-18 15:39:29,370 INFO [train.py:1198] (0/2) Epoch 27, batch 400, loss[loss=0.2417, ctc_loss=0.1192, cr_loss=0.3633, attn_decoder_loss=0.2472, over 29701.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1255, cr_loss=0.367, attn_decoder_loss=0.2446, over 5023935.81 frames. ], batch size: 82, lr: 4.14e-03, grad_scale: 16.0
2024-09-18 15:39:31,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=472200.0, ans=0.125
2024-09-18 15:39:34,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=472200.0, ans=0.2
2024-09-18 15:39:55,682 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:40:11,987 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.649e+01 9.100e+01 9.719e+01 1.502e+02, threshold=1.820e+02, percent-clipped=0.0
2024-09-18 15:40:27,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=472320.0, ans=0.125
2024-09-18 15:40:30,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=472360.0, ans=0.0
2024-09-18 15:40:45,450 INFO [train.py:1198] (0/2) Epoch 27, batch 450, loss[loss=0.2554, ctc_loss=0.1418, cr_loss=0.4002, attn_decoder_loss=0.2591, over 29695.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.126, cr_loss=0.3685, attn_decoder_loss=0.2453, over 5186742.95 frames. ], batch size: 83, lr: 4.14e-03, grad_scale: 8.0
2024-09-18 15:41:30,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=12.0
2024-09-18 15:41:34,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=472520.0, ans=0.0
2024-09-18 15:41:42,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.42 vs. limit=15.0
2024-09-18 15:42:02,023 INFO [train.py:1198] (0/2) Epoch 27, batch 500, loss[loss=0.2619, ctc_loss=0.1448, cr_loss=0.425, attn_decoder_loss=0.2655, over 29427.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1258, cr_loss=0.3686, attn_decoder_loss=0.2449, over 5329472.10 frames. ], batch size: 94, lr: 4.14e-03, grad_scale: 8.0
2024-09-18 15:42:03,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=472600.0, ans=0.125
2024-09-18 15:42:10,628 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0
2024-09-18 15:42:41,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=472680.0, ans=0.0
2024-09-18 15:42:50,898 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.545e+01 8.912e+01 9.466e+01 2.661e+02, threshold=1.782e+02, percent-clipped=0.0
2024-09-18 15:42:51,927 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=12.0
2024-09-18 15:42:52,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=472720.0, ans=0.2
2024-09-18 15:43:05,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=472720.0, ans=0.125
2024-09-18 15:43:06,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.09 vs. limit=22.5
2024-09-18 15:43:23,226 INFO [train.py:1198] (0/2) Epoch 27, batch 550, loss[loss=0.2452, ctc_loss=0.1259, cr_loss=0.362, attn_decoder_loss=0.2504, over 28904.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1254, cr_loss=0.3673, attn_decoder_loss=0.245, over 5422509.15 frames. ], batch size: 104, lr: 4.14e-03, grad_scale: 8.0
2024-09-18 15:43:25,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=472800.0, ans=0.0
2024-09-18 15:43:44,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=472840.0, ans=0.1
2024-09-18 15:43:44,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=472840.0, ans=0.125
2024-09-18 15:43:52,534 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:43:56,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.10 vs. limit=6.0
2024-09-18 15:44:19,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0
2024-09-18 15:44:21,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=472920.0, ans=0.1
2024-09-18 15:44:39,398 INFO [train.py:1198] (0/2) Epoch 27, batch 600, loss[loss=0.2623, ctc_loss=0.1381, cr_loss=0.3906, attn_decoder_loss=0.2674, over 29249.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1258, cr_loss=0.3686, attn_decoder_loss=0.2453, over 5509367.75 frames. ], batch size: 100, lr: 4.14e-03, grad_scale: 8.0
2024-09-18 15:44:41,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=473000.0, ans=0.0
2024-09-18 15:44:42,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=473000.0, ans=0.125
2024-09-18 15:45:11,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=473080.0, ans=0.0
2024-09-18 15:45:14,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=473080.0, ans=0.0
2024-09-18 15:45:15,768 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:45:22,886 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.416e+01 8.737e+01 9.314e+01 1.829e+02, threshold=1.747e+02, percent-clipped=2.0
2024-09-18 15:45:55,002 INFO [train.py:1198] (0/2) Epoch 27, batch 650, loss[loss=0.2361, ctc_loss=0.1118, cr_loss=0.3327, attn_decoder_loss=0.2425, over 29747.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1245, cr_loss=0.3659, attn_decoder_loss=0.2444, over 5585989.04 frames. ], batch size: 81, lr: 4.13e-03, grad_scale: 8.0
2024-09-18 15:45:58,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=473200.0, ans=0.05
2024-09-18 15:46:02,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473200.0, ans=0.1
2024-09-18 15:46:10,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=473240.0, ans=0.05
2024-09-18 15:46:27,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473280.0, ans=0.1
2024-09-18 15:47:15,997 INFO [train.py:1198] (0/2) Epoch 27, batch 700, loss[loss=0.2306, ctc_loss=0.1217, cr_loss=0.3617, attn_decoder_loss=0.2347, over 29563.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1247, cr_loss=0.3664, attn_decoder_loss=0.2448, over 5636723.76 frames. ], batch size: 76, lr: 4.13e-03, grad_scale: 8.0
2024-09-18 15:47:37,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=473440.0, ans=0.125
2024-09-18 15:47:39,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473440.0, ans=0.1
2024-09-18 15:47:58,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=473480.0, ans=0.125
2024-09-18 15:48:00,170 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.368e+01 8.450e+01 9.020e+01 9.619e+01 3.078e+02, threshold=1.804e+02, percent-clipped=1.0
2024-09-18 15:48:32,684 INFO [train.py:1198] (0/2) Epoch 27, batch 750, loss[loss=0.2465, ctc_loss=0.1288, cr_loss=0.3753, attn_decoder_loss=0.2513, over 29679.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1244, cr_loss=0.3656, attn_decoder_loss=0.2444, over 5676381.75 frames. ], batch size: 82, lr: 4.13e-03, grad_scale: 8.0
2024-09-18 15:48:54,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473640.0, ans=0.1
2024-09-18 15:48:55,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0
2024-09-18 15:49:01,766 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:49:08,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.50 vs. limit=22.5
2024-09-18 15:49:11,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=473680.0, ans=0.0
2024-09-18 15:49:20,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=473720.0, ans=0.1
2024-09-18 15:49:26,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=473720.0, ans=0.2
2024-09-18 15:49:45,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.64 vs. limit=22.5
2024-09-18 15:49:45,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.06 vs. limit=22.5
2024-09-18 15:49:46,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=473760.0, ans=0.125
2024-09-18 15:49:47,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=473800.0, ans=0.0
2024-09-18 15:49:48,910 INFO [train.py:1198] (0/2) Epoch 27, batch 800, loss[loss=0.224, ctc_loss=0.1121, cr_loss=0.3499, attn_decoder_loss=0.2287, over 29628.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1249, cr_loss=0.3664, attn_decoder_loss=0.2447, over 5706884.86 frames. ], batch size: 73, lr: 4.13e-03, grad_scale: 16.0
2024-09-18 15:49:51,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=15.0
2024-09-18 15:49:53,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=473800.0, ans=0.125
2024-09-18 15:50:11,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=473840.0, ans=0.125
2024-09-18 15:50:14,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=473840.0, ans=0.125
2024-09-18 15:50:18,628 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=12.0
2024-09-18 15:50:27,256 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:50:28,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=473880.0, ans=0.035
2024-09-18 15:50:35,330 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.485e+01 9.104e+01 9.795e+01 7.519e+02, threshold=1.821e+02, percent-clipped=1.0
2024-09-18 15:50:41,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=473920.0, ans=0.0
2024-09-18 15:50:41,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=473920.0, ans=0.1
2024-09-18 15:51:09,194 INFO [train.py:1198] (0/2) Epoch 27, batch 850, loss[loss=0.2538, ctc_loss=0.1296, cr_loss=0.3738, attn_decoder_loss=0.2593, over 29708.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1243, cr_loss=0.3656, attn_decoder_loss=0.2443, over 5736343.07 frames. ], batch size: 89, lr: 4.13e-03, grad_scale: 16.0
2024-09-18 15:51:16,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=474000.0, ans=0.125
2024-09-18 15:51:19,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=474000.0, ans=0.2
2024-09-18 15:51:39,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=474080.0, ans=0.0
2024-09-18 15:51:48,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=474080.0, ans=0.2
2024-09-18 15:51:53,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=474120.0, ans=0.125
2024-09-18 15:52:07,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=474120.0, ans=0.125
2024-09-18 15:52:10,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=474160.0, ans=0.0
2024-09-18 15:52:12,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2024-09-18 15:52:16,907 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=22.5
2024-09-18 15:52:24,937 INFO [train.py:1198] (0/2) Epoch 27, batch 900, loss[loss=0.2111, ctc_loss=0.09848, cr_loss=0.3115, attn_decoder_loss=0.2167, over 29572.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1245, cr_loss=0.3661, attn_decoder_loss=0.2443, over 5741025.85 frames. ], batch size: 73, lr: 4.13e-03, grad_scale: 8.0
2024-09-18 15:52:28,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=474200.0, ans=0.1
2024-09-18 15:52:40,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.38 vs. limit=15.0
2024-09-18 15:53:10,287 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.501e+01 8.938e+01 9.467e+01 2.355e+02, threshold=1.788e+02, percent-clipped=2.0
2024-09-18 15:53:11,048 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.17 vs. limit=22.5
2024-09-18 15:53:16,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=474320.0, ans=0.1
2024-09-18 15:53:37,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5
2024-09-18 15:53:41,188 INFO [train.py:1198] (0/2) Epoch 27, batch 950, loss[loss=0.2306, ctc_loss=0.1159, cr_loss=0.3477, attn_decoder_loss=0.2356, over 29514.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1248, cr_loss=0.3663, attn_decoder_loss=0.2446, over 5742537.43 frames. ], batch size: 74, lr: 4.13e-03, grad_scale: 8.0
2024-09-18 15:53:43,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.94 vs. limit=8.0
2024-09-18 15:53:44,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474400.0, ans=0.1
2024-09-18 15:54:26,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=474480.0, ans=0.125
2024-09-18 15:54:29,952 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.69 vs. limit=12.0
2024-09-18 15:55:01,211 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0
2024-09-18 15:55:01,614 INFO [train.py:1198] (0/2) Epoch 27, batch 1000, loss[loss=0.2306, ctc_loss=0.1166, cr_loss=0.3559, attn_decoder_loss=0.2353, over 29504.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1259, cr_loss=0.3679, attn_decoder_loss=0.2453, over 5737418.41 frames. ], batch size: 77, lr: 4.13e-03, grad_scale: 8.0
2024-09-18 15:55:09,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=474600.0, ans=0.09899494936611666
2024-09-18 15:55:20,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=474640.0, ans=0.0
2024-09-18 15:55:23,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=474640.0, ans=0.0
2024-09-18 15:55:47,617 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 8.547e+01 9.112e+01 9.993e+01 2.254e+02, threshold=1.822e+02, percent-clipped=1.0
2024-09-18 15:56:00,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=474720.0, ans=0.04949747468305833
2024-09-18 15:56:04,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=474760.0, ans=0.125
2024-09-18 15:56:10,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=474760.0, ans=0.1
2024-09-18 15:56:13,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=474760.0, ans=0.0
2024-09-18 15:56:17,920 INFO [train.py:1198] (0/2) Epoch 27, batch 1050, loss[loss=0.2583, ctc_loss=0.1366, cr_loss=0.4031, attn_decoder_loss=0.2629, over 29702.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1256, cr_loss=0.3676, attn_decoder_loss=0.2448, over 5744773.64 frames. ], batch size: 85, lr: 4.13e-03, grad_scale: 8.0
2024-09-18 15:56:21,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=474800.0, ans=0.2
2024-09-18 15:56:38,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=474840.0, ans=0.125
2024-09-18 15:56:45,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=474840.0, ans=0.05
2024-09-18 15:57:07,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474920.0, ans=0.1
2024-09-18 15:57:34,521 INFO [train.py:1198] (0/2) Epoch 27, batch 1100, loss[loss=0.2319, ctc_loss=0.119, cr_loss=0.3462, attn_decoder_loss=0.2367, over 29439.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1254, cr_loss=0.3676, attn_decoder_loss=0.2446, over 5756406.35 frames. ], batch size: 78, lr: 4.13e-03, grad_scale: 8.0
2024-09-18 15:57:36,443 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 15:57:43,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.46 vs. limit=15.0
2024-09-18 15:58:08,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475080.0, ans=0.1
2024-09-18 15:58:09,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=475080.0, ans=0.1
2024-09-18 15:58:10,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=15.0
2024-09-18 15:58:18,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=475080.0, ans=0.025
2024-09-18 15:58:18,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=475080.0, ans=0.125
2024-09-18 15:58:22,629 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.448e+01 9.006e+01 9.632e+01 1.338e+02, threshold=1.801e+02, percent-clipped=0.0
2024-09-18 15:58:28,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=475120.0, ans=0.07
2024-09-18 15:58:28,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=475120.0, ans=0.0
2024-09-18 15:58:31,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=475120.0, ans=0.2
2024-09-18 15:58:41,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.39 vs. limit=15.0
2024-09-18 15:58:55,779 INFO [train.py:1198] (0/2) Epoch 27, batch 1150, loss[loss=0.2363, ctc_loss=0.12, cr_loss=0.3586, attn_decoder_loss=0.2413, over 29439.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1248, cr_loss=0.3658, attn_decoder_loss=0.2442, over 5754164.93 frames. ], batch size: 78, lr: 4.13e-03, grad_scale: 8.0
2024-09-18 15:59:11,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=475240.0, ans=10.0
2024-09-18 15:59:12,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=475240.0, ans=0.125
2024-09-18 15:59:15,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=475240.0, ans=0.125
2024-09-18 15:59:23,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=475240.0, ans=0.2
2024-09-18 15:59:47,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=475320.0, ans=0.04949747468305833
2024-09-18 15:59:54,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.14 vs. limit=15.0
2024-09-18 16:00:04,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=475360.0, ans=0.125
2024-09-18 16:00:10,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475400.0, ans=0.1
2024-09-18 16:00:11,933 INFO [train.py:1198] (0/2) Epoch 27, batch 1200, loss[loss=0.2496, ctc_loss=0.1328, cr_loss=0.3708, attn_decoder_loss=0.2543, over 29696.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1259, cr_loss=0.3676, attn_decoder_loss=0.2453, over 5746382.68 frames. ], batch size: 85, lr: 4.12e-03, grad_scale: 16.0
2024-09-18 16:00:29,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2024-09-18 16:00:45,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=475480.0, ans=0.025
2024-09-18 16:00:59,251 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.652e+01 9.107e+01 9.727e+01 1.637e+02, threshold=1.821e+02, percent-clipped=0.0
2024-09-18 16:01:04,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=475520.0, ans=10.0
2024-09-18 16:01:04,309 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 16:01:25,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=475560.0, ans=0.0
2024-09-18 16:01:27,920 INFO [train.py:1198] (0/2) Epoch 27, batch 1250, loss[loss=0.2572, ctc_loss=0.1349, cr_loss=0.3853, attn_decoder_loss=0.2622, over 29508.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1264, cr_loss=0.3694, attn_decoder_loss=0.2461, over 5773981.05 frames. ], batch size: 92, lr: 4.12e-03, grad_scale: 8.0
2024-09-18 16:01:48,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=22.5
2024-09-18 16:01:51,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475640.0, ans=0.1
2024-09-18 16:01:59,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.73 vs. limit=6.0
2024-09-18 16:02:00,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=475680.0, ans=0.125
2024-09-18 16:02:13,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=475680.0, ans=0.2
2024-09-18 16:02:17,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475720.0, ans=0.1
2024-09-18 16:02:48,700 INFO [train.py:1198] (0/2) Epoch 27, batch 1300, loss[loss=0.2456, ctc_loss=0.1269, cr_loss=0.3701, attn_decoder_loss=0.2506, over 28188.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1261, cr_loss=0.3689, attn_decoder_loss=0.2455, over 5779113.18 frames. ], batch size: 111, lr: 4.12e-03, grad_scale: 8.0
2024-09-18 16:02:52,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=475800.0, ans=0.0
2024-09-18 16:02:58,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=475800.0, ans=0.125
2024-09-18 16:03:11,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=475840.0, ans=0.125
2024-09-18 16:03:20,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=475880.0, ans=0.5
2024-09-18 16:03:28,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=475880.0, ans=0.2
2024-09-18 16:03:31,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=475880.0, ans=0.035
2024-09-18 16:03:35,951 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.380e+01 8.992e+01 9.418e+01 1.555e+02, threshold=1.798e+02, percent-clipped=0.0
2024-09-18 16:03:40,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=475920.0, ans=0.2
2024-09-18 16:03:53,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475960.0, ans=0.1
2024-09-18 16:03:56,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=475960.0, ans=10.0
2024-09-18 16:04:01,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=475960.0, ans=0.125
2024-09-18 16:04:04,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0
2024-09-18 16:04:05,251 INFO [train.py:1198] (0/2) Epoch 27, batch 1350, loss[loss=0.246, ctc_loss=0.1244, cr_loss=0.3474, attn_decoder_loss=0.2518, over 29762.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1253, cr_loss=0.3672, attn_decoder_loss=0.2449, over 5796113.87 frames. ], batch size: 81, lr: 4.12e-03, grad_scale: 8.0
2024-09-18 16:04:10,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.67 vs. limit=10.0
2024-09-18 16:04:17,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=476000.0, ans=0.0
2024-09-18 16:04:20,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=476040.0, ans=0.125
2024-09-18 16:04:33,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=476080.0, ans=0.125
2024-09-18 16:04:38,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=476080.0, ans=0.2
2024-09-18 16:04:45,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=476080.0, ans=0.125
2024-09-18 16:04:53,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=476120.0, ans=0.125
2024-09-18 16:05:19,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=476200.0, ans=0.1
2024-09-18 16:05:20,666 INFO [train.py:1198] (0/2) Epoch 27, batch 1400, loss[loss=0.21, ctc_loss=0.1116, cr_loss=0.333, attn_decoder_loss=0.2135, over 29586.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1252, cr_loss=0.367, attn_decoder_loss=0.2447, over 5807359.02 frames. ], batch size: 69, lr: 4.12e-03, grad_scale: 8.0
2024-09-18 16:05:28,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=476200.0, ans=0.0
2024-09-18 16:05:30,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=476200.0, ans=0.125
2024-09-18 16:05:34,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=476240.0, ans=0.2
2024-09-18 16:06:09,970 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.342e+01 8.774e+01 9.500e+01 1.505e+02, threshold=1.755e+02, percent-clipped=0.0
2024-09-18 16:06:38,572 INFO [train.py:1198] (0/2) Epoch 27, batch 1450, loss[loss=0.2516, ctc_loss=0.1374, cr_loss=0.3934, attn_decoder_loss=0.2555, over 29475.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1254, cr_loss=0.3675, attn_decoder_loss=0.245, over 5804397.22 frames. ], batch size: 94, lr: 4.12e-03, grad_scale: 8.0
2024-09-18 16:06:45,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=476400.0, ans=0.2
2024-09-18 16:07:11,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=476480.0, ans=0.1
2024-09-18 16:07:23,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=476480.0, ans=0.1
2024-09-18 16:07:26,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=476520.0, ans=0.125
2024-09-18 16:07:56,834 INFO [train.py:1198] (0/2) Epoch 27, batch 1500, loss[loss=0.2413, ctc_loss=0.124, cr_loss=0.3767, attn_decoder_loss=0.246, over 29614.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1253, cr_loss=0.3679, attn_decoder_loss=0.2452, over 5805453.38 frames. ], batch size: 86, lr: 4.12e-03, grad_scale: 8.0
2024-09-18 16:08:27,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=476680.0, ans=0.0
2024-09-18 16:08:38,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=476680.0, ans=0.1
2024-09-18 16:08:41,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=476720.0, ans=0.04949747468305833
2024-09-18 16:08:44,404 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.578e+01 9.265e+01 1.012e+02 4.469e+02, threshold=1.853e+02, percent-clipped=2.0
2024-09-18 16:09:01,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476760.0, ans=0.1
2024-09-18 16:09:13,770 INFO [train.py:1198] (0/2) Epoch 27, batch 1550, loss[loss=0.2539, ctc_loss=0.1427, cr_loss=0.4061, attn_decoder_loss=0.2573, over 29472.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1256, cr_loss=0.3684, attn_decoder_loss=0.2453, over 5781541.65 frames. ], batch size: 90, lr: 4.12e-03, grad_scale: 8.0
2024-09-18 16:09:47,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=476880.0, ans=0.2
2024-09-18 16:09:57,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=476880.0, ans=0.0
2024-09-18 16:10:00,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=476920.0, ans=0.125
2024-09-18 16:10:08,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.48 vs. limit=15.0
2024-09-18 16:10:15,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476960.0, ans=0.1
2024-09-18 16:10:17,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=476960.0, ans=0.0
2024-09-18 16:10:31,961 INFO [train.py:1198] (0/2) Epoch 27, batch 1600, loss[loss=0.2441, ctc_loss=0.126, cr_loss=0.3702, attn_decoder_loss=0.249, over 29669.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1256, cr_loss=0.368, attn_decoder_loss=0.2452, over 5764444.30 frames. ], batch size: 85, lr: 4.12e-03, grad_scale: 16.0
2024-09-18 16:10:51,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0
2024-09-18 16:10:55,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=477040.0, ans=0.125
2024-09-18 16:10:55,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=477040.0, ans=0.125
2024-09-18 16:11:23,076 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.546e+01 9.000e+01 9.569e+01 2.285e+02, threshold=1.800e+02, percent-clipped=1.0
2024-09-18 16:11:26,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=477120.0, ans=0.125
2024-09-18 16:11:37,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=477160.0, ans=0.125
2024-09-18 16:11:40,694 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.94 vs.
limit=15.0 2024-09-18 16:11:50,231 INFO [train.py:1198] (0/2) Epoch 27, batch 1650, loss[loss=0.2503, ctc_loss=0.1274, cr_loss=0.3691, attn_decoder_loss=0.2558, over 29709.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1253, cr_loss=0.3674, attn_decoder_loss=0.2448, over 5759113.86 frames. ], batch size: 89, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:12:08,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=477240.0, ans=0.125 2024-09-18 16:12:23,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=477280.0, ans=0.0 2024-09-18 16:12:24,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-09-18 16:12:55,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=477360.0, ans=0.1 2024-09-18 16:13:05,974 INFO [train.py:1198] (0/2) Epoch 27, batch 1700, loss[loss=0.2117, ctc_loss=0.1099, cr_loss=0.3377, attn_decoder_loss=0.2155, over 29579.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.125, cr_loss=0.3674, attn_decoder_loss=0.2448, over 5779223.19 frames. 
], batch size: 69, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:13:20,226 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:13:45,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=477480.0, ans=0.0 2024-09-18 16:13:56,897 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.516e+01 9.095e+01 9.729e+01 1.325e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-18 16:14:07,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=477560.0, ans=0.125 2024-09-18 16:14:24,658 INFO [train.py:1198] (0/2) Epoch 27, batch 1750, loss[loss=0.2147, ctc_loss=0.1116, cr_loss=0.3515, attn_decoder_loss=0.2184, over 29344.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1248, cr_loss=0.3669, attn_decoder_loss=0.2444, over 5787471.15 frames. ], batch size: 67, lr: 4.12e-03, grad_scale: 8.0 2024-09-18 16:14:42,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=477640.0, ans=0.125 2024-09-18 16:14:45,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=477640.0, ans=0.04949747468305833 2024-09-18 16:14:51,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=477640.0, ans=0.025 2024-09-18 16:15:42,637 INFO [train.py:1198] (0/2) Epoch 27, batch 1800, loss[loss=0.2507, ctc_loss=0.1272, cr_loss=0.3613, attn_decoder_loss=0.2564, over 29691.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.125, cr_loss=0.3673, attn_decoder_loss=0.2447, over 5790555.29 frames. 
], batch size: 83, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:15:50,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=477800.0, ans=0.2 2024-09-18 16:16:00,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2024-09-18 16:16:17,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=477880.0, ans=0.125 2024-09-18 16:16:31,638 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.306e+01 8.965e+01 9.478e+01 1.194e+02, threshold=1.793e+02, percent-clipped=0.0 2024-09-18 16:16:35,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=477920.0, ans=0.125 2024-09-18 16:16:35,801 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.43 vs. limit=15.0 2024-09-18 16:16:45,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=477960.0, ans=0.125 2024-09-18 16:16:55,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=477960.0, ans=0.95 2024-09-18 16:16:59,471 INFO [train.py:1198] (0/2) Epoch 27, batch 1850, loss[loss=0.2491, ctc_loss=0.1188, cr_loss=0.359, attn_decoder_loss=0.2555, over 29627.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1246, cr_loss=0.3663, attn_decoder_loss=0.2442, over 5795156.91 frames. ], batch size: 86, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:17:06,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.28 vs. 
limit=12.0 2024-09-18 16:17:16,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.29 vs. limit=22.5 2024-09-18 16:17:19,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=478040.0, ans=0.125 2024-09-18 16:17:44,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=478080.0, ans=0.125 2024-09-18 16:17:47,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=478120.0, ans=0.0 2024-09-18 16:17:55,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.07 vs. limit=22.5 2024-09-18 16:18:12,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=478160.0, ans=0.035 2024-09-18 16:18:17,764 INFO [train.py:1198] (0/2) Epoch 27, batch 1900, loss[loss=0.2456, ctc_loss=0.1226, cr_loss=0.3718, attn_decoder_loss=0.2509, over 29713.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1249, cr_loss=0.367, attn_decoder_loss=0.2448, over 5803278.60 frames. 
], batch size: 89, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:18:31,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=478240.0, ans=0.125 2024-09-18 16:18:44,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=478240.0, ans=0.125 2024-09-18 16:18:57,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=478280.0, ans=0.1 2024-09-18 16:19:01,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=478280.0, ans=0.0 2024-09-18 16:19:01,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_ff3.min_abs, batch_count=478280.0, ans=0.2 2024-09-18 16:19:08,922 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.594e+01 9.103e+01 9.777e+01 2.715e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-18 16:19:09,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478320.0, ans=0.1 2024-09-18 16:19:35,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=478400.0, ans=0.1 2024-09-18 16:19:35,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=478400.0, ans=0.125 2024-09-18 16:19:36,746 INFO [train.py:1198] (0/2) Epoch 27, batch 1950, loss[loss=0.2423, ctc_loss=0.1278, cr_loss=0.3703, attn_decoder_loss=0.2468, over 29458.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1259, cr_loss=0.3698, attn_decoder_loss=0.2462, over 5818434.88 frames. 
], batch size: 78, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:19:38,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478400.0, ans=0.1 2024-09-18 16:20:12,024 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:20:25,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=478520.0, ans=0.125 2024-09-18 16:20:26,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=478520.0, ans=0.025 2024-09-18 16:20:52,625 INFO [train.py:1198] (0/2) Epoch 27, batch 2000, loss[loss=0.2188, ctc_loss=0.1169, cr_loss=0.3494, attn_decoder_loss=0.2224, over 29325.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1262, cr_loss=0.3699, attn_decoder_loss=0.2465, over 5795938.91 frames. ], batch size: 67, lr: 4.11e-03, grad_scale: 16.0 2024-09-18 16:21:05,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=478600.0, ans=0.0 2024-09-18 16:21:38,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.53 vs. limit=10.0 2024-09-18 16:21:45,122 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.111e+01 8.586e+01 9.013e+01 9.702e+01 5.300e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-18 16:21:53,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.14 vs. limit=15.0 2024-09-18 16:21:55,288 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.56 vs. 
limit=15.0 2024-09-18 16:22:00,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=478760.0, ans=0.0 2024-09-18 16:22:10,908 INFO [train.py:1198] (0/2) Epoch 27, batch 2050, loss[loss=0.211, ctc_loss=0.1039, cr_loss=0.3165, attn_decoder_loss=0.2159, over 29448.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1255, cr_loss=0.3684, attn_decoder_loss=0.2454, over 5788679.82 frames. ], batch size: 70, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:22:11,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=478800.0, ans=0.2 2024-09-18 16:22:19,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=22.5 2024-09-18 16:22:36,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=478840.0, ans=0.125 2024-09-18 16:22:52,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=478880.0, ans=0.125 2024-09-18 16:22:57,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=478920.0, ans=0.125 2024-09-18 16:23:09,951 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=12.0 2024-09-18 16:23:19,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.35 vs. 
limit=22.5 2024-09-18 16:23:20,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=478960.0, ans=0.0 2024-09-18 16:23:21,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=478960.0, ans=0.125 2024-09-18 16:23:28,893 INFO [train.py:1198] (0/2) Epoch 27, batch 2100, loss[loss=0.2335, ctc_loss=0.1145, cr_loss=0.3557, attn_decoder_loss=0.2389, over 29750.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1249, cr_loss=0.3678, attn_decoder_loss=0.245, over 5802062.48 frames. ], batch size: 81, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:23:41,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=479000.0, ans=0.025 2024-09-18 16:23:49,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=479040.0, ans=0.125 2024-09-18 16:23:57,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=479080.0, ans=0.2 2024-09-18 16:24:04,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.17 vs. limit=10.0 2024-09-18 16:24:18,658 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.253e+01 8.787e+01 9.429e+01 1.232e+02, threshold=1.757e+02, percent-clipped=0.0 2024-09-18 16:24:38,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=479160.0, ans=0.1 2024-09-18 16:24:45,041 INFO [train.py:1198] (0/2) Epoch 27, batch 2150, loss[loss=0.2394, ctc_loss=0.1233, cr_loss=0.3604, attn_decoder_loss=0.2443, over 29444.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1244, cr_loss=0.3666, attn_decoder_loss=0.2444, over 5816418.53 frames. 
], batch size: 78, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:24:56,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=479200.0, ans=0.125 2024-09-18 16:25:02,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=479240.0, ans=0.125 2024-09-18 16:25:28,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=479280.0, ans=0.125 2024-09-18 16:25:51,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=479360.0, ans=0.125 2024-09-18 16:25:58,611 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0 2024-09-18 16:26:03,687 INFO [train.py:1198] (0/2) Epoch 27, batch 2200, loss[loss=0.2577, ctc_loss=0.1342, cr_loss=0.402, attn_decoder_loss=0.2625, over 29636.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1252, cr_loss=0.3684, attn_decoder_loss=0.2448, over 5812586.48 frames. 
], batch size: 86, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:26:07,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=479400.0, ans=0.0 2024-09-18 16:26:23,555 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:26:49,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=479480.0, ans=15.0 2024-09-18 16:26:55,792 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.471e+01 9.024e+01 9.757e+01 3.508e+02, threshold=1.805e+02, percent-clipped=1.0 2024-09-18 16:27:11,780 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2024-09-18 16:27:20,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=479600.0, ans=0.125 2024-09-18 16:27:21,641 INFO [train.py:1198] (0/2) Epoch 27, batch 2250, loss[loss=0.2365, ctc_loss=0.1119, cr_loss=0.3285, attn_decoder_loss=0.243, over 29701.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.125, cr_loss=0.368, attn_decoder_loss=0.2447, over 5810801.06 frames. 
], batch size: 82, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:27:44,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=479640.0, ans=0.025 2024-09-18 16:27:53,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=479680.0, ans=0.5 2024-09-18 16:28:01,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=479680.0, ans=0.0 2024-09-18 16:28:10,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=479720.0, ans=0.95 2024-09-18 16:28:22,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=479760.0, ans=0.025 2024-09-18 16:28:24,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=479760.0, ans=0.0 2024-09-18 16:28:37,661 INFO [train.py:1198] (0/2) Epoch 27, batch 2300, loss[loss=0.2152, ctc_loss=0.1078, cr_loss=0.3431, attn_decoder_loss=0.2195, over 29291.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1248, cr_loss=0.3675, attn_decoder_loss=0.244, over 5798696.72 frames. 
], batch size: 71, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:28:42,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479800.0, ans=0.1 2024-09-18 16:28:55,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=479840.0, ans=0.125 2024-09-18 16:29:24,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=479920.0, ans=0.0 2024-09-18 16:29:29,673 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.211e+01 8.889e+01 9.358e+01 1.563e+02, threshold=1.778e+02, percent-clipped=0.0 2024-09-18 16:29:31,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=479920.0, ans=0.2 2024-09-18 16:29:42,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479960.0, ans=0.1 2024-09-18 16:29:54,606 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-120000.pt 2024-09-18 16:30:02,918 INFO [train.py:1198] (0/2) Epoch 27, batch 2350, loss[loss=0.2498, ctc_loss=0.1235, cr_loss=0.365, attn_decoder_loss=0.2557, over 29697.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1248, cr_loss=0.3673, attn_decoder_loss=0.2443, over 5803673.30 frames. 
], batch size: 83, lr: 4.11e-03, grad_scale: 8.0 2024-09-18 16:30:15,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=480000.0, ans=0.2 2024-09-18 16:30:19,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=480040.0, ans=0.0 2024-09-18 16:30:25,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=480040.0, ans=0.02 2024-09-18 16:30:27,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=480040.0, ans=0.125 2024-09-18 16:30:29,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.91 vs. limit=12.0 2024-09-18 16:30:45,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=480080.0, ans=0.2 2024-09-18 16:31:11,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=480160.0, ans=0.0 2024-09-18 16:31:15,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480160.0, ans=0.1 2024-09-18 16:31:20,880 INFO [train.py:1198] (0/2) Epoch 27, batch 2400, loss[loss=0.2373, ctc_loss=0.1238, cr_loss=0.3798, attn_decoder_loss=0.2414, over 29540.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1251, cr_loss=0.368, attn_decoder_loss=0.2448, over 5807651.55 frames. 
], batch size: 76, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 16:31:22,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480200.0, ans=0.1 2024-09-18 16:31:22,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=480200.0, ans=0.125 2024-09-18 16:31:25,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=480200.0, ans=0.125 2024-09-18 16:31:38,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=480240.0, ans=0.125 2024-09-18 16:31:43,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-09-18 16:31:55,170 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.45 vs. limit=22.5 2024-09-18 16:32:02,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=480280.0, ans=0.025 2024-09-18 16:32:12,455 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.717e+01 9.101e+01 9.636e+01 2.464e+02, threshold=1.820e+02, percent-clipped=1.0 2024-09-18 16:32:24,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=480360.0, ans=0.0 2024-09-18 16:32:36,819 INFO [train.py:1198] (0/2) Epoch 27, batch 2450, loss[loss=0.2473, ctc_loss=0.1347, cr_loss=0.3619, attn_decoder_loss=0.2518, over 29708.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1257, cr_loss=0.3693, attn_decoder_loss=0.2457, over 5783287.67 frames. 
], batch size: 82, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:32:54,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.44 vs. limit=6.0 2024-09-18 16:33:01,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=480440.0, ans=0.07 2024-09-18 16:33:04,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=480440.0, ans=0.0 2024-09-18 16:33:20,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.94 vs. limit=22.5 2024-09-18 16:33:30,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=480520.0, ans=0.125 2024-09-18 16:33:35,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=480520.0, ans=0.0 2024-09-18 16:33:54,966 INFO [train.py:1198] (0/2) Epoch 27, batch 2500, loss[loss=0.246, ctc_loss=0.1175, cr_loss=0.3533, attn_decoder_loss=0.2524, over 29636.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1255, cr_loss=0.3689, attn_decoder_loss=0.2456, over 5794617.62 frames. 
], batch size: 86, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:33:59,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=480600.0, ans=0.125 2024-09-18 16:34:05,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=480600.0, ans=0.025 2024-09-18 16:34:21,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=480640.0, ans=0.2 2024-09-18 16:34:24,612 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:34:24,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=480680.0, ans=0.125 2024-09-18 16:34:35,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=12.0 2024-09-18 16:34:36,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=480680.0, ans=0.0 2024-09-18 16:34:43,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=480720.0, ans=0.2 2024-09-18 16:34:49,374 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.457e+01 8.825e+01 9.370e+01 1.600e+02, threshold=1.765e+02, percent-clipped=0.0 2024-09-18 16:35:14,346 INFO [train.py:1198] (0/2) Epoch 27, batch 2550, loss[loss=0.2202, ctc_loss=0.1125, cr_loss=0.3374, attn_decoder_loss=0.2246, over 29326.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1254, cr_loss=0.3684, attn_decoder_loss=0.2454, over 5797330.63 frames. 
], batch size: 67, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:35:19,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=480800.0, ans=0.125 2024-09-18 16:35:25,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=480800.0, ans=0.0 2024-09-18 16:35:34,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=480840.0, ans=0.1 2024-09-18 16:35:37,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=480840.0, ans=0.04949747468305833 2024-09-18 16:35:49,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=480880.0, ans=0.125 2024-09-18 16:35:55,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=480880.0, ans=0.07 2024-09-18 16:36:01,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=480920.0, ans=0.125 2024-09-18 16:36:03,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=480920.0, ans=0.125 2024-09-18 16:36:30,359 INFO [train.py:1198] (0/2) Epoch 27, batch 2600, loss[loss=0.2382, ctc_loss=0.126, cr_loss=0.3867, attn_decoder_loss=0.242, over 29446.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1252, cr_loss=0.3683, attn_decoder_loss=0.2457, over 5794059.86 frames. 
], batch size: 78, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:36:47,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=481040.0, ans=0.125 2024-09-18 16:36:56,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=481040.0, ans=0.1 2024-09-18 16:37:15,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.49 vs. limit=15.0 2024-09-18 16:37:15,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=481080.0, ans=0.125 2024-09-18 16:37:24,585 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 8.374e+01 8.989e+01 9.651e+01 1.905e+02, threshold=1.798e+02, percent-clipped=2.0 2024-09-18 16:37:48,743 INFO [train.py:1198] (0/2) Epoch 27, batch 2650, loss[loss=0.2677, ctc_loss=0.1472, cr_loss=0.4027, attn_decoder_loss=0.2721, over 29241.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1255, cr_loss=0.3691, attn_decoder_loss=0.246, over 5800935.59 frames. ], batch size: 100, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:38:02,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=481240.0, ans=0.0 2024-09-18 16:38:16,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.34 vs. 
limit=12.0 2024-09-18 16:38:19,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=481280.0, ans=0.125 2024-09-18 16:38:22,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=481280.0, ans=0.5 2024-09-18 16:38:28,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=481280.0, ans=0.0 2024-09-18 16:38:45,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=481320.0, ans=0.0 2024-09-18 16:38:53,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481360.0, ans=0.1 2024-09-18 16:39:05,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=481400.0, ans=0.05 2024-09-18 16:39:06,855 INFO [train.py:1198] (0/2) Epoch 27, batch 2700, loss[loss=0.2663, ctc_loss=0.1447, cr_loss=0.4078, attn_decoder_loss=0.2707, over 29536.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.126, cr_loss=0.3697, attn_decoder_loss=0.2464, over 5796443.58 frames. 
], batch size: 87, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:39:22,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=481440.0, ans=0.125 2024-09-18 16:39:22,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=481440.0, ans=0.125 2024-09-18 16:39:22,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=481440.0, ans=0.0 2024-09-18 16:39:58,199 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.481e+01 8.531e+01 8.958e+01 9.495e+01 1.703e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-18 16:40:03,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.31 vs. limit=15.0 2024-09-18 16:40:19,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=481560.0, ans=0.125 2024-09-18 16:40:22,498 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.30 vs. limit=12.0 2024-09-18 16:40:23,152 INFO [train.py:1198] (0/2) Epoch 27, batch 2750, loss[loss=0.2253, ctc_loss=0.1131, cr_loss=0.3509, attn_decoder_loss=0.2299, over 29523.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1252, cr_loss=0.3678, attn_decoder_loss=0.245, over 5794909.23 frames. ], batch size: 75, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:40:33,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=22.5 2024-09-18 16:40:36,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. 
limit=15.0 2024-09-18 16:40:43,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=481640.0, ans=0.125 2024-09-18 16:40:44,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=481640.0, ans=0.05 2024-09-18 16:40:46,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=481640.0, ans=0.125 2024-09-18 16:41:12,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=481720.0, ans=0.125 2024-09-18 16:41:15,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=22.5 2024-09-18 16:41:17,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=481720.0, ans=0.0 2024-09-18 16:41:29,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=481760.0, ans=0.2 2024-09-18 16:41:41,644 INFO [train.py:1198] (0/2) Epoch 27, batch 2800, loss[loss=0.2631, ctc_loss=0.157, cr_loss=0.4018, attn_decoder_loss=0.266, over 20048.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1253, cr_loss=0.3675, attn_decoder_loss=0.245, over 5776233.41 frames. 
], batch size: 209, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 16:41:55,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=481840.0, ans=0.0 2024-09-18 16:41:57,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=481840.0, ans=0.0 2024-09-18 16:42:01,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=481840.0, ans=0.0 2024-09-18 16:42:09,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=481840.0, ans=0.125 2024-09-18 16:42:13,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=481880.0, ans=0.125 2024-09-18 16:42:25,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481920.0, ans=0.1 2024-09-18 16:42:34,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=481920.0, ans=10.0 2024-09-18 16:42:35,147 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.963e+01 8.581e+01 9.268e+01 9.879e+01 2.017e+02, threshold=1.854e+02, percent-clipped=1.0 2024-09-18 16:42:43,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=481960.0, ans=0.2 2024-09-18 16:42:47,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=481960.0, ans=0.0 2024-09-18 16:42:59,618 INFO [train.py:1198] (0/2) Epoch 27, batch 2850, loss[loss=0.2342, ctc_loss=0.1266, cr_loss=0.3715, attn_decoder_loss=0.2379, over 29503.00 frames. 
], tot_loss[loss=0.2411, ctc_loss=0.1263, cr_loss=0.3695, attn_decoder_loss=0.2457, over 5761849.93 frames. ], batch size: 77, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 16:43:34,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=482080.0, ans=0.0 2024-09-18 16:43:43,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=482120.0, ans=0.125 2024-09-18 16:43:52,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.01 vs. limit=15.0 2024-09-18 16:43:57,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=482120.0, ans=0.125 2024-09-18 16:44:07,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=482160.0, ans=0.125 2024-09-18 16:44:15,381 INFO [train.py:1198] (0/2) Epoch 27, batch 2900, loss[loss=0.2326, ctc_loss=0.1167, cr_loss=0.356, attn_decoder_loss=0.2376, over 29424.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1266, cr_loss=0.3705, attn_decoder_loss=0.2466, over 5787281.01 frames. ], batch size: 79, lr: 4.10e-03, grad_scale: 8.0 2024-09-18 16:44:18,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=482200.0, ans=0.0 2024-09-18 16:44:20,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482200.0, ans=0.1 2024-09-18 16:44:36,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_na.min_abs, batch_count=482240.0, ans=0.02 2024-09-18 16:44:40,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.40 vs. 
limit=15.0 2024-09-18 16:45:04,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=482320.0, ans=0.125 2024-09-18 16:45:12,316 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.546e+01 8.987e+01 9.686e+01 7.083e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-18 16:45:15,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=482320.0, ans=0.09899494936611666 2024-09-18 16:45:21,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=482360.0, ans=10.0 2024-09-18 16:45:28,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0 2024-09-18 16:45:33,863 INFO [train.py:1198] (0/2) Epoch 27, batch 2950, loss[loss=0.234, ctc_loss=0.1211, cr_loss=0.3629, attn_decoder_loss=0.2385, over 29540.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1254, cr_loss=0.3677, attn_decoder_loss=0.2452, over 5782023.57 frames. 
], batch size: 75, lr: 4.09e-03, grad_scale: 4.0 2024-09-18 16:45:35,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=482400.0, ans=0.125 2024-09-18 16:45:50,865 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:46:15,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482480.0, ans=0.1 2024-09-18 16:46:38,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=482560.0, ans=0.09899494936611666 2024-09-18 16:46:48,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=482560.0, ans=0.2 2024-09-18 16:46:52,446 INFO [train.py:1198] (0/2) Epoch 27, batch 3000, loss[loss=0.2291, ctc_loss=0.1205, cr_loss=0.3475, attn_decoder_loss=0.2334, over 29770.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1251, cr_loss=0.3671, attn_decoder_loss=0.245, over 5783858.03 frames. ], batch size: 81, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:46:52,446 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 16:47:10,904 INFO [train.py:1230] (0/2) Epoch 27, validation: loss=0.212, ctc_loss=0.03868, cr_loss=6.15e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-18 16:47:10,905 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 16:47:11,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=482600.0, ans=0.0 2024-09-18 16:47:11,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. 
limit=6.0 2024-09-18 16:47:40,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.52 vs. limit=15.0 2024-09-18 16:47:43,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=482680.0, ans=0.0 2024-09-18 16:47:52,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=482680.0, ans=0.1 2024-09-18 16:48:05,514 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.646e+01 9.161e+01 1.019e+02 2.247e+02, threshold=1.832e+02, percent-clipped=1.0 2024-09-18 16:48:16,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=482760.0, ans=0.1 2024-09-18 16:48:26,847 INFO [train.py:1198] (0/2) Epoch 27, batch 3050, loss[loss=0.2301, ctc_loss=0.113, cr_loss=0.3387, attn_decoder_loss=0.2356, over 29529.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1256, cr_loss=0.3673, attn_decoder_loss=0.2458, over 5776509.67 frames. 
], batch size: 76, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:48:38,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=482800.0, ans=0.05 2024-09-18 16:48:40,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=482800.0, ans=0.0 2024-09-18 16:48:58,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=482880.0, ans=0.125 2024-09-18 16:49:20,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=482920.0, ans=0.0 2024-09-18 16:49:30,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=482960.0, ans=0.125 2024-09-18 16:49:40,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=482960.0, ans=0.1 2024-09-18 16:49:42,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=482960.0, ans=0.125 2024-09-18 16:49:44,756 INFO [train.py:1198] (0/2) Epoch 27, batch 3100, loss[loss=0.2479, ctc_loss=0.1343, cr_loss=0.3847, attn_decoder_loss=0.252, over 29307.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1251, cr_loss=0.3666, attn_decoder_loss=0.2452, over 5776045.98 frames. 
], batch size: 100, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:49:46,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=483000.0, ans=0.125 2024-09-18 16:50:12,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=483040.0, ans=0.125 2024-09-18 16:50:13,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=483080.0, ans=0.125 2024-09-18 16:50:14,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0 2024-09-18 16:50:15,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=483080.0, ans=0.125 2024-09-18 16:50:22,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5 2024-09-18 16:50:32,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=483120.0, ans=0.0 2024-09-18 16:50:41,518 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.612e+01 9.047e+01 9.758e+01 3.006e+02, threshold=1.809e+02, percent-clipped=2.0 2024-09-18 16:50:43,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=483120.0, ans=0.0 2024-09-18 16:51:03,302 INFO [train.py:1198] (0/2) Epoch 27, batch 3150, loss[loss=0.2599, ctc_loss=0.1402, cr_loss=0.388, attn_decoder_loss=0.2646, over 28832.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1251, cr_loss=0.3669, attn_decoder_loss=0.2452, over 5782399.03 frames. 
], batch size: 104, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:51:51,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=483320.0, ans=0.0 2024-09-18 16:51:59,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=483320.0, ans=0.125 2024-09-18 16:52:03,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=483360.0, ans=0.125 2024-09-18 16:52:16,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=483360.0, ans=0.025 2024-09-18 16:52:18,921 INFO [train.py:1198] (0/2) Epoch 27, batch 3200, loss[loss=0.2404, ctc_loss=0.1303, cr_loss=0.3872, attn_decoder_loss=0.244, over 29412.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1247, cr_loss=0.3668, attn_decoder_loss=0.2445, over 5792648.46 frames. ], batch size: 79, lr: 4.09e-03, grad_scale: 16.0 2024-09-18 16:52:22,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=483400.0, ans=0.2 2024-09-18 16:52:24,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.74 vs. 
limit=22.5 2024-09-18 16:52:30,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=483400.0, ans=0.0 2024-09-18 16:52:52,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=483480.0, ans=0.0 2024-09-18 16:53:16,077 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.478e+01 8.969e+01 9.595e+01 1.807e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-18 16:53:24,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=483560.0, ans=0.125 2024-09-18 16:53:27,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=483560.0, ans=0.2 2024-09-18 16:53:34,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=22.5 2024-09-18 16:53:37,346 INFO [train.py:1198] (0/2) Epoch 27, batch 3250, loss[loss=0.2437, ctc_loss=0.1227, cr_loss=0.356, attn_decoder_loss=0.2492, over 29696.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1248, cr_loss=0.3674, attn_decoder_loss=0.245, over 5799421.93 frames. 
], batch size: 84, lr: 4.09e-03, grad_scale: 16.0 2024-09-18 16:53:46,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=483600.0, ans=0.125 2024-09-18 16:53:49,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=483600.0, ans=0.0 2024-09-18 16:53:49,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=483600.0, ans=0.125 2024-09-18 16:53:49,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=483600.0, ans=0.1 2024-09-18 16:54:07,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=483680.0, ans=0.2 2024-09-18 16:54:34,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2024-09-18 16:54:54,880 INFO [train.py:1198] (0/2) Epoch 27, batch 3300, loss[loss=0.2473, ctc_loss=0.1255, cr_loss=0.3667, attn_decoder_loss=0.2526, over 28316.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.124, cr_loss=0.3657, attn_decoder_loss=0.2439, over 5795434.50 frames. 
], batch size: 111, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:55:04,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=483800.0, ans=0.125 2024-09-18 16:55:22,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483840.0, ans=0.1 2024-09-18 16:55:33,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=483880.0, ans=0.125 2024-09-18 16:55:33,387 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2024-09-18 16:55:50,532 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.485e+01 9.035e+01 9.621e+01 1.592e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-18 16:56:00,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=483960.0, ans=0.0 2024-09-18 16:56:09,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=484000.0, ans=0.05 2024-09-18 16:56:10,664 INFO [train.py:1198] (0/2) Epoch 27, batch 3350, loss[loss=0.2529, ctc_loss=0.13, cr_loss=0.3661, attn_decoder_loss=0.2585, over 28822.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1251, cr_loss=0.3674, attn_decoder_loss=0.2449, over 5774139.39 frames. 
], batch size: 104, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:56:17,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=484000.0, ans=0.125 2024-09-18 16:56:25,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484000.0, ans=0.1 2024-09-18 16:56:37,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=484040.0, ans=0.0 2024-09-18 16:57:05,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=484120.0, ans=0.125 2024-09-18 16:57:11,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.50 vs. limit=10.0 2024-09-18 16:57:12,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=484160.0, ans=0.1 2024-09-18 16:57:28,520 INFO [train.py:1198] (0/2) Epoch 27, batch 3400, loss[loss=0.218, ctc_loss=0.1137, cr_loss=0.3378, attn_decoder_loss=0.2221, over 29368.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1256, cr_loss=0.368, attn_decoder_loss=0.2451, over 5767629.20 frames. 
], batch size: 67, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:58:00,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=484280.0, ans=0.125 2024-09-18 16:58:07,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=484280.0, ans=0.5 2024-09-18 16:58:26,756 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.585e+01 9.028e+01 9.662e+01 1.590e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-18 16:58:32,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.54 vs. limit=6.0 2024-09-18 16:58:36,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.44 vs. limit=15.0 2024-09-18 16:58:39,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=484360.0, ans=0.0 2024-09-18 16:58:46,435 INFO [train.py:1198] (0/2) Epoch 27, batch 3450, loss[loss=0.2433, ctc_loss=0.119, cr_loss=0.3456, attn_decoder_loss=0.2494, over 28326.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1256, cr_loss=0.3679, attn_decoder_loss=0.2452, over 5774992.32 frames. 
], batch size: 111, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 16:58:58,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=484400.0, ans=0.035 2024-09-18 16:59:09,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=484440.0, ans=0.125 2024-09-18 16:59:14,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=484440.0, ans=0.1 2024-09-18 16:59:15,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=484480.0, ans=0.1 2024-09-18 16:59:18,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=484480.0, ans=0.0 2024-09-18 16:59:39,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=484520.0, ans=0.1 2024-09-18 16:59:50,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=484560.0, ans=0.1 2024-09-18 17:00:04,106 INFO [train.py:1198] (0/2) Epoch 27, batch 3500, loss[loss=0.2197, ctc_loss=0.1048, cr_loss=0.3193, attn_decoder_loss=0.2254, over 29308.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1255, cr_loss=0.3675, attn_decoder_loss=0.2448, over 5777172.87 frames. 
], batch size: 71, lr: 4.09e-03, grad_scale: 8.0 2024-09-18 17:00:44,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=484680.0, ans=0.125 2024-09-18 17:00:59,675 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.562e+01 8.977e+01 9.669e+01 2.220e+02, threshold=1.795e+02, percent-clipped=2.0 2024-09-18 17:01:02,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=484760.0, ans=0.0 2024-09-18 17:01:19,574 INFO [train.py:1198] (0/2) Epoch 27, batch 3550, loss[loss=0.2569, ctc_loss=0.1339, cr_loss=0.4018, attn_decoder_loss=0.2617, over 29727.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1253, cr_loss=0.3677, attn_decoder_loss=0.2448, over 5783914.34 frames. ], batch size: 89, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:02:08,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=484920.0, ans=0.125 2024-09-18 17:02:33,738 INFO [train.py:1198] (0/2) Epoch 27, batch 3600, loss[loss=0.2446, ctc_loss=0.1298, cr_loss=0.3763, attn_decoder_loss=0.249, over 29498.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1251, cr_loss=0.3673, attn_decoder_loss=0.2449, over 5792005.68 frames. 
], batch size: 77, lr: 4.08e-03, grad_scale: 16.0 2024-09-18 17:02:43,084 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:02:50,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=485040.0, ans=0.05 2024-09-18 17:03:19,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=485120.0, ans=0.125 2024-09-18 17:03:30,862 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.400e+01 9.013e+01 9.523e+01 1.334e+02, threshold=1.803e+02, percent-clipped=0.0 2024-09-18 17:03:32,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=485120.0, ans=0.125 2024-09-18 17:03:35,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=485160.0, ans=0.125 2024-09-18 17:03:42,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=485160.0, ans=0.125 2024-09-18 17:03:45,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=485160.0, ans=0.0 2024-09-18 17:03:50,129 INFO [train.py:1198] (0/2) Epoch 27, batch 3650, loss[loss=0.2578, ctc_loss=0.138, cr_loss=0.3956, attn_decoder_loss=0.2623, over 29495.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1251, cr_loss=0.3674, attn_decoder_loss=0.2446, over 5794148.14 frames. 
], batch size: 90, lr: 4.08e-03, grad_scale: 16.0 2024-09-18 17:03:50,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=485200.0, ans=0.125 2024-09-18 17:03:53,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=485200.0, ans=0.025 2024-09-18 17:04:06,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=485240.0, ans=0.0 2024-09-18 17:04:19,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0 2024-09-18 17:04:23,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=485280.0, ans=0.125 2024-09-18 17:04:23,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485280.0, ans=0.1 2024-09-18 17:04:29,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=485280.0, ans=0.1 2024-09-18 17:04:29,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=485280.0, ans=0.125 2024-09-18 17:04:36,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=15.0 2024-09-18 17:04:38,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=485320.0, ans=0.0 2024-09-18 17:05:04,962 INFO [train.py:1198] (0/2) Epoch 27, batch 3700, loss[loss=0.2461, ctc_loss=0.1258, cr_loss=0.3747, attn_decoder_loss=0.2511, over 29697.00 frames. 
], tot_loss[loss=0.2403, ctc_loss=0.1252, cr_loss=0.3679, attn_decoder_loss=0.2449, over 5804420.64 frames. ], batch size: 84, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:05:12,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2024-09-18 17:05:49,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=485520.0, ans=0.125 2024-09-18 17:05:55,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=485520.0, ans=0.0 2024-09-18 17:06:00,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=485520.0, ans=0.0 2024-09-18 17:06:00,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=485520.0, ans=0.0 2024-09-18 17:06:01,403 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.031e+01 8.546e+01 8.927e+01 9.450e+01 1.781e+02, threshold=1.785e+02, percent-clipped=0.0 2024-09-18 17:06:04,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=485560.0, ans=0.125 2024-09-18 17:06:18,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=485560.0, ans=0.0 2024-09-18 17:06:21,381 INFO [train.py:1198] (0/2) Epoch 27, batch 3750, loss[loss=0.2184, ctc_loss=0.1088, cr_loss=0.3288, attn_decoder_loss=0.2233, over 29311.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1251, cr_loss=0.368, attn_decoder_loss=0.2446, over 5807492.06 frames. 
], batch size: 67, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:06:23,845 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0 2024-09-18 17:06:24,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=485600.0, ans=0.125 2024-09-18 17:06:32,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=485600.0, ans=0.2 2024-09-18 17:06:43,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=485640.0, ans=0.125 2024-09-18 17:06:51,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=485680.0, ans=0.125 2024-09-18 17:07:06,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=485720.0, ans=0.125 2024-09-18 17:07:35,760 INFO [train.py:1198] (0/2) Epoch 27, batch 3800, loss[loss=0.2502, ctc_loss=0.1296, cr_loss=0.3728, attn_decoder_loss=0.2553, over 29641.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1254, cr_loss=0.3687, attn_decoder_loss=0.2446, over 5798210.51 frames. ], batch size: 86, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:07:36,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=485800.0, ans=0.2 2024-09-18 17:07:48,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=485800.0, ans=0.0 2024-09-18 17:08:22,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.65 vs. 
limit=15.0 2024-09-18 17:08:25,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=485920.0, ans=0.0 2024-09-18 17:08:31,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=485920.0, ans=0.125 2024-09-18 17:08:32,466 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.550e+01 9.227e+01 9.705e+01 1.468e+02, threshold=1.845e+02, percent-clipped=0.0 2024-09-18 17:08:35,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=485960.0, ans=0.0 2024-09-18 17:08:37,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=485960.0, ans=0.125 2024-09-18 17:08:49,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=486000.0, ans=0.0 2024-09-18 17:08:50,198 INFO [train.py:1198] (0/2) Epoch 27, batch 3850, loss[loss=0.255, ctc_loss=0.1325, cr_loss=0.3768, attn_decoder_loss=0.2602, over 29310.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1252, cr_loss=0.3688, attn_decoder_loss=0.2447, over 5812804.19 frames. ], batch size: 100, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:08:57,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486000.0, ans=0.1 2024-09-18 17:08:57,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=486000.0, ans=0.09899494936611666 2024-09-18 17:09:14,552 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.03 vs. 
limit=15.0 2024-09-18 17:09:30,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=486080.0, ans=0.125 2024-09-18 17:09:52,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=22.5 2024-09-18 17:10:06,033 INFO [train.py:1198] (0/2) Epoch 27, batch 3900, loss[loss=0.2488, ctc_loss=0.1242, cr_loss=0.3373, attn_decoder_loss=0.2551, over 29635.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1252, cr_loss=0.3687, attn_decoder_loss=0.2449, over 5816900.57 frames. ], batch size: 86, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:10:26,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486240.0, ans=0.1 2024-09-18 17:10:35,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=486280.0, ans=0.125 2024-09-18 17:10:41,807 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:10:43,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=486280.0, ans=0.2 2024-09-18 17:11:02,351 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.580e+01 9.073e+01 9.587e+01 1.534e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-18 17:11:07,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=486360.0, ans=10.0 2024-09-18 17:11:20,657 INFO [train.py:1198] (0/2) Epoch 27, batch 3950, loss[loss=0.2541, ctc_loss=0.1306, cr_loss=0.3947, attn_decoder_loss=0.2591, over 29499.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1246, cr_loss=0.3679, attn_decoder_loss=0.2445, over 5836080.57 frames. 
], batch size: 97, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:11:25,412 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:12:11,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=486520.0, ans=0.125 2024-09-18 17:12:28,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=486560.0, ans=0.0 2024-09-18 17:12:35,440 INFO [train.py:1198] (0/2) Epoch 27, batch 4000, loss[loss=0.2239, ctc_loss=0.1078, cr_loss=0.3326, attn_decoder_loss=0.2295, over 29531.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1247, cr_loss=0.3675, attn_decoder_loss=0.2446, over 5812415.69 frames. ], batch size: 74, lr: 4.08e-03, grad_scale: 16.0 2024-09-18 17:13:02,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.44 vs. limit=10.0 2024-09-18 17:13:06,770 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:13:08,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=486680.0, ans=0.125 2024-09-18 17:13:33,403 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.740e+01 9.217e+01 9.696e+01 1.612e+02, threshold=1.843e+02, percent-clipped=0.0 2024-09-18 17:13:49,531 INFO [train.py:1198] (0/2) Epoch 27, batch 4050, loss[loss=0.2636, ctc_loss=0.1596, cr_loss=0.3991, attn_decoder_loss=0.2663, over 20779.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1248, cr_loss=0.3674, attn_decoder_loss=0.2444, over 5796739.94 frames. 
], batch size: 211, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:13:52,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=486800.0, ans=0.125 2024-09-18 17:13:54,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=486800.0, ans=0.125 2024-09-18 17:13:55,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=486800.0, ans=0.0 2024-09-18 17:14:07,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=486840.0, ans=0.09899494936611666 2024-09-18 17:14:26,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486880.0, ans=0.1 2024-09-18 17:15:03,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=22.5 2024-09-18 17:15:04,289 INFO [train.py:1198] (0/2) Epoch 27, batch 4100, loss[loss=0.2573, ctc_loss=0.139, cr_loss=0.4108, attn_decoder_loss=0.2613, over 29492.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1252, cr_loss=0.3679, attn_decoder_loss=0.2448, over 5792473.89 frames. ], batch size: 90, lr: 4.08e-03, grad_scale: 8.0 2024-09-18 17:15:17,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=487040.0, ans=0.125 2024-09-18 17:15:37,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=487080.0, ans=0.1 2024-09-18 17:15:46,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.35 vs. 
limit=15.0 2024-09-18 17:16:03,239 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.410e+01 8.915e+01 9.592e+01 1.452e+02, threshold=1.783e+02, percent-clipped=0.0 2024-09-18 17:16:13,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487160.0, ans=0.1 2024-09-18 17:16:19,986 INFO [train.py:1198] (0/2) Epoch 27, batch 4150, loss[loss=0.231, ctc_loss=0.1116, cr_loss=0.3421, attn_decoder_loss=0.2367, over 29503.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1249, cr_loss=0.3674, attn_decoder_loss=0.2447, over 5798520.97 frames. ], batch size: 77, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:16:29,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=487200.0, ans=0.0 2024-09-18 17:16:58,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=487280.0, ans=0.025 2024-09-18 17:16:59,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=487280.0, ans=0.125 2024-09-18 17:17:06,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=487320.0, ans=0.2 2024-09-18 17:17:20,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=487360.0, ans=0.125 2024-09-18 17:17:32,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=487400.0, ans=0.0 2024-09-18 17:17:33,697 INFO [train.py:1198] (0/2) Epoch 27, batch 4200, loss[loss=0.262, ctc_loss=0.1476, cr_loss=0.4338, attn_decoder_loss=0.2651, over 29519.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1251, cr_loss=0.3678, attn_decoder_loss=0.2451, over 5800317.11 frames. 
], batch size: 90, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:17:37,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.49 vs. limit=15.0 2024-09-18 17:17:47,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=487440.0, ans=0.0 2024-09-18 17:17:55,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.71 vs. limit=12.0 2024-09-18 17:18:32,352 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.514e+01 8.963e+01 9.288e+01 3.975e+02, threshold=1.793e+02, percent-clipped=1.0 2024-09-18 17:18:48,514 INFO [train.py:1198] (0/2) Epoch 27, batch 4250, loss[loss=0.2266, ctc_loss=0.1141, cr_loss=0.3409, attn_decoder_loss=0.2315, over 29520.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1253, cr_loss=0.3683, attn_decoder_loss=0.2452, over 5806147.75 frames. ], batch size: 74, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:18:54,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=487600.0, ans=0.07 2024-09-18 17:19:06,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=487640.0, ans=0.0 2024-09-18 17:19:09,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=487640.0, ans=0.125 2024-09-18 17:19:25,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=487680.0, ans=0.125 2024-09-18 17:20:02,926 INFO [train.py:1198] (0/2) Epoch 27, batch 4300, loss[loss=0.2575, ctc_loss=0.1386, cr_loss=0.4013, attn_decoder_loss=0.2618, over 29536.00 frames. 
], tot_loss[loss=0.2409, ctc_loss=0.1255, cr_loss=0.3693, attn_decoder_loss=0.2455, over 5795622.57 frames. ], batch size: 87, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:20:04,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=487800.0, ans=0.125 2024-09-18 17:20:19,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=487840.0, ans=0.0 2024-09-18 17:20:46,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=487920.0, ans=0.5 2024-09-18 17:21:00,647 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.751e+01 9.154e+01 9.778e+01 2.419e+02, threshold=1.831e+02, percent-clipped=1.0 2024-09-18 17:21:17,483 INFO [train.py:1198] (0/2) Epoch 27, batch 4350, loss[loss=0.2539, ctc_loss=0.1349, cr_loss=0.4017, attn_decoder_loss=0.2582, over 29423.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.128, cr_loss=0.3741, attn_decoder_loss=0.2485, over 5797573.24 frames. 
], batch size: 97, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:21:20,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=488000.0, ans=0.2 2024-09-18 17:21:25,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=488000.0, ans=0.125 2024-09-18 17:21:43,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=488040.0, ans=0.125 2024-09-18 17:21:44,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=488040.0, ans=0.125 2024-09-18 17:21:53,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=488080.0, ans=0.125 2024-09-18 17:21:55,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=488080.0, ans=0.125 2024-09-18 17:21:56,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=488080.0, ans=0.1 2024-09-18 17:22:03,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=488120.0, ans=0.0 2024-09-18 17:22:22,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=488160.0, ans=0.1 2024-09-18 17:22:28,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.72 vs. limit=10.0 2024-09-18 17:22:32,264 INFO [train.py:1198] (0/2) Epoch 27, batch 4400, loss[loss=0.2621, ctc_loss=0.144, cr_loss=0.4197, attn_decoder_loss=0.2659, over 27691.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1293, cr_loss=0.3765, attn_decoder_loss=0.2506, over 5768792.46 frames. 
], batch size: 125, lr: 4.07e-03, grad_scale: 16.0 2024-09-18 17:22:42,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=488200.0, ans=0.125 2024-09-18 17:22:47,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.67 vs. limit=15.0 2024-09-18 17:22:53,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=488240.0, ans=0.125 2024-09-18 17:22:56,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=488240.0, ans=0.125 2024-09-18 17:23:00,928 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2024-09-18 17:23:07,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=488280.0, ans=0.0 2024-09-18 17:23:12,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=488280.0, ans=0.1 2024-09-18 17:23:29,590 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.043e+01 8.897e+01 9.375e+01 9.833e+01 4.108e+02, threshold=1.875e+02, percent-clipped=1.0 2024-09-18 17:23:42,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=488360.0, ans=0.125 2024-09-18 17:23:46,272 INFO [train.py:1198] (0/2) Epoch 27, batch 4450, loss[loss=0.2672, ctc_loss=0.1664, cr_loss=0.416, attn_decoder_loss=0.2692, over 20209.00 frames. ], tot_loss[loss=0.2485, ctc_loss=0.1333, cr_loss=0.3819, attn_decoder_loss=0.2528, over 5583409.54 frames. 
], batch size: 210, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:23:50,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=488400.0, ans=0.07 2024-09-18 17:24:00,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=488440.0, ans=0.125 2024-09-18 17:24:03,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=488440.0, ans=0.0 2024-09-18 17:24:28,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=488480.0, ans=0.125 2024-09-18 17:24:38,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=488520.0, ans=0.0 2024-09-18 17:24:43,596 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. limit=6.0 2024-09-18 17:24:55,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=488560.0, ans=0.125 2024-09-18 17:24:55,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=488560.0, ans=0.125 2024-09-18 17:25:02,154 INFO [train.py:1198] (0/2) Epoch 27, batch 4500, loss[loss=0.2558, ctc_loss=0.1455, cr_loss=0.3984, attn_decoder_loss=0.2592, over 20015.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1366, cr_loss=0.3828, attn_decoder_loss=0.2548, over 5238580.84 frames. 
], batch size: 209, lr: 4.07e-03, grad_scale: 8.0 2024-09-18 17:25:25,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=488640.0, ans=0.1 2024-09-18 17:25:29,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=488640.0, ans=0.05 2024-09-18 17:25:39,617 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-27.pt 2024-09-18 17:26:25,154 INFO [train.py:1198] (0/2) Epoch 28, batch 0, loss[loss=0.2157, ctc_loss=0.09692, cr_loss=0.3006, attn_decoder_loss=0.2222, over 29581.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.09692, cr_loss=0.3006, attn_decoder_loss=0.2222, over 29581.00 frames. ], batch size: 73, lr: 3.99e-03, grad_scale: 16.0 2024-09-18 17:26:25,155 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 17:26:45,481 INFO [train.py:1230] (0/2) Epoch 28, validation: loss=0.2131, ctc_loss=0.0377, cr_loss=5.605e-15, attn_decoder_loss=0.2326, over 944034.00 frames. 2024-09-18 17:26:45,481 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 17:27:07,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=488740.0, ans=0.025 2024-09-18 17:27:07,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=488740.0, ans=0.125 2024-09-18 17:27:07,701 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.53 vs. 
limit=15.0 2024-09-18 17:27:09,720 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 1.052e+02 1.136e+02 1.230e+02 3.342e+02, threshold=2.271e+02, percent-clipped=3.0 2024-09-18 17:27:10,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=10.36 vs. limit=12.0 2024-09-18 17:27:23,295 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.55 vs. limit=15.0 2024-09-18 17:27:34,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=488820.0, ans=0.125 2024-09-18 17:27:54,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=488860.0, ans=0.0 2024-09-18 17:28:01,705 INFO [train.py:1198] (0/2) Epoch 28, batch 50, loss[loss=0.2166, ctc_loss=0.1101, cr_loss=0.3457, attn_decoder_loss=0.2208, over 29403.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1269, cr_loss=0.3721, attn_decoder_loss=0.2462, over 1268783.47 frames. ], batch size: 70, lr: 3.99e-03, grad_scale: 8.0 2024-09-18 17:28:17,802 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.46 vs. 
limit=15.0 2024-09-18 17:28:20,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=488940.0, ans=0.025 2024-09-18 17:28:21,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=488940.0, ans=0.2 2024-09-18 17:28:29,599 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:28:35,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2024-09-18 17:28:52,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.35 vs. limit=22.5 2024-09-18 17:28:58,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=489020.0, ans=0.0 2024-09-18 17:29:07,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=489060.0, ans=0.125 2024-09-18 17:29:17,675 INFO [train.py:1198] (0/2) Epoch 28, batch 100, loss[loss=0.2333, ctc_loss=0.1266, cr_loss=0.3769, attn_decoder_loss=0.2368, over 29552.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.128, cr_loss=0.3737, attn_decoder_loss=0.2482, over 2253421.04 frames. 
], batch size: 76, lr: 3.99e-03, grad_scale: 8.0 2024-09-18 17:29:23,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=489100.0, ans=0.0 2024-09-18 17:29:41,539 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.514e+01 8.987e+01 9.639e+01 1.687e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-18 17:30:05,124 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2024-09-18 17:30:08,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=489220.0, ans=0.0 2024-09-18 17:30:17,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=489260.0, ans=0.025 2024-09-18 17:30:35,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=489300.0, ans=0.125 2024-09-18 17:30:36,915 INFO [train.py:1198] (0/2) Epoch 28, batch 150, loss[loss=0.2077, ctc_loss=0.1001, cr_loss=0.3322, attn_decoder_loss=0.2123, over 29446.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1253, cr_loss=0.3688, attn_decoder_loss=0.2455, over 3048532.00 frames. ], batch size: 70, lr: 3.99e-03, grad_scale: 8.0 2024-09-18 17:30:45,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.28 vs. 
limit=15.0 2024-09-18 17:30:47,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=489300.0, ans=0.125 2024-09-18 17:31:01,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=489340.0, ans=0.125 2024-09-18 17:31:07,423 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:31:33,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.66 vs. limit=15.0 2024-09-18 17:31:35,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=15.0 2024-09-18 17:31:44,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=489460.0, ans=0.0 2024-09-18 17:31:52,051 INFO [train.py:1198] (0/2) Epoch 28, batch 200, loss[loss=0.2543, ctc_loss=0.1332, cr_loss=0.3744, attn_decoder_loss=0.2595, over 27159.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.124, cr_loss=0.3663, attn_decoder_loss=0.2442, over 3659567.05 frames. ], batch size: 124, lr: 3.99e-03, grad_scale: 8.0 2024-09-18 17:31:54,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=489500.0, ans=0.2 2024-09-18 17:32:00,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=489500.0, ans=0.0 2024-09-18 17:32:06,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=489540.0, ans=0.09899494936611666 2024-09-18 17:32:08,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.55 vs. 
limit=15.0 2024-09-18 17:32:09,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489540.0, ans=0.1 2024-09-18 17:32:09,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.79 vs. limit=15.0 2024-09-18 17:32:16,586 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.292e+01 9.011e+01 9.460e+01 1.346e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-18 17:32:33,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=489580.0, ans=0.125 2024-09-18 17:32:37,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5 2024-09-18 17:32:43,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=489620.0, ans=0.125 2024-09-18 17:33:08,529 INFO [train.py:1198] (0/2) Epoch 28, batch 250, loss[loss=0.2506, ctc_loss=0.1364, cr_loss=0.3843, attn_decoder_loss=0.2548, over 29260.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1239, cr_loss=0.3664, attn_decoder_loss=0.2441, over 4141398.11 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 8.0 2024-09-18 17:33:09,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.23 vs. 
limit=15.0 2024-09-18 17:33:16,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=489700.0, ans=0.0 2024-09-18 17:33:21,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=489700.0, ans=0.0 2024-09-18 17:33:27,733 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0 2024-09-18 17:33:30,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=489740.0, ans=0.125 2024-09-18 17:33:31,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=489740.0, ans=0.0 2024-09-18 17:33:47,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=489780.0, ans=0.0 2024-09-18 17:33:48,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=489780.0, ans=0.0 2024-09-18 17:34:00,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=489820.0, ans=0.125 2024-09-18 17:34:20,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=489860.0, ans=0.125 2024-09-18 17:34:25,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=489900.0, ans=10.0 2024-09-18 17:34:26,611 INFO [train.py:1198] (0/2) Epoch 28, batch 300, loss[loss=0.263, ctc_loss=0.1378, cr_loss=0.4058, attn_decoder_loss=0.2679, over 29533.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1234, cr_loss=0.3652, attn_decoder_loss=0.2438, over 4510818.57 frames. 
], batch size: 92, lr: 3.99e-03, grad_scale: 8.0 2024-09-18 17:34:41,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=489900.0, ans=0.125 2024-09-18 17:34:50,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=489940.0, ans=0.0 2024-09-18 17:34:50,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=489940.0, ans=0.125 2024-09-18 17:34:53,044 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.453e+01 8.832e+01 9.524e+01 1.905e+02, threshold=1.766e+02, percent-clipped=1.0 2024-09-18 17:34:58,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0 2024-09-18 17:35:09,185 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.31 vs. limit=6.0 2024-09-18 17:35:12,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2024-09-18 17:35:37,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=490060.0, ans=0.0 2024-09-18 17:35:44,882 INFO [train.py:1198] (0/2) Epoch 28, batch 350, loss[loss=0.219, ctc_loss=0.1044, cr_loss=0.3239, attn_decoder_loss=0.2246, over 29718.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1237, cr_loss=0.366, attn_decoder_loss=0.2445, over 4796395.22 frames. ], batch size: 72, lr: 3.99e-03, grad_scale: 8.0 2024-09-18 17:36:06,826 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.79 vs. 
limit=15.0 2024-09-18 17:36:29,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=490220.0, ans=0.2 2024-09-18 17:36:32,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2024-09-18 17:36:50,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=490260.0, ans=0.0 2024-09-18 17:36:56,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=490260.0, ans=0.0 2024-09-18 17:37:00,262 INFO [train.py:1198] (0/2) Epoch 28, batch 400, loss[loss=0.2418, ctc_loss=0.1233, cr_loss=0.3626, attn_decoder_loss=0.2469, over 29725.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1234, cr_loss=0.366, attn_decoder_loss=0.2444, over 5025156.70 frames. ], batch size: 82, lr: 3.99e-03, grad_scale: 16.0 2024-09-18 17:37:22,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=490340.0, ans=0.0 2024-09-18 17:37:26,414 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.632e+01 9.035e+01 9.717e+01 2.941e+02, threshold=1.807e+02, percent-clipped=3.0 2024-09-18 17:37:43,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=490380.0, ans=0.125 2024-09-18 17:37:52,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2024-09-18 17:38:19,616 INFO [train.py:1198] (0/2) Epoch 28, batch 450, loss[loss=0.2424, ctc_loss=0.1284, cr_loss=0.3797, attn_decoder_loss=0.2467, over 29689.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1241, cr_loss=0.3674, attn_decoder_loss=0.2446, over 5185589.89 frames. 
], batch size: 83, lr: 3.99e-03, grad_scale: 8.0 2024-09-18 17:38:43,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=490540.0, ans=0.0 2024-09-18 17:38:54,590 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:39:00,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=490580.0, ans=0.0 2024-09-18 17:39:29,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=490660.0, ans=0.125 2024-09-18 17:39:32,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=490660.0, ans=0.0 2024-09-18 17:39:38,418 INFO [train.py:1198] (0/2) Epoch 28, batch 500, loss[loss=0.2613, ctc_loss=0.135, cr_loss=0.3838, attn_decoder_loss=0.2668, over 29420.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1232, cr_loss=0.3657, attn_decoder_loss=0.2437, over 5328287.09 frames. ], batch size: 94, lr: 3.99e-03, grad_scale: 8.0 2024-09-18 17:39:54,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=490740.0, ans=0.125 2024-09-18 17:39:59,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.08 vs. 
limit=22.5 2024-09-18 17:40:04,209 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.478e+01 8.864e+01 9.440e+01 1.535e+02, threshold=1.773e+02, percent-clipped=0.0 2024-09-18 17:40:16,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=490780.0, ans=0.125 2024-09-18 17:40:41,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=490860.0, ans=0.125 2024-09-18 17:40:42,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=490860.0, ans=0.125 2024-09-18 17:40:54,418 INFO [train.py:1198] (0/2) Epoch 28, batch 550, loss[loss=0.2478, ctc_loss=0.1275, cr_loss=0.3793, attn_decoder_loss=0.2528, over 28909.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1237, cr_loss=0.3662, attn_decoder_loss=0.244, over 5422723.98 frames. ], batch size: 104, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:40:56,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=490900.0, ans=0.0 2024-09-18 17:41:01,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=15.0 2024-09-18 17:41:08,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0 2024-09-18 17:41:22,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. 
limit=15.0 2024-09-18 17:41:56,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=491060.0, ans=0.125 2024-09-18 17:41:56,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=491060.0, ans=0.0 2024-09-18 17:42:06,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=491060.0, ans=0.025 2024-09-18 17:42:12,517 INFO [train.py:1198] (0/2) Epoch 28, batch 600, loss[loss=0.2572, ctc_loss=0.1376, cr_loss=0.3932, attn_decoder_loss=0.2617, over 29324.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.124, cr_loss=0.3669, attn_decoder_loss=0.2442, over 5508291.78 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:42:21,807 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:42:29,274 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:42:40,165 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.281e+01 8.877e+01 9.486e+01 1.809e+02, threshold=1.775e+02, percent-clipped=1.0 2024-09-18 17:42:53,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.61 vs. 
limit=22.5 2024-09-18 17:43:05,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=491220.0, ans=0.04949747468305833 2024-09-18 17:43:15,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=491260.0, ans=0.95 2024-09-18 17:43:22,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=491260.0, ans=0.125 2024-09-18 17:43:29,932 INFO [train.py:1198] (0/2) Epoch 28, batch 650, loss[loss=0.2402, ctc_loss=0.1324, cr_loss=0.3889, attn_decoder_loss=0.2436, over 29778.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1231, cr_loss=0.3659, attn_decoder_loss=0.2435, over 5585636.97 frames. ], batch size: 81, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:43:45,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=491340.0, ans=0.2 2024-09-18 17:43:45,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=491340.0, ans=0.04949747468305833 2024-09-18 17:44:14,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=491420.0, ans=0.0 2024-09-18 17:44:19,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=491420.0, ans=0.125 2024-09-18 17:44:21,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=491420.0, ans=0.0 2024-09-18 17:44:34,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=491460.0, ans=0.125 2024-09-18 17:44:46,049 INFO [train.py:1198] (0/2) Epoch 28, batch 700, loss[loss=0.2385, ctc_loss=0.1223, cr_loss=0.3821, 
attn_decoder_loss=0.243, over 29538.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1234, cr_loss=0.3664, attn_decoder_loss=0.244, over 5636398.80 frames. ], batch size: 76, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:44:46,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=491500.0, ans=0.125 2024-09-18 17:44:58,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=491500.0, ans=0.0 2024-09-18 17:44:58,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=491500.0, ans=0.0 2024-09-18 17:45:11,727 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.262e+01 8.777e+01 9.267e+01 2.724e+02, threshold=1.755e+02, percent-clipped=1.0 2024-09-18 17:45:29,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=491580.0, ans=0.125 2024-09-18 17:45:40,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.52 vs. limit=15.0 2024-09-18 17:45:53,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=491660.0, ans=0.125 2024-09-18 17:46:01,809 INFO [train.py:1198] (0/2) Epoch 28, batch 750, loss[loss=0.2545, ctc_loss=0.1407, cr_loss=0.4031, attn_decoder_loss=0.2582, over 29691.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1232, cr_loss=0.3659, attn_decoder_loss=0.2437, over 5676151.64 frames. 
], batch size: 82, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:46:13,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=491700.0, ans=0.0 2024-09-18 17:46:20,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=491740.0, ans=0.0 2024-09-18 17:46:51,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2024-09-18 17:47:17,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=491860.0, ans=0.0 2024-09-18 17:47:21,495 INFO [train.py:1198] (0/2) Epoch 28, batch 800, loss[loss=0.215, ctc_loss=0.1035, cr_loss=0.3151, attn_decoder_loss=0.2204, over 29575.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1231, cr_loss=0.3655, attn_decoder_loss=0.2435, over 5707356.34 frames. ], batch size: 73, lr: 3.98e-03, grad_scale: 16.0 2024-09-18 17:47:32,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=491900.0, ans=0.0 2024-09-18 17:47:47,343 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.515e+01 8.491e+01 9.037e+01 9.523e+01 1.873e+02, threshold=1.807e+02, percent-clipped=1.0 2024-09-18 17:47:49,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=491940.0, ans=0.125 2024-09-18 17:48:27,592 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. 
limit=15.0 2024-09-18 17:48:32,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=492060.0, ans=0.5 2024-09-18 17:48:37,062 INFO [train.py:1198] (0/2) Epoch 28, batch 850, loss[loss=0.2565, ctc_loss=0.1305, cr_loss=0.3827, attn_decoder_loss=0.262, over 29703.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1232, cr_loss=0.3658, attn_decoder_loss=0.2436, over 5736830.54 frames. ], batch size: 89, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:48:37,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492100.0, ans=0.1 2024-09-18 17:49:03,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.89 vs. limit=12.0 2024-09-18 17:49:14,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2024-09-18 17:49:16,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=492180.0, ans=0.05 2024-09-18 17:49:19,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=492180.0, ans=0.125 2024-09-18 17:49:37,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=492260.0, ans=0.125 2024-09-18 17:49:52,779 INFO [train.py:1198] (0/2) Epoch 28, batch 900, loss[loss=0.2133, ctc_loss=0.1, cr_loss=0.319, attn_decoder_loss=0.2188, over 29628.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1235, cr_loss=0.3663, attn_decoder_loss=0.244, over 5741806.84 frames. 
], batch size: 73, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:50:17,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=492340.0, ans=0.0 2024-09-18 17:50:19,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492340.0, ans=0.1 2024-09-18 17:50:21,989 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.505e+01 9.006e+01 9.829e+01 2.830e+02, threshold=1.801e+02, percent-clipped=3.0 2024-09-18 17:50:22,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=492340.0, ans=0.125 2024-09-18 17:50:26,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=492380.0, ans=0.125 2024-09-18 17:50:37,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.95 vs. limit=12.0 2024-09-18 17:50:44,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=492420.0, ans=0.0 2024-09-18 17:50:44,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=492420.0, ans=0.2 2024-09-18 17:50:59,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=492460.0, ans=0.125 2024-09-18 17:51:12,933 INFO [train.py:1198] (0/2) Epoch 28, batch 950, loss[loss=0.222, ctc_loss=0.1138, cr_loss=0.3498, attn_decoder_loss=0.2263, over 29538.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1239, cr_loss=0.3672, attn_decoder_loss=0.2444, over 5742777.37 frames. 
], batch size: 74, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:51:13,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=492500.0, ans=0.0 2024-09-18 17:51:17,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=492500.0, ans=0.125 2024-09-18 17:51:19,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=492500.0, ans=0.125 2024-09-18 17:51:23,921 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:51:42,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=492580.0, ans=0.125 2024-09-18 17:51:45,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=492580.0, ans=0.125 2024-09-18 17:52:13,766 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:52:15,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=492660.0, ans=0.125 2024-09-18 17:52:28,271 INFO [train.py:1198] (0/2) Epoch 28, batch 1000, loss[loss=0.2315, ctc_loss=0.1205, cr_loss=0.3629, attn_decoder_loss=0.2358, over 29492.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1244, cr_loss=0.3678, attn_decoder_loss=0.245, over 5737258.43 frames. ], batch size: 77, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:52:41,215 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.90 vs. 
limit=15.0 2024-09-18 17:52:55,756 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.563e+01 9.173e+01 1.012e+02 1.591e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-18 17:53:07,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.51 vs. limit=15.0 2024-09-18 17:53:08,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=492780.0, ans=0.1 2024-09-18 17:53:11,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=492780.0, ans=0.0 2024-09-18 17:53:27,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=492860.0, ans=0.025 2024-09-18 17:53:37,057 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:53:46,488 INFO [train.py:1198] (0/2) Epoch 28, batch 1050, loss[loss=0.2466, ctc_loss=0.1224, cr_loss=0.3601, attn_decoder_loss=0.2524, over 29676.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1237, cr_loss=0.3663, attn_decoder_loss=0.2442, over 5745139.33 frames. ], batch size: 85, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:54:22,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.63 vs. 
limit=15.0 2024-09-18 17:54:28,184 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:54:33,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=493020.0, ans=0.04949747468305833 2024-09-18 17:54:48,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.96 vs. limit=22.5 2024-09-18 17:54:54,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=493060.0, ans=0.025 2024-09-18 17:54:57,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=493060.0, ans=10.0 2024-09-18 17:55:04,308 INFO [train.py:1198] (0/2) Epoch 28, batch 1100, loss[loss=0.2335, ctc_loss=0.1238, cr_loss=0.3552, attn_decoder_loss=0.2378, over 29446.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1237, cr_loss=0.3661, attn_decoder_loss=0.2439, over 5758345.28 frames. ], batch size: 78, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:55:12,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=493100.0, ans=0.125 2024-09-18 17:55:29,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.31 vs. 
limit=15.0 2024-09-18 17:55:31,739 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.310e+01 8.930e+01 9.558e+01 2.939e+02, threshold=1.786e+02, percent-clipped=1.0 2024-09-18 17:55:32,155 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:55:54,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=493220.0, ans=0.125 2024-09-18 17:56:02,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=493220.0, ans=0.0 2024-09-18 17:56:14,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=493260.0, ans=0.0 2024-09-18 17:56:19,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=493300.0, ans=0.125 2024-09-18 17:56:20,581 INFO [train.py:1198] (0/2) Epoch 28, batch 1150, loss[loss=0.229, ctc_loss=0.1113, cr_loss=0.3349, attn_decoder_loss=0.2346, over 29464.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1236, cr_loss=0.3659, attn_decoder_loss=0.2439, over 5755219.47 frames. ], batch size: 78, lr: 3.98e-03, grad_scale: 8.0 2024-09-18 17:56:33,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=493300.0, ans=0.125 2024-09-18 17:56:42,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493340.0, ans=0.1 2024-09-18 17:56:42,217 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:57:38,536 INFO [train.py:1198] (0/2) Epoch 28, batch 1200, loss[loss=0.2504, ctc_loss=0.136, cr_loss=0.3568, attn_decoder_loss=0.2552, over 29671.00 frames. 
], tot_loss[loss=0.2399, ctc_loss=0.1244, cr_loss=0.3675, attn_decoder_loss=0.2446, over 5747577.92 frames. ], batch size: 85, lr: 3.97e-03, grad_scale: 16.0 2024-09-18 17:58:06,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493540.0, ans=0.1 2024-09-18 17:58:07,207 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.554e+01 9.030e+01 9.625e+01 2.213e+02, threshold=1.806e+02, percent-clipped=2.0 2024-09-18 17:58:12,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=493580.0, ans=0.2 2024-09-18 17:58:13,015 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.90 vs. limit=10.0 2024-09-18 17:58:25,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=493620.0, ans=0.1 2024-09-18 17:58:39,291 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:58:45,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=493660.0, ans=0.125 2024-09-18 17:58:56,918 INFO [train.py:1198] (0/2) Epoch 28, batch 1250, loss[loss=0.2539, ctc_loss=0.1277, cr_loss=0.3895, attn_decoder_loss=0.2593, over 29546.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1247, cr_loss=0.3685, attn_decoder_loss=0.2448, over 5775109.85 frames. 
], batch size: 92, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 17:58:58,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=493700.0, ans=0.07 2024-09-18 17:58:58,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=493700.0, ans=0.1 2024-09-18 17:59:12,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=493740.0, ans=0.125 2024-09-18 17:59:42,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=493820.0, ans=0.0 2024-09-18 17:59:42,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=493820.0, ans=0.125 2024-09-18 17:59:52,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=493820.0, ans=0.125 2024-09-18 17:59:59,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=493860.0, ans=0.125 2024-09-18 18:00:13,105 INFO [train.py:1198] (0/2) Epoch 28, batch 1300, loss[loss=0.2512, ctc_loss=0.1244, cr_loss=0.3546, attn_decoder_loss=0.2574, over 28422.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1245, cr_loss=0.3675, attn_decoder_loss=0.2444, over 5780605.07 frames. 
], batch size: 111, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:00:19,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=493900.0, ans=0.0 2024-09-18 18:00:25,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=493900.0, ans=0.125 2024-09-18 18:00:36,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=493940.0, ans=0.025 2024-09-18 18:00:40,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=493940.0, ans=0.125 2024-09-18 18:00:41,990 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.590e+01 9.154e+01 9.575e+01 1.829e+02, threshold=1.831e+02, percent-clipped=1.0 2024-09-18 18:00:48,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493980.0, ans=0.1 2024-09-18 18:01:00,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=494020.0, ans=0.0 2024-09-18 18:01:02,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=494020.0, ans=0.025 2024-09-18 18:01:07,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=494020.0, ans=0.0 2024-09-18 18:01:18,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=494060.0, ans=0.1 2024-09-18 18:01:29,044 INFO [train.py:1198] (0/2) Epoch 28, batch 1350, loss[loss=0.2379, ctc_loss=0.1179, cr_loss=0.3579, attn_decoder_loss=0.2433, over 29764.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1238, cr_loss=0.3659, attn_decoder_loss=0.2439, over 5799246.92 frames. 
], batch size: 81, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:01:39,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=494100.0, ans=0.125 2024-09-18 18:01:48,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494140.0, ans=0.1 2024-09-18 18:01:53,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=494140.0, ans=0.125 2024-09-18 18:01:58,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=494140.0, ans=0.0 2024-09-18 18:02:22,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=494220.0, ans=0.125 2024-09-18 18:02:26,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=494220.0, ans=0.125 2024-09-18 18:02:32,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=494260.0, ans=0.035 2024-09-18 18:02:48,567 INFO [train.py:1198] (0/2) Epoch 28, batch 1400, loss[loss=0.2064, ctc_loss=0.1028, cr_loss=0.3101, attn_decoder_loss=0.211, over 29606.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1237, cr_loss=0.3659, attn_decoder_loss=0.2437, over 5809316.60 frames. ], batch size: 69, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:03:07,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.92 vs. 
limit=15.0 2024-09-18 18:03:17,503 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.548e+01 9.065e+01 9.786e+01 1.272e+02, threshold=1.813e+02, percent-clipped=0.0 2024-09-18 18:03:24,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0 2024-09-18 18:03:42,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=494420.0, ans=0.125 2024-09-18 18:03:47,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=494420.0, ans=0.0 2024-09-18 18:04:04,979 INFO [train.py:1198] (0/2) Epoch 28, batch 1450, loss[loss=0.2595, ctc_loss=0.1345, cr_loss=0.4021, attn_decoder_loss=0.2645, over 29469.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.124, cr_loss=0.3663, attn_decoder_loss=0.2444, over 5804923.36 frames. ], batch size: 94, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:04:22,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=494540.0, ans=0.0 2024-09-18 18:04:44,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494580.0, ans=0.1 2024-09-18 18:04:49,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494620.0, ans=0.1 2024-09-18 18:05:13,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494660.0, ans=0.1 2024-09-18 18:05:15,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=494660.0, ans=0.025 2024-09-18 18:05:20,930 INFO [train.py:1198] (0/2) Epoch 28, batch 1500, loss[loss=0.2539, ctc_loss=0.1239, 
cr_loss=0.3694, attn_decoder_loss=0.2601, over 29636.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1243, cr_loss=0.3668, attn_decoder_loss=0.2448, over 5806175.37 frames. ], batch size: 86, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:05:31,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.61 vs. limit=15.0 2024-09-18 18:05:38,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=494740.0, ans=0.125 2024-09-18 18:05:49,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494740.0, ans=0.1 2024-09-18 18:05:52,395 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.636e+01 9.142e+01 9.701e+01 7.436e+02, threshold=1.828e+02, percent-clipped=2.0 2024-09-18 18:06:12,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=494820.0, ans=12.0 2024-09-18 18:06:13,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=494820.0, ans=0.0 2024-09-18 18:06:41,533 INFO [train.py:1198] (0/2) Epoch 28, batch 1550, loss[loss=0.2532, ctc_loss=0.1378, cr_loss=0.3838, attn_decoder_loss=0.2575, over 29482.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1246, cr_loss=0.3667, attn_decoder_loss=0.2449, over 5780978.22 frames. 
], batch size: 90, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:07:04,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=494940.0, ans=0.125 2024-09-18 18:07:04,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=494940.0, ans=0.07 2024-09-18 18:07:06,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=12.0 2024-09-18 18:07:19,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=494980.0, ans=0.0 2024-09-18 18:07:33,862 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.58 vs. limit=10.0 2024-09-18 18:07:38,421 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.50 vs. limit=6.0 2024-09-18 18:07:45,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=495060.0, ans=0.125 2024-09-18 18:07:53,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2024-09-18 18:07:57,363 INFO [train.py:1198] (0/2) Epoch 28, batch 1600, loss[loss=0.2478, ctc_loss=0.1233, cr_loss=0.3663, attn_decoder_loss=0.2535, over 29655.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1246, cr_loss=0.3665, attn_decoder_loss=0.2447, over 5763807.06 frames. 
], batch size: 85, lr: 3.97e-03, grad_scale: 16.0 2024-09-18 18:08:14,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495140.0, ans=0.1 2024-09-18 18:08:18,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=495140.0, ans=0.0 2024-09-18 18:08:27,526 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.529e+01 9.034e+01 9.836e+01 1.943e+02, threshold=1.807e+02, percent-clipped=1.0 2024-09-18 18:08:45,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=495220.0, ans=0.125 2024-09-18 18:09:05,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=495260.0, ans=0.0 2024-09-18 18:09:15,398 INFO [train.py:1198] (0/2) Epoch 28, batch 1650, loss[loss=0.258, ctc_loss=0.1394, cr_loss=0.4175, attn_decoder_loss=0.2619, over 29707.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1245, cr_loss=0.3666, attn_decoder_loss=0.2446, over 5758153.19 frames. 
], batch size: 89, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:09:18,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=495300.0, ans=0.125 2024-09-18 18:09:24,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=495300.0, ans=0.125 2024-09-18 18:09:24,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=495300.0, ans=0.0 2024-09-18 18:09:27,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=495300.0, ans=0.0 2024-09-18 18:09:29,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=495340.0, ans=0.0 2024-09-18 18:09:31,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=495340.0, ans=0.125 2024-09-18 18:09:31,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=495340.0, ans=0.125 2024-09-18 18:09:31,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=495340.0, ans=0.025 2024-09-18 18:09:40,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=495340.0, ans=0.125 2024-09-18 18:09:47,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=495380.0, ans=0.125 2024-09-18 18:09:58,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=495380.0, ans=0.0 2024-09-18 18:10:10,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=495420.0, ans=0.0 2024-09-18 
18:10:18,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=495460.0, ans=0.0 2024-09-18 18:10:23,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=495460.0, ans=0.125 2024-09-18 18:10:23,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. limit=15.0 2024-09-18 18:10:33,326 INFO [train.py:1198] (0/2) Epoch 28, batch 1700, loss[loss=0.2217, ctc_loss=0.1071, cr_loss=0.3307, attn_decoder_loss=0.2271, over 29582.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1246, cr_loss=0.3675, attn_decoder_loss=0.2445, over 5779217.84 frames. ], batch size: 69, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:10:36,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=495500.0, ans=0.125 2024-09-18 18:10:45,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=495500.0, ans=0.0 2024-09-18 18:10:57,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=495540.0, ans=0.0 2024-09-18 18:11:03,302 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.597e+01 9.283e+01 9.916e+01 1.626e+02, threshold=1.857e+02, percent-clipped=0.0 2024-09-18 18:11:08,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=495580.0, ans=0.0 2024-09-18 18:11:08,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.02 vs. 
limit=15.0 2024-09-18 18:11:09,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=495580.0, ans=0.125 2024-09-18 18:11:26,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=495620.0, ans=0.1 2024-09-18 18:11:49,156 INFO [train.py:1198] (0/2) Epoch 28, batch 1750, loss[loss=0.2141, ctc_loss=0.1066, cr_loss=0.3323, attn_decoder_loss=0.2186, over 29357.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1238, cr_loss=0.3666, attn_decoder_loss=0.2439, over 5787504.55 frames. ], batch size: 67, lr: 3.97e-03, grad_scale: 8.0 2024-09-18 18:11:53,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=495700.0, ans=0.125 2024-09-18 18:11:57,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=495700.0, ans=0.125 2024-09-18 18:12:17,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495740.0, ans=0.1 2024-09-18 18:12:32,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.48 vs. limit=22.5 2024-09-18 18:12:50,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=495860.0, ans=0.1 2024-09-18 18:13:05,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=495900.0, ans=0.0 2024-09-18 18:13:07,144 INFO [train.py:1198] (0/2) Epoch 28, batch 1800, loss[loss=0.2711, ctc_loss=0.1563, cr_loss=0.457, attn_decoder_loss=0.2737, over 29681.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1243, cr_loss=0.3675, attn_decoder_loss=0.2443, over 5790967.49 frames. 
], batch size: 83, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:13:07,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=495900.0, ans=0.0 2024-09-18 18:13:15,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=495900.0, ans=0.0 2024-09-18 18:13:25,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=495940.0, ans=0.125 2024-09-18 18:13:37,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.359e+01 8.858e+01 9.396e+01 1.273e+02, threshold=1.772e+02, percent-clipped=0.0 2024-09-18 18:13:44,064 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-124000.pt 2024-09-18 18:14:02,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.14 vs. limit=22.5 2024-09-18 18:14:12,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=496020.0, ans=0.0 2024-09-18 18:14:20,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=496060.0, ans=0.0 2024-09-18 18:14:32,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-09-18 18:14:32,849 INFO [train.py:1198] (0/2) Epoch 28, batch 1850, loss[loss=0.2554, ctc_loss=0.1345, cr_loss=0.3809, attn_decoder_loss=0.2604, over 29653.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1239, cr_loss=0.3667, attn_decoder_loss=0.2442, over 5797187.42 frames. 
], batch size: 86, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:14:39,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0 2024-09-18 18:14:49,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=496140.0, ans=0.09899494936611666 2024-09-18 18:14:53,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=496140.0, ans=0.0 2024-09-18 18:14:54,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496140.0, ans=0.1 2024-09-18 18:15:12,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=496180.0, ans=0.2 2024-09-18 18:15:28,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0 2024-09-18 18:15:48,303 INFO [train.py:1198] (0/2) Epoch 28, batch 1900, loss[loss=0.2471, ctc_loss=0.1235, cr_loss=0.3644, attn_decoder_loss=0.2528, over 29708.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1243, cr_loss=0.3676, attn_decoder_loss=0.2447, over 5804662.39 frames. ], batch size: 89, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:15:50,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.74 vs. limit=15.0 2024-09-18 18:16:01,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.32 vs. 
limit=15.0 2024-09-18 18:16:06,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=496340.0, ans=0.125 2024-09-18 18:16:14,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=496340.0, ans=0.125 2024-09-18 18:16:18,825 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.544e+01 9.072e+01 9.391e+01 1.587e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-18 18:16:28,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.05 vs. limit=10.0 2024-09-18 18:16:46,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=496420.0, ans=0.125 2024-09-18 18:16:46,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=496420.0, ans=0.1 2024-09-18 18:16:54,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.38 vs. limit=6.0 2024-09-18 18:17:06,241 INFO [train.py:1198] (0/2) Epoch 28, batch 1950, loss[loss=0.2384, ctc_loss=0.1292, cr_loss=0.3873, attn_decoder_loss=0.2419, over 29449.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.125, cr_loss=0.3692, attn_decoder_loss=0.2458, over 5819832.79 frames. ], batch size: 78, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:17:25,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-09-18 18:17:28,446 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.15 vs. 
limit=15.0 2024-09-18 18:17:45,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=496580.0, ans=22.5 2024-09-18 18:17:45,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=496580.0, ans=0.2 2024-09-18 18:18:00,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=496620.0, ans=0.125 2024-09-18 18:18:13,788 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.15 vs. limit=15.0 2024-09-18 18:18:24,157 INFO [train.py:1198] (0/2) Epoch 28, batch 2000, loss[loss=0.2205, ctc_loss=0.1166, cr_loss=0.3461, attn_decoder_loss=0.2244, over 29324.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1255, cr_loss=0.3695, attn_decoder_loss=0.2462, over 5797610.63 frames. ], batch size: 67, lr: 3.96e-03, grad_scale: 16.0 2024-09-18 18:18:55,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 8.591e+01 9.006e+01 9.471e+01 1.475e+02, threshold=1.801e+02, percent-clipped=0.0 2024-09-18 18:19:31,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=496860.0, ans=0.125 2024-09-18 18:19:40,000 INFO [train.py:1198] (0/2) Epoch 28, batch 2050, loss[loss=0.2185, ctc_loss=0.1104, cr_loss=0.3364, attn_decoder_loss=0.223, over 29455.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1251, cr_loss=0.3688, attn_decoder_loss=0.2455, over 5788274.19 frames. 
], batch size: 70, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:19:40,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=496900.0, ans=0.5 2024-09-18 18:19:59,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=496940.0, ans=0.125 2024-09-18 18:20:36,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=497020.0, ans=0.0 2024-09-18 18:20:45,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=497060.0, ans=0.2 2024-09-18 18:20:58,299 INFO [train.py:1198] (0/2) Epoch 28, batch 2100, loss[loss=0.2402, ctc_loss=0.1186, cr_loss=0.3512, attn_decoder_loss=0.2459, over 29760.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1244, cr_loss=0.3677, attn_decoder_loss=0.2449, over 5800164.97 frames. ], batch size: 81, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:21:29,768 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.428e+01 8.818e+01 9.232e+01 1.075e+02, threshold=1.764e+02, percent-clipped=0.0 2024-09-18 18:21:37,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=497180.0, ans=0.125 2024-09-18 18:21:43,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=497220.0, ans=0.2 2024-09-18 18:21:55,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=497220.0, ans=0.1 2024-09-18 18:22:00,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=497260.0, ans=0.125 2024-09-18 18:22:04,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, 
batch_count=497260.0, ans=0.125 2024-09-18 18:22:13,527 INFO [train.py:1198] (0/2) Epoch 28, batch 2150, loss[loss=0.2358, ctc_loss=0.13, cr_loss=0.387, attn_decoder_loss=0.2389, over 29458.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1235, cr_loss=0.3663, attn_decoder_loss=0.2441, over 5813959.48 frames. ], batch size: 78, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:22:15,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=497300.0, ans=0.125 2024-09-18 18:22:17,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.27 vs. limit=15.0 2024-09-18 18:22:22,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=497300.0, ans=0.2 2024-09-18 18:23:01,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=497420.0, ans=0.125 2024-09-18 18:23:01,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=497420.0, ans=0.125 2024-09-18 18:23:06,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.82 vs. limit=15.0 2024-09-18 18:23:06,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.49 vs. limit=15.0 2024-09-18 18:23:31,677 INFO [train.py:1198] (0/2) Epoch 28, batch 2200, loss[loss=0.2538, ctc_loss=0.1393, cr_loss=0.3941, attn_decoder_loss=0.2577, over 29624.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1238, cr_loss=0.3671, attn_decoder_loss=0.2443, over 5809929.24 frames. 
], batch size: 86, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:23:48,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.35 vs. limit=12.0 2024-09-18 18:23:54,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=497540.0, ans=0.125 2024-09-18 18:23:59,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-09-18 18:24:03,366 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.572e+01 8.974e+01 9.491e+01 1.804e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-18 18:24:03,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=497580.0, ans=0.0 2024-09-18 18:24:19,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=497620.0, ans=0.07 2024-09-18 18:24:19,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=497620.0, ans=0.07 2024-09-18 18:24:30,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=497620.0, ans=0.0 2024-09-18 18:24:42,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.47 vs. limit=15.0 2024-09-18 18:24:46,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=497700.0, ans=0.125 2024-09-18 18:24:47,950 INFO [train.py:1198] (0/2) Epoch 28, batch 2250, loss[loss=0.2345, ctc_loss=0.1229, cr_loss=0.3496, attn_decoder_loss=0.2391, over 29722.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1238, cr_loss=0.3666, attn_decoder_loss=0.244, over 5809044.57 frames. 
], batch size: 82, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:24:56,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=497700.0, ans=0.0 2024-09-18 18:25:04,196 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:25:05,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=497740.0, ans=0.0 2024-09-18 18:25:17,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=497740.0, ans=0.2 2024-09-18 18:25:23,983 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:25:28,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=497780.0, ans=0.025 2024-09-18 18:25:43,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.05 vs. limit=22.5 2024-09-18 18:26:05,886 INFO [train.py:1198] (0/2) Epoch 28, batch 2300, loss[loss=0.21, ctc_loss=0.1003, cr_loss=0.3077, attn_decoder_loss=0.2154, over 29318.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1233, cr_loss=0.3655, attn_decoder_loss=0.2432, over 5797540.21 frames. 
], batch size: 71, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:26:06,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=497900.0, ans=0.025 2024-09-18 18:26:32,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=497940.0, ans=0.0 2024-09-18 18:26:39,374 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.383e+01 8.665e+01 9.441e+01 6.698e+02, threshold=1.733e+02, percent-clipped=3.0 2024-09-18 18:26:45,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=497980.0, ans=0.1 2024-09-18 18:26:49,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=497980.0, ans=0.125 2024-09-18 18:26:49,580 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0 2024-09-18 18:26:53,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=498020.0, ans=0.2 2024-09-18 18:27:01,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=498020.0, ans=0.0 2024-09-18 18:27:19,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=498060.0, ans=0.125 2024-09-18 18:27:23,864 INFO [train.py:1198] (0/2) Epoch 28, batch 2350, loss[loss=0.2511, ctc_loss=0.1359, cr_loss=0.3997, attn_decoder_loss=0.255, over 29705.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1234, cr_loss=0.3657, attn_decoder_loss=0.2433, over 5803527.76 frames. 
], batch size: 83, lr: 3.96e-03, grad_scale: 8.0 2024-09-18 18:27:32,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=498100.0, ans=0.125 2024-09-18 18:27:54,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.58 vs. limit=22.5 2024-09-18 18:28:17,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=498220.0, ans=0.125 2024-09-18 18:28:18,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=498220.0, ans=0.07 2024-09-18 18:28:38,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=498300.0, ans=0.125 2024-09-18 18:28:39,728 INFO [train.py:1198] (0/2) Epoch 28, batch 2400, loss[loss=0.2367, ctc_loss=0.127, cr_loss=0.3836, attn_decoder_loss=0.2404, over 29544.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.124, cr_loss=0.3666, attn_decoder_loss=0.244, over 5807095.92 frames. ], batch size: 76, lr: 3.96e-03, grad_scale: 16.0 2024-09-18 18:28:48,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=498300.0, ans=0.02 2024-09-18 18:28:51,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.90 vs. 
limit=22.5 2024-09-18 18:28:57,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=498340.0, ans=0.05 2024-09-18 18:29:07,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=498340.0, ans=0.0 2024-09-18 18:29:09,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=498340.0, ans=0.0 2024-09-18 18:29:15,197 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.714e+01 9.180e+01 9.673e+01 2.821e+02, threshold=1.836e+02, percent-clipped=1.0 2024-09-18 18:29:30,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=12.0 2024-09-18 18:29:58,171 INFO [train.py:1198] (0/2) Epoch 28, batch 2450, loss[loss=0.2512, ctc_loss=0.1321, cr_loss=0.3812, attn_decoder_loss=0.2559, over 29687.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1245, cr_loss=0.3675, attn_decoder_loss=0.2445, over 5784158.61 frames. ], batch size: 82, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:29:58,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=498500.0, ans=0.1 2024-09-18 18:30:08,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=498500.0, ans=0.125 2024-09-18 18:30:13,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=498540.0, ans=0.1 2024-09-18 18:30:16,373 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.09 vs. 
limit=15.0 2024-09-18 18:30:42,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=498580.0, ans=0.125 2024-09-18 18:30:52,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=498620.0, ans=0.0 2024-09-18 18:31:10,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498660.0, ans=0.1 2024-09-18 18:31:16,314 INFO [train.py:1198] (0/2) Epoch 28, batch 2500, loss[loss=0.2481, ctc_loss=0.1206, cr_loss=0.3645, attn_decoder_loss=0.2541, over 29652.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1244, cr_loss=0.367, attn_decoder_loss=0.2445, over 5794223.49 frames. ], batch size: 86, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:31:24,031 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:31:48,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.76 vs. 
limit=15.0 2024-09-18 18:31:49,772 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.525e+01 9.051e+01 9.521e+01 3.075e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-18 18:31:51,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=498780.0, ans=0.125 2024-09-18 18:32:05,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=498820.0, ans=0.125 2024-09-18 18:32:08,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=498820.0, ans=0.125 2024-09-18 18:32:11,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=498820.0, ans=0.2 2024-09-18 18:32:31,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=498900.0, ans=0.0 2024-09-18 18:32:32,428 INFO [train.py:1198] (0/2) Epoch 28, batch 2550, loss[loss=0.2219, ctc_loss=0.112, cr_loss=0.3525, attn_decoder_loss=0.2263, over 29343.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1243, cr_loss=0.3668, attn_decoder_loss=0.2444, over 5797108.41 frames. ], batch size: 67, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:33:01,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.31 vs. limit=15.0 2024-09-18 18:33:06,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=498980.0, ans=0.1 2024-09-18 18:33:08,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.45 vs. 
limit=15.0 2024-09-18 18:33:12,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498980.0, ans=0.1 2024-09-18 18:33:50,490 INFO [train.py:1198] (0/2) Epoch 28, batch 2600, loss[loss=0.2379, ctc_loss=0.1243, cr_loss=0.3798, attn_decoder_loss=0.2421, over 29439.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1244, cr_loss=0.3672, attn_decoder_loss=0.2447, over 5793344.62 frames. ], batch size: 78, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:33:53,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=499100.0, ans=0.125 2024-09-18 18:34:04,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=499140.0, ans=0.0 2024-09-18 18:34:13,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.94 vs. limit=10.0 2024-09-18 18:34:14,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.69 vs. limit=12.0 2024-09-18 18:34:25,550 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.719e+01 9.111e+01 9.618e+01 2.208e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-18 18:34:26,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2024-09-18 18:34:44,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=499220.0, ans=0.125 2024-09-18 18:34:54,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. 
limit=22.5 2024-09-18 18:35:02,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=499260.0, ans=0.125 2024-09-18 18:35:05,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=499260.0, ans=0.0 2024-09-18 18:35:07,770 INFO [train.py:1198] (0/2) Epoch 28, batch 2650, loss[loss=0.2583, ctc_loss=0.1344, cr_loss=0.4043, attn_decoder_loss=0.2631, over 29314.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1242, cr_loss=0.3672, attn_decoder_loss=0.2449, over 5801197.34 frames. ], batch size: 100, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:35:14,547 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.61 vs. limit=22.5 2024-09-18 18:35:18,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=499300.0, ans=0.025 2024-09-18 18:35:23,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=499340.0, ans=0.125 2024-09-18 18:35:36,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.32 vs. limit=8.0 2024-09-18 18:35:41,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=499380.0, ans=0.125 2024-09-18 18:35:43,490 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.87 vs. 
limit=15.0 2024-09-18 18:36:05,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499420.0, ans=0.1 2024-09-18 18:36:16,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=499460.0, ans=0.0 2024-09-18 18:36:22,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=499500.0, ans=0.125 2024-09-18 18:36:25,507 INFO [train.py:1198] (0/2) Epoch 28, batch 2700, loss[loss=0.2445, ctc_loss=0.1132, cr_loss=0.3558, attn_decoder_loss=0.2511, over 29533.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1248, cr_loss=0.3682, attn_decoder_loss=0.2454, over 5796859.82 frames. ], batch size: 87, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:36:37,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.33 vs. limit=10.0 2024-09-18 18:36:37,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=499500.0, ans=0.125 2024-09-18 18:36:47,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.85 vs. 
limit=15.0 2024-09-18 18:36:56,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=499580.0, ans=0.2 2024-09-18 18:36:58,805 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.414e+01 8.942e+01 9.601e+01 1.842e+02, threshold=1.788e+02, percent-clipped=1.0 2024-09-18 18:37:08,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=499580.0, ans=0.125 2024-09-18 18:37:32,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=499660.0, ans=0.1 2024-09-18 18:37:33,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.51 vs. limit=10.0 2024-09-18 18:37:37,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=499660.0, ans=0.5 2024-09-18 18:37:38,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=499660.0, ans=0.125 2024-09-18 18:37:38,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=499660.0, ans=0.0 2024-09-18 18:37:41,533 INFO [train.py:1198] (0/2) Epoch 28, batch 2750, loss[loss=0.2338, ctc_loss=0.1178, cr_loss=0.3392, attn_decoder_loss=0.2392, over 29507.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1237, cr_loss=0.3661, attn_decoder_loss=0.2443, over 5796605.73 frames. 
], batch size: 75, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:37:55,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=499740.0, ans=0.0 2024-09-18 18:38:11,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=499740.0, ans=0.2 2024-09-18 18:38:19,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.42 vs. limit=15.0 2024-09-18 18:38:27,806 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:38:29,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499820.0, ans=0.1 2024-09-18 18:38:40,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=499820.0, ans=0.125 2024-09-18 18:38:42,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=499820.0, ans=15.0 2024-09-18 18:38:57,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=499860.0, ans=0.0 2024-09-18 18:38:59,707 INFO [train.py:1198] (0/2) Epoch 28, batch 2800, loss[loss=0.2579, ctc_loss=0.1493, cr_loss=0.3827, attn_decoder_loss=0.2615, over 20329.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.124, cr_loss=0.3667, attn_decoder_loss=0.2445, over 5777438.67 frames. ], batch size: 213, lr: 3.95e-03, grad_scale: 16.0 2024-09-18 18:39:03,900 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.25 vs. 
limit=10.0 2024-09-18 18:39:07,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=499900.0, ans=0.0 2024-09-18 18:39:15,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=499940.0, ans=0.125 2024-09-18 18:39:34,536 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.662e+01 9.200e+01 9.823e+01 1.916e+02, threshold=1.840e+02, percent-clipped=1.0 2024-09-18 18:39:36,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499980.0, ans=0.1 2024-09-18 18:39:40,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.24 vs. limit=15.0 2024-09-18 18:39:43,443 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.30 vs. limit=10.0 2024-09-18 18:40:17,494 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5 2024-09-18 18:40:18,068 INFO [train.py:1198] (0/2) Epoch 28, batch 2850, loss[loss=0.234, ctc_loss=0.1178, cr_loss=0.351, attn_decoder_loss=0.2391, over 29508.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1244, cr_loss=0.367, attn_decoder_loss=0.2448, over 5762628.05 frames. 
], batch size: 77, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:40:19,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=500100.0, ans=0.125 2024-09-18 18:40:28,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=500100.0, ans=0.125 2024-09-18 18:40:45,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=500140.0, ans=0.125 2024-09-18 18:40:48,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=500180.0, ans=0.0 2024-09-18 18:40:51,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=500180.0, ans=0.1 2024-09-18 18:40:51,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=500180.0, ans=0.0 2024-09-18 18:41:03,460 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.42 vs. limit=8.0 2024-09-18 18:41:04,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-09-18 18:41:33,987 INFO [train.py:1198] (0/2) Epoch 28, batch 2900, loss[loss=0.2401, ctc_loss=0.1278, cr_loss=0.3883, attn_decoder_loss=0.2439, over 29392.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1253, cr_loss=0.3692, attn_decoder_loss=0.2461, over 5787904.27 frames. 
], batch size: 79, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:41:50,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=500340.0, ans=0.125 2024-09-18 18:42:10,954 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.571e+01 8.982e+01 9.611e+01 1.691e+02, threshold=1.796e+02, percent-clipped=0.0 2024-09-18 18:42:22,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=500420.0, ans=6.0 2024-09-18 18:42:23,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=500420.0, ans=0.125 2024-09-18 18:42:35,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=500460.0, ans=0.025 2024-09-18 18:42:51,879 INFO [train.py:1198] (0/2) Epoch 28, batch 2950, loss[loss=0.2227, ctc_loss=0.1094, cr_loss=0.333, attn_decoder_loss=0.2279, over 29517.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1245, cr_loss=0.3675, attn_decoder_loss=0.2448, over 5782167.21 frames. ], batch size: 75, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:42:59,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=12.0 2024-09-18 18:43:06,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.80 vs. 
limit=15.0 2024-09-18 18:43:14,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=500540.0, ans=0.125 2024-09-18 18:43:33,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=500580.0, ans=15.0 2024-09-18 18:43:37,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=500620.0, ans=0.1 2024-09-18 18:43:51,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=500660.0, ans=0.125 2024-09-18 18:43:59,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=500660.0, ans=0.2 2024-09-18 18:44:06,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=22.5 2024-09-18 18:44:10,171 INFO [train.py:1198] (0/2) Epoch 28, batch 3000, loss[loss=0.232, ctc_loss=0.1159, cr_loss=0.342, attn_decoder_loss=0.2373, over 29756.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1244, cr_loss=0.3674, attn_decoder_loss=0.2449, over 5783475.66 frames. ], batch size: 81, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:44:10,171 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 18:44:28,709 INFO [train.py:1230] (0/2) Epoch 28, validation: loss=0.2115, ctc_loss=0.03821, cr_loss=5.852e-15, attn_decoder_loss=0.2307, over 944034.00 frames. 
2024-09-18 18:44:28,709 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 18:44:44,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=500740.0, ans=0.125 2024-09-18 18:44:50,942 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2024-09-18 18:45:03,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.580e+01 9.034e+01 9.618e+01 2.130e+02, threshold=1.807e+02, percent-clipped=2.0 2024-09-18 18:45:07,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=500780.0, ans=0.125 2024-09-18 18:45:40,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-09-18 18:45:45,098 INFO [train.py:1198] (0/2) Epoch 28, batch 3050, loss[loss=0.2237, ctc_loss=0.1216, cr_loss=0.3597, attn_decoder_loss=0.227, over 29540.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1247, cr_loss=0.3675, attn_decoder_loss=0.2452, over 5776660.39 frames. 
], batch size: 76, lr: 3.95e-03, grad_scale: 8.0 2024-09-18 18:46:02,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=500940.0, ans=0.0 2024-09-18 18:46:05,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=500940.0, ans=0.125 2024-09-18 18:46:07,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500940.0, ans=0.1 2024-09-18 18:46:24,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=500980.0, ans=0.0 2024-09-18 18:46:44,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.64 vs. limit=15.0 2024-09-18 18:46:49,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=501060.0, ans=0.2 2024-09-18 18:46:50,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=22.5 2024-09-18 18:46:51,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=501060.0, ans=0.125 2024-09-18 18:47:02,941 INFO [train.py:1198] (0/2) Epoch 28, batch 3100, loss[loss=0.2552, ctc_loss=0.1411, cr_loss=0.4208, attn_decoder_loss=0.2585, over 29269.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1246, cr_loss=0.3674, attn_decoder_loss=0.2449, over 5776854.86 frames. 
], batch size: 100, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:47:04,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=501100.0, ans=0.025 2024-09-18 18:47:27,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=501140.0, ans=0.125 2024-09-18 18:47:37,590 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.481e+01 8.983e+01 9.463e+01 1.324e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-18 18:48:19,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=501300.0, ans=0.125 2024-09-18 18:48:20,792 INFO [train.py:1198] (0/2) Epoch 28, batch 3150, loss[loss=0.251, ctc_loss=0.1259, cr_loss=0.3646, attn_decoder_loss=0.2568, over 28888.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1243, cr_loss=0.3665, attn_decoder_loss=0.2448, over 5784126.65 frames. ], batch size: 104, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:48:37,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=501340.0, ans=0.0 2024-09-18 18:49:15,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=501420.0, ans=0.05 2024-09-18 18:49:33,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=501460.0, ans=0.0 2024-09-18 18:49:36,033 INFO [train.py:1198] (0/2) Epoch 28, batch 3200, loss[loss=0.2344, ctc_loss=0.1202, cr_loss=0.3635, attn_decoder_loss=0.239, over 29415.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.124, cr_loss=0.3662, attn_decoder_loss=0.2445, over 5794840.10 frames. 
], batch size: 79, lr: 3.94e-03, grad_scale: 16.0 2024-09-18 18:49:52,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=501540.0, ans=0.125 2024-09-18 18:49:53,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=501540.0, ans=0.125 2024-09-18 18:50:13,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.202e+01 8.510e+01 8.995e+01 9.300e+01 1.777e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-18 18:50:18,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=501580.0, ans=0.2 2024-09-18 18:50:26,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=501620.0, ans=0.125 2024-09-18 18:50:44,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=501660.0, ans=0.125 2024-09-18 18:50:44,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=501660.0, ans=10.0 2024-09-18 18:50:54,444 INFO [train.py:1198] (0/2) Epoch 28, batch 3250, loss[loss=0.2594, ctc_loss=0.1422, cr_loss=0.419, attn_decoder_loss=0.2631, over 29723.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1238, cr_loss=0.3665, attn_decoder_loss=0.2448, over 5801708.42 frames. 
], batch size: 84, lr: 3.94e-03, grad_scale: 16.0 2024-09-18 18:51:00,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=501700.0, ans=0.1 2024-09-18 18:51:06,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=501700.0, ans=0.0 2024-09-18 18:51:19,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.38 vs. limit=22.5 2024-09-18 18:51:39,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=501820.0, ans=0.1 2024-09-18 18:51:50,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=501820.0, ans=0.0 2024-09-18 18:51:52,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=501820.0, ans=0.125 2024-09-18 18:52:02,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=501860.0, ans=0.07 2024-09-18 18:52:11,768 INFO [train.py:1198] (0/2) Epoch 28, batch 3300, loss[loss=0.247, ctc_loss=0.1233, cr_loss=0.3717, attn_decoder_loss=0.2525, over 28231.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1232, cr_loss=0.3647, attn_decoder_loss=0.2438, over 5797954.20 frames. 
], batch size: 111, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:52:12,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=501900.0, ans=0.125 2024-09-18 18:52:21,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=501900.0, ans=0.05 2024-09-18 18:52:25,956 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:52:38,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=501940.0, ans=0.025 2024-09-18 18:52:44,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=501980.0, ans=0.125 2024-09-18 18:52:48,162 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.472e+01 9.021e+01 9.788e+01 2.409e+02, threshold=1.804e+02, percent-clipped=2.0 2024-09-18 18:53:27,371 INFO [train.py:1198] (0/2) Epoch 28, batch 3350, loss[loss=0.2509, ctc_loss=0.1344, cr_loss=0.3869, attn_decoder_loss=0.2552, over 28809.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1242, cr_loss=0.3669, attn_decoder_loss=0.2447, over 5774198.39 frames. 
], batch size: 104, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:53:30,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=502100.0, ans=0.125 2024-09-18 18:53:35,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=502100.0, ans=0.125 2024-09-18 18:53:42,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=502100.0, ans=0.125 2024-09-18 18:53:45,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=502140.0, ans=0.2 2024-09-18 18:53:54,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=502140.0, ans=0.125 2024-09-18 18:54:03,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=502180.0, ans=0.2 2024-09-18 18:54:10,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.69 vs. limit=5.0 2024-09-18 18:54:13,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=502220.0, ans=0.0 2024-09-18 18:54:22,167 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5 2024-09-18 18:54:45,498 INFO [train.py:1198] (0/2) Epoch 28, batch 3400, loss[loss=0.2101, ctc_loss=0.1103, cr_loss=0.3282, attn_decoder_loss=0.2139, over 29358.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1244, cr_loss=0.3673, attn_decoder_loss=0.2448, over 5766045.93 frames. 
], batch size: 67, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:54:57,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=502300.0, ans=0.1 2024-09-18 18:55:09,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=502340.0, ans=0.025 2024-09-18 18:55:14,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=502380.0, ans=0.0 2024-09-18 18:55:20,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=502380.0, ans=0.2 2024-09-18 18:55:21,609 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.459e+01 8.977e+01 9.782e+01 2.197e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-18 18:55:25,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=502380.0, ans=0.125 2024-09-18 18:55:30,015 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0 2024-09-18 18:55:54,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=502460.0, ans=0.125 2024-09-18 18:55:57,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=502460.0, ans=0.0 2024-09-18 18:56:03,380 INFO [train.py:1198] (0/2) Epoch 28, batch 3450, loss[loss=0.2498, ctc_loss=0.1215, cr_loss=0.362, attn_decoder_loss=0.256, over 28386.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1245, cr_loss=0.3679, attn_decoder_loss=0.2452, over 5774539.48 frames. 
], batch size: 111, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:56:21,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=502540.0, ans=0.125 2024-09-18 18:56:23,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=502540.0, ans=0.025 2024-09-18 18:56:23,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=502540.0, ans=0.125 2024-09-18 18:56:33,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=502580.0, ans=0.2 2024-09-18 18:57:18,866 INFO [train.py:1198] (0/2) Epoch 28, batch 3500, loss[loss=0.2173, ctc_loss=0.1061, cr_loss=0.3361, attn_decoder_loss=0.2222, over 29315.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1242, cr_loss=0.3671, attn_decoder_loss=0.2447, over 5776732.07 frames. ], batch size: 71, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:57:22,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=502700.0, ans=0.1 2024-09-18 18:57:25,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.99 vs. 
limit=10.0 2024-09-18 18:57:26,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=502700.0, ans=0.1 2024-09-18 18:57:50,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=502780.0, ans=0.09899494936611666 2024-09-18 18:57:57,151 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.602e+01 9.014e+01 9.488e+01 1.440e+02, threshold=1.803e+02, percent-clipped=0.0 2024-09-18 18:58:01,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=502780.0, ans=0.1 2024-09-18 18:58:05,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=502820.0, ans=0.125 2024-09-18 18:58:26,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.48 vs. limit=15.0 2024-09-18 18:58:30,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=502860.0, ans=0.125 2024-09-18 18:58:35,919 INFO [train.py:1198] (0/2) Epoch 28, batch 3550, loss[loss=0.2478, ctc_loss=0.1243, cr_loss=0.3623, attn_decoder_loss=0.2535, over 29698.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1241, cr_loss=0.3671, attn_decoder_loss=0.2447, over 5782739.46 frames. ], batch size: 89, lr: 3.94e-03, grad_scale: 8.0 2024-09-18 18:58:45,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. 
limit=6.0 2024-09-18 18:59:12,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=502980.0, ans=0.125 2024-09-18 18:59:12,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=502980.0, ans=0.125 2024-09-18 18:59:13,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=502980.0, ans=0.0 2024-09-18 18:59:17,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=502980.0, ans=0.125 2024-09-18 18:59:40,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=503060.0, ans=0.2 2024-09-18 18:59:40,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503060.0, ans=0.1 2024-09-18 18:59:43,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=15.0 2024-09-18 18:59:50,231 INFO [train.py:1198] (0/2) Epoch 28, batch 3600, loss[loss=0.2448, ctc_loss=0.1248, cr_loss=0.3574, attn_decoder_loss=0.2502, over 29497.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1239, cr_loss=0.3665, attn_decoder_loss=0.2445, over 5792869.46 frames. ], batch size: 77, lr: 3.94e-03, grad_scale: 16.0 2024-09-18 18:59:57,420 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.45 vs. 
limit=15.0 2024-09-18 19:00:02,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=503100.0, ans=0.5 2024-09-18 19:00:05,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=503140.0, ans=0.2 2024-09-18 19:00:09,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.39 vs. limit=10.0 2024-09-18 19:00:18,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=503180.0, ans=0.125 2024-09-18 19:00:21,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=503180.0, ans=0.05 2024-09-18 19:00:22,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.36 vs. limit=10.0 2024-09-18 19:00:23,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=503180.0, ans=0.0 2024-09-18 19:00:25,999 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.358e+01 8.868e+01 9.352e+01 4.010e+02, threshold=1.774e+02, percent-clipped=1.0 2024-09-18 19:00:28,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=503180.0, ans=0.0 2024-09-18 19:00:28,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=503180.0, ans=0.125 2024-09-18 19:00:56,166 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. 
limit=15.0 2024-09-18 19:00:59,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=503260.0, ans=10.0 2024-09-18 19:01:04,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=503260.0, ans=0.0 2024-09-18 19:01:07,132 INFO [train.py:1198] (0/2) Epoch 28, batch 3650, loss[loss=0.2533, ctc_loss=0.132, cr_loss=0.3903, attn_decoder_loss=0.2582, over 29506.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1236, cr_loss=0.3664, attn_decoder_loss=0.244, over 5794803.36 frames. ], batch size: 90, lr: 3.94e-03, grad_scale: 16.0 2024-09-18 19:01:08,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=503300.0, ans=0.09899494936611666 2024-09-18 19:01:34,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503340.0, ans=0.1 2024-09-18 19:01:41,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503380.0, ans=0.1 2024-09-18 19:01:47,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=503380.0, ans=0.2 2024-09-18 19:01:48,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.96 vs. 
limit=22.5 2024-09-18 19:02:07,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503460.0, ans=0.1 2024-09-18 19:02:15,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=503460.0, ans=0.0 2024-09-18 19:02:21,716 INFO [train.py:1198] (0/2) Epoch 28, batch 3700, loss[loss=0.2426, ctc_loss=0.1281, cr_loss=0.3688, attn_decoder_loss=0.2471, over 29698.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1237, cr_loss=0.3663, attn_decoder_loss=0.244, over 5804670.52 frames. ], batch size: 84, lr: 3.93e-03, grad_scale: 16.0 2024-09-18 19:02:27,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503500.0, ans=0.1 2024-09-18 19:02:32,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.80 vs. limit=22.5 2024-09-18 19:02:34,762 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-09-18 19:02:58,673 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.604e+01 9.187e+01 9.989e+01 2.860e+02, threshold=1.837e+02, percent-clipped=1.0 2024-09-18 19:03:07,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=503620.0, ans=0.0 2024-09-18 19:03:10,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.63 vs. 
limit=15.0 2024-09-18 19:03:16,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=503620.0, ans=0.125 2024-09-18 19:03:26,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=15.0 2024-09-18 19:03:36,037 INFO [train.py:1198] (0/2) Epoch 28, batch 3750, loss[loss=0.2046, ctc_loss=0.09922, cr_loss=0.2978, attn_decoder_loss=0.2097, over 29399.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1234, cr_loss=0.3658, attn_decoder_loss=0.2438, over 5808098.48 frames. ], batch size: 67, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:03:49,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=503740.0, ans=0.125 2024-09-18 19:03:52,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=503740.0, ans=0.125 2024-09-18 19:04:03,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=503740.0, ans=0.0 2024-09-18 19:04:05,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=503780.0, ans=0.125 2024-09-18 19:04:22,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=503820.0, ans=0.1 2024-09-18 19:04:52,049 INFO [train.py:1198] (0/2) Epoch 28, batch 3800, loss[loss=0.251, ctc_loss=0.1273, cr_loss=0.3685, attn_decoder_loss=0.2565, over 29638.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1233, cr_loss=0.3653, attn_decoder_loss=0.2435, over 5798447.89 frames. 
], batch size: 86, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:05:15,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.84 vs. limit=22.5 2024-09-18 19:05:18,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.24 vs. limit=15.0 2024-09-18 19:05:25,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=503980.0, ans=0.0 2024-09-18 19:05:29,965 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.413e+01 8.933e+01 9.626e+01 3.409e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-18 19:05:57,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=504060.0, ans=0.0 2024-09-18 19:06:01,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=504060.0, ans=0.125 2024-09-18 19:06:01,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504060.0, ans=0.1 2024-09-18 19:06:01,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=504060.0, ans=0.125 2024-09-18 19:06:07,082 INFO [train.py:1198] (0/2) Epoch 28, batch 3850, loss[loss=0.2592, ctc_loss=0.1268, cr_loss=0.3826, attn_decoder_loss=0.2654, over 29308.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1229, cr_loss=0.365, attn_decoder_loss=0.2434, over 5811688.22 frames. 
], batch size: 100, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:06:13,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=504100.0, ans=0.125 2024-09-18 19:06:20,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=504100.0, ans=0.0 2024-09-18 19:06:34,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=504140.0, ans=0.0 2024-09-18 19:06:37,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=504180.0, ans=0.0 2024-09-18 19:06:42,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.65 vs. limit=15.0 2024-09-18 19:06:59,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=504220.0, ans=0.035 2024-09-18 19:07:14,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=504260.0, ans=0.0 2024-09-18 19:07:23,016 INFO [train.py:1198] (0/2) Epoch 28, batch 3900, loss[loss=0.2608, ctc_loss=0.1369, cr_loss=0.3969, attn_decoder_loss=0.2658, over 29648.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1235, cr_loss=0.3664, attn_decoder_loss=0.244, over 5815810.11 frames. ], batch size: 86, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:07:36,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.05 vs. 
limit=15.0 2024-09-18 19:07:48,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=504340.0, ans=0.125 2024-09-18 19:07:56,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-09-18 19:08:00,192 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 8.446e+01 8.921e+01 9.410e+01 1.233e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-18 19:08:03,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=504380.0, ans=0.2 2024-09-18 19:08:37,164 INFO [train.py:1198] (0/2) Epoch 28, batch 3950, loss[loss=0.2503, ctc_loss=0.133, cr_loss=0.3781, attn_decoder_loss=0.2549, over 29475.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1234, cr_loss=0.3665, attn_decoder_loss=0.2441, over 5835375.78 frames. ], batch size: 97, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:08:37,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=504500.0, ans=0.05 2024-09-18 19:08:50,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=504540.0, ans=0.125 2024-09-18 19:08:52,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=504540.0, ans=0.125 2024-09-18 19:08:58,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=504540.0, ans=0.05 2024-09-18 19:09:01,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=504540.0, ans=0.125 2024-09-18 19:09:06,160 INFO [scaling.py:1024] (0/2) Whitening: 
name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.54 vs. limit=22.5 2024-09-18 19:09:15,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=504580.0, ans=0.0 2024-09-18 19:09:30,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=504620.0, ans=0.025 2024-09-18 19:09:48,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.07 vs. limit=15.0 2024-09-18 19:09:52,195 INFO [train.py:1198] (0/2) Epoch 28, batch 4000, loss[loss=0.224, ctc_loss=0.1108, cr_loss=0.3488, attn_decoder_loss=0.2288, over 29490.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1234, cr_loss=0.3661, attn_decoder_loss=0.2441, over 5813130.76 frames. ], batch size: 74, lr: 3.93e-03, grad_scale: 16.0 2024-09-18 19:09:52,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=504700.0, ans=0.2 2024-09-18 19:09:59,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=504700.0, ans=0.0 2024-09-18 19:10:15,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=504740.0, ans=0.125 2024-09-18 19:10:28,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=504780.0, ans=0.0 2024-09-18 19:10:29,489 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.096e+01 8.633e+01 9.036e+01 9.608e+01 3.784e+02, threshold=1.807e+02, percent-clipped=1.0 2024-09-18 19:10:34,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504780.0, ans=0.1 2024-09-18 19:10:42,325 INFO 
[scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.52 vs. limit=15.0 2024-09-18 19:10:43,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=504820.0, ans=0.125 2024-09-18 19:11:08,042 INFO [train.py:1198] (0/2) Epoch 28, batch 4050, loss[loss=0.2603, ctc_loss=0.1538, cr_loss=0.3771, attn_decoder_loss=0.2637, over 20134.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1231, cr_loss=0.3652, attn_decoder_loss=0.2438, over 5797189.24 frames. ], batch size: 209, lr: 3.93e-03, grad_scale: 16.0 2024-09-18 19:11:11,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=504900.0, ans=0.09899494936611666 2024-09-18 19:11:21,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=504940.0, ans=0.025 2024-09-18 19:11:27,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=504940.0, ans=0.0 2024-09-18 19:11:34,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2024-09-18 19:11:50,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=505020.0, ans=0.125 2024-09-18 19:12:05,253 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0 2024-09-18 19:12:21,990 INFO [train.py:1198] (0/2) Epoch 28, batch 4100, loss[loss=0.2532, ctc_loss=0.1367, cr_loss=0.4024, attn_decoder_loss=0.2572, over 29516.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1236, cr_loss=0.3662, attn_decoder_loss=0.2442, over 5792516.23 frames. 
], batch size: 90, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:12:30,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=505100.0, ans=0.2 2024-09-18 19:12:58,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=505180.0, ans=0.05 2024-09-18 19:13:00,178 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.509e+01 9.171e+01 9.842e+01 2.303e+02, threshold=1.834e+02, percent-clipped=2.0 2024-09-18 19:13:13,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505220.0, ans=0.1 2024-09-18 19:13:21,152 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:13:35,870 INFO [train.py:1198] (0/2) Epoch 28, batch 4150, loss[loss=0.2361, ctc_loss=0.1182, cr_loss=0.3577, attn_decoder_loss=0.2412, over 29520.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1231, cr_loss=0.3652, attn_decoder_loss=0.2437, over 5797279.06 frames. ], batch size: 77, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:13:49,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=505340.0, ans=0.1 2024-09-18 19:13:49,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.79 vs. 
limit=15.0 2024-09-18 19:14:19,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=505420.0, ans=0.125 2024-09-18 19:14:31,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=505420.0, ans=0.125 2024-09-18 19:14:33,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=505420.0, ans=0.0 2024-09-18 19:14:48,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=505460.0, ans=10.0 2024-09-18 19:14:50,771 INFO [train.py:1198] (0/2) Epoch 28, batch 4200, loss[loss=0.2499, ctc_loss=0.1316, cr_loss=0.3802, attn_decoder_loss=0.2546, over 29502.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1233, cr_loss=0.3655, attn_decoder_loss=0.244, over 5800059.80 frames. ], batch size: 90, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:15:30,601 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.561e+01 9.045e+01 9.717e+01 1.244e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-18 19:15:35,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.19 vs. 
limit=12.0 2024-09-18 19:15:55,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=505660.0, ans=0.125 2024-09-18 19:15:56,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=505660.0, ans=0.0 2024-09-18 19:16:00,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505660.0, ans=0.1 2024-09-18 19:16:04,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=505700.0, ans=0.1 2024-09-18 19:16:06,011 INFO [train.py:1198] (0/2) Epoch 28, batch 4250, loss[loss=0.2272, ctc_loss=0.1114, cr_loss=0.3443, attn_decoder_loss=0.2325, over 29495.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1233, cr_loss=0.3652, attn_decoder_loss=0.2441, over 5805515.06 frames. ], batch size: 74, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:16:15,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=505700.0, ans=0.95 2024-09-18 19:16:17,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=505700.0, ans=0.2 2024-09-18 19:16:20,971 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:16:44,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=505780.0, ans=0.0 2024-09-18 19:16:51,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=505820.0, ans=0.0 2024-09-18 19:17:06,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=505860.0, ans=0.125 2024-09-18 19:17:06,791 INFO 
[scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=505860.0, ans=0.2 2024-09-18 19:17:19,609 INFO [train.py:1198] (0/2) Epoch 28, batch 4300, loss[loss=0.2499, ctc_loss=0.1328, cr_loss=0.3975, attn_decoder_loss=0.2541, over 29508.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1231, cr_loss=0.3646, attn_decoder_loss=0.2442, over 5794061.05 frames. ], batch size: 87, lr: 3.93e-03, grad_scale: 8.0 2024-09-18 19:17:23,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.02 vs. limit=15.0 2024-09-18 19:17:46,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=505940.0, ans=0.125 2024-09-18 19:17:58,958 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.600e+01 9.054e+01 9.453e+01 1.609e+02, threshold=1.811e+02, percent-clipped=0.0 2024-09-18 19:17:59,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=505980.0, ans=0.125 2024-09-18 19:18:15,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=506020.0, ans=0.125 2024-09-18 19:18:18,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=506060.0, ans=0.2 2024-09-18 19:18:35,179 INFO [train.py:1198] (0/2) Epoch 28, batch 4350, loss[loss=0.2574, ctc_loss=0.1333, cr_loss=0.3825, attn_decoder_loss=0.2627, over 29479.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1261, cr_loss=0.3708, attn_decoder_loss=0.2476, over 5796466.00 frames. 
], batch size: 97, lr: 3.92e-03, grad_scale: 8.0 2024-09-18 19:18:41,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=506100.0, ans=0.1 2024-09-18 19:18:51,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=506140.0, ans=0.0 2024-09-18 19:19:04,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.13 vs. limit=15.0 2024-09-18 19:19:09,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=506180.0, ans=0.125 2024-09-18 19:19:12,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=506180.0, ans=0.125 2024-09-18 19:19:24,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=506220.0, ans=0.125 2024-09-18 19:19:27,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=506220.0, ans=0.0 2024-09-18 19:19:35,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=506260.0, ans=0.0 2024-09-18 19:19:43,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.27 vs. limit=15.0 2024-09-18 19:19:45,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.74 vs. 
limit=12.0 2024-09-18 19:19:47,497 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:19:48,761 INFO [train.py:1198] (0/2) Epoch 28, batch 4400, loss[loss=0.258, ctc_loss=0.1499, cr_loss=0.4174, attn_decoder_loss=0.2607, over 27663.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1279, cr_loss=0.3741, attn_decoder_loss=0.25, over 5767731.37 frames. ], batch size: 125, lr: 3.92e-03, grad_scale: 16.0 2024-09-18 19:20:09,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.84 vs. limit=22.5 2024-09-18 19:20:20,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-09-18 19:20:28,784 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.311e+01 8.874e+01 9.241e+01 9.772e+01 1.532e+02, threshold=1.848e+02, percent-clipped=0.0 2024-09-18 19:20:30,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=506380.0, ans=0.2 2024-09-18 19:20:50,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=506460.0, ans=0.0 2024-09-18 19:20:51,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=506460.0, ans=0.125 2024-09-18 19:20:53,459 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:21:03,587 INFO [train.py:1198] (0/2) Epoch 28, batch 4450, loss[loss=0.2586, ctc_loss=0.1486, cr_loss=0.3782, attn_decoder_loss=0.2624, over 20815.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1318, cr_loss=0.3791, attn_decoder_loss=0.2523, over 5582972.97 frames. 
], batch size: 209, lr: 3.92e-03, grad_scale: 8.0 2024-09-18 19:21:11,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=506500.0, ans=0.0 2024-09-18 19:21:17,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=506540.0, ans=0.07 2024-09-18 19:21:26,531 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:21:44,838 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.12 vs. limit=15.0 2024-09-18 19:21:59,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=506620.0, ans=0.125 2024-09-18 19:22:12,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.76 vs. limit=10.0 2024-09-18 19:22:18,779 INFO [train.py:1198] (0/2) Epoch 28, batch 4500, loss[loss=0.2588, ctc_loss=0.1528, cr_loss=0.3801, attn_decoder_loss=0.2622, over 19913.00 frames. ], tot_loss[loss=0.2502, ctc_loss=0.1358, cr_loss=0.382, attn_decoder_loss=0.2544, over 5239747.58 frames. 
], batch size: 209, lr: 3.92e-03, grad_scale: 8.0 2024-09-18 19:22:19,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=506700.0, ans=0.035 2024-09-18 19:22:29,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=506700.0, ans=0.025 2024-09-18 19:22:38,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=506740.0, ans=0.0 2024-09-18 19:22:47,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506780.0, ans=0.1 2024-09-18 19:22:55,543 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-28.pt 2024-09-18 19:23:47,611 INFO [train.py:1198] (0/2) Epoch 29, batch 0, loss[loss=0.2176, ctc_loss=0.1034, cr_loss=0.3261, attn_decoder_loss=0.2231, over 29605.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1034, cr_loss=0.3261, attn_decoder_loss=0.2231, over 29605.00 frames. ], batch size: 73, lr: 3.85e-03, grad_scale: 16.0 2024-09-18 19:23:47,612 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 19:23:50,221 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2259, 3.5959, 3.4950, 3.7498, 3.6269, 3.7059, 2.9567, 3.8993], device='cuda:0') 2024-09-18 19:23:53,418 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.4153, 4.8573, 5.4239, 5.1193], device='cuda:0') 2024-09-18 19:24:06,126 INFO [train.py:1230] (0/2) Epoch 29, validation: loss=0.2126, ctc_loss=0.03746, cr_loss=5.58e-15, attn_decoder_loss=0.2321, over 944034.00 frames. 
2024-09-18 19:24:06,126 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 19:24:09,039 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.797e+01 1.050e+02 1.169e+02 1.299e+02 2.763e+02, threshold=2.337e+02, percent-clipped=3.0 2024-09-18 19:24:15,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=506800.0, ans=0.125 2024-09-18 19:24:47,401 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:25:03,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=506920.0, ans=0.0 2024-09-18 19:25:08,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=506960.0, ans=0.0 2024-09-18 19:25:12,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=506960.0, ans=0.125 2024-09-18 19:25:21,688 INFO [train.py:1198] (0/2) Epoch 29, batch 50, loss[loss=0.2132, ctc_loss=0.1017, cr_loss=0.3097, attn_decoder_loss=0.2187, over 29425.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1267, cr_loss=0.3706, attn_decoder_loss=0.2456, over 1266174.42 frames. ], batch size: 70, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:25:23,919 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.21 vs. 
limit=15.0 2024-09-18 19:25:31,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=507000.0, ans=0.125 2024-09-18 19:25:46,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=507040.0, ans=0.125 2024-09-18 19:26:07,548 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=22.5 2024-09-18 19:26:09,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=12.0 2024-09-18 19:26:13,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=507120.0, ans=0.2 2024-09-18 19:26:16,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5 2024-09-18 19:26:41,684 INFO [train.py:1198] (0/2) Epoch 29, batch 100, loss[loss=0.2289, ctc_loss=0.1162, cr_loss=0.3557, attn_decoder_loss=0.2335, over 29527.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.127, cr_loss=0.3722, attn_decoder_loss=0.2471, over 2251765.49 frames. 
], batch size: 76, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:26:46,194 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.735e+01 9.318e+01 1.000e+02 1.586e+02, threshold=1.864e+02, percent-clipped=0.0 2024-09-18 19:26:52,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=507200.0, ans=0.0 2024-09-18 19:27:09,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=507240.0, ans=0.125 2024-09-18 19:27:24,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-18 19:27:24,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.44 vs. limit=15.0 2024-09-18 19:27:49,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=507360.0, ans=0.2 2024-09-18 19:27:55,320 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:27:56,472 INFO [train.py:1198] (0/2) Epoch 29, batch 150, loss[loss=0.22, ctc_loss=0.1109, cr_loss=0.349, attn_decoder_loss=0.2244, over 29423.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1247, cr_loss=0.3675, attn_decoder_loss=0.2446, over 3046372.31 frames. ], batch size: 70, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:27:58,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.97 vs. 
limit=15.0 2024-09-18 19:27:59,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=507400.0, ans=0.0 2024-09-18 19:28:01,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.90 vs. limit=10.0 2024-09-18 19:28:02,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=507400.0, ans=0.125 2024-09-18 19:28:14,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=507440.0, ans=0.2 2024-09-18 19:28:37,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2024-09-18 19:29:02,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=507560.0, ans=0.125 2024-09-18 19:29:07,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2024-09-18 19:29:11,259 INFO [train.py:1198] (0/2) Epoch 29, batch 200, loss[loss=0.2569, ctc_loss=0.1385, cr_loss=0.403, attn_decoder_loss=0.2611, over 27264.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1244, cr_loss=0.3673, attn_decoder_loss=0.2444, over 3658782.65 frames. ], batch size: 124, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:29:15,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.328e+01 8.818e+01 9.310e+01 1.091e+02, threshold=1.764e+02, percent-clipped=0.0 2024-09-18 19:29:29,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.69 vs. 
limit=22.5 2024-09-18 19:29:37,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5 2024-09-18 19:29:53,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=507680.0, ans=0.125 2024-09-18 19:30:04,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=507720.0, ans=0.0 2024-09-18 19:30:12,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=507720.0, ans=0.2 2024-09-18 19:30:15,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=507760.0, ans=0.2 2024-09-18 19:30:26,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.20 vs. limit=10.0 2024-09-18 19:30:31,825 INFO [train.py:1198] (0/2) Epoch 29, batch 250, loss[loss=0.2508, ctc_loss=0.1343, cr_loss=0.3801, attn_decoder_loss=0.2553, over 29268.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1242, cr_loss=0.3674, attn_decoder_loss=0.2442, over 4140732.98 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:30:33,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=507800.0, ans=0.125 2024-09-18 19:30:33,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507800.0, ans=0.1 2024-09-18 19:30:37,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.03 vs. 
limit=15.0 2024-09-18 19:30:44,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.18 vs. limit=22.5 2024-09-18 19:30:47,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.13 vs. limit=10.0 2024-09-18 19:30:53,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=507840.0, ans=0.125 2024-09-18 19:30:57,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=507840.0, ans=0.1 2024-09-18 19:31:18,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=507920.0, ans=0.125 2024-09-18 19:31:26,619 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:31:30,099 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0 2024-09-18 19:31:47,704 INFO [train.py:1198] (0/2) Epoch 29, batch 300, loss[loss=0.2572, ctc_loss=0.1365, cr_loss=0.3873, attn_decoder_loss=0.262, over 29560.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1235, cr_loss=0.3662, attn_decoder_loss=0.2437, over 4509501.48 frames. 
], batch size: 92, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:31:50,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=508000.0, ans=0.1 2024-09-18 19:31:52,192 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.405e+01 8.844e+01 9.472e+01 2.622e+02, threshold=1.769e+02, percent-clipped=1.0 2024-09-18 19:31:57,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=508000.0, ans=0.125 2024-09-18 19:32:33,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=508120.0, ans=0.125 2024-09-18 19:32:39,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=508120.0, ans=0.125 2024-09-18 19:33:03,340 INFO [train.py:1198] (0/2) Epoch 29, batch 350, loss[loss=0.2187, ctc_loss=0.1077, cr_loss=0.3303, attn_decoder_loss=0.2237, over 29326.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1239, cr_loss=0.3673, attn_decoder_loss=0.2442, over 4794953.18 frames. 
], batch size: 71, lr: 3.85e-03, grad_scale: 8.0 2024-09-18 19:33:25,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=508240.0, ans=0.125 2024-09-18 19:33:36,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=508280.0, ans=15.0 2024-09-18 19:33:46,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=508280.0, ans=0.07 2024-09-18 19:34:10,078 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:34:17,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=508360.0, ans=0.2 2024-09-18 19:34:23,108 INFO [train.py:1198] (0/2) Epoch 29, batch 400, loss[loss=0.2399, ctc_loss=0.1231, cr_loss=0.3818, attn_decoder_loss=0.2444, over 29692.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1234, cr_loss=0.3669, attn_decoder_loss=0.2441, over 5024398.64 frames. 
], batch size: 82, lr: 3.85e-03, grad_scale: 16.0 2024-09-18 19:34:27,752 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 8.478e+01 8.916e+01 9.451e+01 2.866e+02, threshold=1.783e+02, percent-clipped=2.0 2024-09-18 19:34:31,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=508400.0, ans=0.1 2024-09-18 19:35:04,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=508480.0, ans=10.0 2024-09-18 19:35:13,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=508520.0, ans=0.125 2024-09-18 19:35:21,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=508520.0, ans=0.0 2024-09-18 19:35:21,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.64 vs. limit=15.0 2024-09-18 19:35:30,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=508560.0, ans=0.5 2024-09-18 19:35:39,004 INFO [train.py:1198] (0/2) Epoch 29, batch 450, loss[loss=0.2427, ctc_loss=0.134, cr_loss=0.381, attn_decoder_loss=0.2463, over 29707.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1238, cr_loss=0.3674, attn_decoder_loss=0.2442, over 5187217.84 frames. ], batch size: 83, lr: 3.85e-03, grad_scale: 16.0 2024-09-18 19:35:48,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=508600.0, ans=0.0 2024-09-18 19:36:01,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.59 vs. 
limit=15.0 2024-09-18 19:36:12,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=508680.0, ans=0.2 2024-09-18 19:36:35,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.93 vs. limit=15.0 2024-09-18 19:36:42,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=508760.0, ans=0.0 2024-09-18 19:36:46,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=508760.0, ans=0.0 2024-09-18 19:36:55,947 INFO [train.py:1198] (0/2) Epoch 29, batch 500, loss[loss=0.2564, ctc_loss=0.1331, cr_loss=0.3897, attn_decoder_loss=0.2614, over 29470.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1231, cr_loss=0.3659, attn_decoder_loss=0.2435, over 5330764.93 frames. ], batch size: 94, lr: 3.84e-03, grad_scale: 16.0 2024-09-18 19:37:00,505 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.214e+01 8.526e+01 8.926e+01 9.589e+01 3.622e+02, threshold=1.785e+02, percent-clipped=3.0 2024-09-18 19:37:15,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=508840.0, ans=0.125 2024-09-18 19:37:34,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=508880.0, ans=0.125 2024-09-18 19:38:14,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=509000.0, ans=0.0 2024-09-18 19:38:15,951 INFO [train.py:1198] (0/2) Epoch 29, batch 550, loss[loss=0.2451, ctc_loss=0.1235, cr_loss=0.3577, attn_decoder_loss=0.2507, over 28788.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1231, cr_loss=0.3651, attn_decoder_loss=0.2435, over 5423925.04 frames. 
], batch size: 104, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:38:25,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=509000.0, ans=0.125 2024-09-18 19:38:37,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=509040.0, ans=0.0 2024-09-18 19:38:56,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=509080.0, ans=0.0 2024-09-18 19:38:59,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=509120.0, ans=0.125 2024-09-18 19:39:31,419 INFO [train.py:1198] (0/2) Epoch 29, batch 600, loss[loss=0.253, ctc_loss=0.1328, cr_loss=0.3841, attn_decoder_loss=0.2578, over 29176.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.123, cr_loss=0.3654, attn_decoder_loss=0.2435, over 5508904.25 frames. ], batch size: 100, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:39:37,603 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.468e+01 8.932e+01 9.529e+01 2.879e+02, threshold=1.786e+02, percent-clipped=3.0 2024-09-18 19:39:47,487 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.96 vs. 
limit=15.0 2024-09-18 19:39:55,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=509240.0, ans=0.125 2024-09-18 19:40:18,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=509320.0, ans=0.2 2024-09-18 19:40:25,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=509320.0, ans=0.07 2024-09-18 19:40:28,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=509320.0, ans=0.2 2024-09-18 19:40:36,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=509360.0, ans=0.125 2024-09-18 19:40:46,516 INFO [train.py:1198] (0/2) Epoch 29, batch 650, loss[loss=0.2401, ctc_loss=0.1189, cr_loss=0.3609, attn_decoder_loss=0.2456, over 29747.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1223, cr_loss=0.3641, attn_decoder_loss=0.243, over 5586583.62 frames. 
], batch size: 81, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:40:54,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=509400.0, ans=0.125 2024-09-18 19:41:20,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=509480.0, ans=0.125 2024-09-18 19:41:23,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=509480.0, ans=0.125 2024-09-18 19:41:40,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=509520.0, ans=10.0 2024-09-18 19:41:41,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=509520.0, ans=0.025 2024-09-18 19:41:44,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=509520.0, ans=0.1 2024-09-18 19:41:48,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=509560.0, ans=0.125 2024-09-18 19:41:50,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=509560.0, ans=0.125 2024-09-18 19:41:55,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=509560.0, ans=0.125 2024-09-18 19:42:06,945 INFO [train.py:1198] (0/2) Epoch 29, batch 700, loss[loss=0.2286, ctc_loss=0.1199, cr_loss=0.3862, attn_decoder_loss=0.2321, over 29546.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1226, cr_loss=0.3648, attn_decoder_loss=0.2434, over 5637917.47 frames. 
], batch size: 76, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:42:07,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=509600.0, ans=0.0 2024-09-18 19:42:12,943 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.515e+01 8.488e+01 8.956e+01 9.496e+01 1.572e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-18 19:42:16,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=509600.0, ans=0.0 2024-09-18 19:42:22,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=509640.0, ans=0.0 2024-09-18 19:42:27,011 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:42:30,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=509640.0, ans=0.125 2024-09-18 19:42:36,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=509680.0, ans=0.0 2024-09-18 19:42:43,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=509680.0, ans=0.125 2024-09-18 19:42:51,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=509720.0, ans=0.0 2024-09-18 19:42:51,776 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=15.0 2024-09-18 19:42:53,435 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.42 vs. 
limit=15.0 2024-09-18 19:42:59,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=509720.0, ans=0.2 2024-09-18 19:43:23,118 INFO [train.py:1198] (0/2) Epoch 29, batch 750, loss[loss=0.2398, ctc_loss=0.1217, cr_loss=0.3667, attn_decoder_loss=0.2448, over 29703.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1221, cr_loss=0.3642, attn_decoder_loss=0.2429, over 5676840.54 frames. ], batch size: 82, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:43:40,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2024-09-18 19:43:45,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=509840.0, ans=0.0 2024-09-18 19:44:11,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=509920.0, ans=0.0 2024-09-18 19:44:11,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=509920.0, ans=0.125 2024-09-18 19:44:38,393 INFO [train.py:1198] (0/2) Epoch 29, batch 800, loss[loss=0.2372, ctc_loss=0.1255, cr_loss=0.3862, attn_decoder_loss=0.241, over 29631.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1222, cr_loss=0.3645, attn_decoder_loss=0.2428, over 5708809.82 frames. 
], batch size: 73, lr: 3.84e-03, grad_scale: 16.0 2024-09-18 19:44:44,456 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.377e+01 8.861e+01 9.386e+01 4.532e+02, threshold=1.772e+02, percent-clipped=1.0 2024-09-18 19:44:44,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=510000.0, ans=0.125 2024-09-18 19:44:47,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=510000.0, ans=0.2 2024-09-18 19:44:50,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=510000.0, ans=0.125 2024-09-18 19:45:30,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=510120.0, ans=0.125 2024-09-18 19:45:37,230 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=12.0 2024-09-18 19:45:55,658 INFO [train.py:1198] (0/2) Epoch 29, batch 850, loss[loss=0.2504, ctc_loss=0.1304, cr_loss=0.3692, attn_decoder_loss=0.2556, over 29706.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1219, cr_loss=0.3634, attn_decoder_loss=0.2426, over 5736928.35 frames. 
], batch size: 89, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:46:18,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=510240.0, ans=0.125 2024-09-18 19:46:21,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=510240.0, ans=0.0 2024-09-18 19:46:41,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=510320.0, ans=0.125 2024-09-18 19:46:54,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=510320.0, ans=0.0 2024-09-18 19:46:58,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=510360.0, ans=0.125 2024-09-18 19:47:06,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=15.0 2024-09-18 19:47:07,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=510360.0, ans=0.2 2024-09-18 19:47:13,955 INFO [train.py:1198] (0/2) Epoch 29, batch 900, loss[loss=0.2249, ctc_loss=0.1218, cr_loss=0.3693, attn_decoder_loss=0.2281, over 29601.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1224, cr_loss=0.3644, attn_decoder_loss=0.2431, over 5740888.19 frames. 
], batch size: 73, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:47:15,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510400.0, ans=0.1 2024-09-18 19:47:21,302 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.540e+01 9.030e+01 9.336e+01 1.932e+02, threshold=1.806e+02, percent-clipped=1.0 2024-09-18 19:47:22,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-09-18 19:47:53,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.49 vs. limit=5.0 2024-09-18 19:47:55,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=510480.0, ans=0.1 2024-09-18 19:48:05,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=510520.0, ans=0.0 2024-09-18 19:48:10,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=510520.0, ans=0.125 2024-09-18 19:48:29,425 INFO [train.py:1198] (0/2) Epoch 29, batch 950, loss[loss=0.2207, ctc_loss=0.1067, cr_loss=0.347, attn_decoder_loss=0.2257, over 29519.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1223, cr_loss=0.3643, attn_decoder_loss=0.2431, over 5742586.58 frames. 
], batch size: 74, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:48:44,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=510640.0, ans=0.2 2024-09-18 19:49:15,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=510720.0, ans=0.125 2024-09-18 19:49:18,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=510720.0, ans=0.125 2024-09-18 19:49:19,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2024-09-18 19:49:30,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=510720.0, ans=15.0 2024-09-18 19:49:46,882 INFO [train.py:1198] (0/2) Epoch 29, batch 1000, loss[loss=0.2411, ctc_loss=0.1288, cr_loss=0.3697, attn_decoder_loss=0.2454, over 29522.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1235, cr_loss=0.3665, attn_decoder_loss=0.2442, over 5736686.37 frames. ], batch size: 77, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:49:56,628 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.627e+01 9.386e+01 1.009e+02 2.634e+02, threshold=1.877e+02, percent-clipped=2.0 2024-09-18 19:50:21,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. 
limit=15.0 2024-09-18 19:50:33,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=510920.0, ans=0.1 2024-09-18 19:50:45,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=510920.0, ans=0.035 2024-09-18 19:50:51,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=510960.0, ans=0.0 2024-09-18 19:51:04,703 INFO [train.py:1198] (0/2) Epoch 29, batch 1050, loss[loss=0.2552, ctc_loss=0.1383, cr_loss=0.3859, attn_decoder_loss=0.2597, over 29677.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1234, cr_loss=0.3663, attn_decoder_loss=0.2437, over 5744088.07 frames. ], batch size: 85, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:51:29,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=511040.0, ans=0.025 2024-09-18 19:51:30,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2024-09-18 19:51:42,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=12.0 2024-09-18 19:52:13,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=511160.0, ans=0.125 2024-09-18 19:52:21,211 INFO [train.py:1198] (0/2) Epoch 29, batch 1100, loss[loss=0.2353, ctc_loss=0.1222, cr_loss=0.3697, attn_decoder_loss=0.2396, over 29446.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1232, cr_loss=0.3661, attn_decoder_loss=0.2433, over 5756633.96 frames. 
], batch size: 78, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:52:28,710 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 8.572e+01 8.922e+01 9.420e+01 4.206e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-18 19:52:38,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=511240.0, ans=0.0 2024-09-18 19:52:41,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=511240.0, ans=0.1 2024-09-18 19:52:57,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=15.0 2024-09-18 19:52:59,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.14 vs. limit=15.0 2024-09-18 19:53:17,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=511320.0, ans=0.0 2024-09-18 19:53:28,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=511360.0, ans=0.0 2024-09-18 19:53:38,729 INFO [train.py:1198] (0/2) Epoch 29, batch 1150, loss[loss=0.2392, ctc_loss=0.1231, cr_loss=0.3727, attn_decoder_loss=0.2438, over 29474.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1231, cr_loss=0.3656, attn_decoder_loss=0.2433, over 5754156.62 frames. ], batch size: 78, lr: 3.84e-03, grad_scale: 8.0 2024-09-18 19:53:47,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.74 vs. 
limit=15.0
2024-09-18 19:53:53,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=511400.0, ans=0.2
2024-09-18 19:53:54,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=15.0
2024-09-18 19:54:04,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=511440.0, ans=0.1
2024-09-18 19:54:05,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=511440.0, ans=0.125
2024-09-18 19:54:08,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0
2024-09-18 19:54:11,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=511480.0, ans=0.125
2024-09-18 19:54:11,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=511480.0, ans=0.0
2024-09-18 19:54:42,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=511560.0, ans=0.2
2024-09-18 19:54:45,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0
2024-09-18 19:54:54,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=511560.0, ans=0.07
2024-09-18 19:54:56,981 INFO [train.py:1198] (0/2) Epoch 29, batch 1200, loss[loss=0.2536, ctc_loss=0.135, cr_loss=0.4027, attn_decoder_loss=0.2578, over 29678.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1237, cr_loss=0.3662, attn_decoder_loss=0.2442, over 5746772.75 frames. ], batch size: 85, lr: 3.83e-03, grad_scale: 16.0
2024-09-18 19:54:58,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=511600.0, ans=0.125
2024-09-18 19:55:04,484 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.543e+01 9.016e+01 9.683e+01 2.653e+02, threshold=1.803e+02, percent-clipped=3.0
2024-09-18 19:55:36,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.69 vs. limit=15.0
2024-09-18 19:55:36,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=511680.0, ans=0.125
2024-09-18 19:55:41,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=511720.0, ans=0.125
2024-09-18 19:55:47,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=511720.0, ans=0.125
2024-09-18 19:55:51,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=511720.0, ans=0.2
2024-09-18 19:56:09,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=511760.0, ans=0.125
2024-09-18 19:56:12,586 INFO [train.py:1198] (0/2) Epoch 29, batch 1250, loss[loss=0.2493, ctc_loss=0.1279, cr_loss=0.3834, attn_decoder_loss=0.2543, over 29486.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1238, cr_loss=0.367, attn_decoder_loss=0.2448, over 5774760.91 frames. ], batch size: 92, lr: 3.83e-03, grad_scale: 8.0
2024-09-18 19:56:14,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=511800.0, ans=0.1
2024-09-18 19:56:22,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=511800.0, ans=0.025
2024-09-18 19:56:35,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=511840.0, ans=0.0
2024-09-18 19:57:05,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=511920.0, ans=0.125
2024-09-18 19:57:09,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=511920.0, ans=0.0
2024-09-18 19:57:29,765 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-128000.pt
2024-09-18 19:57:38,311 INFO [train.py:1198] (0/2) Epoch 29, batch 1300, loss[loss=0.2469, ctc_loss=0.1162, cr_loss=0.3499, attn_decoder_loss=0.2536, over 28228.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1237, cr_loss=0.3671, attn_decoder_loss=0.2443, over 5779861.04 frames. ], batch size: 111, lr: 3.83e-03, grad_scale: 8.0
2024-09-18 19:57:47,484 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 8.526e+01 8.940e+01 9.401e+01 4.173e+02, threshold=1.788e+02, percent-clipped=2.0
2024-09-18 19:58:17,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=512080.0, ans=0.2
2024-09-18 19:58:19,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=512080.0, ans=0.125
2024-09-18 19:58:23,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=512080.0, ans=0.0
2024-09-18 19:58:29,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=512120.0, ans=0.0
2024-09-18 19:58:38,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=512120.0, ans=0.1
2024-09-18 19:58:52,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=12.0
2024-09-18 19:58:56,558 INFO [train.py:1198] (0/2) Epoch 29, batch 1350, loss[loss=0.2392, ctc_loss=0.1222, cr_loss=0.3634, attn_decoder_loss=0.2441, over 29760.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1232, cr_loss=0.3665, attn_decoder_loss=0.244, over 5796580.64 frames. ], batch size: 81, lr: 3.83e-03, grad_scale: 8.0
2024-09-18 19:59:02,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=512200.0, ans=0.2
2024-09-18 19:59:19,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=512240.0, ans=0.125
2024-09-18 19:59:19,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=512240.0, ans=0.125
2024-09-18 19:59:23,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=512240.0, ans=0.125
2024-09-18 19:59:29,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=512280.0, ans=0.1
2024-09-18 19:59:31,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=10.33 vs. limit=15.0
2024-09-18 19:59:32,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=512280.0, ans=0.025
2024-09-18 19:59:34,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=512280.0, ans=0.0
2024-09-18 19:59:56,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512360.0, ans=0.1
2024-09-18 20:00:11,672 INFO [train.py:1198] (0/2) Epoch 29, batch 1400, loss[loss=0.2157, ctc_loss=0.1152, cr_loss=0.3427, attn_decoder_loss=0.2193, over 29617.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1229, cr_loss=0.3656, attn_decoder_loss=0.2439, over 5807736.05 frames. ], batch size: 69, lr: 3.83e-03, grad_scale: 8.0
2024-09-18 20:00:19,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=512400.0, ans=0.0
2024-09-18 20:00:20,755 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.361e+01 8.836e+01 9.387e+01 1.190e+02, threshold=1.767e+02, percent-clipped=0.0
2024-09-18 20:00:27,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0
2024-09-18 20:00:31,793 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:00:32,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.42 vs. limit=10.0
2024-09-18 20:00:39,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=512440.0, ans=0.125
2024-09-18 20:01:13,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0
2024-09-18 20:01:26,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=512560.0, ans=0.0
2024-09-18 20:01:27,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=512600.0, ans=0.1
2024-09-18 20:01:29,200 INFO [train.py:1198] (0/2) Epoch 29, batch 1450, loss[loss=0.2419, ctc_loss=0.1209, cr_loss=0.3728, attn_decoder_loss=0.2471, over 29402.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.123, cr_loss=0.3655, attn_decoder_loss=0.2442, over 5804756.77 frames. ], batch size: 94, lr: 3.83e-03, grad_scale: 8.0
2024-09-18 20:01:32,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=512600.0, ans=0.0
2024-09-18 20:01:37,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=512600.0, ans=0.025
2024-09-18 20:01:40,990 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0
2024-09-18 20:02:06,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.79 vs. limit=15.0
2024-09-18 20:02:12,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=512680.0, ans=0.0
2024-09-18 20:02:39,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=512760.0, ans=0.2
2024-09-18 20:02:47,663 INFO [train.py:1198] (0/2) Epoch 29, batch 1500, loss[loss=0.2471, ctc_loss=0.1326, cr_loss=0.4037, attn_decoder_loss=0.2508, over 29635.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1236, cr_loss=0.3667, attn_decoder_loss=0.2447, over 5804937.60 frames. ], batch size: 86, lr: 3.83e-03, grad_scale: 8.0
2024-09-18 20:02:58,365 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.696e+01 9.136e+01 9.651e+01 1.564e+02, threshold=1.827e+02, percent-clipped=0.0
2024-09-18 20:02:58,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=512800.0, ans=0.025
2024-09-18 20:02:59,384 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=12.0
2024-09-18 20:03:02,097 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0
2024-09-18 20:03:21,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=512880.0, ans=0.0
2024-09-18 20:03:21,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=512880.0, ans=0.95
2024-09-18 20:03:32,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=512920.0, ans=0.125
2024-09-18 20:03:34,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=512920.0, ans=0.125
2024-09-18 20:03:35,954 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.57 vs. limit=15.0
2024-09-18 20:03:41,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=512920.0, ans=0.1
2024-09-18 20:04:03,793 INFO [train.py:1198] (0/2) Epoch 29, batch 1550, loss[loss=0.254, ctc_loss=0.1388, cr_loss=0.4093, attn_decoder_loss=0.2577, over 29517.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.124, cr_loss=0.3672, attn_decoder_loss=0.2448, over 5780508.16 frames. ], batch size: 90, lr: 3.83e-03, grad_scale: 8.0
2024-09-18 20:04:14,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=513000.0, ans=0.0
2024-09-18 20:04:14,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=513000.0, ans=0.2
2024-09-18 20:04:39,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=513080.0, ans=0.125
2024-09-18 20:05:18,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=513160.0, ans=0.0
2024-09-18 20:05:21,279 INFO [train.py:1198] (0/2) Epoch 29, batch 1600, loss[loss=0.2425, ctc_loss=0.1187, cr_loss=0.3567, attn_decoder_loss=0.2483, over 29688.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.124, cr_loss=0.3668, attn_decoder_loss=0.2446, over 5763845.48 frames. ], batch size: 85, lr: 3.83e-03, grad_scale: 16.0
2024-09-18 20:05:28,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0
2024-09-18 20:05:31,643 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.586e+01 9.089e+01 9.783e+01 2.042e+02, threshold=1.818e+02, percent-clipped=1.0
2024-09-18 20:05:37,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=513240.0, ans=0.025
2024-09-18 20:05:38,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=513240.0, ans=0.0
2024-09-18 20:05:48,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.44 vs. limit=15.0
2024-09-18 20:05:57,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513280.0, ans=0.1
2024-09-18 20:06:04,958 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0
2024-09-18 20:06:19,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=513320.0, ans=0.125
2024-09-18 20:06:34,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.79 vs. limit=10.0
2024-09-18 20:06:39,180 INFO [train.py:1198] (0/2) Epoch 29, batch 1650, loss[loss=0.254, ctc_loss=0.1346, cr_loss=0.4047, attn_decoder_loss=0.2583, over 29698.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1237, cr_loss=0.3665, attn_decoder_loss=0.2441, over 5758142.04 frames. ], batch size: 89, lr: 3.83e-03, grad_scale: 8.0
2024-09-18 20:06:56,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.11 vs. limit=15.0
2024-09-18 20:07:08,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.03 vs. limit=22.5
2024-09-18 20:07:11,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=513480.0, ans=0.125
2024-09-18 20:07:25,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=513520.0, ans=0.125
2024-09-18 20:07:33,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=513520.0, ans=10.0
2024-09-18 20:07:39,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=513560.0, ans=0.05
2024-09-18 20:07:41,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=513560.0, ans=0.125
2024-09-18 20:07:43,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.96 vs. limit=10.0
2024-09-18 20:07:53,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=513600.0, ans=0.125
2024-09-18 20:07:55,034 INFO [train.py:1198] (0/2) Epoch 29, batch 1700, loss[loss=0.215, ctc_loss=0.1026, cr_loss=0.3152, attn_decoder_loss=0.2205, over 29596.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.123, cr_loss=0.3651, attn_decoder_loss=0.2437, over 5779929.53 frames. ], batch size: 69, lr: 3.83e-03, grad_scale: 8.0
2024-09-18 20:08:07,209 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.371e+01 8.901e+01 9.499e+01 1.304e+02, threshold=1.780e+02, percent-clipped=0.0
2024-09-18 20:08:09,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0
2024-09-18 20:08:13,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=513640.0, ans=0.0
2024-09-18 20:08:32,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=513680.0, ans=0.125
2024-09-18 20:08:41,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=513720.0, ans=0.125
2024-09-18 20:08:47,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=513720.0, ans=0.125
2024-09-18 20:09:00,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=513760.0, ans=0.2
2024-09-18 20:09:02,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=513760.0, ans=0.0
2024-09-18 20:09:12,834 INFO [train.py:1198] (0/2) Epoch 29, batch 1750, loss[loss=0.2158, ctc_loss=0.1045, cr_loss=0.3304, attn_decoder_loss=0.2208, over 29348.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1229, cr_loss=0.3652, attn_decoder_loss=0.2433, over 5788115.85 frames. ], batch size: 67, lr: 3.83e-03, grad_scale: 8.0
2024-09-18 20:09:19,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=513800.0, ans=0.0
2024-09-18 20:09:23,762 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:09:35,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=513840.0, ans=0.0
2024-09-18 20:10:06,944 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.96 vs. limit=10.0
2024-09-18 20:10:21,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513960.0, ans=0.1
2024-09-18 20:10:30,207 INFO [train.py:1198] (0/2) Epoch 29, batch 1800, loss[loss=0.2548, ctc_loss=0.1402, cr_loss=0.396, attn_decoder_loss=0.2588, over 29692.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1231, cr_loss=0.366, attn_decoder_loss=0.2436, over 5791570.14 frames. ], batch size: 83, lr: 3.83e-03, grad_scale: 8.0
2024-09-18 20:10:38,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0
2024-09-18 20:10:42,240 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.472e+01 8.834e+01 9.561e+01 3.303e+02, threshold=1.767e+02, percent-clipped=1.0
2024-09-18 20:11:08,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=514080.0, ans=0.1
2024-09-18 20:11:15,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=514120.0, ans=0.0
2024-09-18 20:11:31,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0
2024-09-18 20:11:46,039 INFO [train.py:1198] (0/2) Epoch 29, batch 1850, loss[loss=0.2428, ctc_loss=0.1185, cr_loss=0.3443, attn_decoder_loss=0.2489, over 29632.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1231, cr_loss=0.3659, attn_decoder_loss=0.2435, over 5797679.78 frames. ], batch size: 86, lr: 3.82e-03, grad_scale: 8.0
2024-09-18 20:11:50,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=514200.0, ans=0.125
2024-09-18 20:11:50,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=514200.0, ans=0.125
2024-09-18 20:11:58,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=514200.0, ans=0.125
2024-09-18 20:12:05,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=514240.0, ans=0.1
2024-09-18 20:12:07,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=514240.0, ans=0.125
2024-09-18 20:12:18,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=514280.0, ans=0.0
2024-09-18 20:12:32,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=514320.0, ans=0.125
2024-09-18 20:12:42,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=514320.0, ans=0.1
2024-09-18 20:12:42,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=514320.0, ans=0.125
2024-09-18 20:13:00,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=514360.0, ans=0.0
2024-09-18 20:13:01,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.73 vs. limit=15.0
2024-09-18 20:13:03,716 INFO [train.py:1198] (0/2) Epoch 29, batch 1900, loss[loss=0.2559, ctc_loss=0.1263, cr_loss=0.3784, attn_decoder_loss=0.2619, over 29696.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.123, cr_loss=0.3658, attn_decoder_loss=0.244, over 5805094.32 frames. ], batch size: 89, lr: 3.82e-03, grad_scale: 8.0
2024-09-18 20:13:10,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=514400.0, ans=0.1
2024-09-18 20:13:15,863 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.747e+01 8.630e+01 9.084e+01 9.711e+01 2.750e+02, threshold=1.817e+02, percent-clipped=3.0
2024-09-18 20:13:17,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=514440.0, ans=0.0
2024-09-18 20:13:20,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=514440.0, ans=0.0
2024-09-18 20:13:28,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=514440.0, ans=0.125
2024-09-18 20:14:17,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=514560.0, ans=0.2
2024-09-18 20:14:20,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=514600.0, ans=0.125
2024-09-18 20:14:20,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=514600.0, ans=0.125
2024-09-18 20:14:22,102 INFO [train.py:1198] (0/2) Epoch 29, batch 1950, loss[loss=0.2383, ctc_loss=0.128, cr_loss=0.3786, attn_decoder_loss=0.2422, over 29442.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1235, cr_loss=0.3672, attn_decoder_loss=0.2449, over 5819456.59 frames. ], batch size: 78, lr: 3.82e-03, grad_scale: 8.0
2024-09-18 20:14:24,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=514600.0, ans=0.0
2024-09-18 20:14:39,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=514640.0, ans=0.07
2024-09-18 20:14:43,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=514640.0, ans=0.0
2024-09-18 20:15:04,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=514680.0, ans=0.125
2024-09-18 20:15:37,448 INFO [train.py:1198] (0/2) Epoch 29, batch 2000, loss[loss=0.2062, ctc_loss=0.09518, cr_loss=0.3054, attn_decoder_loss=0.2117, over 29349.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1237, cr_loss=0.3673, attn_decoder_loss=0.2452, over 5795977.24 frames. ], batch size: 67, lr: 3.82e-03, grad_scale: 16.0
2024-09-18 20:15:45,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=514800.0, ans=0.0
2024-09-18 20:15:49,642 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.639e+01 9.197e+01 9.637e+01 2.415e+02, threshold=1.839e+02, percent-clipped=1.0
2024-09-18 20:16:14,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=514880.0, ans=0.125
2024-09-18 20:16:14,789 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:16:25,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=514920.0, ans=0.025
2024-09-18 20:16:28,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=514920.0, ans=0.125
2024-09-18 20:16:55,177 INFO [train.py:1198] (0/2) Epoch 29, batch 2050, loss[loss=0.2193, ctc_loss=0.1017, cr_loss=0.3221, attn_decoder_loss=0.2252, over 29452.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1232, cr_loss=0.3662, attn_decoder_loss=0.2443, over 5788378.58 frames. ], batch size: 70, lr: 3.82e-03, grad_scale: 16.0
2024-09-18 20:17:06,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=515000.0, ans=0.125
2024-09-18 20:17:07,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=515000.0, ans=0.125
2024-09-18 20:17:59,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=515160.0, ans=0.125
2024-09-18 20:18:01,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=515160.0, ans=0.05
2024-09-18 20:18:08,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=515160.0, ans=0.0
2024-09-18 20:18:12,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515200.0, ans=0.1
2024-09-18 20:18:13,444 INFO [train.py:1198] (0/2) Epoch 29, batch 2100, loss[loss=0.2402, ctc_loss=0.1256, cr_loss=0.3615, attn_decoder_loss=0.2449, over 29755.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1232, cr_loss=0.3661, attn_decoder_loss=0.2441, over 5800610.41 frames. ], batch size: 81, lr: 3.82e-03, grad_scale: 16.0
2024-09-18 20:18:15,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=515200.0, ans=0.05
2024-09-18 20:18:19,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=515200.0, ans=0.2
2024-09-18 20:18:25,549 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 8.420e+01 8.993e+01 9.361e+01 1.152e+02, threshold=1.799e+02, percent-clipped=0.0
2024-09-18 20:18:28,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=515240.0, ans=0.125
2024-09-18 20:18:57,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=515320.0, ans=0.125
2024-09-18 20:19:09,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.61 vs. limit=10.0
2024-09-18 20:19:18,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=515360.0, ans=0.125
2024-09-18 20:19:27,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.99 vs. limit=12.0
2024-09-18 20:19:28,669 INFO [train.py:1198] (0/2) Epoch 29, batch 2150, loss[loss=0.234, ctc_loss=0.1242, cr_loss=0.3732, attn_decoder_loss=0.2379, over 29446.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1228, cr_loss=0.3657, attn_decoder_loss=0.2437, over 5815449.35 frames. ], batch size: 78, lr: 3.82e-03, grad_scale: 16.0
2024-09-18 20:19:47,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.12 vs. limit=12.0
2024-09-18 20:19:54,119 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. limit=6.0
2024-09-18 20:19:58,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=515440.0, ans=0.0
2024-09-18 20:20:06,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=515480.0, ans=0.125
2024-09-18 20:20:09,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=515480.0, ans=0.125
2024-09-18 20:20:12,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=515480.0, ans=0.125
2024-09-18 20:20:19,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=515520.0, ans=0.125
2024-09-18 20:20:30,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515560.0, ans=0.1
2024-09-18 20:20:46,531 INFO [train.py:1198] (0/2) Epoch 29, batch 2200, loss[loss=0.2553, ctc_loss=0.1362, cr_loss=0.3962, attn_decoder_loss=0.2597, over 29597.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1232, cr_loss=0.3663, attn_decoder_loss=0.2438, over 5811764.77 frames. ], batch size: 86, lr: 3.82e-03, grad_scale: 16.0
2024-09-18 20:20:48,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=515600.0, ans=0.1
2024-09-18 20:20:51,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=515600.0, ans=0.025
2024-09-18 20:20:58,449 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.349e+01 8.970e+01 9.403e+01 1.511e+02, threshold=1.794e+02, percent-clipped=0.0
2024-09-18 20:21:01,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=515640.0, ans=0.125
2024-09-18 20:21:01,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=515640.0, ans=0.125
2024-09-18 20:21:13,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=515640.0, ans=0.125
2024-09-18 20:21:21,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=515680.0, ans=0.0
2024-09-18 20:21:52,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=515760.0, ans=0.125
2024-09-18 20:22:04,185 INFO [train.py:1198] (0/2) Epoch 29, batch 2250, loss[loss=0.2431, ctc_loss=0.1181, cr_loss=0.3532, attn_decoder_loss=0.2492, over 29713.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1224, cr_loss=0.3649, attn_decoder_loss=0.2433, over 5812020.93 frames. ], batch size: 82, lr: 3.82e-03, grad_scale: 16.0
2024-09-18 20:22:25,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=515840.0, ans=0.125
2024-09-18 20:22:35,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.09 vs. limit=10.0
2024-09-18 20:22:55,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=515920.0, ans=0.125
2024-09-18 20:23:19,804 INFO [train.py:1198] (0/2) Epoch 29, batch 2300, loss[loss=0.2286, ctc_loss=0.1154, cr_loss=0.3594, attn_decoder_loss=0.2332, over 29316.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1221, cr_loss=0.3646, attn_decoder_loss=0.2427, over 5798927.51 frames. ], batch size: 71, lr: 3.82e-03, grad_scale: 16.0
2024-09-18 20:23:23,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=516000.0, ans=0.0
2024-09-18 20:23:29,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=516000.0, ans=0.1
2024-09-18 20:23:31,709 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.460e+01 8.964e+01 9.608e+01 5.700e+02, threshold=1.793e+02, percent-clipped=2.0
2024-09-18 20:23:46,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.67 vs. limit=22.5
2024-09-18 20:23:53,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=12.0
2024-09-18 20:24:02,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=10.62 vs. limit=15.0
2024-09-18 20:24:12,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.45 vs. limit=10.0
2024-09-18 20:24:22,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=516160.0, ans=0.0
2024-09-18 20:24:28,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=516160.0, ans=0.125
2024-09-18 20:24:37,598 INFO [train.py:1198] (0/2) Epoch 29, batch 2350, loss[loss=0.2406, ctc_loss=0.1195, cr_loss=0.3588, attn_decoder_loss=0.2461, over 29705.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1224, cr_loss=0.3651, attn_decoder_loss=0.243, over 5804819.02 frames. ], batch size: 83, lr: 3.82e-03, grad_scale: 8.0
2024-09-18 20:24:46,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=516200.0, ans=0.125
2024-09-18 20:25:09,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=516280.0, ans=0.0
2024-09-18 20:25:55,367 INFO [train.py:1198] (0/2) Epoch 29, batch 2400, loss[loss=0.2303, ctc_loss=0.1211, cr_loss=0.3905, attn_decoder_loss=0.2337, over 29551.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1224, cr_loss=0.3649, attn_decoder_loss=0.2436, over 5807924.62 frames. ], batch size: 76, lr: 3.82e-03, grad_scale: 16.0
2024-09-18 20:26:00,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=12.0
2024-09-18 20:26:08,939 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.500e+01 8.937e+01 9.634e+01 2.540e+02, threshold=1.787e+02, percent-clipped=1.0
2024-09-18 20:26:35,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=516480.0, ans=0.125
2024-09-18 20:26:47,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=516520.0, ans=0.1
2024-09-18 20:27:02,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.49 vs. limit=15.0
2024-09-18 20:27:06,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=516560.0, ans=0.0
2024-09-18 20:27:09,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=516600.0, ans=0.125
2024-09-18 20:27:11,186 INFO [train.py:1198] (0/2) Epoch 29, batch 2450, loss[loss=0.2391, ctc_loss=0.1268, cr_loss=0.3819, attn_decoder_loss=0.2431, over 29721.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1231, cr_loss=0.3657, attn_decoder_loss=0.2447, over 5785398.60 frames. ], batch size: 82, lr: 3.82e-03, grad_scale: 8.0
2024-09-18 20:27:11,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=516600.0, ans=0.2
2024-09-18 20:27:27,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=516640.0, ans=0.0
2024-09-18 20:27:38,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.92 vs.
limit=15.0 2024-09-18 20:27:49,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=516680.0, ans=0.0 2024-09-18 20:27:54,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=516680.0, ans=0.0 2024-09-18 20:28:10,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=516720.0, ans=0.2 2024-09-18 20:28:26,837 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.95 vs. limit=22.5 2024-09-18 20:28:29,440 INFO [train.py:1198] (0/2) Epoch 29, batch 2500, loss[loss=0.2533, ctc_loss=0.1252, cr_loss=0.3774, attn_decoder_loss=0.2592, over 29621.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1229, cr_loss=0.3651, attn_decoder_loss=0.2444, over 5795332.25 frames. ], batch size: 86, lr: 3.82e-03, grad_scale: 8.0 2024-09-18 20:28:37,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=516800.0, ans=0.125 2024-09-18 20:28:44,581 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.372e+01 8.869e+01 9.573e+01 2.936e+02, threshold=1.774e+02, percent-clipped=2.0 2024-09-18 20:28:49,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=516840.0, ans=0.1 2024-09-18 20:29:12,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=516880.0, ans=0.0 2024-09-18 20:29:25,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=516920.0, ans=0.0 2024-09-18 20:29:46,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=517000.0, ans=0.125 
2024-09-18 20:29:47,318 INFO [train.py:1198] (0/2) Epoch 29, batch 2550, loss[loss=0.2129, ctc_loss=0.1066, cr_loss=0.3275, attn_decoder_loss=0.2175, over 29364.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1229, cr_loss=0.3653, attn_decoder_loss=0.2445, over 5798752.43 frames. ], batch size: 67, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:30:05,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=517040.0, ans=0.125
2024-09-18 20:30:13,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=517040.0, ans=0.2
2024-09-18 20:30:37,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=517120.0, ans=0.125
2024-09-18 20:30:39,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=517120.0, ans=0.0
2024-09-18 20:30:46,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=517160.0, ans=0.95
2024-09-18 20:30:51,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517160.0, ans=0.1
2024-09-18 20:30:57,594 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.09 vs. limit=15.0
2024-09-18 20:31:02,910 INFO [train.py:1198] (0/2) Epoch 29, batch 2600, loss[loss=0.2283, ctc_loss=0.1148, cr_loss=0.3403, attn_decoder_loss=0.2334, over 29431.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1234, cr_loss=0.3661, attn_decoder_loss=0.245, over 5794903.68 frames. ], batch size: 78, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:31:10,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=517200.0, ans=0.2
2024-09-18 20:31:17,758 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.549e+01 8.951e+01 9.409e+01 2.372e+02, threshold=1.790e+02, percent-clipped=2.0
2024-09-18 20:31:30,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=517240.0, ans=0.125
2024-09-18 20:31:33,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=517280.0, ans=0.125
2024-09-18 20:31:47,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=517280.0, ans=0.025
2024-09-18 20:32:17,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=517360.0, ans=0.125
2024-09-18 20:32:20,505 INFO [train.py:1198] (0/2) Epoch 29, batch 2650, loss[loss=0.255, ctc_loss=0.1306, cr_loss=0.381, attn_decoder_loss=0.2604, over 29273.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1238, cr_loss=0.3675, attn_decoder_loss=0.2455, over 5800314.90 frames. ], batch size: 100, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:32:25,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=517400.0, ans=0.09899494936611666
2024-09-18 20:32:41,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5
2024-09-18 20:33:21,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=517560.0, ans=0.0
2024-09-18 20:33:33,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=517560.0, ans=0.1
2024-09-18 20:33:35,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=517560.0, ans=0.1
2024-09-18 20:33:38,528 INFO [train.py:1198] (0/2) Epoch 29, batch 2700, loss[loss=0.2417, ctc_loss=0.1243, cr_loss=0.3715, attn_decoder_loss=0.2465, over 29547.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1241, cr_loss=0.3682, attn_decoder_loss=0.2457, over 5795806.03 frames. ], batch size: 87, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:33:38,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=517600.0, ans=0.0
2024-09-18 20:33:38,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=517600.0, ans=0.0
2024-09-18 20:33:53,537 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.653e+01 9.179e+01 9.808e+01 2.021e+02, threshold=1.836e+02, percent-clipped=2.0
2024-09-18 20:34:01,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=12.0
2024-09-18 20:34:05,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=517640.0, ans=0.125
2024-09-18 20:34:27,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=517720.0, ans=0.125
2024-09-18 20:34:39,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517760.0, ans=0.1
2024-09-18 20:34:54,588 INFO [train.py:1198] (0/2) Epoch 29, batch 2750, loss[loss=0.2409, ctc_loss=0.1304, cr_loss=0.3891, attn_decoder_loss=0.2446, over 29503.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1229, cr_loss=0.3657, attn_decoder_loss=0.2443, over 5795330.67 frames. ], batch size: 75, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:35:06,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.32 vs. limit=15.0
2024-09-18 20:35:17,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0
2024-09-18 20:35:41,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.86 vs. limit=22.5
2024-09-18 20:35:49,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=517920.0, ans=0.0
2024-09-18 20:35:50,026 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.65 vs. limit=10.0
2024-09-18 20:35:58,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=517960.0, ans=0.125
2024-09-18 20:35:58,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=517960.0, ans=0.0
2024-09-18 20:36:08,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=517960.0, ans=0.125
2024-09-18 20:36:12,345 INFO [train.py:1198] (0/2) Epoch 29, batch 2800, loss[loss=0.2553, ctc_loss=0.1516, cr_loss=0.3784, attn_decoder_loss=0.2584, over 20434.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1232, cr_loss=0.3656, attn_decoder_loss=0.2445, over 5776290.60 frames. ], batch size: 209, lr: 3.81e-03, grad_scale: 16.0
2024-09-18 20:36:21,557 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:36:28,864 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.371e+01 8.942e+01 9.579e+01 2.215e+02, threshold=1.788e+02, percent-clipped=1.0
2024-09-18 20:36:39,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=518040.0, ans=0.125
2024-09-18 20:36:48,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=518080.0, ans=0.125
2024-09-18 20:37:30,079 INFO [train.py:1198] (0/2) Epoch 29, batch 2850, loss[loss=0.2311, ctc_loss=0.1183, cr_loss=0.3491, attn_decoder_loss=0.2359, over 29515.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1234, cr_loss=0.3655, attn_decoder_loss=0.2448, over 5762084.16 frames. ], batch size: 77, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:37:37,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=518200.0, ans=0.0
2024-09-18 20:37:40,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=518200.0, ans=0.1
2024-09-18 20:37:59,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=518280.0, ans=0.1
2024-09-18 20:38:22,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=518320.0, ans=0.125
2024-09-18 20:38:26,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=518320.0, ans=0.125
2024-09-18 20:38:32,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=518360.0, ans=0.0
2024-09-18 20:38:46,511 INFO [train.py:1198] (0/2) Epoch 29, batch 2900, loss[loss=0.2377, ctc_loss=0.1208, cr_loss=0.3478, attn_decoder_loss=0.243, over 29417.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.124, cr_loss=0.3672, attn_decoder_loss=0.2456, over 5786964.94 frames. ], batch size: 79, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:38:55,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=518400.0, ans=0.5
2024-09-18 20:39:05,311 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.499e+01 8.947e+01 9.458e+01 2.522e+02, threshold=1.789e+02, percent-clipped=1.0
2024-09-18 20:39:25,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=518480.0, ans=0.0
2024-09-18 20:39:26,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=518480.0, ans=0.2
2024-09-18 20:40:04,388 INFO [train.py:1198] (0/2) Epoch 29, batch 2950, loss[loss=0.233, ctc_loss=0.1263, cr_loss=0.3776, attn_decoder_loss=0.2365, over 29508.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1231, cr_loss=0.3654, attn_decoder_loss=0.2442, over 5782247.62 frames. ], batch size: 75, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:40:07,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=518600.0, ans=0.125
2024-09-18 20:40:12,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.10 vs. limit=22.5
2024-09-18 20:40:58,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=518720.0, ans=0.125
2024-09-18 20:41:01,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=518720.0, ans=0.125
2024-09-18 20:41:22,999 INFO [train.py:1198] (0/2) Epoch 29, batch 3000, loss[loss=0.2466, ctc_loss=0.1228, cr_loss=0.3508, attn_decoder_loss=0.2525, over 29755.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.123, cr_loss=0.3654, attn_decoder_loss=0.2442, over 5782942.76 frames. ], batch size: 81, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:41:22,999 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-18 20:41:37,626 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9122, 4.4108, 3.8202, 3.8674], device='cuda:0')
2024-09-18 20:41:41,474 INFO [train.py:1230] (0/2) Epoch 29, validation: loss=0.2115, ctc_loss=0.03752, cr_loss=5.604e-15, attn_decoder_loss=0.2309, over 944034.00 frames.
2024-09-18 20:41:41,474 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-18 20:41:58,314 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.694e+01 9.323e+01 9.820e+01 2.000e+02, threshold=1.865e+02, percent-clipped=1.0
2024-09-18 20:42:15,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=518880.0, ans=0.1
2024-09-18 20:42:24,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=518880.0, ans=0.125
2024-09-18 20:42:49,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=518960.0, ans=0.125
2024-09-18 20:42:59,580 INFO [train.py:1198] (0/2) Epoch 29, batch 3050, loss[loss=0.2249, ctc_loss=0.1095, cr_loss=0.3466, attn_decoder_loss=0.23, over 29534.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1237, cr_loss=0.3668, attn_decoder_loss=0.2451, over 5776876.80 frames. ], batch size: 76, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:43:02,901 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:43:12,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=519000.0, ans=0.1
2024-09-18 20:43:31,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=519080.0, ans=0.125
2024-09-18 20:43:47,144 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.92 vs. limit=22.5
2024-09-18 20:43:58,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=519160.0, ans=0.0
2024-09-18 20:44:15,291 INFO [train.py:1198] (0/2) Epoch 29, batch 3100, loss[loss=0.2584, ctc_loss=0.142, cr_loss=0.4027, attn_decoder_loss=0.2624, over 29206.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1235, cr_loss=0.366, attn_decoder_loss=0.2445, over 5778094.67 frames. ], batch size: 100, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:44:15,735 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:44:31,778 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.578e+01 9.222e+01 9.783e+01 2.939e+02, threshold=1.844e+02, percent-clipped=1.0
2024-09-18 20:44:35,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=519240.0, ans=0.125
2024-09-18 20:44:42,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=519240.0, ans=0.125
2024-09-18 20:44:57,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=519280.0, ans=0.125
2024-09-18 20:45:21,485 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:45:29,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=519360.0, ans=0.0
2024-09-18 20:45:33,288 INFO [train.py:1198] (0/2) Epoch 29, batch 3150, loss[loss=0.2541, ctc_loss=0.1264, cr_loss=0.362, attn_decoder_loss=0.2603, over 28846.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1235, cr_loss=0.3663, attn_decoder_loss=0.2446, over 5783897.02 frames. ], batch size: 104, lr: 3.81e-03, grad_scale: 8.0
2024-09-18 20:45:52,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=8.0
2024-09-18 20:45:59,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=519440.0, ans=0.0
2024-09-18 20:46:50,886 INFO [train.py:1198] (0/2) Epoch 29, batch 3200, loss[loss=0.2279, ctc_loss=0.1149, cr_loss=0.3494, attn_decoder_loss=0.2327, over 29430.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1227, cr_loss=0.3644, attn_decoder_loss=0.2437, over 5793621.97 frames. ], batch size: 79, lr: 3.80e-03, grad_scale: 16.0
2024-09-18 20:46:57,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=519600.0, ans=0.1
2024-09-18 20:47:03,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=519600.0, ans=0.0
2024-09-18 20:47:06,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=519640.0, ans=0.1
2024-09-18 20:47:07,587 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.418e+01 8.919e+01 9.479e+01 2.582e+02, threshold=1.784e+02, percent-clipped=1.0
2024-09-18 20:47:23,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=519680.0, ans=0.0
2024-09-18 20:47:40,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=519720.0, ans=0.0
2024-09-18 20:48:07,096 INFO [train.py:1198] (0/2) Epoch 29, batch 3250, loss[loss=0.2429, ctc_loss=0.1188, cr_loss=0.3645, attn_decoder_loss=0.2485, over 29702.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1226, cr_loss=0.3648, attn_decoder_loss=0.244, over 5799803.73 frames. ], batch size: 84, lr: 3.80e-03, grad_scale: 16.0
2024-09-18 20:48:26,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519840.0, ans=0.1
2024-09-18 20:48:33,633 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.14 vs. limit=10.0
2024-09-18 20:48:33,643 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.15 vs. limit=6.0
2024-09-18 20:48:57,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=519920.0, ans=0.125
2024-09-18 20:49:25,266 INFO [train.py:1198] (0/2) Epoch 29, batch 3300, loss[loss=0.2492, ctc_loss=0.1236, cr_loss=0.3556, attn_decoder_loss=0.2552, over 28485.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1221, cr_loss=0.3638, attn_decoder_loss=0.243, over 5796725.08 frames. ], batch size: 112, lr: 3.80e-03, grad_scale: 16.0
2024-09-18 20:49:33,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520000.0, ans=0.1
2024-09-18 20:49:42,287 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.577e+01 8.993e+01 9.559e+01 2.414e+02, threshold=1.799e+02, percent-clipped=3.0
2024-09-18 20:50:04,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=12.0
2024-09-18 20:50:08,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=520080.0, ans=0.0
2024-09-18 20:50:30,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=520160.0, ans=0.125
2024-09-18 20:50:37,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=520160.0, ans=0.0
2024-09-18 20:50:43,264 INFO [train.py:1198] (0/2) Epoch 29, batch 3350, loss[loss=0.2623, ctc_loss=0.14, cr_loss=0.3911, attn_decoder_loss=0.2672, over 28918.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1227, cr_loss=0.3646, attn_decoder_loss=0.2437, over 5773914.16 frames. ], batch size: 104, lr: 3.80e-03, grad_scale: 16.0
2024-09-18 20:50:51,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=520200.0, ans=0.125
2024-09-18 20:50:57,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=520240.0, ans=0.0
2024-09-18 20:51:00,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=520240.0, ans=0.125
2024-09-18 20:51:02,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.02 vs. limit=15.0
2024-09-18 20:51:23,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0
2024-09-18 20:51:24,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=520280.0, ans=0.025
2024-09-18 20:51:38,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=12.0
2024-09-18 20:51:50,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=520360.0, ans=0.125
2024-09-18 20:51:51,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=520360.0, ans=0.2
2024-09-18 20:51:55,381 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.27 vs. limit=15.0
2024-09-18 20:51:58,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=520400.0, ans=15.0
2024-09-18 20:51:59,172 INFO [train.py:1198] (0/2) Epoch 29, batch 3400, loss[loss=0.2139, ctc_loss=0.1057, cr_loss=0.3537, attn_decoder_loss=0.218, over 29343.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1232, cr_loss=0.3657, attn_decoder_loss=0.2439, over 5767416.84 frames. ], batch size: 67, lr: 3.80e-03, grad_scale: 16.0
2024-09-18 20:52:06,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=520400.0, ans=0.125
2024-09-18 20:52:11,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=520400.0, ans=0.0
2024-09-18 20:52:13,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.30 vs. limit=15.0
2024-09-18 20:52:14,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=520440.0, ans=0.125
2024-09-18 20:52:15,975 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.635e+01 9.188e+01 1.005e+02 1.629e+02, threshold=1.838e+02, percent-clipped=0.0
2024-09-18 20:52:26,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=520440.0, ans=0.125
2024-09-18 20:52:29,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=520480.0, ans=0.0
2024-09-18 20:52:36,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=520480.0, ans=0.125
2024-09-18 20:52:45,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=520520.0, ans=0.125
2024-09-18 20:52:51,693 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.30 vs. limit=22.5
2024-09-18 20:53:12,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=520560.0, ans=0.125
2024-09-18 20:53:16,964 INFO [train.py:1198] (0/2) Epoch 29, batch 3450, loss[loss=0.2437, ctc_loss=0.1229, cr_loss=0.3756, attn_decoder_loss=0.2487, over 28573.00 frames. ], tot_loss[loss=0.2394, ctc_loss=0.1234, cr_loss=0.3664, attn_decoder_loss=0.2441, over 5774606.26 frames. ], batch size: 112, lr: 3.80e-03, grad_scale: 8.0
2024-09-18 20:53:17,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0
2024-09-18 20:53:32,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=520640.0, ans=0.125
2024-09-18 20:53:32,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=520640.0, ans=0.125
2024-09-18 20:53:34,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.82 vs. limit=10.0
2024-09-18 20:53:37,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=520640.0, ans=0.0
2024-09-18 20:53:47,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=520680.0, ans=15.0
2024-09-18 20:53:59,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=520680.0, ans=0.125
2024-09-18 20:54:35,153 INFO [train.py:1198] (0/2) Epoch 29, batch 3500, loss[loss=0.2184, ctc_loss=0.1068, cr_loss=0.3374, attn_decoder_loss=0.2233, over 29731.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1231, cr_loss=0.3658, attn_decoder_loss=0.2436, over 5777940.83 frames. ], batch size: 72, lr: 3.80e-03, grad_scale: 8.0
2024-09-18 20:54:40,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=520800.0, ans=0.07
2024-09-18 20:54:50,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=520840.0, ans=0.125
2024-09-18 20:54:53,389 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.236e+01 8.769e+01 9.566e+01 1.320e+02, threshold=1.754e+02, percent-clipped=0.0
2024-09-18 20:54:58,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=520840.0, ans=0.1
2024-09-18 20:54:58,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=520840.0, ans=0.0
2024-09-18 20:55:13,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=520880.0, ans=0.2
2024-09-18 20:55:19,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520920.0, ans=0.1
2024-09-18 20:55:29,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=520920.0, ans=0.0
2024-09-18 20:55:47,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=520960.0, ans=0.025
2024-09-18 20:55:50,173 INFO [train.py:1198] (0/2) Epoch 29, batch 3550, loss[loss=0.2432, ctc_loss=0.1145, cr_loss=0.3357, attn_decoder_loss=0.25, over 29713.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1228, cr_loss=0.3649, attn_decoder_loss=0.2437, over 5782411.41 frames. ], batch size: 89, lr: 3.80e-03, grad_scale: 8.0
2024-09-18 20:56:08,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=521040.0, ans=0.2
2024-09-18 20:56:46,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=521120.0, ans=0.0
2024-09-18 20:56:55,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=12.0
2024-09-18 20:57:01,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=521160.0, ans=0.05
2024-09-18 20:57:04,197 INFO [train.py:1198] (0/2) Epoch 29, batch 3600, loss[loss=0.2413, ctc_loss=0.135, cr_loss=0.4152, attn_decoder_loss=0.2439, over 29515.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1224, cr_loss=0.3644, attn_decoder_loss=0.2437, over 5791820.03 frames. ], batch size: 77, lr: 3.80e-03, grad_scale: 16.0
2024-09-18 20:57:10,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=521200.0, ans=0.1
2024-09-18 20:57:18,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=521240.0, ans=0.125
2024-09-18 20:57:19,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=521240.0, ans=0.125
2024-09-18 20:57:22,271 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 8.552e+01 8.984e+01 9.474e+01 4.897e+02, threshold=1.797e+02, percent-clipped=1.0
2024-09-18 20:57:50,800 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 20:57:54,526 INFO [scaling.py:1024] (0/2) Whitening: 
name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0 2024-09-18 20:57:56,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=521320.0, ans=0.1 2024-09-18 20:57:58,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=521320.0, ans=0.05 2024-09-18 20:58:18,774 INFO [train.py:1198] (0/2) Epoch 29, batch 3650, loss[loss=0.2497, ctc_loss=0.1367, cr_loss=0.3968, attn_decoder_loss=0.2535, over 29504.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.122, cr_loss=0.3634, attn_decoder_loss=0.2429, over 5794681.48 frames. ], batch size: 90, lr: 3.80e-03, grad_scale: 16.0 2024-09-18 20:58:19,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=521400.0, ans=0.0 2024-09-18 20:58:23,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. 
limit=15.0 2024-09-18 20:58:28,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=521400.0, ans=0.0 2024-09-18 20:58:37,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=521440.0, ans=10.0 2024-09-18 20:58:46,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=521440.0, ans=0.0 2024-09-18 20:58:52,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=521480.0, ans=0.125 2024-09-18 20:59:05,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=521520.0, ans=0.125 2024-09-18 20:59:06,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=521520.0, ans=0.0 2024-09-18 20:59:23,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=521560.0, ans=0.125 2024-09-18 20:59:36,062 INFO [train.py:1198] (0/2) Epoch 29, batch 3700, loss[loss=0.2492, ctc_loss=0.1265, cr_loss=0.3715, attn_decoder_loss=0.2545, over 29724.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1224, cr_loss=0.3647, attn_decoder_loss=0.2433, over 5805140.46 frames. ], batch size: 84, lr: 3.80e-03, grad_scale: 8.0 2024-09-18 20:59:55,283 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.397e+01 8.875e+01 9.405e+01 1.712e+02, threshold=1.775e+02, percent-clipped=0.0 2024-09-18 21:00:04,775 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.43 vs. 
limit=12.0 2024-09-18 21:00:05,850 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:00:16,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.30 vs. limit=22.5 2024-09-18 21:00:25,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=521720.0, ans=0.0 2024-09-18 21:00:32,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=521720.0, ans=0.125 2024-09-18 21:00:35,026 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. limit=6.0 2024-09-18 21:00:52,046 INFO [train.py:1198] (0/2) Epoch 29, batch 3750, loss[loss=0.2175, ctc_loss=0.112, cr_loss=0.347, attn_decoder_loss=0.2215, over 29318.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1223, cr_loss=0.3649, attn_decoder_loss=0.2431, over 5808722.10 frames. 
], batch size: 67, lr: 3.80e-03, grad_scale: 8.0 2024-09-18 21:00:59,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=521800.0, ans=0.0 2024-09-18 21:01:17,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521840.0, ans=0.1 2024-09-18 21:01:24,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=521880.0, ans=0.125 2024-09-18 21:01:28,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=521880.0, ans=0.125 2024-09-18 21:01:29,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=521880.0, ans=0.05 2024-09-18 21:01:46,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=521920.0, ans=0.125 2024-09-18 21:02:01,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521960.0, ans=0.1 2024-09-18 21:02:04,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0 2024-09-18 21:02:06,640 INFO [train.py:1198] (0/2) Epoch 29, batch 3800, loss[loss=0.2403, ctc_loss=0.1194, cr_loss=0.3507, attn_decoder_loss=0.2459, over 29619.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1221, cr_loss=0.3641, attn_decoder_loss=0.2427, over 5799618.40 frames. 
], batch size: 86, lr: 3.80e-03, grad_scale: 8.0 2024-09-18 21:02:25,923 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.472e+01 8.859e+01 9.703e+01 1.383e+02, threshold=1.772e+02, percent-clipped=0.0 2024-09-18 21:02:28,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.13 vs. limit=15.0 2024-09-18 21:03:00,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=522120.0, ans=0.2 2024-09-18 21:03:06,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=522160.0, ans=0.0 2024-09-18 21:03:13,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=522160.0, ans=0.2 2024-09-18 21:03:20,793 INFO [train.py:1198] (0/2) Epoch 29, batch 3850, loss[loss=0.2586, ctc_loss=0.1425, cr_loss=0.4161, attn_decoder_loss=0.2623, over 29327.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.122, cr_loss=0.3644, attn_decoder_loss=0.2427, over 5814091.29 frames. ], batch size: 100, lr: 3.80e-03, grad_scale: 8.0 2024-09-18 21:03:30,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=522200.0, ans=0.2 2024-09-18 21:03:49,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=522280.0, ans=0.0 2024-09-18 21:04:07,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. 
limit=6.0 2024-09-18 21:04:20,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=522360.0, ans=0.0 2024-09-18 21:04:31,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=522360.0, ans=0.07 2024-09-18 21:04:32,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=522360.0, ans=0.0 2024-09-18 21:04:34,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=522360.0, ans=0.0 2024-09-18 21:04:37,290 INFO [train.py:1198] (0/2) Epoch 29, batch 3900, loss[loss=0.2578, ctc_loss=0.1336, cr_loss=0.3897, attn_decoder_loss=0.2629, over 29641.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1228, cr_loss=0.3658, attn_decoder_loss=0.2434, over 5818166.81 frames. ], batch size: 86, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:04:47,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=522400.0, ans=0.0 2024-09-18 21:04:56,254 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.663e+01 9.050e+01 9.576e+01 3.697e+02, threshold=1.810e+02, percent-clipped=2.0 2024-09-18 21:04:59,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522440.0, ans=0.1 2024-09-18 21:05:21,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=522520.0, ans=0.125 2024-09-18 21:05:30,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=522520.0, ans=0.09899494936611666 2024-09-18 21:05:36,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, 
batch_count=522560.0, ans=0.0 2024-09-18 21:05:49,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=522560.0, ans=0.0 2024-09-18 21:05:52,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=12.0 2024-09-18 21:05:52,776 INFO [train.py:1198] (0/2) Epoch 29, batch 3950, loss[loss=0.2525, ctc_loss=0.1294, cr_loss=0.3827, attn_decoder_loss=0.2577, over 29421.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1224, cr_loss=0.3653, attn_decoder_loss=0.2431, over 5837208.25 frames. ], batch size: 97, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:06:09,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.65 vs. limit=15.0 2024-09-18 21:06:22,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=522680.0, ans=0.2 2024-09-18 21:06:32,955 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:06:46,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=522720.0, ans=0.125 2024-09-18 21:06:49,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=522720.0, ans=0.125 2024-09-18 21:07:06,730 INFO [train.py:1198] (0/2) Epoch 29, batch 4000, loss[loss=0.2247, ctc_loss=0.1093, cr_loss=0.3458, attn_decoder_loss=0.2299, over 29516.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1226, cr_loss=0.3653, attn_decoder_loss=0.2433, over 5813621.32 frames. 
], batch size: 74, lr: 3.79e-03, grad_scale: 16.0 2024-09-18 21:07:20,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=522840.0, ans=0.125 2024-09-18 21:07:25,861 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.544e+01 9.038e+01 9.843e+01 4.905e+02, threshold=1.808e+02, percent-clipped=2.0 2024-09-18 21:07:52,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=11.69 vs. limit=15.0 2024-09-18 21:08:00,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=522920.0, ans=0.0 2024-09-18 21:08:21,178 INFO [train.py:1198] (0/2) Epoch 29, batch 4050, loss[loss=0.2633, ctc_loss=0.1549, cr_loss=0.3989, attn_decoder_loss=0.2665, over 19867.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1226, cr_loss=0.3645, attn_decoder_loss=0.2433, over 5797446.17 frames. ], batch size: 209, lr: 3.79e-03, grad_scale: 16.0 2024-09-18 21:08:22,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=523000.0, ans=0.0 2024-09-18 21:08:32,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.87 vs. 
limit=15.0 2024-09-18 21:08:37,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=523040.0, ans=10.0 2024-09-18 21:08:43,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=523040.0, ans=0.0 2024-09-18 21:08:50,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=523080.0, ans=0.125 2024-09-18 21:08:53,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=523080.0, ans=0.0 2024-09-18 21:08:54,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523080.0, ans=0.1 2024-09-18 21:09:22,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=523160.0, ans=0.0 2024-09-18 21:09:29,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=523160.0, ans=0.125 2024-09-18 21:09:30,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=523160.0, ans=0.0 2024-09-18 21:09:30,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.74 vs. limit=10.0 2024-09-18 21:09:33,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=523160.0, ans=0.0 2024-09-18 21:09:35,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=523200.0, ans=0.07 2024-09-18 21:09:36,356 INFO [train.py:1198] (0/2) Epoch 29, batch 4100, loss[loss=0.2592, ctc_loss=0.1417, cr_loss=0.4046, attn_decoder_loss=0.2633, over 29503.00 frames. 
], tot_loss[loss=0.2384, ctc_loss=0.1224, cr_loss=0.364, attn_decoder_loss=0.2432, over 5793667.09 frames. ], batch size: 90, lr: 3.79e-03, grad_scale: 16.0 2024-09-18 21:09:48,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=523200.0, ans=0.125 2024-09-18 21:09:52,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=523240.0, ans=0.125 2024-09-18 21:09:56,919 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.631e+01 9.111e+01 9.616e+01 2.001e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-18 21:10:00,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.94 vs. limit=15.0 2024-09-18 21:10:02,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=523240.0, ans=0.0 2024-09-18 21:10:22,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.80 vs. limit=15.0 2024-09-18 21:10:29,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=523320.0, ans=0.0 2024-09-18 21:10:43,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=523360.0, ans=0.125 2024-09-18 21:10:43,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=523360.0, ans=0.2 2024-09-18 21:10:51,017 INFO [train.py:1198] (0/2) Epoch 29, batch 4150, loss[loss=0.2407, ctc_loss=0.1275, cr_loss=0.374, attn_decoder_loss=0.2449, over 29494.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1228, cr_loss=0.365, attn_decoder_loss=0.2435, over 5799101.90 frames. 
], batch size: 77, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:10:55,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=523400.0, ans=0.125 2024-09-18 21:11:03,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=523400.0, ans=0.0 2024-09-18 21:11:03,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=523400.0, ans=0.125 2024-09-18 21:11:06,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=523440.0, ans=0.125 2024-09-18 21:11:07,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=523440.0, ans=0.025 2024-09-18 21:11:12,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.98 vs. limit=15.0 2024-09-18 21:11:19,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=523480.0, ans=0.125 2024-09-18 21:11:26,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=523480.0, ans=0.125 2024-09-18 21:11:27,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.05 vs. 
limit=10.0 2024-09-18 21:11:36,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=523520.0, ans=0.07 2024-09-18 21:11:39,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=523520.0, ans=0.0 2024-09-18 21:11:44,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=523520.0, ans=0.05 2024-09-18 21:12:04,653 INFO [train.py:1198] (0/2) Epoch 29, batch 4200, loss[loss=0.2785, ctc_loss=0.1588, cr_loss=0.445, attn_decoder_loss=0.282, over 29477.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1226, cr_loss=0.3646, attn_decoder_loss=0.2436, over 5800323.36 frames. ], batch size: 90, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:12:14,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0 2024-09-18 21:12:24,984 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.93 vs. 
limit=12.0 2024-09-18 21:12:25,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.488e+01 8.959e+01 9.406e+01 1.586e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-18 21:12:31,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523640.0, ans=0.1 2024-09-18 21:12:40,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=523680.0, ans=0.125 2024-09-18 21:12:41,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=523680.0, ans=0.125 2024-09-18 21:13:00,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=523720.0, ans=0.1 2024-09-18 21:13:19,320 INFO [train.py:1198] (0/2) Epoch 29, batch 4250, loss[loss=0.2288, ctc_loss=0.1149, cr_loss=0.3555, attn_decoder_loss=0.2336, over 29516.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1225, cr_loss=0.365, attn_decoder_loss=0.2438, over 5806126.12 frames. ], batch size: 74, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:13:20,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.96 vs. limit=15.0 2024-09-18 21:13:48,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=523880.0, ans=0.125 2024-09-18 21:14:02,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=523920.0, ans=0.1 2024-09-18 21:14:28,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.76 vs. 
limit=10.0 2024-09-18 21:14:33,927 INFO [train.py:1198] (0/2) Epoch 29, batch 4300, loss[loss=0.2559, ctc_loss=0.1387, cr_loss=0.4091, attn_decoder_loss=0.2598, over 29526.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1224, cr_loss=0.3646, attn_decoder_loss=0.244, over 5796687.86 frames. ], batch size: 87, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:14:54,744 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.566e+01 9.093e+01 9.563e+01 1.622e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-18 21:14:55,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=524040.0, ans=0.0 2024-09-18 21:15:08,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=524080.0, ans=0.0 2024-09-18 21:15:16,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.57 vs. limit=15.0 2024-09-18 21:15:18,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=524120.0, ans=0.2 2024-09-18 21:15:20,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=524120.0, ans=0.125 2024-09-18 21:15:24,738 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:15:25,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.63 vs. 
limit=15.0 2024-09-18 21:15:46,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=524200.0, ans=0.2 2024-09-18 21:15:48,233 INFO [train.py:1198] (0/2) Epoch 29, batch 4350, loss[loss=0.255, ctc_loss=0.1343, cr_loss=0.4019, attn_decoder_loss=0.2595, over 29504.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1249, cr_loss=0.3697, attn_decoder_loss=0.2471, over 5798108.56 frames. ], batch size: 97, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:15:49,318 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=22.5 2024-09-18 21:16:01,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=524240.0, ans=0.2 2024-09-18 21:16:04,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=524240.0, ans=0.025 2024-09-18 21:16:07,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=524240.0, ans=0.1 2024-09-18 21:16:29,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524280.0, ans=0.1 2024-09-18 21:16:30,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.29 vs. limit=15.0 2024-09-18 21:16:39,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=524320.0, ans=0.125 2024-09-18 21:16:53,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.68 vs. 
limit=15.0 2024-09-18 21:17:02,842 INFO [train.py:1198] (0/2) Epoch 29, batch 4400, loss[loss=0.2571, ctc_loss=0.1459, cr_loss=0.4042, attn_decoder_loss=0.2605, over 27608.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.1259, cr_loss=0.3715, attn_decoder_loss=0.249, over 5768382.67 frames. ], batch size: 124, lr: 3.79e-03, grad_scale: 16.0 2024-09-18 21:17:05,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2024-09-18 21:17:22,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524440.0, ans=0.1 2024-09-18 21:17:23,258 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.178e+01 8.977e+01 9.367e+01 9.862e+01 3.705e+02, threshold=1.873e+02, percent-clipped=1.0 2024-09-18 21:17:29,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524440.0, ans=0.1 2024-09-18 21:17:29,632 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:17:50,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=524520.0, ans=0.0 2024-09-18 21:17:55,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=22.5 2024-09-18 21:17:56,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=524520.0, ans=0.2 2024-09-18 21:18:16,812 INFO [train.py:1198] (0/2) Epoch 29, batch 4450, loss[loss=0.2618, ctc_loss=0.1488, cr_loss=0.4023, attn_decoder_loss=0.2654, over 20522.00 frames. ], tot_loss[loss=0.2469, ctc_loss=0.13, cr_loss=0.3775, attn_decoder_loss=0.2515, over 5583724.45 frames. 
], batch size: 209, lr: 3.79e-03, grad_scale: 16.0 2024-09-18 21:18:45,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=524640.0, ans=0.025 2024-09-18 21:18:48,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=524680.0, ans=0.125 2024-09-18 21:18:51,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=524680.0, ans=0.2 2024-09-18 21:18:52,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=524680.0, ans=0.125 2024-09-18 21:18:58,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=524680.0, ans=0.1 2024-09-18 21:19:03,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=524720.0, ans=0.125 2024-09-18 21:19:18,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=524760.0, ans=0.0 2024-09-18 21:19:31,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=524800.0, ans=0.0 2024-09-18 21:19:33,099 INFO [train.py:1198] (0/2) Epoch 29, batch 4500, loss[loss=0.2669, ctc_loss=0.1588, cr_loss=0.4112, attn_decoder_loss=0.2698, over 20450.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1337, cr_loss=0.3801, attn_decoder_loss=0.2535, over 5240174.55 frames. 
], batch size: 210, lr: 3.79e-03, grad_scale: 8.0 2024-09-18 21:19:42,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=524800.0, ans=0.125 2024-09-18 21:19:55,855 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.478e+01 1.036e+02 1.116e+02 1.208e+02 3.141e+02, threshold=2.233e+02, percent-clipped=1.0 2024-09-18 21:20:00,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=524840.0, ans=10.0 2024-09-18 21:20:10,760 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-29.pt 2024-09-18 21:21:03,276 INFO [train.py:1198] (0/2) Epoch 30, batch 0, loss[loss=0.219, ctc_loss=0.1109, cr_loss=0.345, attn_decoder_loss=0.2234, over 29597.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1109, cr_loss=0.345, attn_decoder_loss=0.2234, over 29597.00 frames. ], batch size: 73, lr: 3.72e-03, grad_scale: 16.0 2024-09-18 21:21:03,276 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 21:21:23,765 INFO [train.py:1230] (0/2) Epoch 30, validation: loss=0.2119, ctc_loss=0.03754, cr_loss=5.775e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-18 21:21:23,765 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 21:21:27,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=524900.0, ans=0.125 2024-09-18 21:21:33,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.32 vs. 
limit=10.0 2024-09-18 21:21:39,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=524940.0, ans=0.125 2024-09-18 21:21:39,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=524940.0, ans=0.1 2024-09-18 21:21:43,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=524940.0, ans=0.2 2024-09-18 21:22:30,594 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0 2024-09-18 21:22:32,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=525060.0, ans=0.125 2024-09-18 21:22:36,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=525060.0, ans=0.125 2024-09-18 21:22:39,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=525100.0, ans=0.025 2024-09-18 21:22:40,135 INFO [train.py:1198] (0/2) Epoch 30, batch 50, loss[loss=0.2138, ctc_loss=0.1045, cr_loss=0.3278, attn_decoder_loss=0.2187, over 29456.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1233, cr_loss=0.3678, attn_decoder_loss=0.2431, over 1267299.47 frames. 
], batch size: 70, lr: 3.72e-03, grad_scale: 16.0 2024-09-18 21:22:48,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=525100.0, ans=0.125 2024-09-18 21:23:09,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=525180.0, ans=0.04949747468305833 2024-09-18 21:23:11,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=525180.0, ans=0.125 2024-09-18 21:23:17,191 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:23:36,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=525220.0, ans=0.05 2024-09-18 21:23:39,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=525260.0, ans=0.0 2024-09-18 21:23:42,502 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 8.848e+01 9.545e+01 1.010e+02 1.497e+02, threshold=1.909e+02, percent-clipped=0.0 2024-09-18 21:23:53,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525260.0, ans=0.1 2024-09-18 21:23:54,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=525300.0, ans=0.05 2024-09-18 21:23:56,179 INFO [train.py:1198] (0/2) Epoch 30, batch 100, loss[loss=0.2252, ctc_loss=0.1145, cr_loss=0.3536, attn_decoder_loss=0.2297, over 29520.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1253, cr_loss=0.3719, attn_decoder_loss=0.246, over 2253142.01 frames. 
], batch size: 76, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:24:06,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=525300.0, ans=0.125 2024-09-18 21:24:21,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=525340.0, ans=0.0 2024-09-18 21:24:27,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=525380.0, ans=0.0 2024-09-18 21:24:29,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=525380.0, ans=0.125 2024-09-18 21:24:33,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=525380.0, ans=0.0 2024-09-18 21:24:49,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=525420.0, ans=0.2 2024-09-18 21:24:51,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=525420.0, ans=0.125 2024-09-18 21:25:13,043 INFO [train.py:1198] (0/2) Epoch 30, batch 150, loss[loss=0.2193, ctc_loss=0.1092, cr_loss=0.3445, attn_decoder_loss=0.2239, over 29437.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1232, cr_loss=0.3677, attn_decoder_loss=0.2443, over 3047980.48 frames. ], batch size: 70, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:25:22,736 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. 
limit=15.0 2024-09-18 21:25:25,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=525500.0, ans=0.1 2024-09-18 21:26:13,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=525620.0, ans=12.0 2024-09-18 21:26:17,353 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.425e+01 8.976e+01 9.725e+01 1.408e+02, threshold=1.795e+02, percent-clipped=0.0 2024-09-18 21:26:29,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=525700.0, ans=0.125 2024-09-18 21:26:30,930 INFO [train.py:1198] (0/2) Epoch 30, batch 200, loss[loss=0.2543, ctc_loss=0.1363, cr_loss=0.3934, attn_decoder_loss=0.2587, over 27321.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1224, cr_loss=0.3664, attn_decoder_loss=0.2434, over 3659272.88 frames. ], batch size: 124, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:26:43,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=525700.0, ans=0.125 2024-09-18 21:26:46,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=525740.0, ans=0.125 2024-09-18 21:26:57,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=525740.0, ans=6.0 2024-09-18 21:26:59,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=525740.0, ans=15.0 2024-09-18 21:27:00,157 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:27:09,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=525780.0, 
ans=0.04949747468305833 2024-09-18 21:27:24,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=525820.0, ans=0.2 2024-09-18 21:27:37,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=525860.0, ans=0.125 2024-09-18 21:27:42,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=525860.0, ans=0.0 2024-09-18 21:27:44,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=525860.0, ans=0.0 2024-09-18 21:27:46,552 INFO [train.py:1198] (0/2) Epoch 30, batch 250, loss[loss=0.2615, ctc_loss=0.1354, cr_loss=0.3863, attn_decoder_loss=0.267, over 29252.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1225, cr_loss=0.3657, attn_decoder_loss=0.2436, over 4142376.32 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:27:49,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=525900.0, ans=0.2 2024-09-18 21:28:08,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.56 vs. 
limit=22.5 2024-09-18 21:28:12,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=525940.0, ans=0.125 2024-09-18 21:28:20,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=525980.0, ans=0.0 2024-09-18 21:28:29,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=525980.0, ans=0.0 2024-09-18 21:28:36,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.46 vs. limit=15.0 2024-09-18 21:28:39,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=526020.0, ans=0.025 2024-09-18 21:28:43,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526020.0, ans=0.1 2024-09-18 21:28:50,801 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.439e+01 8.914e+01 9.350e+01 1.362e+02, threshold=1.783e+02, percent-clipped=0.0 2024-09-18 21:29:00,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=526060.0, ans=0.125 2024-09-18 21:29:04,748 INFO [train.py:1198] (0/2) Epoch 30, batch 300, loss[loss=0.2479, ctc_loss=0.1267, cr_loss=0.3882, attn_decoder_loss=0.2528, over 29514.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1223, cr_loss=0.366, attn_decoder_loss=0.2433, over 4508734.64 frames. 
], batch size: 92, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:29:05,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=526100.0, ans=0.0 2024-09-18 21:29:15,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=526100.0, ans=0.0 2024-09-18 21:29:16,102 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.81 vs. limit=22.5 2024-09-18 21:29:33,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=526140.0, ans=0.125 2024-09-18 21:29:45,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=526180.0, ans=0.125 2024-09-18 21:30:00,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=526220.0, ans=0.2 2024-09-18 21:30:08,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526260.0, ans=0.1 2024-09-18 21:30:21,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=526300.0, ans=0.025 2024-09-18 21:30:22,650 INFO [train.py:1198] (0/2) Epoch 30, batch 350, loss[loss=0.2144, ctc_loss=0.09692, cr_loss=0.2959, attn_decoder_loss=0.2208, over 29333.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1224, cr_loss=0.3655, attn_decoder_loss=0.2437, over 4794140.64 frames. 
], batch size: 71, lr: 3.72e-03, grad_scale: 8.0 2024-09-18 21:31:10,287 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:31:20,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=526420.0, ans=0.125 2024-09-18 21:31:22,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=526460.0, ans=0.025 2024-09-18 21:31:24,892 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.639e+01 9.253e+01 9.920e+01 3.039e+02, threshold=1.851e+02, percent-clipped=1.0 2024-09-18 21:31:38,413 INFO [train.py:1198] (0/2) Epoch 30, batch 400, loss[loss=0.2448, ctc_loss=0.1206, cr_loss=0.3706, attn_decoder_loss=0.2504, over 29706.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1223, cr_loss=0.365, attn_decoder_loss=0.2437, over 5025692.44 frames. ], batch size: 82, lr: 3.72e-03, grad_scale: 16.0 2024-09-18 21:31:55,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=526540.0, ans=0.0 2024-09-18 21:32:42,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=526660.0, ans=0.125 2024-09-18 21:32:47,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526660.0, ans=0.1 2024-09-18 21:32:56,028 INFO [train.py:1198] (0/2) Epoch 30, batch 450, loss[loss=0.2464, ctc_loss=0.1238, cr_loss=0.3671, attn_decoder_loss=0.2519, over 29685.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.122, cr_loss=0.3645, attn_decoder_loss=0.2434, over 5188016.55 frames. 
], batch size: 83, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:32:56,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=12.0 2024-09-18 21:33:00,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=526700.0, ans=0.125 2024-09-18 21:33:07,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=526700.0, ans=0.125 2024-09-18 21:33:07,617 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.95 vs. limit=10.0 2024-09-18 21:33:11,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=526740.0, ans=0.125 2024-09-18 21:33:29,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=526780.0, ans=0.125 2024-09-18 21:33:39,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526780.0, ans=0.1 2024-09-18 21:33:44,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=526820.0, ans=0.125 2024-09-18 21:33:48,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.28 vs. 
limit=12.0 2024-09-18 21:34:01,950 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.512e+01 8.936e+01 9.488e+01 1.864e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-18 21:34:06,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=526860.0, ans=0.0 2024-09-18 21:34:13,945 INFO [train.py:1198] (0/2) Epoch 30, batch 500, loss[loss=0.2637, ctc_loss=0.1444, cr_loss=0.4224, attn_decoder_loss=0.2676, over 29442.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1215, cr_loss=0.3636, attn_decoder_loss=0.2427, over 5330605.84 frames. ], batch size: 94, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:34:53,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=526980.0, ans=0.5 2024-09-18 21:35:30,023 INFO [train.py:1198] (0/2) Epoch 30, batch 550, loss[loss=0.2569, ctc_loss=0.1279, cr_loss=0.3925, attn_decoder_loss=0.2625, over 28775.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1211, cr_loss=0.3625, attn_decoder_loss=0.2426, over 5422706.90 frames. ], batch size: 104, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:35:51,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=527140.0, ans=0.125 2024-09-18 21:36:15,444 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.70 vs. 
limit=22.5 2024-09-18 21:36:16,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=527220.0, ans=0.0 2024-09-18 21:36:24,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=527220.0, ans=0.125 2024-09-18 21:36:34,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=527260.0, ans=15.0 2024-09-18 21:36:36,358 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.634e+01 8.972e+01 9.427e+01 2.186e+02, threshold=1.794e+02, percent-clipped=1.0 2024-09-18 21:36:38,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=527260.0, ans=0.07 2024-09-18 21:36:44,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=527260.0, ans=0.125 2024-09-18 21:36:44,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=527260.0, ans=0.125 2024-09-18 21:36:48,688 INFO [train.py:1198] (0/2) Epoch 30, batch 600, loss[loss=0.2587, ctc_loss=0.1384, cr_loss=0.402, attn_decoder_loss=0.2632, over 29206.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1212, cr_loss=0.3633, attn_decoder_loss=0.243, over 5510178.63 frames. 
], batch size: 100, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:36:55,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=527300.0, ans=0.0 2024-09-18 21:37:08,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=527340.0, ans=0.125 2024-09-18 21:37:13,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=527340.0, ans=0.125 2024-09-18 21:37:16,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=527340.0, ans=0.125 2024-09-18 21:37:25,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=527380.0, ans=0.0 2024-09-18 21:37:31,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=527380.0, ans=10.0 2024-09-18 21:37:37,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=12.0 2024-09-18 21:37:43,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=527420.0, ans=0.125 2024-09-18 21:37:53,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=527460.0, ans=0.2 2024-09-18 21:38:06,372 INFO [train.py:1198] (0/2) Epoch 30, batch 650, loss[loss=0.2402, ctc_loss=0.1206, cr_loss=0.364, attn_decoder_loss=0.2454, over 29740.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1208, cr_loss=0.3625, attn_decoder_loss=0.2424, over 5587077.19 frames. 
], batch size: 81, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:39:01,448 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=12.0 2024-09-18 21:39:09,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.386e+01 8.897e+01 9.302e+01 1.225e+02, threshold=1.779e+02, percent-clipped=0.0 2024-09-18 21:39:09,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=527660.0, ans=0.0 2024-09-18 21:39:21,810 INFO [train.py:1198] (0/2) Epoch 30, batch 700, loss[loss=0.2317, ctc_loss=0.1223, cr_loss=0.3486, attn_decoder_loss=0.2361, over 29540.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1209, cr_loss=0.3629, attn_decoder_loss=0.2427, over 5637501.27 frames. ], batch size: 76, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:39:28,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-09-18 21:39:40,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=527740.0, ans=0.07 2024-09-18 21:40:06,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=527820.0, ans=0.125 2024-09-18 21:40:14,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2024-09-18 21:40:25,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=15.0 2024-09-18 21:40:39,933 INFO [train.py:1198] (0/2) Epoch 30, batch 750, loss[loss=0.2439, ctc_loss=0.1221, cr_loss=0.3634, attn_decoder_loss=0.2493, over 29711.00 frames. 
], tot_loss[loss=0.2374, ctc_loss=0.1207, cr_loss=0.3618, attn_decoder_loss=0.2423, over 5677379.27 frames. ], batch size: 82, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:40:47,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=527900.0, ans=0.2 2024-09-18 21:40:49,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=527900.0, ans=0.0 2024-09-18 21:40:52,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=527900.0, ans=0.07 2024-09-18 21:41:16,424 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-132000.pt 2024-09-18 21:41:28,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=527980.0, ans=0.125 2024-09-18 21:41:38,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=528020.0, ans=0.125 2024-09-18 21:41:41,575 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.96 vs. limit=10.0 2024-09-18 21:41:46,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.96 vs. limit=15.0 2024-09-18 21:41:52,809 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.561e+01 8.909e+01 9.515e+01 3.316e+02, threshold=1.782e+02, percent-clipped=2.0 2024-09-18 21:42:04,147 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.92 vs. 
limit=22.5 2024-09-18 21:42:04,968 INFO [train.py:1198] (0/2) Epoch 30, batch 800, loss[loss=0.2214, ctc_loss=0.1132, cr_loss=0.3289, attn_decoder_loss=0.2261, over 29606.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1208, cr_loss=0.3622, attn_decoder_loss=0.2421, over 5708120.81 frames. ], batch size: 73, lr: 3.71e-03, grad_scale: 16.0 2024-09-18 21:42:23,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=528140.0, ans=0.125 2024-09-18 21:42:23,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=528140.0, ans=0.0 2024-09-18 21:42:26,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=528140.0, ans=0.2 2024-09-18 21:42:39,931 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:43:08,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=528260.0, ans=0.1 2024-09-18 21:43:20,040 INFO [train.py:1198] (0/2) Epoch 30, batch 850, loss[loss=0.251, ctc_loss=0.1287, cr_loss=0.3777, attn_decoder_loss=0.2562, over 29712.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1205, cr_loss=0.3614, attn_decoder_loss=0.242, over 5734294.14 frames. 
], batch size: 89, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:43:23,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=528300.0, ans=0.125 2024-09-18 21:43:33,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=528340.0, ans=0.125 2024-09-18 21:43:39,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=528340.0, ans=0.125 2024-09-18 21:43:55,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=528380.0, ans=0.2 2024-09-18 21:44:10,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=528420.0, ans=0.0 2024-09-18 21:44:21,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=528460.0, ans=0.125 2024-09-18 21:44:26,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.70 vs. limit=22.5 2024-09-18 21:44:27,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.440e+01 8.960e+01 9.629e+01 1.513e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-18 21:44:33,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=528460.0, ans=0.1 2024-09-18 21:44:37,931 INFO [train.py:1198] (0/2) Epoch 30, batch 900, loss[loss=0.2207, ctc_loss=0.1104, cr_loss=0.3385, attn_decoder_loss=0.2255, over 29600.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1207, cr_loss=0.3616, attn_decoder_loss=0.2423, over 5739196.92 frames. 
], batch size: 73, lr: 3.71e-03, grad_scale: 8.0 2024-09-18 21:44:39,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=528500.0, ans=0.5 2024-09-18 21:44:54,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=528540.0, ans=0.125 2024-09-18 21:45:00,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=528540.0, ans=0.1 2024-09-18 21:45:02,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=15.0 2024-09-18 21:45:06,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=528580.0, ans=0.0 2024-09-18 21:45:17,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=528580.0, ans=0.125 2024-09-18 21:45:31,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. 
limit=6.0
2024-09-18 21:45:32,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=528620.0, ans=0.125
2024-09-18 21:45:36,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=528620.0, ans=0.125
2024-09-18 21:45:37,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=528620.0, ans=0.125
2024-09-18 21:45:43,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528660.0, ans=0.1
2024-09-18 21:45:45,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=528660.0, ans=0.95
2024-09-18 21:45:55,578 INFO [train.py:1198] (0/2) Epoch 30, batch 950, loss[loss=0.2204, ctc_loss=0.1093, cr_loss=0.3473, attn_decoder_loss=0.225, over 29523.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1207, cr_loss=0.3618, attn_decoder_loss=0.2425, over 5741816.45 frames. ], batch size: 74, lr: 3.71e-03, grad_scale: 8.0
2024-09-18 21:46:20,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.69 vs. limit=15.0
2024-09-18 21:46:49,013 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 21:46:50,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=528820.0, ans=0.125
2024-09-18 21:47:00,634 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.476e+01 9.054e+01 9.594e+01 4.825e+02, threshold=1.811e+02, percent-clipped=1.0
2024-09-18 21:47:11,026 INFO [train.py:1198] (0/2) Epoch 30, batch 1000, loss[loss=0.2421, ctc_loss=0.1265, cr_loss=0.3747, attn_decoder_loss=0.2466, over 29526.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1215, cr_loss=0.3633, attn_decoder_loss=0.2433, over 5736587.22 frames. ], batch size: 77, lr: 3.71e-03, grad_scale: 8.0
2024-09-18 21:47:19,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.30 vs. limit=15.0
2024-09-18 21:47:35,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=528940.0, ans=10.0
2024-09-18 21:47:37,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=528940.0, ans=0.0
2024-09-18 21:48:01,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=529020.0, ans=0.0
2024-09-18 21:48:17,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=529060.0, ans=0.125
2024-09-18 21:48:24,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=529060.0, ans=0.125
2024-09-18 21:48:28,834 INFO [train.py:1198] (0/2) Epoch 30, batch 1050, loss[loss=0.2473, ctc_loss=0.1274, cr_loss=0.3758, attn_decoder_loss=0.2523, over 29680.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1209, cr_loss=0.3624, attn_decoder_loss=0.2426, over 5744557.33 frames. ], batch size: 85, lr: 3.71e-03, grad_scale: 8.0
2024-09-18 21:48:39,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=529100.0, ans=0.0
2024-09-18 21:48:44,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.86 vs. limit=15.0
2024-09-18 21:48:48,974 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 21:48:51,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=529140.0, ans=0.2
2024-09-18 21:48:53,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=529140.0, ans=0.0
2024-09-18 21:49:36,321 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.411e+01 8.824e+01 9.446e+01 1.337e+02, threshold=1.765e+02, percent-clipped=0.0
2024-09-18 21:49:38,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=529260.0, ans=0.2
2024-09-18 21:49:46,959 INFO [train.py:1198] (0/2) Epoch 30, batch 1100, loss[loss=0.2296, ctc_loss=0.1157, cr_loss=0.3611, attn_decoder_loss=0.2342, over 29439.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1208, cr_loss=0.3625, attn_decoder_loss=0.2423, over 5757634.80 frames. ], batch size: 78, lr: 3.71e-03, grad_scale: 8.0
2024-09-18 21:49:54,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=529300.0, ans=0.125
2024-09-18 21:50:05,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=529340.0, ans=0.0
2024-09-18 21:50:06,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=529340.0, ans=0.0
2024-09-18 21:50:13,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=15.0
2024-09-18 21:50:13,776 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.71 vs. limit=10.0
2024-09-18 21:50:19,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0
2024-09-18 21:50:20,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=529380.0, ans=0.125
2024-09-18 21:50:34,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=529420.0, ans=0.125
2024-09-18 21:51:00,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=529460.0, ans=6.0
2024-09-18 21:51:02,533 INFO [train.py:1198] (0/2) Epoch 30, batch 1150, loss[loss=0.2275, ctc_loss=0.1137, cr_loss=0.3515, attn_decoder_loss=0.2323, over 29441.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1212, cr_loss=0.3634, attn_decoder_loss=0.2425, over 5756979.78 frames. ], batch size: 78, lr: 3.70e-03, grad_scale: 8.0
2024-09-18 21:51:39,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=529580.0, ans=0.125
2024-09-18 21:51:47,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=529620.0, ans=0.025
2024-09-18 21:51:52,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5
2024-09-18 21:52:04,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=529660.0, ans=0.1
2024-09-18 21:52:10,742 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.491e+01 9.020e+01 1.005e+02 1.994e+02, threshold=1.804e+02, percent-clipped=1.0
2024-09-18 21:52:10,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=529660.0, ans=0.125
2024-09-18 21:52:21,418 INFO [train.py:1198] (0/2) Epoch 30, batch 1200, loss[loss=0.2451, ctc_loss=0.1233, cr_loss=0.3638, attn_decoder_loss=0.2506, over 29665.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1216, cr_loss=0.3641, attn_decoder_loss=0.2434, over 5749111.71 frames. ], batch size: 85, lr: 3.70e-03, grad_scale: 16.0
2024-09-18 21:52:26,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=529700.0, ans=0.0
2024-09-18 21:52:39,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.49 vs. limit=15.0
2024-09-18 21:52:43,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=529740.0, ans=0.0
2024-09-18 21:52:58,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=529780.0, ans=0.1
2024-09-18 21:53:07,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=529820.0, ans=0.1
2024-09-18 21:53:38,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529900.0, ans=0.1
2024-09-18 21:53:40,018 INFO [train.py:1198] (0/2) Epoch 30, batch 1250, loss[loss=0.2442, ctc_loss=0.1182, cr_loss=0.3644, attn_decoder_loss=0.2501, over 29511.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1216, cr_loss=0.3641, attn_decoder_loss=0.2439, over 5776062.80 frames. ], batch size: 92, lr: 3.70e-03, grad_scale: 16.0
2024-09-18 21:53:50,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0
2024-09-18 21:54:23,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=15.0
2024-09-18 21:54:45,398 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.456e+01 8.891e+01 9.459e+01 2.793e+02, threshold=1.778e+02, percent-clipped=2.0
2024-09-18 21:54:55,979 INFO [train.py:1198] (0/2) Epoch 30, batch 1300, loss[loss=0.2491, ctc_loss=0.1199, cr_loss=0.3664, attn_decoder_loss=0.2553, over 28477.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1215, cr_loss=0.3637, attn_decoder_loss=0.2433, over 5780801.92 frames. ], batch size: 112, lr: 3.70e-03, grad_scale: 16.0
2024-09-18 21:55:20,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=530140.0, ans=0.0
2024-09-18 21:55:47,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0
2024-09-18 21:56:02,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=530260.0, ans=0.1
2024-09-18 21:56:03,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=530260.0, ans=0.125
2024-09-18 21:56:14,324 INFO [train.py:1198] (0/2) Epoch 30, batch 1350, loss[loss=0.2421, ctc_loss=0.1218, cr_loss=0.3766, attn_decoder_loss=0.2471, over 29734.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1214, cr_loss=0.3636, attn_decoder_loss=0.2433, over 5797901.68 frames. ], batch size: 81, lr: 3.70e-03, grad_scale: 16.0
2024-09-18 21:56:44,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=530380.0, ans=0.0
2024-09-18 21:56:51,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5
2024-09-18 21:57:09,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=530420.0, ans=0.0
2024-09-18 21:57:12,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=530420.0, ans=0.2
2024-09-18 21:57:20,869 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 8.385e+01 8.847e+01 9.362e+01 1.529e+02, threshold=1.769e+02, percent-clipped=0.0
2024-09-18 21:57:30,064 INFO [train.py:1198] (0/2) Epoch 30, batch 1400, loss[loss=0.2149, ctc_loss=0.1105, cr_loss=0.3362, attn_decoder_loss=0.219, over 29564.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1215, cr_loss=0.3641, attn_decoder_loss=0.243, over 5808926.92 frames. ], batch size: 69, lr: 3.70e-03, grad_scale: 8.0
2024-09-18 21:57:37,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=530500.0, ans=0.125
2024-09-18 21:57:55,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=530540.0, ans=0.125
2024-09-18 21:58:02,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=530580.0, ans=0.125
2024-09-18 21:58:04,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=530580.0, ans=0.125
2024-09-18 21:58:25,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=530620.0, ans=0.2
2024-09-18 21:58:48,074 INFO [train.py:1198] (0/2) Epoch 30, batch 1450, loss[loss=0.2607, ctc_loss=0.1417, cr_loss=0.3979, attn_decoder_loss=0.2651, over 29436.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1218, cr_loss=0.3649, attn_decoder_loss=0.2436, over 5805950.89 frames. ], batch size: 94, lr: 3.70e-03, grad_scale: 8.0
2024-09-18 21:58:57,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=530700.0, ans=0.125
2024-09-18 21:59:20,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=530780.0, ans=0.2
2024-09-18 21:59:39,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=530820.0, ans=0.1
2024-09-18 21:59:56,666 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 8.746e+01 9.204e+01 9.882e+01 6.648e+02, threshold=1.841e+02, percent-clipped=1.0
2024-09-18 22:00:04,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=530900.0, ans=0.2
2024-09-18 22:00:05,880 INFO [train.py:1198] (0/2) Epoch 30, batch 1500, loss[loss=0.2405, ctc_loss=0.1209, cr_loss=0.3645, attn_decoder_loss=0.2457, over 29632.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.122, cr_loss=0.3654, attn_decoder_loss=0.2441, over 5805108.31 frames. ], batch size: 86, lr: 3.70e-03, grad_scale: 8.0
2024-09-18 22:00:10,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=530900.0, ans=0.125
2024-09-18 22:00:24,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.23 vs. limit=22.5
2024-09-18 22:00:42,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=530980.0, ans=0.125
2024-09-18 22:00:45,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=530980.0, ans=0.0
2024-09-18 22:00:52,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=531020.0, ans=0.125
2024-09-18 22:00:53,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=531020.0, ans=0.125
2024-09-18 22:01:02,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=531020.0, ans=0.1
2024-09-18 22:01:22,000 INFO [train.py:1198] (0/2) Epoch 30, batch 1550, loss[loss=0.253, ctc_loss=0.1356, cr_loss=0.4033, attn_decoder_loss=0.2571, over 29503.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1222, cr_loss=0.3656, attn_decoder_loss=0.2441, over 5780208.61 frames. ], batch size: 90, lr: 3.70e-03, grad_scale: 8.0
2024-09-18 22:01:45,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=531140.0, ans=0.0
2024-09-18 22:01:45,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=531140.0, ans=0.125
2024-09-18 22:02:00,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.75 vs. limit=22.5
2024-09-18 22:02:27,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=531260.0, ans=0.2
2024-09-18 22:02:28,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=531260.0, ans=0.0
2024-09-18 22:02:31,378 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.587e+01 9.165e+01 1.006e+02 3.566e+02, threshold=1.833e+02, percent-clipped=2.0
2024-09-18 22:02:40,583 INFO [train.py:1198] (0/2) Epoch 30, batch 1600, loss[loss=0.2434, ctc_loss=0.113, cr_loss=0.3379, attn_decoder_loss=0.2504, over 29679.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1222, cr_loss=0.3655, attn_decoder_loss=0.2438, over 5763362.81 frames. ], batch size: 85, lr: 3.70e-03, grad_scale: 16.0
2024-09-18 22:02:42,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=531300.0, ans=0.125
2024-09-18 22:02:53,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=531300.0, ans=0.125
2024-09-18 22:02:54,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=531340.0, ans=0.09899494936611666
2024-09-18 22:03:05,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=531340.0, ans=0.2
2024-09-18 22:03:15,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=531380.0, ans=0.125
2024-09-18 22:03:20,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=531380.0, ans=0.125
2024-09-18 22:03:23,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=531380.0, ans=0.125
2024-09-18 22:03:23,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=531380.0, ans=0.0
2024-09-18 22:03:49,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=531460.0, ans=0.125
2024-09-18 22:03:50,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=531460.0, ans=0.0
2024-09-18 22:03:58,188 INFO [train.py:1198] (0/2) Epoch 30, batch 1650, loss[loss=0.243, ctc_loss=0.1239, cr_loss=0.3647, attn_decoder_loss=0.2482, over 29726.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.122, cr_loss=0.3647, attn_decoder_loss=0.2435, over 5756722.27 frames. ], batch size: 89, lr: 3.70e-03, grad_scale: 16.0
2024-09-18 22:04:34,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=531580.0, ans=0.2
2024-09-18 22:04:45,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=531620.0, ans=0.125
2024-09-18 22:04:58,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=531660.0, ans=0.1
2024-09-18 22:05:04,521 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.547e+01 8.983e+01 9.697e+01 1.906e+02, threshold=1.797e+02, percent-clipped=1.0
2024-09-18 22:05:13,433 INFO [train.py:1198] (0/2) Epoch 30, batch 1700, loss[loss=0.2035, ctc_loss=0.09007, cr_loss=0.299, attn_decoder_loss=0.2094, over 29569.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1218, cr_loss=0.3643, attn_decoder_loss=0.2432, over 5778162.51 frames. ], batch size: 69, lr: 3.70e-03, grad_scale: 16.0
2024-09-18 22:05:15,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=531700.0, ans=0.125
2024-09-18 22:05:52,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=531780.0, ans=0.0
2024-09-18 22:06:05,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.36 vs. limit=10.0
2024-09-18 22:06:08,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=531820.0, ans=0.125
2024-09-18 22:06:31,438 INFO [train.py:1198] (0/2) Epoch 30, batch 1750, loss[loss=0.2125, ctc_loss=0.1105, cr_loss=0.3454, attn_decoder_loss=0.2162, over 29329.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1215, cr_loss=0.3641, attn_decoder_loss=0.2428, over 5786631.16 frames. ], batch size: 67, lr: 3.70e-03, grad_scale: 16.0
2024-09-18 22:06:55,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.69 vs. limit=22.5
2024-09-18 22:07:13,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=531980.0, ans=0.125
2024-09-18 22:07:17,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=532020.0, ans=0.07
2024-09-18 22:07:28,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=532020.0, ans=0.125
2024-09-18 22:07:40,551 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.324e+01 8.730e+01 9.634e+01 1.252e+02, threshold=1.746e+02, percent-clipped=0.0
2024-09-18 22:07:46,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=532060.0, ans=0.125
2024-09-18 22:07:47,308 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.00 vs. limit=10.0
2024-09-18 22:07:48,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=532100.0, ans=0.025
2024-09-18 22:07:49,663 INFO [train.py:1198] (0/2) Epoch 30, batch 1800, loss[loss=0.2476, ctc_loss=0.129, cr_loss=0.38, attn_decoder_loss=0.2523, over 29698.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1215, cr_loss=0.3639, attn_decoder_loss=0.243, over 5788954.96 frames. ], batch size: 83, lr: 3.70e-03, grad_scale: 16.0
2024-09-18 22:07:53,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.88 vs. limit=15.0
2024-09-18 22:07:55,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.26 vs. limit=10.0
2024-09-18 22:07:56,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=532100.0, ans=0.125
2024-09-18 22:07:56,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=532100.0, ans=0.0
2024-09-18 22:08:31,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=532180.0, ans=0.125
2024-09-18 22:08:52,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0
2024-09-18 22:09:05,418 INFO [train.py:1198] (0/2) Epoch 30, batch 1850, loss[loss=0.2474, ctc_loss=0.1161, cr_loss=0.3613, attn_decoder_loss=0.254, over 29632.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1216, cr_loss=0.3643, attn_decoder_loss=0.243, over 5795266.65 frames. ], batch size: 86, lr: 3.69e-03, grad_scale: 8.0
2024-09-18 22:09:08,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=532300.0, ans=0.125
2024-09-18 22:09:14,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532300.0, ans=0.1
2024-09-18 22:09:31,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=532340.0, ans=0.0
2024-09-18 22:09:56,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=532420.0, ans=0.125
2024-09-18 22:10:06,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=532460.0, ans=0.2
2024-09-18 22:10:15,372 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.313e+01 8.937e+01 9.417e+01 1.433e+02, threshold=1.787e+02, percent-clipped=0.0
2024-09-18 22:10:22,899 INFO [train.py:1198] (0/2) Epoch 30, batch 1900, loss[loss=0.2487, ctc_loss=0.131, cr_loss=0.3821, attn_decoder_loss=0.2533, over 29701.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1217, cr_loss=0.3641, attn_decoder_loss=0.2435, over 5802692.56 frames. ], batch size: 89, lr: 3.69e-03, grad_scale: 8.0
2024-09-18 22:10:24,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=532500.0, ans=0.0
2024-09-18 22:10:38,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=532540.0, ans=0.2
2024-09-18 22:10:42,397 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0
2024-09-18 22:10:44,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=532540.0, ans=0.125
2024-09-18 22:11:10,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=532620.0, ans=0.0
2024-09-18 22:11:27,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=532660.0, ans=0.125
2024-09-18 22:11:32,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532660.0, ans=0.1
2024-09-18 22:11:39,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.79 vs. limit=15.0
2024-09-18 22:11:41,039 INFO [train.py:1198] (0/2) Epoch 30, batch 1950, loss[loss=0.2299, ctc_loss=0.1115, cr_loss=0.3412, attn_decoder_loss=0.2354, over 29450.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1224, cr_loss=0.3665, attn_decoder_loss=0.2447, over 5817507.78 frames. ], batch size: 78, lr: 3.69e-03, grad_scale: 8.0
2024-09-18 22:12:05,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=532740.0, ans=0.125
2024-09-18 22:12:09,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=532780.0, ans=0.125
2024-09-18 22:12:11,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=532780.0, ans=0.0
2024-09-18 22:12:25,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=532820.0, ans=0.125
2024-09-18 22:12:34,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=532820.0, ans=0.125
2024-09-18 22:12:46,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=532860.0, ans=0.0
2024-09-18 22:12:46,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=532860.0, ans=0.0
2024-09-18 22:12:49,465 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.712e+01 9.082e+01 9.590e+01 8.305e+02, threshold=1.816e+02, percent-clipped=1.0
2024-09-18 22:12:56,994 INFO [train.py:1198] (0/2) Epoch 30, batch 2000, loss[loss=0.2152, ctc_loss=0.102, cr_loss=0.339, attn_decoder_loss=0.2203, over 29332.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1231, cr_loss=0.3672, attn_decoder_loss=0.245, over 5794262.31 frames. ], batch size: 67, lr: 3.69e-03, grad_scale: 16.0
2024-09-18 22:13:16,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=532940.0, ans=0.5
2024-09-18 22:13:18,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=532940.0, ans=0.2
2024-09-18 22:13:18,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0
2024-09-18 22:13:31,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=532980.0, ans=0.05
2024-09-18 22:13:37,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=532980.0, ans=0.125
2024-09-18 22:13:39,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532980.0, ans=0.1
2024-09-18 22:13:53,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=15.0
2024-09-18 22:13:55,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=533020.0, ans=0.125
2024-09-18 22:14:01,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=533060.0, ans=0.0
2024-09-18 22:14:15,044 INFO [train.py:1198] (0/2) Epoch 30, batch 2050, loss[loss=0.2141, ctc_loss=0.1045, cr_loss=0.3384, attn_decoder_loss=0.2187, over 29419.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1226, cr_loss=0.3658, attn_decoder_loss=0.2441, over 5787038.28 frames. ], batch size: 70, lr: 3.69e-03, grad_scale: 8.0
2024-09-18 22:14:40,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.89 vs. limit=15.0
2024-09-18 22:14:54,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=533180.0, ans=0.125
2024-09-18 22:15:02,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=533220.0, ans=0.2
2024-09-18 22:15:03,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=533220.0, ans=0.0
2024-09-18 22:15:05,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=533220.0, ans=0.025
2024-09-18 22:15:26,831 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 8.326e+01 8.954e+01 9.784e+01 1.550e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-18 22:15:31,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=533300.0, ans=0.125
2024-09-18 22:15:33,017 INFO [train.py:1198] (0/2) Epoch 30, batch 2100, loss[loss=0.2313, ctc_loss=0.1137, cr_loss=0.3377, attn_decoder_loss=0.2369, over 29743.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1213, cr_loss=0.3633, attn_decoder_loss=0.2431, over 5799197.02 frames. ], batch size: 81, lr: 3.69e-03, grad_scale: 8.0
2024-09-18 22:15:34,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=533300.0, ans=0.125
2024-09-18 22:15:45,807 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.79 vs. limit=15.0
2024-09-18 22:15:47,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0
2024-09-18 22:16:06,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=533380.0, ans=0.0
2024-09-18 22:16:34,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=533460.0, ans=0.0
2024-09-18 22:16:48,303 INFO [train.py:1198] (0/2) Epoch 30, batch 2150, loss[loss=0.2259, ctc_loss=0.1135, cr_loss=0.3482, attn_decoder_loss=0.2306, over 29449.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.121, cr_loss=0.3635, attn_decoder_loss=0.2425, over 5814478.71 frames. ], batch size: 78, lr: 3.69e-03, grad_scale: 8.0
2024-09-18 22:17:13,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533540.0, ans=0.1
2024-09-18 22:17:21,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=533580.0, ans=0.0
2024-09-18 22:17:21,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5
2024-09-18 22:17:33,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=533580.0, ans=0.025
2024-09-18 22:17:33,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=533580.0, ans=0.0
2024-09-18 22:17:45,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=533620.0, ans=0.025
2024-09-18 22:17:54,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=533660.0, ans=0.125
2024-09-18 22:18:00,805 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.351e+01 8.919e+01 9.432e+01 1.434e+02, threshold=1.784e+02, percent-clipped=0.0
2024-09-18 22:18:07,002 INFO [train.py:1198] (0/2) Epoch 30, batch 2200, loss[loss=0.247, ctc_loss=0.126, cr_loss=0.3574, attn_decoder_loss=0.2525, over 29639.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1211, cr_loss=0.3635, attn_decoder_loss=0.2426, over 5812166.69 frames. ], batch size: 86, lr: 3.69e-03, grad_scale: 8.0
2024-09-18 22:18:30,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=533740.0, ans=0.125
2024-09-18 22:18:30,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533740.0, ans=0.1
2024-09-18 22:18:30,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533740.0, ans=0.1
2024-09-18 22:18:37,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=533780.0, ans=0.0
2024-09-18 22:19:00,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=533820.0, ans=0.0
2024-09-18 22:19:01,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=533820.0, ans=0.125
2024-09-18 22:19:03,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=533820.0, ans=0.0
2024-09-18 22:19:14,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=533860.0, ans=0.1
2024-09-18 22:19:24,812 INFO [train.py:1198] (0/2) Epoch 30, batch 2250, loss[loss=0.2404, ctc_loss=0.1233, cr_loss=0.371, attn_decoder_loss=0.2452, over 29718.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1212, cr_loss=0.3633, attn_decoder_loss=0.2426, over 5811729.32 frames.
], batch size: 82, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:19:29,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=533900.0, ans=0.0 2024-09-18 22:19:30,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.46 vs. limit=15.0 2024-09-18 22:19:51,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=533940.0, ans=0.125 2024-09-18 22:19:54,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=533980.0, ans=0.0 2024-09-18 22:19:55,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533980.0, ans=0.1 2024-09-18 22:19:55,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=533980.0, ans=0.1 2024-09-18 22:19:58,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=533980.0, ans=0.125 2024-09-18 22:20:00,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=533980.0, ans=0.0 2024-09-18 22:20:01,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533980.0, ans=0.1 2024-09-18 22:20:17,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.84 vs. 
limit=15.0 2024-09-18 22:20:19,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=534020.0, ans=0.0 2024-09-18 22:20:21,437 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:20:24,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=534060.0, ans=0.125 2024-09-18 22:20:34,700 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 8.443e+01 9.095e+01 9.654e+01 4.299e+02, threshold=1.819e+02, percent-clipped=1.0 2024-09-18 22:20:39,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=534100.0, ans=0.0 2024-09-18 22:20:40,901 INFO [train.py:1198] (0/2) Epoch 30, batch 2300, loss[loss=0.2162, ctc_loss=0.1027, cr_loss=0.3189, attn_decoder_loss=0.2218, over 29331.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1207, cr_loss=0.3624, attn_decoder_loss=0.242, over 5798687.90 frames. ], batch size: 71, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:20:47,997 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. 
limit=6.0 2024-09-18 22:20:50,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=534100.0, ans=0.125 2024-09-18 22:21:22,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534180.0, ans=0.1 2024-09-18 22:21:28,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=534220.0, ans=0.0 2024-09-18 22:21:33,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=534220.0, ans=0.125 2024-09-18 22:21:54,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=534260.0, ans=0.125 2024-09-18 22:21:58,861 INFO [train.py:1198] (0/2) Epoch 30, batch 2350, loss[loss=0.2548, ctc_loss=0.135, cr_loss=0.3788, attn_decoder_loss=0.2597, over 29704.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1212, cr_loss=0.363, attn_decoder_loss=0.2424, over 5804636.73 frames. ], batch size: 83, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:21:59,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=534300.0, ans=0.125 2024-09-18 22:22:08,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2024-09-18 22:22:10,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.97 vs. 
limit=15.0 2024-09-18 22:22:29,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=534380.0, ans=0.0 2024-09-18 22:22:32,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=534380.0, ans=0.025 2024-09-18 22:22:38,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=534380.0, ans=15.0 2024-09-18 22:22:40,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=534380.0, ans=0.04949747468305833 2024-09-18 22:22:49,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=534420.0, ans=0.04949747468305833 2024-09-18 22:23:03,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=534460.0, ans=0.2 2024-09-18 22:23:03,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=534460.0, ans=0.2 2024-09-18 22:23:08,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=534460.0, ans=0.0 2024-09-18 22:23:11,011 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.566e+01 9.166e+01 9.835e+01 1.994e+02, threshold=1.833e+02, percent-clipped=1.0 2024-09-18 22:23:11,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=534460.0, ans=0.025 2024-09-18 22:23:17,299 INFO [train.py:1198] (0/2) Epoch 30, batch 2400, loss[loss=0.2293, ctc_loss=0.1124, cr_loss=0.3417, attn_decoder_loss=0.2347, over 29545.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1214, cr_loss=0.3636, attn_decoder_loss=0.2426, over 5808832.62 frames. 
], batch size: 76, lr: 3.69e-03, grad_scale: 16.0 2024-09-18 22:23:28,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=534500.0, ans=0.1 2024-09-18 22:23:32,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=534540.0, ans=0.125 2024-09-18 22:23:46,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=534580.0, ans=0.95 2024-09-18 22:23:51,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=534580.0, ans=0.0 2024-09-18 22:24:00,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.12 vs. limit=15.0 2024-09-18 22:24:13,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=534620.0, ans=0.2 2024-09-18 22:24:33,295 INFO [train.py:1198] (0/2) Epoch 30, batch 2450, loss[loss=0.2363, ctc_loss=0.1187, cr_loss=0.3744, attn_decoder_loss=0.2411, over 29711.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1223, cr_loss=0.3648, attn_decoder_loss=0.2438, over 5784680.72 frames. ], batch size: 82, lr: 3.69e-03, grad_scale: 16.0 2024-09-18 22:24:35,693 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.71 vs. 
limit=22.5 2024-09-18 22:24:57,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=534740.0, ans=0.125 2024-09-18 22:25:10,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=534780.0, ans=0.0 2024-09-18 22:25:31,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2024-09-18 22:25:40,928 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.74 vs. limit=6.0 2024-09-18 22:25:44,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.707e+01 9.216e+01 9.749e+01 1.884e+02, threshold=1.843e+02, percent-clipped=1.0 2024-09-18 22:25:50,707 INFO [train.py:1198] (0/2) Epoch 30, batch 2500, loss[loss=0.2474, ctc_loss=0.1234, cr_loss=0.3633, attn_decoder_loss=0.2531, over 29616.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1223, cr_loss=0.3647, attn_decoder_loss=0.2438, over 5795687.74 frames. 
], batch size: 86, lr: 3.69e-03, grad_scale: 16.0 2024-09-18 22:25:59,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=534900.0, ans=0.125 2024-09-18 22:26:07,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=534940.0, ans=0.0 2024-09-18 22:26:15,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=534940.0, ans=0.0 2024-09-18 22:26:16,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=534940.0, ans=0.125 2024-09-18 22:26:28,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=534980.0, ans=0.125 2024-09-18 22:26:48,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=535020.0, ans=0.125 2024-09-18 22:27:01,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=535060.0, ans=0.0 2024-09-18 22:27:04,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=535060.0, ans=0.0 2024-09-18 22:27:09,176 INFO [train.py:1198] (0/2) Epoch 30, batch 2550, loss[loss=0.2108, ctc_loss=0.1015, cr_loss=0.3205, attn_decoder_loss=0.2158, over 29387.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1224, cr_loss=0.3655, attn_decoder_loss=0.2439, over 5798796.41 frames. ], batch size: 67, lr: 3.69e-03, grad_scale: 8.0 2024-09-18 22:27:18,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.39 vs. 
limit=15.0 2024-09-18 22:27:19,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=535100.0, ans=0.2 2024-09-18 22:28:10,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535260.0, ans=0.1 2024-09-18 22:28:19,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=15.0 2024-09-18 22:28:20,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.542e+01 9.079e+01 9.680e+01 2.807e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-18 22:28:22,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535260.0, ans=0.1 2024-09-18 22:28:25,097 INFO [train.py:1198] (0/2) Epoch 30, batch 2600, loss[loss=0.2332, ctc_loss=0.1232, cr_loss=0.3756, attn_decoder_loss=0.2371, over 29470.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1222, cr_loss=0.3652, attn_decoder_loss=0.244, over 5795471.67 frames. 
], batch size: 78, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:28:31,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=535300.0, ans=0.1 2024-09-18 22:28:32,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=535300.0, ans=0.09899494936611666 2024-09-18 22:28:46,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=535340.0, ans=0.2 2024-09-18 22:28:56,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=535380.0, ans=0.125 2024-09-18 22:29:06,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=535380.0, ans=0.0 2024-09-18 22:29:18,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=535420.0, ans=0.1 2024-09-18 22:29:24,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535420.0, ans=0.1 2024-09-18 22:29:36,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=535460.0, ans=0.125 2024-09-18 22:29:37,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.95 vs. 
limit=12.0 2024-09-18 22:29:38,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=535460.0, ans=0.025 2024-09-18 22:29:41,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=535500.0, ans=0.09899494936611666 2024-09-18 22:29:42,492 INFO [train.py:1198] (0/2) Epoch 30, batch 2650, loss[loss=0.2548, ctc_loss=0.1307, cr_loss=0.3949, attn_decoder_loss=0.2598, over 29231.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1223, cr_loss=0.3662, attn_decoder_loss=0.2445, over 5802875.23 frames. ], batch size: 100, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:30:06,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=535540.0, ans=0.125 2024-09-18 22:30:06,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535540.0, ans=0.1 2024-09-18 22:30:27,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=535620.0, ans=0.025 2024-09-18 22:30:30,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=535620.0, ans=0.125 2024-09-18 22:30:38,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=535620.0, ans=0.125 2024-09-18 22:30:55,370 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.677e+01 9.089e+01 9.646e+01 4.909e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-18 22:31:00,015 INFO [train.py:1198] (0/2) Epoch 30, batch 2700, loss[loss=0.2453, ctc_loss=0.1213, cr_loss=0.36, attn_decoder_loss=0.2511, over 29529.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1225, cr_loss=0.3663, attn_decoder_loss=0.2448, over 5798336.44 frames. 
], batch size: 87, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:31:03,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=535700.0, ans=0.125 2024-09-18 22:31:03,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=535700.0, ans=0.125 2024-09-18 22:31:27,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=535740.0, ans=0.025 2024-09-18 22:31:34,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535780.0, ans=0.1 2024-09-18 22:31:42,984 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-09-18 22:31:56,446 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:32:00,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=535860.0, ans=0.2 2024-09-18 22:32:15,740 INFO [train.py:1198] (0/2) Epoch 30, batch 2750, loss[loss=0.2235, ctc_loss=0.1093, cr_loss=0.332, attn_decoder_loss=0.2288, over 29493.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1216, cr_loss=0.3645, attn_decoder_loss=0.2437, over 5797070.69 frames. 
], batch size: 75, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:32:22,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=535900.0, ans=0.025 2024-09-18 22:32:49,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=535980.0, ans=0.1 2024-09-18 22:33:10,363 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2024-09-18 22:33:12,251 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5 2024-09-18 22:33:14,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=536020.0, ans=0.125 2024-09-18 22:33:15,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.68 vs. limit=15.0 2024-09-18 22:33:29,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.488e+01 8.995e+01 9.694e+01 2.537e+02, threshold=1.799e+02, percent-clipped=1.0 2024-09-18 22:33:34,348 INFO [train.py:1198] (0/2) Epoch 30, batch 2800, loss[loss=0.2636, ctc_loss=0.1532, cr_loss=0.3792, attn_decoder_loss=0.2674, over 20761.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1221, cr_loss=0.365, attn_decoder_loss=0.2438, over 5779949.49 frames. 
], batch size: 209, lr: 3.68e-03, grad_scale: 16.0 2024-09-18 22:34:07,754 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:34:15,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=536180.0, ans=0.2 2024-09-18 22:34:38,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0 2024-09-18 22:34:47,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=536260.0, ans=0.125 2024-09-18 22:34:51,679 INFO [train.py:1198] (0/2) Epoch 30, batch 2850, loss[loss=0.2317, ctc_loss=0.1125, cr_loss=0.3514, attn_decoder_loss=0.2371, over 29492.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1224, cr_loss=0.3654, attn_decoder_loss=0.2441, over 5765824.64 frames. ], batch size: 77, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:34:54,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2024-09-18 22:35:08,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.89 vs. 
limit=15.0 2024-09-18 22:35:08,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=536340.0, ans=0.0 2024-09-18 22:35:10,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=536340.0, ans=0.0 2024-09-18 22:35:26,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=536380.0, ans=10.0 2024-09-18 22:35:28,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=536380.0, ans=0.025 2024-09-18 22:35:37,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=536420.0, ans=0.125 2024-09-18 22:36:04,435 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.865e+01 8.533e+01 9.000e+01 9.896e+01 2.723e+02, threshold=1.800e+02, percent-clipped=1.0 2024-09-18 22:36:07,490 INFO [train.py:1198] (0/2) Epoch 30, batch 2900, loss[loss=0.2376, ctc_loss=0.1235, cr_loss=0.3767, attn_decoder_loss=0.2419, over 29409.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1227, cr_loss=0.3666, attn_decoder_loss=0.2449, over 5790456.80 frames. 
], batch size: 79, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:36:10,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=536500.0, ans=0.125 2024-09-18 22:36:18,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=536500.0, ans=0.07 2024-09-18 22:36:21,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=536540.0, ans=0.0 2024-09-18 22:36:30,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=536540.0, ans=0.125 2024-09-18 22:36:44,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=536580.0, ans=0.1 2024-09-18 22:36:50,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=536580.0, ans=0.02 2024-09-18 22:37:01,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=536620.0, ans=0.025 2024-09-18 22:37:11,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5 2024-09-18 22:37:25,628 INFO [train.py:1198] (0/2) Epoch 30, batch 2950, loss[loss=0.2142, ctc_loss=0.1101, cr_loss=0.3293, attn_decoder_loss=0.2184, over 29530.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1222, cr_loss=0.3651, attn_decoder_loss=0.2438, over 5784616.09 frames. ], batch size: 75, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:37:52,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.04 vs. 
limit=15.0 2024-09-18 22:37:58,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=536780.0, ans=0.125 2024-09-18 22:38:41,262 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.475e+01 9.079e+01 9.790e+01 2.714e+02, threshold=1.816e+02, percent-clipped=3.0 2024-09-18 22:38:43,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=536900.0, ans=0.125 2024-09-18 22:38:44,466 INFO [train.py:1198] (0/2) Epoch 30, batch 3000, loss[loss=0.2408, ctc_loss=0.1235, cr_loss=0.3704, attn_decoder_loss=0.2456, over 29740.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1221, cr_loss=0.3649, attn_decoder_loss=0.2437, over 5784887.35 frames. ], batch size: 81, lr: 3.68e-03, grad_scale: 8.0 2024-09-18 22:38:44,467 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 22:38:53,643 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4863, 4.0234, 4.3950, 3.9746], device='cuda:0') 2024-09-18 22:38:56,198 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.5319, 4.0698, 4.1271, 4.1949], device='cuda:0') 2024-09-18 22:39:02,889 INFO [train.py:1230] (0/2) Epoch 30, validation: loss=0.2118, ctc_loss=0.03796, cr_loss=5.626e-15, attn_decoder_loss=0.2311, over 944034.00 frames. 
2024-09-18 22:39:02,889 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 22:39:03,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=536900.0, ans=0.125 2024-09-18 22:39:38,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=536980.0, ans=0.125 2024-09-18 22:39:41,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=536980.0, ans=0.0 2024-09-18 22:39:44,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=536980.0, ans=0.125 2024-09-18 22:39:55,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=537020.0, ans=0.125 2024-09-18 22:40:14,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=537060.0, ans=0.0 2024-09-18 22:40:16,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=537060.0, ans=0.125 2024-09-18 22:40:19,085 INFO [train.py:1198] (0/2) Epoch 30, batch 3050, loss[loss=0.2295, ctc_loss=0.1135, cr_loss=0.3567, attn_decoder_loss=0.2345, over 29534.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1223, cr_loss=0.3648, attn_decoder_loss=0.2442, over 5778863.30 frames. 
], batch size: 76, lr: 3.68e-03, grad_scale: 8.0
2024-09-18 22:40:20,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=537100.0, ans=0.09899494936611666
2024-09-18 22:40:32,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=537140.0, ans=0.035
2024-09-18 22:40:34,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=537140.0, ans=0.0
2024-09-18 22:40:56,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537180.0, ans=0.1
2024-09-18 22:41:02,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537180.0, ans=0.1
2024-09-18 22:41:10,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=537220.0, ans=0.0
2024-09-18 22:41:23,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=537260.0, ans=0.2
2024-09-18 22:41:33,829 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 8.461e+01 8.902e+01 9.446e+01 1.923e+02, threshold=1.780e+02, percent-clipped=1.0
2024-09-18 22:41:36,775 INFO [train.py:1198] (0/2) Epoch 30, batch 3100, loss[loss=0.2524, ctc_loss=0.1265, cr_loss=0.382, attn_decoder_loss=0.2579, over 29249.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1222, cr_loss=0.3648, attn_decoder_loss=0.2438, over 5778206.77 frames. ], batch size: 100, lr: 3.68e-03, grad_scale: 8.0
2024-09-18 22:41:49,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.94 vs. limit=15.0
2024-09-18 22:41:49,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.42 vs. limit=15.0
2024-09-18 22:41:55,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537340.0, ans=0.1
2024-09-18 22:41:55,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=537340.0, ans=0.0
2024-09-18 22:42:05,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=537380.0, ans=0.5
2024-09-18 22:42:20,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=537380.0, ans=0.0
2024-09-18 22:42:33,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=537420.0, ans=0.2
2024-09-18 22:42:36,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.65 vs. limit=5.0
2024-09-18 22:42:54,847 INFO [train.py:1198] (0/2) Epoch 30, batch 3150, loss[loss=0.2538, ctc_loss=0.1314, cr_loss=0.3983, attn_decoder_loss=0.2585, over 28799.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1219, cr_loss=0.3642, attn_decoder_loss=0.2438, over 5783969.00 frames. ], batch size: 104, lr: 3.68e-03, grad_scale: 8.0
2024-09-18 22:43:10,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=537540.0, ans=0.025
2024-09-18 22:43:22,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=537540.0, ans=0.0
2024-09-18 22:43:26,218 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5
2024-09-18 22:44:00,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=537660.0, ans=0.2
2024-09-18 22:44:06,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=537660.0, ans=0.09899494936611666
2024-09-18 22:44:07,778 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.378e+01 8.875e+01 9.441e+01 1.254e+02, threshold=1.775e+02, percent-clipped=0.0
2024-09-18 22:44:10,848 INFO [train.py:1198] (0/2) Epoch 30, batch 3200, loss[loss=0.2406, ctc_loss=0.1215, cr_loss=0.3807, attn_decoder_loss=0.2453, over 29787.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1217, cr_loss=0.364, attn_decoder_loss=0.2435, over 5793872.66 frames. ], batch size: 80, lr: 3.68e-03, grad_scale: 16.0
2024-09-18 22:44:16,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2024-09-18 22:44:31,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=537740.0, ans=0.0
2024-09-18 22:44:37,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=537740.0, ans=0.125
2024-09-18 22:44:53,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=537780.0, ans=0.125
2024-09-18 22:44:57,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=537820.0, ans=0.1
2024-09-18 22:45:01,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0
2024-09-18 22:45:21,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.83 vs. limit=22.5
2024-09-18 22:45:28,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=537900.0, ans=0.2
2024-09-18 22:45:29,319 INFO [train.py:1198] (0/2) Epoch 30, batch 3250, loss[loss=0.2497, ctc_loss=0.1224, cr_loss=0.3674, attn_decoder_loss=0.2557, over 29696.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1221, cr_loss=0.3651, attn_decoder_loss=0.2439, over 5800578.72 frames. ], batch size: 84, lr: 3.68e-03, grad_scale: 8.0
2024-09-18 22:45:29,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0
2024-09-18 22:45:38,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=537900.0, ans=0.0
2024-09-18 22:45:55,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=537940.0, ans=0.125
2024-09-18 22:46:05,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=537980.0, ans=0.2
2024-09-18 22:46:42,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=538060.0, ans=0.0
2024-09-18 22:46:45,605 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.587e+01 8.918e+01 9.595e+01 3.976e+02, threshold=1.784e+02, percent-clipped=1.0
2024-09-18 22:46:45,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=538100.0, ans=0.0
2024-09-18 22:46:47,126 INFO [train.py:1198] (0/2) Epoch 30, batch 3300, loss[loss=0.2453, ctc_loss=0.1198, cr_loss=0.3585, attn_decoder_loss=0.2513, over 28537.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1211, cr_loss=0.3632, attn_decoder_loss=0.2426, over 5798033.94 frames. ], batch size: 111, lr: 3.68e-03, grad_scale: 8.0
2024-09-18 22:47:01,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=538140.0, ans=0.125
2024-09-18 22:47:05,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=538140.0, ans=0.0
2024-09-18 22:47:10,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538140.0, ans=0.1
2024-09-18 22:47:17,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=538180.0, ans=0.125
2024-09-18 22:47:19,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.63 vs. limit=22.5
2024-09-18 22:47:21,736 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0
2024-09-18 22:47:25,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=538180.0, ans=0.0
2024-09-18 22:47:40,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=538220.0, ans=0.125
2024-09-18 22:47:40,399 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 22:47:55,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=538260.0, ans=0.0
2024-09-18 22:48:02,461 INFO [train.py:1198] (0/2) Epoch 30, batch 3350, loss[loss=0.2502, ctc_loss=0.1292, cr_loss=0.3852, attn_decoder_loss=0.255, over 28747.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1219, cr_loss=0.3644, attn_decoder_loss=0.2434, over 5774617.31 frames. ], batch size: 104, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 22:48:33,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=538380.0, ans=0.0
2024-09-18 22:48:41,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=538380.0, ans=0.07
2024-09-18 22:49:01,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=538420.0, ans=0.125
2024-09-18 22:49:11,141 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=15.0
2024-09-18 22:49:19,305 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.711e+01 9.247e+01 9.714e+01 4.351e+02, threshold=1.849e+02, percent-clipped=3.0
2024-09-18 22:49:19,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=538500.0, ans=0.125
2024-09-18 22:49:20,832 INFO [train.py:1198] (0/2) Epoch 30, batch 3400, loss[loss=0.2146, ctc_loss=0.107, cr_loss=0.3354, attn_decoder_loss=0.2191, over 29338.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1217, cr_loss=0.3642, attn_decoder_loss=0.2433, over 5767949.64 frames. ], batch size: 67, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 22:49:33,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=538500.0, ans=0.04949747468305833
2024-09-18 22:50:12,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.92 vs. limit=22.5
2024-09-18 22:50:19,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=538620.0, ans=0.125
2024-09-18 22:50:22,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=538660.0, ans=0.125
2024-09-18 22:50:38,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.57 vs. limit=22.5
2024-09-18 22:50:38,778 INFO [train.py:1198] (0/2) Epoch 30, batch 3450, loss[loss=0.2516, ctc_loss=0.1288, cr_loss=0.3639, attn_decoder_loss=0.2572, over 28277.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1218, cr_loss=0.3641, attn_decoder_loss=0.2435, over 5775385.41 frames. ], batch size: 111, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 22:50:43,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=538700.0, ans=0.0
2024-09-18 22:50:46,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.86 vs. limit=15.0
2024-09-18 22:51:05,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.48 vs. limit=15.0
2024-09-18 22:51:35,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=538820.0, ans=0.125
2024-09-18 22:51:52,847 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.505e+01 9.176e+01 9.588e+01 2.343e+02, threshold=1.835e+02, percent-clipped=1.0
2024-09-18 22:51:54,378 INFO [train.py:1198] (0/2) Epoch 30, batch 3500, loss[loss=0.2258, ctc_loss=0.1123, cr_loss=0.3466, attn_decoder_loss=0.2307, over 29337.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1215, cr_loss=0.364, attn_decoder_loss=0.2432, over 5777113.79 frames. ], batch size: 71, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 22:51:56,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=538900.0, ans=0.125
2024-09-18 22:52:08,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=538940.0, ans=0.0
2024-09-18 22:52:55,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=539060.0, ans=0.0
2024-09-18 22:53:00,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=539060.0, ans=0.125
2024-09-18 22:53:07,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=539060.0, ans=0.0
2024-09-18 22:53:08,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=539060.0, ans=0.125
2024-09-18 22:53:11,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.26 vs. limit=10.0
2024-09-18 22:53:11,561 INFO [train.py:1198] (0/2) Epoch 30, batch 3550, loss[loss=0.2456, ctc_loss=0.1216, cr_loss=0.3781, attn_decoder_loss=0.251, over 29701.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1216, cr_loss=0.3645, attn_decoder_loss=0.2436, over 5783166.84 frames. ], batch size: 89, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 22:53:48,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=539180.0, ans=10.0
2024-09-18 22:54:09,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.28 vs. limit=15.0
2024-09-18 22:54:14,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=539260.0, ans=0.125
2024-09-18 22:54:26,534 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.421e+01 8.886e+01 9.459e+01 1.383e+02, threshold=1.777e+02, percent-clipped=0.0
2024-09-18 22:54:28,091 INFO [train.py:1198] (0/2) Epoch 30, batch 3600, loss[loss=0.2211, ctc_loss=0.1068, cr_loss=0.3317, attn_decoder_loss=0.2264, over 29479.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1213, cr_loss=0.3642, attn_decoder_loss=0.2434, over 5792451.98 frames. ], batch size: 77, lr: 3.67e-03, grad_scale: 16.0
2024-09-18 22:54:29,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=539300.0, ans=0.025
2024-09-18 22:54:41,760 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 22:54:47,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=539340.0, ans=0.2
2024-09-18 22:55:25,381 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0
2024-09-18 22:55:26,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=539460.0, ans=0.2
2024-09-18 22:55:29,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=539460.0, ans=0.125
2024-09-18 22:55:29,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=539460.0, ans=0.125
2024-09-18 22:55:42,275 INFO [train.py:1198] (0/2) Epoch 30, batch 3650, loss[loss=0.25, ctc_loss=0.125, cr_loss=0.3825, attn_decoder_loss=0.2554, over 29510.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.121, cr_loss=0.3634, attn_decoder_loss=0.243, over 5794866.27 frames. ], batch size: 90, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 22:55:42,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=539500.0, ans=0.125
2024-09-18 22:55:43,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=539500.0, ans=0.125
2024-09-18 22:55:57,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=539540.0, ans=0.0
2024-09-18 22:56:02,291 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0
2024-09-18 22:56:10,176 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=15.0
2024-09-18 22:56:18,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=539580.0, ans=0.125
2024-09-18 22:56:22,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=539580.0, ans=0.2
2024-09-18 22:56:56,905 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.540e+01 9.082e+01 9.609e+01 1.779e+02, threshold=1.816e+02, percent-clipped=1.0
2024-09-18 22:56:56,927 INFO [train.py:1198] (0/2) Epoch 30, batch 3700, loss[loss=0.2544, ctc_loss=0.131, cr_loss=0.3829, attn_decoder_loss=0.2596, over 29680.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.121, cr_loss=0.363, attn_decoder_loss=0.2431, over 5803902.89 frames. ], batch size: 84, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 22:57:01,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=539700.0, ans=0.05
2024-09-18 22:57:08,266 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0
2024-09-18 22:57:10,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=539740.0, ans=0.125
2024-09-18 22:57:15,613 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 22:57:21,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=539740.0, ans=0.0
2024-09-18 22:57:26,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.30 vs. limit=15.0
2024-09-18 22:57:27,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=539780.0, ans=0.125
2024-09-18 22:57:29,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.96 vs. limit=15.0
2024-09-18 22:57:36,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.91 vs. limit=10.0
2024-09-18 22:58:07,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=539860.0, ans=0.0
2024-09-18 22:58:11,625 INFO [train.py:1198] (0/2) Epoch 30, batch 3750, loss[loss=0.2169, ctc_loss=0.1089, cr_loss=0.3235, attn_decoder_loss=0.2217, over 29344.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.121, cr_loss=0.3635, attn_decoder_loss=0.2432, over 5808013.38 frames. ], batch size: 67, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 22:58:15,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.19 vs. limit=22.5
2024-09-18 22:58:29,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=539940.0, ans=0.2
2024-09-18 22:58:34,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=539940.0, ans=0.125
2024-09-18 22:58:41,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=539980.0, ans=0.0
2024-09-18 22:58:45,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=539980.0, ans=0.0
2024-09-18 22:59:00,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.81 vs. limit=12.0
2024-09-18 22:59:21,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=540060.0, ans=0.0
2024-09-18 22:59:28,014 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.475e+01 8.896e+01 9.603e+01 2.511e+02, threshold=1.779e+02, percent-clipped=2.0
2024-09-18 22:59:28,041 INFO [train.py:1198] (0/2) Epoch 30, batch 3800, loss[loss=0.2523, ctc_loss=0.1307, cr_loss=0.3797, attn_decoder_loss=0.2573, over 29630.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1208, cr_loss=0.3624, attn_decoder_loss=0.2428, over 5798288.53 frames. ], batch size: 86, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 22:59:28,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=540100.0, ans=0.0
2024-09-18 22:59:40,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.76 vs. limit=22.5
2024-09-18 22:59:48,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=540140.0, ans=0.125
2024-09-18 22:59:58,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=540180.0, ans=0.1
2024-09-18 23:00:13,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.01 vs. limit=22.5
2024-09-18 23:00:15,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=540220.0, ans=0.0
2024-09-18 23:00:39,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=540260.0, ans=0.0
2024-09-18 23:00:42,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=540300.0, ans=0.125
2024-09-18 23:00:42,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=540300.0, ans=0.125
2024-09-18 23:00:44,096 INFO [train.py:1198] (0/2) Epoch 30, batch 3850, loss[loss=0.2602, ctc_loss=0.1385, cr_loss=0.4049, attn_decoder_loss=0.2647, over 29240.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1209, cr_loss=0.3625, attn_decoder_loss=0.2428, over 5812306.55 frames. ], batch size: 100, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 23:00:53,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=540300.0, ans=0.125
2024-09-18 23:01:09,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=540340.0, ans=0.2
2024-09-18 23:01:21,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=540380.0, ans=0.2
2024-09-18 23:01:31,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0
2024-09-18 23:01:39,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=540420.0, ans=0.125
2024-09-18 23:01:56,580 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.68 vs. limit=10.0
2024-09-18 23:01:58,762 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.853e+01 8.547e+01 9.033e+01 9.737e+01 1.184e+02, threshold=1.807e+02, percent-clipped=0.0
2024-09-18 23:01:58,784 INFO [train.py:1198] (0/2) Epoch 30, batch 3900, loss[loss=0.2488, ctc_loss=0.1192, cr_loss=0.3517, attn_decoder_loss=0.2554, over 29626.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1214, cr_loss=0.3634, attn_decoder_loss=0.2432, over 5816424.36 frames. ], batch size: 86, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 23:02:21,319 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5
2024-09-18 23:02:45,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5
2024-09-18 23:02:49,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=540620.0, ans=0.125
2024-09-18 23:02:53,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.80 vs. limit=15.0
2024-09-18 23:03:13,096 INFO [train.py:1198] (0/2) Epoch 30, batch 3950, loss[loss=0.2595, ctc_loss=0.1455, cr_loss=0.4014, attn_decoder_loss=0.2632, over 29476.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1211, cr_loss=0.3634, attn_decoder_loss=0.2432, over 5835732.16 frames. ], batch size: 97, lr: 3.67e-03, grad_scale: 8.0
2024-09-18 23:03:38,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=540740.0, ans=0.0
2024-09-18 23:03:38,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0
2024-09-18 23:04:05,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=540820.0, ans=0.015
2024-09-18 23:04:19,684 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0
2024-09-18 23:04:27,383 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.475e+01 8.885e+01 9.495e+01 1.627e+02, threshold=1.777e+02, percent-clipped=0.0
2024-09-18 23:04:27,404 INFO [train.py:1198] (0/2) Epoch 30, batch 4000, loss[loss=0.2307, ctc_loss=0.1101, cr_loss=0.3418, attn_decoder_loss=0.2366, over 29507.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1212, cr_loss=0.3631, attn_decoder_loss=0.2432, over 5812206.88 frames. ], batch size: 74, lr: 3.67e-03, grad_scale: 16.0
2024-09-18 23:04:29,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=540900.0, ans=0.1
2024-09-18 23:04:29,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=540900.0, ans=0.0
2024-09-18 23:04:44,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.89 vs. limit=22.5
2024-09-18 23:05:00,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.77 vs. limit=15.0
2024-09-18 23:05:03,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=540980.0, ans=0.1
2024-09-18 23:05:22,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5
2024-09-18 23:05:25,622 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-18 23:05:43,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=541100.0, ans=0.125
2024-09-18 23:05:44,346 INFO [train.py:1198] (0/2) Epoch 30, batch 4050, loss[loss=0.2547, ctc_loss=0.1415, cr_loss=0.3694, attn_decoder_loss=0.259, over 20967.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1214, cr_loss=0.3633, attn_decoder_loss=0.2431, over 5796941.83 frames. ], batch size: 210, lr: 3.66e-03, grad_scale: 16.0
2024-09-18 23:06:46,093 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 23:06:53,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=541260.0, ans=0.125
2024-09-18 23:06:57,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.51 vs. limit=15.0
2024-09-18 23:06:57,913 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.748e+01 9.261e+01 9.930e+01 1.570e+02, threshold=1.852e+02, percent-clipped=0.0
2024-09-18 23:06:57,939 INFO [train.py:1198] (0/2) Epoch 30, batch 4100, loss[loss=0.2551, ctc_loss=0.1346, cr_loss=0.3932, attn_decoder_loss=0.2597, over 29528.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1214, cr_loss=0.3631, attn_decoder_loss=0.2431, over 5792719.84 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 16.0
2024-09-18 23:07:02,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0
2024-09-18 23:07:26,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=541380.0, ans=0.125
2024-09-18 23:08:07,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=541460.0, ans=0.0
2024-09-18 23:08:11,980 INFO [train.py:1198] (0/2) Epoch 30, batch 4150, loss[loss=0.2294, ctc_loss=0.1118, cr_loss=0.3381, attn_decoder_loss=0.2349, over 29508.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1211, cr_loss=0.3626, attn_decoder_loss=0.2428, over 5798068.09 frames. ], batch size: 77, lr: 3.66e-03, grad_scale: 8.0
2024-09-18 23:08:18,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=541500.0, ans=0.0
2024-09-18 23:08:19,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=541500.0, ans=0.0
2024-09-18 23:08:27,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=541540.0, ans=0.125
2024-09-18 23:08:34,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=541540.0, ans=0.125
2024-09-18 23:08:39,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.04 vs. limit=15.0
2024-09-18 23:09:05,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=541620.0, ans=0.0
2024-09-18 23:09:27,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.80 vs. limit=6.0
2024-09-18 23:09:28,243 INFO [train.py:1198] (0/2) Epoch 30, batch 4200, loss[loss=0.2541, ctc_loss=0.149, cr_loss=0.4278, attn_decoder_loss=0.2562, over 29494.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1212, cr_loss=0.3626, attn_decoder_loss=0.243, over 5799197.27 frames. ], batch size: 90, lr: 3.66e-03, grad_scale: 8.0
2024-09-18 23:09:28,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=541700.0, ans=0.0
2024-09-18 23:09:29,682 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.390e+01 9.004e+01 9.409e+01 1.747e+02, threshold=1.801e+02, percent-clipped=0.0
2024-09-18 23:09:29,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541700.0, ans=0.1
2024-09-18 23:09:32,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=541700.0, ans=0.2
2024-09-18 23:09:43,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=541740.0, ans=0.125
2024-09-18 23:09:50,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=541740.0, ans=0.2
2024-09-18 23:09:55,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=541740.0, ans=0.2
2024-09-18 23:09:59,814 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.29 vs. limit=22.5
2024-09-18 23:10:28,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=541860.0, ans=0.2
2024-09-18 23:10:34,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=541860.0, ans=0.0
2024-09-18 23:10:40,665 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 23:10:41,862 INFO [train.py:1198] (0/2) Epoch 30, batch 4250, loss[loss=0.2245, ctc_loss=0.1065, cr_loss=0.3455, attn_decoder_loss=0.2299, over 29514.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1208, cr_loss=0.362, attn_decoder_loss=0.243, over 5804535.28 frames. ], batch size: 74, lr: 3.66e-03, grad_scale: 8.0
2024-09-18 23:10:50,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541900.0, ans=0.1
2024-09-18 23:10:53,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.25 vs. limit=15.0
2024-09-18 23:10:53,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=541900.0, ans=0.2
2024-09-18 23:10:58,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=541940.0, ans=0.125
2024-09-18 23:11:23,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=541980.0, ans=0.025
2024-09-18 23:11:25,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=542020.0, ans=0.125
2024-09-18 23:11:45,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=542060.0, ans=0.125
2024-09-18 23:11:55,755 INFO [train.py:1198] (0/2) Epoch 30, batch 4300, loss[loss=0.2369, ctc_loss=0.1128, cr_loss=0.3436, attn_decoder_loss=0.2431, over 29518.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.121, cr_loss=0.3626, attn_decoder_loss=0.2434, over 5794024.15 frames. ], batch size: 87, lr: 3.66e-03, grad_scale: 8.0
2024-09-18 23:11:57,300 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.934e+01 8.627e+01 9.132e+01 9.730e+01 6.693e+02, threshold=1.826e+02, percent-clipped=2.0
2024-09-18 23:13:11,892 INFO [train.py:1198] (0/2) Epoch 30, batch 4350, loss[loss=0.2539, ctc_loss=0.1375, cr_loss=0.398, attn_decoder_loss=0.258, over 29436.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1239, cr_loss=0.3684, attn_decoder_loss=0.2467, over 5796300.32 frames. ], batch size: 97, lr: 3.66e-03, grad_scale: 8.0
2024-09-18 23:13:26,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0
2024-09-18 23:13:38,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=542340.0, ans=0.1
2024-09-18 23:13:51,026 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.73 vs. limit=15.0
2024-09-18 23:13:56,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=542420.0, ans=0.0
2024-09-18 23:14:02,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=542420.0, ans=0.125
2024-09-18 23:14:09,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=542460.0, ans=0.125
2024-09-18 23:14:22,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=542460.0, ans=0.1
2024-09-18 23:14:25,121 INFO [train.py:1198] (0/2) Epoch 30, batch 4400, loss[loss=0.2425, ctc_loss=0.1273, cr_loss=0.3731, attn_decoder_loss=0.247, over 27177.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1252, cr_loss=0.3714, attn_decoder_loss=0.2487, over 5766987.37 frames.
], batch size: 124, lr: 3.66e-03, grad_scale: 16.0 2024-09-18 23:14:26,525 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 8.805e+01 9.147e+01 9.646e+01 3.836e+02, threshold=1.829e+02, percent-clipped=2.0 2024-09-18 23:14:32,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=542500.0, ans=0.125 2024-09-18 23:14:53,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=542580.0, ans=0.125 2024-09-18 23:15:35,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=542660.0, ans=0.0 2024-09-18 23:15:35,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=542660.0, ans=0.0 2024-09-18 23:15:39,905 INFO [train.py:1198] (0/2) Epoch 30, batch 4450, loss[loss=0.2518, ctc_loss=0.1462, cr_loss=0.3916, attn_decoder_loss=0.2549, over 20155.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1289, cr_loss=0.3762, attn_decoder_loss=0.2507, over 5574486.38 frames. ], batch size: 209, lr: 3.66e-03, grad_scale: 8.0 2024-09-18 23:15:50,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=542700.0, ans=0.125 2024-09-18 23:15:52,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.92 vs. 
limit=15.0 2024-09-18 23:15:54,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=542740.0, ans=0.125 2024-09-18 23:15:55,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=542740.0, ans=0.2 2024-09-18 23:16:09,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.42 vs. limit=15.0 2024-09-18 23:16:18,863 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=15.0 2024-09-18 23:16:21,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=542780.0, ans=0.07 2024-09-18 23:16:30,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=542820.0, ans=0.0 2024-09-18 23:16:31,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=542820.0, ans=0.125 2024-09-18 23:16:50,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=542860.0, ans=0.1 2024-09-18 23:16:53,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.82 vs. limit=10.0 2024-09-18 23:16:54,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=542900.0, ans=0.125 2024-09-18 23:16:55,901 INFO [train.py:1198] (0/2) Epoch 30, batch 4500, loss[loss=0.2598, ctc_loss=0.1451, cr_loss=0.3758, attn_decoder_loss=0.2642, over 20538.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1325, cr_loss=0.3786, attn_decoder_loss=0.2526, over 5232450.92 frames. 
], batch size: 209, lr: 3.66e-03, grad_scale: 8.0 2024-09-18 23:16:58,807 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.613e+01 9.634e+01 1.118e+02 1.226e+02 1.647e+02, threshold=2.235e+02, percent-clipped=0.0 2024-09-18 23:17:11,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=542940.0, ans=0.125 2024-09-18 23:17:18,581 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:17:30,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=542980.0, ans=0.025 2024-09-18 23:17:32,829 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-30.pt 2024-09-18 23:18:19,544 INFO [train.py:1198] (0/2) Epoch 31, batch 0, loss[loss=0.2182, ctc_loss=0.1054, cr_loss=0.3214, attn_decoder_loss=0.2236, over 29607.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1054, cr_loss=0.3214, attn_decoder_loss=0.2236, over 29607.00 frames. ], batch size: 73, lr: 3.60e-03, grad_scale: 16.0 2024-09-18 23:18:19,545 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-18 23:18:37,943 INFO [train.py:1230] (0/2) Epoch 31, validation: loss=0.2119, ctc_loss=0.03668, cr_loss=5.946e-15, attn_decoder_loss=0.2314, over 944034.00 frames. 2024-09-18 23:18:37,944 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-18 23:18:39,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=543000.0, ans=0.125 2024-09-18 23:19:01,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.17 vs. 
limit=22.5 2024-09-18 23:19:13,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=543080.0, ans=10.0 2024-09-18 23:19:20,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=543080.0, ans=0.0 2024-09-18 23:19:25,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=543120.0, ans=0.05 2024-09-18 23:19:31,004 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:19:56,000 INFO [train.py:1198] (0/2) Epoch 31, batch 50, loss[loss=0.2126, ctc_loss=0.1048, cr_loss=0.3216, attn_decoder_loss=0.2175, over 29416.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1235, cr_loss=0.3683, attn_decoder_loss=0.2447, over 1267633.91 frames. ], batch size: 70, lr: 3.60e-03, grad_scale: 8.0 2024-09-18 23:19:57,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=543200.0, ans=0.0 2024-09-18 23:20:06,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=543200.0, ans=0.0 2024-09-18 23:20:31,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.16 vs. limit=15.0 2024-09-18 23:20:36,124 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:20:38,049 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. 
limit=6.0 2024-09-18 23:20:38,772 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.848e+01 9.634e+01 1.110e+02 1.417e+02, threshold=1.927e+02, percent-clipped=0.0 2024-09-18 23:21:06,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.89 vs. limit=12.0 2024-09-18 23:21:12,421 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.75 vs. limit=15.0 2024-09-18 23:21:13,386 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:21:14,584 INFO [train.py:1198] (0/2) Epoch 31, batch 100, loss[loss=0.2388, ctc_loss=0.1323, cr_loss=0.3787, attn_decoder_loss=0.2422, over 29521.00 frames. ], tot_loss[loss=0.2416, ctc_loss=0.1248, cr_loss=0.3697, attn_decoder_loss=0.2463, over 2252189.91 frames. ], batch size: 76, lr: 3.60e-03, grad_scale: 8.0 2024-09-18 23:21:36,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2024-09-18 23:21:38,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=543440.0, ans=0.125 2024-09-18 23:21:52,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=543480.0, ans=0.125 2024-09-18 23:21:55,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=543480.0, ans=0.125 2024-09-18 23:22:00,694 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.43 vs. 
limit=5.0 2024-09-18 23:22:09,396 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=12.0 2024-09-18 23:22:17,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=543560.0, ans=0.125 2024-09-18 23:22:29,484 INFO [train.py:1198] (0/2) Epoch 31, batch 150, loss[loss=0.2196, ctc_loss=0.1082, cr_loss=0.354, attn_decoder_loss=0.2241, over 29437.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1218, cr_loss=0.365, attn_decoder_loss=0.2435, over 3046565.01 frames. ], batch size: 70, lr: 3.60e-03, grad_scale: 8.0 2024-09-18 23:22:52,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=543640.0, ans=0.2 2024-09-18 23:23:11,517 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 8.450e+01 8.920e+01 9.351e+01 1.507e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-18 23:23:17,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=543720.0, ans=0.125 2024-09-18 23:23:24,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.38 vs. limit=12.0 2024-09-18 23:23:47,251 INFO [train.py:1198] (0/2) Epoch 31, batch 200, loss[loss=0.2504, ctc_loss=0.1305, cr_loss=0.38, attn_decoder_loss=0.2553, over 27379.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1211, cr_loss=0.3639, attn_decoder_loss=0.2427, over 3659581.94 frames. ], batch size: 124, lr: 3.60e-03, grad_scale: 8.0 2024-09-18 23:23:49,807 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.68 vs. 
limit=15.0 2024-09-18 23:23:52,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=15.0 2024-09-18 23:24:05,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=543840.0, ans=0.125 2024-09-18 23:24:26,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=543880.0, ans=0.125 2024-09-18 23:24:32,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=543920.0, ans=0.125 2024-09-18 23:25:04,443 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-136000.pt 2024-09-18 23:25:13,221 INFO [train.py:1198] (0/2) Epoch 31, batch 250, loss[loss=0.2597, ctc_loss=0.1384, cr_loss=0.4089, attn_decoder_loss=0.2641, over 29265.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1211, cr_loss=0.364, attn_decoder_loss=0.2429, over 4142728.68 frames. ], batch size: 100, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:25:42,283 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:25:45,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=544080.0, ans=0.5 2024-09-18 23:25:45,553 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-09-18 23:25:48,948 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.22 vs. 
limit=15.0 2024-09-18 23:25:55,606 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.439e+01 8.894e+01 9.430e+01 6.449e+02, threshold=1.779e+02, percent-clipped=1.0 2024-09-18 23:25:56,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.85 vs. limit=22.5 2024-09-18 23:26:12,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=544160.0, ans=0.0 2024-09-18 23:26:13,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=544160.0, ans=0.0 2024-09-18 23:26:28,640 INFO [train.py:1198] (0/2) Epoch 31, batch 300, loss[loss=0.2516, ctc_loss=0.1321, cr_loss=0.3944, attn_decoder_loss=0.2562, over 29544.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1207, cr_loss=0.3628, attn_decoder_loss=0.2424, over 4511344.09 frames. ], batch size: 92, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:26:59,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=544280.0, ans=0.025 2024-09-18 23:27:08,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=544280.0, ans=0.125 2024-09-18 23:27:16,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.71 vs. 
limit=10.0 2024-09-18 23:27:35,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=544360.0, ans=0.125 2024-09-18 23:27:41,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=544360.0, ans=0.1 2024-09-18 23:27:46,735 INFO [train.py:1198] (0/2) Epoch 31, batch 350, loss[loss=0.2147, ctc_loss=0.1023, cr_loss=0.3191, attn_decoder_loss=0.2201, over 29321.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1212, cr_loss=0.3633, attn_decoder_loss=0.2433, over 4797360.95 frames. ], batch size: 71, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:27:46,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=544400.0, ans=0.125 2024-09-18 23:28:06,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=544440.0, ans=0.1 2024-09-18 23:28:14,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.31 vs. limit=12.0 2024-09-18 23:28:15,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.18 vs. limit=22.5 2024-09-18 23:28:28,468 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.389e+01 8.860e+01 9.607e+01 2.348e+02, threshold=1.772e+02, percent-clipped=3.0 2024-09-18 23:28:39,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=544520.0, ans=0.1 2024-09-18 23:29:01,560 INFO [train.py:1198] (0/2) Epoch 31, batch 400, loss[loss=0.2369, ctc_loss=0.1173, cr_loss=0.3738, attn_decoder_loss=0.2419, over 29712.00 frames. 
], tot_loss[loss=0.2382, ctc_loss=0.1209, cr_loss=0.3624, attn_decoder_loss=0.2432, over 5025681.81 frames. ], batch size: 82, lr: 3.59e-03, grad_scale: 16.0 2024-09-18 23:29:10,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=544600.0, ans=0.125 2024-09-18 23:29:39,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-09-18 23:29:48,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=544720.0, ans=0.125 2024-09-18 23:29:54,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=12.0 2024-09-18 23:30:13,206 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.60 vs. limit=15.0 2024-09-18 23:30:20,324 INFO [train.py:1198] (0/2) Epoch 31, batch 450, loss[loss=0.2429, ctc_loss=0.1241, cr_loss=0.3749, attn_decoder_loss=0.2478, over 29705.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1207, cr_loss=0.3626, attn_decoder_loss=0.243, over 5188156.66 frames. ], batch size: 83, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:30:20,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=544800.0, ans=0.1 2024-09-18 23:30:35,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.38 vs. 
limit=15.0 2024-09-18 23:31:04,394 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.525e+01 8.811e+01 9.438e+01 1.510e+02, threshold=1.762e+02, percent-clipped=0.0 2024-09-18 23:31:15,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=544920.0, ans=0.0 2024-09-18 23:31:16,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=544920.0, ans=0.125 2024-09-18 23:31:38,407 INFO [train.py:1198] (0/2) Epoch 31, batch 500, loss[loss=0.2518, ctc_loss=0.1337, cr_loss=0.398, attn_decoder_loss=0.2561, over 29426.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1206, cr_loss=0.3628, attn_decoder_loss=0.2425, over 5330485.22 frames. ], batch size: 94, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:31:46,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=545000.0, ans=0.125 2024-09-18 23:31:46,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=545000.0, ans=0.125 2024-09-18 23:31:48,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545000.0, ans=0.1 2024-09-18 23:31:53,488 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.75 vs. 
limit=12.0 2024-09-18 23:32:18,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=545080.0, ans=0.1 2024-09-18 23:32:25,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=545120.0, ans=0.0 2024-09-18 23:32:39,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=545160.0, ans=0.125 2024-09-18 23:32:54,166 INFO [train.py:1198] (0/2) Epoch 31, batch 550, loss[loss=0.2603, ctc_loss=0.1433, cr_loss=0.4057, attn_decoder_loss=0.2643, over 28753.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1206, cr_loss=0.362, attn_decoder_loss=0.2425, over 5423482.42 frames. ], batch size: 104, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:33:06,381 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.44 vs. limit=15.0 2024-09-18 23:33:40,334 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.523e+01 8.948e+01 9.609e+01 1.463e+02, threshold=1.790e+02, percent-clipped=0.0 2024-09-18 23:34:12,442 INFO [train.py:1198] (0/2) Epoch 31, batch 600, loss[loss=0.2486, ctc_loss=0.1278, cr_loss=0.3734, attn_decoder_loss=0.2537, over 29279.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.121, cr_loss=0.3632, attn_decoder_loss=0.2429, over 5511217.78 frames. 
], batch size: 100, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:34:12,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=545400.0, ans=0.125 2024-09-18 23:34:15,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=545400.0, ans=0.125 2024-09-18 23:34:35,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=12.0 2024-09-18 23:34:37,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2024-09-18 23:34:52,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.61 vs. limit=15.0 2024-09-18 23:35:06,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=545520.0, ans=0.2 2024-09-18 23:35:30,097 INFO [train.py:1198] (0/2) Epoch 31, batch 650, loss[loss=0.2441, ctc_loss=0.1251, cr_loss=0.3792, attn_decoder_loss=0.2489, over 29772.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1206, cr_loss=0.3625, attn_decoder_loss=0.2422, over 5588205.08 frames. 
], batch size: 81, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:35:31,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=545600.0, ans=0.125 2024-09-18 23:35:42,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545600.0, ans=0.1 2024-09-18 23:35:57,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=545640.0, ans=0.1 2024-09-18 23:36:03,257 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.14 vs. limit=8.0 2024-09-18 23:36:14,178 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.324e+01 8.831e+01 9.249e+01 1.386e+02, threshold=1.766e+02, percent-clipped=0.0 2024-09-18 23:36:37,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=545760.0, ans=0.05 2024-09-18 23:36:44,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.79 vs. limit=10.0 2024-09-18 23:36:46,087 INFO [train.py:1198] (0/2) Epoch 31, batch 700, loss[loss=0.2403, ctc_loss=0.128, cr_loss=0.3898, attn_decoder_loss=0.2441, over 29533.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1209, cr_loss=0.3636, attn_decoder_loss=0.2428, over 5636073.64 frames. 
], batch size: 76, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:36:47,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=545800.0, ans=0.1 2024-09-18 23:37:52,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=545960.0, ans=0.2 2024-09-18 23:37:58,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=545960.0, ans=0.125 2024-09-18 23:38:04,178 INFO [train.py:1198] (0/2) Epoch 31, batch 750, loss[loss=0.2527, ctc_loss=0.127, cr_loss=0.3908, attn_decoder_loss=0.258, over 29700.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1209, cr_loss=0.3636, attn_decoder_loss=0.2428, over 5676180.18 frames. ], batch size: 82, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:38:15,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=546000.0, ans=0.125 2024-09-18 23:38:47,926 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.065e+01 8.604e+01 9.058e+01 9.496e+01 1.707e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-18 23:38:51,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=546120.0, ans=0.1 2024-09-18 23:38:52,811 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:39:03,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.69 vs. 
limit=22.5 2024-09-18 23:39:12,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=546160.0, ans=0.0 2024-09-18 23:39:19,499 INFO [train.py:1198] (0/2) Epoch 31, batch 800, loss[loss=0.2123, ctc_loss=0.09193, cr_loss=0.2874, attn_decoder_loss=0.2193, over 29603.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1208, cr_loss=0.3627, attn_decoder_loss=0.2428, over 5706081.86 frames. ], batch size: 73, lr: 3.59e-03, grad_scale: 16.0 2024-09-18 23:39:32,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=546200.0, ans=0.2 2024-09-18 23:39:44,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=546240.0, ans=0.125 2024-09-18 23:40:04,914 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.69 vs. limit=15.0 2024-09-18 23:40:37,652 INFO [train.py:1198] (0/2) Epoch 31, batch 850, loss[loss=0.2511, ctc_loss=0.1249, cr_loss=0.3802, attn_decoder_loss=0.2567, over 29710.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1202, cr_loss=0.3616, attn_decoder_loss=0.2423, over 5734567.76 frames. 
], batch size: 89, lr: 3.59e-03, grad_scale: 16.0 2024-09-18 23:40:37,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=546400.0, ans=0.0 2024-09-18 23:40:42,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=546400.0, ans=0.1 2024-09-18 23:40:48,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=546400.0, ans=0.125 2024-09-18 23:40:58,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0 2024-09-18 23:40:58,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=546440.0, ans=12.0 2024-09-18 23:41:02,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. 
limit=15.0 2024-09-18 23:41:22,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=546480.0, ans=0.125 2024-09-18 23:41:22,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=546480.0, ans=0.05 2024-09-18 23:41:23,561 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.645e+01 9.090e+01 9.691e+01 3.180e+02, threshold=1.818e+02, percent-clipped=2.0 2024-09-18 23:41:45,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=546560.0, ans=0.0 2024-09-18 23:41:46,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=546560.0, ans=0.0 2024-09-18 23:41:55,485 INFO [train.py:1198] (0/2) Epoch 31, batch 900, loss[loss=0.2214, ctc_loss=0.1111, cr_loss=0.3564, attn_decoder_loss=0.2258, over 29595.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1207, cr_loss=0.3624, attn_decoder_loss=0.2427, over 5740075.65 frames. 
], batch size: 73, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:42:09,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=546640.0, ans=0.125 2024-09-18 23:42:10,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=546640.0, ans=0.0 2024-09-18 23:42:16,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=546640.0, ans=0.125 2024-09-18 23:42:52,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=546720.0, ans=0.2 2024-09-18 23:42:58,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=546760.0, ans=0.2 2024-09-18 23:43:07,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=546760.0, ans=0.125 2024-09-18 23:43:10,547 INFO [train.py:1198] (0/2) Epoch 31, batch 950, loss[loss=0.2157, ctc_loss=0.1115, cr_loss=0.3462, attn_decoder_loss=0.2196, over 29543.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1205, cr_loss=0.3623, attn_decoder_loss=0.2427, over 5741276.83 frames. 
], batch size: 74, lr: 3.59e-03, grad_scale: 8.0 2024-09-18 23:43:23,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=546800.0, ans=0.1 2024-09-18 23:43:57,153 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:43:58,298 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.530e+01 9.181e+01 9.954e+01 1.509e+02, threshold=1.836e+02, percent-clipped=0.0 2024-09-18 23:43:59,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0 2024-09-18 23:44:06,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=546920.0, ans=0.125 2024-09-18 23:44:23,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2024-09-18 23:44:28,412 INFO [train.py:1198] (0/2) Epoch 31, batch 1000, loss[loss=0.2298, ctc_loss=0.1151, cr_loss=0.355, attn_decoder_loss=0.2347, over 29483.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1214, cr_loss=0.3639, attn_decoder_loss=0.2434, over 5734371.93 frames. 
], batch size: 77, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:44:45,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=547040.0, ans=0.1 2024-09-18 23:44:54,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=547040.0, ans=0.125 2024-09-18 23:45:00,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=547080.0, ans=0.0 2024-09-18 23:45:06,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=547080.0, ans=0.0 2024-09-18 23:45:14,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.43 vs. limit=15.0 2024-09-18 23:45:47,380 INFO [train.py:1198] (0/2) Epoch 31, batch 1050, loss[loss=0.2485, ctc_loss=0.133, cr_loss=0.3906, attn_decoder_loss=0.2526, over 29675.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1211, cr_loss=0.3631, attn_decoder_loss=0.243, over 5742972.28 frames. ], batch size: 85, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:46:10,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=547240.0, ans=0.2 2024-09-18 23:46:14,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.80 vs. 
limit=15.0 2024-09-18 23:46:21,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=547280.0, ans=10.0 2024-09-18 23:46:33,059 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.412e+01 9.049e+01 9.703e+01 1.961e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-18 23:46:36,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2024-09-18 23:46:42,512 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:46:42,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=547320.0, ans=0.0 2024-09-18 23:47:03,202 INFO [train.py:1198] (0/2) Epoch 31, batch 1100, loss[loss=0.2407, ctc_loss=0.1228, cr_loss=0.3709, attn_decoder_loss=0.2456, over 29442.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1206, cr_loss=0.3621, attn_decoder_loss=0.2425, over 5756269.53 frames. ], batch size: 78, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:47:25,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=547440.0, ans=0.2 2024-09-18 23:47:25,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=547440.0, ans=0.2 2024-09-18 23:47:51,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547520.0, ans=0.1 2024-09-18 23:47:51,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.07 vs. 
limit=22.5 2024-09-18 23:48:08,266 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:48:09,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=547560.0, ans=0.2 2024-09-18 23:48:11,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=547560.0, ans=0.125 2024-09-18 23:48:12,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=547560.0, ans=0.125 2024-09-18 23:48:21,595 INFO [train.py:1198] (0/2) Epoch 31, batch 1150, loss[loss=0.2376, ctc_loss=0.1253, cr_loss=0.3756, attn_decoder_loss=0.2418, over 29451.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1204, cr_loss=0.3614, attn_decoder_loss=0.2422, over 5755095.50 frames. ], batch size: 78, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:48:25,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=547600.0, ans=0.0 2024-09-18 23:48:43,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=547640.0, ans=0.125 2024-09-18 23:48:57,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=547680.0, ans=0.05 2024-09-18 23:48:57,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=547680.0, ans=0.0 2024-09-18 23:49:09,381 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 8.554e+01 9.013e+01 9.585e+01 3.112e+02, threshold=1.803e+02, percent-clipped=2.0 2024-09-18 23:49:30,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=547760.0, ans=0.0 2024-09-18 23:49:39,573 INFO 
[train.py:1198] (0/2) Epoch 31, batch 1200, loss[loss=0.2406, ctc_loss=0.1133, cr_loss=0.3605, attn_decoder_loss=0.2468, over 29661.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1208, cr_loss=0.3626, attn_decoder_loss=0.2429, over 5747212.38 frames. ], batch size: 85, lr: 3.58e-03, grad_scale: 16.0 2024-09-18 23:49:53,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=547840.0, ans=0.07 2024-09-18 23:50:29,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0 2024-09-18 23:50:35,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-09-18 23:50:42,426 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:50:42,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=547960.0, ans=0.2 2024-09-18 23:50:42,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547960.0, ans=0.1 2024-09-18 23:50:58,373 INFO [train.py:1198] (0/2) Epoch 31, batch 1250, loss[loss=0.2598, ctc_loss=0.1411, cr_loss=0.4142, attn_decoder_loss=0.2638, over 29514.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1212, cr_loss=0.3633, attn_decoder_loss=0.2436, over 5774348.56 frames. 
], batch size: 92, lr: 3.58e-03, grad_scale: 16.0 2024-09-18 23:51:01,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=548000.0, ans=0.125 2024-09-18 23:51:03,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=548000.0, ans=0.125 2024-09-18 23:51:03,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.15 vs. limit=10.0 2024-09-18 23:51:12,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548040.0, ans=0.1 2024-09-18 23:51:18,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=548040.0, ans=0.125 2024-09-18 23:51:25,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.67 vs. limit=22.5 2024-09-18 23:51:43,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.406e+01 8.800e+01 9.095e+01 1.339e+02, threshold=1.760e+02, percent-clipped=0.0 2024-09-18 23:51:58,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.84 vs. limit=15.0 2024-09-18 23:52:06,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.85 vs. limit=22.5 2024-09-18 23:52:14,291 INFO [train.py:1198] (0/2) Epoch 31, batch 1300, loss[loss=0.2477, ctc_loss=0.1237, cr_loss=0.3725, attn_decoder_loss=0.2532, over 28555.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1205, cr_loss=0.3623, attn_decoder_loss=0.2429, over 5778575.17 frames. 
], batch size: 112, lr: 3.58e-03, grad_scale: 16.0 2024-09-18 23:52:18,318 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.01 vs. limit=15.0 2024-09-18 23:53:04,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.43 vs. limit=22.5 2024-09-18 23:53:08,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=548320.0, ans=0.0 2024-09-18 23:53:25,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=548360.0, ans=0.025 2024-09-18 23:53:32,531 INFO [train.py:1198] (0/2) Epoch 31, batch 1350, loss[loss=0.2354, ctc_loss=0.1179, cr_loss=0.3645, attn_decoder_loss=0.2404, over 29768.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1204, cr_loss=0.362, attn_decoder_loss=0.2426, over 5795840.48 frames. ], batch size: 81, lr: 3.58e-03, grad_scale: 16.0 2024-09-18 23:53:41,810 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:53:49,751 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.22 vs. 
limit=22.5 2024-09-18 23:53:55,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=548440.0, ans=0.0 2024-09-18 23:53:56,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=548440.0, ans=0.125 2024-09-18 23:54:17,462 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.480e+01 8.970e+01 9.556e+01 1.739e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-18 23:54:25,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.25 vs. limit=12.0 2024-09-18 23:54:31,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=548560.0, ans=0.0 2024-09-18 23:54:43,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=548560.0, ans=0.025 2024-09-18 23:54:47,699 INFO [train.py:1198] (0/2) Epoch 31, batch 1400, loss[loss=0.2045, ctc_loss=0.09596, cr_loss=0.3023, attn_decoder_loss=0.2099, over 29605.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1201, cr_loss=0.3615, attn_decoder_loss=0.2422, over 5806151.50 frames. 
], batch size: 69, lr: 3.58e-03, grad_scale: 16.0 2024-09-18 23:54:56,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=548600.0, ans=0.09899494936611666 2024-09-18 23:55:08,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=548640.0, ans=0.2 2024-09-18 23:55:12,759 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:55:20,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=548680.0, ans=0.125 2024-09-18 23:55:24,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff2.min_abs, batch_count=548680.0, ans=0.1 2024-09-18 23:55:32,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=548680.0, ans=0.1 2024-09-18 23:55:38,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=548720.0, ans=0.1 2024-09-18 23:55:53,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.18 vs. limit=22.5 2024-09-18 23:56:05,900 INFO [train.py:1198] (0/2) Epoch 31, batch 1450, loss[loss=0.2518, ctc_loss=0.1342, cr_loss=0.3758, attn_decoder_loss=0.2566, over 29473.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1201, cr_loss=0.3614, attn_decoder_loss=0.2428, over 5803555.60 frames. 
], batch size: 94, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:56:13,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=548800.0, ans=0.1 2024-09-18 23:56:17,065 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-09-18 23:56:31,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=548840.0, ans=0.0 2024-09-18 23:56:34,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=548880.0, ans=0.025 2024-09-18 23:56:45,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=548880.0, ans=0.2 2024-09-18 23:56:52,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.514e+01 9.000e+01 9.465e+01 1.182e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-18 23:57:12,965 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:57:18,183 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2024-09-18 23:57:18,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=548960.0, ans=0.2 2024-09-18 23:57:23,230 INFO [train.py:1198] (0/2) Epoch 31, batch 1500, loss[loss=0.2418, ctc_loss=0.1219, cr_loss=0.3596, attn_decoder_loss=0.2471, over 29651.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1202, cr_loss=0.3618, attn_decoder_loss=0.2431, over 5804589.12 frames. 
], batch size: 86, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:57:26,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=549000.0, ans=0.125 2024-09-18 23:57:51,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2024-09-18 23:58:14,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.89 vs. limit=15.0 2024-09-18 23:58:15,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=549120.0, ans=0.2 2024-09-18 23:58:20,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=549120.0, ans=0.125 2024-09-18 23:58:20,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=549120.0, ans=0.125 2024-09-18 23:58:41,412 INFO [train.py:1198] (0/2) Epoch 31, batch 1550, loss[loss=0.2485, ctc_loss=0.1317, cr_loss=0.3929, attn_decoder_loss=0.2527, over 29512.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1206, cr_loss=0.3622, attn_decoder_loss=0.2431, over 5782000.45 frames. 
], batch size: 90, lr: 3.58e-03, grad_scale: 8.0 2024-09-18 23:58:52,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=549200.0, ans=0.125 2024-09-18 23:59:19,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=549280.0, ans=0.125 2024-09-18 23:59:19,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549280.0, ans=0.1 2024-09-18 23:59:20,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=549280.0, ans=0.125 2024-09-18 23:59:26,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=549320.0, ans=0.125 2024-09-18 23:59:26,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=549320.0, ans=0.07 2024-09-18 23:59:27,893 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.398e+01 8.813e+01 9.564e+01 2.152e+02, threshold=1.763e+02, percent-clipped=1.0 2024-09-18 23:59:46,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=549360.0, ans=0.2 2024-09-18 23:59:48,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=549360.0, ans=0.125 2024-09-18 23:59:56,784 INFO [train.py:1198] (0/2) Epoch 31, batch 1600, loss[loss=0.2448, ctc_loss=0.1214, cr_loss=0.3588, attn_decoder_loss=0.2506, over 29675.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1209, cr_loss=0.3628, attn_decoder_loss=0.2429, over 5765377.91 frames. 
], batch size: 85, lr: 3.58e-03, grad_scale: 16.0 2024-09-19 00:00:39,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=549480.0, ans=0.125 2024-09-19 00:00:55,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=549520.0, ans=0.125 2024-09-19 00:00:59,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=549560.0, ans=0.0 2024-09-19 00:01:02,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=549560.0, ans=0.025 2024-09-19 00:01:07,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=549560.0, ans=0.0 2024-09-19 00:01:10,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=549560.0, ans=0.1 2024-09-19 00:01:10,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=549560.0, ans=0.125 2024-09-19 00:01:14,876 INFO [train.py:1198] (0/2) Epoch 31, batch 1650, loss[loss=0.2498, ctc_loss=0.1298, cr_loss=0.3732, attn_decoder_loss=0.2549, over 29725.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1205, cr_loss=0.3622, attn_decoder_loss=0.2428, over 5758833.38 frames. 
], batch size: 89, lr: 3.58e-03, grad_scale: 8.0 2024-09-19 00:02:03,350 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.401e+01 8.981e+01 9.648e+01 1.683e+02, threshold=1.796e+02, percent-clipped=0.0 2024-09-19 00:02:05,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=549720.0, ans=0.125 2024-09-19 00:02:28,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=549760.0, ans=0.025 2024-09-19 00:02:31,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=549800.0, ans=0.025 2024-09-19 00:02:32,506 INFO [train.py:1198] (0/2) Epoch 31, batch 1700, loss[loss=0.2157, ctc_loss=0.1065, cr_loss=0.3381, attn_decoder_loss=0.2203, over 29583.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1203, cr_loss=0.3621, attn_decoder_loss=0.2427, over 5778722.78 frames. ], batch size: 69, lr: 3.58e-03, grad_scale: 8.0 2024-09-19 00:02:56,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=9.95 vs. limit=15.0 2024-09-19 00:03:11,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=549880.0, ans=0.0 2024-09-19 00:03:11,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. 
limit=15.0 2024-09-19 00:03:42,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=549960.0, ans=0.125 2024-09-19 00:03:42,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=549960.0, ans=0.125 2024-09-19 00:03:43,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=12.0 2024-09-19 00:03:48,476 INFO [train.py:1198] (0/2) Epoch 31, batch 1750, loss[loss=0.2134, ctc_loss=0.1004, cr_loss=0.3099, attn_decoder_loss=0.2191, over 29337.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1202, cr_loss=0.3619, attn_decoder_loss=0.2424, over 5787513.21 frames. ], batch size: 67, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:03:53,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=12.0 2024-09-19 00:04:35,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=550120.0, ans=0.0 2024-09-19 00:04:36,769 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.450e+01 9.086e+01 9.663e+01 1.697e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-19 00:05:01,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=550160.0, ans=0.125 2024-09-19 00:05:05,953 INFO [train.py:1198] (0/2) Epoch 31, batch 1800, loss[loss=0.2469, ctc_loss=0.1261, cr_loss=0.3752, attn_decoder_loss=0.252, over 29698.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1204, cr_loss=0.362, attn_decoder_loss=0.2423, over 5790935.79 frames. 
], batch size: 83, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:05:16,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=550200.0, ans=0.025 2024-09-19 00:05:26,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.95 vs. limit=15.0 2024-09-19 00:05:38,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2024-09-19 00:06:00,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=550320.0, ans=0.5 2024-09-19 00:06:17,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=550360.0, ans=0.125 2024-09-19 00:06:23,828 INFO [train.py:1198] (0/2) Epoch 31, batch 1850, loss[loss=0.2452, ctc_loss=0.1232, cr_loss=0.3525, attn_decoder_loss=0.2509, over 29623.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1203, cr_loss=0.3619, attn_decoder_loss=0.2424, over 5797363.68 frames. ], batch size: 86, lr: 3.57e-03, grad_scale: 4.0 2024-09-19 00:06:48,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=550440.0, ans=0.0 2024-09-19 00:07:14,101 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.638e+01 9.110e+01 9.627e+01 2.703e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-19 00:07:15,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=550520.0, ans=10.0 2024-09-19 00:07:39,689 INFO [train.py:1198] (0/2) Epoch 31, batch 1900, loss[loss=0.2451, ctc_loss=0.1184, cr_loss=0.3563, attn_decoder_loss=0.2513, over 29715.00 frames. 
], tot_loss[loss=0.2379, ctc_loss=0.1205, cr_loss=0.3623, attn_decoder_loss=0.2429, over 5805547.58 frames. ], batch size: 89, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:07:43,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=550600.0, ans=0.125 2024-09-19 00:07:46,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=550600.0, ans=0.2 2024-09-19 00:08:05,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=550640.0, ans=0.125 2024-09-19 00:08:31,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=550720.0, ans=0.0 2024-09-19 00:08:57,883 INFO [train.py:1198] (0/2) Epoch 31, batch 1950, loss[loss=0.2325, ctc_loss=0.1191, cr_loss=0.3684, attn_decoder_loss=0.2369, over 29446.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.121, cr_loss=0.364, attn_decoder_loss=0.244, over 5820138.71 frames. ], batch size: 78, lr: 3.57e-03, grad_scale: 8.0 2024-09-19 00:09:11,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=550840.0, ans=0.0 2024-09-19 00:09:23,065 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0 2024-09-19 00:09:36,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.59 vs. 
limit=15.0
2024-09-19 00:09:42,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=550920.0, ans=0.125
2024-09-19 00:09:42,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=550920.0, ans=0.0
2024-09-19 00:09:47,671 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.604e+01 9.078e+01 9.873e+01 2.917e+02, threshold=1.816e+02, percent-clipped=2.0
2024-09-19 00:09:55,407 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 00:10:09,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=550960.0, ans=0.025
2024-09-19 00:10:15,518 INFO [train.py:1198] (0/2) Epoch 31, batch 2000, loss[loss=0.2186, ctc_loss=0.1103, cr_loss=0.3437, attn_decoder_loss=0.223, over 29328.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1212, cr_loss=0.3642, attn_decoder_loss=0.2442, over 5797911.02 frames. ], batch size: 67, lr: 3.57e-03, grad_scale: 16.0
2024-09-19 00:10:20,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=551000.0, ans=0.125
2024-09-19 00:10:37,398 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.17 vs. limit=22.5
2024-09-19 00:10:39,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0
2024-09-19 00:10:53,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=551080.0, ans=0.2
2024-09-19 00:10:58,009 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 00:11:04,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=551120.0, ans=0.0
2024-09-19 00:11:11,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=551120.0, ans=0.0
2024-09-19 00:11:31,489 INFO [train.py:1198] (0/2) Epoch 31, batch 2050, loss[loss=0.215, ctc_loss=0.1074, cr_loss=0.3453, attn_decoder_loss=0.2193, over 29463.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1206, cr_loss=0.3627, attn_decoder_loss=0.2432, over 5789733.65 frames. ], batch size: 70, lr: 3.57e-03, grad_scale: 16.0
2024-09-19 00:11:48,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=551240.0, ans=0.125
2024-09-19 00:12:00,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=551280.0, ans=0.125
2024-09-19 00:12:06,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=551280.0, ans=0.125
2024-09-19 00:12:21,279 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 8.532e+01 9.006e+01 9.562e+01 1.976e+02, threshold=1.801e+02, percent-clipped=1.0
2024-09-19 00:12:27,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=551320.0, ans=0.125
2024-09-19 00:12:38,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=551360.0, ans=0.125
2024-09-19 00:12:48,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=551400.0, ans=0.125
2024-09-19 00:12:49,613 INFO [train.py:1198] (0/2) Epoch 31, batch 2100, loss[loss=0.2334, ctc_loss=0.118, cr_loss=0.359, attn_decoder_loss=0.2382, over 29770.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1205, cr_loss=0.3627, attn_decoder_loss=0.2428, over 5800843.37 frames. ], batch size: 81, lr: 3.57e-03, grad_scale: 16.0
2024-09-19 00:12:56,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=551400.0, ans=0.0
2024-09-19 00:12:58,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=551400.0, ans=0.0
2024-09-19 00:13:02,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=551400.0, ans=0.1
2024-09-19 00:13:27,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=551480.0, ans=0.125
2024-09-19 00:13:43,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=551520.0, ans=0.025
2024-09-19 00:13:45,693 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0
2024-09-19 00:14:07,010 INFO [train.py:1198] (0/2) Epoch 31, batch 2150, loss[loss=0.2314, ctc_loss=0.1164, cr_loss=0.3695, attn_decoder_loss=0.236, over 29450.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.12, cr_loss=0.3623, attn_decoder_loss=0.2424, over 5816075.18 frames. ], batch size: 78, lr: 3.57e-03, grad_scale: 16.0
2024-09-19 00:14:11,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=551600.0, ans=0.125
2024-09-19 00:14:31,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=551640.0, ans=0.125
2024-09-19 00:14:43,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=551680.0, ans=0.125
2024-09-19 00:14:44,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=551680.0, ans=0.125
2024-09-19 00:14:57,048 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 8.560e+01 8.969e+01 9.441e+01 3.216e+02, threshold=1.794e+02, percent-clipped=1.0
2024-09-19 00:15:18,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=551760.0, ans=0.125
2024-09-19 00:15:23,015 INFO [train.py:1198] (0/2) Epoch 31, batch 2200, loss[loss=0.2448, ctc_loss=0.1127, cr_loss=0.3387, attn_decoder_loss=0.252, over 29646.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.12, cr_loss=0.3617, attn_decoder_loss=0.2424, over 5813250.19 frames. ], batch size: 86, lr: 3.57e-03, grad_scale: 16.0
2024-09-19 00:15:23,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=551800.0, ans=0.035
2024-09-19 00:15:36,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.32 vs. limit=10.0
2024-09-19 00:15:42,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=551840.0, ans=0.2
2024-09-19 00:16:05,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=551880.0, ans=0.1
2024-09-19 00:16:12,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.39 vs. limit=15.0
2024-09-19 00:16:19,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=551920.0, ans=0.0
2024-09-19 00:16:22,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=551960.0, ans=0.95
2024-09-19 00:16:39,386 INFO [train.py:1198] (0/2) Epoch 31, batch 2250, loss[loss=0.2382, ctc_loss=0.112, cr_loss=0.3404, attn_decoder_loss=0.2446, over 29684.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1195, cr_loss=0.361, attn_decoder_loss=0.2422, over 5812221.56 frames. ], batch size: 82, lr: 3.57e-03, grad_scale: 16.0
2024-09-19 00:16:39,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=552000.0, ans=0.125
2024-09-19 00:16:42,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=552000.0, ans=0.125
2024-09-19 00:16:49,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=552000.0, ans=0.125
2024-09-19 00:17:00,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=552040.0, ans=0.2
2024-09-19 00:17:03,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=552040.0, ans=0.05
2024-09-19 00:17:09,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.92 vs. limit=15.0
2024-09-19 00:17:32,962 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.496e+01 8.948e+01 9.461e+01 2.809e+02, threshold=1.790e+02, percent-clipped=1.0
2024-09-19 00:17:38,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.25 vs. limit=15.0
2024-09-19 00:17:57,272 INFO [train.py:1198] (0/2) Epoch 31, batch 2300, loss[loss=0.2132, ctc_loss=0.0993, cr_loss=0.3028, attn_decoder_loss=0.2191, over 29739.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1192, cr_loss=0.3598, attn_decoder_loss=0.2413, over 5798736.49 frames. ], batch size: 72, lr: 3.57e-03, grad_scale: 8.0
2024-09-19 00:18:05,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=552200.0, ans=0.05
2024-09-19 00:19:01,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=552360.0, ans=0.125
2024-09-19 00:19:15,043 INFO [train.py:1198] (0/2) Epoch 31, batch 2350, loss[loss=0.2527, ctc_loss=0.1355, cr_loss=0.3843, attn_decoder_loss=0.2572, over 29685.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1194, cr_loss=0.3602, attn_decoder_loss=0.2416, over 5804134.29 frames. ], batch size: 83, lr: 3.57e-03, grad_scale: 8.0
2024-09-19 00:19:27,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=552400.0, ans=0.2
2024-09-19 00:19:28,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=552440.0, ans=0.025
2024-09-19 00:19:35,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=552440.0, ans=0.125
2024-09-19 00:19:56,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=552480.0, ans=0.025
2024-09-19 00:19:59,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=552520.0, ans=0.125
2024-09-19 00:19:59,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=552520.0, ans=0.125
2024-09-19 00:20:06,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.333e+01 8.603e+01 9.093e+01 9.793e+01 1.880e+02, threshold=1.819e+02, percent-clipped=1.0
2024-09-19 00:20:28,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=552560.0, ans=0.0
2024-09-19 00:20:31,125 INFO [train.py:1198] (0/2) Epoch 31, batch 2400, loss[loss=0.2237, ctc_loss=0.106, cr_loss=0.3211, attn_decoder_loss=0.2296, over 29547.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1201, cr_loss=0.3617, attn_decoder_loss=0.2422, over 5807617.74 frames. ], batch size: 76, lr: 3.57e-03, grad_scale: 16.0
2024-09-19 00:20:40,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=552600.0, ans=0.025
2024-09-19 00:21:08,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0
2024-09-19 00:21:09,644 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 00:21:23,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=552720.0, ans=0.125
2024-09-19 00:21:23,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=15.0
2024-09-19 00:21:24,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=552720.0, ans=0.2
2024-09-19 00:21:29,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=552720.0, ans=0.125
2024-09-19 00:21:35,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=552760.0, ans=0.125
2024-09-19 00:21:51,231 INFO [train.py:1198] (0/2) Epoch 31, batch 2450, loss[loss=0.2446, ctc_loss=0.1208, cr_loss=0.3679, attn_decoder_loss=0.2502, over 29697.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1207, cr_loss=0.3627, attn_decoder_loss=0.2429, over 5783437.67 frames. ], batch size: 82, lr: 3.57e-03, grad_scale: 8.0
2024-09-19 00:22:44,294 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.732e+01 9.075e+01 9.673e+01 2.868e+02, threshold=1.815e+02, percent-clipped=2.0
2024-09-19 00:22:46,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552920.0, ans=0.1
2024-09-19 00:22:49,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552920.0, ans=0.1
2024-09-19 00:22:53,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=552960.0, ans=0.2
2024-09-19 00:22:59,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=552960.0, ans=0.125
2024-09-19 00:23:06,988 INFO [train.py:1198] (0/2) Epoch 31, batch 2500, loss[loss=0.2435, ctc_loss=0.1136, cr_loss=0.3395, attn_decoder_loss=0.2504, over 29653.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1209, cr_loss=0.3634, attn_decoder_loss=0.2432, over 5793156.71 frames. ], batch size: 86, lr: 3.57e-03, grad_scale: 8.0
2024-09-19 00:23:14,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=553000.0, ans=0.125
2024-09-19 00:23:36,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=553080.0, ans=0.0
2024-09-19 00:23:46,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=553080.0, ans=0.0
2024-09-19 00:23:51,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=553120.0, ans=0.0
2024-09-19 00:24:04,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553120.0, ans=0.1
2024-09-19 00:24:08,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.88 vs. limit=22.5
2024-09-19 00:24:10,077 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.43 vs. limit=15.0
2024-09-19 00:24:22,898 INFO [train.py:1198] (0/2) Epoch 31, batch 2550, loss[loss=0.208, ctc_loss=0.09618, cr_loss=0.3072, attn_decoder_loss=0.2136, over 29305.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1208, cr_loss=0.3634, attn_decoder_loss=0.2431, over 5796970.77 frames. ], batch size: 67, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:24:40,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=553240.0, ans=0.5
2024-09-19 00:24:41,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=553240.0, ans=0.2
2024-09-19 00:24:58,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.47 vs. limit=12.0
2024-09-19 00:25:18,168 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.309e+01 8.709e+01 9.331e+01 1.370e+02, threshold=1.742e+02, percent-clipped=0.0
2024-09-19 00:25:21,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=553320.0, ans=0.0
2024-09-19 00:25:35,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=553360.0, ans=0.125
2024-09-19 00:25:43,196 INFO [train.py:1198] (0/2) Epoch 31, batch 2600, loss[loss=0.2374, ctc_loss=0.1197, cr_loss=0.3636, attn_decoder_loss=0.2424, over 29456.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1207, cr_loss=0.3629, attn_decoder_loss=0.2431, over 5793332.72 frames. ], batch size: 78, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:25:48,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=553400.0, ans=0.0
2024-09-19 00:25:48,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.94 vs. limit=12.0
2024-09-19 00:25:53,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=553400.0, ans=0.2
2024-09-19 00:25:54,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0
2024-09-19 00:26:19,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=553480.0, ans=0.0
2024-09-19 00:26:19,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=553480.0, ans=0.125
2024-09-19 00:26:19,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=553480.0, ans=0.125
2024-09-19 00:26:33,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0
2024-09-19 00:26:48,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=553560.0, ans=0.125
2024-09-19 00:26:58,907 INFO [train.py:1198] (0/2) Epoch 31, batch 2650, loss[loss=0.2503, ctc_loss=0.1263, cr_loss=0.3781, attn_decoder_loss=0.2557, over 29216.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1207, cr_loss=0.3631, attn_decoder_loss=0.2434, over 5800012.88 frames. ], batch size: 100, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:27:06,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=553600.0, ans=10.0
2024-09-19 00:27:15,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=553640.0, ans=0.125
2024-09-19 00:27:35,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=553680.0, ans=0.125
2024-09-19 00:27:41,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=553680.0, ans=0.125
2024-09-19 00:27:41,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5
2024-09-19 00:27:48,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=553720.0, ans=0.0
2024-09-19 00:27:51,191 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.581e+01 9.000e+01 9.310e+01 1.740e+02, threshold=1.800e+02, percent-clipped=0.0
2024-09-19 00:27:56,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553720.0, ans=0.1
2024-09-19 00:28:05,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=553760.0, ans=0.125
2024-09-19 00:28:14,277 INFO [train.py:1198] (0/2) Epoch 31, batch 2700, loss[loss=0.2512, ctc_loss=0.1264, cr_loss=0.371, attn_decoder_loss=0.2569, over 29534.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1211, cr_loss=0.3639, attn_decoder_loss=0.2438, over 5796578.46 frames. ], batch size: 87, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:28:14,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=553800.0, ans=0.025
2024-09-19 00:28:20,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=553800.0, ans=0.125
2024-09-19 00:28:45,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=553880.0, ans=0.025
2024-09-19 00:28:59,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=553880.0, ans=0.2
2024-09-19 00:29:12,902 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 00:29:31,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=553960.0, ans=0.2
2024-09-19 00:29:34,469 INFO [train.py:1198] (0/2) Epoch 31, batch 2750, loss[loss=0.2317, ctc_loss=0.1286, cr_loss=0.3781, attn_decoder_loss=0.2347, over 29522.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1208, cr_loss=0.3633, attn_decoder_loss=0.2427, over 5795324.11 frames. ], batch size: 75, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:30:23,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554120.0, ans=0.1
2024-09-19 00:30:27,598 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.490e+01 8.925e+01 9.454e+01 1.870e+02, threshold=1.785e+02, percent-clipped=1.0
2024-09-19 00:30:50,894 INFO [train.py:1198] (0/2) Epoch 31, batch 2800, loss[loss=0.2606, ctc_loss=0.149, cr_loss=0.3854, attn_decoder_loss=0.2644, over 20503.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1215, cr_loss=0.3647, attn_decoder_loss=0.2432, over 5776355.72 frames. ], batch size: 209, lr: 3.56e-03, grad_scale: 16.0
2024-09-19 00:30:57,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=22.5
2024-09-19 00:31:12,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=554240.0, ans=0.5
2024-09-19 00:31:25,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=554280.0, ans=0.0
2024-09-19 00:31:27,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=554280.0, ans=0.125
2024-09-19 00:32:00,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=554360.0, ans=0.125
2024-09-19 00:32:06,638 INFO [train.py:1198] (0/2) Epoch 31, batch 2850, loss[loss=0.2297, ctc_loss=0.1155, cr_loss=0.3761, attn_decoder_loss=0.234, over 29521.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.122, cr_loss=0.3653, attn_decoder_loss=0.2436, over 5762943.17 frames. ], batch size: 77, lr: 3.56e-03, grad_scale: 16.0
2024-09-19 00:32:13,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=554400.0, ans=0.0
2024-09-19 00:32:15,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=554400.0, ans=0.0
2024-09-19 00:32:50,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=554480.0, ans=0.2
2024-09-19 00:32:55,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=554520.0, ans=0.125
2024-09-19 00:33:02,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=554520.0, ans=0.2
2024-09-19 00:33:03,237 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.652e+01 9.121e+01 9.681e+01 2.307e+02, threshold=1.824e+02, percent-clipped=1.0
2024-09-19 00:33:19,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=554560.0, ans=0.125
2024-09-19 00:33:24,927 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.47 vs. limit=15.0
2024-09-19 00:33:26,765 INFO [train.py:1198] (0/2) Epoch 31, batch 2900, loss[loss=0.2361, ctc_loss=0.12, cr_loss=0.3726, attn_decoder_loss=0.2407, over 29412.00 frames. ], tot_loss[loss=0.2396, ctc_loss=0.1221, cr_loss=0.3662, attn_decoder_loss=0.2445, over 5788998.06 frames. ], batch size: 79, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:33:27,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=554600.0, ans=0.09899494936611666
2024-09-19 00:33:57,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=554680.0, ans=0.0
2024-09-19 00:33:57,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554680.0, ans=0.1
2024-09-19 00:34:09,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554680.0, ans=0.1
2024-09-19 00:34:12,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=554720.0, ans=0.5
2024-09-19 00:34:14,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=554720.0, ans=0.125
2024-09-19 00:34:40,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.16 vs. limit=15.0
2024-09-19 00:34:42,832 INFO [train.py:1198] (0/2) Epoch 31, batch 2950, loss[loss=0.2211, ctc_loss=0.1071, cr_loss=0.3303, attn_decoder_loss=0.2264, over 29541.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1209, cr_loss=0.3635, attn_decoder_loss=0.2432, over 5783661.58 frames. ], batch size: 75, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:35:08,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=554840.0, ans=0.125
2024-09-19 00:35:20,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=554880.0, ans=0.125
2024-09-19 00:35:22,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554880.0, ans=0.1
2024-09-19 00:35:34,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=554920.0, ans=0.125
2024-09-19 00:35:37,428 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.492e+01 9.114e+01 9.567e+01 2.273e+02, threshold=1.823e+02, percent-clipped=2.0
2024-09-19 00:35:38,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.55 vs. limit=22.5
2024-09-19 00:35:45,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=554960.0, ans=0.2
2024-09-19 00:35:45,883 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0
2024-09-19 00:35:50,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=554960.0, ans=0.0
2024-09-19 00:35:50,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0
2024-09-19 00:35:59,048 INFO [train.py:1198] (0/2) Epoch 31, batch 3000, loss[loss=0.2361, ctc_loss=0.1236, cr_loss=0.387, attn_decoder_loss=0.24, over 29760.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1211, cr_loss=0.3634, attn_decoder_loss=0.2432, over 5784419.92 frames. ], batch size: 81, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:35:59,049 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 00:36:03,199 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.8752, 3.3226, 3.6253, 3.7246], device='cuda:0')
2024-09-19 00:36:14,661 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3077, 4.6332, 5.1183, 5.2325], device='cuda:0')
2024-09-19 00:36:19,672 INFO [train.py:1230] (0/2) Epoch 31, validation: loss=0.2117, ctc_loss=0.03748, cr_loss=5.925e-15, attn_decoder_loss=0.2311, over 944034.00 frames.
2024-09-19 00:36:19,673 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-19 00:36:32,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=555000.0, ans=0.5
2024-09-19 00:37:02,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0
2024-09-19 00:37:03,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=555080.0, ans=0.0
2024-09-19 00:37:19,319 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=22.5
2024-09-19 00:37:37,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.16 vs. limit=15.0
2024-09-19 00:37:38,277 INFO [train.py:1198] (0/2) Epoch 31, batch 3050, loss[loss=0.2309, ctc_loss=0.1221, cr_loss=0.3887, attn_decoder_loss=0.2344, over 29527.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1216, cr_loss=0.3645, attn_decoder_loss=0.2439, over 5779081.78 frames. ], batch size: 76, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:37:40,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555200.0, ans=0.1
2024-09-19 00:37:53,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=555240.0, ans=0.025
2024-09-19 00:38:15,017 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 00:38:15,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0
2024-09-19 00:38:21,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=555280.0, ans=0.0
2024-09-19 00:38:32,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.564e+01 9.261e+01 9.873e+01 2.101e+02, threshold=1.852e+02, percent-clipped=1.0
2024-09-19 00:38:42,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=555360.0, ans=0.2
2024-09-19 00:38:54,005 INFO [train.py:1198] (0/2) Epoch 31, batch 3100, loss[loss=0.2474, ctc_loss=0.1284, cr_loss=0.3982, attn_decoder_loss=0.2518, over 29255.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1211, cr_loss=0.3632, attn_decoder_loss=0.2434, over 5778677.53 frames. ], batch size: 100, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:38:54,758 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0
2024-09-19 00:39:03,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=555400.0, ans=0.1
2024-09-19 00:39:08,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=555440.0, ans=0.125
2024-09-19 00:39:24,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=555480.0, ans=0.0
2024-09-19 00:39:26,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.82 vs. limit=22.5
2024-09-19 00:39:31,111 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 00:39:44,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=555520.0, ans=0.125
2024-09-19 00:39:50,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0
2024-09-19 00:40:10,294 INFO [train.py:1198] (0/2) Epoch 31, batch 3150, loss[loss=0.2506, ctc_loss=0.1247, cr_loss=0.362, attn_decoder_loss=0.2565, over 28847.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1211, cr_loss=0.3632, attn_decoder_loss=0.2434, over 5785226.08 frames. ], batch size: 104, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:40:14,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=555600.0, ans=0.125
2024-09-19 00:40:29,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=555640.0, ans=0.125
2024-09-19 00:40:44,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=555680.0, ans=0.0
2024-09-19 00:40:46,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=555680.0, ans=0.0
2024-09-19 00:41:09,176 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.684e+01 9.147e+01 9.580e+01 2.256e+02, threshold=1.829e+02, percent-clipped=1.0
2024-09-19 00:41:09,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555720.0, ans=0.1
2024-09-19 00:41:12,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=555720.0, ans=0.125
2024-09-19 00:41:23,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=555760.0, ans=0.125
2024-09-19 00:41:30,419 INFO [train.py:1198] (0/2) Epoch 31, batch 3200, loss[loss=0.2249, ctc_loss=0.09647, cr_loss=0.3118, attn_decoder_loss=0.2322, over 29394.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1207, cr_loss=0.3633, attn_decoder_loss=0.2429, over 5795084.08 frames. ], batch size: 79, lr: 3.56e-03, grad_scale: 16.0
2024-09-19 00:41:47,948 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.00 vs. limit=15.0
2024-09-19 00:41:48,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=555840.0, ans=0.125
2024-09-19 00:42:01,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=555880.0, ans=0.0
2024-09-19 00:42:30,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=555960.0, ans=0.125
2024-09-19 00:42:37,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=555960.0, ans=0.0
2024-09-19 00:42:46,900 INFO [train.py:1198] (0/2) Epoch 31, batch 3250, loss[loss=0.2396, ctc_loss=0.1206, cr_loss=0.3617, attn_decoder_loss=0.2448, over 29710.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1207, cr_loss=0.3636, attn_decoder_loss=0.2434, over 5801416.62 frames. ], batch size: 84, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:43:13,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0
2024-09-19 00:43:42,508 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.541e+01 8.932e+01 9.487e+01 3.275e+02, threshold=1.786e+02, percent-clipped=1.0
2024-09-19 00:43:55,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=556160.0, ans=0.1
2024-09-19 00:43:55,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=556160.0, ans=0.2
2024-09-19 00:44:02,474 INFO [train.py:1198] (0/2) Epoch 31, batch 3300, loss[loss=0.247, ctc_loss=0.1263, cr_loss=0.3663, attn_decoder_loss=0.2522, over 28110.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1199, cr_loss=0.3617, attn_decoder_loss=0.2422, over 5797155.25 frames. ], batch size: 111, lr: 3.56e-03, grad_scale: 8.0
2024-09-19 00:44:15,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=556200.0, ans=0.0
2024-09-19 00:44:46,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.95 vs. limit=15.0
2024-09-19 00:44:52,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=556320.0, ans=0.025
2024-09-19 00:45:01,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=556320.0, ans=0.125
2024-09-19 00:45:03,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0
2024-09-19 00:45:07,555 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 00:45:17,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=556360.0, ans=0.07
2024-09-19 00:45:21,978 INFO [train.py:1198] (0/2) Epoch 31, batch 3350, loss[loss=0.2489, ctc_loss=0.1246, cr_loss=0.3703, attn_decoder_loss=0.2545, over 28822.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1204, cr_loss=0.3624, attn_decoder_loss=0.2427, over 5773423.15 frames.
], batch size: 104, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:45:23,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=556400.0, ans=0.0 2024-09-19 00:45:29,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=556400.0, ans=0.2 2024-09-19 00:45:39,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=556440.0, ans=0.1 2024-09-19 00:46:04,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=556480.0, ans=0.0 2024-09-19 00:46:07,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=556520.0, ans=0.0 2024-09-19 00:46:17,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=556520.0, ans=0.125 2024-09-19 00:46:18,236 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.469e+01 9.027e+01 9.591e+01 1.739e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-19 00:46:18,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=556520.0, ans=0.125 2024-09-19 00:46:22,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=556560.0, ans=0.2 2024-09-19 00:46:38,053 INFO [train.py:1198] (0/2) Epoch 31, batch 3400, loss[loss=0.2187, ctc_loss=0.108, cr_loss=0.3305, attn_decoder_loss=0.2237, over 29359.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1207, cr_loss=0.3625, attn_decoder_loss=0.2426, over 5766102.54 frames. 
], batch size: 67, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:46:39,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=556600.0, ans=0.0 2024-09-19 00:46:53,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=556640.0, ans=0.0 2024-09-19 00:47:45,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=556760.0, ans=0.0 2024-09-19 00:47:49,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=556760.0, ans=0.1 2024-09-19 00:47:54,284 INFO [train.py:1198] (0/2) Epoch 31, batch 3450, loss[loss=0.2399, ctc_loss=0.1162, cr_loss=0.3496, attn_decoder_loss=0.2459, over 28214.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1206, cr_loss=0.3626, attn_decoder_loss=0.2427, over 5773405.30 frames. ], batch size: 111, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:48:13,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=556840.0, ans=0.125 2024-09-19 00:48:35,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.43 vs. limit=6.0 2024-09-19 00:48:42,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=556920.0, ans=0.125 2024-09-19 00:48:54,519 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.684e+01 9.223e+01 9.839e+01 1.576e+02, threshold=1.845e+02, percent-clipped=0.0 2024-09-19 00:48:57,197 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.26 vs. 
limit=15.0 2024-09-19 00:49:11,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=556960.0, ans=0.0 2024-09-19 00:49:13,818 INFO [train.py:1198] (0/2) Epoch 31, batch 3500, loss[loss=0.2174, ctc_loss=0.1038, cr_loss=0.3196, attn_decoder_loss=0.2229, over 29339.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1203, cr_loss=0.3619, attn_decoder_loss=0.2424, over 5774981.12 frames. ], batch size: 71, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:49:17,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557000.0, ans=0.1 2024-09-19 00:49:51,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=557080.0, ans=0.125 2024-09-19 00:49:53,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=557080.0, ans=0.025 2024-09-19 00:50:09,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=557120.0, ans=0.0 2024-09-19 00:50:25,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=557160.0, ans=0.0 2024-09-19 00:50:28,515 INFO [train.py:1198] (0/2) Epoch 31, batch 3550, loss[loss=0.2501, ctc_loss=0.1271, cr_loss=0.376, attn_decoder_loss=0.2554, over 29702.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1195, cr_loss=0.3602, attn_decoder_loss=0.242, over 5781415.77 frames. 
], batch size: 89, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:50:56,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=557280.0, ans=0.125 2024-09-19 00:51:05,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=557280.0, ans=0.125 2024-09-19 00:51:10,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=557280.0, ans=0.07 2024-09-19 00:51:23,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.575e+01 9.105e+01 9.496e+01 5.708e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-19 00:51:29,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=557360.0, ans=0.2 2024-09-19 00:51:35,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=557360.0, ans=0.125 2024-09-19 00:51:41,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=557400.0, ans=0.2 2024-09-19 00:51:42,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5 2024-09-19 00:51:43,114 INFO [train.py:1198] (0/2) Epoch 31, batch 3600, loss[loss=0.2297, ctc_loss=0.1088, cr_loss=0.3412, attn_decoder_loss=0.2355, over 29487.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1197, cr_loss=0.3608, attn_decoder_loss=0.2422, over 5791516.72 frames. 
], batch size: 77, lr: 3.55e-03, grad_scale: 16.0 2024-09-19 00:51:52,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=557400.0, ans=0.125 2024-09-19 00:52:03,677 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.50 vs. limit=12.0 2024-09-19 00:52:22,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=557480.0, ans=0.2 2024-09-19 00:52:55,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=557560.0, ans=0.125 2024-09-19 00:52:58,315 INFO [train.py:1198] (0/2) Epoch 31, batch 3650, loss[loss=0.2519, ctc_loss=0.1235, cr_loss=0.3545, attn_decoder_loss=0.2583, over 29557.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1195, cr_loss=0.3605, attn_decoder_loss=0.242, over 5794158.13 frames. ], batch size: 90, lr: 3.55e-03, grad_scale: 16.0 2024-09-19 00:53:23,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=557640.0, ans=0.0 2024-09-19 00:53:29,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.43 vs. 
limit=15.0 2024-09-19 00:53:34,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=557680.0, ans=0.1 2024-09-19 00:53:36,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=557680.0, ans=0.0 2024-09-19 00:53:49,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=557720.0, ans=0.0 2024-09-19 00:53:55,288 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.468e+01 9.016e+01 9.512e+01 1.613e+02, threshold=1.803e+02, percent-clipped=0.0 2024-09-19 00:54:16,694 INFO [train.py:1198] (0/2) Epoch 31, batch 3700, loss[loss=0.2378, ctc_loss=0.1204, cr_loss=0.3537, attn_decoder_loss=0.243, over 29698.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1198, cr_loss=0.3609, attn_decoder_loss=0.2422, over 5803591.46 frames. ], batch size: 84, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:54:24,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=557800.0, ans=0.125 2024-09-19 00:54:56,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=557880.0, ans=0.125 2024-09-19 00:55:13,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=557920.0, ans=0.0 2024-09-19 00:55:30,747 INFO [train.py:1198] (0/2) Epoch 31, batch 3750, loss[loss=0.2115, ctc_loss=0.1051, cr_loss=0.3312, attn_decoder_loss=0.216, over 29356.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1201, cr_loss=0.3618, attn_decoder_loss=0.2422, over 5807835.14 frames. 
], batch size: 67, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:55:47,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=558040.0, ans=0.125 2024-09-19 00:56:18,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=558120.0, ans=0.0 2024-09-19 00:56:27,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.551e+01 9.139e+01 9.962e+01 3.532e+02, threshold=1.828e+02, percent-clipped=2.0 2024-09-19 00:56:38,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=558160.0, ans=0.05 2024-09-19 00:56:45,329 INFO [train.py:1198] (0/2) Epoch 31, batch 3800, loss[loss=0.2527, ctc_loss=0.1422, cr_loss=0.404, attn_decoder_loss=0.256, over 29639.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1203, cr_loss=0.362, attn_decoder_loss=0.2423, over 5797166.80 frames. 
], batch size: 86, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:56:57,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=558200.0, ans=0.125 2024-09-19 00:57:08,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=558240.0, ans=0.07 2024-09-19 00:57:16,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=558280.0, ans=22.5 2024-09-19 00:57:32,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=558320.0, ans=0.125 2024-09-19 00:57:33,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=558320.0, ans=10.0 2024-09-19 00:57:38,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2024-09-19 00:57:40,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=558320.0, ans=0.0 2024-09-19 00:57:51,737 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.36 vs. limit=15.0 2024-09-19 00:57:59,984 INFO [train.py:1198] (0/2) Epoch 31, batch 3850, loss[loss=0.2518, ctc_loss=0.1361, cr_loss=0.4007, attn_decoder_loss=0.2558, over 29275.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1199, cr_loss=0.3614, attn_decoder_loss=0.242, over 5811463.95 frames. 
], batch size: 100, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 00:58:22,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=558440.0, ans=0.125 2024-09-19 00:58:40,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=558480.0, ans=0.125 2024-09-19 00:58:56,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.624e+01 8.602e+01 9.005e+01 9.471e+01 1.448e+02, threshold=1.801e+02, percent-clipped=0.0 2024-09-19 00:58:59,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=558560.0, ans=0.125 2024-09-19 00:59:02,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=558560.0, ans=0.125 2024-09-19 00:59:14,165 INFO [train.py:1198] (0/2) Epoch 31, batch 3900, loss[loss=0.2514, ctc_loss=0.1275, cr_loss=0.396, attn_decoder_loss=0.2563, over 29625.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.12, cr_loss=0.362, attn_decoder_loss=0.2422, over 5816153.37 frames. ], batch size: 86, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 01:00:19,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=558760.0, ans=0.0 2024-09-19 01:00:28,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=558760.0, ans=0.125 2024-09-19 01:00:30,739 INFO [train.py:1198] (0/2) Epoch 31, batch 3950, loss[loss=0.2542, ctc_loss=0.1276, cr_loss=0.3831, attn_decoder_loss=0.2597, over 29521.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.12, cr_loss=0.3623, attn_decoder_loss=0.2425, over 5835656.53 frames. 
], batch size: 97, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 01:00:47,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=558840.0, ans=0.125 2024-09-19 01:00:55,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-09-19 01:01:06,951 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0 2024-09-19 01:01:18,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.02 vs. limit=22.5 2024-09-19 01:01:28,227 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.418e+01 8.953e+01 9.483e+01 1.231e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-19 01:01:31,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=558960.0, ans=0.0 2024-09-19 01:01:32,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=558960.0, ans=0.125 2024-09-19 01:01:35,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=558960.0, ans=0.0 2024-09-19 01:01:37,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558960.0, ans=0.1 2024-09-19 01:01:43,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-09-19 01:01:44,284 INFO [train.py:1198] (0/2) Epoch 31, batch 4000, loss[loss=0.2169, ctc_loss=0.1021, cr_loss=0.3338, attn_decoder_loss=0.2222, over 29521.00 frames. 
], tot_loss[loss=0.2377, ctc_loss=0.1203, cr_loss=0.3628, attn_decoder_loss=0.2426, over 5812702.37 frames. ], batch size: 74, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 01:02:00,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=559040.0, ans=0.0 2024-09-19 01:02:02,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=559040.0, ans=0.125 2024-09-19 01:02:09,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=559040.0, ans=0.2 2024-09-19 01:02:09,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559040.0, ans=0.1 2024-09-19 01:02:14,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=559080.0, ans=0.125 2024-09-19 01:02:24,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=559080.0, ans=0.1 2024-09-19 01:02:26,263 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:02:36,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=559120.0, ans=0.2 2024-09-19 01:02:38,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=559120.0, ans=0.0 2024-09-19 01:02:59,110 INFO [train.py:1198] (0/2) Epoch 31, batch 4050, loss[loss=0.2497, ctc_loss=0.1362, cr_loss=0.3769, attn_decoder_loss=0.2539, over 20089.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1204, cr_loss=0.3628, attn_decoder_loss=0.2424, over 5796457.92 frames. 
], batch size: 209, lr: 3.55e-03, grad_scale: 8.0 2024-09-19 01:03:33,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=559280.0, ans=0.125 2024-09-19 01:03:43,339 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:03:56,630 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.819e+01 8.784e+01 9.351e+01 1.021e+02 4.182e+02, threshold=1.870e+02, percent-clipped=0.0 2024-09-19 01:04:14,215 INFO [train.py:1198] (0/2) Epoch 31, batch 4100, loss[loss=0.2587, ctc_loss=0.1381, cr_loss=0.4214, attn_decoder_loss=0.2627, over 29507.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1207, cr_loss=0.3632, attn_decoder_loss=0.2426, over 5792989.06 frames. ], batch size: 90, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:04:18,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=559400.0, ans=0.1 2024-09-19 01:04:25,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=559400.0, ans=0.125 2024-09-19 01:04:46,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=559480.0, ans=0.0 2024-09-19 01:04:49,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=559480.0, ans=0.0 2024-09-19 01:04:56,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=559480.0, ans=0.2 2024-09-19 01:05:05,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=559520.0, ans=0.1 2024-09-19 01:05:08,127 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=559520.0, ans=0.1 2024-09-19 01:05:27,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=559600.0, ans=0.125 2024-09-19 01:05:28,645 INFO [train.py:1198] (0/2) Epoch 31, batch 4150, loss[loss=0.2321, ctc_loss=0.1157, cr_loss=0.3578, attn_decoder_loss=0.2371, over 29512.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1205, cr_loss=0.3625, attn_decoder_loss=0.2424, over 5798627.09 frames. ], batch size: 77, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:05:29,336 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.76 vs. limit=22.5 2024-09-19 01:06:00,096 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.37 vs. limit=10.0 2024-09-19 01:06:10,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=559680.0, ans=0.0 2024-09-19 01:06:14,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=559720.0, ans=0.125 2024-09-19 01:06:26,031 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.331e+01 8.844e+01 9.480e+01 1.340e+02, threshold=1.769e+02, percent-clipped=1.0 2024-09-19 01:06:42,213 INFO [train.py:1198] (0/2) Epoch 31, batch 4200, loss[loss=0.2545, ctc_loss=0.1422, cr_loss=0.4025, attn_decoder_loss=0.2581, over 29515.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1208, cr_loss=0.3636, attn_decoder_loss=0.2427, over 5800020.69 frames. 
], batch size: 90, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:07:01,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=559840.0, ans=0.0 2024-09-19 01:07:13,708 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:07:25,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=559920.0, ans=0.0 2024-09-19 01:07:25,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=559920.0, ans=0.125 2024-09-19 01:07:38,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=559920.0, ans=0.025 2024-09-19 01:07:47,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=559960.0, ans=0.125 2024-09-19 01:07:51,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.24 vs. limit=10.0 2024-09-19 01:07:55,216 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-140000.pt 2024-09-19 01:08:04,341 INFO [train.py:1198] (0/2) Epoch 31, batch 4250, loss[loss=0.2298, ctc_loss=0.1139, cr_loss=0.3393, attn_decoder_loss=0.2352, over 29531.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1202, cr_loss=0.3626, attn_decoder_loss=0.2425, over 5805715.66 frames. 
], batch size: 74, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:08:04,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=560000.0, ans=0.125 2024-09-19 01:08:16,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.78 vs. limit=22.5 2024-09-19 01:08:34,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=560080.0, ans=0.125 2024-09-19 01:08:38,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-09-19 01:08:41,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-09-19 01:08:46,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560080.0, ans=0.1 2024-09-19 01:08:54,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.18 vs. limit=22.5 2024-09-19 01:09:03,969 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.697e+01 9.428e+01 9.992e+01 2.936e+02, threshold=1.886e+02, percent-clipped=1.0 2024-09-19 01:09:16,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=12.0 2024-09-19 01:09:20,359 INFO [train.py:1198] (0/2) Epoch 31, batch 4300, loss[loss=0.2529, ctc_loss=0.13, cr_loss=0.3769, attn_decoder_loss=0.2582, over 29525.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1197, cr_loss=0.3615, attn_decoder_loss=0.2425, over 5794656.54 frames. 
], batch size: 87, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:09:44,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=560240.0, ans=0.0 2024-09-19 01:09:51,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=560280.0, ans=0.1 2024-09-19 01:10:06,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=560320.0, ans=0.0 2024-09-19 01:10:23,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=560360.0, ans=0.0 2024-09-19 01:10:34,938 INFO [train.py:1198] (0/2) Epoch 31, batch 4350, loss[loss=0.2528, ctc_loss=0.1309, cr_loss=0.3758, attn_decoder_loss=0.258, over 29442.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.122, cr_loss=0.3663, attn_decoder_loss=0.2455, over 5798249.66 frames. ], batch size: 97, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:10:35,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=560400.0, ans=0.2 2024-09-19 01:10:41,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=560400.0, ans=0.0 2024-09-19 01:11:34,000 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.066e+01 8.755e+01 9.230e+01 9.613e+01 3.743e+02, threshold=1.846e+02, percent-clipped=1.0 2024-09-19 01:11:47,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=560560.0, ans=0.125 2024-09-19 01:11:50,363 INFO [train.py:1198] (0/2) Epoch 31, batch 4400, loss[loss=0.2546, ctc_loss=0.1366, cr_loss=0.4002, attn_decoder_loss=0.2588, over 27305.00 frames. ], tot_loss[loss=0.2426, ctc_loss=0.1236, cr_loss=0.3694, attn_decoder_loss=0.2477, over 5769731.83 frames. 
], batch size: 124, lr: 3.54e-03, grad_scale: 16.0 2024-09-19 01:12:23,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0 2024-09-19 01:12:49,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.62 vs. limit=15.0 2024-09-19 01:13:05,147 INFO [train.py:1198] (0/2) Epoch 31, batch 4450, loss[loss=0.2649, ctc_loss=0.1595, cr_loss=0.3967, attn_decoder_loss=0.2678, over 19833.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1276, cr_loss=0.3752, attn_decoder_loss=0.25, over 5582974.73 frames. ], batch size: 210, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:13:17,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=12.0 2024-09-19 01:13:18,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=560800.0, ans=0.025 2024-09-19 01:13:32,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=560840.0, ans=0.1 2024-09-19 01:13:33,735 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:13:49,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=560920.0, ans=0.0 2024-09-19 01:13:56,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=560920.0, ans=0.2 2024-09-19 01:14:00,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=560920.0, ans=0.1 2024-09-19 01:14:06,071 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.287e+01 9.493e+01 1.082e+02 1.219e+02 
3.408e+02, threshold=2.163e+02, percent-clipped=1.0 2024-09-19 01:14:09,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=560960.0, ans=0.0 2024-09-19 01:14:10,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=560960.0, ans=0.125 2024-09-19 01:14:19,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=561000.0, ans=0.0 2024-09-19 01:14:21,377 INFO [train.py:1198] (0/2) Epoch 31, batch 4500, loss[loss=0.2533, ctc_loss=0.1426, cr_loss=0.3664, attn_decoder_loss=0.2575, over 20748.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.131, cr_loss=0.3772, attn_decoder_loss=0.2518, over 5244502.52 frames. ], batch size: 210, lr: 3.54e-03, grad_scale: 8.0 2024-09-19 01:14:29,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=561000.0, ans=0.125 2024-09-19 01:14:58,626 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-31.pt 2024-09-19 01:15:44,697 INFO [train.py:1198] (0/2) Epoch 32, batch 0, loss[loss=0.2194, ctc_loss=0.107, cr_loss=0.3323, attn_decoder_loss=0.2245, over 29618.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.107, cr_loss=0.3323, attn_decoder_loss=0.2245, over 29618.00 frames. 
], batch size: 73, lr: 3.48e-03, grad_scale: 16.0 2024-09-19 01:15:44,697 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 01:15:58,089 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.9515, 4.7430, 4.4457, 4.2535], device='cuda:0') 2024-09-19 01:16:03,114 INFO [train.py:1230] (0/2) Epoch 32, validation: loss=0.2127, ctc_loss=0.03714, cr_loss=6.101e-15, attn_decoder_loss=0.2322, over 944034.00 frames. 2024-09-19 01:16:03,114 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 01:16:03,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=561100.0, ans=0.1 2024-09-19 01:16:22,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-09-19 01:16:25,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=561140.0, ans=0.125 2024-09-19 01:16:27,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=561140.0, ans=0.125 2024-09-19 01:16:45,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=561180.0, ans=0.125 2024-09-19 01:16:45,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=561180.0, ans=0.0 2024-09-19 01:17:05,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=561260.0, ans=0.0 2024-09-19 01:17:17,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=561260.0, ans=0.1 2024-09-19 01:17:20,697 INFO [train.py:1198] (0/2) Epoch 32, batch 
50, loss[loss=0.2077, ctc_loss=0.09286, cr_loss=0.3135, attn_decoder_loss=0.2135, over 29435.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1223, cr_loss=0.3673, attn_decoder_loss=0.244, over 1266455.14 frames. ], batch size: 70, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:17:22,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=561300.0, ans=0.125 2024-09-19 01:17:32,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2024-09-19 01:17:40,119 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.24 vs. limit=15.0 2024-09-19 01:17:45,061 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.848e+01 9.833e+01 1.147e+02 1.812e+02, threshold=1.967e+02, percent-clipped=0.0 2024-09-19 01:17:59,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.85 vs. 
limit=22.5 2024-09-19 01:18:00,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=561380.0, ans=0.07 2024-09-19 01:18:09,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=561420.0, ans=0.125 2024-09-19 01:18:21,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=561460.0, ans=0.125 2024-09-19 01:18:27,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=561460.0, ans=0.0 2024-09-19 01:18:27,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=561460.0, ans=0.1 2024-09-19 01:18:36,353 INFO [train.py:1198] (0/2) Epoch 32, batch 100, loss[loss=0.226, ctc_loss=0.1144, cr_loss=0.337, attn_decoder_loss=0.2309, over 29521.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1234, cr_loss=0.3704, attn_decoder_loss=0.2455, over 2251671.75 frames. 
], batch size: 76, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:19:13,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=561580.0, ans=0.125 2024-09-19 01:19:15,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=561580.0, ans=0.2 2024-09-19 01:19:34,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=561620.0, ans=0.125 2024-09-19 01:19:39,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=561660.0, ans=0.1 2024-09-19 01:19:53,959 INFO [train.py:1198] (0/2) Epoch 32, batch 150, loss[loss=0.2123, ctc_loss=0.1034, cr_loss=0.3253, attn_decoder_loss=0.2172, over 29433.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1205, cr_loss=0.3633, attn_decoder_loss=0.2428, over 3047528.34 frames. ], batch size: 70, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:19:55,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=561700.0, ans=0.2 2024-09-19 01:19:55,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=561700.0, ans=0.125 2024-09-19 01:19:55,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=561700.0, ans=0.125 2024-09-19 01:20:00,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=561700.0, ans=0.125 2024-09-19 01:20:00,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=561700.0, ans=0.0 2024-09-19 01:20:07,148 INFO [scaling.py:1024] (0/2) Whitening: 
name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.00 vs. limit=10.0 2024-09-19 01:20:18,131 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.368e+01 8.757e+01 9.262e+01 1.493e+02, threshold=1.751e+02, percent-clipped=0.0 2024-09-19 01:20:35,047 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:20:44,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=561820.0, ans=0.125 2024-09-19 01:20:58,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=561860.0, ans=0.05 2024-09-19 01:20:58,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=561860.0, ans=0.0 2024-09-19 01:21:03,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=561860.0, ans=0.125 2024-09-19 01:21:08,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=561860.0, ans=0.0 2024-09-19 01:21:11,625 INFO [train.py:1198] (0/2) Epoch 32, batch 200, loss[loss=0.2552, ctc_loss=0.1376, cr_loss=0.3899, attn_decoder_loss=0.2596, over 27147.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1206, cr_loss=0.3639, attn_decoder_loss=0.2423, over 3660373.51 frames. ], batch size: 124, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:21:17,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.08 vs. limit=15.0 2024-09-19 01:21:32,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.53 vs. 
limit=22.5 2024-09-19 01:21:37,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=561940.0, ans=0.04949747468305833 2024-09-19 01:21:37,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=561940.0, ans=0.1 2024-09-19 01:21:45,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=561980.0, ans=0.125 2024-09-19 01:21:46,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=561980.0, ans=0.0 2024-09-19 01:21:53,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.00 vs. limit=22.5 2024-09-19 01:22:16,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=562060.0, ans=0.125 2024-09-19 01:22:17,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.58 vs. limit=12.0 2024-09-19 01:22:27,316 INFO [train.py:1198] (0/2) Epoch 32, batch 250, loss[loss=0.2563, ctc_loss=0.1374, cr_loss=0.4122, attn_decoder_loss=0.2604, over 29244.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.12, cr_loss=0.3635, attn_decoder_loss=0.242, over 4142599.52 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:22:45,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.55 vs. 
limit=12.0 2024-09-19 01:22:46,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=562140.0, ans=0.0 2024-09-19 01:22:53,931 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.466e+01 9.044e+01 9.662e+01 1.743e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-19 01:23:05,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=22.5 2024-09-19 01:23:12,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=562180.0, ans=0.125 2024-09-19 01:23:14,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=562220.0, ans=0.125 2024-09-19 01:23:20,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=562220.0, ans=0.2 2024-09-19 01:23:23,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=562220.0, ans=0.125 2024-09-19 01:23:27,197 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.20 vs. limit=10.0 2024-09-19 01:23:37,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.52 vs. limit=15.0 2024-09-19 01:23:41,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=562260.0, ans=0.125 2024-09-19 01:23:41,788 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.08 vs. 
limit=12.0 2024-09-19 01:23:45,522 INFO [train.py:1198] (0/2) Epoch 32, batch 300, loss[loss=0.2588, ctc_loss=0.1355, cr_loss=0.3991, attn_decoder_loss=0.2636, over 29519.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1198, cr_loss=0.3634, attn_decoder_loss=0.2418, over 4509583.44 frames. ], batch size: 92, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:24:10,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=562340.0, ans=0.125 2024-09-19 01:24:17,142 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0 2024-09-19 01:24:22,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=562380.0, ans=0.0 2024-09-19 01:24:33,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.70 vs. limit=15.0 2024-09-19 01:25:01,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=562460.0, ans=0.125 2024-09-19 01:25:04,393 INFO [train.py:1198] (0/2) Epoch 32, batch 350, loss[loss=0.2063, ctc_loss=0.09845, cr_loss=0.3005, attn_decoder_loss=0.2116, over 29333.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1198, cr_loss=0.3628, attn_decoder_loss=0.242, over 4794921.21 frames. 
], batch size: 71, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:25:07,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=562500.0, ans=0.1 2024-09-19 01:25:15,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=562500.0, ans=0.1 2024-09-19 01:25:28,323 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.336e+01 8.922e+01 9.619e+01 6.149e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-19 01:25:36,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=562580.0, ans=0.125 2024-09-19 01:25:37,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=562580.0, ans=0.0 2024-09-19 01:25:40,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=562580.0, ans=0.05 2024-09-19 01:25:50,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.56 vs. 
limit=15.0 2024-09-19 01:25:59,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=562620.0, ans=0.125 2024-09-19 01:26:00,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=562620.0, ans=0.125 2024-09-19 01:26:02,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562620.0, ans=0.1 2024-09-19 01:26:06,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=562660.0, ans=0.0 2024-09-19 01:26:13,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.64 vs. limit=6.0 2024-09-19 01:26:13,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.81 vs. limit=15.0 2024-09-19 01:26:19,649 INFO [train.py:1198] (0/2) Epoch 32, batch 400, loss[loss=0.2378, ctc_loss=0.1178, cr_loss=0.3566, attn_decoder_loss=0.2432, over 29714.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1195, cr_loss=0.3618, attn_decoder_loss=0.2419, over 5025208.09 frames. ], batch size: 82, lr: 3.48e-03, grad_scale: 16.0 2024-09-19 01:26:32,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=562700.0, ans=0.0 2024-09-19 01:26:48,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.24 vs. limit=15.0 2024-09-19 01:26:52,076 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.19 vs. 
limit=15.0 2024-09-19 01:27:21,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=562860.0, ans=0.125 2024-09-19 01:27:30,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=562860.0, ans=0.125 2024-09-19 01:27:33,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=562860.0, ans=0.2 2024-09-19 01:27:38,244 INFO [train.py:1198] (0/2) Epoch 32, batch 450, loss[loss=0.2442, ctc_loss=0.1165, cr_loss=0.3531, attn_decoder_loss=0.2505, over 29695.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1198, cr_loss=0.3621, attn_decoder_loss=0.2422, over 5186833.58 frames. ], batch size: 83, lr: 3.48e-03, grad_scale: 16.0 2024-09-19 01:27:38,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=562900.0, ans=0.125 2024-09-19 01:27:56,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=562940.0, ans=0.125 2024-09-19 01:28:00,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=562940.0, ans=0.0 2024-09-19 01:28:02,738 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.493e+01 8.894e+01 9.370e+01 1.465e+02, threshold=1.779e+02, percent-clipped=0.0 2024-09-19 01:28:03,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=562940.0, ans=0.125 2024-09-19 01:28:08,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.06 vs. 
limit=15.0 2024-09-19 01:28:11,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-09-19 01:28:30,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=563020.0, ans=0.125 2024-09-19 01:28:31,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=563020.0, ans=0.0 2024-09-19 01:28:54,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.49 vs. limit=15.0 2024-09-19 01:28:56,470 INFO [train.py:1198] (0/2) Epoch 32, batch 500, loss[loss=0.26, ctc_loss=0.1385, cr_loss=0.3995, attn_decoder_loss=0.2646, over 29428.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1199, cr_loss=0.3622, attn_decoder_loss=0.2419, over 5330566.09 frames. ], batch size: 94, lr: 3.48e-03, grad_scale: 16.0 2024-09-19 01:29:16,699 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:29:25,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=563180.0, ans=0.2 2024-09-19 01:29:29,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.96 vs. 
limit=15.0 2024-09-19 01:29:41,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=563220.0, ans=0.125 2024-09-19 01:29:42,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=563220.0, ans=0.125 2024-09-19 01:29:49,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.95 vs. limit=10.0 2024-09-19 01:30:02,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=563260.0, ans=0.95 2024-09-19 01:30:05,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=563260.0, ans=0.125 2024-09-19 01:30:12,546 INFO [train.py:1198] (0/2) Epoch 32, batch 550, loss[loss=0.2455, ctc_loss=0.1227, cr_loss=0.3704, attn_decoder_loss=0.251, over 28777.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1195, cr_loss=0.3611, attn_decoder_loss=0.2419, over 5422555.26 frames. ], batch size: 104, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:30:23,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.28 vs. 
limit=15.0 2024-09-19 01:30:40,433 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.423e+01 9.076e+01 9.566e+01 2.311e+02, threshold=1.815e+02, percent-clipped=1.0 2024-09-19 01:30:42,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=563340.0, ans=0.0 2024-09-19 01:30:45,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563380.0, ans=0.1 2024-09-19 01:31:11,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=563420.0, ans=0.0 2024-09-19 01:31:30,749 INFO [train.py:1198] (0/2) Epoch 32, batch 600, loss[loss=0.2518, ctc_loss=0.1322, cr_loss=0.3805, attn_decoder_loss=0.2567, over 29295.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1193, cr_loss=0.3607, attn_decoder_loss=0.242, over 5509315.66 frames. ], batch size: 100, lr: 3.48e-03, grad_scale: 8.0 2024-09-19 01:31:41,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=563500.0, ans=0.05 2024-09-19 01:31:58,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=563540.0, ans=0.2 2024-09-19 01:31:59,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=563580.0, ans=0.0 2024-09-19 01:32:01,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=563580.0, ans=0.0 2024-09-19 01:32:17,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=563620.0, ans=0.125 2024-09-19 01:32:34,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=563660.0, ans=0.0 2024-09-19 01:32:45,656 
INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=8.0 2024-09-19 01:32:45,911 INFO [train.py:1198] (0/2) Epoch 32, batch 650, loss[loss=0.2335, ctc_loss=0.1093, cr_loss=0.3423, attn_decoder_loss=0.2397, over 29780.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1186, cr_loss=0.3595, attn_decoder_loss=0.2414, over 5586494.66 frames. ], batch size: 81, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:32:55,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563700.0, ans=0.1 2024-09-19 01:32:58,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563700.0, ans=0.1 2024-09-19 01:33:08,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=563740.0, ans=0.2 2024-09-19 01:33:08,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=563740.0, ans=0.1 2024-09-19 01:33:11,744 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-09-19 01:33:13,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=563740.0, ans=0.0 2024-09-19 01:33:14,110 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.233e+01 8.421e+01 8.815e+01 9.543e+01 5.182e+02, threshold=1.763e+02, percent-clipped=1.0 2024-09-19 01:33:30,026 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.57 vs. 
limit=15.0 2024-09-19 01:33:34,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=563820.0, ans=0.0 2024-09-19 01:33:34,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=563820.0, ans=0.125 2024-09-19 01:33:43,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=563820.0, ans=0.0 2024-09-19 01:33:47,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=563860.0, ans=0.0 2024-09-19 01:33:49,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.87 vs. limit=12.0 2024-09-19 01:33:59,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=563860.0, ans=0.0 2024-09-19 01:34:00,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2024-09-19 01:34:01,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=563860.0, ans=0.125 2024-09-19 01:34:01,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=563860.0, ans=0.0 2024-09-19 01:34:04,048 INFO [train.py:1198] (0/2) Epoch 32, batch 700, loss[loss=0.2394, ctc_loss=0.1271, cr_loss=0.3814, attn_decoder_loss=0.2434, over 29543.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1197, cr_loss=0.3619, attn_decoder_loss=0.2424, over 5636863.26 frames. ], batch size: 76, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:34:12,680 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.79 vs. 
limit=15.0 2024-09-19 01:34:13,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=563900.0, ans=0.0 2024-09-19 01:34:13,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=563900.0, ans=0.125 2024-09-19 01:34:22,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=563940.0, ans=0.05 2024-09-19 01:34:54,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=564020.0, ans=0.125 2024-09-19 01:35:10,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2024-09-19 01:35:16,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=564060.0, ans=0.0 2024-09-19 01:35:22,665 INFO [train.py:1198] (0/2) Epoch 32, batch 750, loss[loss=0.2433, ctc_loss=0.1223, cr_loss=0.3478, attn_decoder_loss=0.249, over 29697.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1195, cr_loss=0.3614, attn_decoder_loss=0.2419, over 5676111.23 frames. ], batch size: 82, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:35:25,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.47 vs. 
limit=15.0 2024-09-19 01:35:30,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=564100.0, ans=0.2 2024-09-19 01:35:33,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=564100.0, ans=0.125 2024-09-19 01:35:43,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.67 vs. limit=15.0 2024-09-19 01:35:48,089 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.447e+01 8.933e+01 9.518e+01 3.479e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-19 01:35:55,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=564180.0, ans=0.0 2024-09-19 01:35:58,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=564180.0, ans=0.125 2024-09-19 01:36:02,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=564180.0, ans=0.125 2024-09-19 01:36:14,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.86 vs. limit=15.0 2024-09-19 01:36:15,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff2.min_abs, batch_count=564220.0, ans=0.1 2024-09-19 01:36:17,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=564220.0, ans=0.2 2024-09-19 01:36:38,476 INFO [train.py:1198] (0/2) Epoch 32, batch 800, loss[loss=0.2168, ctc_loss=0.1027, cr_loss=0.336, attn_decoder_loss=0.222, over 29615.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1197, cr_loss=0.362, attn_decoder_loss=0.2421, over 5706136.75 frames. 
], batch size: 73, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:36:45,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=22.5 2024-09-19 01:36:51,156 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.88 vs. limit=22.5 2024-09-19 01:37:05,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=564340.0, ans=0.125 2024-09-19 01:37:26,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=564420.0, ans=0.125 2024-09-19 01:37:29,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-19 01:37:46,645 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-09-19 01:37:48,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=564460.0, ans=0.2 2024-09-19 01:37:51,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=564460.0, ans=0.125 2024-09-19 01:37:54,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=564500.0, ans=0.2 2024-09-19 01:37:56,058 INFO [train.py:1198] (0/2) Epoch 32, batch 850, loss[loss=0.2401, ctc_loss=0.1266, cr_loss=0.3819, attn_decoder_loss=0.2442, over 29695.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1195, cr_loss=0.3613, attn_decoder_loss=0.2417, over 5735991.02 frames. 
], batch size: 89, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:38:15,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=564540.0, ans=0.04949747468305833 2024-09-19 01:38:23,001 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.587e+01 9.050e+01 9.701e+01 1.930e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-19 01:38:31,865 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:38:46,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=564620.0, ans=0.125 2024-09-19 01:38:59,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=564660.0, ans=0.2 2024-09-19 01:39:04,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.03 vs. limit=10.0 2024-09-19 01:39:05,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564660.0, ans=0.1 2024-09-19 01:39:11,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=564660.0, ans=0.0 2024-09-19 01:39:13,909 INFO [train.py:1198] (0/2) Epoch 32, batch 900, loss[loss=0.2195, ctc_loss=0.1009, cr_loss=0.3293, attn_decoder_loss=0.2254, over 29608.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1196, cr_loss=0.3616, attn_decoder_loss=0.2419, over 5740975.28 frames. 
], batch size: 73, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:39:14,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=564700.0, ans=0.0 2024-09-19 01:39:17,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=564700.0, ans=0.025 2024-09-19 01:39:18,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=564700.0, ans=0.0 2024-09-19 01:39:49,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=564780.0, ans=0.0 2024-09-19 01:39:50,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=564780.0, ans=0.2 2024-09-19 01:39:54,373 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.69 vs. limit=15.0 2024-09-19 01:39:55,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=564780.0, ans=0.125 2024-09-19 01:40:01,854 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.12 vs. limit=15.0 2024-09-19 01:40:13,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=564860.0, ans=0.0 2024-09-19 01:40:14,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=564860.0, ans=0.025 2024-09-19 01:40:29,654 INFO [train.py:1198] (0/2) Epoch 32, batch 950, loss[loss=0.2278, ctc_loss=0.1095, cr_loss=0.325, attn_decoder_loss=0.2337, over 29514.00 frames. 
], tot_loss[loss=0.2372, ctc_loss=0.1199, cr_loss=0.362, attn_decoder_loss=0.2422, over 5743132.69 frames. ], batch size: 74, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:40:36,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=564900.0, ans=0.125 2024-09-19 01:40:40,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=564900.0, ans=0.125 2024-09-19 01:40:41,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=564900.0, ans=0.09899494936611666 2024-09-19 01:40:53,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564940.0, ans=0.1 2024-09-19 01:40:59,189 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.678e+01 9.053e+01 9.826e+01 2.124e+02, threshold=1.811e+02, percent-clipped=3.0 2024-09-19 01:41:16,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0 2024-09-19 01:41:19,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=565020.0, ans=0.0 2024-09-19 01:41:25,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=565020.0, ans=0.2 2024-09-19 01:41:34,500 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:41:35,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=565060.0, ans=0.0 2024-09-19 01:41:47,541 INFO [train.py:1198] (0/2) Epoch 32, batch 1000, loss[loss=0.2213, ctc_loss=0.1081, cr_loss=0.3287, attn_decoder_loss=0.2266, over 29488.00 frames. 
], tot_loss[loss=0.2377, ctc_loss=0.1204, cr_loss=0.363, attn_decoder_loss=0.2427, over 5738069.66 frames. ], batch size: 77, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:42:03,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=565140.0, ans=0.2 2024-09-19 01:42:06,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=565140.0, ans=0.0 2024-09-19 01:42:19,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=565180.0, ans=0.04949747468305833 2024-09-19 01:42:31,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=565180.0, ans=10.0 2024-09-19 01:42:54,981 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:43:05,369 INFO [train.py:1198] (0/2) Epoch 32, batch 1050, loss[loss=0.2464, ctc_loss=0.1235, cr_loss=0.387, attn_decoder_loss=0.2515, over 29690.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1202, cr_loss=0.3629, attn_decoder_loss=0.2423, over 5747486.47 frames. ], batch size: 85, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:43:22,892 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.38 vs. limit=22.5 2024-09-19 01:43:27,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=565340.0, ans=0.04949747468305833 2024-09-19 01:43:30,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.05 vs. 
limit=15.0 2024-09-19 01:43:31,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=565340.0, ans=0.125 2024-09-19 01:43:32,742 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.597e+01 9.014e+01 9.453e+01 2.467e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-19 01:43:40,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=565380.0, ans=0.0 2024-09-19 01:43:54,948 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=15.0 2024-09-19 01:44:21,432 INFO [train.py:1198] (0/2) Epoch 32, batch 1100, loss[loss=0.2285, ctc_loss=0.1125, cr_loss=0.3567, attn_decoder_loss=0.2335, over 29427.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1201, cr_loss=0.3625, attn_decoder_loss=0.242, over 5757355.57 frames. ], batch size: 78, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:44:21,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=565500.0, ans=0.2 2024-09-19 01:44:43,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=565540.0, ans=0.125 2024-09-19 01:44:57,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=565580.0, ans=0.0 2024-09-19 01:45:22,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=565620.0, ans=0.0 2024-09-19 01:45:31,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=565660.0, ans=0.125 2024-09-19 01:45:31,434 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=565660.0, ans=0.1 2024-09-19 01:45:40,254 INFO [train.py:1198] (0/2) Epoch 32, batch 1150, loss[loss=0.2368, ctc_loss=0.122, cr_loss=0.3689, attn_decoder_loss=0.2414, over 29462.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1201, cr_loss=0.3625, attn_decoder_loss=0.2418, over 5755691.86 frames. ], batch size: 78, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:45:45,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=565700.0, ans=0.025 2024-09-19 01:45:52,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=565700.0, ans=0.04949747468305833 2024-09-19 01:45:56,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=565740.0, ans=0.025 2024-09-19 01:45:57,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=565740.0, ans=0.125 2024-09-19 01:46:02,055 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:46:10,198 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.460e+01 8.830e+01 9.335e+01 1.572e+02, threshold=1.766e+02, percent-clipped=0.0 2024-09-19 01:46:16,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=565780.0, ans=0.0 2024-09-19 01:46:24,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=565780.0, ans=0.125 2024-09-19 01:46:25,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=565780.0, ans=0.125 2024-09-19 01:46:34,702 INFO 
[scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=565820.0, ans=0.0 2024-09-19 01:46:47,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.42 vs. limit=15.0 2024-09-19 01:46:58,702 INFO [train.py:1198] (0/2) Epoch 32, batch 1200, loss[loss=0.2518, ctc_loss=0.125, cr_loss=0.3833, attn_decoder_loss=0.2574, over 29676.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1205, cr_loss=0.3635, attn_decoder_loss=0.2426, over 5747669.00 frames. ], batch size: 85, lr: 3.47e-03, grad_scale: 16.0 2024-09-19 01:46:58,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=565900.0, ans=0.125 2024-09-19 01:47:06,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=565900.0, ans=0.125 2024-09-19 01:47:09,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=565900.0, ans=0.0 2024-09-19 01:47:30,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=565980.0, ans=0.125 2024-09-19 01:47:34,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=565980.0, ans=0.125 2024-09-19 01:47:52,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=566020.0, ans=0.0 2024-09-19 01:48:14,673 INFO [train.py:1198] (0/2) Epoch 32, batch 1250, loss[loss=0.2558, ctc_loss=0.1311, cr_loss=0.3914, attn_decoder_loss=0.2609, over 29516.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1208, cr_loss=0.3645, attn_decoder_loss=0.2431, over 5774262.56 frames. 
], batch size: 92, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:48:34,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=566140.0, ans=0.0 2024-09-19 01:48:43,429 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 8.626e+01 9.127e+01 9.598e+01 1.741e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-19 01:48:52,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=566180.0, ans=0.1 2024-09-19 01:48:55,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=566180.0, ans=0.5 2024-09-19 01:49:07,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=566220.0, ans=0.0 2024-09-19 01:49:32,736 INFO [train.py:1198] (0/2) Epoch 32, batch 1300, loss[loss=0.2436, ctc_loss=0.1161, cr_loss=0.3545, attn_decoder_loss=0.2499, over 28391.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1204, cr_loss=0.3631, attn_decoder_loss=0.2425, over 5780191.12 frames. ], batch size: 111, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:49:33,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.65 vs. 
limit=22.5 2024-09-19 01:49:37,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=566300.0, ans=0.025 2024-09-19 01:49:48,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=566340.0, ans=0.025 2024-09-19 01:50:05,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=566380.0, ans=0.1 2024-09-19 01:50:16,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=566380.0, ans=0.09899494936611666 2024-09-19 01:50:26,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=566420.0, ans=0.125 2024-09-19 01:50:45,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=566460.0, ans=0.125 2024-09-19 01:50:51,302 INFO [train.py:1198] (0/2) Epoch 32, batch 1350, loss[loss=0.2448, ctc_loss=0.125, cr_loss=0.3688, attn_decoder_loss=0.2499, over 29737.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1198, cr_loss=0.3624, attn_decoder_loss=0.242, over 5797837.80 frames. ], batch size: 81, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:51:19,499 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.271e+01 8.117e+01 8.702e+01 9.337e+01 1.229e+02, threshold=1.740e+02, percent-clipped=0.0 2024-09-19 01:51:21,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=566580.0, ans=0.05 2024-09-19 01:51:24,968 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.21 vs. 
limit=22.5 2024-09-19 01:51:42,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=566620.0, ans=0.125 2024-09-19 01:52:06,517 INFO [train.py:1198] (0/2) Epoch 32, batch 1400, loss[loss=0.2097, ctc_loss=0.09831, cr_loss=0.3106, attn_decoder_loss=0.2151, over 29574.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1198, cr_loss=0.3628, attn_decoder_loss=0.242, over 5808864.39 frames. ], batch size: 69, lr: 3.47e-03, grad_scale: 8.0 2024-09-19 01:52:15,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=566700.0, ans=0.0 2024-09-19 01:52:46,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=566780.0, ans=0.0 2024-09-19 01:52:52,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=566820.0, ans=0.125 2024-09-19 01:52:54,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=566820.0, ans=0.125 2024-09-19 01:53:04,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=12.0 2024-09-19 01:53:06,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=566820.0, ans=0.1 2024-09-19 01:53:08,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=566860.0, ans=0.2 2024-09-19 01:53:24,559 INFO [train.py:1198] (0/2) Epoch 32, batch 1450, loss[loss=0.263, ctc_loss=0.1416, cr_loss=0.4136, attn_decoder_loss=0.2673, over 29424.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1197, cr_loss=0.3625, attn_decoder_loss=0.2423, over 5807296.71 frames. 
], batch size: 94, lr: 3.46e-03, grad_scale: 8.0 2024-09-19 01:53:26,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=566900.0, ans=0.125 2024-09-19 01:53:30,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=566900.0, ans=15.0 2024-09-19 01:53:32,550 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:53:53,182 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.580e+01 8.959e+01 9.480e+01 1.633e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-19 01:54:07,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=566980.0, ans=0.0 2024-09-19 01:54:10,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=567020.0, ans=0.125 2024-09-19 01:54:36,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=567060.0, ans=0.125 2024-09-19 01:54:36,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=567060.0, ans=0.0 2024-09-19 01:54:38,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=567060.0, ans=0.0 2024-09-19 01:54:41,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=567100.0, ans=0.0 2024-09-19 01:54:42,379 INFO [train.py:1198] (0/2) Epoch 32, batch 1500, loss[loss=0.2522, ctc_loss=0.1298, cr_loss=0.3754, attn_decoder_loss=0.2575, over 29639.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1201, cr_loss=0.3632, attn_decoder_loss=0.2428, over 5807608.84 frames. 
], batch size: 86, lr: 3.46e-03, grad_scale: 8.0 2024-09-19 01:54:57,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=567140.0, ans=0.1 2024-09-19 01:55:12,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=567180.0, ans=0.0 2024-09-19 01:55:14,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=567180.0, ans=0.2 2024-09-19 01:55:17,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=567180.0, ans=0.125 2024-09-19 01:55:27,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=567220.0, ans=0.125 2024-09-19 01:55:30,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.46 vs. limit=6.0 2024-09-19 01:55:36,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=567220.0, ans=0.0 2024-09-19 01:55:48,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=567260.0, ans=0.125 2024-09-19 01:55:55,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=567260.0, ans=0.2 2024-09-19 01:55:58,503 INFO [train.py:1198] (0/2) Epoch 32, batch 1550, loss[loss=0.2631, ctc_loss=0.1481, cr_loss=0.4005, attn_decoder_loss=0.2669, over 29469.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1206, cr_loss=0.364, attn_decoder_loss=0.2429, over 5783454.24 frames. 
], batch size: 90, lr: 3.46e-03, grad_scale: 8.0 2024-09-19 01:56:00,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=567300.0, ans=0.025 2024-09-19 01:56:16,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=567340.0, ans=0.0 2024-09-19 01:56:17,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=567340.0, ans=0.0 2024-09-19 01:56:27,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.583e+01 9.090e+01 9.539e+01 2.299e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-19 01:56:46,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=567420.0, ans=0.125 2024-09-19 01:57:03,952 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0 2024-09-19 01:57:12,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=567460.0, ans=0.125 2024-09-19 01:57:16,197 INFO [train.py:1198] (0/2) Epoch 32, batch 1600, loss[loss=0.2417, ctc_loss=0.1148, cr_loss=0.3494, attn_decoder_loss=0.2481, over 29662.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1204, cr_loss=0.3633, attn_decoder_loss=0.2429, over 5766529.19 frames. 
], batch size: 85, lr: 3.46e-03, grad_scale: 16.0 2024-09-19 01:57:16,542 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:58:04,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567620.0, ans=0.1 2024-09-19 01:58:13,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=567620.0, ans=0.125 2024-09-19 01:58:34,170 INFO [train.py:1198] (0/2) Epoch 32, batch 1650, loss[loss=0.2446, ctc_loss=0.1275, cr_loss=0.3706, attn_decoder_loss=0.2494, over 29716.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1199, cr_loss=0.3618, attn_decoder_loss=0.2424, over 5762219.95 frames. ], batch size: 89, lr: 3.46e-03, grad_scale: 8.0 2024-09-19 01:58:39,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=567700.0, ans=0.0 2024-09-19 01:58:48,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=567740.0, ans=0.2 2024-09-19 01:58:52,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=567740.0, ans=0.125 2024-09-19 01:59:00,018 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:59:03,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=567780.0, ans=0.0 2024-09-19 01:59:04,225 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.351e+01 8.988e+01 9.892e+01 1.504e+02, threshold=1.798e+02, percent-clipped=0.0 2024-09-19 01:59:06,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, 
batch_count=567780.0, ans=0.0 2024-09-19 01:59:21,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=15.0 2024-09-19 01:59:49,522 INFO [train.py:1198] (0/2) Epoch 32, batch 1700, loss[loss=0.2148, ctc_loss=0.09969, cr_loss=0.3216, attn_decoder_loss=0.2204, over 29598.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1196, cr_loss=0.3611, attn_decoder_loss=0.2422, over 5783371.95 frames. ], batch size: 69, lr: 3.46e-03, grad_scale: 8.0 2024-09-19 02:00:05,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.86 vs. limit=15.0 2024-09-19 02:00:12,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=567940.0, ans=0.125 2024-09-19 02:00:17,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.62 vs. limit=15.0 2024-09-19 02:00:40,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=568020.0, ans=0.0 2024-09-19 02:00:53,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.88 vs. limit=22.5 2024-09-19 02:01:08,120 INFO [train.py:1198] (0/2) Epoch 32, batch 1750, loss[loss=0.2145, ctc_loss=0.1088, cr_loss=0.3366, attn_decoder_loss=0.2188, over 29329.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1199, cr_loss=0.3617, attn_decoder_loss=0.2422, over 5790792.74 frames. ], batch size: 67, lr: 3.46e-03, grad_scale: 8.0 2024-09-19 02:01:10,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. 
limit=6.0
2024-09-19 02:01:20,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=568100.0, ans=0.0
2024-09-19 02:01:33,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.88 vs. limit=15.0
2024-09-19 02:01:40,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.109e+01 8.621e+01 8.991e+01 9.586e+01 2.043e+02, threshold=1.798e+02, percent-clipped=1.0
2024-09-19 02:01:44,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.66 vs. limit=15.0
2024-09-19 02:02:23,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.30 vs. limit=15.0
2024-09-19 02:02:25,099 INFO [train.py:1198] (0/2) Epoch 32, batch 1800, loss[loss=0.2442, ctc_loss=0.1253, cr_loss=0.373, attn_decoder_loss=0.2491, over 29698.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1201, cr_loss=0.3625, attn_decoder_loss=0.2425, over 5792478.40 frames. ], batch size: 83, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:03:09,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=568420.0, ans=0.0
2024-09-19 02:03:13,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=568420.0, ans=10.0
2024-09-19 02:03:14,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=568420.0, ans=0.125
2024-09-19 02:03:15,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=568420.0, ans=0.025
2024-09-19 02:03:20,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=568420.0, ans=22.5
2024-09-19 02:03:36,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=568460.0, ans=0.07
2024-09-19 02:03:41,086 INFO [train.py:1198] (0/2) Epoch 32, batch 1850, loss[loss=0.2464, ctc_loss=0.1282, cr_loss=0.364, attn_decoder_loss=0.2515, over 29636.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1194, cr_loss=0.3612, attn_decoder_loss=0.242, over 5796025.41 frames. ], batch size: 86, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:03:47,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=568500.0, ans=0.0
2024-09-19 02:03:47,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=568500.0, ans=0.125
2024-09-19 02:03:50,533 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 02:03:51,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.08 vs. limit=15.0
2024-09-19 02:03:53,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=568500.0, ans=0.125
2024-09-19 02:03:59,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=568540.0, ans=0.0
2024-09-19 02:04:06,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=568540.0, ans=0.0
2024-09-19 02:04:09,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=568580.0, ans=0.0
2024-09-19 02:04:11,112 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.765e+01 8.550e+01 9.044e+01 9.477e+01 1.404e+02, threshold=1.809e+02, percent-clipped=0.0
2024-09-19 02:04:48,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=568660.0, ans=0.125
2024-09-19 02:04:48,801 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0
2024-09-19 02:04:58,450 INFO [train.py:1198] (0/2) Epoch 32, batch 1900, loss[loss=0.2451, ctc_loss=0.1199, cr_loss=0.3599, attn_decoder_loss=0.251, over 29701.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1201, cr_loss=0.3627, attn_decoder_loss=0.2428, over 5803064.09 frames. ], batch size: 89, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:05:00,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=568700.0, ans=0.2
2024-09-19 02:05:01,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=568700.0, ans=0.125
2024-09-19 02:05:26,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=568740.0, ans=0.015
2024-09-19 02:05:29,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568780.0, ans=0.1
2024-09-19 02:05:36,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=568780.0, ans=0.125
2024-09-19 02:05:39,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=568780.0, ans=0.2
2024-09-19 02:05:48,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=568820.0, ans=0.0
2024-09-19 02:05:54,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=568820.0, ans=0.0
2024-09-19 02:05:56,528 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.40 vs. limit=22.5
2024-09-19 02:06:03,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=568860.0, ans=0.05
2024-09-19 02:06:09,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=568860.0, ans=0.125
2024-09-19 02:06:11,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=568860.0, ans=0.07
2024-09-19 02:06:16,862 INFO [train.py:1198] (0/2) Epoch 32, batch 1950, loss[loss=0.2395, ctc_loss=0.1269, cr_loss=0.3902, attn_decoder_loss=0.2433, over 29450.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1209, cr_loss=0.3646, attn_decoder_loss=0.244, over 5818477.84 frames. ], batch size: 78, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:06:22,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0
2024-09-19 02:06:23,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=568900.0, ans=0.125
2024-09-19 02:06:31,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.20 vs. limit=8.0
2024-09-19 02:06:47,104 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.678e+01 9.081e+01 9.709e+01 1.589e+02, threshold=1.816e+02, percent-clipped=0.0
2024-09-19 02:07:02,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=569020.0, ans=0.125
2024-09-19 02:07:32,377 INFO [train.py:1198] (0/2) Epoch 32, batch 2000, loss[loss=0.2099, ctc_loss=0.09901, cr_loss=0.3133, attn_decoder_loss=0.2152, over 29297.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1211, cr_loss=0.3647, attn_decoder_loss=0.2443, over 5796015.56 frames. ], batch size: 67, lr: 3.46e-03, grad_scale: 16.0
2024-09-19 02:07:40,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0
2024-09-19 02:07:52,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=569140.0, ans=0.125
2024-09-19 02:07:57,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=569140.0, ans=0.0
2024-09-19 02:07:58,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=569140.0, ans=10.0
2024-09-19 02:08:03,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=569180.0, ans=0.2
2024-09-19 02:08:08,383 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.17 vs. limit=10.0
2024-09-19 02:08:22,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=569220.0, ans=0.125
2024-09-19 02:08:38,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=569260.0, ans=0.0
2024-09-19 02:08:50,443 INFO [train.py:1198] (0/2) Epoch 32, batch 2050, loss[loss=0.2078, ctc_loss=0.1015, cr_loss=0.3058, attn_decoder_loss=0.2128, over 29431.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1203, cr_loss=0.3633, attn_decoder_loss=0.2433, over 5788399.31 frames. ], batch size: 70, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:09:07,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=569340.0, ans=0.0
2024-09-19 02:09:21,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=569380.0, ans=0.2
2024-09-19 02:09:21,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=569380.0, ans=0.025
2024-09-19 02:09:24,448 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.440e+01 8.893e+01 9.652e+01 5.207e+02, threshold=1.779e+02, percent-clipped=1.0
2024-09-19 02:09:32,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=569380.0, ans=0.2
2024-09-19 02:09:47,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0
2024-09-19 02:09:50,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=569420.0, ans=0.0
2024-09-19 02:10:00,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=569460.0, ans=0.125
2024-09-19 02:10:01,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=569460.0, ans=0.125
2024-09-19 02:10:08,280 INFO [train.py:1198] (0/2) Epoch 32, batch 2100, loss[loss=0.2365, ctc_loss=0.1136, cr_loss=0.3439, attn_decoder_loss=0.2425, over 29755.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1197, cr_loss=0.362, attn_decoder_loss=0.2427, over 5799201.05 frames. ], batch size: 81, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:10:14,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=569500.0, ans=0.125
2024-09-19 02:10:29,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=569540.0, ans=0.0
2024-09-19 02:10:29,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=569540.0, ans=0.125
2024-09-19 02:10:35,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569540.0, ans=0.1
2024-09-19 02:10:38,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=569580.0, ans=0.125
2024-09-19 02:10:52,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=569620.0, ans=0.125
2024-09-19 02:10:52,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569620.0, ans=0.1
2024-09-19 02:10:58,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.54 vs. limit=15.0
2024-09-19 02:11:04,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=569620.0, ans=0.125
2024-09-19 02:11:23,559 INFO [train.py:1198] (0/2) Epoch 32, batch 2150, loss[loss=0.2369, ctc_loss=0.1243, cr_loss=0.3696, attn_decoder_loss=0.2411, over 29447.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1193, cr_loss=0.3611, attn_decoder_loss=0.2419, over 5813979.27 frames. ], batch size: 78, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:11:43,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=569740.0, ans=0.125
2024-09-19 02:11:55,486 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.476e+01 8.874e+01 9.335e+01 1.569e+02, threshold=1.775e+02, percent-clipped=0.0
2024-09-19 02:11:57,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=569780.0, ans=0.0
2024-09-19 02:11:57,812 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0
2024-09-19 02:12:04,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=569780.0, ans=0.0
2024-09-19 02:12:24,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=569860.0, ans=0.0
2024-09-19 02:12:24,927 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0
2024-09-19 02:12:41,303 INFO [train.py:1198] (0/2) Epoch 32, batch 2200, loss[loss=0.2461, ctc_loss=0.1215, cr_loss=0.3586, attn_decoder_loss=0.252, over 29620.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1192, cr_loss=0.3606, attn_decoder_loss=0.2418, over 5811052.55 frames. ], batch size: 86, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:13:02,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=569940.0, ans=0.2
2024-09-19 02:13:04,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=569940.0, ans=0.1
2024-09-19 02:13:10,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=569980.0, ans=0.125
2024-09-19 02:13:23,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=569980.0, ans=0.125
2024-09-19 02:13:36,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=8.0
2024-09-19 02:13:46,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=570060.0, ans=0.0
2024-09-19 02:13:58,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=570100.0, ans=0.0
2024-09-19 02:13:59,411 INFO [train.py:1198] (0/2) Epoch 32, batch 2250, loss[loss=0.2424, ctc_loss=0.1229, cr_loss=0.3619, attn_decoder_loss=0.2476, over 29694.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1187, cr_loss=0.36, attn_decoder_loss=0.2416, over 5811805.26 frames. ], batch size: 82, lr: 3.46e-03, grad_scale: 8.0
2024-09-19 02:14:09,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.87 vs. limit=10.0
2024-09-19 02:14:13,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=570140.0, ans=0.0
2024-09-19 02:14:21,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=570140.0, ans=10.0
2024-09-19 02:14:31,247 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.789e+01 8.483e+01 9.181e+01 9.844e+01 2.065e+02, threshold=1.836e+02, percent-clipped=2.0
2024-09-19 02:14:36,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.07 vs. limit=22.5
2024-09-19 02:14:43,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=570220.0, ans=0.0
2024-09-19 02:14:43,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=570220.0, ans=0.0
2024-09-19 02:15:15,151 INFO [train.py:1198] (0/2) Epoch 32, batch 2300, loss[loss=0.199, ctc_loss=0.09008, cr_loss=0.2916, attn_decoder_loss=0.2047, over 29317.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1185, cr_loss=0.3595, attn_decoder_loss=0.2409, over 5798775.77 frames. ], batch size: 71, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:15:39,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=570340.0, ans=0.0
2024-09-19 02:15:39,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=570340.0, ans=0.0
2024-09-19 02:15:50,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570380.0, ans=0.1
2024-09-19 02:16:00,261 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2024-09-19 02:16:19,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=570460.0, ans=0.125
2024-09-19 02:16:23,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0
2024-09-19 02:16:31,319 INFO [train.py:1198] (0/2) Epoch 32, batch 2350, loss[loss=0.2526, ctc_loss=0.1303, cr_loss=0.3708, attn_decoder_loss=0.258, over 29676.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1188, cr_loss=0.36, attn_decoder_loss=0.2412, over 5803717.88 frames. ], batch size: 83, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:16:41,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.56 vs. limit=15.0
2024-09-19 02:16:43,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=15.0
2024-09-19 02:16:48,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0
2024-09-19 02:17:07,393 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.588e+01 9.289e+01 9.851e+01 1.770e+02, threshold=1.858e+02, percent-clipped=0.0
2024-09-19 02:17:13,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=22.5
2024-09-19 02:17:21,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=570620.0, ans=0.125
2024-09-19 02:17:51,218 INFO [train.py:1198] (0/2) Epoch 32, batch 2400, loss[loss=0.2301, ctc_loss=0.1212, cr_loss=0.3814, attn_decoder_loss=0.2337, over 29523.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1196, cr_loss=0.3612, attn_decoder_loss=0.2419, over 5807819.60 frames. ], batch size: 76, lr: 3.45e-03, grad_scale: 16.0
2024-09-19 02:17:54,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=570700.0, ans=0.125
2024-09-19 02:18:03,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=570700.0, ans=0.125
2024-09-19 02:18:19,504 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0
2024-09-19 02:18:49,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=570820.0, ans=0.2
2024-09-19 02:19:05,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0
2024-09-19 02:19:07,320 INFO [train.py:1198] (0/2) Epoch 32, batch 2450, loss[loss=0.2439, ctc_loss=0.1171, cr_loss=0.3568, attn_decoder_loss=0.2501, over 29729.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1199, cr_loss=0.3617, attn_decoder_loss=0.2427, over 5784745.51 frames. ], batch size: 82, lr: 3.45e-03, grad_scale: 16.0
2024-09-19 02:19:11,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=570900.0, ans=0.125
2024-09-19 02:19:11,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=570900.0, ans=0.1
2024-09-19 02:19:12,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0
2024-09-19 02:19:27,854 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.17 vs. limit=15.0
2024-09-19 02:19:39,210 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.233e+01 8.392e+01 8.947e+01 9.639e+01 2.320e+02, threshold=1.789e+02, percent-clipped=2.0
2024-09-19 02:19:45,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=570980.0, ans=0.025
2024-09-19 02:20:06,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=571060.0, ans=0.125
2024-09-19 02:20:10,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.29 vs. limit=15.0
2024-09-19 02:20:16,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=571060.0, ans=0.125
2024-09-19 02:20:23,383 INFO [train.py:1198] (0/2) Epoch 32, batch 2500, loss[loss=0.2454, ctc_loss=0.1216, cr_loss=0.3752, attn_decoder_loss=0.2509, over 29646.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1199, cr_loss=0.3619, attn_decoder_loss=0.2427, over 5794706.97 frames. ], batch size: 86, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:20:36,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=571100.0, ans=0.125
2024-09-19 02:20:39,682 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 02:20:53,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=571140.0, ans=0.0
2024-09-19 02:21:06,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=571180.0, ans=0.09899494936611666
2024-09-19 02:21:11,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=571180.0, ans=0.125
2024-09-19 02:21:24,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=571220.0, ans=0.125
2024-09-19 02:21:28,533 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0
2024-09-19 02:21:41,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=571260.0, ans=0.0
2024-09-19 02:21:44,430 INFO [train.py:1198] (0/2) Epoch 32, batch 2550, loss[loss=0.2194, ctc_loss=0.1134, cr_loss=0.3563, attn_decoder_loss=0.2233, over 29366.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1197, cr_loss=0.3617, attn_decoder_loss=0.2424, over 5798583.78 frames. ], batch size: 67, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:22:13,999 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.51 vs. limit=15.0
2024-09-19 02:22:17,609 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.677e+01 9.042e+01 9.632e+01 1.838e+02, threshold=1.808e+02, percent-clipped=1.0
2024-09-19 02:23:00,327 INFO [train.py:1198] (0/2) Epoch 32, batch 2600, loss[loss=0.2309, ctc_loss=0.1151, cr_loss=0.3733, attn_decoder_loss=0.2355, over 29448.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1197, cr_loss=0.3621, attn_decoder_loss=0.2428, over 5794317.76 frames. ], batch size: 78, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:23:00,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=571500.0, ans=0.125
2024-09-19 02:23:06,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=571500.0, ans=0.0
2024-09-19 02:23:11,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=571500.0, ans=0.1
2024-09-19 02:23:21,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0
2024-09-19 02:23:28,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=571580.0, ans=0.0
2024-09-19 02:23:53,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=571620.0, ans=0.125
2024-09-19 02:24:11,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.44 vs. limit=15.0
2024-09-19 02:24:15,312 INFO [train.py:1198] (0/2) Epoch 32, batch 2650, loss[loss=0.2598, ctc_loss=0.1355, cr_loss=0.3937, attn_decoder_loss=0.2649, over 29256.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.12, cr_loss=0.3624, attn_decoder_loss=0.2431, over 5800284.07 frames. ], batch size: 100, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:24:17,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=571700.0, ans=0.0
2024-09-19 02:24:35,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=571740.0, ans=0.125
2024-09-19 02:24:52,692 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.436e+01 8.918e+01 9.348e+01 1.627e+02, threshold=1.784e+02, percent-clipped=0.0
2024-09-19 02:25:00,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=571780.0, ans=0.5
2024-09-19 02:25:03,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=571820.0, ans=0.0
2024-09-19 02:25:12,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=571820.0, ans=0.0
2024-09-19 02:25:18,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=571860.0, ans=0.0
2024-09-19 02:25:21,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=571860.0, ans=0.1
2024-09-19 02:25:26,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.03 vs. limit=10.0
2024-09-19 02:25:34,876 INFO [train.py:1198] (0/2) Epoch 32, batch 2700, loss[loss=0.2473, ctc_loss=0.1167, cr_loss=0.3612, attn_decoder_loss=0.2537, over 29540.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1199, cr_loss=0.3621, attn_decoder_loss=0.2433, over 5796575.24 frames. ], batch size: 87, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:25:42,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=571900.0, ans=0.125
2024-09-19 02:25:48,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=571940.0, ans=0.0
2024-09-19 02:25:54,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=571940.0, ans=0.2
2024-09-19 02:26:21,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=572020.0, ans=0.0
2024-09-19 02:26:51,226 INFO [train.py:1198] (0/2) Epoch 32, batch 2750, loss[loss=0.2305, ctc_loss=0.1196, cr_loss=0.362, attn_decoder_loss=0.2348, over 29502.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1194, cr_loss=0.3611, attn_decoder_loss=0.2423, over 5796528.23 frames. ], batch size: 75, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:27:08,695 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.21 vs. limit=15.0
2024-09-19 02:27:12,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=572140.0, ans=0.125
2024-09-19 02:27:24,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.589e+01 9.060e+01 9.796e+01 2.270e+02, threshold=1.812e+02, percent-clipped=2.0
2024-09-19 02:27:26,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=572180.0, ans=0.125
2024-09-19 02:27:29,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.52 vs. limit=15.0
2024-09-19 02:27:44,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=572220.0, ans=0.0
2024-09-19 02:28:07,088 INFO [train.py:1198] (0/2) Epoch 32, batch 2800, loss[loss=0.2482, ctc_loss=0.1333, cr_loss=0.374, attn_decoder_loss=0.2526, over 20421.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1198, cr_loss=0.3614, attn_decoder_loss=0.2426, over 5777527.12 frames. ], batch size: 210, lr: 3.45e-03, grad_scale: 16.0
2024-09-19 02:28:07,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=572300.0, ans=0.2
2024-09-19 02:28:37,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=572340.0, ans=0.125
2024-09-19 02:28:39,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=572340.0, ans=0.0
2024-09-19 02:28:39,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=572340.0, ans=0.125
2024-09-19 02:28:56,244 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.85 vs. limit=22.5
2024-09-19 02:28:58,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.29 vs. limit=5.0
2024-09-19 02:29:13,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=572460.0, ans=0.0
2024-09-19 02:29:18,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=572460.0, ans=0.125
2024-09-19 02:29:26,874 INFO [train.py:1198] (0/2) Epoch 32, batch 2850, loss[loss=0.234, ctc_loss=0.1167, cr_loss=0.3515, attn_decoder_loss=0.2392, over 29526.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1204, cr_loss=0.3626, attn_decoder_loss=0.2431, over 5762170.78 frames. ], batch size: 77, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:29:31,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=572500.0, ans=0.2
2024-09-19 02:29:43,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572540.0, ans=0.1
2024-09-19 02:30:00,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572580.0, ans=0.1
2024-09-19 02:30:01,858 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.715e+01 9.222e+01 9.934e+01 2.539e+02, threshold=1.844e+02, percent-clipped=1.0
2024-09-19 02:30:15,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=572620.0, ans=0.125
2024-09-19 02:30:20,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572620.0, ans=0.1
2024-09-19 02:30:32,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=572660.0, ans=0.0
2024-09-19 02:30:33,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=572660.0, ans=0.025
2024-09-19 02:30:39,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=572660.0, ans=0.125
2024-09-19 02:30:42,446 INFO [train.py:1198] (0/2) Epoch 32, batch 2900, loss[loss=0.2353, ctc_loss=0.1207, cr_loss=0.3678, attn_decoder_loss=0.2399, over 29435.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.121, cr_loss=0.3643, attn_decoder_loss=0.2443, over 5787927.33 frames. ], batch size: 79, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:31:27,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=22.5
2024-09-19 02:31:28,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=572820.0, ans=0.1
2024-09-19 02:31:42,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=572860.0, ans=0.125
2024-09-19 02:31:58,403 INFO [train.py:1198] (0/2) Epoch 32, batch 2950, loss[loss=0.2269, ctc_loss=0.1175, cr_loss=0.3438, attn_decoder_loss=0.2315, over 29525.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1202, cr_loss=0.3626, attn_decoder_loss=0.2428, over 5782785.14 frames. ], batch size: 75, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:31:58,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=572900.0, ans=0.125
2024-09-19 02:32:37,782 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.544e+01 8.997e+01 9.588e+01 2.155e+02, threshold=1.799e+02, percent-clipped=1.0
2024-09-19 02:32:45,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=572980.0, ans=0.125
2024-09-19 02:32:56,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=573020.0, ans=0.07
2024-09-19 02:33:03,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=573060.0, ans=0.1
2024-09-19 02:33:06,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=573060.0, ans=0.125
2024-09-19 02:33:08,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=573060.0, ans=0.125
2024-09-19 02:33:08,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=573060.0, ans=0.125
2024-09-19 02:33:10,395 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=15.0
2024-09-19 02:33:11,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=573060.0, ans=0.0
2024-09-19 02:33:18,456 INFO [train.py:1198] (0/2) Epoch 32, batch 3000, loss[loss=0.2374, ctc_loss=0.1178, cr_loss=0.3647, attn_decoder_loss=0.2426, over 29766.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1201, cr_loss=0.3619, attn_decoder_loss=0.2426, over 5784061.52 frames. ], batch size: 81, lr: 3.45e-03, grad_scale: 8.0
2024-09-19 02:33:18,456 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 02:33:36,936 INFO [train.py:1230] (0/2) Epoch 32, validation: loss=0.2117, ctc_loss=0.0367, cr_loss=5.626e-15, attn_decoder_loss=0.2311, over 944034.00 frames.
2024-09-19 02:33:36,937 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-19 02:33:41,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=573100.0, ans=0.2
2024-09-19 02:33:59,574 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0
2024-09-19 02:33:59,587 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.38 vs.
limit=15.0 2024-09-19 02:34:01,777 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:34:03,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=573140.0, ans=0.1 2024-09-19 02:34:04,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=573140.0, ans=0.1 2024-09-19 02:34:11,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=15.0 2024-09-19 02:34:13,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=573180.0, ans=0.0 2024-09-19 02:34:18,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-09-19 02:34:34,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=573220.0, ans=0.125 2024-09-19 02:34:34,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=573220.0, ans=0.125 2024-09-19 02:34:52,857 INFO [train.py:1198] (0/2) Epoch 32, batch 3050, loss[loss=0.2254, ctc_loss=0.1135, cr_loss=0.3308, attn_decoder_loss=0.2305, over 29539.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1208, cr_loss=0.3635, attn_decoder_loss=0.2433, over 5777459.39 frames. 
], batch size: 76, lr: 3.45e-03, grad_scale: 8.0 2024-09-19 02:35:11,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=573340.0, ans=0.125 2024-09-19 02:35:19,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.46 vs. limit=15.0 2024-09-19 02:35:27,641 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.755e+01 9.253e+01 9.957e+01 1.667e+02, threshold=1.851e+02, percent-clipped=0.0 2024-09-19 02:35:42,112 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.39 vs. limit=22.5 2024-09-19 02:35:47,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=573420.0, ans=0.2 2024-09-19 02:35:53,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.74 vs. limit=22.5 2024-09-19 02:36:11,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=573500.0, ans=0.125 2024-09-19 02:36:12,286 INFO [train.py:1198] (0/2) Epoch 32, batch 3100, loss[loss=0.2542, ctc_loss=0.1323, cr_loss=0.4011, attn_decoder_loss=0.2588, over 29200.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1201, cr_loss=0.3623, attn_decoder_loss=0.2429, over 5776365.54 frames. 
], batch size: 100, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:36:23,046 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:36:23,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=573500.0, ans=0.0 2024-09-19 02:36:42,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=573580.0, ans=0.125 2024-09-19 02:36:44,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=573580.0, ans=0.0 2024-09-19 02:37:15,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=573660.0, ans=0.0 2024-09-19 02:37:28,501 INFO [train.py:1198] (0/2) Epoch 32, batch 3150, loss[loss=0.2581, ctc_loss=0.1453, cr_loss=0.4177, attn_decoder_loss=0.2613, over 28789.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1199, cr_loss=0.3623, attn_decoder_loss=0.243, over 5783084.71 frames. ], batch size: 104, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:38:03,495 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.454e+01 8.918e+01 9.492e+01 5.119e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-19 02:38:09,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=573780.0, ans=0.1 2024-09-19 02:38:20,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. 
limit=15.0 2024-09-19 02:38:21,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=573820.0, ans=0.0 2024-09-19 02:38:44,610 INFO [train.py:1198] (0/2) Epoch 32, batch 3200, loss[loss=0.249, ctc_loss=0.1333, cr_loss=0.3938, attn_decoder_loss=0.2531, over 29393.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1196, cr_loss=0.3618, attn_decoder_loss=0.2427, over 5794026.96 frames. ], batch size: 79, lr: 3.44e-03, grad_scale: 16.0 2024-09-19 02:38:55,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=573900.0, ans=0.0 2024-09-19 02:39:11,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=22.5 2024-09-19 02:39:13,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573980.0, ans=0.1 2024-09-19 02:39:40,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=574020.0, ans=0.0 2024-09-19 02:39:48,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=574060.0, ans=0.125 2024-09-19 02:39:53,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=574060.0, ans=0.2 2024-09-19 02:40:04,582 INFO [train.py:1198] (0/2) Epoch 32, batch 3250, loss[loss=0.2432, ctc_loss=0.1202, cr_loss=0.3592, attn_decoder_loss=0.2489, over 29684.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1194, cr_loss=0.3614, attn_decoder_loss=0.2427, over 5800718.10 frames. 
], batch size: 84, lr: 3.44e-03, grad_scale: 16.0 2024-09-19 02:40:09,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=574100.0, ans=0.0 2024-09-19 02:40:16,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574100.0, ans=0.1 2024-09-19 02:40:19,883 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:40:26,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=574140.0, ans=15.0 2024-09-19 02:40:40,324 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.537e+01 9.027e+01 9.508e+01 1.850e+02, threshold=1.805e+02, percent-clipped=1.0 2024-09-19 02:40:40,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=574180.0, ans=0.125 2024-09-19 02:40:44,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2024-09-19 02:40:55,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=574220.0, ans=0.125 2024-09-19 02:41:10,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=574260.0, ans=0.125 2024-09-19 02:41:19,689 INFO [train.py:1198] (0/2) Epoch 32, batch 3300, loss[loss=0.2456, ctc_loss=0.1139, cr_loss=0.3574, attn_decoder_loss=0.2523, over 28220.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1187, cr_loss=0.3603, attn_decoder_loss=0.2416, over 5798071.09 frames. 
], batch size: 111, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:41:23,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=574300.0, ans=0.0 2024-09-19 02:41:37,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.21 vs. limit=22.5 2024-09-19 02:42:10,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=574420.0, ans=0.0 2024-09-19 02:42:21,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=15.0 2024-09-19 02:42:35,362 INFO [train.py:1198] (0/2) Epoch 32, batch 3350, loss[loss=0.2472, ctc_loss=0.127, cr_loss=0.3597, attn_decoder_loss=0.2526, over 28858.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1199, cr_loss=0.3619, attn_decoder_loss=0.2426, over 5774855.08 frames. ], batch size: 104, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:43:03,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=574540.0, ans=0.125 2024-09-19 02:43:11,942 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.605e+01 8.988e+01 9.712e+01 2.177e+02, threshold=1.798e+02, percent-clipped=2.0 2024-09-19 02:43:29,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0 2024-09-19 02:43:39,205 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.54 vs. 
limit=15.0 2024-09-19 02:43:46,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=574660.0, ans=0.025 2024-09-19 02:43:55,669 INFO [train.py:1198] (0/2) Epoch 32, batch 3400, loss[loss=0.2088, ctc_loss=0.09359, cr_loss=0.3084, attn_decoder_loss=0.2148, over 29332.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1198, cr_loss=0.3616, attn_decoder_loss=0.2425, over 5767137.85 frames. ], batch size: 67, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:43:56,738 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2024-09-19 02:44:08,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2024-09-19 02:44:22,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=574740.0, ans=0.125 2024-09-19 02:44:35,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574780.0, ans=0.1 2024-09-19 02:44:48,221 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. limit=6.0 2024-09-19 02:45:00,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.17 vs. limit=15.0 2024-09-19 02:45:11,594 INFO [train.py:1198] (0/2) Epoch 32, batch 3450, loss[loss=0.2545, ctc_loss=0.1269, cr_loss=0.3624, attn_decoder_loss=0.2606, over 28533.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1199, cr_loss=0.3615, attn_decoder_loss=0.2427, over 5775170.89 frames. 
], batch size: 112, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:45:14,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=574900.0, ans=0.125 2024-09-19 02:45:16,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=574900.0, ans=0.125 2024-09-19 02:45:30,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=574940.0, ans=0.0 2024-09-19 02:45:47,947 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.523e+01 9.077e+01 9.652e+01 1.976e+02, threshold=1.815e+02, percent-clipped=1.0 2024-09-19 02:46:13,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=575060.0, ans=0.125 2024-09-19 02:46:21,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.80 vs. limit=15.0 2024-09-19 02:46:24,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575060.0, ans=0.1 2024-09-19 02:46:27,033 INFO [train.py:1198] (0/2) Epoch 32, batch 3500, loss[loss=0.2136, ctc_loss=0.09588, cr_loss=0.3229, attn_decoder_loss=0.2195, over 29316.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1197, cr_loss=0.3614, attn_decoder_loss=0.2421, over 5777122.96 frames. ], batch size: 71, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:46:27,371 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:46:48,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.37 vs. 
limit=15.0 2024-09-19 02:46:51,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=575140.0, ans=0.2 2024-09-19 02:47:23,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2024-09-19 02:47:34,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575260.0, ans=0.1 2024-09-19 02:47:42,065 INFO [train.py:1198] (0/2) Epoch 32, batch 3550, loss[loss=0.2422, ctc_loss=0.1162, cr_loss=0.3667, attn_decoder_loss=0.2481, over 29719.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1192, cr_loss=0.3606, attn_decoder_loss=0.242, over 5782094.82 frames. ], batch size: 89, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:47:47,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. 
limit=6.0 2024-09-19 02:48:03,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575340.0, ans=0.1 2024-09-19 02:48:19,432 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.288e+01 8.961e+01 9.598e+01 1.614e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-19 02:48:48,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=575460.0, ans=0.125 2024-09-19 02:48:52,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=575460.0, ans=0.125 2024-09-19 02:48:58,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=575500.0, ans=0.125 2024-09-19 02:49:00,121 INFO [train.py:1198] (0/2) Epoch 32, batch 3600, loss[loss=0.2216, ctc_loss=0.1079, cr_loss=0.351, attn_decoder_loss=0.2264, over 29496.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1189, cr_loss=0.3609, attn_decoder_loss=0.242, over 5791555.63 frames. 
], batch size: 77, lr: 3.44e-03, grad_scale: 16.0 2024-09-19 02:49:03,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=575500.0, ans=0.0 2024-09-19 02:49:19,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=575540.0, ans=0.125 2024-09-19 02:49:30,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=575580.0, ans=0.125 2024-09-19 02:49:34,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=575580.0, ans=0.125 2024-09-19 02:50:14,517 INFO [train.py:1198] (0/2) Epoch 32, batch 3650, loss[loss=0.255, ctc_loss=0.1304, cr_loss=0.3743, attn_decoder_loss=0.2605, over 29490.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1189, cr_loss=0.3612, attn_decoder_loss=0.2417, over 5793270.74 frames. ], batch size: 90, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:50:23,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=575700.0, ans=0.0 2024-09-19 02:50:46,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=575780.0, ans=0.125 2024-09-19 02:50:51,849 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.010e+01 8.427e+01 8.894e+01 9.403e+01 1.898e+02, threshold=1.779e+02, percent-clipped=1.0 2024-09-19 02:51:06,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=575820.0, ans=0.0 2024-09-19 02:51:29,100 INFO [train.py:1198] (0/2) Epoch 32, batch 3700, loss[loss=0.2522, ctc_loss=0.1276, cr_loss=0.3707, attn_decoder_loss=0.2578, over 29703.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1191, cr_loss=0.3617, attn_decoder_loss=0.2421, over 5802838.59 frames. 
], batch size: 84, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:51:31,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.47 vs. limit=15.0 2024-09-19 02:51:42,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=575940.0, ans=0.125 2024-09-19 02:52:05,314 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-144000.pt 2024-09-19 02:52:27,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=576020.0, ans=0.1 2024-09-19 02:52:33,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=576020.0, ans=0.0 2024-09-19 02:52:51,309 INFO [train.py:1198] (0/2) Epoch 32, batch 3750, loss[loss=0.21, ctc_loss=0.1003, cr_loss=0.3261, attn_decoder_loss=0.215, over 29290.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1193, cr_loss=0.362, attn_decoder_loss=0.2421, over 5807171.82 frames. ], batch size: 67, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:53:05,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. 
limit=6.0 2024-09-19 02:53:19,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=576180.0, ans=0.125 2024-09-19 02:53:28,121 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.053e+01 8.581e+01 9.002e+01 9.610e+01 1.544e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-19 02:53:42,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=576220.0, ans=0.1 2024-09-19 02:54:05,629 INFO [train.py:1198] (0/2) Epoch 32, batch 3800, loss[loss=0.2331, ctc_loss=0.1127, cr_loss=0.3425, attn_decoder_loss=0.2389, over 29618.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1194, cr_loss=0.3617, attn_decoder_loss=0.2418, over 5795896.76 frames. ], batch size: 86, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:54:13,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=576300.0, ans=0.125 2024-09-19 02:54:14,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-09-19 02:54:25,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=576340.0, ans=0.07 2024-09-19 02:54:28,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=576340.0, ans=0.95 2024-09-19 02:54:34,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=576340.0, ans=0.1 2024-09-19 02:54:46,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.42 vs. 
limit=22.5 2024-09-19 02:54:50,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=576380.0, ans=0.125 2024-09-19 02:54:58,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=576420.0, ans=0.2 2024-09-19 02:55:10,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=576460.0, ans=0.0 2024-09-19 02:55:23,185 INFO [train.py:1198] (0/2) Epoch 32, batch 3850, loss[loss=0.2576, ctc_loss=0.1345, cr_loss=0.3925, attn_decoder_loss=0.2625, over 29323.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.119, cr_loss=0.3611, attn_decoder_loss=0.2414, over 5810051.81 frames. ], batch size: 100, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:55:47,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=576540.0, ans=0.0 2024-09-19 02:55:51,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=576580.0, ans=0.1 2024-09-19 02:56:00,141 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.855e+01 8.481e+01 8.994e+01 9.437e+01 1.418e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-19 02:56:34,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=576660.0, ans=0.025 2024-09-19 02:56:37,438 INFO [train.py:1198] (0/2) Epoch 32, batch 3900, loss[loss=0.2471, ctc_loss=0.1231, cr_loss=0.3666, attn_decoder_loss=0.2527, over 29644.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1194, cr_loss=0.3614, attn_decoder_loss=0.2419, over 5815060.67 frames. 
], batch size: 86, lr: 3.44e-03, grad_scale: 8.0 2024-09-19 02:56:41,374 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.97 vs. limit=15.0 2024-09-19 02:56:45,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=576700.0, ans=10.0 2024-09-19 02:56:49,753 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:57:22,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=576820.0, ans=0.2 2024-09-19 02:57:52,056 INFO [train.py:1198] (0/2) Epoch 32, batch 3950, loss[loss=0.2596, ctc_loss=0.1339, cr_loss=0.403, attn_decoder_loss=0.2646, over 29494.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1196, cr_loss=0.3623, attn_decoder_loss=0.2425, over 5834987.77 frames. ], batch size: 97, lr: 3.43e-03, grad_scale: 8.0 2024-09-19 02:57:56,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=576900.0, ans=0.1 2024-09-19 02:57:57,370 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.54 vs. limit=22.5 2024-09-19 02:58:05,027 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.80 vs. limit=5.0 2024-09-19 02:58:25,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.26 vs. 
limit=15.0
2024-09-19 02:58:28,775 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.521e+01 9.029e+01 9.542e+01 2.820e+02, threshold=1.806e+02, percent-clipped=2.0
2024-09-19 02:58:31,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=576980.0, ans=0.0
2024-09-19 02:58:32,896 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=22.5
2024-09-19 02:58:40,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=577020.0, ans=0.125
2024-09-19 02:58:42,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=577020.0, ans=0.1
2024-09-19 02:59:00,463 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0
2024-09-19 02:59:05,620 INFO [train.py:1198] (0/2) Epoch 32, batch 4000, loss[loss=0.219, ctc_loss=0.102, cr_loss=0.3152, attn_decoder_loss=0.225, over 29505.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1197, cr_loss=0.3622, attn_decoder_loss=0.2423, over 5811754.00 frames. ], batch size: 74, lr: 3.43e-03, grad_scale: 16.0
2024-09-19 02:59:38,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=577180.0, ans=0.125
2024-09-19 02:59:46,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=577180.0, ans=0.125
2024-09-19 02:59:47,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=577180.0, ans=0.2
2024-09-19 03:00:21,893 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.21 vs. limit=10.0
2024-09-19 03:00:22,472 INFO [train.py:1198] (0/2) Epoch 32, batch 4050, loss[loss=0.2522, ctc_loss=0.137, cr_loss=0.3817, attn_decoder_loss=0.2565, over 20682.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1197, cr_loss=0.3621, attn_decoder_loss=0.242, over 5796045.76 frames. ], batch size: 210, lr: 3.43e-03, grad_scale: 16.0
2024-09-19 03:00:53,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=577380.0, ans=0.125
2024-09-19 03:00:59,176 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.593e+01 9.238e+01 9.964e+01 1.548e+02, threshold=1.848e+02, percent-clipped=0.0
2024-09-19 03:01:10,251 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.57 vs. limit=6.0
2024-09-19 03:01:25,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.97 vs. limit=12.0
2024-09-19 03:01:36,060 INFO [train.py:1198] (0/2) Epoch 32, batch 4100, loss[loss=0.2507, ctc_loss=0.1241, cr_loss=0.3772, attn_decoder_loss=0.2564, over 29539.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1195, cr_loss=0.3619, attn_decoder_loss=0.242, over 5791807.89 frames. ], batch size: 90, lr: 3.43e-03, grad_scale: 16.0
2024-09-19 03:01:42,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=577500.0, ans=0.07
2024-09-19 03:01:49,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=577540.0, ans=0.1
2024-09-19 03:01:57,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.10 vs. limit=12.0
2024-09-19 03:02:08,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=577580.0, ans=0.1
2024-09-19 03:02:16,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=577580.0, ans=0.0
2024-09-19 03:02:19,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577620.0, ans=0.1
2024-09-19 03:02:27,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0
2024-09-19 03:02:50,340 INFO [train.py:1198] (0/2) Epoch 32, batch 4150, loss[loss=0.2269, ctc_loss=0.1113, cr_loss=0.3386, attn_decoder_loss=0.2322, over 29517.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1192, cr_loss=0.3612, attn_decoder_loss=0.2415, over 5798028.74 frames. ], batch size: 77, lr: 3.43e-03, grad_scale: 16.0
2024-09-19 03:03:03,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.90 vs. limit=22.5
2024-09-19 03:03:08,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=577740.0, ans=0.125
2024-09-19 03:03:11,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=577740.0, ans=0.125
2024-09-19 03:03:12,044 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0
2024-09-19 03:03:12,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=577740.0, ans=0.09899494936611666
2024-09-19 03:03:14,736 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0
2024-09-19 03:03:15,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=577740.0, ans=0.125
2024-09-19 03:03:19,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=577780.0, ans=0.0
2024-09-19 03:03:24,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=577780.0, ans=0.0
2024-09-19 03:03:26,926 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.412e+01 8.911e+01 9.455e+01 1.648e+02, threshold=1.782e+02, percent-clipped=0.0
2024-09-19 03:03:27,381 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:03:30,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=577780.0, ans=10.0
2024-09-19 03:03:46,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=577820.0, ans=0.125
2024-09-19 03:03:59,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.72 vs. limit=10.0
2024-09-19 03:04:02,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=577860.0, ans=0.125
2024-09-19 03:04:04,960 INFO [train.py:1198] (0/2) Epoch 32, batch 4200, loss[loss=0.2514, ctc_loss=0.133, cr_loss=0.3924, attn_decoder_loss=0.2558, over 29507.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1193, cr_loss=0.361, attn_decoder_loss=0.2418, over 5799966.27 frames. ], batch size: 90, lr: 3.43e-03, grad_scale: 16.0
2024-09-19 03:04:13,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=577900.0, ans=0.0
2024-09-19 03:04:21,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=577940.0, ans=0.125
2024-09-19 03:04:34,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=577980.0, ans=0.125
2024-09-19 03:04:51,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=578020.0, ans=0.125
2024-09-19 03:04:59,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=578020.0, ans=0.025
2024-09-19 03:05:06,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=578060.0, ans=0.1
2024-09-19 03:05:10,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=578060.0, ans=0.025
2024-09-19 03:05:19,408 INFO [train.py:1198] (0/2) Epoch 32, batch 4250, loss[loss=0.2108, ctc_loss=0.09751, cr_loss=0.3212, attn_decoder_loss=0.2162, over 29527.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1192, cr_loss=0.3609, attn_decoder_loss=0.2422, over 5805296.48 frames. ], batch size: 74, lr: 3.43e-03, grad_scale: 8.0
2024-09-19 03:05:19,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=578100.0, ans=0.125
2024-09-19 03:05:28,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=578100.0, ans=0.0
2024-09-19 03:05:30,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=578100.0, ans=0.2
2024-09-19 03:05:39,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0
2024-09-19 03:05:57,462 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.485e+01 9.060e+01 9.670e+01 1.862e+02, threshold=1.812e+02, percent-clipped=1.0
2024-09-19 03:06:08,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.63 vs. limit=22.5
2024-09-19 03:06:13,239 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0
2024-09-19 03:06:33,361 INFO [train.py:1198] (0/2) Epoch 32, batch 4300, loss[loss=0.2497, ctc_loss=0.1253, cr_loss=0.3731, attn_decoder_loss=0.2553, over 29495.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1192, cr_loss=0.361, attn_decoder_loss=0.2425, over 5794684.64 frames. ], batch size: 87, lr: 3.43e-03, grad_scale: 8.0
2024-09-19 03:06:41,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=578300.0, ans=0.125
2024-09-19 03:06:53,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=578340.0, ans=0.035
2024-09-19 03:07:01,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=578340.0, ans=0.125
2024-09-19 03:07:47,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=578460.0, ans=0.125
2024-09-19 03:07:48,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=578500.0, ans=0.2
2024-09-19 03:07:50,081 INFO [train.py:1198] (0/2) Epoch 32, batch 4350, loss[loss=0.2521, ctc_loss=0.1357, cr_loss=0.3939, attn_decoder_loss=0.2562, over 29485.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1216, cr_loss=0.3659, attn_decoder_loss=0.2455, over 5796735.53 frames. ], batch size: 97, lr: 3.43e-03, grad_scale: 8.0
2024-09-19 03:07:51,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=578500.0, ans=0.125
2024-09-19 03:08:14,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=578540.0, ans=0.125
2024-09-19 03:08:18,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=578580.0, ans=0.125
2024-09-19 03:08:28,241 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.091e+01 8.949e+01 9.418e+01 9.976e+01 1.682e+02, threshold=1.884e+02, percent-clipped=0.0
2024-09-19 03:08:35,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=578620.0, ans=0.125
2024-09-19 03:08:53,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=578660.0, ans=0.125
2024-09-19 03:09:03,605 INFO [train.py:1198] (0/2) Epoch 32, batch 4400, loss[loss=0.2513, ctc_loss=0.1311, cr_loss=0.3746, attn_decoder_loss=0.2564, over 27372.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.123, cr_loss=0.3683, attn_decoder_loss=0.2475, over 5768591.09 frames. ], batch size: 124, lr: 3.43e-03, grad_scale: 16.0
2024-09-19 03:09:20,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=578740.0, ans=0.1
2024-09-19 03:09:54,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=578820.0, ans=0.125
2024-09-19 03:09:55,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=578820.0, ans=0.125
2024-09-19 03:10:04,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.99 vs. limit=6.0
2024-09-19 03:10:12,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=578860.0, ans=0.025
2024-09-19 03:10:18,565 INFO [train.py:1198] (0/2) Epoch 32, batch 4450, loss[loss=0.2591, ctc_loss=0.1508, cr_loss=0.388, attn_decoder_loss=0.2625, over 19486.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1269, cr_loss=0.3742, attn_decoder_loss=0.2497, over 5582169.92 frames. ], batch size: 209, lr: 3.43e-03, grad_scale: 8.0
2024-09-19 03:10:19,096 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:10:38,833 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.02 vs. limit=15.0
2024-09-19 03:10:41,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=578940.0, ans=0.025
2024-09-19 03:10:55,393 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.52 vs. limit=6.0
2024-09-19 03:10:58,946 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.256e+01 9.217e+01 9.990e+01 1.147e+02 3.633e+02, threshold=1.998e+02, percent-clipped=4.0
2024-09-19 03:11:04,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=579020.0, ans=0.125
2024-09-19 03:11:08,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=579020.0, ans=0.2
2024-09-19 03:11:17,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=579060.0, ans=10.0
2024-09-19 03:11:34,149 INFO [train.py:1198] (0/2) Epoch 32, batch 4500, loss[loss=0.2538, ctc_loss=0.1486, cr_loss=0.3782, attn_decoder_loss=0.2571, over 20100.00 frames. ], tot_loss[loss=0.2472, ctc_loss=0.1306, cr_loss=0.3769, attn_decoder_loss=0.2518, over 5243910.12 frames. ], batch size: 209, lr: 3.43e-03, grad_scale: 8.0
2024-09-19 03:11:49,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=579140.0, ans=0.0
2024-09-19 03:11:58,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=579140.0, ans=0.2
2024-09-19 03:12:00,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0
2024-09-19 03:12:11,671 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-32.pt
2024-09-19 03:13:03,888 INFO [train.py:1198] (0/2) Epoch 33, batch 0, loss[loss=0.2071, ctc_loss=0.08985, cr_loss=0.2948, attn_decoder_loss=0.2135, over 29641.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.08985, cr_loss=0.2948, attn_decoder_loss=0.2135, over 29641.00 frames. ], batch size: 73, lr: 3.37e-03, grad_scale: 16.0
2024-09-19 03:13:03,888 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 03:13:20,825 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0481, 3.6285, 3.9033, 3.5478], device='cuda:0')
2024-09-19 03:13:22,384 INFO [train.py:1230] (0/2) Epoch 33, validation: loss=0.2131, ctc_loss=0.03625, cr_loss=6.2e-15, attn_decoder_loss=0.2327, over 944034.00 frames.
2024-09-19 03:13:22,385 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-19 03:13:24,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=579200.0, ans=0.125
2024-09-19 03:13:27,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=579200.0, ans=0.0
2024-09-19 03:13:43,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=579240.0, ans=0.125
2024-09-19 03:14:08,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=579320.0, ans=0.0
2024-09-19 03:14:22,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=579360.0, ans=0.125
2024-09-19 03:14:26,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=579360.0, ans=0.125
2024-09-19 03:14:29,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.82 vs. limit=5.0
2024-09-19 03:14:33,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=579360.0, ans=0.025
2024-09-19 03:14:38,708 INFO [train.py:1198] (0/2) Epoch 33, batch 50, loss[loss=0.209, ctc_loss=0.09614, cr_loss=0.3074, attn_decoder_loss=0.2147, over 29467.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1208, cr_loss=0.3655, attn_decoder_loss=0.2427, over 1268171.71 frames. ], batch size: 70, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:14:43,365 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 9.277e+01 1.031e+02 1.119e+02 2.001e+02, threshold=2.062e+02, percent-clipped=1.0
2024-09-19 03:15:06,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=579440.0, ans=0.0
2024-09-19 03:15:09,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=579480.0, ans=0.025
2024-09-19 03:15:32,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579520.0, ans=0.1
2024-09-19 03:15:38,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=579560.0, ans=0.025
2024-09-19 03:15:54,960 INFO [train.py:1198] (0/2) Epoch 33, batch 100, loss[loss=0.2286, ctc_loss=0.1135, cr_loss=0.3475, attn_decoder_loss=0.2337, over 29528.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1226, cr_loss=0.3697, attn_decoder_loss=0.2452, over 2252997.21 frames. ], batch size: 76, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:15:55,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=579600.0, ans=0.0
2024-09-19 03:16:02,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0
2024-09-19 03:16:07,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.24 vs. limit=6.0
2024-09-19 03:16:13,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=579640.0, ans=0.2
2024-09-19 03:16:24,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=579640.0, ans=0.2
2024-09-19 03:16:24,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=579640.0, ans=0.125
2024-09-19 03:16:43,097 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0
2024-09-19 03:16:48,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=579720.0, ans=0.125
2024-09-19 03:17:03,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0
2024-09-19 03:17:10,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=579800.0, ans=0.035
2024-09-19 03:17:11,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0
2024-09-19 03:17:11,883 INFO [train.py:1198] (0/2) Epoch 33, batch 150, loss[loss=0.2195, ctc_loss=0.1079, cr_loss=0.3584, attn_decoder_loss=0.2239, over 29422.00 frames. ], tot_loss[loss=0.2381, ctc_loss=0.1204, cr_loss=0.365, attn_decoder_loss=0.2431, over 3047719.18 frames. ], batch size: 70, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:17:12,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579800.0, ans=0.1
2024-09-19 03:17:16,295 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.477e+01 8.945e+01 9.593e+01 9.750e+02, threshold=1.789e+02, percent-clipped=1.0
2024-09-19 03:17:22,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=579800.0, ans=0.0
2024-09-19 03:17:33,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=12.0
2024-09-19 03:17:36,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=579840.0, ans=0.125
2024-09-19 03:17:42,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=12.0
2024-09-19 03:17:43,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579880.0, ans=0.125
2024-09-19 03:17:56,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=579920.0, ans=0.2
2024-09-19 03:18:07,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=579920.0, ans=0.05
2024-09-19 03:18:16,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579960.0, ans=0.1
2024-09-19 03:18:23,643 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:18:28,277 INFO [train.py:1198] (0/2) Epoch 33, batch 200, loss[loss=0.238, ctc_loss=0.1236, cr_loss=0.3649, attn_decoder_loss=0.2426, over 27293.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1197, cr_loss=0.3637, attn_decoder_loss=0.2422, over 3660162.64 frames. ], batch size: 124, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:19:16,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=580120.0, ans=0.125
2024-09-19 03:19:20,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.82 vs. limit=15.0
2024-09-19 03:19:30,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=580160.0, ans=0.025
2024-09-19 03:19:32,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.61 vs. limit=22.5
2024-09-19 03:19:43,965 INFO [train.py:1198] (0/2) Epoch 33, batch 250, loss[loss=0.2495, ctc_loss=0.1268, cr_loss=0.3698, attn_decoder_loss=0.2549, over 29203.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1191, cr_loss=0.362, attn_decoder_loss=0.2423, over 4142921.87 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:19:48,586 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.257e+01 8.698e+01 9.269e+01 2.011e+02, threshold=1.740e+02, percent-clipped=1.0
2024-09-19 03:19:48,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=580200.0, ans=0.0
2024-09-19 03:19:59,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=580240.0, ans=0.1
2024-09-19 03:20:15,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=580280.0, ans=0.1
2024-09-19 03:20:18,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=580280.0, ans=0.0
2024-09-19 03:20:18,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=580280.0, ans=0.1
2024-09-19 03:20:35,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=580320.0, ans=0.125
2024-09-19 03:20:39,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.64 vs. limit=22.5
2024-09-19 03:20:54,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=580360.0, ans=0.0
2024-09-19 03:21:02,353 INFO [train.py:1198] (0/2) Epoch 33, batch 300, loss[loss=0.2415, ctc_loss=0.1219, cr_loss=0.3713, attn_decoder_loss=0.2466, over 29535.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1187, cr_loss=0.3603, attn_decoder_loss=0.2418, over 4510469.32 frames. ], batch size: 92, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:21:25,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=580440.0, ans=0.07
2024-09-19 03:21:28,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=580440.0, ans=0.0
2024-09-19 03:21:42,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=580480.0, ans=0.0
2024-09-19 03:21:58,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0
2024-09-19 03:22:16,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=580560.0, ans=0.05
2024-09-19 03:22:20,290 INFO [train.py:1198] (0/2) Epoch 33, batch 350, loss[loss=0.2065, ctc_loss=0.09489, cr_loss=0.3116, attn_decoder_loss=0.212, over 29325.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1188, cr_loss=0.3603, attn_decoder_loss=0.2418, over 4795854.79 frames. ], batch size: 71, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:22:24,713 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.463e+01 8.888e+01 9.398e+01 1.588e+02, threshold=1.778e+02, percent-clipped=0.0
2024-09-19 03:22:40,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=22.5
2024-09-19 03:22:51,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=580680.0, ans=0.2
2024-09-19 03:22:51,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=12.0
2024-09-19 03:23:26,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.56 vs. limit=15.0
2024-09-19 03:23:36,311 INFO [train.py:1198] (0/2) Epoch 33, batch 400, loss[loss=0.2359, ctc_loss=0.1151, cr_loss=0.3539, attn_decoder_loss=0.2414, over 29712.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1186, cr_loss=0.3605, attn_decoder_loss=0.2416, over 5025143.34 frames. ], batch size: 82, lr: 3.37e-03, grad_scale: 16.0
2024-09-19 03:23:43,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=15.0
2024-09-19 03:23:49,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=580800.0, ans=0.2
2024-09-19 03:24:05,566 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:24:15,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=580880.0, ans=0.2
2024-09-19 03:24:22,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=580920.0, ans=0.125
2024-09-19 03:24:38,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.86 vs. limit=10.0
2024-09-19 03:24:54,731 INFO [train.py:1198] (0/2) Epoch 33, batch 450, loss[loss=0.2506, ctc_loss=0.1302, cr_loss=0.3757, attn_decoder_loss=0.2556, over 29700.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.119, cr_loss=0.3614, attn_decoder_loss=0.2418, over 5187179.12 frames. ], batch size: 83, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:24:55,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=581000.0, ans=0.1
2024-09-19 03:25:00,690 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.447e+01 9.007e+01 9.616e+01 1.601e+02, threshold=1.801e+02, percent-clipped=0.0
2024-09-19 03:25:04,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=581000.0, ans=0.025
2024-09-19 03:25:11,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=581040.0, ans=0.1
2024-09-19 03:25:16,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=581040.0, ans=0.05
2024-09-19 03:25:22,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=581040.0, ans=0.125
2024-09-19 03:25:39,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=581080.0, ans=0.125
2024-09-19 03:25:49,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=581120.0, ans=0.125
2024-09-19 03:25:59,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=581160.0, ans=0.125
2024-09-19 03:26:12,853 INFO [train.py:1198] (0/2) Epoch 33, batch 500, loss[loss=0.2599, ctc_loss=0.1437, cr_loss=0.4232, attn_decoder_loss=0.2635, over 29448.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1189, cr_loss=0.361, attn_decoder_loss=0.2413, over 5330199.58 frames. ], batch size: 94, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:26:17,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=581200.0, ans=0.07
2024-09-19 03:26:36,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=8.0
2024-09-19 03:26:37,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=581240.0, ans=0.0
2024-09-19 03:26:38,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=581240.0, ans=0.0
2024-09-19 03:26:48,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0
2024-09-19 03:26:57,253 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:27:12,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=581360.0, ans=0.125
2024-09-19 03:27:12,552 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0
2024-09-19 03:27:15,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=581360.0, ans=0.0
2024-09-19 03:27:21,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=581360.0, ans=0.0
2024-09-19 03:27:28,653 INFO [train.py:1198] (0/2) Epoch 33, batch 550, loss[loss=0.2496, ctc_loss=0.117, cr_loss=0.3739, attn_decoder_loss=0.256, over 28884.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1189, cr_loss=0.3614, attn_decoder_loss=0.2415, over 5424127.96 frames. ], batch size: 104, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:27:34,831 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.333e+01 9.017e+01 9.436e+01 4.024e+02, threshold=1.803e+02, percent-clipped=1.0
2024-09-19 03:28:06,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=581480.0, ans=0.025
2024-09-19 03:28:15,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=581520.0, ans=0.125
2024-09-19 03:28:18,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=581520.0, ans=0.0
2024-09-19 03:28:18,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0
2024-09-19 03:28:29,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.93 vs. limit=15.0
2024-09-19 03:28:30,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=581560.0, ans=0.1
2024-09-19 03:28:30,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=581560.0, ans=0.0
2024-09-19 03:28:47,669 INFO [train.py:1198] (0/2) Epoch 33, batch 600, loss[loss=0.2534, ctc_loss=0.1294, cr_loss=0.3821, attn_decoder_loss=0.2587, over 29264.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.119, cr_loss=0.3616, attn_decoder_loss=0.2417, over 5511289.67 frames. ], batch size: 100, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:28:54,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=581600.0, ans=0.125
2024-09-19 03:30:03,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.64 vs. limit=15.0
2024-09-19 03:30:05,159 INFO [train.py:1198] (0/2) Epoch 33, batch 650, loss[loss=0.2336, ctc_loss=0.1129, cr_loss=0.3552, attn_decoder_loss=0.2391, over 29784.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1183, cr_loss=0.36, attn_decoder_loss=0.241, over 5588646.29 frames. ], batch size: 81, lr: 3.37e-03, grad_scale: 8.0
2024-09-19 03:30:11,215 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.577e+01 8.986e+01 9.488e+01 1.360e+02, threshold=1.797e+02, percent-clipped=0.0
2024-09-19 03:30:31,408 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 03:30:31,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=581840.0, ans=0.125
2024-09-19 03:30:39,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0
2024-09-19 03:30:54,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=581920.0, ans=0.07
2024-09-19 03:30:55,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=581920.0, ans=0.0
2024-09-19 03:31:07,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.97 vs.
limit=15.0 2024-09-19 03:31:07,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=581960.0, ans=0.1 2024-09-19 03:31:21,262 INFO [train.py:1198] (0/2) Epoch 33, batch 700, loss[loss=0.2336, ctc_loss=0.1216, cr_loss=0.3731, attn_decoder_loss=0.2378, over 29545.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1185, cr_loss=0.3602, attn_decoder_loss=0.2415, over 5639275.00 frames. ], batch size: 76, lr: 3.37e-03, grad_scale: 8.0 2024-09-19 03:31:33,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=582000.0, ans=0.125 2024-09-19 03:31:42,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=582040.0, ans=0.025 2024-09-19 03:31:51,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=12.0 2024-09-19 03:32:26,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2024-09-19 03:32:35,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=582160.0, ans=0.1 2024-09-19 03:32:39,398 INFO [train.py:1198] (0/2) Epoch 33, batch 750, loss[loss=0.2467, ctc_loss=0.1261, cr_loss=0.3831, attn_decoder_loss=0.2516, over 29709.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1183, cr_loss=0.36, attn_decoder_loss=0.2412, over 5676936.62 frames. 
], batch size: 82, lr: 3.37e-03, grad_scale: 8.0 2024-09-19 03:32:46,699 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.541e+01 8.897e+01 9.394e+01 1.704e+02, threshold=1.779e+02, percent-clipped=0.0 2024-09-19 03:32:50,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=582200.0, ans=0.0 2024-09-19 03:33:00,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2024-09-19 03:33:02,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=582240.0, ans=0.125 2024-09-19 03:33:15,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=582280.0, ans=0.125 2024-09-19 03:33:30,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=582320.0, ans=0.125 2024-09-19 03:33:51,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=582360.0, ans=0.125 2024-09-19 03:33:57,740 INFO [train.py:1198] (0/2) Epoch 33, batch 800, loss[loss=0.2226, ctc_loss=0.1074, cr_loss=0.3449, attn_decoder_loss=0.2277, over 29642.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1185, cr_loss=0.3601, attn_decoder_loss=0.2412, over 5708018.36 frames. 
], batch size: 73, lr: 3.37e-03, grad_scale: 16.0 2024-09-19 03:34:20,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=582440.0, ans=0.1 2024-09-19 03:35:06,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=582560.0, ans=0.1 2024-09-19 03:35:13,253 INFO [train.py:1198] (0/2) Epoch 33, batch 850, loss[loss=0.2436, ctc_loss=0.1232, cr_loss=0.3444, attn_decoder_loss=0.2493, over 29692.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1178, cr_loss=0.3587, attn_decoder_loss=0.2407, over 5736735.16 frames. ], batch size: 89, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:35:14,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=582600.0, ans=0.05 2024-09-19 03:35:20,667 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.452e+01 8.956e+01 9.635e+01 2.624e+02, threshold=1.791e+02, percent-clipped=1.0 2024-09-19 03:35:22,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=582600.0, ans=0.0 2024-09-19 03:35:26,216 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-19 03:35:28,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=582640.0, ans=0.0 2024-09-19 03:36:11,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=582720.0, ans=0.125 2024-09-19 03:36:20,269 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. 
limit=15.0 2024-09-19 03:36:23,049 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.79 vs. limit=15.0 2024-09-19 03:36:31,365 INFO [train.py:1198] (0/2) Epoch 33, batch 900, loss[loss=0.2148, ctc_loss=0.1005, cr_loss=0.3266, attn_decoder_loss=0.2202, over 29642.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1185, cr_loss=0.3595, attn_decoder_loss=0.2414, over 5741792.23 frames. ], batch size: 73, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:36:36,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=582800.0, ans=0.125 2024-09-19 03:36:48,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=582840.0, ans=0.125 2024-09-19 03:36:49,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=582840.0, ans=0.125 2024-09-19 03:37:04,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=582880.0, ans=0.1 2024-09-19 03:37:06,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=582880.0, ans=0.0 2024-09-19 03:37:07,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=582880.0, ans=0.1 2024-09-19 03:37:34,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=582960.0, ans=0.5 2024-09-19 03:37:37,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.54 vs. 
limit=15.0 2024-09-19 03:37:40,919 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0 2024-09-19 03:37:48,801 INFO [train.py:1198] (0/2) Epoch 33, batch 950, loss[loss=0.224, ctc_loss=0.1061, cr_loss=0.3466, attn_decoder_loss=0.2294, over 29503.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1185, cr_loss=0.36, attn_decoder_loss=0.2417, over 5741539.66 frames. ], batch size: 74, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:37:55,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=583000.0, ans=0.0 2024-09-19 03:37:56,284 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.465e+01 9.060e+01 1.004e+02 2.208e+02, threshold=1.812e+02, percent-clipped=1.0 2024-09-19 03:37:58,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=583000.0, ans=0.125 2024-09-19 03:37:58,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.74 vs. limit=15.0 2024-09-19 03:38:02,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=583040.0, ans=0.0 2024-09-19 03:38:37,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=583120.0, ans=0.0 2024-09-19 03:38:48,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.06 vs. limit=15.0 2024-09-19 03:39:04,622 INFO [train.py:1198] (0/2) Epoch 33, batch 1000, loss[loss=0.2356, ctc_loss=0.1224, cr_loss=0.401, attn_decoder_loss=0.2393, over 29533.00 frames. 
], tot_loss[loss=0.2374, ctc_loss=0.1195, cr_loss=0.3624, attn_decoder_loss=0.2424, over 5735557.99 frames. ], batch size: 77, lr: 3.36e-03, grad_scale: 8.0 2024-09-19 03:39:19,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.25 vs. limit=15.0 2024-09-19 03:39:38,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=583280.0, ans=0.5 2024-09-19 03:39:40,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=583280.0, ans=0.0 2024-09-19 03:39:49,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=583320.0, ans=0.125 2024-09-19 03:40:13,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=583360.0, ans=0.125 2024-09-19 03:40:15,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=583360.0, ans=0.0 2024-09-19 03:40:22,705 INFO [train.py:1198] (0/2) Epoch 33, batch 1050, loss[loss=0.2342, ctc_loss=0.1115, cr_loss=0.3621, attn_decoder_loss=0.2398, over 29698.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1189, cr_loss=0.3612, attn_decoder_loss=0.2416, over 5745125.72 frames. 
], batch size: 85, lr: 3.36e-03, grad_scale: 8.0 2024-09-19 03:40:33,202 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.604e+01 9.076e+01 9.577e+01 3.537e+02, threshold=1.815e+02, percent-clipped=1.0 2024-09-19 03:40:45,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=583440.0, ans=0.2 2024-09-19 03:40:47,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=583440.0, ans=0.07 2024-09-19 03:41:25,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=583560.0, ans=0.0 2024-09-19 03:41:40,567 INFO [train.py:1198] (0/2) Epoch 33, batch 1100, loss[loss=0.2311, ctc_loss=0.1187, cr_loss=0.3674, attn_decoder_loss=0.2355, over 29421.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1185, cr_loss=0.3606, attn_decoder_loss=0.2411, over 5758048.79 frames. ], batch size: 78, lr: 3.36e-03, grad_scale: 8.0 2024-09-19 03:41:57,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=583640.0, ans=0.125 2024-09-19 03:42:03,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=583640.0, ans=0.0 2024-09-19 03:42:11,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=583680.0, ans=0.0 2024-09-19 03:42:12,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=583680.0, ans=0.0 2024-09-19 03:42:18,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.08 vs. 
limit=15.0 2024-09-19 03:42:18,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=583680.0, ans=0.0 2024-09-19 03:42:23,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=583680.0, ans=0.0 2024-09-19 03:42:23,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=583680.0, ans=0.125 2024-09-19 03:42:31,304 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=15.0 2024-09-19 03:42:33,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=583720.0, ans=0.0 2024-09-19 03:42:56,359 INFO [train.py:1198] (0/2) Epoch 33, batch 1150, loss[loss=0.2165, ctc_loss=0.1039, cr_loss=0.3317, attn_decoder_loss=0.2216, over 29445.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1187, cr_loss=0.3608, attn_decoder_loss=0.2414, over 5754239.77 frames. ], batch size: 78, lr: 3.36e-03, grad_scale: 8.0 2024-09-19 03:43:01,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=583800.0, ans=0.025 2024-09-19 03:43:04,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=583800.0, ans=0.0 2024-09-19 03:43:06,974 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.417e+01 8.891e+01 9.458e+01 2.719e+02, threshold=1.778e+02, percent-clipped=0.0 2024-09-19 03:43:12,401 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.21 vs. 
limit=15.0 2024-09-19 03:43:13,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2024-09-19 03:43:36,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583880.0, ans=0.1 2024-09-19 03:43:40,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=583920.0, ans=10.0 2024-09-19 03:43:45,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583920.0, ans=0.1 2024-09-19 03:43:53,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=583920.0, ans=15.0 2024-09-19 03:44:14,594 INFO [train.py:1198] (0/2) Epoch 33, batch 1200, loss[loss=0.2362, ctc_loss=0.1144, cr_loss=0.3549, attn_decoder_loss=0.2418, over 29689.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1192, cr_loss=0.3613, attn_decoder_loss=0.2422, over 5746328.68 frames. ], batch size: 85, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:44:31,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=584040.0, ans=0.015 2024-09-19 03:44:36,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=584040.0, ans=0.2 2024-09-19 03:44:57,296 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.34 vs. 
limit=15.0 2024-09-19 03:45:02,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=584120.0, ans=0.0 2024-09-19 03:45:08,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=584120.0, ans=0.0 2024-09-19 03:45:14,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=584120.0, ans=0.0 2024-09-19 03:45:32,636 INFO [train.py:1198] (0/2) Epoch 33, batch 1250, loss[loss=0.2519, ctc_loss=0.1299, cr_loss=0.3888, attn_decoder_loss=0.2568, over 29552.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1191, cr_loss=0.3616, attn_decoder_loss=0.2427, over 5774446.47 frames. ], batch size: 92, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:45:43,422 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.734e+01 8.568e+01 9.117e+01 9.876e+01 2.169e+02, threshold=1.823e+02, percent-clipped=3.0 2024-09-19 03:45:50,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=584240.0, ans=0.1 2024-09-19 03:45:56,582 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.19 vs. 
limit=22.5 2024-09-19 03:46:15,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=584280.0, ans=0.125 2024-09-19 03:46:26,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=584320.0, ans=0.125 2024-09-19 03:46:34,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=584360.0, ans=0.2 2024-09-19 03:46:35,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=584360.0, ans=0.0 2024-09-19 03:46:48,683 INFO [train.py:1198] (0/2) Epoch 33, batch 1300, loss[loss=0.2397, ctc_loss=0.1114, cr_loss=0.3362, attn_decoder_loss=0.2465, over 28226.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1186, cr_loss=0.3599, attn_decoder_loss=0.2419, over 5779246.89 frames. ], batch size: 111, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:46:50,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=584400.0, ans=0.0 2024-09-19 03:47:16,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=584440.0, ans=0.125 2024-09-19 03:47:28,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=584480.0, ans=0.125 2024-09-19 03:47:33,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=584520.0, ans=0.2 2024-09-19 03:47:39,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=584520.0, ans=0.2 2024-09-19 03:47:57,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.90 vs. 
limit=22.5 2024-09-19 03:48:03,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=584600.0, ans=0.05 2024-09-19 03:48:04,740 INFO [train.py:1198] (0/2) Epoch 33, batch 1350, loss[loss=0.2411, ctc_loss=0.1239, cr_loss=0.3822, attn_decoder_loss=0.2456, over 29767.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1183, cr_loss=0.36, attn_decoder_loss=0.2415, over 5795666.87 frames. ], batch size: 81, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:48:06,565 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:48:17,536 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.274e+01 8.749e+01 9.320e+01 1.394e+02, threshold=1.750e+02, percent-clipped=0.0 2024-09-19 03:48:17,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=584600.0, ans=0.125 2024-09-19 03:48:17,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=584600.0, ans=0.025 2024-09-19 03:49:04,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=584720.0, ans=0.125 2024-09-19 03:49:17,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=584760.0, ans=0.125 2024-09-19 03:49:24,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.04 vs. limit=12.0 2024-09-19 03:49:24,983 INFO [train.py:1198] (0/2) Epoch 33, batch 1400, loss[loss=0.2102, ctc_loss=0.09253, cr_loss=0.2917, attn_decoder_loss=0.2168, over 29593.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.118, cr_loss=0.3591, attn_decoder_loss=0.2414, over 5806627.12 frames. 
], batch size: 69, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:49:25,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.60 vs. limit=15.0 2024-09-19 03:49:43,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5 2024-09-19 03:50:12,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=584920.0, ans=0.1 2024-09-19 03:50:25,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=584960.0, ans=0.125 2024-09-19 03:50:25,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=584960.0, ans=0.125 2024-09-19 03:50:28,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=584960.0, ans=0.2 2024-09-19 03:50:34,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=584960.0, ans=0.07 2024-09-19 03:50:40,323 INFO [train.py:1198] (0/2) Epoch 33, batch 1450, loss[loss=0.2546, ctc_loss=0.1296, cr_loss=0.3929, attn_decoder_loss=0.2598, over 29424.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1181, cr_loss=0.3596, attn_decoder_loss=0.2418, over 5802721.15 frames. ], batch size: 94, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:50:41,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.80 vs. 
limit=15.0 2024-09-19 03:50:50,834 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.392e+01 8.954e+01 9.384e+01 1.541e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-19 03:50:55,026 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=7.95 vs. limit=22.5 2024-09-19 03:51:13,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=585080.0, ans=0.125 2024-09-19 03:51:16,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=585080.0, ans=0.0 2024-09-19 03:51:25,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=585120.0, ans=0.125 2024-09-19 03:51:26,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0 2024-09-19 03:51:42,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2024-09-19 03:51:55,822 INFO [train.py:1198] (0/2) Epoch 33, batch 1500, loss[loss=0.2426, ctc_loss=0.1193, cr_loss=0.3513, attn_decoder_loss=0.2484, over 29637.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1188, cr_loss=0.3606, attn_decoder_loss=0.2423, over 5802458.67 frames. ], batch size: 86, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:52:18,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=585240.0, ans=0.025 2024-09-19 03:52:32,370 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.91 vs. 
limit=12.0 2024-09-19 03:52:33,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=585280.0, ans=0.125 2024-09-19 03:52:38,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=585280.0, ans=0.125 2024-09-19 03:52:57,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=585320.0, ans=0.125 2024-09-19 03:53:07,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=585360.0, ans=0.0 2024-09-19 03:53:07,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=585360.0, ans=0.125 2024-09-19 03:53:16,551 INFO [train.py:1198] (0/2) Epoch 33, batch 1550, loss[loss=0.2602, ctc_loss=0.1372, cr_loss=0.3985, attn_decoder_loss=0.265, over 29527.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.119, cr_loss=0.361, attn_decoder_loss=0.2423, over 5778831.96 frames. ], batch size: 90, lr: 3.36e-03, grad_scale: 8.0 2024-09-19 03:53:20,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=585400.0, ans=0.0 2024-09-19 03:53:28,679 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 8.624e+01 9.001e+01 9.537e+01 4.675e+02, threshold=1.800e+02, percent-clipped=1.0 2024-09-19 03:53:52,176 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.50 vs. 
limit=15.0 2024-09-19 03:54:07,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=585520.0, ans=0.125 2024-09-19 03:54:15,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=585560.0, ans=0.125 2024-09-19 03:54:19,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=585560.0, ans=0.0 2024-09-19 03:54:19,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.70 vs. limit=15.0 2024-09-19 03:54:32,530 INFO [train.py:1198] (0/2) Epoch 33, batch 1600, loss[loss=0.2425, ctc_loss=0.1202, cr_loss=0.3664, attn_decoder_loss=0.2479, over 29679.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1189, cr_loss=0.3608, attn_decoder_loss=0.2421, over 5761979.15 frames. ], batch size: 85, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:54:38,093 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.61 vs. 
limit=10.0 2024-09-19 03:54:41,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=585600.0, ans=0.2 2024-09-19 03:54:48,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=585640.0, ans=0.125 2024-09-19 03:54:54,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=585640.0, ans=0.025 2024-09-19 03:55:06,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585680.0, ans=0.1 2024-09-19 03:55:16,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585720.0, ans=0.1 2024-09-19 03:55:33,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=585760.0, ans=0.2 2024-09-19 03:55:48,321 INFO [train.py:1198] (0/2) Epoch 33, batch 1650, loss[loss=0.2467, ctc_loss=0.1214, cr_loss=0.3819, attn_decoder_loss=0.2521, over 29714.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1188, cr_loss=0.3605, attn_decoder_loss=0.2419, over 5756719.03 frames. 
], batch size: 89, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:56:02,791 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.668e+01 9.020e+01 9.711e+01 1.996e+02, threshold=1.804e+02, percent-clipped=2.0 2024-09-19 03:56:06,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=585840.0, ans=0.0 2024-09-19 03:56:13,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=585840.0, ans=0.0 2024-09-19 03:56:20,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0 2024-09-19 03:56:22,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=585880.0, ans=0.025 2024-09-19 03:56:46,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=585920.0, ans=0.2 2024-09-19 03:56:57,735 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=15.0 2024-09-19 03:57:08,350 INFO [train.py:1198] (0/2) Epoch 33, batch 1700, loss[loss=0.2121, ctc_loss=0.109, cr_loss=0.341, attn_decoder_loss=0.216, over 29578.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1184, cr_loss=0.36, attn_decoder_loss=0.2417, over 5779375.60 frames. ], batch size: 69, lr: 3.36e-03, grad_scale: 16.0 2024-09-19 03:57:08,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=586000.0, ans=0.2 2024-09-19 03:57:12,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.84 vs. 
limit=22.5 2024-09-19 03:57:25,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=586040.0, ans=0.2 2024-09-19 03:57:26,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=586040.0, ans=0.1 2024-09-19 03:57:44,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=586080.0, ans=0.125 2024-09-19 03:57:46,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=586080.0, ans=0.2 2024-09-19 03:57:56,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=586120.0, ans=0.125 2024-09-19 03:57:58,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=586120.0, ans=0.0 2024-09-19 03:58:01,919 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0 2024-09-19 03:58:06,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.73 vs. limit=15.0 2024-09-19 03:58:18,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=586160.0, ans=0.0 2024-09-19 03:58:23,871 INFO [train.py:1198] (0/2) Epoch 33, batch 1750, loss[loss=0.2147, ctc_loss=0.1024, cr_loss=0.3324, attn_decoder_loss=0.2197, over 29346.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1179, cr_loss=0.3593, attn_decoder_loss=0.2414, over 5788018.48 frames. 
], batch size: 67, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 03:58:37,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.452e+01 8.998e+01 9.448e+01 1.573e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-19 03:58:56,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2024-09-19 03:59:07,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=586280.0, ans=0.0 2024-09-19 03:59:11,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=586320.0, ans=0.025 2024-09-19 03:59:12,360 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.52 vs. limit=22.5 2024-09-19 03:59:40,303 INFO [train.py:1198] (0/2) Epoch 33, batch 1800, loss[loss=0.2463, ctc_loss=0.1341, cr_loss=0.4142, attn_decoder_loss=0.2496, over 29704.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1178, cr_loss=0.3589, attn_decoder_loss=0.2411, over 5791289.25 frames. ], batch size: 83, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 03:59:45,905 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2024-09-19 03:59:47,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.50 vs. 
limit=22.5 2024-09-19 04:00:40,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=586520.0, ans=0.125 2024-09-19 04:00:43,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=586560.0, ans=0.0 2024-09-19 04:00:59,188 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:01:00,455 INFO [train.py:1198] (0/2) Epoch 33, batch 1850, loss[loss=0.2435, ctc_loss=0.1184, cr_loss=0.3487, attn_decoder_loss=0.2496, over 29661.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1177, cr_loss=0.3585, attn_decoder_loss=0.2408, over 5796561.01 frames. ], batch size: 86, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 04:01:00,882 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:01:02,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. 
limit=15.0 2024-09-19 04:01:03,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=586600.0, ans=0.125 2024-09-19 04:01:13,842 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 8.429e+01 8.891e+01 9.502e+01 1.976e+02, threshold=1.778e+02, percent-clipped=1.0 2024-09-19 04:01:54,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=586720.0, ans=0.125 2024-09-19 04:01:59,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=586760.0, ans=0.2 2024-09-19 04:02:08,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=586760.0, ans=0.2 2024-09-19 04:02:15,704 INFO [train.py:1198] (0/2) Epoch 33, batch 1900, loss[loss=0.2458, ctc_loss=0.1216, cr_loss=0.3703, attn_decoder_loss=0.2514, over 29716.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1182, cr_loss=0.3595, attn_decoder_loss=0.2414, over 5804611.49 frames. ], batch size: 89, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 04:02:24,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=586800.0, ans=0.125 2024-09-19 04:02:24,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=586800.0, ans=0.0 2024-09-19 04:02:39,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.47 vs. limit=10.0 2024-09-19 04:02:54,052 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:02:58,854 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.32 vs. 
limit=22.5 2024-09-19 04:03:22,607 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:03:31,186 INFO [train.py:1198] (0/2) Epoch 33, batch 1950, loss[loss=0.237, ctc_loss=0.1102, cr_loss=0.3432, attn_decoder_loss=0.2434, over 29469.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1191, cr_loss=0.3612, attn_decoder_loss=0.2428, over 5819056.05 frames. ], batch size: 78, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 04:03:44,772 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.530e+01 9.165e+01 9.739e+01 1.607e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-19 04:03:59,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=587040.0, ans=0.125 2024-09-19 04:04:05,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=587080.0, ans=0.05 2024-09-19 04:04:12,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=587080.0, ans=0.07 2024-09-19 04:04:13,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-09-19 04:04:15,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=587080.0, ans=0.2 2024-09-19 04:04:43,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=587160.0, ans=0.09899494936611666 2024-09-19 04:04:47,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.70 vs. 
limit=12.0 2024-09-19 04:04:51,445 INFO [train.py:1198] (0/2) Epoch 33, batch 2000, loss[loss=0.2092, ctc_loss=0.1019, cr_loss=0.3183, attn_decoder_loss=0.2141, over 29343.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1197, cr_loss=0.3623, attn_decoder_loss=0.2434, over 5797045.21 frames. ], batch size: 67, lr: 3.35e-03, grad_scale: 16.0 2024-09-19 04:04:59,652 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:05:05,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=587240.0, ans=0.125 2024-09-19 04:05:07,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=587240.0, ans=0.125 2024-09-19 04:05:30,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587280.0, ans=0.1 2024-09-19 04:05:31,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=587280.0, ans=0.1 2024-09-19 04:05:37,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587320.0, ans=0.1 2024-09-19 04:05:49,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=587320.0, ans=0.0 2024-09-19 04:05:52,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=587360.0, ans=0.2 2024-09-19 04:05:55,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=587360.0, ans=0.125 2024-09-19 04:06:01,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.82 vs. 
limit=15.0 2024-09-19 04:06:07,508 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.41 vs. limit=10.0 2024-09-19 04:06:07,791 INFO [train.py:1198] (0/2) Epoch 33, batch 2050, loss[loss=0.2096, ctc_loss=0.09914, cr_loss=0.3199, attn_decoder_loss=0.2148, over 29461.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1188, cr_loss=0.3603, attn_decoder_loss=0.2422, over 5789582.28 frames. ], batch size: 70, lr: 3.35e-03, grad_scale: 16.0 2024-09-19 04:06:08,024 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:06:14,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=587400.0, ans=0.2 2024-09-19 04:06:14,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=587400.0, ans=0.0 2024-09-19 04:06:21,352 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.725e+01 9.262e+01 9.868e+01 2.043e+02, threshold=1.852e+02, percent-clipped=1.0 2024-09-19 04:06:52,259 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:06:52,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-09-19 04:07:23,430 INFO [train.py:1198] (0/2) Epoch 33, batch 2100, loss[loss=0.2432, ctc_loss=0.1259, cr_loss=0.3859, attn_decoder_loss=0.2477, over 29758.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1186, cr_loss=0.3598, attn_decoder_loss=0.2419, over 5801533.99 frames. ], batch size: 81, lr: 3.35e-03, grad_scale: 16.0 2024-09-19 04:07:31,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.10 vs. 
limit=10.0 2024-09-19 04:07:50,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=587640.0, ans=0.0 2024-09-19 04:08:19,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=587720.0, ans=0.0 2024-09-19 04:08:23,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=587720.0, ans=0.125 2024-09-19 04:08:42,918 INFO [train.py:1198] (0/2) Epoch 33, batch 2150, loss[loss=0.246, ctc_loss=0.1222, cr_loss=0.3673, attn_decoder_loss=0.2516, over 29466.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1183, cr_loss=0.3598, attn_decoder_loss=0.2415, over 5816290.52 frames. ], batch size: 78, lr: 3.35e-03, grad_scale: 16.0 2024-09-19 04:08:56,491 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.476e+01 8.968e+01 9.482e+01 1.071e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-19 04:09:01,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587840.0, ans=0.1 2024-09-19 04:09:05,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.32 vs. limit=15.0 2024-09-19 04:09:19,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=587880.0, ans=0.0 2024-09-19 04:09:58,815 INFO [train.py:1198] (0/2) Epoch 33, batch 2200, loss[loss=0.2387, ctc_loss=0.1162, cr_loss=0.3474, attn_decoder_loss=0.2446, over 29631.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1183, cr_loss=0.3598, attn_decoder_loss=0.2414, over 5812705.31 frames. 
], batch size: 86, lr: 3.35e-03, grad_scale: 16.0 2024-09-19 04:10:09,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2024-09-19 04:10:10,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588000.0, ans=0.1 2024-09-19 04:10:39,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=588080.0, ans=0.0 2024-09-19 04:10:58,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.10 vs. limit=10.0 2024-09-19 04:11:00,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=588160.0, ans=0.125 2024-09-19 04:11:11,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=588160.0, ans=0.0 2024-09-19 04:11:14,406 INFO [train.py:1198] (0/2) Epoch 33, batch 2250, loss[loss=0.2465, ctc_loss=0.1263, cr_loss=0.3698, attn_decoder_loss=0.2517, over 29718.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1184, cr_loss=0.36, attn_decoder_loss=0.2414, over 5813084.98 frames. 
], batch size: 82, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 04:11:17,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588200.0, ans=0.1 2024-09-19 04:11:26,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=588200.0, ans=0.0 2024-09-19 04:11:29,578 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.468e+01 9.055e+01 9.587e+01 2.332e+02, threshold=1.811e+02, percent-clipped=1.0 2024-09-19 04:11:34,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=588240.0, ans=0.125 2024-09-19 04:11:35,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=588240.0, ans=0.5 2024-09-19 04:11:45,631 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.31 vs. limit=15.0 2024-09-19 04:11:48,726 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.99 vs. limit=15.0 2024-09-19 04:12:34,491 INFO [train.py:1198] (0/2) Epoch 33, batch 2300, loss[loss=0.2222, ctc_loss=0.1041, cr_loss=0.3258, attn_decoder_loss=0.2281, over 29322.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1177, cr_loss=0.358, attn_decoder_loss=0.2404, over 5799909.87 frames. 
], batch size: 71, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 04:12:37,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=588400.0, ans=0.125 2024-09-19 04:12:42,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=588400.0, ans=0.1 2024-09-19 04:13:09,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=588480.0, ans=0.0 2024-09-19 04:13:15,904 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2024-09-19 04:13:33,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=588560.0, ans=0.0 2024-09-19 04:13:38,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=588560.0, ans=0.125 2024-09-19 04:13:49,828 INFO [train.py:1198] (0/2) Epoch 33, batch 2350, loss[loss=0.2479, ctc_loss=0.1271, cr_loss=0.3811, attn_decoder_loss=0.2528, over 29690.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.118, cr_loss=0.3587, attn_decoder_loss=0.2409, over 5805301.47 frames. 
], batch size: 83, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 04:13:53,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=588600.0, ans=0.125 2024-09-19 04:14:00,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=588600.0, ans=0.125 2024-09-19 04:14:00,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=588600.0, ans=0.125 2024-09-19 04:14:04,757 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.296e+01 8.859e+01 9.524e+01 1.352e+02, threshold=1.772e+02, percent-clipped=0.0 2024-09-19 04:14:06,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=588640.0, ans=0.0 2024-09-19 04:14:11,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=588640.0, ans=0.2 2024-09-19 04:14:27,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=588680.0, ans=0.125 2024-09-19 04:14:38,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=588720.0, ans=0.2 2024-09-19 04:14:47,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=588720.0, ans=0.125 2024-09-19 04:15:03,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.39 vs. limit=22.5 2024-09-19 04:15:06,269 INFO [train.py:1198] (0/2) Epoch 33, batch 2400, loss[loss=0.2361, ctc_loss=0.1183, cr_loss=0.3652, attn_decoder_loss=0.2411, over 29527.00 frames. 
], tot_loss[loss=0.2362, ctc_loss=0.1181, cr_loss=0.3588, attn_decoder_loss=0.2414, over 5809482.73 frames. ], batch size: 76, lr: 3.35e-03, grad_scale: 16.0 2024-09-19 04:15:12,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=588800.0, ans=0.125 2024-09-19 04:15:18,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=588800.0, ans=0.125 2024-09-19 04:15:19,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2024-09-19 04:15:20,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=588840.0, ans=0.125 2024-09-19 04:15:32,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=588840.0, ans=0.0 2024-09-19 04:15:39,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=588880.0, ans=0.0 2024-09-19 04:15:45,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.11 vs. limit=15.0 2024-09-19 04:16:24,266 INFO [train.py:1198] (0/2) Epoch 33, batch 2450, loss[loss=0.243, ctc_loss=0.1265, cr_loss=0.3818, attn_decoder_loss=0.2474, over 29713.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.119, cr_loss=0.3606, attn_decoder_loss=0.2424, over 5786029.45 frames. ], batch size: 82, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 04:16:26,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.20 vs. 
limit=15.0 2024-09-19 04:16:30,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=589000.0, ans=0.5 2024-09-19 04:16:33,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=589000.0, ans=0.0 2024-09-19 04:16:39,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=589040.0, ans=0.125 2024-09-19 04:16:40,789 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 8.663e+01 9.079e+01 9.765e+01 4.096e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-19 04:16:53,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=589080.0, ans=0.5 2024-09-19 04:16:56,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=589080.0, ans=0.125 2024-09-19 04:17:00,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=589080.0, ans=0.125 2024-09-19 04:17:02,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589080.0, ans=0.1 2024-09-19 04:17:08,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589120.0, ans=0.1 2024-09-19 04:17:31,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=589160.0, ans=0.0 2024-09-19 04:17:40,042 INFO [train.py:1198] (0/2) Epoch 33, batch 2500, loss[loss=0.2405, ctc_loss=0.1155, cr_loss=0.3627, attn_decoder_loss=0.2463, over 29625.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1186, cr_loss=0.3599, attn_decoder_loss=0.2422, over 5795647.64 frames. 
], batch size: 86, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 04:18:21,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=589280.0, ans=0.025 2024-09-19 04:18:32,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=589320.0, ans=0.125 2024-09-19 04:18:36,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.48 vs. limit=15.0 2024-09-19 04:18:52,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589360.0, ans=0.1 2024-09-19 04:18:56,407 INFO [train.py:1198] (0/2) Epoch 33, batch 2550, loss[loss=0.2139, ctc_loss=0.1009, cr_loss=0.3422, attn_decoder_loss=0.2188, over 29344.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1186, cr_loss=0.3601, attn_decoder_loss=0.2421, over 5799156.50 frames. ], batch size: 67, lr: 3.35e-03, grad_scale: 8.0 2024-09-19 04:19:12,962 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.542e+01 8.992e+01 9.541e+01 1.643e+02, threshold=1.798e+02, percent-clipped=0.0 2024-09-19 04:19:13,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=589440.0, ans=0.125 2024-09-19 04:19:41,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589520.0, ans=0.1 2024-09-19 04:19:49,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.79 vs. 
limit=12.0 2024-09-19 04:19:59,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=589560.0, ans=0.0 2024-09-19 04:20:02,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=589560.0, ans=0.0 2024-09-19 04:20:07,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=589560.0, ans=0.0 2024-09-19 04:20:08,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=589560.0, ans=0.125 2024-09-19 04:20:10,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=589560.0, ans=0.0 2024-09-19 04:20:16,545 INFO [train.py:1198] (0/2) Epoch 33, batch 2600, loss[loss=0.2242, ctc_loss=0.1044, cr_loss=0.3262, attn_decoder_loss=0.2302, over 29459.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1188, cr_loss=0.3606, attn_decoder_loss=0.2422, over 5794825.99 frames. ], batch size: 78, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:20:37,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=589640.0, ans=0.2 2024-09-19 04:20:39,073 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:20:55,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=589680.0, ans=0.1 2024-09-19 04:21:10,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=589720.0, ans=0.125 2024-09-19 04:21:17,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.56 vs. 
limit=15.0 2024-09-19 04:21:31,655 INFO [train.py:1198] (0/2) Epoch 33, batch 2650, loss[loss=0.2378, ctc_loss=0.1147, cr_loss=0.3447, attn_decoder_loss=0.2438, over 29262.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1181, cr_loss=0.3593, attn_decoder_loss=0.242, over 5801391.51 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:21:41,735 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.57 vs. limit=22.5 2024-09-19 04:21:48,460 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 8.528e+01 8.946e+01 9.384e+01 1.299e+02, threshold=1.789e+02, percent-clipped=0.0 2024-09-19 04:21:52,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=12.0 2024-09-19 04:22:07,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=589880.0, ans=0.1 2024-09-19 04:22:47,777 INFO [train.py:1198] (0/2) Epoch 33, batch 2700, loss[loss=0.2365, ctc_loss=0.1084, cr_loss=0.3399, attn_decoder_loss=0.2432, over 29528.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1187, cr_loss=0.3603, attn_decoder_loss=0.2423, over 5795332.82 frames. 
], batch size: 87, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:23:06,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=590040.0, ans=0.0 2024-09-19 04:23:22,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590080.0, ans=0.1 2024-09-19 04:23:22,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=590080.0, ans=0.1 2024-09-19 04:24:00,915 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.66 vs. limit=15.0 2024-09-19 04:24:07,395 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2024-09-19 04:24:07,847 INFO [train.py:1198] (0/2) Epoch 33, batch 2750, loss[loss=0.2217, ctc_loss=0.1109, cr_loss=0.3584, attn_decoder_loss=0.2261, over 29546.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1179, cr_loss=0.359, attn_decoder_loss=0.2413, over 5793696.50 frames. 
], batch size: 75, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:24:20,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=590200.0, ans=0.125 2024-09-19 04:24:24,590 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.542e+01 8.884e+01 9.570e+01 2.810e+02, threshold=1.777e+02, percent-clipped=3.0 2024-09-19 04:24:44,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=590280.0, ans=0.125 2024-09-19 04:24:48,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=590280.0, ans=0.125 2024-09-19 04:24:49,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2024-09-19 04:25:01,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=590320.0, ans=0.0 2024-09-19 04:25:14,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=590360.0, ans=0.2 2024-09-19 04:25:24,063 INFO [train.py:1198] (0/2) Epoch 33, batch 2800, loss[loss=0.254, ctc_loss=0.1504, cr_loss=0.3995, attn_decoder_loss=0.2567, over 20419.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1184, cr_loss=0.3593, attn_decoder_loss=0.2416, over 5776206.54 frames. 
], batch size: 210, lr: 3.34e-03, grad_scale: 16.0 2024-09-19 04:25:39,645 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:25:45,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=590440.0, ans=0.125 2024-09-19 04:25:50,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2024-09-19 04:25:51,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-09-19 04:25:52,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=590480.0, ans=0.2 2024-09-19 04:26:11,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=590520.0, ans=0.035 2024-09-19 04:26:39,485 INFO [train.py:1198] (0/2) Epoch 33, batch 2850, loss[loss=0.2295, ctc_loss=0.1195, cr_loss=0.353, attn_decoder_loss=0.2339, over 29518.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1188, cr_loss=0.3599, attn_decoder_loss=0.242, over 5761717.51 frames. ], batch size: 77, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:26:39,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=590600.0, ans=0.2 2024-09-19 04:26:45,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.34 vs. limit=10.0 2024-09-19 04:26:52,837 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.18 vs. 
limit=15.0 2024-09-19 04:26:57,780 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.701e+01 9.298e+01 9.945e+01 2.152e+02, threshold=1.860e+02, percent-clipped=1.0 2024-09-19 04:27:01,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=590640.0, ans=0.1 2024-09-19 04:27:14,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=590680.0, ans=0.125 2024-09-19 04:27:24,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. limit=5.0 2024-09-19 04:27:41,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=590720.0, ans=0.2 2024-09-19 04:27:52,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=590760.0, ans=0.2 2024-09-19 04:27:59,740 INFO [train.py:1198] (0/2) Epoch 33, batch 2900, loss[loss=0.2306, ctc_loss=0.1085, cr_loss=0.3255, attn_decoder_loss=0.237, over 29436.00 frames. ], tot_loss[loss=0.2376, ctc_loss=0.1192, cr_loss=0.3614, attn_decoder_loss=0.2427, over 5786558.33 frames. ], batch size: 79, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:28:18,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=590840.0, ans=0.2 2024-09-19 04:28:24,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=590840.0, ans=0.125 2024-09-19 04:28:28,111 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.39 vs. 
limit=15.0 2024-09-19 04:28:30,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=590880.0, ans=0.125 2024-09-19 04:28:39,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=590880.0, ans=0.125 2024-09-19 04:28:51,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=590920.0, ans=0.125 2024-09-19 04:28:53,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=590920.0, ans=0.125 2024-09-19 04:28:53,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.18 vs. limit=12.0 2024-09-19 04:28:56,862 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-09-19 04:29:08,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=590960.0, ans=0.125 2024-09-19 04:29:15,410 INFO [train.py:1198] (0/2) Epoch 33, batch 2950, loss[loss=0.2307, ctc_loss=0.112, cr_loss=0.3607, attn_decoder_loss=0.2359, over 29508.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1182, cr_loss=0.3592, attn_decoder_loss=0.2414, over 5780959.72 frames. 
], batch size: 75, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:29:25,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=591000.0, ans=22.5 2024-09-19 04:29:33,869 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.422e+01 8.881e+01 9.248e+01 1.525e+02, threshold=1.776e+02, percent-clipped=0.0 2024-09-19 04:29:48,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=591080.0, ans=0.125 2024-09-19 04:30:13,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=591120.0, ans=0.0 2024-09-19 04:30:20,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2024-09-19 04:30:32,269 INFO [train.py:1198] (0/2) Epoch 33, batch 3000, loss[loss=0.2439, ctc_loss=0.1242, cr_loss=0.3711, attn_decoder_loss=0.249, over 29769.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1184, cr_loss=0.359, attn_decoder_loss=0.2416, over 5781762.77 frames. ], batch size: 81, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:30:32,270 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 04:30:50,728 INFO [train.py:1230] (0/2) Epoch 33, validation: loss=0.2119, ctc_loss=0.03704, cr_loss=5.931e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-19 04:30:50,729 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 04:31:22,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.76 vs. 
limit=15.0 2024-09-19 04:31:30,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=591280.0, ans=0.2 2024-09-19 04:31:50,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=591320.0, ans=0.125 2024-09-19 04:31:51,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=591320.0, ans=0.125 2024-09-19 04:32:00,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=591360.0, ans=0.125 2024-09-19 04:32:11,227 INFO [train.py:1198] (0/2) Epoch 33, batch 3050, loss[loss=0.2223, ctc_loss=0.1044, cr_loss=0.3289, attn_decoder_loss=0.2281, over 29518.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1188, cr_loss=0.3601, attn_decoder_loss=0.2423, over 5776094.42 frames. ], batch size: 76, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:32:13,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.11 vs. 
limit=22.5 2024-09-19 04:32:28,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=591440.0, ans=0.2 2024-09-19 04:32:29,479 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.592e+01 9.144e+01 9.827e+01 2.461e+02, threshold=1.829e+02, percent-clipped=1.0 2024-09-19 04:32:29,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=591440.0, ans=0.125 2024-09-19 04:32:41,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=591480.0, ans=0.125 2024-09-19 04:32:45,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.27 vs. limit=15.0 2024-09-19 04:33:02,048 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=22.5 2024-09-19 04:33:03,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2024-09-19 04:33:11,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=591560.0, ans=0.125 2024-09-19 04:33:12,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=591560.0, ans=0.125 2024-09-19 04:33:19,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=591560.0, ans=0.2 2024-09-19 04:33:24,479 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.63 vs. 
limit=10.0 2024-09-19 04:33:26,769 INFO [train.py:1198] (0/2) Epoch 33, batch 3100, loss[loss=0.2539, ctc_loss=0.1278, cr_loss=0.3681, attn_decoder_loss=0.2597, over 29248.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1183, cr_loss=0.3591, attn_decoder_loss=0.2417, over 5775472.65 frames. ], batch size: 100, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:33:52,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=591640.0, ans=0.125 2024-09-19 04:34:31,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2024-09-19 04:34:42,674 INFO [train.py:1198] (0/2) Epoch 33, batch 3150, loss[loss=0.263, ctc_loss=0.1388, cr_loss=0.4227, attn_decoder_loss=0.2674, over 28854.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1185, cr_loss=0.36, attn_decoder_loss=0.242, over 5782880.62 frames. ], batch size: 104, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:35:03,072 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.559e+01 9.035e+01 9.509e+01 1.493e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-19 04:35:19,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=591880.0, ans=0.125 2024-09-19 04:35:19,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591880.0, ans=0.1 2024-09-19 04:35:34,316 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:35:48,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.whiten.whitening_limit, batch_count=591960.0, ans=12.0 2024-09-19 04:36:01,782 INFO [checkpoint.py:75] (0/2) Saving checkpoint to 
zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-148000.pt 2024-09-19 04:36:10,887 INFO [train.py:1198] (0/2) Epoch 33, batch 3200, loss[loss=0.2393, ctc_loss=0.1147, cr_loss=0.3656, attn_decoder_loss=0.245, over 29425.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1178, cr_loss=0.3587, attn_decoder_loss=0.2414, over 5794269.34 frames. ], batch size: 79, lr: 3.34e-03, grad_scale: 16.0 2024-09-19 04:36:37,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=592040.0, ans=0.1 2024-09-19 04:36:54,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=592080.0, ans=0.2 2024-09-19 04:37:09,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=592120.0, ans=0.2 2024-09-19 04:37:15,733 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. limit=6.0 2024-09-19 04:37:21,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=592160.0, ans=0.125 2024-09-19 04:37:26,889 INFO [train.py:1198] (0/2) Epoch 33, batch 3250, loss[loss=0.2423, ctc_loss=0.1188, cr_loss=0.357, attn_decoder_loss=0.2481, over 29716.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1183, cr_loss=0.3602, attn_decoder_loss=0.242, over 5801289.22 frames. 
], batch size: 84, lr: 3.34e-03, grad_scale: 16.0 2024-09-19 04:37:27,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=592200.0, ans=0.125 2024-09-19 04:37:43,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=592240.0, ans=0.0 2024-09-19 04:37:44,971 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.640e+01 9.097e+01 9.766e+01 4.487e+02, threshold=1.819e+02, percent-clipped=1.0 2024-09-19 04:38:42,519 INFO [train.py:1198] (0/2) Epoch 33, batch 3300, loss[loss=0.253, ctc_loss=0.1372, cr_loss=0.393, attn_decoder_loss=0.2571, over 28402.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1176, cr_loss=0.3583, attn_decoder_loss=0.2409, over 5798124.56 frames. ], batch size: 111, lr: 3.34e-03, grad_scale: 16.0 2024-09-19 04:39:23,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=592480.0, ans=0.0 2024-09-19 04:39:41,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=592520.0, ans=0.0 2024-09-19 04:39:44,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=592520.0, ans=0.0 2024-09-19 04:39:52,608 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:40:02,668 INFO [train.py:1198] (0/2) Epoch 33, batch 3350, loss[loss=0.2546, ctc_loss=0.1291, cr_loss=0.3838, attn_decoder_loss=0.26, over 28796.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1187, cr_loss=0.36, attn_decoder_loss=0.2418, over 5773903.30 frames. 
], batch size: 104, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:40:16,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=592640.0, ans=0.125 2024-09-19 04:40:22,487 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.734e+01 8.878e+01 9.274e+01 9.993e+01 2.283e+02, threshold=1.855e+02, percent-clipped=2.0 2024-09-19 04:40:57,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592720.0, ans=0.1 2024-09-19 04:41:03,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=592760.0, ans=0.0 2024-09-19 04:41:05,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=592760.0, ans=0.125 2024-09-19 04:41:07,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.98 vs. limit=10.0 2024-09-19 04:41:08,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.57 vs. limit=15.0 2024-09-19 04:41:10,443 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-09-19 04:41:14,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=592760.0, ans=0.1 2024-09-19 04:41:19,059 INFO [train.py:1198] (0/2) Epoch 33, batch 3400, loss[loss=0.2069, ctc_loss=0.09915, cr_loss=0.3214, attn_decoder_loss=0.2117, over 29349.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1183, cr_loss=0.3589, attn_decoder_loss=0.2415, over 5766150.92 frames. 
], batch size: 67, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:41:19,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.20 vs. limit=6.0 2024-09-19 04:41:37,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=592840.0, ans=0.0 2024-09-19 04:41:46,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=592840.0, ans=0.125 2024-09-19 04:41:49,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=592880.0, ans=0.1 2024-09-19 04:42:01,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=592880.0, ans=0.0 2024-09-19 04:42:09,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=592920.0, ans=0.125 2024-09-19 04:42:18,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=592960.0, ans=0.2 2024-09-19 04:42:27,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=592960.0, ans=0.125 2024-09-19 04:42:34,749 INFO [train.py:1198] (0/2) Epoch 33, batch 3450, loss[loss=0.2497, ctc_loss=0.1128, cr_loss=0.338, attn_decoder_loss=0.2574, over 28776.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1185, cr_loss=0.3593, attn_decoder_loss=0.242, over 5774617.61 frames. 
], batch size: 112, lr: 3.34e-03, grad_scale: 8.0 2024-09-19 04:42:56,751 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.686e+01 9.141e+01 9.790e+01 2.387e+02, threshold=1.828e+02, percent-clipped=2.0 2024-09-19 04:43:01,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=593040.0, ans=0.0 2024-09-19 04:43:20,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=593080.0, ans=0.05 2024-09-19 04:43:30,265 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=17.14 vs. limit=15.0 2024-09-19 04:43:52,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=593160.0, ans=0.125 2024-09-19 04:43:55,210 INFO [train.py:1198] (0/2) Epoch 33, batch 3500, loss[loss=0.2213, ctc_loss=0.1099, cr_loss=0.3573, attn_decoder_loss=0.2257, over 29331.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1181, cr_loss=0.3586, attn_decoder_loss=0.2414, over 5776369.97 frames. ], batch size: 71, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 04:44:20,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=593240.0, ans=0.0 2024-09-19 04:44:22,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=593240.0, ans=0.125 2024-09-19 04:44:33,373 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.74 vs. limit=22.5 2024-09-19 04:44:38,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.85 vs. 
limit=22.5 2024-09-19 04:44:44,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=593320.0, ans=0.125 2024-09-19 04:44:56,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=593360.0, ans=0.0 2024-09-19 04:45:09,792 INFO [train.py:1198] (0/2) Epoch 33, batch 3550, loss[loss=0.2439, ctc_loss=0.1259, cr_loss=0.3768, attn_decoder_loss=0.2487, over 29726.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1179, cr_loss=0.3585, attn_decoder_loss=0.2414, over 5782222.21 frames. ], batch size: 89, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 04:45:23,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=593440.0, ans=0.1 2024-09-19 04:45:28,726 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.555e+01 9.089e+01 9.583e+01 3.040e+02, threshold=1.818e+02, percent-clipped=2.0 2024-09-19 04:45:35,723 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.83 vs. limit=15.0 2024-09-19 04:45:38,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=593480.0, ans=0.2 2024-09-19 04:45:54,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=593520.0, ans=0.125 2024-09-19 04:45:55,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=593520.0, ans=0.125 2024-09-19 04:45:59,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.44 vs. 
limit=15.0 2024-09-19 04:46:09,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=593560.0, ans=0.2 2024-09-19 04:46:24,264 INFO [train.py:1198] (0/2) Epoch 33, batch 3600, loss[loss=0.2276, ctc_loss=0.1211, cr_loss=0.3785, attn_decoder_loss=0.2311, over 29486.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1178, cr_loss=0.359, attn_decoder_loss=0.2416, over 5791359.08 frames. ], batch size: 77, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:46:46,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.61 vs. limit=15.0 2024-09-19 04:46:57,630 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:46:59,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=593680.0, ans=0.1 2024-09-19 04:47:21,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=593720.0, ans=0.125 2024-09-19 04:47:21,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=593720.0, ans=0.125 2024-09-19 04:47:34,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=593760.0, ans=0.125 2024-09-19 04:47:38,883 INFO [train.py:1198] (0/2) Epoch 33, batch 3650, loss[loss=0.2526, ctc_loss=0.1303, cr_loss=0.3985, attn_decoder_loss=0.2573, over 29505.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1173, cr_loss=0.3577, attn_decoder_loss=0.2409, over 5793522.54 frames. 
], batch size: 90, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:47:42,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=593800.0, ans=0.0 2024-09-19 04:47:58,207 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.325e+01 8.858e+01 9.502e+01 1.563e+02, threshold=1.772e+02, percent-clipped=0.0 2024-09-19 04:47:59,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=593840.0, ans=0.125 2024-09-19 04:48:33,881 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.13 vs. limit=6.0 2024-09-19 04:48:48,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=593960.0, ans=0.125 2024-09-19 04:48:52,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593960.0, ans=0.1 2024-09-19 04:48:55,547 INFO [train.py:1198] (0/2) Epoch 33, batch 3700, loss[loss=0.2343, ctc_loss=0.1158, cr_loss=0.3436, attn_decoder_loss=0.2399, over 29696.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1172, cr_loss=0.3577, attn_decoder_loss=0.2409, over 5805037.23 frames. ], batch size: 84, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:49:07,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=594000.0, ans=0.125 2024-09-19 04:49:10,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.33 vs. 
limit=15.0 2024-09-19 04:49:16,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=594040.0, ans=0.0 2024-09-19 04:49:24,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=594040.0, ans=0.0 2024-09-19 04:49:54,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=594120.0, ans=0.0 2024-09-19 04:49:56,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=594160.0, ans=0.125 2024-09-19 04:50:06,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-09-19 04:50:11,530 INFO [train.py:1198] (0/2) Epoch 33, batch 3750, loss[loss=0.2094, ctc_loss=0.1003, cr_loss=0.3332, attn_decoder_loss=0.2141, over 29371.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1175, cr_loss=0.3585, attn_decoder_loss=0.241, over 5808391.87 frames. 
], batch size: 67, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:50:16,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=594200.0, ans=0.125 2024-09-19 04:50:30,937 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.561e+01 9.006e+01 9.475e+01 6.465e+02, threshold=1.801e+02, percent-clipped=2.0 2024-09-19 04:50:51,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=594280.0, ans=0.025 2024-09-19 04:50:54,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=594320.0, ans=0.2 2024-09-19 04:51:02,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=594320.0, ans=0.0 2024-09-19 04:51:26,179 INFO [train.py:1198] (0/2) Epoch 33, batch 3800, loss[loss=0.2525, ctc_loss=0.1276, cr_loss=0.3731, attn_decoder_loss=0.2581, over 29616.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1174, cr_loss=0.3581, attn_decoder_loss=0.2409, over 5798861.05 frames. 
], batch size: 86, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:51:28,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=594400.0, ans=0.125 2024-09-19 04:51:30,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=594400.0, ans=0.2 2024-09-19 04:51:32,428 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:51:45,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=594440.0, ans=0.125 2024-09-19 04:52:21,520 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:52:40,418 INFO [train.py:1198] (0/2) Epoch 33, batch 3850, loss[loss=0.2528, ctc_loss=0.1298, cr_loss=0.3884, attn_decoder_loss=0.2578, over 29264.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1172, cr_loss=0.3576, attn_decoder_loss=0.2407, over 5813138.43 frames. ], batch size: 100, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:52:47,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.82 vs. 
limit=22.5 2024-09-19 04:52:59,670 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.228e+01 8.527e+01 9.047e+01 9.575e+01 1.638e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-19 04:53:01,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=594640.0, ans=0.125 2024-09-19 04:53:13,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=594680.0, ans=0.125 2024-09-19 04:53:23,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=594720.0, ans=0.0 2024-09-19 04:53:55,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=594800.0, ans=0.0 2024-09-19 04:53:56,281 INFO [train.py:1198] (0/2) Epoch 33, batch 3900, loss[loss=0.2539, ctc_loss=0.1321, cr_loss=0.4044, attn_decoder_loss=0.2584, over 29618.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1177, cr_loss=0.3586, attn_decoder_loss=0.2414, over 5817345.91 frames. ], batch size: 86, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 04:54:02,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=594800.0, ans=0.025 2024-09-19 04:54:06,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594800.0, ans=0.1 2024-09-19 04:54:08,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=594800.0, ans=0.125 2024-09-19 04:54:10,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.89 vs. 
limit=10.0 2024-09-19 04:54:11,324 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:54:30,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=594880.0, ans=0.125 2024-09-19 04:54:40,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=594920.0, ans=0.125 2024-09-19 04:54:55,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=594960.0, ans=0.125 2024-09-19 04:55:11,406 INFO [train.py:1198] (0/2) Epoch 33, batch 3950, loss[loss=0.2514, ctc_loss=0.1277, cr_loss=0.3787, attn_decoder_loss=0.2567, over 29464.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1179, cr_loss=0.3595, attn_decoder_loss=0.2415, over 5836355.13 frames. ], batch size: 97, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 04:55:19,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=595000.0, ans=0.0 2024-09-19 04:55:20,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=595000.0, ans=0.0 2024-09-19 04:55:32,035 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.607e+01 9.033e+01 9.637e+01 1.585e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-19 04:55:49,198 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.04 vs. 
limit=6.0 2024-09-19 04:55:50,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=595080.0, ans=0.0 2024-09-19 04:56:25,736 INFO [train.py:1198] (0/2) Epoch 33, batch 4000, loss[loss=0.2202, ctc_loss=0.1045, cr_loss=0.3358, attn_decoder_loss=0.2255, over 29520.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1185, cr_loss=0.3599, attn_decoder_loss=0.2417, over 5813199.41 frames. ], batch size: 74, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:56:26,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-09-19 04:56:30,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=595200.0, ans=0.125 2024-09-19 04:56:51,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=595240.0, ans=0.2 2024-09-19 04:57:00,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=595280.0, ans=0.0 2024-09-19 04:57:01,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=595280.0, ans=0.125 2024-09-19 04:57:25,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=595360.0, ans=0.035 2024-09-19 04:57:31,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.13 vs. 
limit=6.0 2024-09-19 04:57:34,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=595360.0, ans=0.0 2024-09-19 04:57:37,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=595360.0, ans=0.025 2024-09-19 04:57:39,807 INFO [train.py:1198] (0/2) Epoch 33, batch 4050, loss[loss=0.2575, ctc_loss=0.1417, cr_loss=0.365, attn_decoder_loss=0.2623, over 20591.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1181, cr_loss=0.359, attn_decoder_loss=0.2412, over 5797523.60 frames. ], batch size: 209, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:58:00,218 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.573e+01 9.185e+01 9.893e+01 2.518e+02, threshold=1.837e+02, percent-clipped=1.0 2024-09-19 04:58:00,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=595440.0, ans=0.0 2024-09-19 04:58:10,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=595480.0, ans=0.0 2024-09-19 04:58:24,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=595520.0, ans=0.125 2024-09-19 04:58:37,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=595520.0, ans=0.125 2024-09-19 04:58:52,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=595560.0, ans=0.125 2024-09-19 04:58:55,014 INFO [train.py:1198] (0/2) Epoch 33, batch 4100, loss[loss=0.2465, ctc_loss=0.1267, cr_loss=0.3837, attn_decoder_loss=0.2512, over 29504.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1184, cr_loss=0.3596, attn_decoder_loss=0.2416, over 5793946.99 frames. 
], batch size: 90, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 04:59:17,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=595640.0, ans=0.0 2024-09-19 04:59:41,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.30 vs. limit=10.0 2024-09-19 04:59:45,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=595720.0, ans=0.05 2024-09-19 04:59:46,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=595720.0, ans=0.2 2024-09-19 05:00:02,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=595760.0, ans=0.0 2024-09-19 05:00:02,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595760.0, ans=0.1 2024-09-19 05:00:09,891 INFO [train.py:1198] (0/2) Epoch 33, batch 4150, loss[loss=0.2276, ctc_loss=0.1138, cr_loss=0.341, attn_decoder_loss=0.2327, over 29473.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1183, cr_loss=0.3593, attn_decoder_loss=0.2413, over 5799297.35 frames. 
], batch size: 77, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 05:00:21,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=595800.0, ans=0.125 2024-09-19 05:00:31,921 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.400e+01 8.837e+01 9.482e+01 1.626e+02, threshold=1.767e+02, percent-clipped=0.0 2024-09-19 05:00:33,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=595840.0, ans=0.1 2024-09-19 05:00:38,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=595880.0, ans=0.07 2024-09-19 05:00:40,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=595880.0, ans=0.0 2024-09-19 05:00:51,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=595880.0, ans=0.125 2024-09-19 05:01:07,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=595960.0, ans=0.05 2024-09-19 05:01:23,828 INFO [train.py:1198] (0/2) Epoch 33, batch 4200, loss[loss=0.2543, ctc_loss=0.1396, cr_loss=0.3987, attn_decoder_loss=0.2582, over 29525.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1183, cr_loss=0.3598, attn_decoder_loss=0.2417, over 5801314.76 frames. ], batch size: 90, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 05:01:33,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.77 vs. 
limit=22.5 2024-09-19 05:01:45,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=596040.0, ans=0.0 2024-09-19 05:01:47,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=596040.0, ans=0.125 2024-09-19 05:02:06,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=596120.0, ans=0.2 2024-09-19 05:02:09,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=596120.0, ans=0.125 2024-09-19 05:02:17,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=596120.0, ans=0.125 2024-09-19 05:02:23,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=596160.0, ans=0.0 2024-09-19 05:02:38,337 INFO [train.py:1198] (0/2) Epoch 33, batch 4250, loss[loss=0.2179, ctc_loss=0.09707, cr_loss=0.3259, attn_decoder_loss=0.2241, over 29529.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1182, cr_loss=0.3601, attn_decoder_loss=0.2417, over 5806596.31 frames. ], batch size: 74, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 05:02:44,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=596200.0, ans=0.125 2024-09-19 05:02:59,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.79 vs. 
limit=12.0 2024-09-19 05:02:59,965 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.601e+01 9.024e+01 9.699e+01 1.912e+02, threshold=1.805e+02, percent-clipped=1.0 2024-09-19 05:03:00,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596240.0, ans=0.1 2024-09-19 05:03:03,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=596240.0, ans=0.0 2024-09-19 05:03:31,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=596320.0, ans=0.2 2024-09-19 05:03:52,556 INFO [train.py:1198] (0/2) Epoch 33, batch 4300, loss[loss=0.2405, ctc_loss=0.1156, cr_loss=0.358, attn_decoder_loss=0.2464, over 29525.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1177, cr_loss=0.3588, attn_decoder_loss=0.2416, over 5796943.48 frames. ], batch size: 87, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 05:04:16,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=596440.0, ans=0.2 2024-09-19 05:04:32,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=596480.0, ans=0.09899494936611666 2024-09-19 05:04:38,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=596520.0, ans=0.125 2024-09-19 05:04:41,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=596520.0, ans=0.0 2024-09-19 05:04:52,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=596560.0, ans=0.07 2024-09-19 05:04:55,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=596560.0, ans=0.125 2024-09-19 
05:04:55,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=596560.0, ans=0.0 2024-09-19 05:05:07,031 INFO [train.py:1198] (0/2) Epoch 33, batch 4350, loss[loss=0.2632, ctc_loss=0.1437, cr_loss=0.416, attn_decoder_loss=0.2672, over 29489.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1207, cr_loss=0.365, attn_decoder_loss=0.2453, over 5799077.79 frames. ], batch size: 97, lr: 3.33e-03, grad_scale: 8.0 2024-09-19 05:05:13,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=596600.0, ans=0.125 2024-09-19 05:05:19,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=596600.0, ans=0.125 2024-09-19 05:05:24,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=596640.0, ans=0.125 2024-09-19 05:05:29,989 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.368e+01 8.801e+01 9.131e+01 9.765e+01 2.028e+02, threshold=1.826e+02, percent-clipped=1.0 2024-09-19 05:05:35,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=596680.0, ans=0.025 2024-09-19 05:05:39,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.84 vs. 
limit=22.5 2024-09-19 05:05:43,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=596680.0, ans=0.025 2024-09-19 05:05:46,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=596680.0, ans=0.1 2024-09-19 05:06:06,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=596760.0, ans=0.125 2024-09-19 05:06:22,397 INFO [train.py:1198] (0/2) Epoch 33, batch 4400, loss[loss=0.251, ctc_loss=0.1388, cr_loss=0.4044, attn_decoder_loss=0.2545, over 27365.00 frames. ], tot_loss[loss=0.2419, ctc_loss=0.1218, cr_loss=0.3673, attn_decoder_loss=0.247, over 5769188.47 frames. ], batch size: 124, lr: 3.32e-03, grad_scale: 16.0 2024-09-19 05:06:23,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.91 vs. limit=12.0 2024-09-19 05:06:25,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=596800.0, ans=0.0 2024-09-19 05:07:09,633 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.36 vs. limit=12.0 2024-09-19 05:07:16,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=596920.0, ans=0.0 2024-09-19 05:07:36,278 INFO [train.py:1198] (0/2) Epoch 33, batch 4450, loss[loss=0.2507, ctc_loss=0.1368, cr_loss=0.3852, attn_decoder_loss=0.2548, over 20180.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1255, cr_loss=0.373, attn_decoder_loss=0.2494, over 5574626.52 frames. 
], batch size: 209, lr: 3.32e-03, grad_scale: 8.0 2024-09-19 05:07:36,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=597000.0, ans=0.125 2024-09-19 05:07:53,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=597040.0, ans=0.125 2024-09-19 05:08:00,436 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.114e+01 9.208e+01 9.597e+01 1.124e+02 1.638e+02, threshold=1.919e+02, percent-clipped=0.0 2024-09-19 05:08:05,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=597080.0, ans=0.125 2024-09-19 05:08:15,360 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0 2024-09-19 05:08:17,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=597080.0, ans=0.125 2024-09-19 05:08:49,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=12.0 2024-09-19 05:08:52,062 INFO [train.py:1198] (0/2) Epoch 33, batch 4500, loss[loss=0.2486, ctc_loss=0.1319, cr_loss=0.3676, attn_decoder_loss=0.2534, over 20311.00 frames. ], tot_loss[loss=0.2465, ctc_loss=0.1291, cr_loss=0.3758, attn_decoder_loss=0.2512, over 5231995.43 frames. 
], batch size: 209, lr: 3.32e-03, grad_scale: 8.0 2024-09-19 05:08:53,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=597200.0, ans=0.125 2024-09-19 05:08:55,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=597200.0, ans=0.0 2024-09-19 05:08:59,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=597200.0, ans=0.125 2024-09-19 05:08:59,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=597200.0, ans=0.0 2024-09-19 05:09:16,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=597240.0, ans=0.125 2024-09-19 05:09:17,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=597240.0, ans=0.0 2024-09-19 05:09:23,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597280.0, ans=0.1 2024-09-19 05:09:29,391 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-33.pt 2024-09-19 05:10:21,345 INFO [train.py:1198] (0/2) Epoch 34, batch 0, loss[loss=0.2194, ctc_loss=0.1091, cr_loss=0.3431, attn_decoder_loss=0.224, over 29573.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1091, cr_loss=0.3431, attn_decoder_loss=0.224, over 29573.00 frames. 
], batch size: 73, lr: 3.27e-03, grad_scale: 16.0 2024-09-19 05:10:21,346 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 05:10:26,212 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0973, 4.9657, 4.7907, 4.4852], device='cuda:0') 2024-09-19 05:10:39,722 INFO [train.py:1230] (0/2) Epoch 34, validation: loss=0.2115, ctc_loss=0.03706, cr_loss=5.889e-15, attn_decoder_loss=0.2309, over 944034.00 frames. 2024-09-19 05:10:39,723 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 05:10:43,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=22.5 2024-09-19 05:11:08,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=597380.0, ans=10.0 2024-09-19 05:11:14,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=597380.0, ans=0.125 2024-09-19 05:11:32,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=597420.0, ans=0.0 2024-09-19 05:11:45,329 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.942e+01 9.532e+01 1.086e+02 1.158e+02 1.194e+03, threshold=2.172e+02, percent-clipped=2.0 2024-09-19 05:11:48,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=597460.0, ans=0.2 2024-09-19 05:11:54,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=597460.0, ans=0.1 2024-09-19 05:11:57,347 INFO [train.py:1198] (0/2) Epoch 34, batch 50, loss[loss=0.2175, ctc_loss=0.1079, cr_loss=0.3256, attn_decoder_loss=0.2224, over 29430.00 frames. 
], tot_loss[loss=0.2372, ctc_loss=0.1195, cr_loss=0.3625, attn_decoder_loss=0.2422, over 1269156.22 frames. ], batch size: 70, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:12:00,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=597500.0, ans=0.0 2024-09-19 05:12:02,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=597500.0, ans=0.0 2024-09-19 05:12:06,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=597500.0, ans=0.025 2024-09-19 05:12:19,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=597540.0, ans=0.2 2024-09-19 05:12:25,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=597540.0, ans=0.0 2024-09-19 05:13:04,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=597660.0, ans=0.0 2024-09-19 05:13:16,055 INFO [train.py:1198] (0/2) Epoch 34, batch 100, loss[loss=0.2249, ctc_loss=0.1087, cr_loss=0.3264, attn_decoder_loss=0.2305, over 29524.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1218, cr_loss=0.3676, attn_decoder_loss=0.245, over 2252911.65 frames. 
], batch size: 76, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:13:22,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=597700.0, ans=0.0 2024-09-19 05:13:34,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=597740.0, ans=0.0 2024-09-19 05:13:53,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=597780.0, ans=0.125 2024-09-19 05:14:02,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=597820.0, ans=0.0 2024-09-19 05:14:04,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=597820.0, ans=0.0 2024-09-19 05:14:10,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.12 vs. limit=10.0 2024-09-19 05:14:14,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=597860.0, ans=0.09899494936611666 2024-09-19 05:14:18,841 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.662e+01 8.574e+01 9.028e+01 9.395e+01 1.381e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-19 05:14:25,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=597860.0, ans=0.0 2024-09-19 05:14:30,768 INFO [train.py:1198] (0/2) Epoch 34, batch 150, loss[loss=0.2194, ctc_loss=0.1059, cr_loss=0.3478, attn_decoder_loss=0.2243, over 29448.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1193, cr_loss=0.3626, attn_decoder_loss=0.2428, over 3048113.80 frames. 
], batch size: 70, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:14:32,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=597900.0, ans=0.125 2024-09-19 05:14:41,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=597900.0, ans=0.125 2024-09-19 05:14:46,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=597940.0, ans=0.0 2024-09-19 05:14:47,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=597940.0, ans=0.125 2024-09-19 05:14:57,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=597940.0, ans=0.125 2024-09-19 05:15:06,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=597980.0, ans=0.125 2024-09-19 05:15:19,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=598020.0, ans=0.125 2024-09-19 05:15:27,230 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.60 vs. limit=22.5 2024-09-19 05:15:48,458 INFO [train.py:1198] (0/2) Epoch 34, batch 200, loss[loss=0.2557, ctc_loss=0.1329, cr_loss=0.3876, attn_decoder_loss=0.2608, over 27662.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1183, cr_loss=0.3609, attn_decoder_loss=0.2418, over 3659701.12 frames. ], batch size: 125, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:16:12,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. 
limit=6.0 2024-09-19 05:16:15,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=598140.0, ans=0.125 2024-09-19 05:16:21,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=598180.0, ans=0.2 2024-09-19 05:16:54,092 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.433e+01 8.957e+01 9.594e+01 1.517e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-19 05:16:55,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=598260.0, ans=0.125 2024-09-19 05:17:00,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=598260.0, ans=0.125 2024-09-19 05:17:06,366 INFO [train.py:1198] (0/2) Epoch 34, batch 250, loss[loss=0.259, ctc_loss=0.1396, cr_loss=0.4061, attn_decoder_loss=0.2632, over 29221.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1183, cr_loss=0.3614, attn_decoder_loss=0.2417, over 4140720.44 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:17:25,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598340.0, ans=0.1 2024-09-19 05:17:26,439 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. 
limit=6.0 2024-09-19 05:17:27,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=598340.0, ans=0.2 2024-09-19 05:17:29,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=598340.0, ans=0.025 2024-09-19 05:17:29,905 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0 2024-09-19 05:17:42,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=12.0 2024-09-19 05:17:43,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=598380.0, ans=0.125 2024-09-19 05:18:01,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=598420.0, ans=0.035 2024-09-19 05:18:10,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=598460.0, ans=0.2 2024-09-19 05:18:12,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=598460.0, ans=0.0 2024-09-19 05:18:22,541 INFO [train.py:1198] (0/2) Epoch 34, batch 300, loss[loss=0.2423, ctc_loss=0.1214, cr_loss=0.3659, attn_decoder_loss=0.2476, over 29542.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1173, cr_loss=0.3589, attn_decoder_loss=0.2409, over 4510132.35 frames. 
], batch size: 92, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:18:30,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598500.0, ans=0.1 2024-09-19 05:18:32,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2024-09-19 05:18:51,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=598580.0, ans=0.125 2024-09-19 05:19:12,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=598620.0, ans=0.0 2024-09-19 05:19:21,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=598660.0, ans=0.0 2024-09-19 05:19:24,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=598660.0, ans=0.5 2024-09-19 05:19:25,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2024-09-19 05:19:25,588 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=12.0 2024-09-19 05:19:26,025 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.376e+01 8.844e+01 9.262e+01 3.831e+02, threshold=1.769e+02, percent-clipped=1.0 2024-09-19 05:19:40,603 INFO [train.py:1198] (0/2) Epoch 34, batch 350, loss[loss=0.2169, ctc_loss=0.09869, cr_loss=0.3146, attn_decoder_loss=0.223, over 29331.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1177, cr_loss=0.3593, attn_decoder_loss=0.2414, over 4796099.73 frames. 
], batch size: 71, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:20:03,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=598740.0, ans=0.125 2024-09-19 05:20:10,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=598780.0, ans=0.0 2024-09-19 05:20:12,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0 2024-09-19 05:20:16,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=598780.0, ans=0.2 2024-09-19 05:20:39,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598860.0, ans=0.1 2024-09-19 05:20:52,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.31 vs. limit=22.5 2024-09-19 05:20:58,172 INFO [train.py:1198] (0/2) Epoch 34, batch 400, loss[loss=0.2441, ctc_loss=0.1211, cr_loss=0.3759, attn_decoder_loss=0.2494, over 29705.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1176, cr_loss=0.3594, attn_decoder_loss=0.2412, over 5025811.08 frames. ], batch size: 82, lr: 3.27e-03, grad_scale: 16.0 2024-09-19 05:21:21,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2024-09-19 05:22:02,034 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.485e+01 9.014e+01 9.585e+01 2.227e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-19 05:22:14,037 INFO [train.py:1198] (0/2) Epoch 34, batch 450, loss[loss=0.2457, ctc_loss=0.1203, cr_loss=0.3712, attn_decoder_loss=0.2513, over 29690.00 frames. 
], tot_loss[loss=0.2361, ctc_loss=0.1177, cr_loss=0.359, attn_decoder_loss=0.2413, over 5186095.76 frames. ], batch size: 83, lr: 3.27e-03, grad_scale: 16.0 2024-09-19 05:22:19,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=599100.0, ans=0.0 2024-09-19 05:22:49,317 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:22:59,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=22.5 2024-09-19 05:23:09,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=599220.0, ans=0.125 2024-09-19 05:23:30,249 INFO [train.py:1198] (0/2) Epoch 34, batch 500, loss[loss=0.2478, ctc_loss=0.1224, cr_loss=0.3485, attn_decoder_loss=0.254, over 29454.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1175, cr_loss=0.3586, attn_decoder_loss=0.2405, over 5328456.03 frames. ], batch size: 94, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:24:04,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=599380.0, ans=0.0 2024-09-19 05:24:06,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=599380.0, ans=0.0 2024-09-19 05:24:21,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=599420.0, ans=0.1 2024-09-19 05:24:21,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.14 vs. 
limit=15.0 2024-09-19 05:24:27,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=599420.0, ans=0.125 2024-09-19 05:24:27,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=599420.0, ans=0.0 2024-09-19 05:24:37,604 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.516e+01 9.011e+01 9.672e+01 1.492e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 05:24:43,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=599460.0, ans=0.2 2024-09-19 05:24:44,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=599460.0, ans=0.035 2024-09-19 05:24:49,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=599500.0, ans=0.125 2024-09-19 05:24:50,590 INFO [train.py:1198] (0/2) Epoch 34, batch 550, loss[loss=0.2545, ctc_loss=0.1316, cr_loss=0.3991, attn_decoder_loss=0.2593, over 28919.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1172, cr_loss=0.3583, attn_decoder_loss=0.2405, over 5420942.70 frames. ], batch size: 104, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:25:27,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=599580.0, ans=0.1 2024-09-19 05:25:29,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.58 vs. limit=10.0 2024-09-19 05:26:04,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.58 vs. 
limit=15.0 2024-09-19 05:26:06,404 INFO [train.py:1198] (0/2) Epoch 34, batch 600, loss[loss=0.2499, ctc_loss=0.1243, cr_loss=0.3615, attn_decoder_loss=0.2559, over 29312.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1173, cr_loss=0.3578, attn_decoder_loss=0.2406, over 5510135.95 frames. ], batch size: 100, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:26:14,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=599700.0, ans=0.125 2024-09-19 05:26:22,132 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=12.0 2024-09-19 05:26:29,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=599740.0, ans=0.0 2024-09-19 05:26:44,862 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.49 vs. limit=15.0 2024-09-19 05:26:48,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=599780.0, ans=0.0 2024-09-19 05:26:50,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.38 vs. limit=15.0 2024-09-19 05:27:11,322 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.437e+01 8.830e+01 9.420e+01 2.114e+02, threshold=1.766e+02, percent-clipped=1.0 2024-09-19 05:27:21,946 INFO [train.py:1198] (0/2) Epoch 34, batch 650, loss[loss=0.2493, ctc_loss=0.138, cr_loss=0.4094, attn_decoder_loss=0.2526, over 29765.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1166, cr_loss=0.3567, attn_decoder_loss=0.2401, over 5586936.75 frames. 
], batch size: 81, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:28:14,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=600020.0, ans=0.0 2024-09-19 05:28:25,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=600060.0, ans=0.07 2024-09-19 05:28:32,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2024-09-19 05:28:33,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=600060.0, ans=0.07 2024-09-19 05:28:40,698 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0 2024-09-19 05:28:42,765 INFO [train.py:1198] (0/2) Epoch 34, batch 700, loss[loss=0.2255, ctc_loss=0.1094, cr_loss=0.3321, attn_decoder_loss=0.231, over 29538.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1166, cr_loss=0.3569, attn_decoder_loss=0.2406, over 5635995.72 frames. ], batch size: 76, lr: 3.27e-03, grad_scale: 8.0 2024-09-19 05:28:50,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600100.0, ans=0.1 2024-09-19 05:29:24,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=600180.0, ans=0.09899494936611666 2024-09-19 05:29:48,399 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.364e+01 8.809e+01 9.436e+01 2.463e+02, threshold=1.762e+02, percent-clipped=1.0 2024-09-19 05:29:59,016 INFO [train.py:1198] (0/2) Epoch 34, batch 750, loss[loss=0.2494, ctc_loss=0.1234, cr_loss=0.3739, attn_decoder_loss=0.2551, over 29715.00 frames. 
], tot_loss[loss=0.2354, ctc_loss=0.1168, cr_loss=0.3574, attn_decoder_loss=0.2406, over 5675338.89 frames. ], batch size: 82, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:30:09,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=600300.0, ans=22.5 2024-09-19 05:30:34,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=600380.0, ans=0.0 2024-09-19 05:30:54,633 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=15.0 2024-09-19 05:31:04,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=600460.0, ans=0.0 2024-09-19 05:31:07,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=600460.0, ans=0.1 2024-09-19 05:31:14,725 INFO [train.py:1198] (0/2) Epoch 34, batch 800, loss[loss=0.2286, ctc_loss=0.1183, cr_loss=0.3534, attn_decoder_loss=0.233, over 29609.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1172, cr_loss=0.3578, attn_decoder_loss=0.2409, over 5705851.11 frames. ], batch size: 73, lr: 3.26e-03, grad_scale: 16.0 2024-09-19 05:31:21,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=600500.0, ans=0.025 2024-09-19 05:31:43,354 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.53 vs. 
limit=12.0 2024-09-19 05:32:08,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=600620.0, ans=0.025 2024-09-19 05:32:09,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2024-09-19 05:32:12,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=600620.0, ans=0.2 2024-09-19 05:32:16,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=600660.0, ans=0.125 2024-09-19 05:32:21,712 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.053e+01 8.379e+01 9.063e+01 9.651e+01 1.795e+02, threshold=1.813e+02, percent-clipped=1.0 2024-09-19 05:32:32,186 INFO [train.py:1198] (0/2) Epoch 34, batch 850, loss[loss=0.2408, ctc_loss=0.1243, cr_loss=0.3671, attn_decoder_loss=0.2456, over 29721.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1172, cr_loss=0.3577, attn_decoder_loss=0.2407, over 5735404.90 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 16.0 2024-09-19 05:32:37,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600700.0, ans=0.1 2024-09-19 05:32:44,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.58 vs. 
limit=6.0 2024-09-19 05:32:46,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=600700.0, ans=0.0 2024-09-19 05:32:57,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=600740.0, ans=0.2 2024-09-19 05:32:58,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=600740.0, ans=0.125 2024-09-19 05:33:00,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600740.0, ans=0.1 2024-09-19 05:33:00,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=600740.0, ans=0.125 2024-09-19 05:33:09,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=600780.0, ans=0.125 2024-09-19 05:33:12,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=600780.0, ans=0.125 2024-09-19 05:33:24,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=600820.0, ans=0.125 2024-09-19 05:33:43,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.88 vs. limit=15.0 2024-09-19 05:33:47,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=600860.0, ans=0.125 2024-09-19 05:33:50,503 INFO [train.py:1198] (0/2) Epoch 34, batch 900, loss[loss=0.2107, ctc_loss=0.1012, cr_loss=0.3282, attn_decoder_loss=0.2156, over 29618.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1176, cr_loss=0.3584, attn_decoder_loss=0.2409, over 5739140.91 frames. 
], batch size: 73, lr: 3.26e-03, grad_scale: 16.0 2024-09-19 05:33:58,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=600900.0, ans=0.025 2024-09-19 05:34:42,787 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.48 vs. limit=10.0 2024-09-19 05:34:43,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=601020.0, ans=0.1 2024-09-19 05:34:56,770 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.479e+01 9.154e+01 9.598e+01 2.436e+02, threshold=1.831e+02, percent-clipped=2.0 2024-09-19 05:35:05,825 INFO [train.py:1198] (0/2) Epoch 34, batch 950, loss[loss=0.2267, ctc_loss=0.1076, cr_loss=0.3298, attn_decoder_loss=0.2326, over 29523.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1174, cr_loss=0.3582, attn_decoder_loss=0.2409, over 5741451.58 frames. 
], batch size: 74, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:35:32,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=601140.0, ans=0.0 2024-09-19 05:35:41,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=601180.0, ans=0.125 2024-09-19 05:35:51,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=601180.0, ans=0.05 2024-09-19 05:36:00,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=601220.0, ans=0.125 2024-09-19 05:36:01,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=601220.0, ans=0.0 2024-09-19 05:36:15,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=601260.0, ans=0.125 2024-09-19 05:36:16,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=601260.0, ans=0.125 2024-09-19 05:36:26,135 INFO [train.py:1198] (0/2) Epoch 34, batch 1000, loss[loss=0.2301, ctc_loss=0.1137, cr_loss=0.3711, attn_decoder_loss=0.2348, over 29504.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1178, cr_loss=0.3591, attn_decoder_loss=0.2414, over 5736331.22 frames. 
], batch size: 77, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:36:26,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=601300.0, ans=0.0 2024-09-19 05:36:36,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=601300.0, ans=0.125 2024-09-19 05:36:55,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=601380.0, ans=0.0 2024-09-19 05:37:13,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0 2024-09-19 05:37:16,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=601420.0, ans=0.1 2024-09-19 05:37:23,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=601420.0, ans=0.125 2024-09-19 05:37:28,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=601460.0, ans=0.125 2024-09-19 05:37:32,496 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 8.521e+01 9.169e+01 9.649e+01 1.531e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-19 05:37:40,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.47 vs. limit=15.0 2024-09-19 05:37:41,629 INFO [train.py:1198] (0/2) Epoch 34, batch 1050, loss[loss=0.2458, ctc_loss=0.1238, cr_loss=0.3741, attn_decoder_loss=0.251, over 29681.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1178, cr_loss=0.359, attn_decoder_loss=0.2409, over 5744512.33 frames. 
], batch size: 85, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:38:15,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. limit=5.0 2024-09-19 05:38:29,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=601620.0, ans=0.125 2024-09-19 05:38:46,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=601660.0, ans=0.025 2024-09-19 05:38:50,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=601660.0, ans=0.125 2024-09-19 05:38:58,091 INFO [train.py:1198] (0/2) Epoch 34, batch 1100, loss[loss=0.2319, ctc_loss=0.117, cr_loss=0.3647, attn_decoder_loss=0.2366, over 29428.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1175, cr_loss=0.3585, attn_decoder_loss=0.2407, over 5757943.74 frames. ], batch size: 78, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:39:24,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=601740.0, ans=0.1 2024-09-19 05:39:35,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=601780.0, ans=0.125 2024-09-19 05:39:53,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=601820.0, ans=0.125 2024-09-19 05:39:59,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=601860.0, ans=0.1 2024-09-19 05:40:06,790 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.458e+01 9.005e+01 9.723e+01 2.492e+02, threshold=1.801e+02, percent-clipped=1.0 2024-09-19 05:40:16,947 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=601900.0, ans=0.2 2024-09-19 05:40:18,194 INFO [train.py:1198] (0/2) Epoch 34, batch 1150, loss[loss=0.2339, ctc_loss=0.1154, cr_loss=0.3698, attn_decoder_loss=0.2388, over 29421.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1178, cr_loss=0.359, attn_decoder_loss=0.2409, over 5754408.21 frames. ], batch size: 78, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:40:38,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=601940.0, ans=0.125 2024-09-19 05:40:47,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=601980.0, ans=0.125 2024-09-19 05:41:04,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=602020.0, ans=0.125 2024-09-19 05:41:31,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=602060.0, ans=0.125 2024-09-19 05:41:33,784 INFO [train.py:1198] (0/2) Epoch 34, batch 1200, loss[loss=0.2503, ctc_loss=0.1234, cr_loss=0.3678, attn_decoder_loss=0.2563, over 29675.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1181, cr_loss=0.3593, attn_decoder_loss=0.2415, over 5748110.38 frames. ], batch size: 85, lr: 3.26e-03, grad_scale: 16.0 2024-09-19 05:41:34,646 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.28 vs. 
limit=15.0 2024-09-19 05:41:52,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=602140.0, ans=0.125 2024-09-19 05:41:55,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=602140.0, ans=0.125 2024-09-19 05:41:55,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=602140.0, ans=0.0 2024-09-19 05:42:04,516 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:42:04,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=602180.0, ans=0.125 2024-09-19 05:42:22,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=602220.0, ans=0.07 2024-09-19 05:42:24,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=602220.0, ans=0.125 2024-09-19 05:42:24,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=602220.0, ans=0.125 2024-09-19 05:42:42,361 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.575e+01 9.202e+01 9.867e+01 2.398e+02, threshold=1.840e+02, percent-clipped=1.0 2024-09-19 05:42:49,882 INFO [train.py:1198] (0/2) Epoch 34, batch 1250, loss[loss=0.2427, ctc_loss=0.1183, cr_loss=0.3593, attn_decoder_loss=0.2485, over 29531.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1188, cr_loss=0.3612, attn_decoder_loss=0.2425, over 5775313.63 frames. 
], batch size: 92, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:42:59,427 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:43:22,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=8.98 vs. limit=15.0 2024-09-19 05:43:25,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602380.0, ans=0.1 2024-09-19 05:43:30,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=602380.0, ans=0.0 2024-09-19 05:43:34,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-09-19 05:44:03,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.28 vs. limit=15.0 2024-09-19 05:44:10,635 INFO [train.py:1198] (0/2) Epoch 34, batch 1300, loss[loss=0.2421, ctc_loss=0.1156, cr_loss=0.3493, attn_decoder_loss=0.2484, over 28365.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1183, cr_loss=0.36, attn_decoder_loss=0.2417, over 5780445.01 frames. ], batch size: 111, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:44:32,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=602540.0, ans=0.1 2024-09-19 05:44:33,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=602540.0, ans=0.125 2024-09-19 05:45:10,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.23 vs. 
limit=8.0 2024-09-19 05:45:18,886 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.383e+01 8.885e+01 9.572e+01 2.098e+02, threshold=1.777e+02, percent-clipped=1.0 2024-09-19 05:45:20,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602660.0, ans=0.1 2024-09-19 05:45:25,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=602700.0, ans=0.125 2024-09-19 05:45:26,503 INFO [train.py:1198] (0/2) Epoch 34, batch 1350, loss[loss=0.2394, ctc_loss=0.1243, cr_loss=0.3943, attn_decoder_loss=0.2434, over 29748.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.118, cr_loss=0.36, attn_decoder_loss=0.2414, over 5796062.09 frames. ], batch size: 81, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:45:26,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=602700.0, ans=0.125 2024-09-19 05:45:31,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=602700.0, ans=0.125 2024-09-19 05:45:43,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=602740.0, ans=0.125 2024-09-19 05:45:48,382 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.30 vs. 
limit=12.0 2024-09-19 05:45:58,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=602780.0, ans=0.125 2024-09-19 05:46:01,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=602780.0, ans=0.125 2024-09-19 05:46:14,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=602820.0, ans=0.2 2024-09-19 05:46:17,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=602820.0, ans=0.125 2024-09-19 05:46:24,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.13 vs. limit=22.5 2024-09-19 05:46:36,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.51 vs. limit=15.0 2024-09-19 05:46:41,831 INFO [train.py:1198] (0/2) Epoch 34, batch 1400, loss[loss=0.2011, ctc_loss=0.08877, cr_loss=0.2917, attn_decoder_loss=0.2072, over 29612.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1177, cr_loss=0.3593, attn_decoder_loss=0.241, over 5808434.91 frames. 
], batch size: 69, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:47:15,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=602980.0, ans=0.0 2024-09-19 05:47:51,990 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.415e+01 9.038e+01 9.472e+01 1.467e+02, threshold=1.808e+02, percent-clipped=0.0 2024-09-19 05:47:55,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603060.0, ans=0.1 2024-09-19 05:47:59,666 INFO [train.py:1198] (0/2) Epoch 34, batch 1450, loss[loss=0.2511, ctc_loss=0.1285, cr_loss=0.3811, attn_decoder_loss=0.2562, over 29476.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1176, cr_loss=0.3588, attn_decoder_loss=0.2413, over 5805225.85 frames. ], batch size: 94, lr: 3.26e-03, grad_scale: 8.0 2024-09-19 05:48:06,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603100.0, ans=0.1 2024-09-19 05:48:09,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=603100.0, ans=0.1 2024-09-19 05:48:32,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=603180.0, ans=0.0 2024-09-19 05:48:32,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=603180.0, ans=0.2 2024-09-19 05:48:50,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=603220.0, ans=0.0 2024-09-19 05:48:50,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=603220.0, ans=0.0 2024-09-19 05:49:17,722 INFO [train.py:1198] (0/2) Epoch 34, batch 1500, loss[loss=0.2521, ctc_loss=0.1303, cr_loss=0.3866, 
attn_decoder_loss=0.257, over 29642.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1178, cr_loss=0.3596, attn_decoder_loss=0.2417, over 5805682.33 frames. ], batch size: 86, lr: 3.26e-03, grad_scale: 8.0
2024-09-19 05:49:23,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0
2024-09-19 05:49:31,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=603340.0, ans=0.125
2024-09-19 05:49:38,081 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:49:41,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=603340.0, ans=0.09899494936611666
2024-09-19 05:49:42,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=603340.0, ans=0.125
2024-09-19 05:50:04,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=603420.0, ans=0.125
2024-09-19 05:50:05,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=603420.0, ans=0.025
2024-09-19 05:50:13,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=603420.0, ans=0.2
2024-09-19 05:50:13,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.09 vs. limit=15.0
2024-09-19 05:50:26,554 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.633e+01 8.541e+01 9.102e+01 9.733e+01 3.230e+02, threshold=1.820e+02, percent-clipped=1.0
2024-09-19 05:50:28,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.02 vs. limit=15.0
2024-09-19 05:50:34,066 INFO [train.py:1198] (0/2) Epoch 34, batch 1550, loss[loss=0.2474, ctc_loss=0.1245, cr_loss=0.3674, attn_decoder_loss=0.2529, over 29508.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1179, cr_loss=0.3595, attn_decoder_loss=0.2415, over 5778841.64 frames. ], batch size: 90, lr: 3.26e-03, grad_scale: 8.0
2024-09-19 05:50:34,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=603500.0, ans=0.125
2024-09-19 05:50:51,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=603540.0, ans=0.125
2024-09-19 05:50:57,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=603540.0, ans=0.0
2024-09-19 05:51:03,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=603580.0, ans=0.0
2024-09-19 05:51:12,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=603580.0, ans=0.125
2024-09-19 05:51:33,964 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:51:37,444 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.70 vs. limit=15.0
2024-09-19 05:51:41,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=603660.0, ans=0.125
2024-09-19 05:51:53,942 INFO [train.py:1198] (0/2) Epoch 34, batch 1600, loss[loss=0.2353, ctc_loss=0.1059, cr_loss=0.3365, attn_decoder_loss=0.2422, over 29664.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1178, cr_loss=0.3592, attn_decoder_loss=0.2414, over 5762492.51 frames. ], batch size: 85, lr: 3.26e-03, grad_scale: 16.0
2024-09-19 05:52:24,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=603780.0, ans=0.125
2024-09-19 05:52:34,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=603780.0, ans=0.1
2024-09-19 05:52:39,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=603820.0, ans=0.0
2024-09-19 05:52:41,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.85 vs. limit=22.5
2024-09-19 05:53:01,949 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.541e+01 8.929e+01 9.524e+01 1.976e+02, threshold=1.786e+02, percent-clipped=1.0
2024-09-19 05:53:09,391 INFO [train.py:1198] (0/2) Epoch 34, batch 1650, loss[loss=0.2515, ctc_loss=0.1197, cr_loss=0.3566, attn_decoder_loss=0.2582, over 29712.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1176, cr_loss=0.3583, attn_decoder_loss=0.2413, over 5756436.28 frames. ], batch size: 89, lr: 3.26e-03, grad_scale: 16.0
2024-09-19 05:53:43,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=603980.0, ans=0.1
2024-09-19 05:54:11,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=604060.0, ans=0.1
2024-09-19 05:54:14,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=604060.0, ans=0.0
2024-09-19 05:54:17,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=604060.0, ans=0.0
2024-09-19 05:54:17,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=22.5
2024-09-19 05:54:25,724 INFO [train.py:1198] (0/2) Epoch 34, batch 1700, loss[loss=0.2033, ctc_loss=0.09864, cr_loss=0.3037, attn_decoder_loss=0.2081, over 29593.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1173, cr_loss=0.3577, attn_decoder_loss=0.2412, over 5779018.29 frames. ], batch size: 69, lr: 3.25e-03, grad_scale: 16.0
2024-09-19 05:54:41,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0
2024-09-19 05:55:27,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=604260.0, ans=0.2
2024-09-19 05:55:33,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=604260.0, ans=0.0
2024-09-19 05:55:35,946 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 8.515e+01 9.078e+01 9.556e+01 1.170e+02, threshold=1.816e+02, percent-clipped=0.0
2024-09-19 05:55:45,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.12 vs. limit=12.0
2024-09-19 05:55:45,694 INFO [train.py:1198] (0/2) Epoch 34, batch 1750, loss[loss=0.1996, ctc_loss=0.0908, cr_loss=0.3046, attn_decoder_loss=0.205, over 29329.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1172, cr_loss=0.3581, attn_decoder_loss=0.2411, over 5787194.69 frames. ], batch size: 67, lr: 3.25e-03, grad_scale: 16.0
2024-09-19 05:55:57,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5
2024-09-19 05:56:01,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=604340.0, ans=0.125
2024-09-19 05:56:10,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=604340.0, ans=0.1
2024-09-19 05:56:18,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0
2024-09-19 05:56:31,893 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.89 vs. limit=10.0
2024-09-19 05:56:46,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.66 vs. limit=15.0
2024-09-19 05:57:00,866 INFO [train.py:1198] (0/2) Epoch 34, batch 1800, loss[loss=0.2539, ctc_loss=0.1305, cr_loss=0.3957, attn_decoder_loss=0.2588, over 29697.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1176, cr_loss=0.3593, attn_decoder_loss=0.2415, over 5789075.60 frames. ], batch size: 83, lr: 3.25e-03, grad_scale: 16.0
2024-09-19 05:57:04,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.56 vs. limit=22.5
2024-09-19 05:57:10,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=604500.0, ans=0.125
2024-09-19 05:57:39,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.20 vs. limit=15.0
2024-09-19 05:57:42,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=604580.0, ans=0.2
2024-09-19 05:57:51,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=604620.0, ans=0.125
2024-09-19 05:57:51,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=604620.0, ans=0.125
2024-09-19 05:58:09,166 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.453e+01 8.879e+01 9.546e+01 1.316e+02, threshold=1.776e+02, percent-clipped=0.0
2024-09-19 05:58:09,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.52 vs. limit=22.5
2024-09-19 05:58:16,918 INFO [train.py:1198] (0/2) Epoch 34, batch 1850, loss[loss=0.2554, ctc_loss=0.1361, cr_loss=0.3976, attn_decoder_loss=0.2598, over 29630.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1171, cr_loss=0.3586, attn_decoder_loss=0.2411, over 5795779.71 frames. ], batch size: 86, lr: 3.25e-03, grad_scale: 16.0
2024-09-19 05:58:23,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=604700.0, ans=0.125
2024-09-19 05:58:25,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.58 vs. limit=22.5
2024-09-19 05:58:55,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=604780.0, ans=0.125
2024-09-19 05:59:01,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=604820.0, ans=0.5
2024-09-19 05:59:36,980 INFO [train.py:1198] (0/2) Epoch 34, batch 1900, loss[loss=0.2465, ctc_loss=0.1252, cr_loss=0.3884, attn_decoder_loss=0.2514, over 29706.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1176, cr_loss=0.3598, attn_decoder_loss=0.2418, over 5803217.40 frames. ], batch size: 89, lr: 3.25e-03, grad_scale: 16.0
2024-09-19 05:59:42,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
2024-09-19 05:59:46,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=604900.0, ans=0.125
2024-09-19 06:00:10,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=604980.0, ans=0.0
2024-09-19 06:00:15,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=604980.0, ans=0.125
2024-09-19 06:00:18,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=604980.0, ans=0.5
2024-09-19 06:00:37,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=605060.0, ans=0.125
2024-09-19 06:00:39,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=605060.0, ans=0.0
2024-09-19 06:00:39,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605060.0, ans=0.1
2024-09-19 06:00:45,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=605060.0, ans=0.0
2024-09-19 06:00:45,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=605060.0, ans=0.1
2024-09-19 06:00:46,819 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.799e+01 9.191e+01 9.672e+01 1.531e+02, threshold=1.838e+02, percent-clipped=0.0
2024-09-19 06:00:47,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=605060.0, ans=0.1
2024-09-19 06:00:52,902 INFO [train.py:1198] (0/2) Epoch 34, batch 1950, loss[loss=0.2361, ctc_loss=0.1257, cr_loss=0.3843, attn_decoder_loss=0.2398, over 29456.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1187, cr_loss=0.3629, attn_decoder_loss=0.2428, over 5818374.05 frames. ], batch size: 78, lr: 3.25e-03, grad_scale: 8.0
2024-09-19 06:00:54,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=605100.0, ans=0.1
2024-09-19 06:00:59,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.62 vs. limit=10.0
2024-09-19 06:01:02,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=605100.0, ans=0.0
2024-09-19 06:01:08,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=605140.0, ans=0.025
2024-09-19 06:01:32,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=605180.0, ans=0.1
2024-09-19 06:01:35,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=605180.0, ans=0.125
2024-09-19 06:01:46,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=605220.0, ans=10.0
2024-09-19 06:01:49,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.92 vs. limit=12.0
2024-09-19 06:01:51,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=605260.0, ans=0.125
2024-09-19 06:01:56,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=605260.0, ans=0.125
2024-09-19 06:02:08,569 INFO [train.py:1198] (0/2) Epoch 34, batch 2000, loss[loss=0.206, ctc_loss=0.09164, cr_loss=0.3147, attn_decoder_loss=0.2117, over 29352.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1184, cr_loss=0.3615, attn_decoder_loss=0.2425, over 5797752.63 frames. ], batch size: 67, lr: 3.25e-03, grad_scale: 16.0
2024-09-19 06:02:10,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=605300.0, ans=0.125
2024-09-19 06:02:16,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=605300.0, ans=0.125
2024-09-19 06:03:05,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605420.0, ans=0.1
2024-09-19 06:03:13,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=605460.0, ans=0.025
2024-09-19 06:03:22,872 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.535e+01 9.098e+01 9.559e+01 2.375e+02, threshold=1.820e+02, percent-clipped=1.0
2024-09-19 06:03:26,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=605460.0, ans=0.04949747468305833
2024-09-19 06:03:28,940 INFO [train.py:1198] (0/2) Epoch 34, batch 2050, loss[loss=0.2142, ctc_loss=0.0997, cr_loss=0.3214, attn_decoder_loss=0.2198, over 29459.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1177, cr_loss=0.36, attn_decoder_loss=0.2415, over 5789702.64 frames. ], batch size: 70, lr: 3.25e-03, grad_scale: 16.0
2024-09-19 06:03:38,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=605500.0, ans=0.125
2024-09-19 06:03:39,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=605500.0, ans=0.2
2024-09-19 06:03:47,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=605540.0, ans=0.1
2024-09-19 06:03:59,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=605580.0, ans=0.5
2024-09-19 06:04:00,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=605580.0, ans=0.025
2024-09-19 06:04:03,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=605580.0, ans=0.0
2024-09-19 06:04:19,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=605620.0, ans=0.125
2024-09-19 06:04:44,676 INFO [train.py:1198] (0/2) Epoch 34, batch 2100, loss[loss=0.2351, ctc_loss=0.1153, cr_loss=0.3527, attn_decoder_loss=0.2406, over 29758.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.117, cr_loss=0.3586, attn_decoder_loss=0.2411, over 5801947.88 frames. ], batch size: 81, lr: 3.25e-03, grad_scale: 16.0
2024-09-19 06:04:57,014 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 06:05:06,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=605740.0, ans=0.125
2024-09-19 06:05:27,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=605780.0, ans=15.0
2024-09-19 06:05:30,543 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.09 vs. limit=22.5
2024-09-19 06:05:31,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=605820.0, ans=0.0
2024-09-19 06:05:40,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=605820.0, ans=0.125
2024-09-19 06:05:52,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=605860.0, ans=0.0
2024-09-19 06:05:53,727 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.782e+01 8.705e+01 9.050e+01 9.610e+01 1.138e+02, threshold=1.810e+02, percent-clipped=0.0
2024-09-19 06:05:54,738 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0
2024-09-19 06:05:59,953 INFO [train.py:1198] (0/2) Epoch 34, batch 2150, loss[loss=0.2218, ctc_loss=0.1099, cr_loss=0.3613, attn_decoder_loss=0.2262, over 29456.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1165, cr_loss=0.3574, attn_decoder_loss=0.2406, over 5816554.39 frames. ], batch size: 78, lr: 3.25e-03, grad_scale: 16.0
2024-09-19 06:06:07,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0
2024-09-19 06:06:27,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=605940.0, ans=0.0
2024-09-19 06:06:30,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=605980.0, ans=0.04949747468305833
2024-09-19 06:06:33,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=605980.0, ans=0.0
2024-09-19 06:07:17,779 INFO [train.py:1198] (0/2) Epoch 34, batch 2200, loss[loss=0.246, ctc_loss=0.1226, cr_loss=0.384, attn_decoder_loss=0.2512, over 29612.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1169, cr_loss=0.3578, attn_decoder_loss=0.2409, over 5813414.13 frames. ], batch size: 86, lr: 3.25e-03, grad_scale: 16.0
2024-09-19 06:07:26,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=606100.0, ans=0.1
2024-09-19 06:07:39,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=606140.0, ans=0.015
2024-09-19 06:07:43,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=606140.0, ans=0.0
2024-09-19 06:07:46,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=606140.0, ans=0.125
2024-09-19 06:08:31,029 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.748e+01 9.126e+01 9.549e+01 2.332e+02, threshold=1.825e+02, percent-clipped=1.0
2024-09-19 06:08:35,716 INFO [train.py:1198] (0/2) Epoch 34, batch 2250, loss[loss=0.2384, ctc_loss=0.1161, cr_loss=0.3536, attn_decoder_loss=0.2441, over 29721.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.117, cr_loss=0.3579, attn_decoder_loss=0.2409, over 5813334.29 frames. ], batch size: 82, lr: 3.25e-03, grad_scale: 8.0
2024-09-19 06:08:40,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=606300.0, ans=0.125
2024-09-19 06:08:42,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=606300.0, ans=0.1
2024-09-19 06:08:52,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=606340.0, ans=0.05
2024-09-19 06:08:54,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=606340.0, ans=0.125
2024-09-19 06:09:01,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=606340.0, ans=0.0
2024-09-19 06:09:03,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=606340.0, ans=0.0
2024-09-19 06:09:14,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=606380.0, ans=0.0
2024-09-19 06:09:20,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606420.0, ans=0.1
2024-09-19 06:09:20,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=606420.0, ans=0.0
2024-09-19 06:09:51,737 INFO [train.py:1198] (0/2) Epoch 34, batch 2300, loss[loss=0.215, ctc_loss=0.1026, cr_loss=0.324, attn_decoder_loss=0.2202, over 29752.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1163, cr_loss=0.3563, attn_decoder_loss=0.24, over 5801343.14 frames. ], batch size: 72, lr: 3.25e-03, grad_scale: 8.0
2024-09-19 06:09:54,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=606500.0, ans=0.2
2024-09-19 06:10:13,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=606540.0, ans=0.0
2024-09-19 06:10:22,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=606580.0, ans=0.125
2024-09-19 06:10:31,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=606580.0, ans=0.0
2024-09-19 06:10:31,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=606580.0, ans=0.1
2024-09-19 06:11:00,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=606660.0, ans=0.125
2024-09-19 06:11:02,991 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.476e+01 8.516e+01 9.072e+01 9.584e+01 2.753e+02, threshold=1.814e+02, percent-clipped=1.0
2024-09-19 06:11:07,527 INFO [train.py:1198] (0/2) Epoch 34, batch 2350, loss[loss=0.2409, ctc_loss=0.123, cr_loss=0.3651, attn_decoder_loss=0.2459, over 29665.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1164, cr_loss=0.3566, attn_decoder_loss=0.2402, over 5806138.52 frames. ], batch size: 83, lr: 3.25e-03, grad_scale: 8.0
2024-09-19 06:11:17,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=606700.0, ans=0.125
2024-09-19 06:11:24,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=606700.0, ans=0.125
2024-09-19 06:11:50,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.51 vs. limit=22.5
2024-09-19 06:12:14,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=606860.0, ans=0.125
2024-09-19 06:12:17,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.74 vs. limit=10.0
2024-09-19 06:12:23,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=606860.0, ans=0.025
2024-09-19 06:12:27,789 INFO [train.py:1198] (0/2) Epoch 34, batch 2400, loss[loss=0.2356, ctc_loss=0.1216, cr_loss=0.3724, attn_decoder_loss=0.24, over 29545.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1173, cr_loss=0.3586, attn_decoder_loss=0.2407, over 5809527.64 frames. ], batch size: 76, lr: 3.25e-03, grad_scale: 16.0
2024-09-19 06:12:48,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.86 vs. limit=15.0
2024-09-19 06:12:49,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=606940.0, ans=0.125
2024-09-19 06:13:12,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=22.5
2024-09-19 06:13:40,174 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.501e+01 8.985e+01 9.485e+01 2.487e+02, threshold=1.797e+02, percent-clipped=1.0
2024-09-19 06:13:42,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=607100.0, ans=0.125
2024-09-19 06:13:43,277 INFO [train.py:1198] (0/2) Epoch 34, batch 2450, loss[loss=0.2482, ctc_loss=0.1265, cr_loss=0.388, attn_decoder_loss=0.2531, over 29669.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.118, cr_loss=0.3595, attn_decoder_loss=0.2416, over 5785645.05 frames. ], batch size: 82, lr: 3.25e-03, grad_scale: 8.0
2024-09-19 06:13:49,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=607100.0, ans=0.1
2024-09-19 06:14:03,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=607140.0, ans=0.025
2024-09-19 06:14:11,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0
2024-09-19 06:14:20,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=607180.0, ans=0.0
2024-09-19 06:14:32,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=607220.0, ans=0.125
2024-09-19 06:14:42,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607260.0, ans=0.1
2024-09-19 06:14:54,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=607260.0, ans=0.1
2024-09-19 06:14:59,118 INFO [train.py:1198] (0/2) Epoch 34, batch 2500, loss[loss=0.2451, ctc_loss=0.1289, cr_loss=0.381, attn_decoder_loss=0.2496, over 29611.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1179, cr_loss=0.3595, attn_decoder_loss=0.2413, over 5796074.54 frames. ], batch size: 86, lr: 3.25e-03, grad_scale: 8.0
2024-09-19 06:15:10,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=607300.0, ans=0.125
2024-09-19 06:15:20,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=607340.0, ans=0.2
2024-09-19 06:15:46,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607380.0, ans=0.1
2024-09-19 06:15:54,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=607420.0, ans=0.125
2024-09-19 06:16:16,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0
2024-09-19 06:16:16,476 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.449e+01 8.900e+01 9.375e+01 2.079e+02, threshold=1.780e+02, percent-clipped=0.0
2024-09-19 06:16:19,647 INFO [train.py:1198] (0/2) Epoch 34, batch 2550, loss[loss=0.2072, ctc_loss=0.1032, cr_loss=0.3173, attn_decoder_loss=0.2117, over 29355.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1181, cr_loss=0.36, attn_decoder_loss=0.2416, over 5799320.07 frames. ], batch size: 67, lr: 3.25e-03, grad_scale: 8.0
2024-09-19 06:16:30,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=607500.0, ans=0.125
2024-09-19 06:16:53,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=22.5
2024-09-19 06:16:54,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=607580.0, ans=0.0
2024-09-19 06:16:57,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=607580.0, ans=0.125
2024-09-19 06:17:05,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.37 vs. limit=22.5
2024-09-19 06:17:08,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607620.0, ans=0.1
2024-09-19 06:17:35,380 INFO [train.py:1198] (0/2) Epoch 34, batch 2600, loss[loss=0.2272, ctc_loss=0.1051, cr_loss=0.331, attn_decoder_loss=0.2334, over 29479.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1178, cr_loss=0.3594, attn_decoder_loss=0.2417, over 5794736.32 frames. ], batch size: 78, lr: 3.25e-03, grad_scale: 8.0
2024-09-19 06:17:38,983 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.52 vs. limit=22.5
2024-09-19 06:17:43,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=22.5
2024-09-19 06:18:14,575 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 06:18:43,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=607860.0, ans=0.125
2024-09-19 06:18:47,317 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.717e+01 9.275e+01 9.753e+01 1.560e+02, threshold=1.855e+02, percent-clipped=1.0
2024-09-19 06:18:49,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=607900.0, ans=0.0
2024-09-19 06:18:50,258 INFO [train.py:1198] (0/2) Epoch 34, batch 2650, loss[loss=0.2517, ctc_loss=0.1265, cr_loss=0.3824, attn_decoder_loss=0.2572, over 29195.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1178, cr_loss=0.3594, attn_decoder_loss=0.242, over 5801353.06 frames. ], batch size: 100, lr: 3.24e-03, grad_scale: 8.0
2024-09-19 06:18:57,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.76 vs. limit=22.5
2024-09-19 06:18:58,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=607900.0, ans=0.0
2024-09-19 06:19:31,349 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-152000.pt
2024-09-19 06:20:04,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=608060.0, ans=0.0
2024-09-19 06:20:10,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=608060.0, ans=0.0
2024-09-19 06:20:17,682 INFO [train.py:1198] (0/2) Epoch 34, batch 2700, loss[loss=0.2365, ctc_loss=0.1128, cr_loss=0.3434, attn_decoder_loss=0.2426, over 29515.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1183, cr_loss=0.3606, attn_decoder_loss=0.2422, over 5795420.10 frames. ], batch size: 87, lr: 3.24e-03, grad_scale: 8.0
2024-09-19 06:20:25,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608100.0, ans=0.1
2024-09-19 06:20:32,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=608140.0, ans=0.2
2024-09-19 06:21:03,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=608220.0, ans=0.125
2024-09-19 06:21:15,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=22.5
2024-09-19 06:21:30,487 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.546e+01 9.039e+01 9.900e+01 1.946e+02, threshold=1.808e+02, percent-clipped=1.0
2024-09-19 06:21:33,579 INFO [train.py:1198] (0/2) Epoch 34, batch 2750, loss[loss=0.2294, ctc_loss=0.1112, cr_loss=0.3455, attn_decoder_loss=0.2349, over 29526.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1172, cr_loss=0.3582, attn_decoder_loss=0.241, over 5794099.90 frames. ], batch size: 75, lr: 3.24e-03, grad_scale: 8.0
2024-09-19 06:22:02,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=608380.0, ans=0.125
2024-09-19 06:22:10,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=608380.0, ans=0.1
2024-09-19 06:22:11,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.62 vs. limit=22.5
2024-09-19 06:22:17,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=608420.0, ans=0.0
2024-09-19 06:22:33,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.89 vs. limit=15.0
2024-09-19 06:22:36,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=608460.0, ans=0.125
2024-09-19 06:22:37,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608460.0, ans=0.1
2024-09-19 06:22:38,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0
2024-09-19 06:22:42,504 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0
2024-09-19 06:22:42,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0
2024-09-19 06:22:47,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.17 vs. limit=15.0
2024-09-19 06:22:51,655 INFO [train.py:1198] (0/2) Epoch 34, batch 2800, loss[loss=0.2591, ctc_loss=0.1473, cr_loss=0.3894, attn_decoder_loss=0.2629, over 20100.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1177, cr_loss=0.3592, attn_decoder_loss=0.2413, over 5774374.08 frames. ], batch size: 209, lr: 3.24e-03, grad_scale: 16.0
2024-09-19 06:23:02,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=608500.0, ans=0.0
2024-09-19 06:23:02,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=608500.0, ans=0.0
2024-09-19 06:23:07,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=608540.0, ans=0.0
2024-09-19 06:23:07,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=608540.0, ans=0.125
2024-09-19 06:23:19,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608540.0, ans=0.1
2024-09-19 06:23:37,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=608620.0, ans=0.1
2024-09-19 06:23:44,095 INFO [scaling.py:1024] (0/2) Whitening:
name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2024-09-19 06:23:48,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=608620.0, ans=0.125 2024-09-19 06:24:07,377 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.476e+01 9.039e+01 9.642e+01 3.312e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-19 06:24:08,977 INFO [train.py:1198] (0/2) Epoch 34, batch 2850, loss[loss=0.2205, ctc_loss=0.1014, cr_loss=0.321, attn_decoder_loss=0.2266, over 29509.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1181, cr_loss=0.3601, attn_decoder_loss=0.2418, over 5761306.59 frames. ], batch size: 77, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:24:09,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=608700.0, ans=0.125 2024-09-19 06:24:49,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=608780.0, ans=0.125 2024-09-19 06:24:59,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=608820.0, ans=0.0 2024-09-19 06:25:05,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=608820.0, ans=0.125 2024-09-19 06:25:17,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=608860.0, ans=0.0 2024-09-19 06:25:20,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=608860.0, ans=0.0 2024-09-19 06:25:25,015 INFO [train.py:1198] (0/2) Epoch 34, batch 2900, loss[loss=0.2308, ctc_loss=0.1137, cr_loss=0.3512, attn_decoder_loss=0.236, over 29422.00 frames. 
], tot_loss[loss=0.2377, ctc_loss=0.1187, cr_loss=0.3616, attn_decoder_loss=0.2429, over 5786338.83 frames. ], batch size: 79, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:25:37,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=608900.0, ans=0.0 2024-09-19 06:25:40,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.34 vs. limit=15.0 2024-09-19 06:25:43,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=608940.0, ans=0.125 2024-09-19 06:25:57,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=608980.0, ans=0.1 2024-09-19 06:25:58,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=608980.0, ans=0.125 2024-09-19 06:26:27,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=609060.0, ans=0.0 2024-09-19 06:26:41,354 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.332e+01 8.811e+01 9.212e+01 1.381e+02, threshold=1.762e+02, percent-clipped=0.0 2024-09-19 06:26:42,923 INFO [train.py:1198] (0/2) Epoch 34, batch 2950, loss[loss=0.2267, ctc_loss=0.1206, cr_loss=0.3745, attn_decoder_loss=0.2301, over 29522.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1176, cr_loss=0.3594, attn_decoder_loss=0.2414, over 5781504.44 frames. 
], batch size: 75, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:26:46,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=609100.0, ans=0.1 2024-09-19 06:26:47,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=609100.0, ans=0.125 2024-09-19 06:26:52,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=609100.0, ans=0.0 2024-09-19 06:27:16,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=15.0 2024-09-19 06:27:20,838 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. limit=6.0 2024-09-19 06:27:21,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=609180.0, ans=0.0 2024-09-19 06:27:31,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.14 vs. limit=6.0 2024-09-19 06:27:37,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.02 vs. limit=15.0 2024-09-19 06:27:43,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=609220.0, ans=0.125 2024-09-19 06:28:00,851 INFO [train.py:1198] (0/2) Epoch 34, batch 3000, loss[loss=0.2349, ctc_loss=0.1149, cr_loss=0.3591, attn_decoder_loss=0.2403, over 29755.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1175, cr_loss=0.3591, attn_decoder_loss=0.2411, over 5781958.39 frames. 
], batch size: 81, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:28:00,852 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 06:28:19,438 INFO [train.py:1230] (0/2) Epoch 34, validation: loss=0.2118, ctc_loss=0.03645, cr_loss=6.088e-15, attn_decoder_loss=0.2313, over 944034.00 frames. 2024-09-19 06:28:19,438 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 06:28:25,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=609300.0, ans=0.0 2024-09-19 06:28:39,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=609340.0, ans=0.0 2024-09-19 06:28:39,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=609340.0, ans=0.0 2024-09-19 06:28:47,262 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:28:48,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=609380.0, ans=0.1 2024-09-19 06:28:56,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.45 vs. limit=15.0 2024-09-19 06:29:33,591 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.609e+01 9.134e+01 9.597e+01 3.076e+02, threshold=1.827e+02, percent-clipped=2.0 2024-09-19 06:29:35,190 INFO [train.py:1198] (0/2) Epoch 34, batch 3050, loss[loss=0.2252, ctc_loss=0.1083, cr_loss=0.3428, attn_decoder_loss=0.2305, over 29534.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1183, cr_loss=0.3608, attn_decoder_loss=0.2418, over 5775736.45 frames. 
], batch size: 76, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:29:43,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=609500.0, ans=0.5 2024-09-19 06:29:59,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=609540.0, ans=0.125 2024-09-19 06:30:05,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=609580.0, ans=0.0 2024-09-19 06:30:07,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=609580.0, ans=0.125 2024-09-19 06:30:23,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=609620.0, ans=0.125 2024-09-19 06:30:45,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=609660.0, ans=0.0 2024-09-19 06:30:45,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=609660.0, ans=0.0 2024-09-19 06:30:53,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=609700.0, ans=0.125 2024-09-19 06:30:54,917 INFO [train.py:1198] (0/2) Epoch 34, batch 3100, loss[loss=0.2504, ctc_loss=0.1249, cr_loss=0.3771, attn_decoder_loss=0.256, over 29277.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1178, cr_loss=0.3599, attn_decoder_loss=0.2415, over 5774949.34 frames. 
], batch size: 100, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:30:55,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=609700.0, ans=0.0 2024-09-19 06:30:56,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=609700.0, ans=0.125 2024-09-19 06:30:56,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=609700.0, ans=0.125 2024-09-19 06:31:40,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=609820.0, ans=0.125 2024-09-19 06:32:09,148 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 8.517e+01 8.968e+01 9.546e+01 1.931e+02, threshold=1.794e+02, percent-clipped=1.0 2024-09-19 06:32:10,672 INFO [train.py:1198] (0/2) Epoch 34, batch 3150, loss[loss=0.2445, ctc_loss=0.1183, cr_loss=0.3632, attn_decoder_loss=0.2504, over 28906.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1177, cr_loss=0.3594, attn_decoder_loss=0.2414, over 5781962.20 frames. 
], batch size: 104, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:32:15,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=609900.0, ans=0.0 2024-09-19 06:32:15,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=609900.0, ans=0.125 2024-09-19 06:32:15,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=609900.0, ans=0.2 2024-09-19 06:32:26,207 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:32:29,683 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.38 vs. limit=15.0 2024-09-19 06:32:45,756 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:32:47,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=609980.0, ans=0.0 2024-09-19 06:32:54,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=610020.0, ans=0.0 2024-09-19 06:33:00,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=610020.0, ans=0.0 2024-09-19 06:33:22,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-19 06:33:25,799 INFO [train.py:1198] (0/2) Epoch 34, batch 3200, loss[loss=0.2334, ctc_loss=0.1149, cr_loss=0.3571, attn_decoder_loss=0.2386, over 29404.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1174, cr_loss=0.3585, attn_decoder_loss=0.2411, over 5792859.14 frames. 
], batch size: 79, lr: 3.24e-03, grad_scale: 16.0 2024-09-19 06:34:24,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=610220.0, ans=0.125 2024-09-19 06:34:26,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610220.0, ans=0.1 2024-09-19 06:34:27,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=610260.0, ans=0.125 2024-09-19 06:34:42,401 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.499e+01 9.052e+01 9.605e+01 1.287e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-19 06:34:43,869 INFO [train.py:1198] (0/2) Epoch 34, batch 3250, loss[loss=0.2494, ctc_loss=0.1257, cr_loss=0.3653, attn_decoder_loss=0.255, over 29710.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1178, cr_loss=0.3597, attn_decoder_loss=0.2417, over 5799477.54 frames. ], batch size: 84, lr: 3.24e-03, grad_scale: 16.0 2024-09-19 06:35:03,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=610340.0, ans=0.125 2024-09-19 06:35:13,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.36 vs. limit=15.0 2024-09-19 06:35:16,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=610380.0, ans=0.0 2024-09-19 06:35:24,896 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.74 vs. 
limit=15.0 2024-09-19 06:35:42,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=610420.0, ans=0.125 2024-09-19 06:35:43,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0 2024-09-19 06:35:57,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=610460.0, ans=0.125 2024-09-19 06:36:01,921 INFO [train.py:1198] (0/2) Epoch 34, batch 3300, loss[loss=0.2489, ctc_loss=0.1238, cr_loss=0.3492, attn_decoder_loss=0.2551, over 28385.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1169, cr_loss=0.3575, attn_decoder_loss=0.2405, over 5797321.67 frames. ], batch size: 111, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:36:27,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.78 vs. 
limit=15.0 2024-09-19 06:36:34,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=610580.0, ans=10.0 2024-09-19 06:36:44,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=610580.0, ans=0.125 2024-09-19 06:36:55,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=610620.0, ans=0.125 2024-09-19 06:36:59,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=610620.0, ans=0.125 2024-09-19 06:37:01,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=610660.0, ans=0.125 2024-09-19 06:37:17,120 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 8.592e+01 9.077e+01 9.630e+01 2.771e+02, threshold=1.815e+02, percent-clipped=1.0 2024-09-19 06:37:17,142 INFO [train.py:1198] (0/2) Epoch 34, batch 3350, loss[loss=0.2474, ctc_loss=0.1272, cr_loss=0.3654, attn_decoder_loss=0.2526, over 28898.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1177, cr_loss=0.3583, attn_decoder_loss=0.2414, over 5775260.55 frames. ], batch size: 104, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:37:25,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=610700.0, ans=0.125 2024-09-19 06:37:31,172 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:37:58,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.93 vs. 
limit=22.5 2024-09-19 06:38:06,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=610820.0, ans=0.2 2024-09-19 06:38:18,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=610860.0, ans=0.1 2024-09-19 06:38:21,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=610860.0, ans=0.1 2024-09-19 06:38:29,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=610860.0, ans=0.125 2024-09-19 06:38:35,445 INFO [train.py:1198] (0/2) Epoch 34, batch 3400, loss[loss=0.2121, ctc_loss=0.1067, cr_loss=0.3267, attn_decoder_loss=0.2166, over 29370.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1179, cr_loss=0.359, attn_decoder_loss=0.2414, over 5767374.81 frames. ], batch size: 67, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:38:41,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=610900.0, ans=0.2 2024-09-19 06:38:42,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.28 vs. 
limit=22.5 2024-09-19 06:38:51,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=610940.0, ans=0.125 2024-09-19 06:38:54,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=610940.0, ans=0.0 2024-09-19 06:38:59,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=610940.0, ans=0.025 2024-09-19 06:39:53,416 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.710e+01 9.261e+01 9.751e+01 2.657e+02, threshold=1.852e+02, percent-clipped=1.0 2024-09-19 06:39:53,439 INFO [train.py:1198] (0/2) Epoch 34, batch 3450, loss[loss=0.2451, ctc_loss=0.1237, cr_loss=0.3472, attn_decoder_loss=0.2509, over 28225.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1181, cr_loss=0.3595, attn_decoder_loss=0.2417, over 5775752.82 frames. ], batch size: 111, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:39:55,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.15 vs. 
limit=6.0 2024-09-19 06:40:03,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=611100.0, ans=0.125 2024-09-19 06:40:32,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=611180.0, ans=0.1 2024-09-19 06:40:36,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=611180.0, ans=0.95 2024-09-19 06:40:41,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=611220.0, ans=0.0 2024-09-19 06:40:51,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=611220.0, ans=0.1 2024-09-19 06:41:09,771 INFO [train.py:1198] (0/2) Epoch 34, batch 3500, loss[loss=0.2043, ctc_loss=0.09207, cr_loss=0.2917, attn_decoder_loss=0.2103, over 29343.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1177, cr_loss=0.3583, attn_decoder_loss=0.2411, over 5776833.37 frames. 
], batch size: 71, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:41:10,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=611300.0, ans=0.125 2024-09-19 06:41:50,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=611380.0, ans=0.125 2024-09-19 06:42:14,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=611460.0, ans=0.125 2024-09-19 06:42:20,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=611460.0, ans=0.0 2024-09-19 06:42:25,532 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.57 vs. limit=22.5 2024-09-19 06:42:26,135 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.638e+01 9.255e+01 9.995e+01 3.984e+02, threshold=1.851e+02, percent-clipped=2.0 2024-09-19 06:42:26,157 INFO [train.py:1198] (0/2) Epoch 34, batch 3550, loss[loss=0.246, ctc_loss=0.1184, cr_loss=0.3623, attn_decoder_loss=0.2521, over 29714.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1173, cr_loss=0.3579, attn_decoder_loss=0.2411, over 5781753.99 frames. 
], batch size: 89, lr: 3.24e-03, grad_scale: 8.0 2024-09-19 06:42:33,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=611500.0, ans=0.1 2024-09-19 06:42:45,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=611540.0, ans=0.0 2024-09-19 06:42:58,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=611580.0, ans=0.125 2024-09-19 06:43:14,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2024-09-19 06:43:42,115 INFO [train.py:1198] (0/2) Epoch 34, batch 3600, loss[loss=0.2252, ctc_loss=0.1095, cr_loss=0.3478, attn_decoder_loss=0.2304, over 29487.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.117, cr_loss=0.3576, attn_decoder_loss=0.2412, over 5791255.91 frames. ], batch size: 77, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 06:44:04,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=611740.0, ans=0.035 2024-09-19 06:44:33,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=611820.0, ans=0.125 2024-09-19 06:44:33,698 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. 
limit=15.0 2024-09-19 06:44:41,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=611860.0, ans=0.1 2024-09-19 06:44:56,827 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.640e+01 9.081e+01 9.603e+01 2.325e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-19 06:44:56,853 INFO [train.py:1198] (0/2) Epoch 34, batch 3650, loss[loss=0.2478, ctc_loss=0.1263, cr_loss=0.376, attn_decoder_loss=0.2529, over 29529.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1167, cr_loss=0.3574, attn_decoder_loss=0.2406, over 5793945.81 frames. ], batch size: 90, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 06:45:12,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=12.0 2024-09-19 06:45:27,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=611980.0, ans=0.0 2024-09-19 06:45:29,474 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-09-19 06:45:58,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=612060.0, ans=0.125 2024-09-19 06:46:11,758 INFO [train.py:1198] (0/2) Epoch 34, batch 3700, loss[loss=0.2469, ctc_loss=0.121, cr_loss=0.3542, attn_decoder_loss=0.253, over 29713.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1171, cr_loss=0.3579, attn_decoder_loss=0.2409, over 5804572.50 frames. 
], batch size: 84, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:46:24,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=612100.0, ans=0.0 2024-09-19 06:46:45,258 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2024-09-19 06:46:52,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=612180.0, ans=0.1 2024-09-19 06:47:02,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=612220.0, ans=0.0 2024-09-19 06:47:11,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=612260.0, ans=0.05 2024-09-19 06:47:25,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.56 vs. limit=15.0 2024-09-19 06:47:26,247 INFO [train.py:1198] (0/2) Epoch 34, batch 3750, loss[loss=0.2075, ctc_loss=0.09346, cr_loss=0.3142, attn_decoder_loss=0.2132, over 29385.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.117, cr_loss=0.3576, attn_decoder_loss=0.2408, over 5808777.09 frames. 
], batch size: 67, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:47:27,709 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.454e+01 8.933e+01 9.373e+01 1.602e+02, threshold=1.787e+02, percent-clipped=0.0 2024-09-19 06:47:30,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=612300.0, ans=0.2 2024-09-19 06:47:32,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=612300.0, ans=0.125 2024-09-19 06:47:44,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=612340.0, ans=0.125 2024-09-19 06:48:08,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=612380.0, ans=0.2 2024-09-19 06:48:17,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.80 vs. limit=15.0 2024-09-19 06:48:26,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=612460.0, ans=0.125 2024-09-19 06:48:42,284 INFO [train.py:1198] (0/2) Epoch 34, batch 3800, loss[loss=0.2511, ctc_loss=0.1253, cr_loss=0.3749, attn_decoder_loss=0.2567, over 29628.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1168, cr_loss=0.3576, attn_decoder_loss=0.2404, over 5799262.89 frames. 
], batch size: 86, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:48:42,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=612500.0, ans=0.2 2024-09-19 06:49:01,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=612540.0, ans=0.2 2024-09-19 06:49:05,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=612540.0, ans=0.1 2024-09-19 06:49:08,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.18 vs. limit=22.5 2024-09-19 06:49:39,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=612620.0, ans=0.125 2024-09-19 06:49:48,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=612660.0, ans=10.0 2024-09-19 06:49:58,141 INFO [train.py:1198] (0/2) Epoch 34, batch 3850, loss[loss=0.2471, ctc_loss=0.1164, cr_loss=0.3667, attn_decoder_loss=0.2535, over 29231.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1165, cr_loss=0.3568, attn_decoder_loss=0.2402, over 5812783.31 frames. 
], batch size: 100, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:49:59,607 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.333e+01 8.389e+01 8.951e+01 9.412e+01 1.497e+02, threshold=1.790e+02, percent-clipped=0.0 2024-09-19 06:50:16,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=612740.0, ans=0.0 2024-09-19 06:50:16,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=612740.0, ans=0.1 2024-09-19 06:50:19,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=612740.0, ans=0.0 2024-09-19 06:50:22,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=612740.0, ans=0.0 2024-09-19 06:50:46,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=612820.0, ans=0.0 2024-09-19 06:51:07,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=612860.0, ans=0.0 2024-09-19 06:51:12,887 INFO [train.py:1198] (0/2) Epoch 34, batch 3900, loss[loss=0.2449, ctc_loss=0.1235, cr_loss=0.3596, attn_decoder_loss=0.2504, over 29667.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1171, cr_loss=0.3579, attn_decoder_loss=0.241, over 5816973.05 frames. 
], batch size: 86, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:51:17,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=612900.0, ans=0.2 2024-09-19 06:51:36,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=612940.0, ans=0.02 2024-09-19 06:51:36,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=612940.0, ans=0.125 2024-09-19 06:51:47,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=612980.0, ans=0.125 2024-09-19 06:51:57,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=613020.0, ans=0.2 2024-09-19 06:52:10,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=613060.0, ans=0.2 2024-09-19 06:52:26,846 INFO [train.py:1198] (0/2) Epoch 34, batch 3950, loss[loss=0.2507, ctc_loss=0.131, cr_loss=0.3736, attn_decoder_loss=0.2557, over 29469.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1172, cr_loss=0.3583, attn_decoder_loss=0.2413, over 5836260.80 frames. ], batch size: 97, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:52:28,321 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.556e+01 9.009e+01 9.395e+01 1.816e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-19 06:52:29,161 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.55 vs. 
limit=6.0 2024-09-19 06:52:30,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=613100.0, ans=0.125 2024-09-19 06:52:32,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=613100.0, ans=0.125 2024-09-19 06:52:56,627 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:53:42,106 INFO [train.py:1198] (0/2) Epoch 34, batch 4000, loss[loss=0.2209, ctc_loss=0.1051, cr_loss=0.3291, attn_decoder_loss=0.2265, over 29524.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1173, cr_loss=0.3584, attn_decoder_loss=0.2414, over 5813036.17 frames. ], batch size: 74, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 06:53:52,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=613300.0, ans=0.025 2024-09-19 06:53:55,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=613340.0, ans=0.0 2024-09-19 06:54:22,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=613380.0, ans=0.0 2024-09-19 06:54:33,674 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.45 vs. 
limit=10.0 2024-09-19 06:54:47,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=613460.0, ans=0.125 2024-09-19 06:54:49,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=613460.0, ans=6.0 2024-09-19 06:54:55,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-09-19 06:54:57,764 INFO [train.py:1198] (0/2) Epoch 34, batch 4050, loss[loss=0.2581, ctc_loss=0.1477, cr_loss=0.3782, attn_decoder_loss=0.262, over 20349.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1175, cr_loss=0.3583, attn_decoder_loss=0.2412, over 5796039.38 frames. ], batch size: 209, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:55:00,713 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.471e+01 9.121e+01 9.639e+01 2.999e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-19 06:55:00,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=613500.0, ans=0.125 2024-09-19 06:55:14,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=613540.0, ans=0.125 2024-09-19 06:55:30,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=613580.0, ans=0.1 2024-09-19 06:55:38,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=613580.0, ans=0.125 2024-09-19 06:56:00,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=613660.0, ans=0.125 2024-09-19 06:56:11,834 INFO [train.py:1198] (0/2) Epoch 34, 
batch 4100, loss[loss=0.2538, ctc_loss=0.1374, cr_loss=0.4112, attn_decoder_loss=0.2575, over 29490.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1178, cr_loss=0.359, attn_decoder_loss=0.2415, over 5791912.71 frames. ], batch size: 90, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:56:13,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=613700.0, ans=0.0 2024-09-19 06:57:00,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=613820.0, ans=0.125 2024-09-19 06:57:00,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=613820.0, ans=0.2 2024-09-19 06:57:15,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=613860.0, ans=0.0 2024-09-19 06:57:24,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=613900.0, ans=0.0 2024-09-19 06:57:25,788 INFO [train.py:1198] (0/2) Epoch 34, batch 4150, loss[loss=0.2313, ctc_loss=0.1147, cr_loss=0.3529, attn_decoder_loss=0.2364, over 29493.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1176, cr_loss=0.3587, attn_decoder_loss=0.2412, over 5796813.55 frames. 
], batch size: 77, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:57:28,807 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.811e+01 8.506e+01 8.901e+01 9.635e+01 1.346e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-19 06:57:33,722 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:57:38,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=613900.0, ans=0.125 2024-09-19 06:57:43,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=613940.0, ans=0.0 2024-09-19 06:57:55,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=613980.0, ans=0.0 2024-09-19 06:57:59,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=613980.0, ans=0.125 2024-09-19 06:58:24,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=614060.0, ans=0.2 2024-09-19 06:58:27,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=614060.0, ans=0.1 2024-09-19 06:58:40,903 INFO [train.py:1198] (0/2) Epoch 34, batch 4200, loss[loss=0.2402, ctc_loss=0.1176, cr_loss=0.3527, attn_decoder_loss=0.246, over 29478.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1176, cr_loss=0.3587, attn_decoder_loss=0.2415, over 5798309.20 frames. 
], batch size: 90, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:58:41,230 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:58:44,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=614100.0, ans=0.1 2024-09-19 06:58:53,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-09-19 06:59:03,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=614140.0, ans=0.125 2024-09-19 06:59:11,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=614180.0, ans=0.0 2024-09-19 06:59:13,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5 2024-09-19 06:59:55,719 INFO [train.py:1198] (0/2) Epoch 34, batch 4250, loss[loss=0.2207, ctc_loss=0.1032, cr_loss=0.3243, attn_decoder_loss=0.2265, over 29508.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1172, cr_loss=0.3584, attn_decoder_loss=0.2417, over 5804980.99 frames. 
], batch size: 74, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 06:59:58,622 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.496e+01 8.853e+01 9.381e+01 2.444e+02, threshold=1.771e+02, percent-clipped=1.0 2024-09-19 07:00:09,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=614340.0, ans=0.0 2024-09-19 07:00:49,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=614420.0, ans=0.125 2024-09-19 07:00:56,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=614460.0, ans=0.1 2024-09-19 07:01:02,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=614460.0, ans=0.125 2024-09-19 07:01:05,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=614460.0, ans=0.125 2024-09-19 07:01:09,461 INFO [train.py:1198] (0/2) Epoch 34, batch 4300, loss[loss=0.251, ctc_loss=0.1277, cr_loss=0.3973, attn_decoder_loss=0.2559, over 29533.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1172, cr_loss=0.3583, attn_decoder_loss=0.2418, over 5793731.67 frames. 
], batch size: 87, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 07:01:11,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=614500.0, ans=0.125 2024-09-19 07:01:18,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=614500.0, ans=0.125 2024-09-19 07:01:59,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=614620.0, ans=0.0 2024-09-19 07:02:22,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=614660.0, ans=0.2 2024-09-19 07:02:25,331 INFO [train.py:1198] (0/2) Epoch 34, batch 4350, loss[loss=0.2549, ctc_loss=0.1303, cr_loss=0.3852, attn_decoder_loss=0.2601, over 29510.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1199, cr_loss=0.3639, attn_decoder_loss=0.245, over 5796555.71 frames. ], batch size: 97, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 07:02:28,308 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.836e+01 9.274e+01 9.839e+01 5.976e+02, threshold=1.855e+02, percent-clipped=1.0 2024-09-19 07:02:42,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.78 vs. limit=15.0 2024-09-19 07:02:48,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.58 vs. limit=22.5 2024-09-19 07:02:57,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=614780.0, ans=0.125 2024-09-19 07:03:12,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. 
limit=15.0 2024-09-19 07:03:15,674 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2024-09-19 07:03:19,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614820.0, ans=0.1 2024-09-19 07:03:27,981 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.06 vs. limit=22.5 2024-09-19 07:03:38,642 INFO [train.py:1198] (0/2) Epoch 34, batch 4400, loss[loss=0.2302, ctc_loss=0.1032, cr_loss=0.3193, attn_decoder_loss=0.2373, over 27349.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1207, cr_loss=0.3654, attn_decoder_loss=0.2466, over 5767013.94 frames. ], batch size: 124, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 07:03:39,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.02 vs. limit=15.0 2024-09-19 07:03:41,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614900.0, ans=0.1 2024-09-19 07:03:43,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=614900.0, ans=0.0 2024-09-19 07:03:55,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=614940.0, ans=0.125 2024-09-19 07:04:10,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=614980.0, ans=0.0 2024-09-19 07:04:22,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.28 vs. 
limit=15.0 2024-09-19 07:04:31,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=615020.0, ans=0.2 2024-09-19 07:04:35,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=615020.0, ans=0.0 2024-09-19 07:04:35,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=615020.0, ans=0.0 2024-09-19 07:04:53,554 INFO [train.py:1198] (0/2) Epoch 34, batch 4450, loss[loss=0.2565, ctc_loss=0.144, cr_loss=0.3897, attn_decoder_loss=0.2603, over 20484.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1245, cr_loss=0.3712, attn_decoder_loss=0.2487, over 5580829.15 frames. ], batch size: 210, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 07:04:55,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=615100.0, ans=0.2 2024-09-19 07:04:56,493 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.888e+01 9.045e+01 9.501e+01 1.052e+02 3.870e+02, threshold=1.900e+02, percent-clipped=1.0 2024-09-19 07:05:05,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=615100.0, ans=0.125 2024-09-19 07:05:25,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=615180.0, ans=0.2 2024-09-19 07:05:49,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=615220.0, ans=0.125 2024-09-19 07:05:50,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615220.0, ans=0.1 2024-09-19 07:05:54,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=615260.0, ans=0.125 2024-09-19 07:06:04,336 
INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.03 vs. limit=15.0 2024-09-19 07:06:09,419 INFO [train.py:1198] (0/2) Epoch 34, batch 4500, loss[loss=0.2559, ctc_loss=0.1443, cr_loss=0.3699, attn_decoder_loss=0.2601, over 19663.00 frames. ], tot_loss[loss=0.246, ctc_loss=0.1278, cr_loss=0.3739, attn_decoder_loss=0.2508, over 5239247.98 frames. ], batch size: 210, lr: 3.23e-03, grad_scale: 8.0 2024-09-19 07:06:17,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=615300.0, ans=0.025 2024-09-19 07:06:34,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.43 vs. limit=22.5 2024-09-19 07:06:44,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=615380.0, ans=0.125 2024-09-19 07:06:47,043 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-34.pt 2024-09-19 07:07:33,727 INFO [train.py:1198] (0/2) Epoch 35, batch 0, loss[loss=0.2093, ctc_loss=0.09147, cr_loss=0.305, attn_decoder_loss=0.2156, over 29640.00 frames. ], tot_loss[loss=0.2093, ctc_loss=0.09147, cr_loss=0.305, attn_decoder_loss=0.2156, over 29640.00 frames. ], batch size: 73, lr: 3.18e-03, grad_scale: 16.0 2024-09-19 07:07:33,728 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 07:07:52,111 INFO [train.py:1230] (0/2) Epoch 35, validation: loss=0.2125, ctc_loss=0.03615, cr_loss=6.293e-15, attn_decoder_loss=0.232, over 944034.00 frames. 
2024-09-19 07:07:52,112 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 07:07:53,317 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.68 vs. limit=10.0 2024-09-19 07:08:23,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.75 vs. limit=10.0 2024-09-19 07:08:36,042 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 1.018e+02 1.116e+02 1.176e+02 2.643e+02, threshold=2.232e+02, percent-clipped=1.0 2024-09-19 07:08:59,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=615560.0, ans=0.05 2024-09-19 07:09:09,395 INFO [train.py:1198] (0/2) Epoch 35, batch 50, loss[loss=0.2132, ctc_loss=0.1022, cr_loss=0.326, attn_decoder_loss=0.2183, over 29396.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1212, cr_loss=0.3657, attn_decoder_loss=0.2431, over 1267901.08 frames. ], batch size: 70, lr: 3.18e-03, grad_scale: 8.0 2024-09-19 07:09:32,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=615640.0, ans=0.0 2024-09-19 07:09:38,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=615680.0, ans=0.05 2024-09-19 07:10:01,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=615720.0, ans=0.1 2024-09-19 07:10:15,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.41 vs. 
limit=22.5 2024-09-19 07:10:23,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=615800.0, ans=0.1 2024-09-19 07:10:25,303 INFO [train.py:1198] (0/2) Epoch 35, batch 100, loss[loss=0.2283, ctc_loss=0.1126, cr_loss=0.3378, attn_decoder_loss=0.2337, over 29516.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1212, cr_loss=0.3653, attn_decoder_loss=0.2441, over 2251406.66 frames. ], batch size: 76, lr: 3.18e-03, grad_scale: 8.0 2024-09-19 07:10:31,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=615800.0, ans=0.125 2024-09-19 07:10:31,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=615800.0, ans=0.2 2024-09-19 07:10:37,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=615800.0, ans=0.09899494936611666 2024-09-19 07:10:40,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=615840.0, ans=0.5 2024-09-19 07:10:40,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=615840.0, ans=0.125 2024-09-19 07:10:43,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=615840.0, ans=0.125 2024-09-19 07:10:52,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=615840.0, ans=15.0 2024-09-19 07:11:11,190 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.556e+01 9.012e+01 9.778e+01 2.155e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 07:11:23,532 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=615920.0, ans=0.0 2024-09-19 07:11:24,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.45 vs. limit=15.0 2024-09-19 07:11:24,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=615920.0, ans=0.125 2024-09-19 07:11:32,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=615960.0, ans=0.0 2024-09-19 07:11:42,866 INFO [train.py:1198] (0/2) Epoch 35, batch 150, loss[loss=0.2025, ctc_loss=0.08999, cr_loss=0.2981, attn_decoder_loss=0.2084, over 29427.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.118, cr_loss=0.3591, attn_decoder_loss=0.2414, over 3046816.81 frames. ], batch size: 70, lr: 3.18e-03, grad_scale: 8.0 2024-09-19 07:11:46,326 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:11:59,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=616040.0, ans=0.125 2024-09-19 07:11:59,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=616040.0, ans=0.125 2024-09-19 07:12:15,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=616080.0, ans=0.125 2024-09-19 07:12:24,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=616080.0, ans=0.125 2024-09-19 07:12:26,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=616080.0, ans=0.0 2024-09-19 07:12:30,794 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=616120.0, ans=0.125 2024-09-19 07:12:32,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=616120.0, ans=0.125 2024-09-19 07:12:35,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=616120.0, ans=0.0 2024-09-19 07:12:42,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=616120.0, ans=0.0 2024-09-19 07:12:56,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=616160.0, ans=0.0 2024-09-19 07:13:00,661 INFO [train.py:1198] (0/2) Epoch 35, batch 200, loss[loss=0.2537, ctc_loss=0.1307, cr_loss=0.3933, attn_decoder_loss=0.2586, over 27197.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.117, cr_loss=0.3579, attn_decoder_loss=0.2405, over 3657373.81 frames. ], batch size: 124, lr: 3.18e-03, grad_scale: 8.0 2024-09-19 07:13:18,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-09-19 07:13:21,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.43 vs. 
limit=15.0 2024-09-19 07:13:22,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=616240.0, ans=0.0 2024-09-19 07:13:32,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=616280.0, ans=0.025 2024-09-19 07:13:34,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=616280.0, ans=0.0 2024-09-19 07:13:44,289 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.322e+01 8.803e+01 9.325e+01 1.291e+02, threshold=1.761e+02, percent-clipped=0.0 2024-09-19 07:14:14,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=616400.0, ans=0.125 2024-09-19 07:14:15,873 INFO [train.py:1198] (0/2) Epoch 35, batch 250, loss[loss=0.2429, ctc_loss=0.1166, cr_loss=0.3418, attn_decoder_loss=0.2493, over 29349.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1165, cr_loss=0.3573, attn_decoder_loss=0.2403, over 4139224.24 frames. ], batch size: 100, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:14:41,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=616440.0, ans=0.2 2024-09-19 07:14:46,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=616480.0, ans=0.125 2024-09-19 07:15:03,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.72 vs. limit=22.5 2024-09-19 07:15:10,737 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.41 vs. 
limit=15.0 2024-09-19 07:15:26,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.52 vs. limit=22.5 2024-09-19 07:15:34,117 INFO [train.py:1198] (0/2) Epoch 35, batch 300, loss[loss=0.2517, ctc_loss=0.1249, cr_loss=0.3711, attn_decoder_loss=0.2576, over 29516.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1165, cr_loss=0.3573, attn_decoder_loss=0.2403, over 4506829.44 frames. ], batch size: 92, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:15:53,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616640.0, ans=0.1 2024-09-19 07:16:04,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=616640.0, ans=0.0 2024-09-19 07:16:13,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=616680.0, ans=0.125 2024-09-19 07:16:20,346 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 8.384e+01 8.991e+01 9.743e+01 6.934e+02, threshold=1.798e+02, percent-clipped=2.0 2024-09-19 07:16:34,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=616720.0, ans=0.125 2024-09-19 07:16:46,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616760.0, ans=0.1 2024-09-19 07:16:49,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=616760.0, ans=0.125 2024-09-19 07:16:52,561 INFO [train.py:1198] (0/2) Epoch 35, batch 350, loss[loss=0.2117, ctc_loss=0.09936, cr_loss=0.3125, attn_decoder_loss=0.2172, over 29315.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.117, cr_loss=0.3585, attn_decoder_loss=0.2411, over 4792408.53 frames. 
], batch size: 71, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:17:08,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0 2024-09-19 07:17:10,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616840.0, ans=0.1 2024-09-19 07:17:21,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=616880.0, ans=0.2 2024-09-19 07:17:33,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=616880.0, ans=0.0 2024-09-19 07:17:45,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=616920.0, ans=0.2 2024-09-19 07:17:45,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=616920.0, ans=0.0 2024-09-19 07:17:49,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=616920.0, ans=0.0 2024-09-19 07:18:04,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2024-09-19 07:18:07,763 INFO [train.py:1198] (0/2) Epoch 35, batch 400, loss[loss=0.2272, ctc_loss=0.1069, cr_loss=0.3197, attn_decoder_loss=0.2334, over 29699.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1165, cr_loss=0.3574, attn_decoder_loss=0.2407, over 5023166.50 frames. ], batch size: 82, lr: 3.17e-03, grad_scale: 16.0 2024-09-19 07:18:16,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.92 vs. 
limit=15.0 2024-09-19 07:18:35,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=617040.0, ans=0.125 2024-09-19 07:18:41,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=617080.0, ans=0.0 2024-09-19 07:18:41,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=617080.0, ans=0.125 2024-09-19 07:18:54,454 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 8.637e+01 9.137e+01 9.905e+01 1.373e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-19 07:19:11,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=617160.0, ans=0.125 2024-09-19 07:19:26,444 INFO [train.py:1198] (0/2) Epoch 35, batch 450, loss[loss=0.2406, ctc_loss=0.1197, cr_loss=0.3618, attn_decoder_loss=0.246, over 29690.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1173, cr_loss=0.3595, attn_decoder_loss=0.2412, over 5184796.81 frames. ], batch size: 83, lr: 3.17e-03, grad_scale: 16.0 2024-09-19 07:19:29,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=617200.0, ans=0.2 2024-09-19 07:19:42,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=617240.0, ans=0.125 2024-09-19 07:19:53,801 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.09 vs. 
limit=12.0 2024-09-19 07:20:05,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=617280.0, ans=0.125 2024-09-19 07:20:28,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=617360.0, ans=0.025 2024-09-19 07:20:44,374 INFO [train.py:1198] (0/2) Epoch 35, batch 500, loss[loss=0.257, ctc_loss=0.1344, cr_loss=0.3778, attn_decoder_loss=0.2623, over 29380.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1167, cr_loss=0.3577, attn_decoder_loss=0.2404, over 5328749.36 frames. ], batch size: 94, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:20:58,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=15.0 2024-09-19 07:21:01,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=617440.0, ans=0.05 2024-09-19 07:21:05,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=617440.0, ans=0.125 2024-09-19 07:21:05,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=617440.0, ans=0.2 2024-09-19 07:21:27,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2024-09-19 07:21:29,903 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.461e+01 8.901e+01 9.576e+01 2.460e+02, threshold=1.780e+02, percent-clipped=1.0 2024-09-19 07:21:38,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.67 vs. 
limit=15.0 2024-09-19 07:21:39,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.73 vs. limit=6.0 2024-09-19 07:22:00,512 INFO [train.py:1198] (0/2) Epoch 35, batch 550, loss[loss=0.2571, ctc_loss=0.1261, cr_loss=0.3856, attn_decoder_loss=0.2631, over 28808.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1165, cr_loss=0.3575, attn_decoder_loss=0.2405, over 5421776.48 frames. ], batch size: 104, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:22:02,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.41 vs. limit=12.0 2024-09-19 07:22:06,942 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:22:54,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=617720.0, ans=0.1 2024-09-19 07:23:08,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=617760.0, ans=0.2 2024-09-19 07:23:12,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.46 vs. limit=12.0 2024-09-19 07:23:18,917 INFO [train.py:1198] (0/2) Epoch 35, batch 600, loss[loss=0.258, ctc_loss=0.1308, cr_loss=0.3866, attn_decoder_loss=0.2635, over 29267.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1169, cr_loss=0.3583, attn_decoder_loss=0.2409, over 5509462.41 frames. 
], batch size: 100, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:23:23,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=617800.0, ans=0.0 2024-09-19 07:23:26,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=617800.0, ans=0.09899494936611666 2024-09-19 07:23:56,738 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-19 07:24:06,189 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.466e+01 8.823e+01 9.402e+01 3.791e+02, threshold=1.765e+02, percent-clipped=1.0 2024-09-19 07:24:13,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=617920.0, ans=0.125 2024-09-19 07:24:31,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=617960.0, ans=0.0 2024-09-19 07:24:35,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=618000.0, ans=0.0 2024-09-19 07:24:36,197 INFO [train.py:1198] (0/2) Epoch 35, batch 650, loss[loss=0.2336, ctc_loss=0.1068, cr_loss=0.35, attn_decoder_loss=0.24, over 29752.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1159, cr_loss=0.3562, attn_decoder_loss=0.24, over 5586305.07 frames. 
], batch size: 81, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:24:51,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=618040.0, ans=0.125 2024-09-19 07:24:54,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=618040.0, ans=0.125 2024-09-19 07:25:19,762 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2024-09-19 07:25:35,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=618160.0, ans=0.125 2024-09-19 07:25:52,026 INFO [train.py:1198] (0/2) Epoch 35, batch 700, loss[loss=0.2242, ctc_loss=0.1079, cr_loss=0.3439, attn_decoder_loss=0.2295, over 29539.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1162, cr_loss=0.3567, attn_decoder_loss=0.2404, over 5636100.80 frames. ], batch size: 76, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:26:24,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.01 vs. limit=22.5 2024-09-19 07:26:28,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=618280.0, ans=0.0 2024-09-19 07:26:30,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.57 vs. 
limit=22.5 2024-09-19 07:26:37,261 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.406e+01 8.899e+01 9.421e+01 1.331e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-19 07:26:42,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=618320.0, ans=0.09899494936611666 2024-09-19 07:26:43,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=618320.0, ans=0.125 2024-09-19 07:27:10,382 INFO [train.py:1198] (0/2) Epoch 35, batch 750, loss[loss=0.2346, ctc_loss=0.1132, cr_loss=0.3481, attn_decoder_loss=0.2404, over 29680.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.116, cr_loss=0.3563, attn_decoder_loss=0.2401, over 5676190.05 frames. ], batch size: 82, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:27:55,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=618480.0, ans=0.125 2024-09-19 07:28:00,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=618520.0, ans=0.2 2024-09-19 07:28:04,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-09-19 07:28:28,642 INFO [train.py:1198] (0/2) Epoch 35, batch 800, loss[loss=0.21, ctc_loss=0.09416, cr_loss=0.2933, attn_decoder_loss=0.2163, over 29600.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1161, cr_loss=0.3563, attn_decoder_loss=0.2401, over 5706988.89 frames. 
], batch size: 73, lr: 3.17e-03, grad_scale: 16.0 2024-09-19 07:28:48,594 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:28:52,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=618640.0, ans=0.125 2024-09-19 07:29:07,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=618680.0, ans=0.07 2024-09-19 07:29:15,165 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 8.586e+01 8.985e+01 9.600e+01 2.003e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-19 07:29:24,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.65 vs. limit=15.0 2024-09-19 07:29:36,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=618760.0, ans=0.125 2024-09-19 07:29:36,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=618760.0, ans=0.125 2024-09-19 07:29:42,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618800.0, ans=0.1 2024-09-19 07:29:43,482 INFO [train.py:1198] (0/2) Epoch 35, batch 850, loss[loss=0.2461, ctc_loss=0.1245, cr_loss=0.3656, attn_decoder_loss=0.2515, over 29705.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1158, cr_loss=0.3566, attn_decoder_loss=0.24, over 5736245.31 frames. 
], batch size: 89, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:29:48,317 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:29:55,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=618800.0, ans=0.0 2024-09-19 07:30:17,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=618880.0, ans=0.025 2024-09-19 07:30:17,742 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2024-09-19 07:30:22,208 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2024-09-19 07:30:36,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=618920.0, ans=0.125 2024-09-19 07:30:38,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=618920.0, ans=0.0 2024-09-19 07:30:40,490 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.95 vs. limit=15.0 2024-09-19 07:30:53,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=618960.0, ans=0.125 2024-09-19 07:31:00,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2024-09-19 07:31:01,541 INFO [train.py:1198] (0/2) Epoch 35, batch 900, loss[loss=0.2177, ctc_loss=0.1036, cr_loss=0.3257, attn_decoder_loss=0.2231, over 29660.00 frames. 
], tot_loss[loss=0.2349, ctc_loss=0.116, cr_loss=0.3568, attn_decoder_loss=0.2402, over 5741359.32 frames. ], batch size: 73, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:31:01,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=619000.0, ans=0.125 2024-09-19 07:31:13,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=619000.0, ans=0.125 2024-09-19 07:31:37,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.15 vs. limit=10.0 2024-09-19 07:31:49,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=619120.0, ans=0.0 2024-09-19 07:31:50,717 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.514e+01 9.190e+01 1.005e+02 2.448e+02, threshold=1.838e+02, percent-clipped=2.0 2024-09-19 07:31:54,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=619120.0, ans=0.1 2024-09-19 07:31:56,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=619120.0, ans=0.015 2024-09-19 07:32:01,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-09-19 07:32:04,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619160.0, ans=0.1 2024-09-19 07:32:19,735 INFO [train.py:1198] (0/2) Epoch 35, batch 950, loss[loss=0.2153, ctc_loss=0.09471, cr_loss=0.3189, attn_decoder_loss=0.2216, over 29507.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1159, cr_loss=0.3564, attn_decoder_loss=0.2404, over 5743112.27 frames. 
], batch size: 74, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:33:16,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=619320.0, ans=6.0 2024-09-19 07:33:35,506 INFO [train.py:1198] (0/2) Epoch 35, batch 1000, loss[loss=0.2237, ctc_loss=0.09942, cr_loss=0.3284, attn_decoder_loss=0.2303, over 29484.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1167, cr_loss=0.358, attn_decoder_loss=0.2413, over 5737116.11 frames. ], batch size: 77, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:33:40,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=619400.0, ans=0.0 2024-09-19 07:33:43,400 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:33:58,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=619440.0, ans=0.125 2024-09-19 07:34:04,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=619480.0, ans=0.0 2024-09-19 07:34:06,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=619480.0, ans=0.125 2024-09-19 07:34:22,839 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.400e+01 8.920e+01 9.804e+01 1.524e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-19 07:34:32,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=619520.0, ans=0.1 2024-09-19 07:34:42,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619560.0, ans=0.1 2024-09-19 07:34:53,686 INFO [train.py:1198] (0/2) Epoch 35, batch 1050, loss[loss=0.2457, ctc_loss=0.1294, cr_loss=0.3821, 
attn_decoder_loss=0.2501, over 29686.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1167, cr_loss=0.358, attn_decoder_loss=0.2408, over 5745451.70 frames. ], batch size: 85, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:34:56,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=22.5 2024-09-19 07:34:57,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=619600.0, ans=0.0 2024-09-19 07:34:58,498 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:35:00,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=22.5 2024-09-19 07:35:03,487 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-09-19 07:35:05,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.29 vs. limit=15.0 2024-09-19 07:35:41,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=619720.0, ans=0.2 2024-09-19 07:35:55,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=619760.0, ans=0.0 2024-09-19 07:36:06,622 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.10 vs. limit=15.0 2024-09-19 07:36:11,770 INFO [train.py:1198] (0/2) Epoch 35, batch 1100, loss[loss=0.2307, ctc_loss=0.1204, cr_loss=0.361, attn_decoder_loss=0.235, over 29429.00 frames. 
], tot_loss[loss=0.2354, ctc_loss=0.1167, cr_loss=0.358, attn_decoder_loss=0.2406, over 5756866.30 frames. ], batch size: 78, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:36:13,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=619800.0, ans=0.025 2024-09-19 07:36:32,649 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.35 vs. limit=15.0 2024-09-19 07:36:39,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=619840.0, ans=0.2 2024-09-19 07:36:45,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=619880.0, ans=0.125 2024-09-19 07:36:59,136 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 8.442e+01 8.888e+01 9.490e+01 5.357e+02, threshold=1.778e+02, percent-clipped=1.0 2024-09-19 07:36:59,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=619920.0, ans=0.125 2024-09-19 07:36:59,602 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:37:06,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=619920.0, ans=0.125 2024-09-19 07:37:14,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=619960.0, ans=0.125 2024-09-19 07:37:22,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=619960.0, ans=0.0 2024-09-19 07:37:28,427 INFO [train.py:1198] (0/2) Epoch 35, batch 1150, loss[loss=0.2226, ctc_loss=0.1049, cr_loss=0.3183, attn_decoder_loss=0.2285, 
over 29463.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1172, cr_loss=0.3588, attn_decoder_loss=0.2409, over 5756046.34 frames. ], batch size: 78, lr: 3.17e-03, grad_scale: 8.0 2024-09-19 07:37:53,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620040.0, ans=0.1 2024-09-19 07:38:12,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.20 vs. limit=15.0 2024-09-19 07:38:19,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=620120.0, ans=0.125 2024-09-19 07:38:38,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=620160.0, ans=0.0 2024-09-19 07:38:46,890 INFO [train.py:1198] (0/2) Epoch 35, batch 1200, loss[loss=0.2441, ctc_loss=0.122, cr_loss=0.3802, attn_decoder_loss=0.2492, over 29666.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1174, cr_loss=0.3592, attn_decoder_loss=0.2414, over 5747522.59 frames. ], batch size: 85, lr: 3.17e-03, grad_scale: 16.0 2024-09-19 07:39:06,340 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.36 vs. 
limit=15.0 2024-09-19 07:39:19,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=620280.0, ans=0.2 2024-09-19 07:39:24,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=620280.0, ans=0.125 2024-09-19 07:39:31,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=620280.0, ans=0.0 2024-09-19 07:39:33,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.05 vs. limit=10.0 2024-09-19 07:39:35,203 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.30 vs. limit=15.0 2024-09-19 07:39:35,879 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.532e+01 9.165e+01 9.750e+01 1.443e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-19 07:39:36,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=620320.0, ans=0.2 2024-09-19 07:39:42,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=620320.0, ans=0.125 2024-09-19 07:39:44,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.48 vs. 
limit=12.0 2024-09-19 07:39:55,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=620360.0, ans=0.125 2024-09-19 07:39:57,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=620360.0, ans=0.025 2024-09-19 07:39:58,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=620360.0, ans=0.2 2024-09-19 07:40:04,452 INFO [train.py:1198] (0/2) Epoch 35, batch 1250, loss[loss=0.2538, ctc_loss=0.1296, cr_loss=0.3802, attn_decoder_loss=0.2592, over 29546.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1176, cr_loss=0.3596, attn_decoder_loss=0.2419, over 5775146.74 frames. ], batch size: 92, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:40:09,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=620400.0, ans=0.125 2024-09-19 07:40:12,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=620400.0, ans=0.0 2024-09-19 07:40:20,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=620440.0, ans=0.125 2024-09-19 07:40:24,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620440.0, ans=0.1 2024-09-19 07:40:24,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=620440.0, ans=0.125 2024-09-19 07:40:29,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620440.0, ans=0.1 2024-09-19 07:40:39,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=620480.0, 
ans=0.2 2024-09-19 07:40:50,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=620520.0, ans=0.0 2024-09-19 07:40:50,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=620520.0, ans=0.125 2024-09-19 07:40:53,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=620520.0, ans=0.125 2024-09-19 07:41:10,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=620560.0, ans=0.125 2024-09-19 07:41:17,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=620560.0, ans=0.025 2024-09-19 07:41:20,154 INFO [train.py:1198] (0/2) Epoch 35, batch 1300, loss[loss=0.2448, ctc_loss=0.1133, cr_loss=0.3629, attn_decoder_loss=0.2514, over 28132.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1167, cr_loss=0.3578, attn_decoder_loss=0.241, over 5777774.59 frames. 
], batch size: 111, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:41:38,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=620640.0, ans=0.125 2024-09-19 07:41:49,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=620680.0, ans=0.2 2024-09-19 07:41:52,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=620680.0, ans=0.125 2024-09-19 07:41:57,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=620680.0, ans=0.0 2024-09-19 07:42:01,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=620680.0, ans=0.0 2024-09-19 07:42:08,927 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.419e+01 8.887e+01 9.525e+01 1.443e+02, threshold=1.777e+02, percent-clipped=0.0 2024-09-19 07:42:36,595 INFO [train.py:1198] (0/2) Epoch 35, batch 1350, loss[loss=0.225, ctc_loss=0.1003, cr_loss=0.3383, attn_decoder_loss=0.2314, over 29768.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.116, cr_loss=0.3567, attn_decoder_loss=0.2407, over 5794844.72 frames. ], batch size: 81, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:42:37,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.01 vs. limit=6.0 2024-09-19 07:42:41,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.34 vs. 
limit=12.0 2024-09-19 07:42:52,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=620840.0, ans=0.125 2024-09-19 07:42:55,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=620840.0, ans=0.0 2024-09-19 07:43:05,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.52 vs. limit=12.0 2024-09-19 07:43:17,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=620880.0, ans=0.125 2024-09-19 07:43:20,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=620880.0, ans=0.0 2024-09-19 07:43:21,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=620880.0, ans=0.0 2024-09-19 07:43:21,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=620880.0, ans=0.125 2024-09-19 07:43:56,461 INFO [train.py:1198] (0/2) Epoch 35, batch 1400, loss[loss=0.2146, ctc_loss=0.107, cr_loss=0.3578, attn_decoder_loss=0.2186, over 29594.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.116, cr_loss=0.3567, attn_decoder_loss=0.2404, over 5806157.07 frames. 
], batch size: 69, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:44:04,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=621000.0, ans=0.125 2024-09-19 07:44:08,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=621000.0, ans=10.0 2024-09-19 07:44:31,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=621080.0, ans=0.1 2024-09-19 07:44:44,725 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.443e+01 9.009e+01 9.628e+01 2.334e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 07:45:11,970 INFO [train.py:1198] (0/2) Epoch 35, batch 1450, loss[loss=0.2565, ctc_loss=0.1319, cr_loss=0.3868, attn_decoder_loss=0.2617, over 29446.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1163, cr_loss=0.3571, attn_decoder_loss=0.2408, over 5803154.90 frames. 
], batch size: 94, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:45:12,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=621200.0, ans=0.2 2024-09-19 07:45:39,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=621240.0, ans=0.0 2024-09-19 07:45:48,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621280.0, ans=0.1 2024-09-19 07:45:51,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=621280.0, ans=0.2 2024-09-19 07:45:56,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=621320.0, ans=0.125 2024-09-19 07:45:58,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2024-09-19 07:46:00,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=621320.0, ans=0.04949747468305833 2024-09-19 07:46:02,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=621320.0, ans=0.2 2024-09-19 07:46:09,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=621320.0, ans=0.0 2024-09-19 07:46:23,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=621360.0, ans=0.2 2024-09-19 07:46:27,761 INFO [train.py:1198] (0/2) Epoch 35, batch 1500, loss[loss=0.2392, ctc_loss=0.1172, cr_loss=0.3614, attn_decoder_loss=0.2447, over 29610.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1162, cr_loss=0.3568, attn_decoder_loss=0.241, over 5805104.22 frames. 
], batch size: 86, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:46:32,705 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:46:34,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=621400.0, ans=0.0 2024-09-19 07:46:55,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=621440.0, ans=0.0 2024-09-19 07:47:07,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=621480.0, ans=0.125 2024-09-19 07:47:16,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621520.0, ans=0.1 2024-09-19 07:47:20,907 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.458e+01 9.148e+01 9.758e+01 1.676e+02, threshold=1.830e+02, percent-clipped=1.0 2024-09-19 07:47:24,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.79 vs. limit=15.0 2024-09-19 07:47:39,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=621560.0, ans=0.125 2024-09-19 07:47:48,565 INFO [train.py:1198] (0/2) Epoch 35, batch 1550, loss[loss=0.2565, ctc_loss=0.1302, cr_loss=0.3954, attn_decoder_loss=0.2618, over 29530.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1165, cr_loss=0.3577, attn_decoder_loss=0.2412, over 5780795.35 frames. ], batch size: 90, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:47:59,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.43 vs. 
limit=22.5 2024-09-19 07:48:05,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=621640.0, ans=0.125 2024-09-19 07:48:14,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=621640.0, ans=0.1 2024-09-19 07:48:19,044 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:48:34,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=621720.0, ans=0.0 2024-09-19 07:48:43,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=621720.0, ans=0.0 2024-09-19 07:48:43,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=621720.0, ans=0.025 2024-09-19 07:48:50,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=621760.0, ans=0.2 2024-09-19 07:48:58,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=621760.0, ans=0.1 2024-09-19 07:49:04,113 INFO [train.py:1198] (0/2) Epoch 35, batch 1600, loss[loss=0.2411, ctc_loss=0.1161, cr_loss=0.362, attn_decoder_loss=0.247, over 29681.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1163, cr_loss=0.357, attn_decoder_loss=0.2409, over 5763323.91 frames. ], batch size: 85, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:49:18,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.21 vs. 
limit=10.0 2024-09-19 07:49:25,632 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:49:52,911 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.463e+01 9.203e+01 9.882e+01 2.471e+02, threshold=1.841e+02, percent-clipped=1.0 2024-09-19 07:50:00,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=621920.0, ans=0.0 2024-09-19 07:50:11,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=621960.0, ans=0.125 2024-09-19 07:50:11,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.87 vs. limit=15.0 2024-09-19 07:50:20,046 INFO [train.py:1198] (0/2) Epoch 35, batch 1650, loss[loss=0.2456, ctc_loss=0.1209, cr_loss=0.3626, attn_decoder_loss=0.2514, over 29671.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1163, cr_loss=0.357, attn_decoder_loss=0.2408, over 5756845.63 frames. ], batch size: 89, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:50:30,001 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.18 vs. 
limit=15.0 2024-09-19 07:50:52,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=622080.0, ans=0.125 2024-09-19 07:51:00,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=622080.0, ans=0.0 2024-09-19 07:51:21,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=622120.0, ans=0.125 2024-09-19 07:51:38,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=622200.0, ans=0.1 2024-09-19 07:51:39,322 INFO [train.py:1198] (0/2) Epoch 35, batch 1700, loss[loss=0.2086, ctc_loss=0.09704, cr_loss=0.3128, attn_decoder_loss=0.2141, over 29574.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1158, cr_loss=0.3567, attn_decoder_loss=0.2406, over 5780122.51 frames. ], batch size: 69, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:51:51,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=622200.0, ans=0.125 2024-09-19 07:52:04,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=622240.0, ans=0.2 2024-09-19 07:52:10,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=622280.0, ans=0.2 2024-09-19 07:52:11,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=622280.0, ans=0.0 2024-09-19 07:52:17,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=622280.0, ans=0.125 2024-09-19 07:52:27,947 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.456e+01 8.908e+01 9.428e+01 1.294e+02, threshold=1.782e+02, percent-clipped=0.0 2024-09-19 
07:52:31,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=622320.0, ans=0.1 2024-09-19 07:52:42,363 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0 2024-09-19 07:52:54,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=622400.0, ans=0.125 2024-09-19 07:52:55,735 INFO [train.py:1198] (0/2) Epoch 35, batch 1750, loss[loss=0.2075, ctc_loss=0.09858, cr_loss=0.3154, attn_decoder_loss=0.2126, over 29362.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.116, cr_loss=0.3571, attn_decoder_loss=0.2406, over 5786938.76 frames. ], batch size: 67, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 07:53:10,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.65 vs. limit=22.5 2024-09-19 07:53:11,737 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-09-19 07:53:15,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=622440.0, ans=0.025 2024-09-19 07:53:23,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=622440.0, ans=0.0 2024-09-19 07:53:25,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=622480.0, ans=0.125 2024-09-19 07:53:30,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.12 vs. 
limit=22.5 2024-09-19 07:53:36,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=622480.0, ans=0.0 2024-09-19 07:54:03,256 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.30 vs. limit=15.0 2024-09-19 07:54:04,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=622560.0, ans=0.125 2024-09-19 07:54:04,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=622560.0, ans=0.125 2024-09-19 07:54:11,191 INFO [train.py:1198] (0/2) Epoch 35, batch 1800, loss[loss=0.2484, ctc_loss=0.1231, cr_loss=0.3653, attn_decoder_loss=0.2542, over 29694.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1162, cr_loss=0.3573, attn_decoder_loss=0.2407, over 5789053.58 frames. ], batch size: 83, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:54:20,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=622600.0, ans=0.0 2024-09-19 07:54:33,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=622640.0, ans=0.125 2024-09-19 07:55:05,583 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.311e+01 8.892e+01 9.552e+01 1.638e+02, threshold=1.778e+02, percent-clipped=0.0 2024-09-19 07:55:14,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=622760.0, ans=0.0 2024-09-19 07:55:16,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=622760.0, ans=0.125 2024-09-19 07:55:26,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=622760.0, ans=0.0 
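The recurring `WARNING [optim.py:487] Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=...` lines report adaptive gradient clipping: in every such line the logged threshold is `Clipping_scale` times the logged median grad norm (e.g. 2.0 x 8.892e+01 ~ 1.778e+02 just above). The helper below is a reading of that behavior, not a copy of icefall's `optim.py`; class and method names are hypothetical.

```python
# Sketch of adaptive gradient clipping consistent with the WARNING lines above:
# the clip threshold is clipping_scale times the median of recent grad norms
# (an interpretation of the log format, not icefall's actual implementation).
from collections import deque
import statistics

class AdaptiveGradClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)   # recent grad-norm history
        self.clipped = 0
        self.total = 0

    def clip_factor(self, grad_norm: float) -> float:
        """Return the factor to multiply gradients by (<= 1.0)."""
        self.norms.append(grad_norm)
        self.total += 1
        if len(self.norms) < 8:             # not enough history yet: no clipping
            return 1.0
        threshold = self.scale * statistics.median(self.norms)
        if grad_norm > threshold:
            self.clipped += 1
            return threshold / grad_norm    # rescale down to the threshold
        return 1.0

    def percent_clipped(self) -> float:
        return 100.0 * self.clipped / max(1, self.total)
```

Tying the threshold to the running median (rather than a fixed constant) lets it track the natural drift of gradient magnitudes over training, so only genuine outlier batches get clipped; `percent-clipped` staying near 0-2% in this log suggests the threshold is doing exactly that.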
2024-09-19 07:55:30,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=622800.0, ans=0.125 2024-09-19 07:55:31,265 INFO [train.py:1198] (0/2) Epoch 35, batch 1850, loss[loss=0.2389, ctc_loss=0.1204, cr_loss=0.3687, attn_decoder_loss=0.2439, over 29614.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1163, cr_loss=0.3578, attn_decoder_loss=0.2406, over 5795303.17 frames. ], batch size: 86, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:55:33,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=622800.0, ans=0.125 2024-09-19 07:55:36,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=622800.0, ans=0.0 2024-09-19 07:55:36,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.53 vs. limit=12.0 2024-09-19 07:55:48,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=622840.0, ans=0.025 2024-09-19 07:56:19,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622920.0, ans=0.1 2024-09-19 07:56:33,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=622960.0, ans=0.0 2024-09-19 07:56:35,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=622960.0, ans=0.2 2024-09-19 07:56:42,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622960.0, ans=0.1 2024-09-19 07:56:46,598 INFO [train.py:1198] (0/2) Epoch 35, batch 1900, loss[loss=0.2439, ctc_loss=0.1196, cr_loss=0.3757, attn_decoder_loss=0.2493, over 29740.00 frames. 
], tot_loss[loss=0.2355, ctc_loss=0.1164, cr_loss=0.3574, attn_decoder_loss=0.2408, over 5802913.74 frames. ], batch size: 89, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:56:52,041 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.71 vs. limit=10.0 2024-09-19 07:57:13,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=22.5 2024-09-19 07:57:15,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=623080.0, ans=0.125 2024-09-19 07:57:24,840 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:57:35,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=623120.0, ans=0.015 2024-09-19 07:57:36,457 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.574e+01 9.165e+01 9.579e+01 2.044e+02, threshold=1.833e+02, percent-clipped=2.0 2024-09-19 07:57:42,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=623120.0, ans=0.125 2024-09-19 07:57:43,337 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.37 vs. limit=10.0 2024-09-19 07:58:01,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=623200.0, ans=0.125 2024-09-19 07:58:02,546 INFO [train.py:1198] (0/2) Epoch 35, batch 1950, loss[loss=0.2344, ctc_loss=0.1112, cr_loss=0.3402, attn_decoder_loss=0.2406, over 29471.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1176, cr_loss=0.36, attn_decoder_loss=0.2423, over 5817677.05 frames. 
], batch size: 78, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 07:58:21,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=623240.0, ans=0.0 2024-09-19 07:58:30,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=623240.0, ans=0.125 2024-09-19 07:58:34,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=623240.0, ans=0.1 2024-09-19 07:58:47,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=623280.0, ans=0.025 2024-09-19 07:58:52,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=623320.0, ans=0.0 2024-09-19 07:58:53,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.29 vs. limit=22.5 2024-09-19 07:58:55,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=623320.0, ans=0.2 2024-09-19 07:59:18,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=623360.0, ans=0.125 2024-09-19 07:59:22,127 INFO [train.py:1198] (0/2) Epoch 35, batch 2000, loss[loss=0.2085, ctc_loss=0.09548, cr_loss=0.3172, attn_decoder_loss=0.214, over 29370.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1177, cr_loss=0.3601, attn_decoder_loss=0.2425, over 5794212.54 frames. 
], batch size: 67, lr: 3.16e-03, grad_scale: 16.0 2024-09-19 08:00:10,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=623520.0, ans=0.125 2024-09-19 08:00:13,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.517e+01 9.042e+01 9.652e+01 2.863e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-19 08:00:15,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=623520.0, ans=0.125 2024-09-19 08:00:37,512 INFO [train.py:1198] (0/2) Epoch 35, batch 2050, loss[loss=0.2132, ctc_loss=0.09848, cr_loss=0.3221, attn_decoder_loss=0.2188, over 29433.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1177, cr_loss=0.3602, attn_decoder_loss=0.2419, over 5786264.82 frames. ], batch size: 70, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 08:00:48,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=623600.0, ans=0.0 2024-09-19 08:01:32,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=623720.0, ans=0.2 2024-09-19 08:01:35,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=623720.0, ans=0.025 2024-09-19 08:01:36,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0 2024-09-19 08:01:53,588 INFO [train.py:1198] (0/2) Epoch 35, batch 2100, loss[loss=0.235, ctc_loss=0.1156, cr_loss=0.3579, attn_decoder_loss=0.2404, over 29798.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1172, cr_loss=0.359, attn_decoder_loss=0.2412, over 5799598.43 frames. 
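The per-batch `loss` reported in these lines is the weighted sum of its logged components, using the scales from the training config at the head of this log (`ctc_loss_scale=0.1`, `attention_decoder_loss_scale=0.9`, `cr_loss_scale=0.02`). The helper function below is hypothetical, but the weighted sum reproduces the logged totals, e.g. for batch 2050 above: 0.1 x 0.09848 + 0.9 x 0.2188 + 0.02 x 0.3221 ~ 0.2132.

```python
# How the reported `loss` relates to its components, using the loss scales
# from the config at the top of this log (hypothetical helper; the weighting
# itself is verifiable against the logged numbers).
def combined_loss(ctc_loss, cr_loss, attn_decoder_loss,
                  ctc_scale=0.1, cr_scale=0.02, attn_scale=0.9):
    return (ctc_scale * ctc_loss
            + cr_scale * cr_loss
            + attn_scale * attn_decoder_loss)

# Batch 2050 from the log:
# loss=0.2132, ctc_loss=0.09848, cr_loss=0.3221, attn_decoder_loss=0.2188
total = combined_loss(0.09848, 0.3221, 0.2188)
```

Note how the attention-decoder term dominates (scale 0.9) while the consistency-regularization (CR) term is a light auxiliary signal (scale 0.02), matching the `use_ctc=True`, `use_attention_decoder=True`, `use_cr_ctc=True` configuration.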
], batch size: 81, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 08:02:04,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=623800.0, ans=0.2 2024-09-19 08:02:08,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=623840.0, ans=0.125 2024-09-19 08:02:18,868 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:02:49,243 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.509e+01 9.008e+01 9.603e+01 1.299e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 08:02:55,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=623920.0, ans=0.125 2024-09-19 08:03:06,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=623960.0, ans=0.125 2024-09-19 08:03:12,216 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-156000.pt 2024-09-19 08:03:20,893 INFO [train.py:1198] (0/2) Epoch 35, batch 2150, loss[loss=0.2306, ctc_loss=0.1162, cr_loss=0.3582, attn_decoder_loss=0.2353, over 29441.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1168, cr_loss=0.3582, attn_decoder_loss=0.2407, over 5813687.80 frames. ], batch size: 78, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 08:03:21,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624000.0, ans=0.1 2024-09-19 08:03:24,876 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.30 vs. 
limit=12.0 2024-09-19 08:03:47,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=624040.0, ans=0.0 2024-09-19 08:04:00,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=624080.0, ans=0.2 2024-09-19 08:04:03,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=624080.0, ans=0.0 2024-09-19 08:04:35,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=624200.0, ans=0.0 2024-09-19 08:04:36,522 INFO [train.py:1198] (0/2) Epoch 35, batch 2200, loss[loss=0.2334, ctc_loss=0.1097, cr_loss=0.3362, attn_decoder_loss=0.2397, over 29648.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1172, cr_loss=0.359, attn_decoder_loss=0.241, over 5810436.23 frames. ], batch size: 86, lr: 3.16e-03, grad_scale: 8.0 2024-09-19 08:04:56,252 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:04:58,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.12 vs. 
limit=15.0 2024-09-19 08:05:26,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=624320.0, ans=15.0 2024-09-19 08:05:27,602 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 8.597e+01 9.109e+01 9.743e+01 2.251e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-19 08:05:32,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=624320.0, ans=0.05 2024-09-19 08:05:35,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=624360.0, ans=0.0 2024-09-19 08:05:41,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=624360.0, ans=0.125 2024-09-19 08:05:41,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0 2024-09-19 08:05:47,554 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:05:49,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=624360.0, ans=0.125 2024-09-19 08:05:51,064 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2024-09-19 08:05:51,831 INFO [train.py:1198] (0/2) Epoch 35, batch 2250, loss[loss=0.2388, ctc_loss=0.1179, cr_loss=0.358, attn_decoder_loss=0.2443, over 29717.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1169, cr_loss=0.3586, attn_decoder_loss=0.2409, over 5810622.40 frames. 
], batch size: 82, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:07:09,255 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.29 vs. limit=22.5 2024-09-19 08:07:10,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0 2024-09-19 08:07:11,523 INFO [train.py:1198] (0/2) Epoch 35, batch 2300, loss[loss=0.2149, ctc_loss=0.1006, cr_loss=0.3184, attn_decoder_loss=0.2205, over 29336.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1163, cr_loss=0.357, attn_decoder_loss=0.2402, over 5797762.68 frames. ], batch size: 71, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:07:15,256 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2024-09-19 08:07:35,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=624640.0, ans=0.125 2024-09-19 08:07:38,076 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.33 vs. limit=15.0 2024-09-19 08:07:44,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.39 vs. 
limit=15.0 2024-09-19 08:07:48,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=624680.0, ans=0.125 2024-09-19 08:07:54,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=624680.0, ans=0.2 2024-09-19 08:08:02,741 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.578e+01 9.084e+01 9.791e+01 1.309e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-19 08:08:08,198 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.06 vs. limit=22.5 2024-09-19 08:08:24,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=624760.0, ans=0.07 2024-09-19 08:08:27,452 INFO [train.py:1198] (0/2) Epoch 35, batch 2350, loss[loss=0.2448, ctc_loss=0.1271, cr_loss=0.3742, attn_decoder_loss=0.2496, over 29684.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1162, cr_loss=0.3569, attn_decoder_loss=0.2402, over 5803954.82 frames. ], batch size: 83, lr: 3.15e-03, grad_scale: 8.0 2024-09-19 08:08:30,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=624800.0, ans=0.125 2024-09-19 08:08:35,233 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:09:42,850 INFO [train.py:1198] (0/2) Epoch 35, batch 2400, loss[loss=0.224, ctc_loss=0.1071, cr_loss=0.3412, attn_decoder_loss=0.2295, over 29544.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1165, cr_loss=0.3576, attn_decoder_loss=0.2407, over 5808400.93 frames. 
], batch size: 76, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 08:09:47,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=625000.0, ans=0.025 2024-09-19 08:10:11,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=625040.0, ans=0.02 2024-09-19 08:10:12,705 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:10:14,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=625080.0, ans=0.5 2024-09-19 08:10:24,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=625080.0, ans=0.125 2024-09-19 08:10:38,549 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 8.636e+01 9.212e+01 9.895e+01 1.857e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-19 08:11:02,884 INFO [train.py:1198] (0/2) Epoch 35, batch 2450, loss[loss=0.2385, ctc_loss=0.1173, cr_loss=0.3574, attn_decoder_loss=0.2441, over 29745.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1173, cr_loss=0.359, attn_decoder_loss=0.2416, over 5783057.41 frames. ], batch size: 82, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 08:11:03,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=625200.0, ans=0.025 2024-09-19 08:11:15,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.29 vs. 
limit=15.0
2024-09-19 08:11:21,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=625240.0, ans=0.0
2024-09-19 08:11:39,628 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:11:48,770 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:12:01,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625320.0, ans=0.1
2024-09-19 08:12:03,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.53 vs. limit=15.0
2024-09-19 08:12:11,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=625360.0, ans=0.95
2024-09-19 08:12:18,949 INFO [train.py:1198] (0/2) Epoch 35, batch 2500, loss[loss=0.244, ctc_loss=0.1196, cr_loss=0.3594, attn_decoder_loss=0.2498, over 29644.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1171, cr_loss=0.3589, attn_decoder_loss=0.2415, over 5794375.72 frames. ], batch size: 86, lr: 3.15e-03, grad_scale: 16.0
2024-09-19 08:12:29,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=625400.0, ans=0.125
2024-09-19 08:12:45,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.69 vs. limit=22.5
2024-09-19 08:13:10,578 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.524e+01 8.980e+01 9.425e+01 1.614e+02, threshold=1.796e+02, percent-clipped=0.0
2024-09-19 08:13:12,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625520.0, ans=0.1
2024-09-19 08:13:18,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=625560.0, ans=0.1
2024-09-19 08:13:20,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.20 vs. limit=6.0
2024-09-19 08:13:23,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=625560.0, ans=0.125
2024-09-19 08:13:35,396 INFO [train.py:1198] (0/2) Epoch 35, batch 2550, loss[loss=0.2036, ctc_loss=0.08984, cr_loss=0.2944, attn_decoder_loss=0.2097, over 29304.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1173, cr_loss=0.3589, attn_decoder_loss=0.2412, over 5797334.09 frames. ], batch size: 67, lr: 3.15e-03, grad_scale: 16.0
2024-09-19 08:13:57,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=625640.0, ans=0.0
2024-09-19 08:13:59,454 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.20 vs. limit=15.0
2024-09-19 08:14:12,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=625680.0, ans=0.125
2024-09-19 08:14:13,145 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0
2024-09-19 08:14:21,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=625720.0, ans=0.125
2024-09-19 08:14:55,516 INFO [train.py:1198] (0/2) Epoch 35, batch 2600, loss[loss=0.222, ctc_loss=0.1091, cr_loss=0.3437, attn_decoder_loss=0.2269, over 29456.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1172, cr_loss=0.3587, attn_decoder_loss=0.2414, over 5794593.41 frames. ], batch size: 78, lr: 3.15e-03, grad_scale: 16.0
2024-09-19 08:14:55,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=625800.0, ans=0.125
2024-09-19 08:14:57,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=625800.0, ans=0.125
2024-09-19 08:15:24,596 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0
2024-09-19 08:15:33,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=625880.0, ans=0.0
2024-09-19 08:15:46,689 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.594e+01 9.058e+01 9.611e+01 1.555e+02, threshold=1.812e+02, percent-clipped=0.0
2024-09-19 08:15:49,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=625920.0, ans=0.125
2024-09-19 08:15:51,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=625920.0, ans=0.07
2024-09-19 08:15:52,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=15.0
2024-09-19 08:16:10,548 INFO [train.py:1198] (0/2) Epoch 35, batch 2650, loss[loss=0.2541, ctc_loss=0.1397, cr_loss=0.4159, attn_decoder_loss=0.2576, over 29229.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1174, cr_loss=0.3591, attn_decoder_loss=0.2415, over 5801171.74 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 8.0
2024-09-19 08:16:19,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=626000.0, ans=0.0
2024-09-19 08:16:35,144 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:17:25,667 INFO [train.py:1198] (0/2) Epoch 35, batch 2700, loss[loss=0.2404, ctc_loss=0.1122, cr_loss=0.3426, attn_decoder_loss=0.247, over 29513.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1176, cr_loss=0.3594, attn_decoder_loss=0.2417, over 5796425.67 frames. ], batch size: 87, lr: 3.15e-03, grad_scale: 8.0
2024-09-19 08:17:49,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=626240.0, ans=0.125
2024-09-19 08:18:05,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626280.0, ans=0.1
2024-09-19 08:18:20,758 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.428e+01 9.037e+01 9.618e+01 3.244e+02, threshold=1.807e+02, percent-clipped=1.0
2024-09-19 08:18:22,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=626320.0, ans=0.125
2024-09-19 08:18:22,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=626320.0, ans=0.025
2024-09-19 08:18:35,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=626360.0, ans=0.125
2024-09-19 08:18:37,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=626360.0, ans=0.0
2024-09-19 08:18:46,429 INFO [train.py:1198] (0/2) Epoch 35, batch 2750, loss[loss=0.2283, ctc_loss=0.1173, cr_loss=0.3447, attn_decoder_loss=0.233, over 29505.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1169, cr_loss=0.3581, attn_decoder_loss=0.2408, over 5795058.38 frames. ], batch size: 75, lr: 3.15e-03, grad_scale: 8.0
2024-09-19 08:18:55,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=626400.0, ans=0.025
2024-09-19 08:18:58,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=626400.0, ans=0.1
2024-09-19 08:19:01,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=626440.0, ans=0.125
2024-09-19 08:19:07,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=626440.0, ans=0.125
2024-09-19 08:19:09,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=626440.0, ans=0.2
2024-09-19 08:19:25,935 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:19:25,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=626480.0, ans=0.05
2024-09-19 08:19:27,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.04 vs. limit=22.5
2024-09-19 08:19:29,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=12.0
2024-09-19 08:19:38,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=626520.0, ans=0.125
2024-09-19 08:19:50,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=626560.0, ans=0.0
2024-09-19 08:20:02,182 INFO [train.py:1198] (0/2) Epoch 35, batch 2800, loss[loss=0.2545, ctc_loss=0.1338, cr_loss=0.3813, attn_decoder_loss=0.2594, over 20666.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1171, cr_loss=0.3585, attn_decoder_loss=0.2409, over 5775572.52 frames. ], batch size: 209, lr: 3.15e-03, grad_scale: 16.0
2024-09-19 08:20:35,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=626680.0, ans=10.0
2024-09-19 08:20:53,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=626720.0, ans=0.125
2024-09-19 08:20:54,998 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.599e+01 9.222e+01 9.663e+01 2.009e+02, threshold=1.844e+02, percent-clipped=1.0
2024-09-19 08:21:17,441 INFO [train.py:1198] (0/2) Epoch 35, batch 2850, loss[loss=0.2351, ctc_loss=0.1147, cr_loss=0.3431, attn_decoder_loss=0.2408, over 29489.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1176, cr_loss=0.3598, attn_decoder_loss=0.2413, over 5762325.02 frames. ], batch size: 77, lr: 3.15e-03, grad_scale: 8.0
2024-09-19 08:21:29,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=626800.0, ans=0.0
2024-09-19 08:21:42,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=626840.0, ans=0.125
2024-09-19 08:21:56,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=626880.0, ans=0.0
2024-09-19 08:22:14,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=626920.0, ans=0.04949747468305833
2024-09-19 08:22:22,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0
2024-09-19 08:22:36,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=627000.0, ans=0.09899494936611666
2024-09-19 08:22:37,746 INFO [train.py:1198] (0/2) Epoch 35, batch 2900, loss[loss=0.2328, ctc_loss=0.1162, cr_loss=0.3519, attn_decoder_loss=0.2379, over 29428.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1181, cr_loss=0.3613, attn_decoder_loss=0.2423, over 5787767.56 frames. ], batch size: 79, lr: 3.15e-03, grad_scale: 8.0
2024-09-19 08:22:45,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=627000.0, ans=0.125
2024-09-19 08:22:51,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=627040.0, ans=0.125
2024-09-19 08:23:02,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=627040.0, ans=0.125
2024-09-19 08:23:17,242 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:23:32,020 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.643e+01 9.038e+01 9.732e+01 2.249e+02, threshold=1.808e+02, percent-clipped=2.0
2024-09-19 08:23:38,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=627160.0, ans=0.125
2024-09-19 08:23:53,518 INFO [train.py:1198] (0/2) Epoch 35, batch 2950, loss[loss=0.2283, ctc_loss=0.1124, cr_loss=0.3436, attn_decoder_loss=0.2336, over 29517.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1173, cr_loss=0.3598, attn_decoder_loss=0.2412, over 5780913.25 frames. ], batch size: 75, lr: 3.15e-03, grad_scale: 8.0
2024-09-19 08:23:54,625 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.28 vs. limit=15.0
2024-09-19 08:24:10,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=627240.0, ans=0.1
2024-09-19 08:24:52,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=627360.0, ans=0.0
2024-09-19 08:25:09,411 INFO [train.py:1198] (0/2) Epoch 35, batch 3000, loss[loss=0.2346, ctc_loss=0.1145, cr_loss=0.3644, attn_decoder_loss=0.2399, over 29758.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1167, cr_loss=0.3579, attn_decoder_loss=0.2409, over 5781402.93 frames. ], batch size: 81, lr: 3.15e-03, grad_scale: 8.0
2024-09-19 08:25:09,412 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 08:25:24,889 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.3.encoder.layers.4.self_attn_weights, attn_weights_entropy = tensor([4.1830, 3.8003, 2.9862, 4.0655, 3.3760, 2.8495, 3.0859, 3.3812], device='cuda:0')
2024-09-19 08:25:28,764 INFO [train.py:1230] (0/2) Epoch 35, validation: loss=0.2119, ctc_loss=0.03685, cr_loss=6.108e-15, attn_decoder_loss=0.2313, over 944034.00 frames.
2024-09-19 08:25:28,765 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-19 08:25:47,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=627440.0, ans=0.0
2024-09-19 08:25:56,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=627440.0, ans=0.125
2024-09-19 08:26:15,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=627520.0, ans=0.2
2024-09-19 08:26:25,878 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.691e+01 9.210e+01 9.887e+01 4.457e+02, threshold=1.842e+02, percent-clipped=1.0
2024-09-19 08:26:32,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=627560.0, ans=0.125
2024-09-19 08:26:36,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=627560.0, ans=0.1
2024-09-19 08:26:38,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=627560.0, ans=0.125
2024-09-19 08:26:47,037 INFO [train.py:1198] (0/2) Epoch 35, batch 3050, loss[loss=0.2277, ctc_loss=0.1138, cr_loss=0.3605, attn_decoder_loss=0.2324, over 29522.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1174, cr_loss=0.3593, attn_decoder_loss=0.2419, over 5775506.26 frames. ], batch size: 76, lr: 3.15e-03, grad_scale: 8.0
2024-09-19 08:26:53,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=627600.0, ans=0.125
2024-09-19 08:27:22,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=627680.0, ans=0.125
2024-09-19 08:27:29,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=627680.0, ans=0.5
2024-09-19 08:27:35,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=627720.0, ans=0.1
2024-09-19 08:27:41,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=627720.0, ans=0.125
2024-09-19 08:27:41,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=627720.0, ans=0.1
2024-09-19 08:27:59,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=627760.0, ans=0.125
2024-09-19 08:28:02,350 INFO [train.py:1198] (0/2) Epoch 35, batch 3100, loss[loss=0.258, ctc_loss=0.1346, cr_loss=0.4101, attn_decoder_loss=0.2626, over 29245.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1173, cr_loss=0.3595, attn_decoder_loss=0.2417, over 5775866.49 frames. ], batch size: 100, lr: 3.15e-03, grad_scale: 8.0
2024-09-19 08:28:11,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=627800.0, ans=0.2
2024-09-19 08:28:36,398 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0
2024-09-19 08:28:42,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=627880.0, ans=0.1
2024-09-19 08:28:45,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=15.0
2024-09-19 08:28:55,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.58 vs. limit=10.0
2024-09-19 08:28:57,337 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.618e+01 9.080e+01 9.751e+01 2.675e+02, threshold=1.816e+02, percent-clipped=2.0
2024-09-19 08:29:01,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.42 vs. limit=15.0
2024-09-19 08:29:11,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=627960.0, ans=0.0
2024-09-19 08:29:18,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=627960.0, ans=0.07
2024-09-19 08:29:21,432 INFO [train.py:1198] (0/2) Epoch 35, batch 3150, loss[loss=0.2429, ctc_loss=0.1196, cr_loss=0.3588, attn_decoder_loss=0.2486, over 28778.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1168, cr_loss=0.3584, attn_decoder_loss=0.2413, over 5781830.64 frames. ], batch size: 104, lr: 3.15e-03, grad_scale: 8.0
2024-09-19 08:29:53,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=628080.0, ans=0.0
2024-09-19 08:29:58,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=628080.0, ans=0.125
2024-09-19 08:30:09,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=628120.0, ans=0.125
2024-09-19 08:30:27,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=628160.0, ans=0.125
2024-09-19 08:30:29,394 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.02 vs. limit=15.0
2024-09-19 08:30:38,989 INFO [train.py:1198] (0/2) Epoch 35, batch 3200, loss[loss=0.2266, ctc_loss=0.1089, cr_loss=0.3365, attn_decoder_loss=0.2322, over 29437.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1164, cr_loss=0.3576, attn_decoder_loss=0.2408, over 5793377.52 frames. ], batch size: 79, lr: 3.15e-03, grad_scale: 16.0
2024-09-19 08:30:42,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=628200.0, ans=0.025
2024-09-19 08:31:06,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=628240.0, ans=0.09899494936611666
2024-09-19 08:31:19,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=628280.0, ans=0.125
2024-09-19 08:31:27,870 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:31:34,894 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.608e+01 9.276e+01 9.756e+01 1.910e+02, threshold=1.855e+02, percent-clipped=1.0
2024-09-19 08:31:49,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.03 vs. limit=10.0
2024-09-19 08:31:54,881 INFO [train.py:1198] (0/2) Epoch 35, batch 3250, loss[loss=0.2467, ctc_loss=0.1214, cr_loss=0.3614, attn_decoder_loss=0.2526, over 29714.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1166, cr_loss=0.3581, attn_decoder_loss=0.2415, over 5800331.16 frames. ], batch size: 84, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 08:31:57,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.44 vs. limit=10.0
2024-09-19 08:32:22,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=628440.0, ans=0.125
2024-09-19 08:32:39,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=628520.0, ans=0.2
2024-09-19 08:32:43,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=628520.0, ans=0.125
2024-09-19 08:32:56,236 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=22.5
2024-09-19 08:32:57,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=628560.0, ans=0.0
2024-09-19 08:33:12,772 INFO [train.py:1198] (0/2) Epoch 35, batch 3300, loss[loss=0.2447, ctc_loss=0.1169, cr_loss=0.336, attn_decoder_loss=0.2514, over 28277.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1157, cr_loss=0.3561, attn_decoder_loss=0.2401, over 5797599.22 frames. ], batch size: 111, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 08:33:14,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=628600.0, ans=0.0
2024-09-19 08:34:10,348 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.715e+01 9.290e+01 9.754e+01 2.928e+02, threshold=1.858e+02, percent-clipped=2.0
2024-09-19 08:34:30,126 INFO [train.py:1198] (0/2) Epoch 35, batch 3350, loss[loss=0.2613, ctc_loss=0.1352, cr_loss=0.4042, attn_decoder_loss=0.2663, over 28898.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1165, cr_loss=0.3573, attn_decoder_loss=0.241, over 5773025.53 frames. ], batch size: 104, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 08:34:30,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=628800.0, ans=0.1
2024-09-19 08:34:53,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=628840.0, ans=0.09899494936611666
2024-09-19 08:34:53,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=628840.0, ans=0.2
2024-09-19 08:34:54,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=628840.0, ans=0.125
2024-09-19 08:35:03,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=628880.0, ans=0.125
2024-09-19 08:35:04,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=628880.0, ans=0.0
2024-09-19 08:35:46,083 INFO [train.py:1198] (0/2) Epoch 35, batch 3400, loss[loss=0.2072, ctc_loss=0.0978, cr_loss=0.3213, attn_decoder_loss=0.2122, over 29338.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1165, cr_loss=0.3574, attn_decoder_loss=0.2409, over 5765291.73 frames. ], batch size: 67, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 08:35:56,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=629000.0, ans=0.125
2024-09-19 08:36:04,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=629040.0, ans=0.0
2024-09-19 08:36:08,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=629040.0, ans=0.125
2024-09-19 08:36:12,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=629040.0, ans=0.2
2024-09-19 08:36:12,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=629040.0, ans=0.125
2024-09-19 08:36:18,661 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0
2024-09-19 08:36:25,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=629080.0, ans=0.125
2024-09-19 08:36:31,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=629120.0, ans=0.125
2024-09-19 08:36:36,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=629120.0, ans=0.0
2024-09-19 08:36:44,223 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.920e+01 8.517e+01 9.055e+01 9.651e+01 2.142e+02, threshold=1.811e+02, percent-clipped=1.0
2024-09-19 08:36:58,822 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0
2024-09-19 08:37:03,842 INFO [train.py:1198] (0/2) Epoch 35, batch 3450, loss[loss=0.265, ctc_loss=0.14, cr_loss=0.4148, attn_decoder_loss=0.2697, over 28236.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1167, cr_loss=0.3583, attn_decoder_loss=0.2413, over 5774214.52 frames. ], batch size: 111, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 08:37:12,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0
2024-09-19 08:37:26,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=629240.0, ans=0.125
2024-09-19 08:37:26,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=629240.0, ans=0.125
2024-09-19 08:37:32,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=629280.0, ans=0.125
2024-09-19 08:38:21,920 INFO [train.py:1198] (0/2) Epoch 35, batch 3500, loss[loss=0.2146, ctc_loss=0.09909, cr_loss=0.325, attn_decoder_loss=0.2202, over 29341.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1166, cr_loss=0.3581, attn_decoder_loss=0.241, over 5775770.02 frames. ], batch size: 71, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 08:38:52,327 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:38:55,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=629480.0, ans=0.0
2024-09-19 08:39:17,036 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.529e+01 8.957e+01 9.484e+01 1.276e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-19 08:39:18,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=629520.0, ans=0.04949747468305833
2024-09-19 08:39:20,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=629560.0, ans=0.05
2024-09-19 08:39:29,827 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.48 vs. limit=15.0
2024-09-19 08:39:36,787 INFO [train.py:1198] (0/2) Epoch 35, batch 3550, loss[loss=0.234, ctc_loss=0.1117, cr_loss=0.3435, attn_decoder_loss=0.24, over 29676.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1164, cr_loss=0.3574, attn_decoder_loss=0.2408, over 5780790.83 frames. ], batch size: 89, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 08:39:38,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=629600.0, ans=0.125
2024-09-19 08:40:11,546 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0
2024-09-19 08:40:16,243 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=6.19 vs. limit=12.0
2024-09-19 08:40:37,791 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:40:50,978 INFO [train.py:1198] (0/2) Epoch 35, batch 3600, loss[loss=0.22, ctc_loss=0.09878, cr_loss=0.3261, attn_decoder_loss=0.2262, over 29522.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1164, cr_loss=0.3574, attn_decoder_loss=0.2408, over 5790720.86 frames. ], batch size: 77, lr: 3.14e-03, grad_scale: 16.0
2024-09-19 08:41:02,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.39 vs. limit=10.0
2024-09-19 08:41:05,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.37 vs. limit=10.0
2024-09-19 08:41:08,384 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=15.0
2024-09-19 08:41:12,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=629840.0, ans=0.07
2024-09-19 08:41:29,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=629880.0, ans=0.125
2024-09-19 08:41:47,405 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.545e+01 9.030e+01 9.736e+01 4.485e+02, threshold=1.806e+02, percent-clipped=2.0
2024-09-19 08:41:55,500 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:42:01,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=629960.0, ans=0.1
2024-09-19 08:42:07,127 INFO [train.py:1198] (0/2) Epoch 35, batch 3650, loss[loss=0.2596, ctc_loss=0.1392, cr_loss=0.4067, attn_decoder_loss=0.264, over 29513.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1161, cr_loss=0.3572, attn_decoder_loss=0.2403, over 5793322.79 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 08:42:08,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=15.0
2024-09-19 08:42:23,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=630040.0, ans=0.2
2024-09-19 08:42:37,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=630080.0, ans=0.0
2024-09-19 08:43:21,873 INFO [train.py:1198] (0/2) Epoch 35, batch 3700, loss[loss=0.2468, ctc_loss=0.1239, cr_loss=0.37, attn_decoder_loss=0.2523, over 29699.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1166, cr_loss=0.3584, attn_decoder_loss=0.2405, over 5803013.18 frames. ], batch size: 84, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 08:43:42,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.24 vs. limit=10.0
2024-09-19 08:44:05,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=630280.0, ans=0.1
2024-09-19 08:44:08,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=630320.0, ans=0.0
2024-09-19 08:44:19,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.511e+01 9.010e+01 9.557e+01 1.443e+02, threshold=1.802e+02, percent-clipped=0.0
2024-09-19 08:44:31,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=630360.0, ans=0.015
2024-09-19 08:44:38,051 INFO [train.py:1198] (0/2) Epoch 35, batch 3750, loss[loss=0.211, ctc_loss=0.09384, cr_loss=0.3064, attn_decoder_loss=0.2173, over 29349.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1166, cr_loss=0.3587, attn_decoder_loss=0.2404, over 5806660.40 frames. ], batch size: 67, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 08:44:44,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=630400.0, ans=0.0
2024-09-19 08:44:55,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2024-09-19 08:44:57,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=630440.0, ans=0.05
2024-09-19 08:45:02,393 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 08:45:05,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=630440.0, ans=0.025
2024-09-19 08:45:43,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=630560.0, ans=0.0
2024-09-19 08:45:52,361 INFO [train.py:1198] (0/2) Epoch 35, batch 3800, loss[loss=0.2393, ctc_loss=0.1133, cr_loss=0.336, attn_decoder_loss=0.2458, over 29603.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1163, cr_loss=0.3574, attn_decoder_loss=0.2403, over 5797049.73 frames. ], batch size: 86, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 08:46:16,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=630640.0, ans=0.0
2024-09-19 08:46:29,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=630680.0, ans=0.2
2024-09-19 08:46:34,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=630680.0, ans=0.125
2024-09-19 08:46:40,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0
2024-09-19 08:46:45,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=630720.0, ans=0.125
2024-09-19 08:46:46,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=630720.0, ans=0.125
2024-09-19 08:46:48,666 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.417e+01 9.020e+01 9.508e+01 1.354e+02, threshold=1.804e+02, percent-clipped=0.0
2024-09-19 08:46:59,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=630760.0, ans=0.09899494936611666
2024-09-19 08:46:59,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0
2024-09-19 08:47:00,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0
2024-09-19 08:47:02,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=630760.0, ans=0.2
2024-09-19 08:47:06,583 INFO [train.py:1198] (0/2) Epoch 35, batch 3850, loss[loss=0.2381, ctc_loss=0.1115, cr_loss=0.3528, attn_decoder_loss=0.2443, over 29237.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1157, cr_loss=0.356, attn_decoder_loss=0.24, over 5811323.05 frames.
], batch size: 100, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:47:12,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=630800.0, ans=0.125 2024-09-19 08:47:38,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=630880.0, ans=0.125 2024-09-19 08:47:54,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=630920.0, ans=0.125 2024-09-19 08:48:08,242 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.62 vs. limit=22.5 2024-09-19 08:48:12,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=630960.0, ans=0.125 2024-09-19 08:48:13,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=630960.0, ans=0.0 2024-09-19 08:48:15,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=630960.0, ans=0.0 2024-09-19 08:48:22,482 INFO [train.py:1198] (0/2) Epoch 35, batch 3900, loss[loss=0.2473, ctc_loss=0.1262, cr_loss=0.3885, attn_decoder_loss=0.2521, over 29629.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1163, cr_loss=0.3574, attn_decoder_loss=0.2406, over 5815707.28 frames. 
], batch size: 86, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:48:27,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=631000.0, ans=0.1 2024-09-19 08:48:37,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=631040.0, ans=0.125 2024-09-19 08:48:40,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=631040.0, ans=22.5 2024-09-19 08:49:00,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=631080.0, ans=0.125 2024-09-19 08:49:18,880 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.482e+01 8.961e+01 9.353e+01 1.224e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-19 08:49:32,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=631160.0, ans=0.125 2024-09-19 08:49:38,531 INFO [train.py:1198] (0/2) Epoch 35, batch 3950, loss[loss=0.2513, ctc_loss=0.1315, cr_loss=0.397, attn_decoder_loss=0.2558, over 29506.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1164, cr_loss=0.3578, attn_decoder_loss=0.2411, over 5835166.37 frames. 
], batch size: 97, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:49:40,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=631200.0, ans=10.0 2024-09-19 08:50:03,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=631240.0, ans=0.1 2024-09-19 08:50:24,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=631320.0, ans=0.125 2024-09-19 08:50:32,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=631320.0, ans=0.125 2024-09-19 08:50:43,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=631360.0, ans=0.0 2024-09-19 08:50:48,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=631360.0, ans=0.0 2024-09-19 08:50:52,166 INFO [train.py:1198] (0/2) Epoch 35, batch 4000, loss[loss=0.2327, ctc_loss=0.1138, cr_loss=0.3604, attn_decoder_loss=0.2379, over 29501.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1164, cr_loss=0.3572, attn_decoder_loss=0.241, over 5812117.12 frames. 
], batch size: 74, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 08:51:14,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=631440.0, ans=0.0 2024-09-19 08:51:28,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=631480.0, ans=0.2 2024-09-19 08:51:29,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=631480.0, ans=0.0 2024-09-19 08:51:42,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0 2024-09-19 08:51:49,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=631520.0, ans=0.125 2024-09-19 08:51:50,542 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.935e+01 8.609e+01 9.049e+01 9.611e+01 2.994e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-19 08:52:01,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=631560.0, ans=0.125 2024-09-19 08:52:06,771 INFO [train.py:1198] (0/2) Epoch 35, batch 4050, loss[loss=0.2532, ctc_loss=0.1488, cr_loss=0.3919, attn_decoder_loss=0.2561, over 20529.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1166, cr_loss=0.3577, attn_decoder_loss=0.241, over 5796937.43 frames. 
], batch size: 212, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:52:08,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=631600.0, ans=0.5 2024-09-19 08:53:02,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=631720.0, ans=0.035 2024-09-19 08:53:21,184 INFO [train.py:1198] (0/2) Epoch 35, batch 4100, loss[loss=0.259, ctc_loss=0.1414, cr_loss=0.4167, attn_decoder_loss=0.2628, over 29531.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1165, cr_loss=0.3573, attn_decoder_loss=0.2411, over 5792594.84 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:53:21,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=631800.0, ans=0.125 2024-09-19 08:53:28,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=631800.0, ans=0.0 2024-09-19 08:53:31,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=631800.0, ans=0.0 2024-09-19 08:53:49,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=631880.0, ans=0.125 2024-09-19 08:53:50,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=631880.0, ans=0.125 2024-09-19 08:54:12,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=631920.0, ans=0.125 2024-09-19 08:54:19,449 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.556e+01 9.136e+01 9.776e+01 2.394e+02, threshold=1.827e+02, percent-clipped=3.0 2024-09-19 08:54:29,350 INFO [scaling.py:1024] (0/2) Whitening: 
name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2024-09-19 08:54:36,201 INFO [train.py:1198] (0/2) Epoch 35, batch 4150, loss[loss=0.2331, ctc_loss=0.1183, cr_loss=0.3675, attn_decoder_loss=0.2376, over 29474.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1163, cr_loss=0.3568, attn_decoder_loss=0.2405, over 5797616.97 frames. ], batch size: 77, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:54:45,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=632000.0, ans=0.125 2024-09-19 08:54:48,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=632000.0, ans=0.125 2024-09-19 08:54:48,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=632000.0, ans=0.125 2024-09-19 08:54:51,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=632040.0, ans=0.0 2024-09-19 08:54:55,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=632040.0, ans=0.0 2024-09-19 08:55:04,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=632080.0, ans=0.2 2024-09-19 08:55:04,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=632080.0, ans=0.0 2024-09-19 08:55:18,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=632120.0, ans=0.025 2024-09-19 08:55:22,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. 
limit=15.0 2024-09-19 08:55:41,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=632160.0, ans=0.0 2024-09-19 08:55:49,866 INFO [train.py:1198] (0/2) Epoch 35, batch 4200, loss[loss=0.2433, ctc_loss=0.1284, cr_loss=0.3975, attn_decoder_loss=0.2472, over 29510.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1165, cr_loss=0.3575, attn_decoder_loss=0.241, over 5798981.12 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 08:55:51,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632200.0, ans=0.1 2024-09-19 08:55:54,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632200.0, ans=0.1 2024-09-19 08:56:24,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=632280.0, ans=0.125 2024-09-19 08:56:25,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=632280.0, ans=0.125 2024-09-19 08:56:26,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=632280.0, ans=0.2 2024-09-19 08:56:27,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=632280.0, ans=0.0 2024-09-19 08:56:33,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=632320.0, ans=0.125 2024-09-19 08:56:33,816 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:56:34,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.72 vs. 
limit=22.5 2024-09-19 08:56:37,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=632320.0, ans=0.015 2024-09-19 08:56:38,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=632320.0, ans=15.0 2024-09-19 08:56:48,204 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.470e+01 8.972e+01 9.495e+01 2.308e+02, threshold=1.794e+02, percent-clipped=1.0 2024-09-19 08:57:01,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=632360.0, ans=0.125 2024-09-19 08:57:04,324 INFO [train.py:1198] (0/2) Epoch 35, batch 4250, loss[loss=0.2189, ctc_loss=0.09502, cr_loss=0.3111, attn_decoder_loss=0.2257, over 29532.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1163, cr_loss=0.357, attn_decoder_loss=0.2411, over 5805071.84 frames. ], batch size: 74, lr: 3.13e-03, grad_scale: 8.0 2024-09-19 08:57:26,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=632440.0, ans=0.125 2024-09-19 08:57:46,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=632480.0, ans=0.125 2024-09-19 08:57:47,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632520.0, ans=0.1 2024-09-19 08:57:54,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632520.0, ans=0.1 2024-09-19 08:57:58,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=632520.0, ans=0.1 2024-09-19 08:58:13,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=632560.0, 
ans=0.0 2024-09-19 08:58:19,074 INFO [train.py:1198] (0/2) Epoch 35, batch 4300, loss[loss=0.2479, ctc_loss=0.1281, cr_loss=0.3678, attn_decoder_loss=0.2531, over 29559.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1163, cr_loss=0.3571, attn_decoder_loss=0.2416, over 5794623.81 frames. ], batch size: 87, lr: 3.13e-03, grad_scale: 8.0 2024-09-19 08:58:41,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=632640.0, ans=0.0 2024-09-19 08:58:46,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.55 vs. limit=22.5 2024-09-19 08:58:52,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=632680.0, ans=0.0 2024-09-19 08:58:53,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632680.0, ans=0.1 2024-09-19 08:59:17,001 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.841e+01 9.230e+01 9.936e+01 2.115e+02, threshold=1.846e+02, percent-clipped=2.0 2024-09-19 08:59:20,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=632760.0, ans=0.0 2024-09-19 08:59:34,555 INFO [train.py:1198] (0/2) Epoch 35, batch 4350, loss[loss=0.2505, ctc_loss=0.1299, cr_loss=0.3742, attn_decoder_loss=0.2556, over 29471.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1189, cr_loss=0.3628, attn_decoder_loss=0.2447, over 5797603.02 frames. 
], batch size: 97, lr: 3.13e-03, grad_scale: 8.0 2024-09-19 08:59:40,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=632800.0, ans=0.125 2024-09-19 08:59:41,316 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-09-19 09:00:01,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=632840.0, ans=0.125 2024-09-19 09:00:23,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=632920.0, ans=0.025 2024-09-19 09:00:27,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=632920.0, ans=0.2 2024-09-19 09:00:47,706 INFO [train.py:1198] (0/2) Epoch 35, batch 4400, loss[loss=0.248, ctc_loss=0.1306, cr_loss=0.3922, attn_decoder_loss=0.2523, over 27390.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1203, cr_loss=0.3656, attn_decoder_loss=0.2465, over 5769256.30 frames. ], batch size: 124, lr: 3.13e-03, grad_scale: 16.0 2024-09-19 09:01:06,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=633040.0, ans=0.5 2024-09-19 09:01:28,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.14 vs. 
limit=15.0 2024-09-19 09:01:35,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=633120.0, ans=0.07 2024-09-19 09:01:39,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=633120.0, ans=0.125 2024-09-19 09:01:45,571 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.031e+01 8.931e+01 9.450e+01 9.933e+01 1.920e+02, threshold=1.890e+02, percent-clipped=1.0 2024-09-19 09:01:50,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=633160.0, ans=0.125 2024-09-19 09:01:51,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0 2024-09-19 09:02:02,859 INFO [train.py:1198] (0/2) Epoch 35, batch 4450, loss[loss=0.2551, ctc_loss=0.1424, cr_loss=0.4017, attn_decoder_loss=0.2587, over 20507.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1237, cr_loss=0.3711, attn_decoder_loss=0.2485, over 5583854.42 frames. ], batch size: 209, lr: 3.13e-03, grad_scale: 16.0 2024-09-19 09:02:06,498 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.69 vs. 
limit=12.0 2024-09-19 09:02:15,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=633200.0, ans=0.2 2024-09-19 09:02:15,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633200.0, ans=0.1 2024-09-19 09:02:35,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=633280.0, ans=0.125 2024-09-19 09:03:17,988 INFO [train.py:1198] (0/2) Epoch 35, batch 4500, loss[loss=0.2572, ctc_loss=0.1477, cr_loss=0.3827, attn_decoder_loss=0.2609, over 19135.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.127, cr_loss=0.3735, attn_decoder_loss=0.2504, over 5239063.70 frames. ], batch size: 209, lr: 3.13e-03, grad_scale: 8.0 2024-09-19 09:03:18,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=633400.0, ans=0.0 2024-09-19 09:03:45,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=633440.0, ans=0.125 2024-09-19 09:03:51,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=633480.0, ans=0.125 2024-09-19 09:03:53,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=15.39 vs. 
limit=15.0 2024-09-19 09:03:55,219 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-35.pt 2024-09-19 09:04:39,980 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:04:41,128 INFO [train.py:1198] (0/2) Epoch 36, batch 0, loss[loss=0.2157, ctc_loss=0.1019, cr_loss=0.3389, attn_decoder_loss=0.2208, over 29596.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1019, cr_loss=0.3389, attn_decoder_loss=0.2208, over 29596.00 frames. ], batch size: 73, lr: 3.09e-03, grad_scale: 16.0 2024-09-19 09:04:41,128 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 09:04:49,388 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2387, 3.8427, 4.1141, 3.7377], device='cuda:0') 2024-09-19 09:04:56,200 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1522, 4.3278, 4.4782, 4.7765], device='cuda:0') 2024-09-19 09:04:59,474 INFO [train.py:1230] (0/2) Epoch 36, validation: loss=0.2129, ctc_loss=0.03662, cr_loss=5.743e-15, attn_decoder_loss=0.2325, over 944034.00 frames. 
2024-09-19 09:04:59,474 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 09:05:02,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=633500.0, ans=0.125 2024-09-19 09:05:08,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=633500.0, ans=0.125 2024-09-19 09:05:13,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=633540.0, ans=0.125 2024-09-19 09:05:22,044 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 1.073e+02 1.144e+02 1.210e+02 8.768e+02, threshold=2.289e+02, percent-clipped=4.0 2024-09-19 09:05:33,883 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.36 vs. limit=15.0 2024-09-19 09:06:08,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=633660.0, ans=0.0 2024-09-19 09:06:15,586 INFO [train.py:1198] (0/2) Epoch 36, batch 50, loss[loss=0.213, ctc_loss=0.09483, cr_loss=0.3005, attn_decoder_loss=0.2195, over 29427.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1168, cr_loss=0.3594, attn_decoder_loss=0.241, over 1267692.84 frames. ], batch size: 70, lr: 3.09e-03, grad_scale: 16.0 2024-09-19 09:06:30,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633700.0, ans=0.1 2024-09-19 09:06:30,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.33 vs. 
limit=10.0 2024-09-19 09:06:52,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=633780.0, ans=0.0 2024-09-19 09:06:52,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=633780.0, ans=0.125 2024-09-19 09:07:18,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.83 vs. limit=22.5 2024-09-19 09:07:19,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=633860.0, ans=0.1 2024-09-19 09:07:23,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=633860.0, ans=0.125 2024-09-19 09:07:28,786 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5 2024-09-19 09:07:34,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=633900.0, ans=0.1 2024-09-19 09:07:35,509 INFO [train.py:1198] (0/2) Epoch 36, batch 100, loss[loss=0.2234, ctc_loss=0.1034, cr_loss=0.3298, attn_decoder_loss=0.2294, over 29529.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.119, cr_loss=0.3641, attn_decoder_loss=0.2436, over 2251081.11 frames. 
], batch size: 76, lr: 3.09e-03, grad_scale: 16.0 2024-09-19 09:07:37,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=633900.0, ans=0.2 2024-09-19 09:07:49,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=633940.0, ans=0.125 2024-09-19 09:07:49,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=633940.0, ans=0.125 2024-09-19 09:07:57,922 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.630e+01 9.046e+01 9.825e+01 1.723e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-19 09:07:58,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=633940.0, ans=0.025 2024-09-19 09:08:22,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=634020.0, ans=0.125 2024-09-19 09:08:36,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.80 vs. limit=15.0 2024-09-19 09:08:44,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=634060.0, ans=0.125 2024-09-19 09:08:50,177 INFO [train.py:1198] (0/2) Epoch 36, batch 150, loss[loss=0.2111, ctc_loss=0.0983, cr_loss=0.3197, attn_decoder_loss=0.2166, over 29418.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1161, cr_loss=0.3578, attn_decoder_loss=0.2411, over 3047344.17 frames. 
], batch size: 70, lr: 3.09e-03, grad_scale: 16.0 2024-09-19 09:08:54,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=634100.0, ans=0.025 2024-09-19 09:09:23,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=634180.0, ans=0.2 2024-09-19 09:09:50,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.21 vs. limit=10.0 2024-09-19 09:10:00,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634260.0, ans=0.1 2024-09-19 09:10:04,908 INFO [train.py:1198] (0/2) Epoch 36, batch 200, loss[loss=0.2484, ctc_loss=0.124, cr_loss=0.3775, attn_decoder_loss=0.2538, over 27109.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1155, cr_loss=0.356, attn_decoder_loss=0.24, over 3658860.25 frames. ], batch size: 124, lr: 3.09e-03, grad_scale: 16.0 2024-09-19 09:10:24,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=634340.0, ans=0.125 2024-09-19 09:10:29,661 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.423e+01 8.790e+01 9.226e+01 1.100e+02, threshold=1.758e+02, percent-clipped=0.0 2024-09-19 09:11:05,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-19 09:11:25,654 INFO [train.py:1198] (0/2) Epoch 36, batch 250, loss[loss=0.251, ctc_loss=0.123, cr_loss=0.3658, attn_decoder_loss=0.2571, over 29287.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1153, cr_loss=0.3549, attn_decoder_loss=0.2398, over 4141728.24 frames. 
], batch size: 100, lr: 3.08e-03, grad_scale: 16.0 2024-09-19 09:11:31,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=634500.0, ans=0.025 2024-09-19 09:11:41,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634540.0, ans=0.1 2024-09-19 09:11:56,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=634580.0, ans=0.125 2024-09-19 09:12:06,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=634580.0, ans=0.125 2024-09-19 09:12:13,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-09-19 09:12:27,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=634660.0, ans=0.0 2024-09-19 09:12:40,875 INFO [train.py:1198] (0/2) Epoch 36, batch 300, loss[loss=0.2546, ctc_loss=0.1355, cr_loss=0.4038, attn_decoder_loss=0.2588, over 29559.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1154, cr_loss=0.3558, attn_decoder_loss=0.2399, over 4509960.68 frames. 
], batch size: 92, lr: 3.08e-03, grad_scale: 8.0 2024-09-19 09:12:47,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=634700.0, ans=0.1 2024-09-19 09:12:54,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=634740.0, ans=0.2 2024-09-19 09:12:57,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=634740.0, ans=0.125 2024-09-19 09:12:57,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=634740.0, ans=0.0 2024-09-19 09:13:04,720 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.698e+01 9.076e+01 9.667e+01 1.639e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-19 09:13:06,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=634740.0, ans=0.0 2024-09-19 09:13:08,227 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:13:24,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=634820.0, ans=0.0 2024-09-19 09:13:29,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=634820.0, ans=0.5 2024-09-19 09:13:38,488 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:13:56,338 INFO [train.py:1198] (0/2) Epoch 36, batch 350, loss[loss=0.2069, ctc_loss=0.09585, cr_loss=0.3108, attn_decoder_loss=0.2123, over 29306.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1154, cr_loss=0.3564, attn_decoder_loss=0.2404, over 4796458.08 frames. 
], batch size: 71, lr: 3.08e-03, grad_scale: 8.0 2024-09-19 09:14:38,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=634980.0, ans=0.125 2024-09-19 09:14:41,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=634980.0, ans=0.5 2024-09-19 09:14:47,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.92 vs. limit=15.0 2024-09-19 09:14:54,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-09-19 09:14:54,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.06 vs. limit=6.0 2024-09-19 09:15:15,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=635100.0, ans=0.125 2024-09-19 09:15:16,543 INFO [train.py:1198] (0/2) Epoch 36, batch 400, loss[loss=0.2382, ctc_loss=0.1136, cr_loss=0.3659, attn_decoder_loss=0.2439, over 29691.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.115, cr_loss=0.3552, attn_decoder_loss=0.24, over 5025164.44 frames. ], batch size: 82, lr: 3.08e-03, grad_scale: 16.0 2024-09-19 09:15:22,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635100.0, ans=0.1 2024-09-19 09:15:35,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.87 vs. 
limit=22.5 2024-09-19 09:15:40,915 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.564e+01 9.194e+01 9.781e+01 3.536e+02, threshold=1.839e+02, percent-clipped=4.0 2024-09-19 09:16:12,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=635220.0, ans=0.125 2024-09-19 09:16:33,035 INFO [train.py:1198] (0/2) Epoch 36, batch 450, loss[loss=0.2509, ctc_loss=0.127, cr_loss=0.3812, attn_decoder_loss=0.2562, over 29689.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1153, cr_loss=0.3554, attn_decoder_loss=0.2403, over 5187247.00 frames. ], batch size: 83, lr: 3.08e-03, grad_scale: 16.0 2024-09-19 09:16:45,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=635300.0, ans=0.95 2024-09-19 09:16:50,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635340.0, ans=0.1 2024-09-19 09:17:26,723 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:17:38,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=635460.0, ans=0.04949747468305833 2024-09-19 09:17:44,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=635460.0, ans=0.125 2024-09-19 09:17:48,984 INFO [train.py:1198] (0/2) Epoch 36, batch 500, loss[loss=0.2423, ctc_loss=0.1267, cr_loss=0.3739, attn_decoder_loss=0.2468, over 29378.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1148, cr_loss=0.3546, attn_decoder_loss=0.2395, over 5329421.19 frames. 
], batch size: 94, lr: 3.08e-03, grad_scale: 16.0 2024-09-19 09:18:01,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=635500.0, ans=0.07 2024-09-19 09:18:13,077 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.310e+01 8.819e+01 9.519e+01 1.597e+02, threshold=1.764e+02, percent-clipped=0.0 2024-09-19 09:18:23,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=635580.0, ans=0.125 2024-09-19 09:18:32,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635580.0, ans=0.1 2024-09-19 09:18:32,463 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:18:45,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=635620.0, ans=0.0 2024-09-19 09:18:51,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635620.0, ans=0.1 2024-09-19 09:18:55,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=635660.0, ans=0.2 2024-09-19 09:19:09,303 INFO [train.py:1198] (0/2) Epoch 36, batch 550, loss[loss=0.2507, ctc_loss=0.1228, cr_loss=0.3936, attn_decoder_loss=0.2562, over 28775.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1148, cr_loss=0.3542, attn_decoder_loss=0.2395, over 5421610.17 frames. 
], batch size: 104, lr: 3.08e-03, grad_scale: 16.0 2024-09-19 09:19:36,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=635740.0, ans=0.025 2024-09-19 09:19:39,310 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0 2024-09-19 09:20:02,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-09-19 09:20:25,469 INFO [train.py:1198] (0/2) Epoch 36, batch 600, loss[loss=0.2456, ctc_loss=0.1242, cr_loss=0.3747, attn_decoder_loss=0.2508, over 29255.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1151, cr_loss=0.3549, attn_decoder_loss=0.2398, over 5508695.42 frames. ], batch size: 100, lr: 3.08e-03, grad_scale: 8.0 2024-09-19 09:20:27,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=635900.0, ans=0.125 2024-09-19 09:20:30,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=635900.0, ans=0.125 2024-09-19 09:20:30,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=635900.0, ans=0.125 2024-09-19 09:20:30,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.35 vs. 
limit=15.0 2024-09-19 09:20:33,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=635900.0, ans=10.0 2024-09-19 09:20:36,303 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:20:42,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635940.0, ans=0.1 2024-09-19 09:20:50,745 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.519e+01 9.044e+01 9.582e+01 1.949e+02, threshold=1.809e+02, percent-clipped=1.0 2024-09-19 09:20:54,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=635980.0, ans=0.125 2024-09-19 09:20:54,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.63 vs. limit=22.5 2024-09-19 09:21:36,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=636060.0, ans=0.125 2024-09-19 09:21:37,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=636060.0, ans=0.0 2024-09-19 09:21:40,519 INFO [train.py:1198] (0/2) Epoch 36, batch 650, loss[loss=0.2407, ctc_loss=0.1221, cr_loss=0.3797, attn_decoder_loss=0.2454, over 29764.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1143, cr_loss=0.3537, attn_decoder_loss=0.239, over 5586019.14 frames. 
], batch size: 81, lr: 3.08e-03, grad_scale: 8.0 2024-09-19 09:21:49,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636100.0, ans=0.1 2024-09-19 09:22:00,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=636140.0, ans=0.0 2024-09-19 09:22:06,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636140.0, ans=0.1 2024-09-19 09:22:37,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636220.0, ans=0.1 2024-09-19 09:22:49,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2024-09-19 09:22:52,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2024-09-19 09:23:00,741 INFO [train.py:1198] (0/2) Epoch 36, batch 700, loss[loss=0.2307, ctc_loss=0.1208, cr_loss=0.3618, attn_decoder_loss=0.2349, over 29519.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.115, cr_loss=0.3556, attn_decoder_loss=0.2401, over 5635977.64 frames. ], batch size: 76, lr: 3.08e-03, grad_scale: 8.0 2024-09-19 09:23:13,659 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.97 vs. limit=22.5 2024-09-19 09:23:26,424 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.520e+01 8.919e+01 9.430e+01 1.206e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-19 09:23:29,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.91 vs. 
limit=15.0 2024-09-19 09:23:35,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=636380.0, ans=0.125 2024-09-19 09:23:36,498 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.15 vs. limit=10.0 2024-09-19 09:24:04,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=636460.0, ans=0.125 2024-09-19 09:24:16,384 INFO [train.py:1198] (0/2) Epoch 36, batch 750, loss[loss=0.243, ctc_loss=0.1206, cr_loss=0.371, attn_decoder_loss=0.2484, over 29704.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1154, cr_loss=0.3565, attn_decoder_loss=0.2399, over 5673970.56 frames. ], batch size: 82, lr: 3.08e-03, grad_scale: 8.0 2024-09-19 09:24:33,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=636540.0, ans=0.125 2024-09-19 09:24:46,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=636580.0, ans=0.025 2024-09-19 09:25:16,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-09-19 09:25:29,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=636660.0, ans=0.125 2024-09-19 09:25:31,843 INFO [train.py:1198] (0/2) Epoch 36, batch 800, loss[loss=0.2142, ctc_loss=0.09237, cr_loss=0.3157, attn_decoder_loss=0.2207, over 29650.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1156, cr_loss=0.3568, attn_decoder_loss=0.24, over 5704345.64 frames. 
], batch size: 73, lr: 3.08e-03, grad_scale: 16.0 2024-09-19 09:25:38,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=636700.0, ans=0.0 2024-09-19 09:25:55,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.33 vs. limit=15.0 2024-09-19 09:25:57,541 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.447e+01 8.844e+01 9.388e+01 5.453e+02, threshold=1.769e+02, percent-clipped=1.0 2024-09-19 09:26:00,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=636780.0, ans=0.025 2024-09-19 09:26:09,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.67 vs. limit=15.0 2024-09-19 09:26:27,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.72 vs. limit=15.0 2024-09-19 09:26:33,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=636820.0, ans=0.2 2024-09-19 09:26:37,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2024-09-19 09:26:39,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=636860.0, ans=0.125 2024-09-19 09:26:46,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=636860.0, ans=0.125 2024-09-19 09:26:52,628 INFO [train.py:1198] (0/2) Epoch 36, batch 850, loss[loss=0.2288, ctc_loss=0.09834, cr_loss=0.3238, attn_decoder_loss=0.2361, over 29705.00 frames. 
], tot_loss[loss=0.2341, ctc_loss=0.1147, cr_loss=0.3541, attn_decoder_loss=0.2395, over 5733748.95 frames. ], batch size: 89, lr: 3.08e-03, grad_scale: 16.0 2024-09-19 09:26:54,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=636900.0, ans=0.125 2024-09-19 09:27:06,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=636940.0, ans=0.125 2024-09-19 09:27:45,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=637020.0, ans=0.2 2024-09-19 09:27:50,588 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:27:59,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=637060.0, ans=0.125 2024-09-19 09:28:08,134 INFO [train.py:1198] (0/2) Epoch 36, batch 900, loss[loss=0.2105, ctc_loss=0.09503, cr_loss=0.3001, attn_decoder_loss=0.2166, over 29637.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1149, cr_loss=0.3547, attn_decoder_loss=0.2395, over 5739515.63 frames. 
], batch size: 73, lr: 3.08e-03, grad_scale: 16.0 2024-09-19 09:28:17,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=637100.0, ans=0.0 2024-09-19 09:28:28,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=637140.0, ans=0.125 2024-09-19 09:28:35,109 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.598e+01 8.959e+01 9.567e+01 2.745e+02, threshold=1.792e+02, percent-clipped=2.0 2024-09-19 09:28:35,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=637140.0, ans=0.0 2024-09-19 09:28:35,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=637140.0, ans=0.125 2024-09-19 09:28:37,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=22.5 2024-09-19 09:28:41,735 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:28:50,660 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:28:51,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.66 vs. 
limit=15.0 2024-09-19 09:28:55,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=637220.0, ans=0.1 2024-09-19 09:28:59,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=637220.0, ans=0.0 2024-09-19 09:29:07,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=637260.0, ans=0.025 2024-09-19 09:29:11,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.07 vs. limit=6.0 2024-09-19 09:29:23,682 INFO [train.py:1198] (0/2) Epoch 36, batch 950, loss[loss=0.226, ctc_loss=0.1079, cr_loss=0.3393, attn_decoder_loss=0.2316, over 29491.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.115, cr_loss=0.355, attn_decoder_loss=0.2397, over 5742272.70 frames. ], batch size: 74, lr: 3.08e-03, grad_scale: 8.0 2024-09-19 09:29:34,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=637300.0, ans=0.0 2024-09-19 09:29:37,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=637340.0, ans=0.2 2024-09-19 09:29:46,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=637340.0, ans=0.2 2024-09-19 09:29:51,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=637340.0, ans=0.0 2024-09-19 09:30:00,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=637380.0, ans=10.0 2024-09-19 09:30:01,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, 
metric=13.56 vs. limit=22.5 2024-09-19 09:30:39,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=637460.0, ans=0.025 2024-09-19 09:30:40,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=637460.0, ans=0.125 2024-09-19 09:30:41,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-09-19 09:30:43,640 INFO [train.py:1198] (0/2) Epoch 36, batch 1000, loss[loss=0.22, ctc_loss=0.09931, cr_loss=0.3108, attn_decoder_loss=0.2265, over 29478.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.116, cr_loss=0.3573, attn_decoder_loss=0.2408, over 5736021.56 frames. ], batch size: 77, lr: 3.08e-03, grad_scale: 8.0 2024-09-19 09:30:48,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=637500.0, ans=0.0 2024-09-19 09:31:09,042 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0 2024-09-19 09:31:11,033 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 8.580e+01 9.134e+01 9.845e+01 2.020e+02, threshold=1.827e+02, percent-clipped=1.0 2024-09-19 09:31:26,048 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.78 vs. limit=22.5 2024-09-19 09:31:32,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.44 vs. 
limit=22.5 2024-09-19 09:31:37,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=637620.0, ans=0.125 2024-09-19 09:31:51,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.22 vs. limit=22.5 2024-09-19 09:31:57,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=637660.0, ans=0.0 2024-09-19 09:31:59,736 INFO [train.py:1198] (0/2) Epoch 36, batch 1050, loss[loss=0.242, ctc_loss=0.1124, cr_loss=0.3483, attn_decoder_loss=0.2486, over 29662.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1158, cr_loss=0.3566, attn_decoder_loss=0.2404, over 5743386.73 frames. ], batch size: 85, lr: 3.08e-03, grad_scale: 8.0 2024-09-19 09:32:09,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=637700.0, ans=0.125 2024-09-19 09:32:23,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=637740.0, ans=0.0 2024-09-19 09:32:37,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.72 vs. limit=22.5 2024-09-19 09:33:15,954 INFO [train.py:1198] (0/2) Epoch 36, batch 1100, loss[loss=0.2406, ctc_loss=0.1218, cr_loss=0.3646, attn_decoder_loss=0.2457, over 29438.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1161, cr_loss=0.357, attn_decoder_loss=0.2405, over 5755761.42 frames. ], batch size: 78, lr: 3.08e-03, grad_scale: 8.0 2024-09-19 09:33:30,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.33 vs. 
limit=15.0 2024-09-19 09:33:43,104 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.368e+01 8.851e+01 9.380e+01 2.140e+02, threshold=1.770e+02, percent-clipped=1.0 2024-09-19 09:34:00,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=638020.0, ans=0.125 2024-09-19 09:34:33,781 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.87 vs. limit=10.0 2024-09-19 09:34:35,863 INFO [train.py:1198] (0/2) Epoch 36, batch 1150, loss[loss=0.2303, ctc_loss=0.1147, cr_loss=0.3488, attn_decoder_loss=0.2354, over 29466.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1163, cr_loss=0.3571, attn_decoder_loss=0.2405, over 5754964.15 frames. ], batch size: 78, lr: 3.08e-03, grad_scale: 8.0 2024-09-19 09:34:37,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=638100.0, ans=0.0 2024-09-19 09:34:43,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=638100.0, ans=0.125 2024-09-19 09:34:52,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=638140.0, ans=0.125 2024-09-19 09:35:12,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=638180.0, ans=0.125 2024-09-19 09:35:17,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=638180.0, ans=0.125 2024-09-19 09:35:18,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=638180.0, ans=0.05 2024-09-19 09:35:23,285 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638220.0, ans=0.1 2024-09-19 09:35:29,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=638220.0, ans=0.125 2024-09-19 09:35:29,393 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:35:30,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=638220.0, ans=0.125 2024-09-19 09:35:34,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0 2024-09-19 09:35:38,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=638260.0, ans=0.0 2024-09-19 09:35:51,890 INFO [train.py:1198] (0/2) Epoch 36, batch 1200, loss[loss=0.2468, ctc_loss=0.1224, cr_loss=0.3803, attn_decoder_loss=0.2522, over 29655.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1168, cr_loss=0.3582, attn_decoder_loss=0.2413, over 5748141.53 frames. 
], batch size: 85, lr: 3.08e-03, grad_scale: 16.0 2024-09-19 09:35:53,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=638300.0, ans=0.0 2024-09-19 09:36:19,079 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.630e+01 9.163e+01 9.879e+01 2.531e+02, threshold=1.833e+02, percent-clipped=3.0 2024-09-19 09:36:22,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=638380.0, ans=0.125 2024-09-19 09:36:32,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=638380.0, ans=0.125 2024-09-19 09:36:43,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.97 vs. limit=15.0 2024-09-19 09:36:52,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=638460.0, ans=0.1 2024-09-19 09:36:58,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=638460.0, ans=0.07 2024-09-19 09:37:01,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=638460.0, ans=0.09899494936611666 2024-09-19 09:37:08,431 INFO [train.py:1198] (0/2) Epoch 36, batch 1250, loss[loss=0.2503, ctc_loss=0.1309, cr_loss=0.3856, attn_decoder_loss=0.255, over 29542.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.117, cr_loss=0.3584, attn_decoder_loss=0.2416, over 5774372.11 frames. ], batch size: 92, lr: 3.08e-03, grad_scale: 16.0 2024-09-19 09:37:22,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.48 vs. 
limit=15.0 2024-09-19 09:37:43,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=638580.0, ans=0.04949747468305833 2024-09-19 09:38:14,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=638660.0, ans=0.5 2024-09-19 09:38:20,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=638660.0, ans=0.125 2024-09-19 09:38:24,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.32 vs. limit=10.0 2024-09-19 09:38:28,268 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.38 vs. limit=10.0 2024-09-19 09:38:29,054 INFO [train.py:1198] (0/2) Epoch 36, batch 1300, loss[loss=0.2445, ctc_loss=0.1179, cr_loss=0.3656, attn_decoder_loss=0.2505, over 28553.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1166, cr_loss=0.3578, attn_decoder_loss=0.2412, over 5780318.77 frames. ], batch size: 112, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 09:38:32,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.28 vs. 
limit=22.5 2024-09-19 09:38:50,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638740.0, ans=0.1 2024-09-19 09:38:53,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=638740.0, ans=0.0 2024-09-19 09:38:56,429 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.178e+01 8.817e+01 9.661e+01 1.409e+02, threshold=1.763e+02, percent-clipped=0.0 2024-09-19 09:38:58,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638780.0, ans=0.1 2024-09-19 09:39:36,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=638860.0, ans=10.0 2024-09-19 09:39:45,467 INFO [train.py:1198] (0/2) Epoch 36, batch 1350, loss[loss=0.2355, ctc_loss=0.1109, cr_loss=0.3428, attn_decoder_loss=0.2417, over 29743.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1159, cr_loss=0.3565, attn_decoder_loss=0.2405, over 5799090.44 frames. 
], batch size: 81, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 09:39:47,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=638900.0, ans=0.0 2024-09-19 09:40:36,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=639020.0, ans=0.0 2024-09-19 09:40:38,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=639020.0, ans=0.0 2024-09-19 09:40:42,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=639020.0, ans=0.2 2024-09-19 09:41:00,592 INFO [train.py:1198] (0/2) Epoch 36, batch 1400, loss[loss=0.2137, ctc_loss=0.1004, cr_loss=0.3308, attn_decoder_loss=0.2189, over 29581.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1158, cr_loss=0.3562, attn_decoder_loss=0.2403, over 5809210.17 frames. ], batch size: 69, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 09:41:00,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=639100.0, ans=0.125 2024-09-19 09:41:18,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=639140.0, ans=0.0 2024-09-19 09:41:25,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=639140.0, ans=22.5 2024-09-19 09:41:27,786 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.347e+01 8.417e+01 9.024e+01 9.500e+01 1.848e+02, threshold=1.805e+02, percent-clipped=1.0 2024-09-19 09:41:42,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=639180.0, ans=0.2 2024-09-19 09:41:51,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, 
batch_count=639220.0, ans=0.2 2024-09-19 09:42:09,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.94 vs. limit=15.0 2024-09-19 09:42:20,993 INFO [train.py:1198] (0/2) Epoch 36, batch 1450, loss[loss=0.2497, ctc_loss=0.1261, cr_loss=0.3748, attn_decoder_loss=0.2551, over 29446.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1163, cr_loss=0.3575, attn_decoder_loss=0.241, over 5806442.98 frames. ], batch size: 94, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 09:42:50,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=639380.0, ans=0.125 2024-09-19 09:42:54,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=639380.0, ans=0.04949747468305833 2024-09-19 09:43:01,124 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.04 vs. limit=22.5 2024-09-19 09:43:36,466 INFO [train.py:1198] (0/2) Epoch 36, batch 1500, loss[loss=0.2505, ctc_loss=0.1249, cr_loss=0.3872, attn_decoder_loss=0.2559, over 29631.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1162, cr_loss=0.3576, attn_decoder_loss=0.2411, over 5806107.41 frames. ], batch size: 86, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 09:43:47,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=639500.0, ans=0.0 2024-09-19 09:44:02,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.82 vs. 
limit=22.5 2024-09-19 09:44:03,549 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.635e+01 9.112e+01 9.549e+01 2.206e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-19 09:44:05,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=639580.0, ans=0.0 2024-09-19 09:44:14,673 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:44:23,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=639620.0, ans=0.125 2024-09-19 09:44:34,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=639620.0, ans=0.0 2024-09-19 09:44:42,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=639660.0, ans=0.125 2024-09-19 09:44:49,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639660.0, ans=0.1 2024-09-19 09:44:52,260 INFO [train.py:1198] (0/2) Epoch 36, batch 1550, loss[loss=0.254, ctc_loss=0.1323, cr_loss=0.3844, attn_decoder_loss=0.259, over 29509.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1163, cr_loss=0.3578, attn_decoder_loss=0.241, over 5781182.65 frames. ], batch size: 90, lr: 3.07e-03, grad_scale: 8.0 2024-09-19 09:44:53,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. 
limit=6.0 2024-09-19 09:44:57,302 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:45:03,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=639700.0, ans=0.0 2024-09-19 09:45:10,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=639740.0, ans=0.125 2024-09-19 09:45:12,900 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=15.0 2024-09-19 09:45:27,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=639780.0, ans=0.0 2024-09-19 09:45:48,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=639820.0, ans=0.125 2024-09-19 09:45:59,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=639860.0, ans=0.025 2024-09-19 09:46:03,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.76 vs. limit=15.0 2024-09-19 09:46:05,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2024-09-19 09:46:11,765 INFO [train.py:1198] (0/2) Epoch 36, batch 1600, loss[loss=0.2403, ctc_loss=0.114, cr_loss=0.3488, attn_decoder_loss=0.2466, over 29674.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1165, cr_loss=0.3582, attn_decoder_loss=0.2409, over 5763704.37 frames. 
], batch size: 85, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 09:46:13,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=639900.0, ans=0.025 2024-09-19 09:46:34,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=639940.0, ans=0.125 2024-09-19 09:46:42,000 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.623e+01 9.307e+01 9.759e+01 1.491e+02, threshold=1.861e+02, percent-clipped=0.0 2024-09-19 09:46:48,614 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-160000.pt 2024-09-19 09:47:03,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.75 vs. limit=22.5 2024-09-19 09:47:08,222 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:47:11,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-09-19 09:47:28,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2024-09-19 09:47:34,969 INFO [train.py:1198] (0/2) Epoch 36, batch 1650, loss[loss=0.2389, ctc_loss=0.1108, cr_loss=0.3413, attn_decoder_loss=0.2456, over 29707.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1163, cr_loss=0.3579, attn_decoder_loss=0.2406, over 5757218.15 frames. 
], batch size: 89, lr: 3.07e-03, grad_scale: 8.0 2024-09-19 09:47:48,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=640140.0, ans=0.125 2024-09-19 09:47:51,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=640140.0, ans=0.125 2024-09-19 09:48:05,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=640180.0, ans=0.125 2024-09-19 09:48:22,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=640220.0, ans=0.125 2024-09-19 09:48:46,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=640260.0, ans=0.1 2024-09-19 09:48:50,207 INFO [train.py:1198] (0/2) Epoch 36, batch 1700, loss[loss=0.2064, ctc_loss=0.09674, cr_loss=0.3242, attn_decoder_loss=0.2114, over 29585.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.116, cr_loss=0.358, attn_decoder_loss=0.2406, over 5779835.99 frames. 
], batch size: 69, lr: 3.07e-03, grad_scale: 8.0 2024-09-19 09:49:05,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=640340.0, ans=0.025 2024-09-19 09:49:09,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=640340.0, ans=0.1 2024-09-19 09:49:20,275 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.508e+01 8.971e+01 9.480e+01 1.290e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-19 09:49:53,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=640460.0, ans=0.125 2024-09-19 09:50:10,194 INFO [train.py:1198] (0/2) Epoch 36, batch 1750, loss[loss=0.1967, ctc_loss=0.0869, cr_loss=0.3056, attn_decoder_loss=0.2021, over 29397.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1157, cr_loss=0.3574, attn_decoder_loss=0.2403, over 5789130.46 frames. ], batch size: 67, lr: 3.07e-03, grad_scale: 8.0 2024-09-19 09:50:14,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=640500.0, ans=0.2 2024-09-19 09:50:16,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=640500.0, ans=0.1 2024-09-19 09:50:22,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=640500.0, ans=0.0 2024-09-19 09:50:22,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=640500.0, ans=0.0 2024-09-19 09:50:54,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=640620.0, ans=0.125 2024-09-19 09:51:21,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=640660.0, ans=0.1 2024-09-19 
09:51:23,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=640660.0, ans=0.125 2024-09-19 09:51:24,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=640700.0, ans=0.125 2024-09-19 09:51:26,090 INFO [train.py:1198] (0/2) Epoch 36, batch 1800, loss[loss=0.2463, ctc_loss=0.1297, cr_loss=0.3865, attn_decoder_loss=0.2507, over 29696.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1159, cr_loss=0.3574, attn_decoder_loss=0.2406, over 5791152.59 frames. ], batch size: 83, lr: 3.07e-03, grad_scale: 8.0 2024-09-19 09:51:56,469 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.489e+01 9.081e+01 9.519e+01 1.920e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-19 09:52:32,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=640860.0, ans=0.125 2024-09-19 09:52:38,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=640860.0, ans=0.125 2024-09-19 09:52:42,689 INFO [train.py:1198] (0/2) Epoch 36, batch 1850, loss[loss=0.2517, ctc_loss=0.121, cr_loss=0.3643, attn_decoder_loss=0.2581, over 29624.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.116, cr_loss=0.3574, attn_decoder_loss=0.2403, over 5795496.49 frames. ], batch size: 86, lr: 3.07e-03, grad_scale: 8.0 2024-09-19 09:52:50,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. 
limit=12.0 2024-09-19 09:52:54,978 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:53:02,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=640940.0, ans=0.0 2024-09-19 09:53:50,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=641060.0, ans=0.0 2024-09-19 09:53:55,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2024-09-19 09:53:57,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=641060.0, ans=0.2 2024-09-19 09:54:00,523 INFO [train.py:1198] (0/2) Epoch 36, batch 1900, loss[loss=0.2443, ctc_loss=0.1231, cr_loss=0.3744, attn_decoder_loss=0.2495, over 29685.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1163, cr_loss=0.3586, attn_decoder_loss=0.2407, over 5803958.24 frames. ], batch size: 89, lr: 3.07e-03, grad_scale: 8.0 2024-09-19 09:54:09,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=641100.0, ans=0.2 2024-09-19 09:54:27,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=641140.0, ans=0.1 2024-09-19 09:54:33,087 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.812e+01 8.610e+01 8.955e+01 9.499e+01 1.383e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-19 09:54:35,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.23 vs. 
limit=22.5 2024-09-19 09:54:45,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=641180.0, ans=0.0 2024-09-19 09:54:57,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=641220.0, ans=0.07 2024-09-19 09:55:05,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641260.0, ans=0.1 2024-09-19 09:55:18,517 INFO [train.py:1198] (0/2) Epoch 36, batch 1950, loss[loss=0.2276, ctc_loss=0.1133, cr_loss=0.3584, attn_decoder_loss=0.2323, over 29431.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1166, cr_loss=0.3594, attn_decoder_loss=0.2416, over 5818979.46 frames. ], batch size: 78, lr: 3.07e-03, grad_scale: 8.0 2024-09-19 09:55:21,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-09-19 09:55:44,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=641340.0, ans=0.0 2024-09-19 09:56:10,494 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-19 09:56:33,542 INFO [train.py:1198] (0/2) Epoch 36, batch 2000, loss[loss=0.2127, ctc_loss=0.1004, cr_loss=0.3337, attn_decoder_loss=0.2178, over 29371.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1169, cr_loss=0.3596, attn_decoder_loss=0.2419, over 5797113.61 frames. 
], batch size: 67, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 09:56:39,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=641500.0, ans=0.125 2024-09-19 09:56:49,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=641540.0, ans=0.125 2024-09-19 09:56:57,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.03 vs. limit=6.0 2024-09-19 09:57:04,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 8.610e+01 8.991e+01 9.571e+01 3.322e+02, threshold=1.798e+02, percent-clipped=1.0 2024-09-19 09:57:15,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=641580.0, ans=0.0 2024-09-19 09:57:25,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0 2024-09-19 09:57:28,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=641620.0, ans=0.1 2024-09-19 09:57:52,162 INFO [train.py:1198] (0/2) Epoch 36, batch 2050, loss[loss=0.2109, ctc_loss=0.09731, cr_loss=0.3141, attn_decoder_loss=0.2165, over 29431.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1163, cr_loss=0.358, attn_decoder_loss=0.2411, over 5788833.87 frames. 
], batch size: 70, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 09:58:02,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=641700.0, ans=0.1 2024-09-19 09:58:27,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=641780.0, ans=0.125 2024-09-19 09:59:02,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=641860.0, ans=0.025 2024-09-19 09:59:09,445 INFO [train.py:1198] (0/2) Epoch 36, batch 2100, loss[loss=0.2337, ctc_loss=0.1091, cr_loss=0.3474, attn_decoder_loss=0.2398, over 29763.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1156, cr_loss=0.3561, attn_decoder_loss=0.2405, over 5801157.10 frames. ], batch size: 81, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 09:59:24,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=641940.0, ans=0.125 2024-09-19 09:59:26,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.15 vs. limit=15.0 2024-09-19 09:59:33,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=641940.0, ans=0.1 2024-09-19 09:59:39,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.33 vs. 
limit=22.5 2024-09-19 09:59:39,339 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.432e+01 8.828e+01 9.578e+01 1.169e+02, threshold=1.766e+02, percent-clipped=0.0 2024-09-19 09:59:50,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=641980.0, ans=0.2 2024-09-19 10:00:02,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.54 vs. limit=15.0 2024-09-19 10:00:06,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=642020.0, ans=0.125 2024-09-19 10:00:09,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=642060.0, ans=0.2 2024-09-19 10:00:14,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=642060.0, ans=0.0 2024-09-19 10:00:17,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=642060.0, ans=0.0 2024-09-19 10:00:24,377 INFO [train.py:1198] (0/2) Epoch 36, batch 2150, loss[loss=0.2163, ctc_loss=0.1029, cr_loss=0.3239, attn_decoder_loss=0.2217, over 29460.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1152, cr_loss=0.3555, attn_decoder_loss=0.24, over 5815432.22 frames. ], batch size: 78, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 10:00:26,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=642100.0, ans=0.0 2024-09-19 10:00:57,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. 
limit=6.0 2024-09-19 10:00:59,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=642180.0, ans=0.1 2024-09-19 10:01:21,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=642220.0, ans=0.125 2024-09-19 10:01:27,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=642260.0, ans=0.0 2024-09-19 10:01:36,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=642260.0, ans=0.125 2024-09-19 10:01:42,317 INFO [train.py:1198] (0/2) Epoch 36, batch 2200, loss[loss=0.2458, ctc_loss=0.1163, cr_loss=0.3668, attn_decoder_loss=0.2521, over 29623.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1154, cr_loss=0.3564, attn_decoder_loss=0.2402, over 5812149.38 frames. ], batch size: 86, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 10:01:54,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=642300.0, ans=0.125 2024-09-19 10:02:06,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.61 vs. 
limit=12.0 2024-09-19 10:02:10,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=642340.0, ans=0.125 2024-09-19 10:02:11,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=642340.0, ans=0.125 2024-09-19 10:02:14,555 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.688e+01 9.104e+01 9.664e+01 2.107e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-19 10:02:16,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=642380.0, ans=0.0 2024-09-19 10:02:24,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=642380.0, ans=0.025 2024-09-19 10:02:30,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=642420.0, ans=0.125 2024-09-19 10:02:44,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=642460.0, ans=0.2 2024-09-19 10:02:45,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=642460.0, ans=0.125 2024-09-19 10:03:00,208 INFO [train.py:1198] (0/2) Epoch 36, batch 2250, loss[loss=0.2334, ctc_loss=0.1121, cr_loss=0.3437, attn_decoder_loss=0.2392, over 29724.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.115, cr_loss=0.3559, attn_decoder_loss=0.2398, over 5810603.86 frames. ], batch size: 82, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 10:03:00,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642500.0, ans=0.1 2024-09-19 10:03:05,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.70 vs. 
limit=15.0 2024-09-19 10:03:11,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=642500.0, ans=0.125 2024-09-19 10:03:24,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=642540.0, ans=0.05 2024-09-19 10:03:36,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=642580.0, ans=0.2 2024-09-19 10:03:51,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=642620.0, ans=0.0 2024-09-19 10:03:55,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.45 vs. limit=15.0 2024-09-19 10:04:15,306 INFO [train.py:1198] (0/2) Epoch 36, batch 2300, loss[loss=0.2068, ctc_loss=0.09063, cr_loss=0.2979, attn_decoder_loss=0.2131, over 29309.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1145, cr_loss=0.3541, attn_decoder_loss=0.2389, over 5799302.72 frames. 
], batch size: 71, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 10:04:18,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=642700.0, ans=0.0 2024-09-19 10:04:38,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=642740.0, ans=0.2 2024-09-19 10:04:46,893 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.402e+01 9.082e+01 9.464e+01 1.800e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 10:04:56,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=642780.0, ans=0.2 2024-09-19 10:05:07,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=642820.0, ans=0.0 2024-09-19 10:05:22,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=642860.0, ans=0.025 2024-09-19 10:05:29,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=642860.0, ans=0.0 2024-09-19 10:05:33,282 INFO [train.py:1198] (0/2) Epoch 36, batch 2350, loss[loss=0.2434, ctc_loss=0.1205, cr_loss=0.3812, attn_decoder_loss=0.2486, over 29702.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1148, cr_loss=0.3549, attn_decoder_loss=0.2391, over 5804102.85 frames. ], batch size: 83, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 10:05:36,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=642900.0, ans=0.125 2024-09-19 10:05:45,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.86 vs. 
limit=15.0 2024-09-19 10:06:14,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=642980.0, ans=0.0 2024-09-19 10:06:32,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=643020.0, ans=0.0 2024-09-19 10:06:37,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=643060.0, ans=0.0 2024-09-19 10:06:50,477 INFO [train.py:1198] (0/2) Epoch 36, batch 2400, loss[loss=0.2339, ctc_loss=0.1143, cr_loss=0.3586, attn_decoder_loss=0.2393, over 29542.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1152, cr_loss=0.3555, attn_decoder_loss=0.2397, over 5807886.29 frames. ], batch size: 76, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 10:06:55,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=643100.0, ans=0.025 2024-09-19 10:07:17,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.77 vs. 
limit=10.0 2024-09-19 10:07:19,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=643180.0, ans=0.0 2024-09-19 10:07:22,303 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.644e+01 9.234e+01 9.836e+01 2.155e+02, threshold=1.847e+02, percent-clipped=1.0 2024-09-19 10:07:24,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=643180.0, ans=0.125 2024-09-19 10:07:35,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=643220.0, ans=0.0 2024-09-19 10:07:35,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=643220.0, ans=0.025 2024-09-19 10:07:35,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.84 vs. 
limit=15.0
2024-09-19 10:07:39,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=643220.0, ans=0.0
2024-09-19 10:07:41,312 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:07:44,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=643220.0, ans=0.0
2024-09-19 10:07:45,819 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:08:01,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=643260.0, ans=0.1
2024-09-19 10:08:07,055 INFO [train.py:1198] (0/2) Epoch 36, batch 2450, loss[loss=0.2352, ctc_loss=0.1032, cr_loss=0.3382, attn_decoder_loss=0.2424, over 29711.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.116, cr_loss=0.3573, attn_decoder_loss=0.2407, over 5783069.05 frames. ], batch size: 82, lr: 3.06e-03, grad_scale: 16.0
2024-09-19 10:08:07,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=643300.0, ans=0.0
2024-09-19 10:08:10,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=643300.0, ans=0.0
2024-09-19 10:08:10,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=643300.0, ans=0.1
2024-09-19 10:08:14,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=643300.0, ans=0.1
2024-09-19 10:08:17,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=643300.0, ans=0.125
2024-09-19 10:08:31,867 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.75 vs. limit=10.0
2024-09-19 10:08:39,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=643380.0, ans=0.0
2024-09-19 10:08:54,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=643420.0, ans=0.0
2024-09-19 10:09:13,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.90 vs. limit=15.0
2024-09-19 10:09:24,640 INFO [train.py:1198] (0/2) Epoch 36, batch 2500, loss[loss=0.2465, ctc_loss=0.1228, cr_loss=0.383, attn_decoder_loss=0.2518, over 29634.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1161, cr_loss=0.3579, attn_decoder_loss=0.2409, over 5793855.93 frames. ], batch size: 86, lr: 3.06e-03, grad_scale: 16.0
2024-09-19 10:09:29,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=643500.0, ans=0.2
2024-09-19 10:09:58,825 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.618e+01 8.994e+01 9.637e+01 2.222e+02, threshold=1.799e+02, percent-clipped=1.0
2024-09-19 10:10:00,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=643580.0, ans=0.125
2024-09-19 10:10:32,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=643660.0, ans=0.0
2024-09-19 10:10:42,732 INFO [train.py:1198] (0/2) Epoch 36, batch 2550, loss[loss=0.2052, ctc_loss=0.09095, cr_loss=0.2959, attn_decoder_loss=0.2113, over 29334.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1158, cr_loss=0.3577, attn_decoder_loss=0.2407, over 5797498.65 frames. ], batch size: 67, lr: 3.06e-03, grad_scale: 16.0
2024-09-19 10:10:50,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643700.0, ans=0.1
2024-09-19 10:10:53,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643700.0, ans=0.1
2024-09-19 10:11:30,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=643820.0, ans=0.09899494936611666
2024-09-19 10:11:46,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=643860.0, ans=0.2
2024-09-19 10:11:50,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. limit=10.0
2024-09-19 10:11:58,151 INFO [train.py:1198] (0/2) Epoch 36, batch 2600, loss[loss=0.2305, ctc_loss=0.1086, cr_loss=0.341, attn_decoder_loss=0.2364, over 29443.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1158, cr_loss=0.3575, attn_decoder_loss=0.2408, over 5794276.33 frames. ], batch size: 78, lr: 3.06e-03, grad_scale: 16.0
2024-09-19 10:12:09,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643900.0, ans=0.1
2024-09-19 10:12:31,416 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.615e+01 9.147e+01 9.711e+01 1.347e+02, threshold=1.829e+02, percent-clipped=0.0
2024-09-19 10:12:39,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=643980.0, ans=0.0
2024-09-19 10:12:43,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.67 vs. limit=15.0
2024-09-19 10:13:01,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=644060.0, ans=0.125
2024-09-19 10:13:02,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=644060.0, ans=0.125
2024-09-19 10:13:10,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=644060.0, ans=0.2
2024-09-19 10:13:16,044 INFO [train.py:1198] (0/2) Epoch 36, batch 2650, loss[loss=0.2517, ctc_loss=0.1287, cr_loss=0.3823, attn_decoder_loss=0.2568, over 29279.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1161, cr_loss=0.3578, attn_decoder_loss=0.2411, over 5801554.33 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:13:27,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.69 vs. limit=15.0
2024-09-19 10:13:29,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=644140.0, ans=0.025
2024-09-19 10:13:34,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=644140.0, ans=0.125
2024-09-19 10:13:35,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0
2024-09-19 10:14:02,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.91 vs. limit=15.0
2024-09-19 10:14:12,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=644220.0, ans=0.2
2024-09-19 10:14:18,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=644260.0, ans=0.125
2024-09-19 10:14:20,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=644260.0, ans=0.0
2024-09-19 10:14:33,369 INFO [train.py:1198] (0/2) Epoch 36, batch 2700, loss[loss=0.2451, ctc_loss=0.1217, cr_loss=0.3733, attn_decoder_loss=0.2505, over 29541.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1158, cr_loss=0.3575, attn_decoder_loss=0.241, over 5797849.69 frames. ], batch size: 87, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:14:42,988 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0
2024-09-19 10:14:49,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=644340.0, ans=0.2
2024-09-19 10:14:51,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644340.0, ans=0.1
2024-09-19 10:14:55,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=644340.0, ans=0.125
2024-09-19 10:14:57,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=644340.0, ans=0.125
2024-09-19 10:15:05,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.56 vs. limit=22.5
2024-09-19 10:15:06,278 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.558e+01 9.078e+01 9.683e+01 1.491e+02, threshold=1.816e+02, percent-clipped=0.0
2024-09-19 10:15:25,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.82 vs. limit=15.0
2024-09-19 10:15:35,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=644460.0, ans=0.125
2024-09-19 10:15:46,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=644460.0, ans=0.2
2024-09-19 10:15:48,824 INFO [train.py:1198] (0/2) Epoch 36, batch 2750, loss[loss=0.2333, ctc_loss=0.1154, cr_loss=0.3674, attn_decoder_loss=0.2382, over 29515.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1154, cr_loss=0.3561, attn_decoder_loss=0.24, over 5796577.95 frames. ], batch size: 75, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:15:50,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=644500.0, ans=0.125
2024-09-19 10:16:13,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=644540.0, ans=0.125
2024-09-19 10:16:17,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=644580.0, ans=0.125
2024-09-19 10:16:22,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=644580.0, ans=0.125
2024-09-19 10:16:27,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.32 vs. limit=22.5
2024-09-19 10:16:31,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=644580.0, ans=0.125
2024-09-19 10:17:03,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=644660.0, ans=0.2
2024-09-19 10:17:05,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644700.0, ans=0.1
2024-09-19 10:17:06,637 INFO [train.py:1198] (0/2) Epoch 36, batch 2800, loss[loss=0.2534, ctc_loss=0.1357, cr_loss=0.3788, attn_decoder_loss=0.2581, over 20883.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1162, cr_loss=0.3577, attn_decoder_loss=0.2404, over 5777967.23 frames. ], batch size: 209, lr: 3.06e-03, grad_scale: 16.0
2024-09-19 10:17:43,405 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.665e+01 8.452e+01 9.019e+01 9.554e+01 2.850e+02, threshold=1.804e+02, percent-clipped=2.0
2024-09-19 10:17:46,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=644780.0, ans=22.5
2024-09-19 10:18:01,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.43 vs. limit=22.5
2024-09-19 10:18:24,433 INFO [train.py:1198] (0/2) Epoch 36, batch 2850, loss[loss=0.2232, ctc_loss=0.1113, cr_loss=0.3398, attn_decoder_loss=0.2281, over 29513.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1168, cr_loss=0.3583, attn_decoder_loss=0.2409, over 5764221.51 frames. ], batch size: 77, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:18:32,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=644900.0, ans=0.125
2024-09-19 10:18:35,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=644900.0, ans=22.5
2024-09-19 10:18:44,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=644940.0, ans=0.125
2024-09-19 10:18:53,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=644980.0, ans=0.0
2024-09-19 10:19:10,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.31 vs. limit=15.0
2024-09-19 10:19:13,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.08 vs. limit=15.0
2024-09-19 10:19:19,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=645020.0, ans=0.125
2024-09-19 10:19:37,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=645060.0, ans=0.125
2024-09-19 10:19:37,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=645060.0, ans=0.125
2024-09-19 10:19:39,848 INFO [train.py:1198] (0/2) Epoch 36, batch 2900, loss[loss=0.2397, ctc_loss=0.1209, cr_loss=0.3709, attn_decoder_loss=0.2446, over 29442.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1171, cr_loss=0.3596, attn_decoder_loss=0.2419, over 5789511.64 frames. ], batch size: 79, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:20:05,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=645140.0, ans=0.2
2024-09-19 10:20:09,300 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:20:14,868 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 8.534e+01 8.969e+01 9.435e+01 1.794e+02, threshold=1.794e+02, percent-clipped=0.0
2024-09-19 10:20:37,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.78 vs. limit=10.0
2024-09-19 10:20:57,807 INFO [train.py:1198] (0/2) Epoch 36, batch 2950, loss[loss=0.2303, ctc_loss=0.1162, cr_loss=0.3642, attn_decoder_loss=0.2349, over 29523.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1159, cr_loss=0.3567, attn_decoder_loss=0.2407, over 5782911.30 frames. ], batch size: 75, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:21:16,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=645340.0, ans=0.1
2024-09-19 10:21:16,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0
2024-09-19 10:21:25,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=645340.0, ans=0.0
2024-09-19 10:21:58,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=12.0
2024-09-19 10:21:58,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0
2024-09-19 10:22:09,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=645460.0, ans=0.05
2024-09-19 10:22:09,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=645460.0, ans=0.2
2024-09-19 10:22:15,301 INFO [train.py:1198] (0/2) Epoch 36, batch 3000, loss[loss=0.2382, ctc_loss=0.1192, cr_loss=0.363, attn_decoder_loss=0.2433, over 29737.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1156, cr_loss=0.3558, attn_decoder_loss=0.2404, over 5782851.33 frames. ], batch size: 81, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:22:15,302 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 10:22:33,842 INFO [train.py:1230] (0/2) Epoch 36, validation: loss=0.212, ctc_loss=0.03671, cr_loss=5.93e-15, attn_decoder_loss=0.2315, over 944034.00 frames.
2024-09-19 10:22:33,842 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-19 10:22:49,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645540.0, ans=0.1
2024-09-19 10:23:08,438 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.660e+01 9.002e+01 9.609e+01 4.841e+02, threshold=1.800e+02, percent-clipped=1.0
2024-09-19 10:23:10,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=645580.0, ans=0.125
2024-09-19 10:23:18,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=645620.0, ans=0.125
2024-09-19 10:23:27,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=645620.0, ans=0.2
2024-09-19 10:23:27,604 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:23:31,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=645620.0, ans=0.0
2024-09-19 10:23:50,020 INFO [train.py:1198] (0/2) Epoch 36, batch 3050, loss[loss=0.234, ctc_loss=0.1189, cr_loss=0.3513, attn_decoder_loss=0.239, over 29546.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1167, cr_loss=0.3578, attn_decoder_loss=0.2416, over 5777628.60 frames. ], batch size: 76, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:23:56,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0
2024-09-19 10:24:06,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=645740.0, ans=0.125
2024-09-19 10:24:12,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=645740.0, ans=0.125
2024-09-19 10:24:13,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=645740.0, ans=0.125
2024-09-19 10:24:31,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=645780.0, ans=0.025
2024-09-19 10:24:39,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=645820.0, ans=0.0
2024-09-19 10:24:42,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=645820.0, ans=0.125
2024-09-19 10:24:56,802 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.72 vs. limit=15.0
2024-09-19 10:25:03,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=645860.0, ans=0.0
2024-09-19 10:25:06,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=645900.0, ans=0.2
2024-09-19 10:25:07,759 INFO [train.py:1198] (0/2) Epoch 36, batch 3100, loss[loss=0.2439, ctc_loss=0.1174, cr_loss=0.3664, attn_decoder_loss=0.2498, over 29191.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1163, cr_loss=0.3573, attn_decoder_loss=0.2412, over 5776640.73 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:25:17,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=645900.0, ans=0.125
2024-09-19 10:25:31,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=645940.0, ans=0.0
2024-09-19 10:25:44,450 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.450e+01 9.039e+01 9.711e+01 1.761e+02, threshold=1.808e+02, percent-clipped=0.0
2024-09-19 10:25:44,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=645980.0, ans=0.125
2024-09-19 10:25:55,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=646020.0, ans=0.125
2024-09-19 10:26:25,354 INFO [train.py:1198] (0/2) Epoch 36, batch 3150, loss[loss=0.2434, ctc_loss=0.1186, cr_loss=0.3581, attn_decoder_loss=0.2493, over 28760.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1164, cr_loss=0.3578, attn_decoder_loss=0.2413, over 5782601.30 frames. ], batch size: 104, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:26:31,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=646100.0, ans=0.07
2024-09-19 10:26:57,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=646180.0, ans=0.0
2024-09-19 10:27:00,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=646180.0, ans=0.125
2024-09-19 10:27:10,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=646220.0, ans=0.1
2024-09-19 10:27:12,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=646220.0, ans=0.125
2024-09-19 10:27:39,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=646300.0, ans=0.125
2024-09-19 10:27:40,721 INFO [train.py:1198] (0/2) Epoch 36, batch 3200, loss[loss=0.2243, ctc_loss=0.09994, cr_loss=0.3082, attn_decoder_loss=0.2313, over 29429.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.116, cr_loss=0.357, attn_decoder_loss=0.2406, over 5792712.58 frames. ], batch size: 79, lr: 3.06e-03, grad_scale: 16.0
2024-09-19 10:28:06,865 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0
2024-09-19 10:28:11,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.83 vs. limit=15.0
2024-09-19 10:28:17,944 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.478e+01 9.056e+01 9.805e+01 1.899e+02, threshold=1.811e+02, percent-clipped=1.0
2024-09-19 10:28:18,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=646380.0, ans=0.0
2024-09-19 10:28:18,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=646380.0, ans=0.0
2024-09-19 10:28:30,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=646420.0, ans=0.125
2024-09-19 10:28:53,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=15.0
2024-09-19 10:28:55,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.98 vs. limit=15.0
2024-09-19 10:28:59,141 INFO [train.py:1198] (0/2) Epoch 36, batch 3250, loss[loss=0.242, ctc_loss=0.1239, cr_loss=0.3651, attn_decoder_loss=0.247, over 29707.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1158, cr_loss=0.3568, attn_decoder_loss=0.2408, over 5798751.78 frames. ], batch size: 84, lr: 3.06e-03, grad_scale: 16.0
2024-09-19 10:28:59,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=646500.0, ans=0.125
2024-09-19 10:29:00,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=646500.0, ans=0.125
2024-09-19 10:29:16,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=646540.0, ans=0.125
2024-09-19 10:29:27,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=646540.0, ans=0.2
2024-09-19 10:29:27,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=646540.0, ans=0.125
2024-09-19 10:29:46,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=646620.0, ans=0.125
2024-09-19 10:30:03,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0
2024-09-19 10:30:07,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=646660.0, ans=0.0
2024-09-19 10:30:15,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=646700.0, ans=0.125
2024-09-19 10:30:16,478 INFO [train.py:1198] (0/2) Epoch 36, batch 3300, loss[loss=0.2448, ctc_loss=0.1203, cr_loss=0.3618, attn_decoder_loss=0.2506, over 28203.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.115, cr_loss=0.3551, attn_decoder_loss=0.2397, over 5795895.45 frames. ], batch size: 111, lr: 3.06e-03, grad_scale: 16.0
2024-09-19 10:30:22,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=646700.0, ans=0.125
2024-09-19 10:30:44,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=646740.0, ans=0.04949747468305833
2024-09-19 10:30:46,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.28 vs. limit=15.0
2024-09-19 10:30:47,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=646780.0, ans=0.125
2024-09-19 10:30:51,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=12.0
2024-09-19 10:30:52,749 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.565e+01 9.043e+01 9.746e+01 1.474e+02, threshold=1.809e+02, percent-clipped=0.0
2024-09-19 10:30:56,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.54 vs. limit=15.0
2024-09-19 10:31:17,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=646860.0, ans=0.0
2024-09-19 10:31:23,233 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:31:31,688 INFO [train.py:1198] (0/2) Epoch 36, batch 3350, loss[loss=0.2476, ctc_loss=0.1235, cr_loss=0.3808, attn_decoder_loss=0.253, over 28899.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.116, cr_loss=0.3577, attn_decoder_loss=0.2408, over 5772592.36 frames. ], batch size: 104, lr: 3.06e-03, grad_scale: 8.0
2024-09-19 10:31:35,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=646900.0, ans=0.0
2024-09-19 10:31:49,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=646940.0, ans=0.0
2024-09-19 10:31:50,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=646940.0, ans=0.125
2024-09-19 10:32:49,154 INFO [train.py:1198] (0/2) Epoch 36, batch 3400, loss[loss=0.2066, ctc_loss=0.09039, cr_loss=0.3004, attn_decoder_loss=0.2129, over 29359.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1164, cr_loss=0.3587, attn_decoder_loss=0.2409, over 5764852.46 frames. ], batch size: 67, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:33:10,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=647140.0, ans=0.125
2024-09-19 10:33:18,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=647140.0, ans=0.0
2024-09-19 10:33:20,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=647180.0, ans=0.2
2024-09-19 10:33:27,778 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.557e+01 9.096e+01 9.972e+01 2.860e+02, threshold=1.819e+02, percent-clipped=2.0
2024-09-19 10:33:37,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=647220.0, ans=0.0
2024-09-19 10:33:37,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=647220.0, ans=0.2
2024-09-19 10:33:40,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647220.0, ans=0.1
2024-09-19 10:33:55,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=647260.0, ans=0.0
2024-09-19 10:33:58,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=647260.0, ans=0.07
2024-09-19 10:34:01,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=647260.0, ans=0.2
2024-09-19 10:34:03,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=647260.0, ans=0.125
2024-09-19 10:34:07,344 INFO [train.py:1198] (0/2) Epoch 36, batch 3450, loss[loss=0.2313, ctc_loss=0.1097, cr_loss=0.3441, attn_decoder_loss=0.2372, over 28160.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1164, cr_loss=0.3587, attn_decoder_loss=0.2411, over 5773250.50 frames. ], batch size: 111, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:34:10,562 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:34:37,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=647380.0, ans=0.125
2024-09-19 10:35:09,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=647460.0, ans=0.125
2024-09-19 10:35:11,070 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:35:12,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=647460.0, ans=0.125
2024-09-19 10:35:17,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=647460.0, ans=0.1
2024-09-19 10:35:21,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=647500.0, ans=0.125
2024-09-19 10:35:22,928 INFO [train.py:1198] (0/2) Epoch 36, batch 3500, loss[loss=0.2214, ctc_loss=0.103, cr_loss=0.3246, attn_decoder_loss=0.2273, over 29362.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1164, cr_loss=0.3585, attn_decoder_loss=0.2409, over 5775159.53 frames. ], batch size: 71, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:35:37,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=647500.0, ans=0.125
2024-09-19 10:35:39,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=647540.0, ans=0.2
2024-09-19 10:35:40,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=647540.0, ans=0.2
2024-09-19 10:35:44,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=647540.0, ans=0.125
2024-09-19 10:35:49,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=647540.0, ans=0.125
2024-09-19 10:35:52,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=647540.0, ans=0.5
2024-09-19 10:36:00,839 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.443e+01 8.960e+01 9.445e+01 1.390e+02, threshold=1.792e+02, percent-clipped=0.0
2024-09-19 10:36:39,521 INFO [train.py:1198] (0/2) Epoch 36, batch 3550, loss[loss=0.2509, ctc_loss=0.1172, cr_loss=0.3581, attn_decoder_loss=0.2578, over 29720.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1166, cr_loss=0.3591, attn_decoder_loss=0.241, over 5782224.27 frames. ], batch size: 89, lr: 3.05e-03, grad_scale: 8.0
2024-09-19 10:36:50,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=12.0
2024-09-19 10:37:00,325 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:37:09,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=647780.0, ans=0.125
2024-09-19 10:37:13,787 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 10:37:26,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=647820.0, ans=0.125
2024-09-19 10:37:43,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=647860.0, ans=0.125
2024-09-19 10:37:53,734 INFO [train.py:1198] (0/2) Epoch 36, batch 3600, loss[loss=0.2233, ctc_loss=0.1048, cr_loss=0.3486, attn_decoder_loss=0.2287, over 29507.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1159, cr_loss=0.3581, attn_decoder_loss=0.2406, over 5791734.20 frames. ], batch size: 77, lr: 3.05e-03, grad_scale: 16.0
2024-09-19 10:38:06,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=647900.0, ans=0.2
2024-09-19 10:38:12,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647940.0, ans=0.1
2024-09-19 10:38:26,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=12.0
2024-09-19 10:38:30,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647980.0, ans=0.1
2024-09-19 10:38:31,850 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.382e+01 8.950e+01 9.458e+01 2.043e+02, threshold=1.790e+02, percent-clipped=2.0
2024-09-19 10:38:47,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=648020.0, ans=0.5
2024-09-19 10:38:54,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=648060.0, ans=0.0
2024-09-19 10:38:54,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=648060.0, ans=0.0
2024-09-19 10:39:10,853 INFO [train.py:1198] (0/2) Epoch 36, batch 3650, loss[loss=0.2404, ctc_loss=0.1101, cr_loss=0.334, attn_decoder_loss=0.2475, over 29484.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1155, cr_loss=0.3574, attn_decoder_loss=0.2399, over 5794354.37 frames. ], batch size: 90, lr: 3.05e-03, grad_scale: 16.0
2024-09-19 10:39:42,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=648180.0, ans=0.2
2024-09-19 10:40:14,316 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.08 vs.
limit=15.0 2024-09-19 10:40:18,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=648260.0, ans=0.125 2024-09-19 10:40:22,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=648260.0, ans=0.0 2024-09-19 10:40:25,591 INFO [train.py:1198] (0/2) Epoch 36, batch 3700, loss[loss=0.2514, ctc_loss=0.1292, cr_loss=0.3906, attn_decoder_loss=0.2563, over 29708.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1154, cr_loss=0.3573, attn_decoder_loss=0.2402, over 5804680.62 frames. ], batch size: 84, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:41:01,475 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.595e+01 9.070e+01 9.562e+01 1.267e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-19 10:41:12,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=648420.0, ans=0.125 2024-09-19 10:41:15,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=648420.0, ans=0.2 2024-09-19 10:41:34,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=648460.0, ans=0.125 2024-09-19 10:41:40,472 INFO [train.py:1198] (0/2) Epoch 36, batch 3750, loss[loss=0.2111, ctc_loss=0.1037, cr_loss=0.3384, attn_decoder_loss=0.2155, over 29330.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1152, cr_loss=0.3563, attn_decoder_loss=0.24, over 5807277.98 frames. 
], batch size: 67, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:41:54,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=648500.0, ans=0.125 2024-09-19 10:42:10,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=648580.0, ans=0.07 2024-09-19 10:42:31,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=648620.0, ans=0.125 2024-09-19 10:42:52,330 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:42:53,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=648660.0, ans=0.1 2024-09-19 10:42:56,339 INFO [train.py:1198] (0/2) Epoch 36, batch 3800, loss[loss=0.2462, ctc_loss=0.1191, cr_loss=0.3666, attn_decoder_loss=0.2521, over 29615.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1155, cr_loss=0.3566, attn_decoder_loss=0.2399, over 5798033.46 frames. ], batch size: 86, lr: 3.05e-03, grad_scale: 8.0 2024-09-19 10:42:58,587 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.71 vs. limit=22.5 2024-09-19 10:43:18,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. 
limit=15.0 2024-09-19 10:43:34,112 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 8.360e+01 8.859e+01 9.442e+01 1.706e+02, threshold=1.772e+02, percent-clipped=0.0 2024-09-19 10:43:43,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=648820.0, ans=0.125 2024-09-19 10:43:52,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=12.0 2024-09-19 10:43:53,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=648820.0, ans=0.125 2024-09-19 10:44:04,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=648860.0, ans=0.2 2024-09-19 10:44:04,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=648860.0, ans=0.95 2024-09-19 10:44:04,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=648860.0, ans=0.0 2024-09-19 10:44:11,224 INFO [train.py:1198] (0/2) Epoch 36, batch 3850, loss[loss=0.2546, ctc_loss=0.1229, cr_loss=0.3829, attn_decoder_loss=0.2607, over 29250.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1153, cr_loss=0.3557, attn_decoder_loss=0.2398, over 5810835.03 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 8.0 2024-09-19 10:44:46,042 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.55 vs. limit=12.0 2024-09-19 10:44:47,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. 
limit=6.0 2024-09-19 10:44:48,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=648980.0, ans=0.5 2024-09-19 10:45:02,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2024-09-19 10:45:18,643 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.00 vs. limit=15.0 2024-09-19 10:45:24,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0 2024-09-19 10:45:26,936 INFO [train.py:1198] (0/2) Epoch 36, batch 3900, loss[loss=0.2491, ctc_loss=0.1243, cr_loss=0.378, attn_decoder_loss=0.2545, over 29655.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1156, cr_loss=0.356, attn_decoder_loss=0.2402, over 5814835.20 frames. ], batch size: 86, lr: 3.05e-03, grad_scale: 8.0 2024-09-19 10:45:31,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=649100.0, ans=0.2 2024-09-19 10:45:37,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649100.0, ans=0.1 2024-09-19 10:45:45,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.42 vs. 
limit=15.0 2024-09-19 10:45:52,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=649140.0, ans=0.0 2024-09-19 10:45:58,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649180.0, ans=0.1 2024-09-19 10:46:03,830 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.759e+01 8.587e+01 8.995e+01 9.649e+01 1.195e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-19 10:46:04,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=649180.0, ans=0.025 2024-09-19 10:46:05,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=649180.0, ans=0.0 2024-09-19 10:46:16,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.50 vs. limit=15.0 2024-09-19 10:46:26,344 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:46:40,852 INFO [train.py:1198] (0/2) Epoch 36, batch 3950, loss[loss=0.2427, ctc_loss=0.1242, cr_loss=0.3876, attn_decoder_loss=0.2472, over 29508.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1158, cr_loss=0.357, attn_decoder_loss=0.2405, over 5834476.75 frames. 
], batch size: 97, lr: 3.05e-03, grad_scale: 8.0 2024-09-19 10:46:41,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=649300.0, ans=0.125 2024-09-19 10:46:47,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=649300.0, ans=0.0 2024-09-19 10:46:57,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=649340.0, ans=0.125 2024-09-19 10:47:04,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=649340.0, ans=0.125 2024-09-19 10:47:05,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.66 vs. limit=22.5 2024-09-19 10:47:12,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=649380.0, ans=0.2 2024-09-19 10:47:21,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=649380.0, ans=0.5 2024-09-19 10:47:24,638 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.08 vs. 
limit=15.0 2024-09-19 10:47:41,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=649460.0, ans=0.2 2024-09-19 10:47:44,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649460.0, ans=0.1 2024-09-19 10:47:52,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=649460.0, ans=0.125 2024-09-19 10:47:56,080 INFO [train.py:1198] (0/2) Epoch 36, batch 4000, loss[loss=0.23, ctc_loss=0.1139, cr_loss=0.3558, attn_decoder_loss=0.235, over 29507.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.116, cr_loss=0.3569, attn_decoder_loss=0.2404, over 5811264.36 frames. ], batch size: 74, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:48:00,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=649500.0, ans=0.125 2024-09-19 10:48:05,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2024-09-19 10:48:18,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. 
limit=15.0 2024-09-19 10:48:33,457 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.599e+01 9.136e+01 9.707e+01 2.354e+02, threshold=1.827e+02, percent-clipped=2.0 2024-09-19 10:48:51,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=649620.0, ans=0.0 2024-09-19 10:49:02,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=649660.0, ans=0.0 2024-09-19 10:49:02,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=649660.0, ans=0.0 2024-09-19 10:49:10,626 INFO [train.py:1198] (0/2) Epoch 36, batch 4050, loss[loss=0.2526, ctc_loss=0.1412, cr_loss=0.3906, attn_decoder_loss=0.2563, over 20869.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1158, cr_loss=0.3565, attn_decoder_loss=0.2401, over 5795944.45 frames. ], batch size: 210, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:49:41,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=649780.0, ans=0.0 2024-09-19 10:50:04,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=649820.0, ans=0.1 2024-09-19 10:50:13,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=649860.0, ans=0.125 2024-09-19 10:50:15,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=649860.0, ans=0.125 2024-09-19 10:50:23,508 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.48 vs. 
limit=15.0 2024-09-19 10:50:25,522 INFO [train.py:1198] (0/2) Epoch 36, batch 4100, loss[loss=0.2477, ctc_loss=0.1324, cr_loss=0.4053, attn_decoder_loss=0.2515, over 29492.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1163, cr_loss=0.3569, attn_decoder_loss=0.2405, over 5791150.08 frames. ], batch size: 90, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:50:36,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=649900.0, ans=0.04949747468305833 2024-09-19 10:50:53,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=649980.0, ans=0.0 2024-09-19 10:50:54,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=649980.0, ans=0.0 2024-09-19 10:51:01,956 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.541e+01 9.319e+01 9.811e+01 6.662e+02, threshold=1.864e+02, percent-clipped=1.0 2024-09-19 10:51:06,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=649980.0, ans=0.09899494936611666 2024-09-19 10:51:12,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=650020.0, ans=0.2 2024-09-19 10:51:39,875 INFO [train.py:1198] (0/2) Epoch 36, batch 4150, loss[loss=0.2251, ctc_loss=0.1157, cr_loss=0.3529, attn_decoder_loss=0.2294, over 29478.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1163, cr_loss=0.3572, attn_decoder_loss=0.2403, over 5796294.09 frames. 
], batch size: 77, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:51:44,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=650100.0, ans=0.0 2024-09-19 10:51:58,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-09-19 10:52:06,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=650140.0, ans=0.05 2024-09-19 10:52:08,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=650180.0, ans=0.0 2024-09-19 10:52:21,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=650180.0, ans=0.0 2024-09-19 10:52:23,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.31 vs. limit=15.0 2024-09-19 10:52:27,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=650220.0, ans=0.1 2024-09-19 10:52:50,215 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.97 vs. limit=12.0 2024-09-19 10:52:53,752 INFO [train.py:1198] (0/2) Epoch 36, batch 4200, loss[loss=0.25, ctc_loss=0.1279, cr_loss=0.3804, attn_decoder_loss=0.2551, over 29498.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1162, cr_loss=0.357, attn_decoder_loss=0.2405, over 5797725.52 frames. 
], batch size: 90, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:53:06,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=650300.0, ans=0.1 2024-09-19 10:53:16,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=650340.0, ans=0.125 2024-09-19 10:53:31,779 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.662e+01 9.257e+01 9.687e+01 2.927e+02, threshold=1.851e+02, percent-clipped=1.0 2024-09-19 10:53:47,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.56 vs. limit=15.0 2024-09-19 10:53:58,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=650460.0, ans=0.0 2024-09-19 10:53:59,123 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.30 vs. limit=22.5 2024-09-19 10:54:01,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=650460.0, ans=0.125 2024-09-19 10:54:08,378 INFO [train.py:1198] (0/2) Epoch 36, batch 4250, loss[loss=0.2205, ctc_loss=0.1073, cr_loss=0.3293, attn_decoder_loss=0.2258, over 29536.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1153, cr_loss=0.3554, attn_decoder_loss=0.2402, over 5804306.45 frames. 
], batch size: 74, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:54:12,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=650500.0, ans=0.2 2024-09-19 10:54:21,814 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:54:30,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=650540.0, ans=0.125 2024-09-19 10:55:00,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=650620.0, ans=0.125 2024-09-19 10:55:02,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=650620.0, ans=0.125 2024-09-19 10:55:02,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=650620.0, ans=0.0 2024-09-19 10:55:19,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=650660.0, ans=0.125 2024-09-19 10:55:22,697 INFO [train.py:1198] (0/2) Epoch 36, batch 4300, loss[loss=0.2343, ctc_loss=0.1137, cr_loss=0.3545, attn_decoder_loss=0.2398, over 29567.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1151, cr_loss=0.3549, attn_decoder_loss=0.2404, over 5794053.28 frames. ], batch size: 87, lr: 3.05e-03, grad_scale: 8.0 2024-09-19 10:55:52,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.70 vs. 
limit=15.0 2024-09-19 10:55:58,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=650780.0, ans=0.125 2024-09-19 10:56:01,045 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.651e+01 9.063e+01 9.682e+01 5.777e+02, threshold=1.813e+02, percent-clipped=1.0 2024-09-19 10:56:29,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=650860.0, ans=0.2 2024-09-19 10:56:36,544 INFO [train.py:1198] (0/2) Epoch 36, batch 4350, loss[loss=0.242, ctc_loss=0.122, cr_loss=0.3725, attn_decoder_loss=0.2471, over 29516.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1176, cr_loss=0.361, attn_decoder_loss=0.2434, over 5797119.86 frames. ], batch size: 97, lr: 3.05e-03, grad_scale: 8.0 2024-09-19 10:56:39,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=650900.0, ans=0.125 2024-09-19 10:56:44,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=650900.0, ans=0.1 2024-09-19 10:56:56,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=650940.0, ans=0.125 2024-09-19 10:57:33,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=651020.0, ans=0.125 2024-09-19 10:57:51,173 INFO [train.py:1198] (0/2) Epoch 36, batch 4400, loss[loss=0.2419, ctc_loss=0.1234, cr_loss=0.3699, attn_decoder_loss=0.2469, over 27603.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.119, cr_loss=0.3638, attn_decoder_loss=0.2455, over 5769485.13 frames. 
], batch size: 125, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 10:57:56,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651100.0, ans=0.1 2024-09-19 10:58:09,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=651140.0, ans=15.0 2024-09-19 10:58:22,961 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2024-09-19 10:58:23,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=651180.0, ans=0.2 2024-09-19 10:58:29,468 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.945e+01 9.277e+01 9.704e+01 3.205e+02, threshold=1.855e+02, percent-clipped=1.0 2024-09-19 10:58:33,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0 2024-09-19 10:58:34,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651220.0, ans=0.1 2024-09-19 10:58:50,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=651260.0, ans=0.035 2024-09-19 10:59:05,956 INFO [train.py:1198] (0/2) Epoch 36, batch 4450, loss[loss=0.2595, ctc_loss=0.1477, cr_loss=0.3898, attn_decoder_loss=0.2632, over 20126.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1224, cr_loss=0.3687, attn_decoder_loss=0.2476, over 5583111.17 frames. 
], batch size: 209, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 10:59:10,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=651300.0, ans=0.2 2024-09-19 10:59:13,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=651300.0, ans=0.125 2024-09-19 10:59:19,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=651340.0, ans=0.07 2024-09-19 10:59:40,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=651380.0, ans=0.0 2024-09-19 10:59:53,840 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.68 vs. limit=15.0 2024-09-19 11:00:18,826 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:00:21,397 INFO [train.py:1198] (0/2) Epoch 36, batch 4500, loss[loss=0.2485, ctc_loss=0.1334, cr_loss=0.3379, attn_decoder_loss=0.2538, over 20068.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1258, cr_loss=0.3714, attn_decoder_loss=0.2495, over 5239238.40 frames. ], batch size: 209, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 11:00:23,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=651500.0, ans=0.125 2024-09-19 11:00:59,344 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-36.pt 2024-09-19 11:01:50,756 INFO [train.py:1198] (0/2) Epoch 37, batch 0, loss[loss=0.2221, ctc_loss=0.1068, cr_loss=0.3496, attn_decoder_loss=0.2272, over 29610.00 frames. 
], tot_loss[loss=0.2221, ctc_loss=0.1068, cr_loss=0.3496, attn_decoder_loss=0.2272, over 29610.00 frames. ], batch size: 73, lr: 3.00e-03, grad_scale: 16.0 2024-09-19 11:01:50,756 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 11:02:09,657 INFO [train.py:1230] (0/2) Epoch 37, validation: loss=0.2132, ctc_loss=0.03619, cr_loss=6.181e-15, attn_decoder_loss=0.2329, over 944034.00 frames. 2024-09-19 11:02:09,657 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 11:02:12,628 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.774e+01 1.049e+02 1.138e+02 1.230e+02 2.136e+02, threshold=2.276e+02, percent-clipped=1.0 2024-09-19 11:02:12,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=651600.0, ans=0.0 2024-09-19 11:02:21,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=651600.0, ans=0.125 2024-09-19 11:02:22,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=651600.0, ans=0.0 2024-09-19 11:02:51,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=651680.0, ans=0.125 2024-09-19 11:02:53,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=651680.0, ans=0.0 2024-09-19 11:02:54,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=651720.0, ans=0.2 2024-09-19 11:02:59,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=12.0 2024-09-19 11:03:07,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.67 vs. 
limit=12.0 2024-09-19 11:03:08,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=651720.0, ans=0.125 2024-09-19 11:03:08,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=651720.0, ans=0.0 2024-09-19 11:03:15,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=651760.0, ans=0.5 2024-09-19 11:03:25,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.27 vs. limit=15.0 2024-09-19 11:03:26,220 INFO [train.py:1198] (0/2) Epoch 37, batch 50, loss[loss=0.2151, ctc_loss=0.103, cr_loss=0.3234, attn_decoder_loss=0.2204, over 29465.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1176, cr_loss=0.3617, attn_decoder_loss=0.2416, over 1267905.63 frames. ], batch size: 70, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:03:29,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=651800.0, ans=0.0 2024-09-19 11:03:39,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.64 vs. limit=10.0 2024-09-19 11:03:44,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=651840.0, ans=0.125 2024-09-19 11:03:48,743 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. 
limit=8.0 2024-09-19 11:04:04,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=651880.0, ans=0.125 2024-09-19 11:04:19,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2024-09-19 11:04:42,598 INFO [train.py:1198] (0/2) Epoch 37, batch 100, loss[loss=0.2241, ctc_loss=0.1088, cr_loss=0.3453, attn_decoder_loss=0.2292, over 29519.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1186, cr_loss=0.3633, attn_decoder_loss=0.2437, over 2251832.60 frames. ], batch size: 76, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:04:46,987 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 8.722e+01 9.272e+01 9.995e+01 2.422e+02, threshold=1.854e+02, percent-clipped=1.0 2024-09-19 11:04:54,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=652000.0, ans=0.09899494936611666 2024-09-19 11:05:02,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=652040.0, ans=0.125 2024-09-19 11:05:10,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2024-09-19 11:05:11,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=652080.0, ans=0.2 2024-09-19 11:05:26,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=652120.0, ans=0.125 2024-09-19 11:05:29,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.24 vs. 
limit=15.0 2024-09-19 11:05:57,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=652200.0, ans=0.0 2024-09-19 11:05:59,137 INFO [train.py:1198] (0/2) Epoch 37, batch 150, loss[loss=0.2018, ctc_loss=0.09433, cr_loss=0.3092, attn_decoder_loss=0.2069, over 29460.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1169, cr_loss=0.3598, attn_decoder_loss=0.2415, over 3046907.45 frames. ], batch size: 70, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:06:29,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.34 vs. limit=22.5 2024-09-19 11:06:48,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=652320.0, ans=0.125 2024-09-19 11:07:08,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.11 vs. limit=10.0 2024-09-19 11:07:16,349 INFO [train.py:1198] (0/2) Epoch 37, batch 200, loss[loss=0.2504, ctc_loss=0.1329, cr_loss=0.3902, attn_decoder_loss=0.2547, over 27305.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1159, cr_loss=0.3581, attn_decoder_loss=0.2403, over 3659653.17 frames. 
], batch size: 124, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:07:20,812 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.412e+01 8.881e+01 9.450e+01 8.334e+02, threshold=1.776e+02, percent-clipped=1.0 2024-09-19 11:07:25,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=652400.0, ans=0.0 2024-09-19 11:07:25,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=652400.0, ans=0.2 2024-09-19 11:07:33,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=652440.0, ans=0.125 2024-09-19 11:07:34,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=652440.0, ans=0.0 2024-09-19 11:07:49,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=652480.0, ans=0.025 2024-09-19 11:07:54,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=652480.0, ans=0.125 2024-09-19 11:08:04,801 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:08:12,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=652520.0, ans=0.0 2024-09-19 11:08:31,987 INFO [train.py:1198] (0/2) Epoch 37, batch 250, loss[loss=0.2477, ctc_loss=0.1242, cr_loss=0.3734, attn_decoder_loss=0.2531, over 29235.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1152, cr_loss=0.3568, attn_decoder_loss=0.24, over 4141908.43 frames. 
], batch size: 100, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:08:35,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=652600.0, ans=0.125 2024-09-19 11:08:49,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=652640.0, ans=0.125 2024-09-19 11:08:53,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=652640.0, ans=0.1 2024-09-19 11:09:01,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=652680.0, ans=0.0 2024-09-19 11:09:13,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=652680.0, ans=0.2 2024-09-19 11:09:16,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=652720.0, ans=0.2 2024-09-19 11:09:39,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=652760.0, ans=0.2 2024-09-19 11:09:50,042 INFO [train.py:1198] (0/2) Epoch 37, batch 300, loss[loss=0.2621, ctc_loss=0.1394, cr_loss=0.4197, attn_decoder_loss=0.2664, over 29575.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1152, cr_loss=0.3567, attn_decoder_loss=0.2396, over 4510415.68 frames. 
], batch size: 92, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:09:54,608 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.309e+01 8.922e+01 9.556e+01 2.479e+02, threshold=1.784e+02, percent-clipped=1.0 2024-09-19 11:09:59,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=652800.0, ans=0.125 2024-09-19 11:10:07,042 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.61 vs. limit=6.0 2024-09-19 11:10:13,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=652840.0, ans=0.0 2024-09-19 11:10:27,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=652880.0, ans=0.0 2024-09-19 11:10:29,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.08 vs. limit=15.0 2024-09-19 11:10:47,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=652920.0, ans=0.2 2024-09-19 11:10:47,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=652920.0, ans=0.0 2024-09-19 11:11:00,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=652960.0, ans=0.125 2024-09-19 11:11:03,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=652960.0, ans=0.125 2024-09-19 11:11:08,035 INFO [train.py:1198] (0/2) Epoch 37, batch 350, loss[loss=0.2117, ctc_loss=0.1017, cr_loss=0.3278, attn_decoder_loss=0.2166, over 29333.00 frames. 
], tot_loss[loss=0.2348, ctc_loss=0.1157, cr_loss=0.3577, attn_decoder_loss=0.2401, over 4795216.70 frames. ], batch size: 71, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:11:34,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=653040.0, ans=0.1 2024-09-19 11:11:34,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=653040.0, ans=0.125 2024-09-19 11:11:42,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=653080.0, ans=0.0 2024-09-19 11:11:52,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=653120.0, ans=0.0 2024-09-19 11:12:07,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=653160.0, ans=0.04949747468305833 2024-09-19 11:12:14,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=653160.0, ans=0.1 2024-09-19 11:12:23,525 INFO [train.py:1198] (0/2) Epoch 37, batch 400, loss[loss=0.24, ctc_loss=0.1179, cr_loss=0.385, attn_decoder_loss=0.245, over 29683.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1162, cr_loss=0.3582, attn_decoder_loss=0.2404, over 5025642.26 frames. 
], batch size: 82, lr: 3.00e-03, grad_scale: 16.0 2024-09-19 11:12:28,152 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.454e+01 8.886e+01 9.286e+01 1.359e+02, threshold=1.777e+02, percent-clipped=0.0 2024-09-19 11:12:42,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=653240.0, ans=0.125 2024-09-19 11:12:50,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.31 vs. limit=22.5 2024-09-19 11:12:54,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=653280.0, ans=0.125 2024-09-19 11:13:00,572 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:13:07,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=653320.0, ans=0.07 2024-09-19 11:13:08,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.69 vs. limit=22.5 2024-09-19 11:13:13,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=653320.0, ans=0.125 2024-09-19 11:13:41,659 INFO [train.py:1198] (0/2) Epoch 37, batch 450, loss[loss=0.2445, ctc_loss=0.1217, cr_loss=0.3708, attn_decoder_loss=0.2499, over 29716.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1163, cr_loss=0.3587, attn_decoder_loss=0.2408, over 5186674.24 frames. ], batch size: 83, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:14:00,232 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.28 vs. 
limit=22.5 2024-09-19 11:14:06,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=653440.0, ans=0.0 2024-09-19 11:14:08,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=653440.0, ans=0.0 2024-09-19 11:14:09,851 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:14:15,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=653480.0, ans=0.035 2024-09-19 11:14:17,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.66 vs. limit=15.0 2024-09-19 11:14:27,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.82 vs. limit=12.0 2024-09-19 11:14:35,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=653520.0, ans=0.2 2024-09-19 11:14:35,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=653520.0, ans=0.125 2024-09-19 11:15:00,247 INFO [train.py:1198] (0/2) Epoch 37, batch 500, loss[loss=0.2557, ctc_loss=0.1317, cr_loss=0.3914, attn_decoder_loss=0.2607, over 29487.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1158, cr_loss=0.3577, attn_decoder_loss=0.2402, over 5328965.95 frames. 
], batch size: 94, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:15:00,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=653600.0, ans=0.0 2024-09-19 11:15:06,232 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.426e+01 9.049e+01 9.525e+01 1.733e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-19 11:15:07,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.39 vs. limit=15.0 2024-09-19 11:15:18,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=653640.0, ans=0.0 2024-09-19 11:15:23,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=653640.0, ans=0.125 2024-09-19 11:15:27,865 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-09-19 11:16:02,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=653760.0, ans=0.125 2024-09-19 11:16:07,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=653760.0, ans=0.125 2024-09-19 11:16:10,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=653760.0, ans=0.1 2024-09-19 11:16:14,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=653800.0, ans=0.95 2024-09-19 11:16:15,821 INFO [train.py:1198] (0/2) Epoch 37, batch 550, loss[loss=0.2437, ctc_loss=0.1281, cr_loss=0.382, attn_decoder_loss=0.2481, over 28806.00 frames. 
], tot_loss[loss=0.235, ctc_loss=0.116, cr_loss=0.358, attn_decoder_loss=0.2403, over 5423112.18 frames. ], batch size: 104, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:16:28,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=653800.0, ans=0.1 2024-09-19 11:16:29,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=653840.0, ans=0.0 2024-09-19 11:16:34,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=653840.0, ans=0.2 2024-09-19 11:16:48,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.25 vs. limit=22.5 2024-09-19 11:16:48,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.23 vs. limit=22.5 2024-09-19 11:16:50,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=653880.0, ans=0.0 2024-09-19 11:17:04,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=653920.0, ans=0.1 2024-09-19 11:17:06,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=653920.0, ans=0.0 2024-09-19 11:17:18,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=653960.0, ans=0.0 2024-09-19 11:17:25,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.27 vs. 
limit=15.0 2024-09-19 11:17:27,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=653960.0, ans=0.5 2024-09-19 11:17:29,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=653960.0, ans=0.125 2024-09-19 11:17:31,926 INFO [train.py:1198] (0/2) Epoch 37, batch 600, loss[loss=0.2515, ctc_loss=0.1266, cr_loss=0.365, attn_decoder_loss=0.2573, over 29291.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.116, cr_loss=0.3577, attn_decoder_loss=0.2404, over 5510549.40 frames. ], batch size: 100, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:17:32,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=654000.0, ans=0.125 2024-09-19 11:17:38,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=654000.0, ans=0.125 2024-09-19 11:17:40,201 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 8.494e+01 8.998e+01 9.681e+01 2.744e+02, threshold=1.800e+02, percent-clipped=3.0 2024-09-19 11:17:54,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654040.0, ans=0.1 2024-09-19 11:17:59,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=654040.0, ans=0.125 2024-09-19 11:18:02,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=654040.0, ans=0.125 2024-09-19 11:18:12,115 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.24 vs. 
limit=15.0 2024-09-19 11:18:17,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=654080.0, ans=10.0 2024-09-19 11:18:20,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=654120.0, ans=0.0 2024-09-19 11:18:51,781 INFO [train.py:1198] (0/2) Epoch 37, batch 650, loss[loss=0.2281, ctc_loss=0.1059, cr_loss=0.3265, attn_decoder_loss=0.2344, over 29758.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1149, cr_loss=0.3554, attn_decoder_loss=0.2397, over 5587828.02 frames. ], batch size: 81, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:19:21,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=654280.0, ans=0.125 2024-09-19 11:19:49,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=654320.0, ans=0.125 2024-09-19 11:19:58,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654360.0, ans=0.1 2024-09-19 11:20:07,721 INFO [train.py:1198] (0/2) Epoch 37, batch 700, loss[loss=0.2347, ctc_loss=0.1182, cr_loss=0.3573, attn_decoder_loss=0.2397, over 29548.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1153, cr_loss=0.3561, attn_decoder_loss=0.2404, over 5639990.92 frames. ], batch size: 76, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:20:11,365 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. 
limit=6.0 2024-09-19 11:20:13,643 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.549e+01 8.958e+01 9.415e+01 1.725e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-19 11:20:54,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=654520.0, ans=0.0 2024-09-19 11:20:59,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-09-19 11:21:00,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=654520.0, ans=0.2 2024-09-19 11:21:22,120 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:21:23,289 INFO [train.py:1198] (0/2) Epoch 37, batch 750, loss[loss=0.2403, ctc_loss=0.1113, cr_loss=0.3465, attn_decoder_loss=0.247, over 29710.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1149, cr_loss=0.3555, attn_decoder_loss=0.2401, over 5676686.06 frames. ], batch size: 82, lr: 3.00e-03, grad_scale: 8.0 2024-09-19 11:21:23,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=654600.0, ans=0.0 2024-09-19 11:22:27,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=654760.0, ans=0.0 2024-09-19 11:22:39,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=654760.0, ans=0.07 2024-09-19 11:22:40,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=654760.0, ans=0.125 2024-09-19 11:22:43,727 INFO [train.py:1198] (0/2) Epoch 37, batch 800, loss[loss=0.2199, ctc_loss=0.09739, cr_loss=0.3098, attn_decoder_loss=0.2266, over 29621.00 frames. 
], tot_loss[loss=0.2347, ctc_loss=0.1149, cr_loss=0.3553, attn_decoder_loss=0.2401, over 5708114.80 frames. ], batch size: 73, lr: 2.99e-03, grad_scale: 16.0 2024-09-19 11:22:47,071 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:22:49,768 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.523e+01 9.017e+01 9.581e+01 2.303e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-19 11:22:56,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.93 vs. limit=15.0 2024-09-19 11:23:26,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.57 vs. limit=15.0 2024-09-19 11:23:58,746 INFO [train.py:1198] (0/2) Epoch 37, batch 850, loss[loss=0.2389, ctc_loss=0.1103, cr_loss=0.3579, attn_decoder_loss=0.2452, over 29733.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1141, cr_loss=0.3538, attn_decoder_loss=0.2394, over 5737190.37 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:24:45,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=655120.0, ans=0.1 2024-09-19 11:24:47,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=655120.0, ans=0.0 2024-09-19 11:25:15,283 INFO [train.py:1198] (0/2) Epoch 37, batch 900, loss[loss=0.2099, ctc_loss=0.09156, cr_loss=0.2977, attn_decoder_loss=0.2164, over 29630.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1145, cr_loss=0.3542, attn_decoder_loss=0.2395, over 5741406.26 frames. 
], batch size: 73, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:25:16,549 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.45 vs. limit=22.5 2024-09-19 11:25:20,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=655200.0, ans=0.125 2024-09-19 11:25:22,644 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.623e+01 9.305e+01 9.762e+01 2.031e+02, threshold=1.861e+02, percent-clipped=1.0 2024-09-19 11:25:29,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=655240.0, ans=0.125 2024-09-19 11:25:33,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=655240.0, ans=0.0 2024-09-19 11:25:49,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=655280.0, ans=0.0 2024-09-19 11:26:08,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.50 vs. 
limit=15.0 2024-09-19 11:26:11,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=655320.0, ans=0.2 2024-09-19 11:26:17,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=655320.0, ans=0.02 2024-09-19 11:26:20,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=655360.0, ans=0.1 2024-09-19 11:26:29,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=655360.0, ans=0.05 2024-09-19 11:26:34,858 INFO [train.py:1198] (0/2) Epoch 37, batch 950, loss[loss=0.2209, ctc_loss=0.1026, cr_loss=0.3269, attn_decoder_loss=0.2267, over 29505.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1149, cr_loss=0.355, attn_decoder_loss=0.2397, over 5742394.43 frames. ], batch size: 74, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:27:22,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=655520.0, ans=0.1 2024-09-19 11:27:22,254 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:27:29,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=655520.0, ans=0.0 2024-09-19 11:27:32,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=655520.0, ans=0.0 2024-09-19 11:27:50,303 INFO [train.py:1198] (0/2) Epoch 37, batch 1000, loss[loss=0.2257, ctc_loss=0.1073, cr_loss=0.3537, attn_decoder_loss=0.231, over 29512.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1155, cr_loss=0.3566, attn_decoder_loss=0.2402, over 5736013.86 frames. 
], batch size: 77, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:27:51,427 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2024-09-19 11:27:52,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=655600.0, ans=0.0 2024-09-19 11:27:53,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=655600.0, ans=0.2 2024-09-19 11:27:57,723 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.810e+01 9.265e+01 9.999e+01 4.241e+02, threshold=1.853e+02, percent-clipped=2.0 2024-09-19 11:28:05,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=655640.0, ans=0.1 2024-09-19 11:28:49,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=655760.0, ans=0.04949747468305833 2024-09-19 11:28:52,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=655760.0, ans=0.125 2024-09-19 11:28:53,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-09-19 11:29:06,009 INFO [train.py:1198] (0/2) Epoch 37, batch 1050, loss[loss=0.242, ctc_loss=0.1116, cr_loss=0.3544, attn_decoder_loss=0.2486, over 29699.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.115, cr_loss=0.3559, attn_decoder_loss=0.2396, over 5742996.70 frames. ], batch size: 85, lr: 2.99e-03, grad_scale: 8.0 2024-09-19 11:29:11,327 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.10 vs. 
limit=6.0 2024-09-19 11:29:12,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=655800.0, ans=0.125 2024-09-19 11:29:17,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=22.5 2024-09-19 11:29:23,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=655840.0, ans=0.125 2024-09-19 11:29:30,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=655840.0, ans=0.2 2024-09-19 11:29:34,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.73 vs. limit=22.5 2024-09-19 11:29:42,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=655880.0, ans=0.0 2024-09-19 11:29:47,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=655880.0, ans=0.125 2024-09-19 11:29:53,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=655880.0, ans=0.2 2024-09-19 11:30:25,478 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-164000.pt 2024-09-19 11:30:33,900 INFO [train.py:1198] (0/2) Epoch 37, batch 1100, loss[loss=0.2279, ctc_loss=0.1121, cr_loss=0.3455, attn_decoder_loss=0.2331, over 29435.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1148, cr_loss=0.3555, attn_decoder_loss=0.2394, over 5756406.62 frames. 
], batch size: 78, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:30:41,293 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.502e+01 8.949e+01 9.455e+01 1.229e+02, threshold=1.790e+02, percent-clipped=0.0
2024-09-19 11:30:41,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.19 vs. limit=22.5
2024-09-19 11:30:55,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.14 vs. limit=15.0
2024-09-19 11:31:05,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=656080.0, ans=0.025
2024-09-19 11:31:07,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=656080.0, ans=0.2
2024-09-19 11:31:34,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=656160.0, ans=0.2
2024-09-19 11:31:36,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=656160.0, ans=0.035
2024-09-19 11:31:49,460 INFO [train.py:1198] (0/2) Epoch 37, batch 1150, loss[loss=0.2347, ctc_loss=0.116, cr_loss=0.3574, attn_decoder_loss=0.2399, over 29466.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1147, cr_loss=0.3555, attn_decoder_loss=0.2392, over 5755660.52 frames. ], batch size: 78, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:31:49,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=656200.0, ans=0.125
2024-09-19 11:32:04,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0
2024-09-19 11:32:08,419 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.50 vs. limit=10.0
2024-09-19 11:32:11,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=656240.0, ans=0.0
2024-09-19 11:32:17,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=656240.0, ans=0.125
2024-09-19 11:32:18,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=656280.0, ans=0.2
2024-09-19 11:32:34,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=656320.0, ans=10.0
2024-09-19 11:32:43,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=656320.0, ans=0.0
2024-09-19 11:33:03,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=12.0
2024-09-19 11:33:04,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=656400.0, ans=0.07
2024-09-19 11:33:05,431 INFO [train.py:1198] (0/2) Epoch 37, batch 1200, loss[loss=0.2407, ctc_loss=0.1152, cr_loss=0.3511, attn_decoder_loss=0.2468, over 29681.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1151, cr_loss=0.3561, attn_decoder_loss=0.2399, over 5747735.79 frames. ], batch size: 85, lr: 2.99e-03, grad_scale: 16.0
2024-09-19 11:33:07,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=656400.0, ans=0.125
2024-09-19 11:33:10,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=656400.0, ans=0.0
2024-09-19 11:33:12,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.756e+01 9.143e+01 9.785e+01 1.884e+02, threshold=1.829e+02, percent-clipped=2.0
2024-09-19 11:33:16,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=656400.0, ans=0.025
2024-09-19 11:33:19,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=656440.0, ans=0.2
2024-09-19 11:33:42,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.03 vs. limit=12.0
2024-09-19 11:34:12,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=656560.0, ans=0.125
2024-09-19 11:34:17,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=656560.0, ans=0.125
2024-09-19 11:34:19,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=656560.0, ans=0.05
2024-09-19 11:34:25,527 INFO [train.py:1198] (0/2) Epoch 37, batch 1250, loss[loss=0.2542, ctc_loss=0.1314, cr_loss=0.4099, attn_decoder_loss=0.2587, over 29537.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1159, cr_loss=0.3583, attn_decoder_loss=0.2407, over 5774263.28 frames. ], batch size: 92, lr: 2.99e-03, grad_scale: 16.0
2024-09-19 11:34:28,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0
2024-09-19 11:35:03,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=656680.0, ans=0.0
2024-09-19 11:35:41,475 INFO [train.py:1198] (0/2) Epoch 37, batch 1300, loss[loss=0.2502, ctc_loss=0.1253, cr_loss=0.3691, attn_decoder_loss=0.2558, over 28229.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1157, cr_loss=0.358, attn_decoder_loss=0.2403, over 5778888.46 frames. ], batch size: 111, lr: 2.99e-03, grad_scale: 16.0
2024-09-19 11:35:45,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=12.0
2024-09-19 11:35:49,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.622e+01 8.478e+01 8.951e+01 9.333e+01 1.111e+02, threshold=1.790e+02, percent-clipped=0.0
2024-09-19 11:35:49,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.22 vs. limit=22.5
2024-09-19 11:36:56,958 INFO [train.py:1198] (0/2) Epoch 37, batch 1350, loss[loss=0.2336, ctc_loss=0.1151, cr_loss=0.3504, attn_decoder_loss=0.239, over 29750.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1153, cr_loss=0.3576, attn_decoder_loss=0.2399, over 5795810.87 frames. ], batch size: 81, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:37:00,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=657000.0, ans=0.0
2024-09-19 11:37:04,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=657000.0, ans=0.125
2024-09-19 11:37:06,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=657000.0, ans=0.0
2024-09-19 11:37:08,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=657000.0, ans=0.125
2024-09-19 11:37:13,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657040.0, ans=0.1
2024-09-19 11:37:19,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=657040.0, ans=0.05
2024-09-19 11:37:21,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=657040.0, ans=0.1
2024-09-19 11:37:35,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=657080.0, ans=0.0
2024-09-19 11:37:35,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657080.0, ans=0.1
2024-09-19 11:37:52,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=657120.0, ans=0.0
2024-09-19 11:37:52,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657120.0, ans=0.1
2024-09-19 11:37:56,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=657120.0, ans=0.125
2024-09-19 11:37:57,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=657120.0, ans=0.1
2024-09-19 11:37:59,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=657160.0, ans=0.0
2024-09-19 11:38:16,411 INFO [train.py:1198] (0/2) Epoch 37, batch 1400, loss[loss=0.2098, ctc_loss=0.1001, cr_loss=0.328, attn_decoder_loss=0.2147, over 29575.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1147, cr_loss=0.3562, attn_decoder_loss=0.2394, over 5807333.71 frames. ], batch size: 69, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:38:25,472 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.397e+01 9.027e+01 9.734e+01 1.349e+02, threshold=1.805e+02, percent-clipped=0.0
2024-09-19 11:38:25,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=657200.0, ans=0.0
2024-09-19 11:38:46,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=657280.0, ans=0.025
2024-09-19 11:39:04,166 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.84 vs. limit=10.0
2024-09-19 11:39:31,999 INFO [train.py:1198] (0/2) Epoch 37, batch 1450, loss[loss=0.2456, ctc_loss=0.1233, cr_loss=0.3829, attn_decoder_loss=0.2507, over 29424.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.115, cr_loss=0.3566, attn_decoder_loss=0.2398, over 5803997.87 frames. ], batch size: 94, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:39:38,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0
2024-09-19 11:39:58,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=657440.0, ans=0.1
2024-09-19 11:40:05,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=657480.0, ans=0.125
2024-09-19 11:40:11,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=657480.0, ans=0.125
2024-09-19 11:40:18,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.39 vs. limit=10.0
2024-09-19 11:40:29,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=657520.0, ans=0.0
2024-09-19 11:40:43,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=657560.0, ans=0.0
2024-09-19 11:40:48,388 INFO [train.py:1198] (0/2) Epoch 37, batch 1500, loss[loss=0.2454, ctc_loss=0.1162, cr_loss=0.3532, attn_decoder_loss=0.2519, over 29625.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.115, cr_loss=0.3564, attn_decoder_loss=0.2403, over 5805435.98 frames. ], batch size: 86, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:40:57,334 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.474e+01 8.863e+01 9.485e+01 5.565e+02, threshold=1.773e+02, percent-clipped=3.0
2024-09-19 11:41:37,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.37 vs. limit=15.0
2024-09-19 11:41:38,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=657720.0, ans=0.2
2024-09-19 11:42:07,928 INFO [train.py:1198] (0/2) Epoch 37, batch 1550, loss[loss=0.244, ctc_loss=0.1277, cr_loss=0.3926, attn_decoder_loss=0.2482, over 29498.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1153, cr_loss=0.3571, attn_decoder_loss=0.2402, over 5781436.38 frames. ], batch size: 90, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:42:11,184 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 11:42:17,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=657800.0, ans=0.125
2024-09-19 11:42:21,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=657840.0, ans=0.07
2024-09-19 11:42:26,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=657840.0, ans=0.125
2024-09-19 11:42:42,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657880.0, ans=0.1
2024-09-19 11:42:53,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5
2024-09-19 11:42:56,857 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.47 vs. limit=22.5
2024-09-19 11:43:01,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=657920.0, ans=0.1
2024-09-19 11:43:16,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=657960.0, ans=0.125
2024-09-19 11:43:17,590 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 11:43:23,372 INFO [train.py:1198] (0/2) Epoch 37, batch 1600, loss[loss=0.2489, ctc_loss=0.1253, cr_loss=0.3814, attn_decoder_loss=0.2541, over 29691.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1156, cr_loss=0.3568, attn_decoder_loss=0.2402, over 5763287.93 frames. ], batch size: 85, lr: 2.99e-03, grad_scale: 16.0
2024-09-19 11:43:32,327 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.772e+01 8.728e+01 9.146e+01 9.748e+01 2.180e+02, threshold=1.829e+02, percent-clipped=1.0
2024-09-19 11:43:40,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=658040.0, ans=0.125
2024-09-19 11:43:41,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=658040.0, ans=0.1
2024-09-19 11:44:15,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=658120.0, ans=0.125
2024-09-19 11:44:16,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=658120.0, ans=0.1
2024-09-19 11:44:33,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=658160.0, ans=0.125
2024-09-19 11:44:38,993 INFO [train.py:1198] (0/2) Epoch 37, batch 1650, loss[loss=0.2414, ctc_loss=0.1188, cr_loss=0.3664, attn_decoder_loss=0.2468, over 29712.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1152, cr_loss=0.3554, attn_decoder_loss=0.2396, over 5756101.23 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:45:09,435 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.94 vs. limit=10.0
2024-09-19 11:45:16,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=658280.0, ans=0.0
2024-09-19 11:45:29,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=658320.0, ans=0.125
2024-09-19 11:45:44,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=658360.0, ans=0.125
2024-09-19 11:45:59,549 INFO [train.py:1198] (0/2) Epoch 37, batch 1700, loss[loss=0.2049, ctc_loss=0.09385, cr_loss=0.3058, attn_decoder_loss=0.2105, over 29570.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1151, cr_loss=0.3553, attn_decoder_loss=0.2397, over 5778281.86 frames. ], batch size: 69, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:46:10,139 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.607e+01 9.068e+01 9.479e+01 1.872e+02, threshold=1.814e+02, percent-clipped=1.0
2024-09-19 11:46:13,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=658440.0, ans=0.125
2024-09-19 11:46:18,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=658440.0, ans=0.125
2024-09-19 11:47:04,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=658560.0, ans=0.125
2024-09-19 11:47:15,016 INFO [train.py:1198] (0/2) Epoch 37, batch 1750, loss[loss=0.2056, ctc_loss=0.09605, cr_loss=0.3209, attn_decoder_loss=0.2106, over 29345.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.115, cr_loss=0.3551, attn_decoder_loss=0.2395, over 5787040.76 frames. ], batch size: 67, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:48:08,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=658720.0, ans=0.1
2024-09-19 11:48:15,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.33 vs. limit=12.0
2024-09-19 11:48:28,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=658800.0, ans=0.125
2024-09-19 11:48:30,139 INFO [train.py:1198] (0/2) Epoch 37, batch 1800, loss[loss=0.2474, ctc_loss=0.1221, cr_loss=0.3817, attn_decoder_loss=0.2528, over 29689.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.115, cr_loss=0.355, attn_decoder_loss=0.2398, over 5789606.66 frames. ], batch size: 83, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:48:38,775 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.16 vs. limit=6.0
2024-09-19 11:48:40,891 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.421e+01 8.885e+01 9.322e+01 2.627e+02, threshold=1.777e+02, percent-clipped=1.0
2024-09-19 11:48:41,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.02 vs. limit=10.0
2024-09-19 11:49:01,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=658880.0, ans=0.2
2024-09-19 11:49:02,198 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0
2024-09-19 11:49:08,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=658880.0, ans=0.125
2024-09-19 11:49:09,397 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0
2024-09-19 11:49:16,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=658920.0, ans=0.0
2024-09-19 11:49:22,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=658920.0, ans=15.0
2024-09-19 11:49:29,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=658920.0, ans=0.125
2024-09-19 11:49:30,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.75 vs. limit=15.0
2024-09-19 11:49:30,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=658920.0, ans=0.0
2024-09-19 11:49:38,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=658960.0, ans=0.125
2024-09-19 11:49:39,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=658960.0, ans=0.125
2024-09-19 11:49:50,131 INFO [train.py:1198] (0/2) Epoch 37, batch 1850, loss[loss=0.2407, ctc_loss=0.1131, cr_loss=0.3372, attn_decoder_loss=0.2474, over 29611.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1147, cr_loss=0.3548, attn_decoder_loss=0.2396, over 5794486.09 frames. ], batch size: 86, lr: 2.99e-03, grad_scale: 8.0
2024-09-19 11:49:54,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=659000.0, ans=0.125
2024-09-19 11:50:38,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=659120.0, ans=0.0
2024-09-19 11:50:40,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=659120.0, ans=0.1
2024-09-19 11:50:48,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=22.5
2024-09-19 11:50:52,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659160.0, ans=0.1
2024-09-19 11:50:55,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=659160.0, ans=0.0
2024-09-19 11:51:00,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.03 vs. limit=6.0
2024-09-19 11:51:05,336 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.24 vs. limit=15.0
2024-09-19 11:51:05,857 INFO [train.py:1198] (0/2) Epoch 37, batch 1900, loss[loss=0.243, ctc_loss=0.1177, cr_loss=0.3596, attn_decoder_loss=0.2489, over 29689.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1147, cr_loss=0.3554, attn_decoder_loss=0.2402, over 5802425.78 frames. ], batch size: 89, lr: 2.98e-03, grad_scale: 8.0
2024-09-19 11:51:16,272 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 8.529e+01 8.942e+01 9.570e+01 1.575e+02, threshold=1.788e+02, percent-clipped=0.0
2024-09-19 11:51:51,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=659320.0, ans=0.2
2024-09-19 11:51:54,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=659320.0, ans=0.0
2024-09-19 11:51:57,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=12.0
2024-09-19 11:52:12,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=659360.0, ans=0.2
2024-09-19 11:52:15,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=659360.0, ans=0.0
2024-09-19 11:52:17,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=659360.0, ans=0.125
2024-09-19 11:52:21,363 INFO [train.py:1198] (0/2) Epoch 37, batch 1950, loss[loss=0.2233, ctc_loss=0.1009, cr_loss=0.3347, attn_decoder_loss=0.2295, over 29453.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1151, cr_loss=0.3566, attn_decoder_loss=0.241, over 5817843.46 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 8.0
2024-09-19 11:52:30,930 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 11:52:35,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=659440.0, ans=0.125
2024-09-19 11:52:37,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=659440.0, ans=0.0
2024-09-19 11:52:50,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten.whitening_limit, batch_count=659440.0, ans=22.5
2024-09-19 11:52:55,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=659480.0, ans=0.125
2024-09-19 11:53:01,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659480.0, ans=0.1
2024-09-19 11:53:06,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659480.0, ans=0.1
2024-09-19 11:53:33,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0
2024-09-19 11:53:41,381 INFO [train.py:1198] (0/2) Epoch 37, batch 2000, loss[loss=0.2075, ctc_loss=0.1029, cr_loss=0.3259, attn_decoder_loss=0.2119, over 29356.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1158, cr_loss=0.3577, attn_decoder_loss=0.2418, over 5794826.29 frames. ], batch size: 67, lr: 2.98e-03, grad_scale: 16.0
2024-09-19 11:53:51,962 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.756e+01 9.402e+01 9.802e+01 1.853e+02, threshold=1.880e+02, percent-clipped=1.0
2024-09-19 11:53:52,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=659600.0, ans=0.125
2024-09-19 11:54:18,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.37 vs. limit=15.0
2024-09-19 11:54:19,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659680.0, ans=0.1
2024-09-19 11:54:24,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0
2024-09-19 11:54:33,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659720.0, ans=0.1
2024-09-19 11:54:34,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=659720.0, ans=0.125
2024-09-19 11:54:42,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=659760.0, ans=0.125
2024-09-19 11:54:57,546 INFO [train.py:1198] (0/2) Epoch 37, batch 2050, loss[loss=0.2168, ctc_loss=0.1047, cr_loss=0.3497, attn_decoder_loss=0.2215, over 29460.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1156, cr_loss=0.3575, attn_decoder_loss=0.241, over 5787906.48 frames. ], batch size: 70, lr: 2.98e-03, grad_scale: 16.0
2024-09-19 11:55:16,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=659840.0, ans=0.0
2024-09-19 11:55:20,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=659840.0, ans=0.0
2024-09-19 11:55:34,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659880.0, ans=0.1
2024-09-19 11:55:57,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0
2024-09-19 11:56:04,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=659960.0, ans=0.5
2024-09-19 11:56:07,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=659960.0, ans=0.0
2024-09-19 11:56:13,773 INFO [train.py:1198] (0/2) Epoch 37, batch 2100, loss[loss=0.2296, ctc_loss=0.1076, cr_loss=0.3531, attn_decoder_loss=0.2353, over 29754.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1146, cr_loss=0.356, attn_decoder_loss=0.24, over 5799731.84 frames. ], batch size: 81, lr: 2.98e-03, grad_scale: 16.0
2024-09-19 11:56:14,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=660000.0, ans=0.025
2024-09-19 11:56:24,219 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.399e+01 8.911e+01 9.542e+01 1.204e+02, threshold=1.782e+02, percent-clipped=0.0
2024-09-19 11:56:26,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=660000.0, ans=0.1
2024-09-19 11:56:52,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=660080.0, ans=0.125
2024-09-19 11:57:08,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=660120.0, ans=0.125
2024-09-19 11:57:18,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=660160.0, ans=0.125
2024-09-19 11:57:33,569 INFO [train.py:1198] (0/2) Epoch 37, batch 2150, loss[loss=0.2356, ctc_loss=0.1201, cr_loss=0.3653, attn_decoder_loss=0.2403, over 29441.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.114, cr_loss=0.3551, attn_decoder_loss=0.2395, over 5815526.68 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 16.0
2024-09-19 11:57:46,142 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 11:57:50,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=660240.0, ans=0.125
2024-09-19 11:58:29,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=660320.0, ans=0.1
2024-09-19 11:58:49,159 INFO [train.py:1198] (0/2) Epoch 37, batch 2200, loss[loss=0.2495, ctc_loss=0.1227, cr_loss=0.3726, attn_decoder_loss=0.2554, over 29648.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1145, cr_loss=0.3557, attn_decoder_loss=0.2395, over 5811548.42 frames. ], batch size: 86, lr: 2.98e-03, grad_scale: 8.0
2024-09-19 11:58:52,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=660400.0, ans=0.0
2024-09-19 11:59:01,227 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.378e+01 8.935e+01 9.603e+01 1.294e+02, threshold=1.787e+02, percent-clipped=0.0
2024-09-19 11:59:03,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=660440.0, ans=0.1
2024-09-19 11:59:42,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0
2024-09-19 11:59:51,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=660560.0, ans=0.125
2024-09-19 12:00:01,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=660560.0, ans=0.2
2024-09-19 12:00:04,895 INFO [train.py:1198] (0/2) Epoch 37, batch 2250, loss[loss=0.232, ctc_loss=0.1051, cr_loss=0.3254, attn_decoder_loss=0.2389, over 29695.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1143, cr_loss=0.3551, attn_decoder_loss=0.2395, over 5811640.08 frames. ], batch size: 82, lr: 2.98e-03, grad_scale: 8.0
2024-09-19 12:00:31,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=660640.0, ans=0.0
2024-09-19 12:01:13,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=660760.0, ans=0.025
2024-09-19 12:01:22,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.25 vs. limit=15.0
2024-09-19 12:01:23,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=660800.0, ans=0.05
2024-09-19 12:01:25,019 INFO [train.py:1198] (0/2) Epoch 37, batch 2300, loss[loss=0.2111, ctc_loss=0.09468, cr_loss=0.2949, attn_decoder_loss=0.2175, over 29708.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1139, cr_loss=0.3534, attn_decoder_loss=0.2386, over 5798383.24 frames. ], batch size: 72, lr: 2.98e-03, grad_scale: 8.0
2024-09-19 12:01:36,927 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.548e+01 9.077e+01 9.950e+01 1.821e+02, threshold=1.815e+02, percent-clipped=1.0
2024-09-19 12:01:49,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=660840.0, ans=0.0
2024-09-19 12:01:54,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660880.0, ans=0.1
2024-09-19 12:01:58,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=660880.0, ans=0.0
2024-09-19 12:02:03,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=660880.0, ans=0.125
2024-09-19 12:02:07,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=660880.0, ans=0.125
2024-09-19 12:02:41,021 INFO [train.py:1198] (0/2) Epoch 37, batch 2350, loss[loss=0.2432, ctc_loss=0.1243, cr_loss=0.3768, attn_decoder_loss=0.2481, over 29676.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1142, cr_loss=0.3544, attn_decoder_loss=0.239, over 5803968.01 frames. ], batch size: 83, lr: 2.98e-03, grad_scale: 8.0
2024-09-19 12:02:59,449 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 12:03:28,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=661120.0, ans=0.0
2024-09-19 12:03:56,988 INFO [train.py:1198] (0/2) Epoch 37, batch 2400, loss[loss=0.232, ctc_loss=0.1173, cr_loss=0.364, attn_decoder_loss=0.2367, over 29517.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1149, cr_loss=0.3556, attn_decoder_loss=0.2397, over 5808123.17 frames. ], batch size: 76, lr: 2.98e-03, grad_scale: 16.0
2024-09-19 12:04:08,985 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 8.594e+01 9.080e+01 9.693e+01 1.252e+02, threshold=1.816e+02, percent-clipped=0.0
2024-09-19 12:04:35,071 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.53 vs. limit=12.0
2024-09-19 12:04:38,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.14 vs. limit=10.0
2024-09-19 12:04:47,084 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0
2024-09-19 12:04:59,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=661360.0, ans=0.125
2024-09-19 12:05:06,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=661360.0, ans=0.025
2024-09-19 12:05:07,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=661360.0, ans=0.125
2024-09-19 12:05:17,007 INFO [train.py:1198] (0/2) Epoch 37, batch 2450, loss[loss=0.2411, ctc_loss=0.1214, cr_loss=0.3636, attn_decoder_loss=0.2463, over 29689.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1156, cr_loss=0.357, attn_decoder_loss=0.2406, over 5785080.64 frames. ], batch size: 82, lr: 2.98e-03, grad_scale: 16.0
2024-09-19 12:05:17,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=661400.0, ans=0.0
2024-09-19 12:05:26,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=661400.0, ans=0.0
2024-09-19 12:05:36,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=661440.0, ans=0.125
2024-09-19 12:05:43,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=661440.0, ans=0.125
2024-09-19 12:05:47,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=661480.0, ans=0.1
2024-09-19 12:05:47,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=661480.0, ans=0.1
2024-09-19 12:06:01,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.61 vs.
limit=15.0 2024-09-19 12:06:02,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=661520.0, ans=0.125 2024-09-19 12:06:02,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=661520.0, ans=0.0 2024-09-19 12:06:16,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=661560.0, ans=0.07 2024-09-19 12:06:22,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=661560.0, ans=0.125 2024-09-19 12:06:30,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=661560.0, ans=0.125 2024-09-19 12:06:33,575 INFO [train.py:1198] (0/2) Epoch 37, batch 2500, loss[loss=0.2343, ctc_loss=0.1127, cr_loss=0.3455, attn_decoder_loss=0.2402, over 29632.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1155, cr_loss=0.3568, attn_decoder_loss=0.2406, over 5795051.87 frames. ], batch size: 86, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:06:45,677 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.640e+01 9.238e+01 1.003e+02 4.668e+02, threshold=1.848e+02, percent-clipped=4.0 2024-09-19 12:07:01,990 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.64 vs. 
limit=15.0 2024-09-19 12:07:05,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=661680.0, ans=0.1 2024-09-19 12:07:11,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=661680.0, ans=0.0 2024-09-19 12:07:37,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=661760.0, ans=0.0 2024-09-19 12:07:49,437 INFO [train.py:1198] (0/2) Epoch 37, batch 2550, loss[loss=0.2086, ctc_loss=0.1012, cr_loss=0.309, attn_decoder_loss=0.2136, over 29350.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1152, cr_loss=0.3564, attn_decoder_loss=0.2403, over 5797218.58 frames. ], batch size: 67, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:07:49,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=661800.0, ans=0.0 2024-09-19 12:07:54,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=661800.0, ans=0.125 2024-09-19 12:08:37,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.27 vs. limit=15.0 2024-09-19 12:08:41,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=661920.0, ans=0.2 2024-09-19 12:08:44,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2024-09-19 12:09:01,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=661960.0, ans=0.0 2024-09-19 12:09:07,399 INFO [train.py:1198] (0/2) Epoch 37, batch 2600, loss[loss=0.2268, ctc_loss=0.1039, cr_loss=0.3379, attn_decoder_loss=0.2329, over 29450.00 frames. 
], tot_loss[loss=0.2353, ctc_loss=0.1154, cr_loss=0.357, attn_decoder_loss=0.2407, over 5793506.74 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:09:18,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=662000.0, ans=0.125 2024-09-19 12:09:21,436 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 8.481e+01 8.933e+01 9.512e+01 2.457e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-19 12:09:38,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=662080.0, ans=0.125 2024-09-19 12:10:03,179 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=22.5 2024-09-19 12:10:12,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=662160.0, ans=0.0 2024-09-19 12:10:20,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=662160.0, ans=0.2 2024-09-19 12:10:24,381 INFO [train.py:1198] (0/2) Epoch 37, batch 2650, loss[loss=0.2468, ctc_loss=0.1184, cr_loss=0.3645, attn_decoder_loss=0.2529, over 29314.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1156, cr_loss=0.3575, attn_decoder_loss=0.241, over 5799776.84 frames. 
], batch size: 100, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:10:35,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=662200.0, ans=0.2 2024-09-19 12:10:55,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=662280.0, ans=0.025 2024-09-19 12:11:03,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=662280.0, ans=0.125 2024-09-19 12:11:14,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=662320.0, ans=0.125 2024-09-19 12:11:25,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=662360.0, ans=0.125 2024-09-19 12:11:29,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=662360.0, ans=0.125 2024-09-19 12:11:39,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=662400.0, ans=0.0 2024-09-19 12:11:40,389 INFO [train.py:1198] (0/2) Epoch 37, batch 2700, loss[loss=0.2459, ctc_loss=0.1157, cr_loss=0.3629, attn_decoder_loss=0.2523, over 29538.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1154, cr_loss=0.3569, attn_decoder_loss=0.2411, over 5795205.99 frames. 
], batch size: 87, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:11:52,428 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.575e+01 9.095e+01 9.529e+01 6.705e+02, threshold=1.819e+02, percent-clipped=1.0 2024-09-19 12:12:30,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=662520.0, ans=0.1 2024-09-19 12:12:45,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=662560.0, ans=0.0 2024-09-19 12:12:51,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=662560.0, ans=0.125 2024-09-19 12:12:58,482 INFO [train.py:1198] (0/2) Epoch 37, batch 2750, loss[loss=0.229, ctc_loss=0.1192, cr_loss=0.3685, attn_decoder_loss=0.233, over 29529.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1145, cr_loss=0.3547, attn_decoder_loss=0.2397, over 5794998.78 frames. ], batch size: 75, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 12:13:00,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=662600.0, ans=0.125 2024-09-19 12:13:15,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.67 vs. limit=15.0 2024-09-19 12:13:16,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=662640.0, ans=0.125 2024-09-19 12:13:21,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.01 vs. 
limit=15.0 2024-09-19 12:13:23,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=662640.0, ans=0.5 2024-09-19 12:13:24,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2024-09-19 12:13:26,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=662640.0, ans=0.0 2024-09-19 12:13:34,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=662680.0, ans=0.125 2024-09-19 12:13:40,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=662680.0, ans=0.125 2024-09-19 12:13:43,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=662680.0, ans=0.125 2024-09-19 12:14:00,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=662760.0, ans=0.125 2024-09-19 12:14:03,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=662760.0, ans=0.5 2024-09-19 12:14:16,761 INFO [train.py:1198] (0/2) Epoch 37, batch 2800, loss[loss=0.2576, ctc_loss=0.1425, cr_loss=0.3665, attn_decoder_loss=0.2622, over 20107.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1152, cr_loss=0.3562, attn_decoder_loss=0.2402, over 5775099.05 frames. ], batch size: 210, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 12:14:19,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.58 vs. 
limit=15.0 2024-09-19 12:14:26,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=662800.0, ans=0.125 2024-09-19 12:14:30,281 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.502e+01 8.910e+01 9.403e+01 2.471e+02, threshold=1.782e+02, percent-clipped=1.0 2024-09-19 12:14:42,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=662840.0, ans=0.125 2024-09-19 12:15:12,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.22 vs. limit=15.0 2024-09-19 12:15:22,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0 2024-09-19 12:15:32,028 INFO [train.py:1198] (0/2) Epoch 37, batch 2850, loss[loss=0.2279, ctc_loss=0.1131, cr_loss=0.344, attn_decoder_loss=0.233, over 29506.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1156, cr_loss=0.3572, attn_decoder_loss=0.2405, over 5760102.76 frames. ], batch size: 77, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 12:15:41,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=663000.0, ans=0.0 2024-09-19 12:15:49,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663040.0, ans=0.1 2024-09-19 12:16:13,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.00 vs. 
limit=15.0 2024-09-19 12:16:20,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=663120.0, ans=0.04949747468305833 2024-09-19 12:16:39,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=663160.0, ans=0.125 2024-09-19 12:16:49,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=663200.0, ans=0.05 2024-09-19 12:16:50,708 INFO [train.py:1198] (0/2) Epoch 37, batch 2900, loss[loss=0.2337, ctc_loss=0.1056, cr_loss=0.3327, attn_decoder_loss=0.2405, over 29427.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.116, cr_loss=0.3583, attn_decoder_loss=0.2415, over 5785966.38 frames. ], batch size: 79, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 12:17:07,907 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.541e+01 8.975e+01 9.658e+01 1.927e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-19 12:17:08,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=663240.0, ans=0.125 2024-09-19 12:17:26,812 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=12.0 2024-09-19 12:17:36,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=663320.0, ans=0.0 2024-09-19 12:18:07,938 INFO [train.py:1198] (0/2) Epoch 37, batch 2950, loss[loss=0.2363, ctc_loss=0.1232, cr_loss=0.368, attn_decoder_loss=0.2407, over 29513.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.115, cr_loss=0.3557, attn_decoder_loss=0.2401, over 5780757.15 frames. 
], batch size: 75, lr: 2.98e-03, grad_scale: 8.0 2024-09-19 12:18:08,935 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.91 vs. limit=12.0 2024-09-19 12:18:23,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=663440.0, ans=0.0 2024-09-19 12:18:26,426 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 12:18:32,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=663440.0, ans=0.125 2024-09-19 12:18:34,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=663440.0, ans=0.0 2024-09-19 12:18:56,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=663520.0, ans=0.05 2024-09-19 12:18:58,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=663520.0, ans=0.125 2024-09-19 12:19:02,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=663520.0, ans=0.125 2024-09-19 12:19:02,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=663520.0, ans=0.125 2024-09-19 12:19:07,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663560.0, ans=0.1 2024-09-19 12:19:21,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=663560.0, ans=0.125 2024-09-19 12:19:23,863 INFO [train.py:1198] (0/2) Epoch 37, batch 3000, loss[loss=0.231, ctc_loss=0.1149, 
cr_loss=0.3542, attn_decoder_loss=0.236, over 29772.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1147, cr_loss=0.3553, attn_decoder_loss=0.2397, over 5780628.70 frames. ], batch size: 81, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:19:23,864 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 12:19:35,528 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5817, 3.7650, 3.9748, 4.1670], device='cuda:0') 2024-09-19 12:19:43,120 INFO [train.py:1230] (0/2) Epoch 37, validation: loss=0.212, ctc_loss=0.03675, cr_loss=6.305e-15, attn_decoder_loss=0.2315, over 944034.00 frames. 2024-09-19 12:19:43,120 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 12:19:58,495 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.507e+01 8.935e+01 9.407e+01 3.949e+02, threshold=1.787e+02, percent-clipped=1.0 2024-09-19 12:20:07,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=663640.0, ans=0.1 2024-09-19 12:20:38,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0 2024-09-19 12:20:39,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=663720.0, ans=0.125 2024-09-19 12:20:55,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=663760.0, ans=0.125 2024-09-19 12:21:01,308 INFO [train.py:1198] (0/2) Epoch 37, batch 3050, loss[loss=0.2261, ctc_loss=0.1067, cr_loss=0.3401, attn_decoder_loss=0.2318, over 29531.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1151, cr_loss=0.3563, attn_decoder_loss=0.2404, over 5775855.39 frames. 
], batch size: 76, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:21:03,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=663800.0, ans=0.0 2024-09-19 12:21:06,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=663800.0, ans=0.125 2024-09-19 12:21:14,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=663800.0, ans=0.025 2024-09-19 12:21:35,517 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2024-09-19 12:21:56,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=663920.0, ans=0.1 2024-09-19 12:22:06,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=663960.0, ans=0.1 2024-09-19 12:22:17,509 INFO [train.py:1198] (0/2) Epoch 37, batch 3100, loss[loss=0.25, ctc_loss=0.1237, cr_loss=0.3753, attn_decoder_loss=0.2557, over 29294.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1154, cr_loss=0.3566, attn_decoder_loss=0.2405, over 5774541.65 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:22:32,778 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.715e+01 9.369e+01 9.767e+01 1.782e+02, threshold=1.874e+02, percent-clipped=0.0 2024-09-19 12:23:05,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.46 vs. 
limit=15.0 2024-09-19 12:23:15,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664120.0, ans=0.1 2024-09-19 12:23:35,721 INFO [train.py:1198] (0/2) Epoch 37, batch 3150, loss[loss=0.2474, ctc_loss=0.1208, cr_loss=0.3641, attn_decoder_loss=0.2534, over 28855.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1154, cr_loss=0.3567, attn_decoder_loss=0.2405, over 5781744.05 frames. ], batch size: 104, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:23:42,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.36 vs. limit=15.0 2024-09-19 12:23:50,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.46 vs. limit=15.0 2024-09-19 12:24:01,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=664240.0, ans=0.125 2024-09-19 12:24:04,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=664280.0, ans=0.125 2024-09-19 12:24:27,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.66 vs. limit=15.0 2024-09-19 12:24:33,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0 2024-09-19 12:24:53,367 INFO [train.py:1198] (0/2) Epoch 37, batch 3200, loss[loss=0.2329, ctc_loss=0.1129, cr_loss=0.3655, attn_decoder_loss=0.2381, over 29411.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1149, cr_loss=0.3559, attn_decoder_loss=0.2398, over 5792885.22 frames. 
], batch size: 79, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:25:03,463 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.87 vs. limit=15.0 2024-09-19 12:25:07,954 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=12.0 2024-09-19 12:25:08,497 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.621e+01 9.120e+01 9.766e+01 2.704e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-19 12:25:21,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=664440.0, ans=0.2 2024-09-19 12:25:30,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=664480.0, ans=0.2 2024-09-19 12:25:33,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=664480.0, ans=0.2 2024-09-19 12:25:33,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=664480.0, ans=0.2 2024-09-19 12:25:45,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=664520.0, ans=0.2 2024-09-19 12:25:57,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=664560.0, ans=0.125 2024-09-19 12:25:57,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=664560.0, ans=0.0 2024-09-19 12:26:02,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=664560.0, ans=0.125 2024-09-19 12:26:09,291 INFO [train.py:1198] (0/2) Epoch 37, batch 3250, loss[loss=0.2396, ctc_loss=0.1106, 
cr_loss=0.3521, attn_decoder_loss=0.2461, over 29717.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1153, cr_loss=0.3571, attn_decoder_loss=0.2404, over 5799718.85 frames. ], batch size: 84, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:26:12,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=664600.0, ans=0.2 2024-09-19 12:26:42,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2024-09-19 12:26:49,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.63 vs. limit=15.0 2024-09-19 12:27:00,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=664720.0, ans=0.125 2024-09-19 12:27:11,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=664760.0, ans=0.125 2024-09-19 12:27:27,409 INFO [train.py:1198] (0/2) Epoch 37, batch 3300, loss[loss=0.2369, ctc_loss=0.1114, cr_loss=0.3474, attn_decoder_loss=0.2432, over 28297.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1143, cr_loss=0.3548, attn_decoder_loss=0.2393, over 5798526.83 frames. ], batch size: 111, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:27:27,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=664800.0, ans=0.1 2024-09-19 12:27:30,132 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.87 vs. 
limit=22.5 2024-09-19 12:27:39,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=664800.0, ans=0.125 2024-09-19 12:27:42,600 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.526e+01 9.078e+01 9.888e+01 1.961e+02, threshold=1.816e+02, percent-clipped=2.0 2024-09-19 12:27:46,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=664840.0, ans=0.125 2024-09-19 12:27:46,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2024-09-19 12:28:11,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=664920.0, ans=0.025 2024-09-19 12:28:21,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=664920.0, ans=0.125 2024-09-19 12:28:30,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=664960.0, ans=0.0 2024-09-19 12:28:45,005 INFO [train.py:1198] (0/2) Epoch 37, batch 3350, loss[loss=0.245, ctc_loss=0.122, cr_loss=0.3654, attn_decoder_loss=0.2505, over 28885.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1148, cr_loss=0.3558, attn_decoder_loss=0.2399, over 5774740.32 frames. ], batch size: 104, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:28:46,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=665000.0, ans=0.0 2024-09-19 12:29:14,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.42 vs. 
limit=15.0 2024-09-19 12:29:40,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.73 vs. limit=15.0 2024-09-19 12:29:43,528 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0 2024-09-19 12:29:43,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.86 vs. limit=12.0 2024-09-19 12:29:47,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=665160.0, ans=0.125 2024-09-19 12:30:00,464 INFO [train.py:1198] (0/2) Epoch 37, batch 3400, loss[loss=0.2097, ctc_loss=0.1067, cr_loss=0.3402, attn_decoder_loss=0.2136, over 29332.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1152, cr_loss=0.3568, attn_decoder_loss=0.24, over 5765811.37 frames. ], batch size: 67, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:30:16,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=665240.0, ans=0.125 2024-09-19 12:30:17,145 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.762e+01 9.202e+01 9.777e+01 2.648e+02, threshold=1.840e+02, percent-clipped=2.0 2024-09-19 12:30:19,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=665240.0, ans=0.025 2024-09-19 12:30:23,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=665240.0, ans=0.0 2024-09-19 12:30:24,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.92 vs. 
limit=15.0 2024-09-19 12:30:32,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=665280.0, ans=0.125 2024-09-19 12:30:35,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=665280.0, ans=0.0 2024-09-19 12:31:11,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665360.0, ans=0.1 2024-09-19 12:31:18,360 INFO [train.py:1198] (0/2) Epoch 37, batch 3450, loss[loss=0.2541, ctc_loss=0.1249, cr_loss=0.3561, attn_decoder_loss=0.2606, over 28333.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1154, cr_loss=0.3568, attn_decoder_loss=0.2405, over 5772022.56 frames. ], batch size: 111, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:31:24,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=665400.0, ans=0.2 2024-09-19 12:31:35,583 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 12:31:43,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0 2024-09-19 12:32:19,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665560.0, ans=0.1 2024-09-19 12:32:35,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=665600.0, ans=0.125 2024-09-19 12:32:36,779 INFO [train.py:1198] (0/2) Epoch 37, batch 3500, loss[loss=0.2179, ctc_loss=0.1066, cr_loss=0.3418, attn_decoder_loss=0.2227, over 29329.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.115, cr_loss=0.3558, attn_decoder_loss=0.2398, over 5773803.46 frames. 
], batch size: 71, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:32:40,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=665600.0, ans=0.0 2024-09-19 12:32:44,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=665600.0, ans=0.125 2024-09-19 12:32:47,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=665600.0, ans=0.125 2024-09-19 12:32:50,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=665640.0, ans=0.2 2024-09-19 12:32:53,367 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 8.561e+01 8.978e+01 9.459e+01 2.098e+02, threshold=1.796e+02, percent-clipped=1.0 2024-09-19 12:32:55,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=665640.0, ans=0.2 2024-09-19 12:32:59,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=665640.0, ans=0.0 2024-09-19 12:33:32,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=665720.0, ans=0.1 2024-09-19 12:33:33,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=665720.0, ans=0.0 2024-09-19 12:33:51,558 INFO [train.py:1198] (0/2) Epoch 37, batch 3550, loss[loss=0.2383, ctc_loss=0.1061, cr_loss=0.3205, attn_decoder_loss=0.2459, over 29718.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1145, cr_loss=0.3546, attn_decoder_loss=0.2397, over 5780948.14 frames. 
], batch size: 89, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:33:54,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665800.0, ans=0.1 2024-09-19 12:33:56,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=665800.0, ans=0.125 2024-09-19 12:34:20,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2024-09-19 12:34:40,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=665920.0, ans=10.0 2024-09-19 12:34:43,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=665920.0, ans=0.0 2024-09-19 12:34:52,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=665960.0, ans=0.125 2024-09-19 12:34:54,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=665960.0, ans=0.0 2024-09-19 12:34:55,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=15.0 2024-09-19 12:35:05,689 INFO [train.py:1198] (0/2) Epoch 37, batch 3600, loss[loss=0.223, ctc_loss=0.1074, cr_loss=0.3334, attn_decoder_loss=0.2284, over 29489.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1146, cr_loss=0.3549, attn_decoder_loss=0.2398, over 5790318.06 frames. ], batch size: 77, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:35:06,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.60 vs. 
limit=10.0 2024-09-19 12:35:14,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=666000.0, ans=0.0 2024-09-19 12:35:22,126 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.583e+01 9.106e+01 9.636e+01 2.538e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-19 12:35:31,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-09-19 12:35:58,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.29 vs. limit=15.0 2024-09-19 12:35:59,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=666120.0, ans=0.125 2024-09-19 12:36:22,096 INFO [train.py:1198] (0/2) Epoch 37, batch 3650, loss[loss=0.2456, ctc_loss=0.1239, cr_loss=0.3848, attn_decoder_loss=0.2506, over 29508.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1142, cr_loss=0.3546, attn_decoder_loss=0.2392, over 5793238.65 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:36:34,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=666200.0, ans=0.1 2024-09-19 12:36:42,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.64 vs. 
limit=5.0 2024-09-19 12:36:57,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=666280.0, ans=0.125 2024-09-19 12:37:15,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=666320.0, ans=0.2 2024-09-19 12:37:36,686 INFO [train.py:1198] (0/2) Epoch 37, batch 3700, loss[loss=0.2435, ctc_loss=0.1165, cr_loss=0.3658, attn_decoder_loss=0.2495, over 29702.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1144, cr_loss=0.3549, attn_decoder_loss=0.2396, over 5803643.71 frames. ], batch size: 84, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:37:41,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=666400.0, ans=0.2 2024-09-19 12:37:48,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.54 vs. limit=10.0 2024-09-19 12:37:48,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=666400.0, ans=0.125 2024-09-19 12:37:56,005 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.469e+01 9.062e+01 9.671e+01 3.468e+02, threshold=1.812e+02, percent-clipped=1.0 2024-09-19 12:38:16,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.05 vs. 
limit=10.0 2024-09-19 12:38:17,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=666480.0, ans=0.2 2024-09-19 12:38:24,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=666520.0, ans=0.0 2024-09-19 12:38:35,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=666560.0, ans=0.125 2024-09-19 12:38:37,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=666560.0, ans=0.125 2024-09-19 12:38:42,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=666560.0, ans=0.07 2024-09-19 12:38:52,752 INFO [train.py:1198] (0/2) Epoch 37, batch 3750, loss[loss=0.2066, ctc_loss=0.09794, cr_loss=0.3292, attn_decoder_loss=0.2114, over 29368.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1147, cr_loss=0.3553, attn_decoder_loss=0.2398, over 5807905.85 frames. 
], batch size: 67, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:39:02,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666600.0, ans=0.1 2024-09-19 12:39:06,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=666640.0, ans=0.2 2024-09-19 12:39:24,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=666680.0, ans=0.0 2024-09-19 12:39:44,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=666720.0, ans=0.0 2024-09-19 12:39:59,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=666760.0, ans=0.0 2024-09-19 12:40:07,586 INFO [train.py:1198] (0/2) Epoch 37, batch 3800, loss[loss=0.2373, ctc_loss=0.1116, cr_loss=0.3403, attn_decoder_loss=0.2437, over 29611.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1144, cr_loss=0.3541, attn_decoder_loss=0.2393, over 5798578.41 frames. 
], batch size: 86, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:40:19,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=666800.0, ans=0.0 2024-09-19 12:40:26,998 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.580e+01 8.987e+01 9.690e+01 1.357e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-19 12:40:43,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=666880.0, ans=0.125 2024-09-19 12:40:45,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666880.0, ans=0.1 2024-09-19 12:40:58,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=666920.0, ans=0.125 2024-09-19 12:40:59,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=666920.0, ans=0.1 2024-09-19 12:41:01,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=666920.0, ans=0.0 2024-09-19 12:41:01,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=666920.0, ans=0.125 2024-09-19 12:41:08,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=666960.0, ans=0.125 2024-09-19 12:41:19,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=666960.0, ans=0.125 2024-09-19 12:41:21,812 INFO [train.py:1198] (0/2) Epoch 37, batch 3850, loss[loss=0.2493, ctc_loss=0.1265, cr_loss=0.3784, attn_decoder_loss=0.2545, over 29302.00 frames. 
], tot_loss[loss=0.2336, ctc_loss=0.1139, cr_loss=0.3534, attn_decoder_loss=0.239, over 5812137.96 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:41:25,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=667000.0, ans=0.1 2024-09-19 12:41:29,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=667000.0, ans=0.125 2024-09-19 12:42:20,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=667120.0, ans=0.0 2024-09-19 12:42:38,731 INFO [train.py:1198] (0/2) Epoch 37, batch 3900, loss[loss=0.2472, ctc_loss=0.1235, cr_loss=0.3497, attn_decoder_loss=0.2532, over 29645.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1145, cr_loss=0.3545, attn_decoder_loss=0.2397, over 5816723.22 frames. ], batch size: 86, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:42:44,244 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. 
limit=6.0 2024-09-19 12:42:57,941 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.596e+01 8.935e+01 9.669e+01 1.380e+02, threshold=1.787e+02, percent-clipped=0.0 2024-09-19 12:42:59,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667240.0, ans=0.1 2024-09-19 12:43:27,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=667320.0, ans=0.125 2024-09-19 12:43:38,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=667360.0, ans=0.2 2024-09-19 12:43:48,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=667360.0, ans=0.125 2024-09-19 12:43:52,903 INFO [train.py:1198] (0/2) Epoch 37, batch 3950, loss[loss=0.2557, ctc_loss=0.135, cr_loss=0.4039, attn_decoder_loss=0.2601, over 29527.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1144, cr_loss=0.3549, attn_decoder_loss=0.2396, over 5836309.81 frames. ], batch size: 97, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 12:44:06,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=667440.0, ans=0.2 2024-09-19 12:44:40,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=667520.0, ans=0.125 2024-09-19 12:44:52,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=667560.0, ans=0.0 2024-09-19 12:44:57,479 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.86 vs. 
limit=22.5 2024-09-19 12:45:01,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=667560.0, ans=0.125 2024-09-19 12:45:08,493 INFO [train.py:1198] (0/2) Epoch 37, batch 4000, loss[loss=0.2229, ctc_loss=0.1086, cr_loss=0.3558, attn_decoder_loss=0.2277, over 29505.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1145, cr_loss=0.355, attn_decoder_loss=0.2397, over 5813282.92 frames. ], batch size: 74, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:45:16,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=667600.0, ans=0.0 2024-09-19 12:45:16,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-09-19 12:45:17,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=667600.0, ans=0.0 2024-09-19 12:45:23,775 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.30 vs. limit=10.0 2024-09-19 12:45:27,465 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.490e+01 9.030e+01 9.800e+01 2.988e+02, threshold=1.806e+02, percent-clipped=1.0 2024-09-19 12:45:31,594 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.85 vs. 
limit=22.5 2024-09-19 12:45:57,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=667720.0, ans=0.035 2024-09-19 12:46:04,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667720.0, ans=0.1 2024-09-19 12:46:22,187 INFO [train.py:1198] (0/2) Epoch 37, batch 4050, loss[loss=0.2606, ctc_loss=0.1424, cr_loss=0.4059, attn_decoder_loss=0.2647, over 20034.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1142, cr_loss=0.3542, attn_decoder_loss=0.2394, over 5795388.99 frames. ], batch size: 209, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:46:26,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=667800.0, ans=0.07 2024-09-19 12:46:36,145 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2024-09-19 12:46:45,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=667840.0, ans=0.025 2024-09-19 12:46:48,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=667840.0, ans=0.1 2024-09-19 12:46:54,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=667880.0, ans=0.2 2024-09-19 12:47:15,562 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2024-09-19 12:47:37,494 INFO [train.py:1198] (0/2) Epoch 37, batch 4100, loss[loss=0.2421, ctc_loss=0.1195, cr_loss=0.3729, attn_decoder_loss=0.2474, over 29480.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1147, cr_loss=0.3558, attn_decoder_loss=0.24, over 5791423.91 frames. 
], batch size: 90, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 12:47:51,463 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2024-09-19 12:47:56,159 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.448e+01 9.033e+01 9.875e+01 1.600e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-19 12:47:59,523 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 12:48:04,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=668040.0, ans=0.125 2024-09-19 12:48:09,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=668080.0, ans=0.125 2024-09-19 12:48:09,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=668080.0, ans=0.125 2024-09-19 12:48:12,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=668080.0, ans=0.125 2024-09-19 12:48:15,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=668080.0, ans=0.125 2024-09-19 12:48:15,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=668080.0, ans=0.0 2024-09-19 12:48:34,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=668160.0, ans=10.0 2024-09-19 12:48:37,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=668160.0, ans=0.125 2024-09-19 12:48:37,826 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=668160.0, ans=0.025 2024-09-19 12:48:51,882 INFO [train.py:1198] (0/2) Epoch 37, batch 4150, loss[loss=0.2313, ctc_loss=0.1167, cr_loss=0.3537, attn_decoder_loss=0.2361, over 29494.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1143, cr_loss=0.3549, attn_decoder_loss=0.2396, over 5796940.32 frames. ], batch size: 77, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:49:01,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=668200.0, ans=0.125 2024-09-19 12:49:17,778 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.71 vs. limit=15.0 2024-09-19 12:49:31,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=668280.0, ans=0.0 2024-09-19 12:49:37,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=668320.0, ans=0.025 2024-09-19 12:49:49,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=668360.0, ans=0.125 2024-09-19 12:49:55,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=668360.0, ans=0.025 2024-09-19 12:50:05,539 INFO [train.py:1198] (0/2) Epoch 37, batch 4200, loss[loss=0.2523, ctc_loss=0.128, cr_loss=0.3791, attn_decoder_loss=0.2577, over 29503.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1146, cr_loss=0.3558, attn_decoder_loss=0.24, over 5800248.73 frames. 
], batch size: 90, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:50:24,821 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.799e+01 8.584e+01 9.010e+01 9.647e+01 2.583e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-19 12:50:41,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=668480.0, ans=0.125 2024-09-19 12:50:48,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=668480.0, ans=0.125 2024-09-19 12:50:51,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=668520.0, ans=0.125 2024-09-19 12:50:52,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=668520.0, ans=0.1 2024-09-19 12:51:00,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=668520.0, ans=0.2 2024-09-19 12:51:10,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=668560.0, ans=0.025 2024-09-19 12:51:20,677 INFO [train.py:1198] (0/2) Epoch 37, batch 4250, loss[loss=0.2183, ctc_loss=0.1058, cr_loss=0.346, attn_decoder_loss=0.2231, over 29509.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1144, cr_loss=0.3551, attn_decoder_loss=0.2401, over 5805604.64 frames. 
], batch size: 74, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:51:41,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=668640.0, ans=0.125 2024-09-19 12:52:03,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=668720.0, ans=0.125 2024-09-19 12:52:07,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=668720.0, ans=0.2 2024-09-19 12:52:16,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.74 vs. limit=15.0 2024-09-19 12:52:25,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=668760.0, ans=0.0 2024-09-19 12:52:31,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=668760.0, ans=0.05 2024-09-19 12:52:35,078 INFO [train.py:1198] (0/2) Epoch 37, batch 4300, loss[loss=0.2352, ctc_loss=0.1086, cr_loss=0.3283, attn_decoder_loss=0.242, over 29525.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1142, cr_loss=0.3547, attn_decoder_loss=0.2401, over 5794431.21 frames. ], batch size: 87, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:52:39,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=668800.0, ans=0.125 2024-09-19 12:52:50,949 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.19 vs. 
limit=22.5 2024-09-19 12:52:54,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 8.796e+01 9.094e+01 9.550e+01 2.475e+02, threshold=1.819e+02, percent-clipped=2.0 2024-09-19 12:53:05,444 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.75 vs. limit=15.0 2024-09-19 12:53:12,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=668880.0, ans=0.0 2024-09-19 12:53:16,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=668880.0, ans=0.125 2024-09-19 12:53:25,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=668920.0, ans=10.0 2024-09-19 12:53:30,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=668920.0, ans=0.0 2024-09-19 12:53:48,878 INFO [train.py:1198] (0/2) Epoch 37, batch 4350, loss[loss=0.2572, ctc_loss=0.133, cr_loss=0.4137, attn_decoder_loss=0.2618, over 29423.00 frames. ], tot_loss[loss=0.2378, ctc_loss=0.1166, cr_loss=0.3597, attn_decoder_loss=0.2433, over 5796711.97 frames. 
], batch size: 97, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:54:11,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=669040.0, ans=0.035 2024-09-19 12:54:16,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=669040.0, ans=0.1 2024-09-19 12:54:19,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=669080.0, ans=0.2 2024-09-19 12:54:19,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=669080.0, ans=0.0 2024-09-19 12:54:37,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=22.5 2024-09-19 12:54:49,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=669160.0, ans=0.09899494936611666 2024-09-19 12:54:54,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=669160.0, ans=0.025 2024-09-19 12:55:02,637 INFO [train.py:1198] (0/2) Epoch 37, batch 4400, loss[loss=0.2504, ctc_loss=0.1345, cr_loss=0.4052, attn_decoder_loss=0.2543, over 27505.00 frames. ], tot_loss[loss=0.2398, ctc_loss=0.1179, cr_loss=0.3621, attn_decoder_loss=0.2453, over 5767648.43 frames. 
], batch size: 125, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 12:55:11,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=669200.0, ans=0.0 2024-09-19 12:55:23,898 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.307e+01 8.852e+01 9.362e+01 9.812e+01 1.394e+02, threshold=1.872e+02, percent-clipped=0.0 2024-09-19 12:55:34,600 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 12:56:07,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=669360.0, ans=0.1 2024-09-19 12:56:17,482 INFO [train.py:1198] (0/2) Epoch 37, batch 4450, loss[loss=0.2586, ctc_loss=0.1485, cr_loss=0.3802, attn_decoder_loss=0.2624, over 19822.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1217, cr_loss=0.3673, attn_decoder_loss=0.2473, over 5582007.31 frames. ], batch size: 209, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:56:34,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.72 vs. limit=22.5 2024-09-19 12:57:19,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=669560.0, ans=0.125 2024-09-19 12:57:32,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=669600.0, ans=0.125 2024-09-19 12:57:33,295 INFO [train.py:1198] (0/2) Epoch 37, batch 4500, loss[loss=0.2548, ctc_loss=0.1356, cr_loss=0.3863, attn_decoder_loss=0.2595, over 20195.00 frames. ], tot_loss[loss=0.2441, ctc_loss=0.125, cr_loss=0.3703, attn_decoder_loss=0.2491, over 5241687.17 frames. 
], batch size: 209, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 12:57:54,010 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.699e+01 1.043e+02 1.161e+02 1.270e+02 1.246e+03, threshold=2.323e+02, percent-clipped=2.0 2024-09-19 12:57:55,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=669640.0, ans=0.0 2024-09-19 12:57:55,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=669640.0, ans=0.025 2024-09-19 12:58:01,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=669680.0, ans=0.125 2024-09-19 12:58:10,467 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-37.pt 2024-09-19 12:58:56,697 INFO [train.py:1198] (0/2) Epoch 38, batch 0, loss[loss=0.2182, ctc_loss=0.1033, cr_loss=0.3426, attn_decoder_loss=0.2234, over 29603.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1033, cr_loss=0.3426, attn_decoder_loss=0.2234, over 29603.00 frames. ], batch size: 73, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 12:58:56,698 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 12:59:15,172 INFO [train.py:1230] (0/2) Epoch 38, validation: loss=0.2124, ctc_loss=0.03582, cr_loss=6.776e-15, attn_decoder_loss=0.232, over 944034.00 frames. 
2024-09-19 12:59:15,172 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-19 12:59:18,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=669700.0, ans=0.2
2024-09-19 12:59:21,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=669700.0, ans=0.1
2024-09-19 12:59:22,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=669700.0, ans=0.125
2024-09-19 12:59:23,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=15.0
2024-09-19 12:59:28,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=669740.0, ans=0.0
2024-09-19 13:00:09,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=669820.0, ans=0.125
2024-09-19 13:00:13,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=669820.0, ans=15.0
2024-09-19 13:00:14,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=669860.0, ans=0.2
2024-09-19 13:00:20,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=669860.0, ans=0.0
2024-09-19 13:00:32,670 INFO [train.py:1198] (0/2) Epoch 38, batch 50, loss[loss=0.2173, ctc_loss=0.1081, cr_loss=0.3455, attn_decoder_loss=0.2218, over 29438.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.117, cr_loss=0.3605, attn_decoder_loss=0.2416, over 1267202.93 frames. ], batch size: 70, lr: 2.92e-03, grad_scale: 16.0
2024-09-19 13:00:48,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=669940.0, ans=0.0
2024-09-19 13:01:02,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=669980.0, ans=0.125
2024-09-19 13:01:27,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=670020.0, ans=0.0
2024-09-19 13:01:35,764 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.673e+01 9.380e+01 1.040e+02 1.745e+02, threshold=1.876e+02, percent-clipped=0.0
2024-09-19 13:01:50,652 INFO [train.py:1198] (0/2) Epoch 38, batch 100, loss[loss=0.2302, ctc_loss=0.1169, cr_loss=0.3722, attn_decoder_loss=0.2345, over 29536.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1174, cr_loss=0.3617, attn_decoder_loss=0.2432, over 2253653.07 frames. ], batch size: 76, lr: 2.92e-03, grad_scale: 16.0
2024-09-19 13:02:33,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0
2024-09-19 13:03:00,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.04 vs. limit=15.0
2024-09-19 13:03:05,242 INFO [train.py:1198] (0/2) Epoch 38, batch 150, loss[loss=0.2129, ctc_loss=0.1027, cr_loss=0.3194, attn_decoder_loss=0.218, over 29427.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1152, cr_loss=0.3567, attn_decoder_loss=0.2407, over 3047891.81 frames. ], batch size: 70, lr: 2.92e-03, grad_scale: 16.0
2024-09-19 13:03:47,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5
2024-09-19 13:03:47,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=670380.0, ans=22.5
2024-09-19 13:03:51,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=670420.0, ans=0.125
2024-09-19 13:03:58,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=670420.0, ans=0.1
2024-09-19 13:04:05,658 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.326e+01 8.770e+01 9.236e+01 1.783e+02, threshold=1.754e+02, percent-clipped=0.0
2024-09-19 13:04:08,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.88 vs. limit=15.0
2024-09-19 13:04:12,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=670460.0, ans=0.125
2024-09-19 13:04:20,689 INFO [train.py:1198] (0/2) Epoch 38, batch 200, loss[loss=0.2499, ctc_loss=0.1273, cr_loss=0.3915, attn_decoder_loss=0.2548, over 27306.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1143, cr_loss=0.3552, attn_decoder_loss=0.2399, over 3659314.12 frames. ], batch size: 124, lr: 2.92e-03, grad_scale: 16.0
2024-09-19 13:04:29,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670500.0, ans=0.1
2024-09-19 13:04:32,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=670500.0, ans=0.0
2024-09-19 13:04:39,061 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=12.0
2024-09-19 13:04:49,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0
2024-09-19 13:04:58,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=670580.0, ans=0.125
2024-09-19 13:05:10,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=670620.0, ans=0.0
2024-09-19 13:05:12,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.25 vs. limit=15.0
2024-09-19 13:05:32,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=670660.0, ans=0.09899494936611666
2024-09-19 13:05:38,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=670660.0, ans=0.04949747468305833
2024-09-19 13:05:41,274 INFO [train.py:1198] (0/2) Epoch 38, batch 250, loss[loss=0.2465, ctc_loss=0.124, cr_loss=0.3575, attn_decoder_loss=0.2522, over 29246.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1138, cr_loss=0.3537, attn_decoder_loss=0.2393, over 4142040.01 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 16.0
2024-09-19 13:05:46,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0
2024-09-19 13:06:01,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=670740.0, ans=0.125
2024-09-19 13:06:07,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=670740.0, ans=0.125
2024-09-19 13:06:19,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=670780.0, ans=0.125
2024-09-19 13:06:28,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=670820.0, ans=0.0
2024-09-19 13:06:33,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=670820.0, ans=0.0
2024-09-19 13:06:41,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.456e+01 8.891e+01 9.506e+01 1.343e+02, threshold=1.778e+02, percent-clipped=0.0
2024-09-19 13:06:56,768 INFO [train.py:1198] (0/2) Epoch 38, batch 300, loss[loss=0.253, ctc_loss=0.1284, cr_loss=0.4034, attn_decoder_loss=0.2579, over 29510.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1132, cr_loss=0.353, attn_decoder_loss=0.239, over 4510490.16 frames. ], batch size: 92, lr: 2.92e-03, grad_scale: 8.0
2024-09-19 13:06:57,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=670900.0, ans=0.125
2024-09-19 13:07:12,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=670940.0, ans=0.125
2024-09-19 13:07:19,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670940.0, ans=0.1
2024-09-19 13:07:21,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=670940.0, ans=0.125
2024-09-19 13:07:24,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.75 vs. limit=22.5
2024-09-19 13:07:25,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670980.0, ans=0.0
2024-09-19 13:08:02,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671060.0, ans=0.1
2024-09-19 13:08:10,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=671100.0, ans=0.0
2024-09-19 13:08:12,145 INFO [train.py:1198] (0/2) Epoch 38, batch 350, loss[loss=0.2111, ctc_loss=0.09593, cr_loss=0.317, attn_decoder_loss=0.2169, over 29296.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1134, cr_loss=0.3534, attn_decoder_loss=0.2394, over 4795300.60 frames. ], batch size: 71, lr: 2.92e-03, grad_scale: 8.0
2024-09-19 13:08:15,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=671100.0, ans=0.125
2024-09-19 13:08:25,646 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0
2024-09-19 13:08:38,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=671140.0, ans=0.025
2024-09-19 13:08:44,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=671180.0, ans=0.0
2024-09-19 13:08:55,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=671180.0, ans=0.125
2024-09-19 13:08:58,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=671220.0, ans=0.125
2024-09-19 13:09:16,780 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.520e+01 8.939e+01 9.511e+01 1.277e+02, threshold=1.788e+02, percent-clipped=0.0
2024-09-19 13:09:32,533 INFO [train.py:1198] (0/2) Epoch 38, batch 400, loss[loss=0.2347, ctc_loss=0.111, cr_loss=0.3474, attn_decoder_loss=0.2407, over 29713.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1135, cr_loss=0.3534, attn_decoder_loss=0.2394, over 5025414.26 frames. ], batch size: 82, lr: 2.92e-03, grad_scale: 16.0
2024-09-19 13:09:32,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=671300.0, ans=0.0
2024-09-19 13:09:38,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=671300.0, ans=0.125
2024-09-19 13:09:46,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=671340.0, ans=0.0
2024-09-19 13:09:58,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=671340.0, ans=0.2
2024-09-19 13:10:24,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=671420.0, ans=0.0
2024-09-19 13:10:24,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.62 vs. limit=10.0
2024-09-19 13:10:35,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=12.0
2024-09-19 13:10:39,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=671460.0, ans=0.125
2024-09-19 13:10:48,148 INFO [train.py:1198] (0/2) Epoch 38, batch 450, loss[loss=0.2509, ctc_loss=0.1265, cr_loss=0.3981, attn_decoder_loss=0.2559, over 29683.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.114, cr_loss=0.3546, attn_decoder_loss=0.2399, over 5187494.44 frames. ], batch size: 83, lr: 2.92e-03, grad_scale: 8.0
2024-09-19 13:11:51,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.658e+01 9.040e+01 9.546e+01 1.503e+02, threshold=1.808e+02, percent-clipped=0.0
2024-09-19 13:12:00,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=671660.0, ans=0.1
2024-09-19 13:12:03,561 INFO [train.py:1198] (0/2) Epoch 38, batch 500, loss[loss=0.2475, ctc_loss=0.1177, cr_loss=0.3547, attn_decoder_loss=0.254, over 29454.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1137, cr_loss=0.3534, attn_decoder_loss=0.2393, over 5329875.77 frames. ], batch size: 94, lr: 2.92e-03, grad_scale: 8.0
2024-09-19 13:12:19,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=671740.0, ans=0.125
2024-09-19 13:12:27,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=671740.0, ans=0.2
2024-09-19 13:12:29,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.17 vs. limit=6.0
2024-09-19 13:12:32,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0
2024-09-19 13:12:39,386 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 13:13:00,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0
2024-09-19 13:13:23,869 INFO [train.py:1198] (0/2) Epoch 38, batch 550, loss[loss=0.2434, ctc_loss=0.117, cr_loss=0.3578, attn_decoder_loss=0.2495, over 28799.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1138, cr_loss=0.3536, attn_decoder_loss=0.2393, over 5422497.46 frames. ], batch size: 104, lr: 2.92e-03, grad_scale: 8.0
2024-09-19 13:13:25,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=671900.0, ans=0.015
2024-09-19 13:13:31,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=671900.0, ans=0.2
2024-09-19 13:13:36,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=671900.0, ans=0.125
2024-09-19 13:13:42,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=671940.0, ans=0.125
2024-09-19 13:13:44,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=671940.0, ans=0.0
2024-09-19 13:14:00,913 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-168000.pt
2024-09-19 13:14:14,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=671980.0, ans=0.125
2024-09-19 13:14:24,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=672020.0, ans=0.125
2024-09-19 13:14:35,171 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.563e+01 9.079e+01 9.918e+01 4.106e+02, threshold=1.816e+02, percent-clipped=4.0
2024-09-19 13:14:43,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=672060.0, ans=0.025
2024-09-19 13:14:43,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=672060.0, ans=0.2
2024-09-19 13:14:47,317 INFO [train.py:1198] (0/2) Epoch 38, batch 600, loss[loss=0.2453, ctc_loss=0.1228, cr_loss=0.3685, attn_decoder_loss=0.2507, over 29294.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1142, cr_loss=0.3539, attn_decoder_loss=0.2395, over 5509316.80 frames. ], batch size: 100, lr: 2.92e-03, grad_scale: 8.0
2024-09-19 13:14:49,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=672100.0, ans=0.125
2024-09-19 13:15:03,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.09 vs. limit=22.5
2024-09-19 13:15:43,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=672220.0, ans=0.125
2024-09-19 13:16:02,916 INFO [train.py:1198] (0/2) Epoch 38, batch 650, loss[loss=0.2383, ctc_loss=0.1105, cr_loss=0.3516, attn_decoder_loss=0.2447, over 29754.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1132, cr_loss=0.352, attn_decoder_loss=0.2388, over 5586881.89 frames. ], batch size: 81, lr: 2.92e-03, grad_scale: 8.0
2024-09-19 13:16:16,855 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 13:16:24,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=672340.0, ans=0.0
2024-09-19 13:16:40,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=672380.0, ans=0.125
2024-09-19 13:16:51,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=672420.0, ans=0.125
2024-09-19 13:16:57,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=672420.0, ans=0.125
2024-09-19 13:17:02,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.28 vs. limit=15.0
2024-09-19 13:17:04,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=672460.0, ans=0.0
2024-09-19 13:17:04,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=672460.0, ans=0.0
2024-09-19 13:17:09,155 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.584e+01 9.023e+01 9.741e+01 1.282e+02, threshold=1.805e+02, percent-clipped=0.0
2024-09-19 13:17:12,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=672460.0, ans=0.07
2024-09-19 13:17:23,525 INFO [train.py:1198] (0/2) Epoch 38, batch 700, loss[loss=0.2185, ctc_loss=0.1051, cr_loss=0.3305, attn_decoder_loss=0.2237, over 29521.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1137, cr_loss=0.3533, attn_decoder_loss=0.2395, over 5638645.00 frames. ], batch size: 76, lr: 2.92e-03, grad_scale: 8.0
2024-09-19 13:17:29,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=672500.0, ans=0.125
2024-09-19 13:17:38,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=672540.0, ans=0.0
2024-09-19 13:17:47,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0
2024-09-19 13:17:48,457 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0
2024-09-19 13:17:52,656 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 13:18:12,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=672620.0, ans=0.125
2024-09-19 13:18:20,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=12.0
2024-09-19 13:18:23,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=672660.0, ans=0.125
2024-09-19 13:18:29,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=672660.0, ans=0.2
2024-09-19 13:18:33,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=672660.0, ans=0.2
2024-09-19 13:18:39,488 INFO [train.py:1198] (0/2) Epoch 38, batch 750, loss[loss=0.2366, ctc_loss=0.113, cr_loss=0.3525, attn_decoder_loss=0.2425, over 29708.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1135, cr_loss=0.3528, attn_decoder_loss=0.2392, over 5676044.04 frames. ], batch size: 82, lr: 2.91e-03, grad_scale: 8.0
2024-09-19 13:18:55,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.08 vs. limit=12.0
2024-09-19 13:19:17,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=672780.0, ans=0.125
2024-09-19 13:19:43,312 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.368e+01 8.750e+01 9.083e+01 9.607e+01 5.779e+02, threshold=1.817e+02, percent-clipped=1.0
2024-09-19 13:19:45,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=672860.0, ans=0.1
2024-09-19 13:19:51,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0
2024-09-19 13:19:55,421 INFO [train.py:1198] (0/2) Epoch 38, batch 800, loss[loss=0.2207, ctc_loss=0.1039, cr_loss=0.3237, attn_decoder_loss=0.2265, over 29563.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1135, cr_loss=0.3528, attn_decoder_loss=0.239, over 5706142.15 frames. ], batch size: 73, lr: 2.91e-03, grad_scale: 16.0
2024-09-19 13:20:07,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=672900.0, ans=0.0
2024-09-19 13:20:13,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=672940.0, ans=0.0
2024-09-19 13:20:14,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=672940.0, ans=0.0
2024-09-19 13:20:37,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=672980.0, ans=0.0
2024-09-19 13:20:58,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=673060.0, ans=0.0
2024-09-19 13:21:15,060 INFO [train.py:1198] (0/2) Epoch 38, batch 850, loss[loss=0.2458, ctc_loss=0.1232, cr_loss=0.3668, attn_decoder_loss=0.2513, over 29714.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1131, cr_loss=0.3522, attn_decoder_loss=0.2386, over 5736288.16 frames. ], batch size: 89, lr: 2.91e-03, grad_scale: 8.0
2024-09-19 13:22:18,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=673260.0, ans=0.0
2024-09-19 13:22:19,982 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.440e+01 8.974e+01 9.392e+01 3.199e+02, threshold=1.795e+02, percent-clipped=2.0
2024-09-19 13:22:23,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=673260.0, ans=0.0
2024-09-19 13:22:30,668 INFO [train.py:1198] (0/2) Epoch 38, batch 900, loss[loss=0.2121, ctc_loss=0.09601, cr_loss=0.3062, attn_decoder_loss=0.2182, over 29586.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1135, cr_loss=0.3527, attn_decoder_loss=0.2389, over 5739917.94 frames. ], batch size: 73, lr: 2.91e-03, grad_scale: 8.0
2024-09-19 13:22:42,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=673300.0, ans=0.125
2024-09-19 13:22:47,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=673340.0, ans=0.125
2024-09-19 13:23:04,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0
2024-09-19 13:23:05,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=673380.0, ans=0.125
2024-09-19 13:23:05,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=673380.0, ans=0.0
2024-09-19 13:23:07,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=673380.0, ans=0.0
2024-09-19 13:23:45,776 INFO [train.py:1198] (0/2) Epoch 38, batch 950, loss[loss=0.2174, ctc_loss=0.09595, cr_loss=0.3119, attn_decoder_loss=0.2239, over 29501.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1137, cr_loss=0.3531, attn_decoder_loss=0.2392, over 5740486.15 frames. ], batch size: 74, lr: 2.91e-03, grad_scale: 8.0
2024-09-19 13:23:53,471 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 13:24:15,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=673580.0, ans=0.0
2024-09-19 13:24:53,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.808e+01 9.253e+01 1.008e+02 2.662e+02, threshold=1.851e+02, percent-clipped=5.0
2024-09-19 13:25:04,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.18 vs. limit=15.0
2024-09-19 13:25:06,339 INFO [train.py:1198] (0/2) Epoch 38, batch 1000, loss[loss=0.2256, ctc_loss=0.1083, cr_loss=0.3608, attn_decoder_loss=0.2307, over 29510.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1149, cr_loss=0.3555, attn_decoder_loss=0.24, over 5735873.73 frames. ], batch size: 77, lr: 2.91e-03, grad_scale: 8.0
2024-09-19 13:25:08,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=673700.0, ans=0.125
2024-09-19 13:25:17,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=673700.0, ans=0.125
2024-09-19 13:25:30,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673740.0, ans=0.1
2024-09-19 13:25:38,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=673780.0, ans=0.2
2024-09-19 13:25:58,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=673820.0, ans=0.2
2024-09-19 13:26:20,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=673900.0, ans=0.0
2024-09-19 13:26:21,866 INFO [train.py:1198] (0/2) Epoch 38, batch 1050, loss[loss=0.2408, ctc_loss=0.1047, cr_loss=0.3265, attn_decoder_loss=0.2487, over 29700.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1145, cr_loss=0.3548, attn_decoder_loss=0.2395, over 5745049.54 frames. ], batch size: 85, lr: 2.91e-03, grad_scale: 8.0
2024-09-19 13:26:26,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673900.0, ans=0.1
2024-09-19 13:26:58,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=673980.0, ans=0.125
2024-09-19 13:27:10,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=674020.0, ans=0.125
2024-09-19 13:27:27,109 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.465e+01 8.973e+01 9.470e+01 1.777e+02, threshold=1.795e+02, percent-clipped=0.0
2024-09-19 13:27:37,782 INFO [train.py:1198] (0/2) Epoch 38, batch 1100, loss[loss=0.2258, ctc_loss=0.1076, cr_loss=0.3217, attn_decoder_loss=0.2318, over 29439.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1144, cr_loss=0.354, attn_decoder_loss=0.2394, over 5757461.78 frames. ], batch size: 78, lr: 2.91e-03, grad_scale: 8.0
2024-09-19 13:28:08,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=674180.0, ans=0.125
2024-09-19 13:28:11,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=674180.0, ans=10.0
2024-09-19 13:28:24,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=674220.0, ans=12.0
2024-09-19 13:28:28,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=674220.0, ans=0.0
2024-09-19 13:28:28,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=674220.0, ans=0.2
2024-09-19 13:28:42,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=674260.0, ans=0.125
2024-09-19 13:28:58,021 INFO [train.py:1198] (0/2) Epoch 38, batch 1150, loss[loss=0.2145, ctc_loss=0.101, cr_loss=0.3352, attn_decoder_loss=0.2197, over 29454.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1147, cr_loss=0.3546, attn_decoder_loss=0.2394, over 5755046.51 frames. ], batch size: 78, lr: 2.91e-03, grad_scale: 8.0
2024-09-19 13:29:05,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=674300.0, ans=0.125
2024-09-19 13:29:13,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=674340.0, ans=0.0
2024-09-19 13:29:21,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.73 vs. limit=15.0
2024-09-19 13:29:27,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674380.0, ans=0.1
2024-09-19 13:29:39,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=674380.0, ans=0.125
2024-09-19 13:30:03,761 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 8.613e+01 9.064e+01 9.591e+01 1.895e+02, threshold=1.813e+02, percent-clipped=1.0
2024-09-19 13:30:07,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0
2024-09-19 13:30:08,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=674460.0, ans=0.125
2024-09-19 13:30:13,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=674500.0, ans=0.2
2024-09-19 13:30:14,471 INFO [train.py:1198] (0/2) Epoch 38, batch 1200, loss[loss=0.2365, ctc_loss=0.108, cr_loss=0.3484, attn_decoder_loss=0.243, over 29675.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1148, cr_loss=0.3555, attn_decoder_loss=0.2399, over 5747484.42 frames. ], batch size: 85, lr: 2.91e-03, grad_scale: 16.0
2024-09-19 13:30:16,267 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 13:30:19,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=674500.0, ans=0.125
2024-09-19 13:30:28,397 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 13:30:40,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=674540.0, ans=0.1
2024-09-19 13:31:12,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=674620.0, ans=0.025
2024-09-19 13:31:20,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=12.0
2024-09-19 13:31:29,825 INFO [train.py:1198] (0/2) Epoch 38, batch 1250, loss[loss=0.252, ctc_loss=0.1333, cr_loss=0.3997, attn_decoder_loss=0.2564, over 29566.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1153, cr_loss=0.3571, attn_decoder_loss=0.2405, over 5775409.25 frames. ], batch size: 92, lr: 2.91e-03, grad_scale: 16.0
2024-09-19 13:31:36,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=674700.0, ans=0.125
2024-09-19 13:31:50,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=674740.0, ans=0.0
2024-09-19 13:32:19,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=674820.0, ans=0.025
2024-09-19 13:32:20,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=674820.0, ans=10.0
2024-09-19 13:32:22,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=674820.0, ans=0.2
2024-09-19 13:32:30,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=674820.0, ans=0.2
2024-09-19 13:32:30,752 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.82 vs. limit=15.0
2024-09-19 13:32:35,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0
2024-09-19 13:32:37,392 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.525e+01 9.083e+01 9.622e+01 1.847e+02, threshold=1.817e+02, percent-clipped=1.0
2024-09-19 13:32:46,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.52 vs. limit=22.5
2024-09-19 13:32:50,183 INFO [train.py:1198] (0/2) Epoch 38, batch 1300, loss[loss=0.2426, ctc_loss=0.1205, cr_loss=0.365, attn_decoder_loss=0.2481, over 28068.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1151, cr_loss=0.3568, attn_decoder_loss=0.24, over 5780000.71 frames. ], batch size: 111, lr: 2.91e-03, grad_scale: 16.0
2024-09-19 13:32:52,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0
2024-09-19 13:32:58,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=674900.0, ans=0.125
2024-09-19 13:33:01,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=674900.0, ans=0.125
2024-09-19 13:33:07,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.70 vs. limit=15.0
2024-09-19 13:33:13,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=674940.0, ans=0.125
2024-09-19 13:33:22,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=674980.0, ans=0.125
2024-09-19 13:33:43,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=675020.0, ans=0.125
2024-09-19 13:33:50,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=675060.0, ans=0.125
2024-09-19 13:33:52,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675060.0, ans=0.1
2024-09-19 13:33:55,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=675060.0, ans=0.2
2024-09-19 13:34:05,850 INFO [train.py:1198] (0/2) Epoch 38, batch 1350, loss[loss=0.2423, ctc_loss=0.1186,
cr_loss=0.3637, attn_decoder_loss=0.248, over 29739.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1145, cr_loss=0.3555, attn_decoder_loss=0.2398, over 5797462.69 frames. ], batch size: 81, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:34:22,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=675140.0, ans=0.1 2024-09-19 13:34:23,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=675140.0, ans=0.2 2024-09-19 13:34:25,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=675140.0, ans=0.125 2024-09-19 13:34:40,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=675180.0, ans=0.0 2024-09-19 13:35:04,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=675260.0, ans=0.0 2024-09-19 13:35:11,895 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.540e+01 8.958e+01 9.553e+01 1.189e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-19 13:35:20,937 INFO [train.py:1198] (0/2) Epoch 38, batch 1400, loss[loss=0.2073, ctc_loss=0.09708, cr_loss=0.3149, attn_decoder_loss=0.2125, over 29600.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1146, cr_loss=0.3559, attn_decoder_loss=0.2397, over 5808010.84 frames. 
], batch size: 69, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:35:24,381 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:35:34,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=675340.0, ans=0.125 2024-09-19 13:36:02,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=675380.0, ans=0.125 2024-09-19 13:36:08,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=675420.0, ans=0.2 2024-09-19 13:36:08,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=675420.0, ans=0.125 2024-09-19 13:36:13,062 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.93 vs. limit=15.0 2024-09-19 13:36:14,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.31 vs. limit=15.0 2024-09-19 13:36:23,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=675460.0, ans=0.025 2024-09-19 13:36:40,630 INFO [train.py:1198] (0/2) Epoch 38, batch 1450, loss[loss=0.2478, ctc_loss=0.1235, cr_loss=0.3706, attn_decoder_loss=0.2534, over 29449.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1146, cr_loss=0.3557, attn_decoder_loss=0.2403, over 5804702.87 frames. 
], batch size: 94, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:36:52,984 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:37:09,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=675580.0, ans=0.0 2024-09-19 13:37:15,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=675580.0, ans=0.125 2024-09-19 13:37:27,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=675620.0, ans=0.125 2024-09-19 13:37:41,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=675660.0, ans=0.125 2024-09-19 13:37:46,830 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.618e+01 9.172e+01 9.928e+01 3.328e+02, threshold=1.834e+02, percent-clipped=1.0 2024-09-19 13:37:51,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=675660.0, ans=0.0 2024-09-19 13:37:56,071 INFO [train.py:1198] (0/2) Epoch 38, batch 1500, loss[loss=0.245, ctc_loss=0.1234, cr_loss=0.3724, attn_decoder_loss=0.2502, over 29596.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1149, cr_loss=0.3561, attn_decoder_loss=0.2405, over 5804829.21 frames. 
], batch size: 86, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:37:59,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=675700.0, ans=0.1 2024-09-19 13:38:25,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=675780.0, ans=0.125 2024-09-19 13:38:43,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=675820.0, ans=0.1 2024-09-19 13:38:49,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=675820.0, ans=0.125 2024-09-19 13:39:07,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675860.0, ans=0.1 2024-09-19 13:39:09,962 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.17 vs. limit=15.0 2024-09-19 13:39:11,913 INFO [train.py:1198] (0/2) Epoch 38, batch 1550, loss[loss=0.2582, ctc_loss=0.1299, cr_loss=0.3778, attn_decoder_loss=0.2641, over 29521.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1148, cr_loss=0.3559, attn_decoder_loss=0.2402, over 5781130.60 frames. ], batch size: 90, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:39:16,237 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. 
limit=6.0 2024-09-19 13:39:30,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=675940.0, ans=0.125 2024-09-19 13:39:33,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=675940.0, ans=0.125 2024-09-19 13:39:35,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=675940.0, ans=0.2 2024-09-19 13:39:57,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=676020.0, ans=0.2 2024-09-19 13:40:20,844 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.489e+01 9.048e+01 9.769e+01 3.941e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-19 13:40:28,183 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.90 vs. limit=15.0 2024-09-19 13:40:32,003 INFO [train.py:1198] (0/2) Epoch 38, batch 1600, loss[loss=0.2376, ctc_loss=0.1065, cr_loss=0.3261, attn_decoder_loss=0.245, over 29696.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1146, cr_loss=0.355, attn_decoder_loss=0.2399, over 5763915.38 frames. 
], batch size: 85, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:40:33,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=676100.0, ans=0.125 2024-09-19 13:40:33,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=676100.0, ans=0.125 2024-09-19 13:40:47,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=676140.0, ans=0.125 2024-09-19 13:40:56,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=676140.0, ans=0.0 2024-09-19 13:41:17,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=676220.0, ans=0.125 2024-09-19 13:41:31,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=676260.0, ans=0.0 2024-09-19 13:41:34,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=676260.0, ans=0.125 2024-09-19 13:41:47,496 INFO [train.py:1198] (0/2) Epoch 38, batch 1650, loss[loss=0.2407, ctc_loss=0.1104, cr_loss=0.3314, attn_decoder_loss=0.2478, over 29699.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1141, cr_loss=0.3542, attn_decoder_loss=0.2397, over 5758724.73 frames. ], batch size: 89, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:41:49,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.36 vs. 
limit=22.5 2024-09-19 13:42:10,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=676340.0, ans=0.125 2024-09-19 13:42:11,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=676340.0, ans=0.0 2024-09-19 13:42:14,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=676340.0, ans=0.125 2024-09-19 13:42:28,439 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:42:38,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2024-09-19 13:42:49,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=676460.0, ans=0.125 2024-09-19 13:42:50,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=676460.0, ans=0.2 2024-09-19 13:42:53,545 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 8.638e+01 9.232e+01 9.728e+01 1.403e+02, threshold=1.846e+02, percent-clipped=0.0 2024-09-19 13:43:00,251 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.69 vs. limit=15.0 2024-09-19 13:43:02,488 INFO [train.py:1198] (0/2) Epoch 38, batch 1700, loss[loss=0.215, ctc_loss=0.1022, cr_loss=0.3311, attn_decoder_loss=0.2201, over 29579.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1142, cr_loss=0.3545, attn_decoder_loss=0.2397, over 5780264.06 frames. 
], batch size: 69, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:43:04,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=676500.0, ans=0.025 2024-09-19 13:43:16,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=676540.0, ans=0.0 2024-09-19 13:43:22,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=676540.0, ans=0.0 2024-09-19 13:43:33,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=676580.0, ans=0.125 2024-09-19 13:43:42,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=676580.0, ans=0.1 2024-09-19 13:43:43,025 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=15.0 2024-09-19 13:44:22,574 INFO [train.py:1198] (0/2) Epoch 38, batch 1750, loss[loss=0.2134, ctc_loss=0.1006, cr_loss=0.325, attn_decoder_loss=0.2187, over 29380.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.114, cr_loss=0.3543, attn_decoder_loss=0.2395, over 5788566.77 frames. 
], batch size: 67, lr: 2.91e-03, grad_scale: 16.0 2024-09-19 13:44:30,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=676700.0, ans=0.125 2024-09-19 13:44:30,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=676700.0, ans=0.125 2024-09-19 13:44:32,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=676700.0, ans=0.125 2024-09-19 13:44:44,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=676740.0, ans=0.125 2024-09-19 13:45:04,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=676780.0, ans=0.95 2024-09-19 13:45:19,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=676820.0, ans=0.1 2024-09-19 13:45:30,731 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.567e+01 9.147e+01 9.670e+01 2.287e+02, threshold=1.829e+02, percent-clipped=1.0 2024-09-19 13:45:37,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=676900.0, ans=0.0 2024-09-19 13:45:38,185 INFO [train.py:1198] (0/2) Epoch 38, batch 1800, loss[loss=0.2406, ctc_loss=0.1222, cr_loss=0.3721, attn_decoder_loss=0.2455, over 29697.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1145, cr_loss=0.3554, attn_decoder_loss=0.24, over 5789591.42 frames. 
], batch size: 83, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:46:14,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=676980.0, ans=0.125 2024-09-19 13:46:19,951 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.89 vs. limit=6.0 2024-09-19 13:46:29,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=677020.0, ans=0.0 2024-09-19 13:46:41,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=677060.0, ans=0.0 2024-09-19 13:46:46,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=677060.0, ans=0.0 2024-09-19 13:46:49,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677060.0, ans=0.1 2024-09-19 13:46:52,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=677100.0, ans=0.125 2024-09-19 13:46:52,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=677100.0, ans=0.2 2024-09-19 13:46:53,833 INFO [train.py:1198] (0/2) Epoch 38, batch 1850, loss[loss=0.2422, ctc_loss=0.1155, cr_loss=0.3549, attn_decoder_loss=0.2484, over 29642.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1142, cr_loss=0.3548, attn_decoder_loss=0.2397, over 5796514.82 frames. 
], batch size: 86, lr: 2.91e-03, grad_scale: 8.0 2024-09-19 13:46:55,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=677100.0, ans=0.2 2024-09-19 13:47:03,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.65 vs. limit=10.0 2024-09-19 13:47:11,383 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-09-19 13:47:33,833 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=12.0 2024-09-19 13:47:34,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=677180.0, ans=0.1 2024-09-19 13:47:38,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0 2024-09-19 13:47:57,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=677260.0, ans=0.025 2024-09-19 13:48:01,725 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.507e+01 9.088e+01 9.545e+01 1.586e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-19 13:48:10,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=677260.0, ans=0.125 2024-09-19 13:48:13,524 INFO [train.py:1198] (0/2) Epoch 38, batch 1900, loss[loss=0.2452, ctc_loss=0.1148, cr_loss=0.3625, attn_decoder_loss=0.2517, over 29697.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1144, cr_loss=0.3553, attn_decoder_loss=0.2401, over 5803853.61 frames. 
], batch size: 89, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 13:49:05,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=677420.0, ans=0.125 2024-09-19 13:49:11,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=677420.0, ans=0.125 2024-09-19 13:49:15,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=677460.0, ans=0.125 2024-09-19 13:49:29,173 INFO [train.py:1198] (0/2) Epoch 38, batch 1950, loss[loss=0.2301, ctc_loss=0.1103, cr_loss=0.3525, attn_decoder_loss=0.2356, over 29448.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1147, cr_loss=0.3559, attn_decoder_loss=0.2411, over 5818387.96 frames. ], batch size: 78, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 13:49:32,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=677500.0, ans=0.0 2024-09-19 13:49:50,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=677540.0, ans=0.125 2024-09-19 13:49:57,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.96 vs. limit=22.5 2024-09-19 13:50:10,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=677580.0, ans=0.0 2024-09-19 13:50:17,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.99 vs. 
limit=10.0 2024-09-19 13:50:18,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=677620.0, ans=0.125 2024-09-19 13:50:35,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=15.0 2024-09-19 13:50:37,363 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 8.623e+01 9.104e+01 9.387e+01 1.434e+02, threshold=1.821e+02, percent-clipped=0.0 2024-09-19 13:50:37,767 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:50:44,929 INFO [train.py:1198] (0/2) Epoch 38, batch 2000, loss[loss=0.217, ctc_loss=0.1072, cr_loss=0.338, attn_decoder_loss=0.2216, over 29337.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1152, cr_loss=0.3571, attn_decoder_loss=0.2416, over 5795490.13 frames. ], batch size: 67, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 13:50:56,644 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.09 vs. limit=22.5 2024-09-19 13:51:12,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=677740.0, ans=0.125 2024-09-19 13:51:18,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=677780.0, ans=0.0 2024-09-19 13:51:35,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=677820.0, ans=0.1 2024-09-19 13:51:38,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.58 vs. 
limit=15.0 2024-09-19 13:51:42,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=677820.0, ans=0.2 2024-09-19 13:51:47,684 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2024-09-19 13:52:01,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=677860.0, ans=0.125 2024-09-19 13:52:04,795 INFO [train.py:1198] (0/2) Epoch 38, batch 2050, loss[loss=0.2199, ctc_loss=0.1035, cr_loss=0.3302, attn_decoder_loss=0.2255, over 29422.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1149, cr_loss=0.3568, attn_decoder_loss=0.2407, over 5787614.44 frames. ], batch size: 70, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 13:52:09,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=677900.0, ans=0.0 2024-09-19 13:52:38,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=677980.0, ans=0.125 2024-09-19 13:52:52,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=678020.0, ans=0.0 2024-09-19 13:52:59,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=678020.0, ans=0.2 2024-09-19 13:53:04,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=678060.0, ans=0.125 2024-09-19 13:53:12,679 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.454e+01 8.907e+01 9.620e+01 4.678e+02, threshold=1.781e+02, percent-clipped=1.0 2024-09-19 13:53:20,343 INFO [train.py:1198] (0/2) Epoch 38, batch 2100, loss[loss=0.2374, ctc_loss=0.1202, cr_loss=0.3572, 
attn_decoder_loss=0.2425, over 29752.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1145, cr_loss=0.3561, attn_decoder_loss=0.2401, over 5798534.38 frames. ], batch size: 81, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 13:53:20,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=678100.0, ans=10.0 2024-09-19 13:53:28,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=678100.0, ans=0.125 2024-09-19 13:54:13,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=678220.0, ans=0.2 2024-09-19 13:54:14,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=678220.0, ans=0.2 2024-09-19 13:54:35,527 INFO [train.py:1198] (0/2) Epoch 38, batch 2150, loss[loss=0.2488, ctc_loss=0.134, cr_loss=0.4185, attn_decoder_loss=0.2522, over 29471.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1139, cr_loss=0.3552, attn_decoder_loss=0.2395, over 5813553.91 frames. ], batch size: 78, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 13:54:40,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=678300.0, ans=15.0 2024-09-19 13:54:42,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0 2024-09-19 13:55:10,030 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.41 vs. 
limit=12.0 2024-09-19 13:55:14,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=678380.0, ans=0.2 2024-09-19 13:55:21,036 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.02 vs. limit=12.0 2024-09-19 13:55:27,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=678420.0, ans=0.125 2024-09-19 13:55:44,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=678460.0, ans=0.125 2024-09-19 13:55:45,741 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.586e+01 8.959e+01 9.597e+01 2.666e+02, threshold=1.792e+02, percent-clipped=1.0 2024-09-19 13:55:50,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=678500.0, ans=0.0 2024-09-19 13:55:53,921 INFO [train.py:1198] (0/2) Epoch 38, batch 2200, loss[loss=0.2337, ctc_loss=0.1087, cr_loss=0.325, attn_decoder_loss=0.2404, over 29641.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1142, cr_loss=0.3555, attn_decoder_loss=0.2394, over 5810635.10 frames. ], batch size: 86, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 13:56:10,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.33 vs. 
limit=10.0 2024-09-19 13:56:25,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=678580.0, ans=0.125 2024-09-19 13:56:32,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=678580.0, ans=0.125 2024-09-19 13:56:34,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=678580.0, ans=0.2 2024-09-19 13:56:44,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=678620.0, ans=0.125 2024-09-19 13:56:53,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=678620.0, ans=0.2 2024-09-19 13:56:55,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.60 vs. limit=22.5 2024-09-19 13:56:57,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=678660.0, ans=0.0 2024-09-19 13:57:11,729 INFO [train.py:1198] (0/2) Epoch 38, batch 2250, loss[loss=0.2461, ctc_loss=0.1223, cr_loss=0.3582, attn_decoder_loss=0.2519, over 29730.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1142, cr_loss=0.3554, attn_decoder_loss=0.2396, over 5810191.29 frames. 
], batch size: 82, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 13:57:59,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=678820.0, ans=0.2 2024-09-19 13:58:20,713 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.565e+01 9.062e+01 9.644e+01 4.463e+02, threshold=1.812e+02, percent-clipped=2.0 2024-09-19 13:58:22,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=678860.0, ans=0.125 2024-09-19 13:58:23,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2024-09-19 13:58:26,895 INFO [train.py:1198] (0/2) Epoch 38, batch 2300, loss[loss=0.2126, ctc_loss=0.09438, cr_loss=0.3165, attn_decoder_loss=0.2187, over 29341.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1137, cr_loss=0.3541, attn_decoder_loss=0.2387, over 5797299.42 frames. ], batch size: 71, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 13:58:49,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=678940.0, ans=0.2 2024-09-19 13:59:06,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678980.0, ans=0.1 2024-09-19 13:59:13,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=679020.0, ans=0.125 2024-09-19 13:59:23,197 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:59:25,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.37 vs. 
limit=15.0 2024-09-19 13:59:42,413 INFO [train.py:1198] (0/2) Epoch 38, batch 2350, loss[loss=0.242, ctc_loss=0.1185, cr_loss=0.3691, attn_decoder_loss=0.2475, over 29699.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1137, cr_loss=0.3542, attn_decoder_loss=0.2389, over 5803314.89 frames. ], batch size: 83, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:00:04,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=679140.0, ans=0.1 2024-09-19 14:00:09,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=679140.0, ans=0.125 2024-09-19 14:00:26,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=679180.0, ans=0.0 2024-09-19 14:00:27,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=679180.0, ans=0.125 2024-09-19 14:00:30,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=679220.0, ans=0.07 2024-09-19 14:00:39,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=679220.0, ans=0.1 2024-09-19 14:00:56,208 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.471e+01 8.979e+01 9.530e+01 2.043e+02, threshold=1.796e+02, percent-clipped=1.0 2024-09-19 14:01:02,398 INFO [train.py:1198] (0/2) Epoch 38, batch 2400, loss[loss=0.2361, ctc_loss=0.1166, cr_loss=0.3679, attn_decoder_loss=0.2412, over 29535.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1137, cr_loss=0.3544, attn_decoder_loss=0.2393, over 5807660.08 frames. 
], batch size: 76, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 14:01:04,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=679300.0, ans=0.0 2024-09-19 14:01:32,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=679380.0, ans=0.025 2024-09-19 14:01:54,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=679420.0, ans=0.125 2024-09-19 14:01:55,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=679420.0, ans=0.2 2024-09-19 14:02:10,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=679460.0, ans=0.125 2024-09-19 14:02:12,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=679460.0, ans=0.025 2024-09-19 14:02:18,229 INFO [train.py:1198] (0/2) Epoch 38, batch 2450, loss[loss=0.2364, ctc_loss=0.1185, cr_loss=0.3739, attn_decoder_loss=0.2412, over 29713.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1144, cr_loss=0.3553, attn_decoder_loss=0.2402, over 5785240.05 frames. ], batch size: 82, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:02:29,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.46 vs. limit=15.0 2024-09-19 14:02:31,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=679540.0, ans=0.09899494936611666 2024-09-19 14:02:33,840 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.84 vs. 
limit=15.0 2024-09-19 14:03:09,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=679620.0, ans=0.0 2024-09-19 14:03:12,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=679620.0, ans=0.125 2024-09-19 14:03:28,797 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.520e+01 8.981e+01 9.531e+01 3.262e+02, threshold=1.796e+02, percent-clipped=1.0 2024-09-19 14:03:35,430 INFO [train.py:1198] (0/2) Epoch 38, batch 2500, loss[loss=0.2478, ctc_loss=0.1216, cr_loss=0.3605, attn_decoder_loss=0.2538, over 29621.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1144, cr_loss=0.3551, attn_decoder_loss=0.2403, over 5796223.76 frames. ], batch size: 86, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:03:45,163 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-09-19 14:03:50,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.00 vs. limit=22.5 2024-09-19 14:04:29,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=679820.0, ans=0.0 2024-09-19 14:04:48,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=679860.0, ans=0.1 2024-09-19 14:04:53,319 INFO [train.py:1198] (0/2) Epoch 38, batch 2550, loss[loss=0.2029, ctc_loss=0.0925, cr_loss=0.2932, attn_decoder_loss=0.2087, over 29357.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1141, cr_loss=0.3544, attn_decoder_loss=0.2402, over 5798274.98 frames. 
], batch size: 67, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:04:59,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=679900.0, ans=0.125 2024-09-19 14:05:08,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=679940.0, ans=0.95 2024-09-19 14:05:25,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=679980.0, ans=0.125 2024-09-19 14:05:49,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=680020.0, ans=0.125 2024-09-19 14:05:56,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680060.0, ans=0.1 2024-09-19 14:06:04,812 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.447e+01 9.059e+01 9.443e+01 1.451e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-19 14:06:05,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=680060.0, ans=0.125 2024-09-19 14:06:09,435 INFO [train.py:1198] (0/2) Epoch 38, batch 2600, loss[loss=0.2307, ctc_loss=0.1146, cr_loss=0.3546, attn_decoder_loss=0.2357, over 29478.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1142, cr_loss=0.3548, attn_decoder_loss=0.2403, over 5795629.45 frames. 
], batch size: 78, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:06:14,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=680100.0, ans=0.025 2024-09-19 14:06:17,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=680100.0, ans=0.025 2024-09-19 14:06:18,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680100.0, ans=0.1 2024-09-19 14:06:23,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.23 vs. limit=22.5 2024-09-19 14:06:27,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=680140.0, ans=0.125 2024-09-19 14:06:30,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680140.0, ans=0.1 2024-09-19 14:06:53,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=680220.0, ans=0.2 2024-09-19 14:06:59,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=680220.0, ans=0.0 2024-09-19 14:07:17,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0 2024-09-19 14:07:26,864 INFO [train.py:1198] (0/2) Epoch 38, batch 2650, loss[loss=0.2502, ctc_loss=0.1269, cr_loss=0.3853, attn_decoder_loss=0.2554, over 29227.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1148, cr_loss=0.3562, attn_decoder_loss=0.2408, over 5801204.94 frames. 
], batch size: 100, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:07:36,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.28 vs. limit=15.0 2024-09-19 14:08:24,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=680420.0, ans=0.2 2024-09-19 14:08:32,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=680460.0, ans=0.0 2024-09-19 14:08:39,404 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.526e+01 9.066e+01 9.711e+01 1.379e+02, threshold=1.813e+02, percent-clipped=0.0 2024-09-19 14:08:41,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680460.0, ans=0.1 2024-09-19 14:08:44,066 INFO [train.py:1198] (0/2) Epoch 38, batch 2700, loss[loss=0.245, ctc_loss=0.1165, cr_loss=0.3711, attn_decoder_loss=0.251, over 29533.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.115, cr_loss=0.3559, attn_decoder_loss=0.2412, over 5796443.60 frames. ], batch size: 87, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:08:54,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=680500.0, ans=0.2 2024-09-19 14:09:40,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=680620.0, ans=0.125 2024-09-19 14:09:46,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.25 vs. limit=15.0 2024-09-19 14:09:59,320 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. 
limit=6.0 2024-09-19 14:09:59,762 INFO [train.py:1198] (0/2) Epoch 38, batch 2750, loss[loss=0.2218, ctc_loss=0.1063, cr_loss=0.3461, attn_decoder_loss=0.2269, over 29508.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1143, cr_loss=0.3549, attn_decoder_loss=0.2401, over 5796142.31 frames. ], batch size: 75, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:10:18,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=680740.0, ans=0.125 2024-09-19 14:10:33,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_na.min_abs, batch_count=680780.0, ans=0.02 2024-09-19 14:10:37,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.37 vs. limit=10.0 2024-09-19 14:10:39,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=680780.0, ans=0.1 2024-09-19 14:10:47,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=680820.0, ans=0.125 2024-09-19 14:10:53,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=680820.0, ans=0.125 2024-09-19 14:11:13,461 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.653e+01 9.096e+01 9.746e+01 4.436e+02, threshold=1.819e+02, percent-clipped=1.0 2024-09-19 14:11:18,100 INFO [train.py:1198] (0/2) Epoch 38, batch 2800, loss[loss=0.2559, ctc_loss=0.1486, cr_loss=0.3974, attn_decoder_loss=0.259, over 20260.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1148, cr_loss=0.3557, attn_decoder_loss=0.2403, over 5777104.63 frames. 
], batch size: 210, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 14:11:18,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=680900.0, ans=0.125 2024-09-19 14:11:21,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=680900.0, ans=0.125 2024-09-19 14:12:07,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=681020.0, ans=10.0 2024-09-19 14:12:17,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=681020.0, ans=0.0 2024-09-19 14:12:22,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=681060.0, ans=0.0 2024-09-19 14:12:26,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5 2024-09-19 14:12:32,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=681060.0, ans=0.0 2024-09-19 14:12:35,543 INFO [train.py:1198] (0/2) Epoch 38, batch 2850, loss[loss=0.2352, ctc_loss=0.1148, cr_loss=0.364, attn_decoder_loss=0.2405, over 29498.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1151, cr_loss=0.3561, attn_decoder_loss=0.2405, over 5762478.20 frames. 
], batch size: 77, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:12:38,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=681100.0, ans=0.125 2024-09-19 14:12:58,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=681140.0, ans=0.125 2024-09-19 14:13:27,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=681220.0, ans=0.125 2024-09-19 14:13:47,916 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.614e+01 9.082e+01 1.001e+02 4.152e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-19 14:13:50,898 INFO [train.py:1198] (0/2) Epoch 38, batch 2900, loss[loss=0.2297, ctc_loss=0.1127, cr_loss=0.351, attn_decoder_loss=0.2349, over 29435.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.116, cr_loss=0.3584, attn_decoder_loss=0.2415, over 5787965.61 frames. ], batch size: 79, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:14:09,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=681340.0, ans=0.5 2024-09-19 14:14:11,304 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.89 vs. limit=6.0 2024-09-19 14:14:37,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=681420.0, ans=0.07 2024-09-19 14:14:39,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=681420.0, ans=10.0 2024-09-19 14:15:08,291 INFO [train.py:1198] (0/2) Epoch 38, batch 2950, loss[loss=0.2299, ctc_loss=0.1219, cr_loss=0.3711, attn_decoder_loss=0.2337, over 29518.00 frames. 
], tot_loss[loss=0.2349, ctc_loss=0.115, cr_loss=0.3563, attn_decoder_loss=0.2402, over 5781503.71 frames. ], batch size: 75, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:15:08,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=681500.0, ans=0.07 2024-09-19 14:15:29,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=681540.0, ans=0.125 2024-09-19 14:15:39,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=681580.0, ans=0.125 2024-09-19 14:15:42,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.56 vs. limit=22.5 2024-09-19 14:15:55,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2024-09-19 14:15:59,915 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.70 vs. limit=15.0 2024-09-19 14:16:16,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681660.0, ans=0.1 2024-09-19 14:16:20,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=681660.0, ans=0.125 2024-09-19 14:16:23,458 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.330e+01 8.968e+01 9.568e+01 1.287e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-19 14:16:26,715 INFO [train.py:1198] (0/2) Epoch 38, batch 3000, loss[loss=0.2365, ctc_loss=0.1127, cr_loss=0.355, attn_decoder_loss=0.2423, over 29766.00 frames. 
], tot_loss[loss=0.2347, ctc_loss=0.1148, cr_loss=0.3557, attn_decoder_loss=0.2401, over 5782271.67 frames. ], batch size: 81, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:16:26,716 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 14:16:45,081 INFO [train.py:1230] (0/2) Epoch 38, validation: loss=0.2118, ctc_loss=0.03653, cr_loss=5.871e-15, attn_decoder_loss=0.2312, over 944034.00 frames. 2024-09-19 14:16:45,082 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 14:16:47,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.41 vs. limit=15.0 2024-09-19 14:16:48,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=681700.0, ans=0.1 2024-09-19 14:16:50,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=681700.0, ans=0.2 2024-09-19 14:17:06,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=681740.0, ans=0.0 2024-09-19 14:17:08,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.03 vs. 
limit=10.0 2024-09-19 14:17:17,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=681780.0, ans=0.125 2024-09-19 14:17:23,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=681780.0, ans=0.1 2024-09-19 14:17:37,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=681820.0, ans=0.125 2024-09-19 14:17:38,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=681820.0, ans=0.125 2024-09-19 14:17:39,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=681820.0, ans=0.125 2024-09-19 14:17:40,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=681820.0, ans=0.1 2024-09-19 14:17:41,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=681820.0, ans=0.125 2024-09-19 14:17:47,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=681860.0, ans=0.125 2024-09-19 14:18:00,867 INFO [train.py:1198] (0/2) Epoch 38, batch 3050, loss[loss=0.2205, ctc_loss=0.1002, cr_loss=0.3334, attn_decoder_loss=0.2264, over 29546.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1144, cr_loss=0.355, attn_decoder_loss=0.2404, over 5776648.11 frames. ], batch size: 76, lr: 2.90e-03, grad_scale: 8.0 2024-09-19 14:18:03,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.43 vs. 
limit=22.5 2024-09-19 14:18:09,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.85 vs. limit=15.0 2024-09-19 14:18:11,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=681900.0, ans=0.0 2024-09-19 14:18:17,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=681940.0, ans=10.0 2024-09-19 14:18:29,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_na.min_abs, batch_count=681980.0, ans=0.02 2024-09-19 14:18:52,987 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 14:19:15,321 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.484e+01 8.987e+01 9.703e+01 1.967e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-19 14:19:18,299 INFO [train.py:1198] (0/2) Epoch 38, batch 3100, loss[loss=0.2553, ctc_loss=0.1233, cr_loss=0.3654, attn_decoder_loss=0.2618, over 29272.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.114, cr_loss=0.3539, attn_decoder_loss=0.2399, over 5776135.51 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 8.0 2024-09-19 14:19:39,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.03 vs. limit=10.0 2024-09-19 14:19:47,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=682140.0, ans=0.0 2024-09-19 14:19:51,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.91 vs. 
limit=15.0 2024-09-19 14:20:08,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=682220.0, ans=0.2 2024-09-19 14:20:32,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=8.0 2024-09-19 14:20:33,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=682260.0, ans=0.125 2024-09-19 14:20:35,964 INFO [train.py:1198] (0/2) Epoch 38, batch 3150, loss[loss=0.2408, ctc_loss=0.1201, cr_loss=0.3687, attn_decoder_loss=0.246, over 28715.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1142, cr_loss=0.3542, attn_decoder_loss=0.2401, over 5781572.31 frames. ], batch size: 104, lr: 2.89e-03, grad_scale: 8.0 2024-09-19 14:20:49,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=682340.0, ans=0.0 2024-09-19 14:21:01,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682340.0, ans=0.1 2024-09-19 14:21:25,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=682420.0, ans=0.025 2024-09-19 14:21:29,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=682420.0, ans=0.125 2024-09-19 14:21:48,486 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.507e+01 9.169e+01 9.644e+01 2.178e+02, threshold=1.834e+02, percent-clipped=1.0 2024-09-19 14:21:51,677 INFO [train.py:1198] (0/2) Epoch 38, batch 3200, loss[loss=0.2236, ctc_loss=0.1042, cr_loss=0.3355, attn_decoder_loss=0.2294, over 29401.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1139, cr_loss=0.3537, attn_decoder_loss=0.2397, over 5791801.45 frames. 
], batch size: 79, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:21:54,298 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. limit=10.0 2024-09-19 14:22:01,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=682500.0, ans=0.125 2024-09-19 14:22:02,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=682500.0, ans=0.125 2024-09-19 14:22:06,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=682540.0, ans=0.07 2024-09-19 14:22:22,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=682580.0, ans=0.125 2024-09-19 14:22:22,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.32 vs. limit=15.0 2024-09-19 14:22:41,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=682620.0, ans=0.07 2024-09-19 14:23:09,410 INFO [train.py:1198] (0/2) Epoch 38, batch 3250, loss[loss=0.2443, ctc_loss=0.1203, cr_loss=0.3774, attn_decoder_loss=0.2497, over 29690.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1141, cr_loss=0.3549, attn_decoder_loss=0.2399, over 5797941.35 frames. 
], batch size: 84, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:23:09,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=682700.0, ans=0.2 2024-09-19 14:23:11,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=682700.0, ans=0.025 2024-09-19 14:23:14,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=682700.0, ans=0.0 2024-09-19 14:23:15,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=682700.0, ans=0.125 2024-09-19 14:23:28,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=682740.0, ans=0.0 2024-09-19 14:23:38,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=682740.0, ans=0.05 2024-09-19 14:23:50,824 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 14:23:50,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=682780.0, ans=0.125 2024-09-19 14:23:51,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.53 vs. 
limit=12.0 2024-09-19 14:23:55,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=682820.0, ans=0.025 2024-09-19 14:24:01,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=682820.0, ans=0.125 2024-09-19 14:24:14,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=682860.0, ans=0.1 2024-09-19 14:24:23,686 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.537e+01 9.091e+01 9.701e+01 1.814e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-19 14:24:26,798 INFO [train.py:1198] (0/2) Epoch 38, batch 3300, loss[loss=0.2534, ctc_loss=0.1194, cr_loss=0.3615, attn_decoder_loss=0.2602, over 28471.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1132, cr_loss=0.3522, attn_decoder_loss=0.2386, over 5795264.04 frames. ], batch size: 112, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:24:28,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=682900.0, ans=0.2 2024-09-19 14:24:34,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=682900.0, ans=0.2 2024-09-19 14:24:47,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=682940.0, ans=10.0 2024-09-19 14:25:26,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=683060.0, ans=0.0 2024-09-19 14:25:33,566 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 14:25:42,127 INFO [train.py:1198] (0/2) Epoch 38, batch 3350, loss[loss=0.257, ctc_loss=0.1288, cr_loss=0.3972, attn_decoder_loss=0.2624, over 28795.00 frames. 
], tot_loss[loss=0.234, ctc_loss=0.1138, cr_loss=0.3534, attn_decoder_loss=0.2395, over 5772681.99 frames. ], batch size: 104, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:25:51,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=683100.0, ans=0.0 2024-09-19 14:25:53,588 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.58 vs. limit=6.0 2024-09-19 14:26:11,996 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.88 vs. limit=10.0 2024-09-19 14:26:42,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=683220.0, ans=0.125 2024-09-19 14:26:51,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=683260.0, ans=0.125 2024-09-19 14:26:57,159 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.721e+01 9.366e+01 9.877e+01 4.380e+02, threshold=1.873e+02, percent-clipped=1.0 2024-09-19 14:27:00,141 INFO [train.py:1198] (0/2) Epoch 38, batch 3400, loss[loss=0.2029, ctc_loss=0.09145, cr_loss=0.2948, attn_decoder_loss=0.2088, over 29360.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1142, cr_loss=0.3534, attn_decoder_loss=0.2395, over 5764779.76 frames. 
], batch size: 67, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:27:03,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683300.0, ans=0.1 2024-09-19 14:27:09,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683300.0, ans=0.1 2024-09-19 14:27:13,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=683300.0, ans=0.0 2024-09-19 14:27:13,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=683300.0, ans=0.125 2024-09-19 14:27:17,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=683340.0, ans=0.125 2024-09-19 14:27:25,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=683340.0, ans=0.07 2024-09-19 14:27:40,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=683380.0, ans=0.1 2024-09-19 14:28:10,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=683460.0, ans=0.125 2024-09-19 14:28:13,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=683460.0, ans=0.125 2024-09-19 14:28:16,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=683500.0, ans=0.0 2024-09-19 14:28:17,521 INFO [train.py:1198] (0/2) Epoch 38, batch 3450, loss[loss=0.2352, ctc_loss=0.1097, cr_loss=0.3399, attn_decoder_loss=0.2416, over 28336.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1144, cr_loss=0.3545, attn_decoder_loss=0.24, over 5772652.58 frames. 
], batch size: 111, lr: 2.89e-03, grad_scale: 8.0 2024-09-19 14:28:20,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=683500.0, ans=0.125 2024-09-19 14:28:23,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=683500.0, ans=0.0 2024-09-19 14:28:57,282 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 14:29:14,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=683620.0, ans=0.125 2024-09-19 14:29:17,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=683660.0, ans=0.2 2024-09-19 14:29:31,893 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.608e+01 9.088e+01 9.945e+01 4.659e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-19 14:29:33,415 INFO [train.py:1198] (0/2) Epoch 38, batch 3500, loss[loss=0.2062, ctc_loss=0.08862, cr_loss=0.2964, attn_decoder_loss=0.2126, over 29345.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1138, cr_loss=0.3538, attn_decoder_loss=0.2393, over 5774557.55 frames. 
], batch size: 71, lr: 2.89e-03, grad_scale: 8.0 2024-09-19 14:29:39,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=683700.0, ans=0.125 2024-09-19 14:29:42,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=683700.0, ans=0.125 2024-09-19 14:29:44,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=683700.0, ans=0.125 2024-09-19 14:29:56,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=683740.0, ans=0.1 2024-09-19 14:30:03,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=683780.0, ans=0.125 2024-09-19 14:30:05,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=683780.0, ans=0.125 2024-09-19 14:30:18,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=683820.0, ans=0.025 2024-09-19 14:30:21,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=683820.0, ans=0.0 2024-09-19 14:30:31,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.00 vs. 
limit=15.0 2024-09-19 14:30:32,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=683820.0, ans=0.0 2024-09-19 14:30:40,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=683860.0, ans=0.125 2024-09-19 14:30:40,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=683860.0, ans=0.0 2024-09-19 14:30:50,291 INFO [train.py:1198] (0/2) Epoch 38, batch 3550, loss[loss=0.2487, ctc_loss=0.1187, cr_loss=0.383, attn_decoder_loss=0.2546, over 29722.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1139, cr_loss=0.3542, attn_decoder_loss=0.2394, over 5781869.47 frames. ], batch size: 89, lr: 2.89e-03, grad_scale: 8.0 2024-09-19 14:30:50,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=683900.0, ans=0.125 2024-09-19 14:30:52,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=683900.0, ans=0.125 2024-09-19 14:31:14,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.07 vs. 
limit=12.0 2024-09-19 14:31:21,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=683980.0, ans=0.125 2024-09-19 14:31:41,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=684020.0, ans=0.125 2024-09-19 14:31:50,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=684060.0, ans=0.125 2024-09-19 14:31:54,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=684060.0, ans=0.125 2024-09-19 14:32:05,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 8.514e+01 8.990e+01 9.421e+01 1.244e+02, threshold=1.798e+02, percent-clipped=0.0 2024-09-19 14:32:06,681 INFO [train.py:1198] (0/2) Epoch 38, batch 3600, loss[loss=0.2321, ctc_loss=0.1076, cr_loss=0.3511, attn_decoder_loss=0.2381, over 29488.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.114, cr_loss=0.3542, attn_decoder_loss=0.2398, over 5791104.46 frames. 
], batch size: 77, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:32:23,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=684140.0, ans=0.1 2024-09-19 14:32:35,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=684180.0, ans=0.125 2024-09-19 14:32:59,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=684220.0, ans=0.125 2024-09-19 14:33:11,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=684260.0, ans=0.125 2024-09-19 14:33:21,081 INFO [train.py:1198] (0/2) Epoch 38, batch 3650, loss[loss=0.2382, ctc_loss=0.1194, cr_loss=0.375, attn_decoder_loss=0.243, over 29501.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1136, cr_loss=0.3529, attn_decoder_loss=0.239, over 5792125.76 frames. ], batch size: 90, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:33:25,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=684300.0, ans=0.125 2024-09-19 14:33:37,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=684340.0, ans=0.125 2024-09-19 14:33:57,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684380.0, ans=0.1 2024-09-19 14:33:58,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=684380.0, ans=0.125 2024-09-19 14:34:01,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=684380.0, ans=0.2 2024-09-19 14:34:32,997 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=684460.0, ans=0.125 2024-09-19 14:34:34,296 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.479e+01 8.884e+01 9.372e+01 5.863e+02, threshold=1.777e+02, percent-clipped=1.0 2024-09-19 14:34:34,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=684500.0, ans=0.07 2024-09-19 14:34:35,785 INFO [train.py:1198] (0/2) Epoch 38, batch 3700, loss[loss=0.2544, ctc_loss=0.1328, cr_loss=0.4103, attn_decoder_loss=0.2588, over 29711.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1135, cr_loss=0.3533, attn_decoder_loss=0.2392, over 5802941.98 frames. ], batch size: 84, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:34:59,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=684540.0, ans=0.125 2024-09-19 14:35:01,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=684540.0, ans=0.0 2024-09-19 14:35:15,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=684580.0, ans=0.125 2024-09-19 14:35:36,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=684660.0, ans=0.0 2024-09-19 14:35:39,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=684660.0, ans=0.025 2024-09-19 14:35:49,914 INFO [train.py:1198] (0/2) Epoch 38, batch 3750, loss[loss=0.2125, ctc_loss=0.09902, cr_loss=0.3135, attn_decoder_loss=0.2181, over 29358.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1135, cr_loss=0.3533, attn_decoder_loss=0.2391, over 5807094.65 frames. 
], batch size: 67, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:35:59,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=684700.0, ans=0.125 2024-09-19 14:36:02,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0 2024-09-19 14:36:27,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=684780.0, ans=0.2 2024-09-19 14:36:29,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=684780.0, ans=0.07 2024-09-19 14:37:01,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.33 vs. limit=22.5 2024-09-19 14:37:04,811 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.242e+01 8.375e+01 8.926e+01 9.574e+01 1.662e+02, threshold=1.785e+02, percent-clipped=0.0 2024-09-19 14:37:06,353 INFO [train.py:1198] (0/2) Epoch 38, batch 3800, loss[loss=0.2541, ctc_loss=0.1333, cr_loss=0.4054, attn_decoder_loss=0.2585, over 29619.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1134, cr_loss=0.3528, attn_decoder_loss=0.2388, over 5796603.85 frames. ], batch size: 86, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:37:19,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. 
limit=15.0 2024-09-19 14:37:25,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=684940.0, ans=0.5 2024-09-19 14:37:29,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=684940.0, ans=0.2 2024-09-19 14:37:38,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.65 vs. limit=15.0 2024-09-19 14:37:40,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=684980.0, ans=0.0 2024-09-19 14:37:56,656 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=8.0 2024-09-19 14:37:58,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=685020.0, ans=0.125 2024-09-19 14:38:03,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=685020.0, ans=0.0 2024-09-19 14:38:04,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=685060.0, ans=0.05 2024-09-19 14:38:07,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=685060.0, ans=0.125 2024-09-19 14:38:10,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=685060.0, ans=0.0 2024-09-19 14:38:19,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=685060.0, ans=0.0 2024-09-19 14:38:22,255 INFO [train.py:1198] (0/2) Epoch 38, batch 3850, loss[loss=0.2475, ctc_loss=0.1171, cr_loss=0.3502, attn_decoder_loss=0.2542, over 
29240.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1133, cr_loss=0.3529, attn_decoder_loss=0.2386, over 5811708.57 frames. ], batch size: 100, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:38:38,892 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=15.0 2024-09-19 14:38:56,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=685180.0, ans=0.125 2024-09-19 14:39:34,606 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.495e+01 8.957e+01 9.535e+01 1.173e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-19 14:39:35,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=685300.0, ans=0.125 2024-09-19 14:39:36,181 INFO [train.py:1198] (0/2) Epoch 38, batch 3900, loss[loss=0.2423, ctc_loss=0.1163, cr_loss=0.3579, attn_decoder_loss=0.2483, over 29644.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1136, cr_loss=0.3536, attn_decoder_loss=0.239, over 5816652.07 frames. ], batch size: 86, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:39:50,406 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.08 vs. 
limit=15.0 2024-09-19 14:39:58,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=685340.0, ans=0.1 2024-09-19 14:39:59,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=685340.0, ans=0.125 2024-09-19 14:40:08,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=685380.0, ans=0.1 2024-09-19 14:40:16,604 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=22.5 2024-09-19 14:40:38,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=685460.0, ans=0.1 2024-09-19 14:40:49,773 INFO [train.py:1198] (0/2) Epoch 38, batch 3950, loss[loss=0.247, ctc_loss=0.1179, cr_loss=0.3768, attn_decoder_loss=0.2529, over 29465.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1138, cr_loss=0.354, attn_decoder_loss=0.2393, over 5835549.40 frames. ], batch size: 97, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 14:40:53,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=15.0 2024-09-19 14:40:59,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=685500.0, ans=0.125 2024-09-19 14:41:14,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.79 vs. 
limit=10.0 2024-09-19 14:41:18,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685580.0, ans=0.1 2024-09-19 14:41:18,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.87 vs. limit=15.0 2024-09-19 14:42:03,852 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.719e+01 9.341e+01 1.012e+02 2.118e+02, threshold=1.868e+02, percent-clipped=1.0 2024-09-19 14:42:05,293 INFO [train.py:1198] (0/2) Epoch 38, batch 4000, loss[loss=0.2168, ctc_loss=0.09837, cr_loss=0.3018, attn_decoder_loss=0.2233, over 29508.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1139, cr_loss=0.3538, attn_decoder_loss=0.2396, over 5813405.29 frames. ], batch size: 74, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 14:42:33,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=685780.0, ans=0.0 2024-09-19 14:42:40,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2024-09-19 14:42:59,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.03 vs. limit=22.5 2024-09-19 14:43:00,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=685820.0, ans=0.125 2024-09-19 14:43:20,736 INFO [train.py:1198] (0/2) Epoch 38, batch 4050, loss[loss=0.2505, ctc_loss=0.1265, cr_loss=0.3574, attn_decoder_loss=0.2563, over 19714.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.114, cr_loss=0.3542, attn_decoder_loss=0.2395, over 5796970.94 frames. 
], batch size: 210, lr: 2.89e-03, grad_scale: 8.0 2024-09-19 14:43:41,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=685940.0, ans=0.0 2024-09-19 14:43:50,617 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0 2024-09-19 14:43:51,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=685980.0, ans=0.125 2024-09-19 14:44:19,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=686060.0, ans=0.125 2024-09-19 14:44:33,761 INFO [train.py:1198] (0/2) Epoch 38, batch 4100, loss[loss=0.2657, ctc_loss=0.1427, cr_loss=0.4175, attn_decoder_loss=0.2701, over 29495.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1144, cr_loss=0.3548, attn_decoder_loss=0.2396, over 5792470.94 frames. 
], batch size: 90, lr: 2.89e-03, grad_scale: 8.0 2024-09-19 14:44:33,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=686100.0, ans=0.125 2024-09-19 14:44:34,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=686100.0, ans=0.0 2024-09-19 14:44:35,193 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.495e+01 9.024e+01 9.584e+01 1.415e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-19 14:44:42,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=686100.0, ans=0.125 2024-09-19 14:44:47,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=686140.0, ans=0.125 2024-09-19 14:45:16,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=686220.0, ans=0.0 2024-09-19 14:45:35,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=686260.0, ans=0.07 2024-09-19 14:45:39,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=686260.0, ans=0.125 2024-09-19 14:45:47,175 INFO [train.py:1198] (0/2) Epoch 38, batch 4150, loss[loss=0.2301, ctc_loss=0.1162, cr_loss=0.3608, attn_decoder_loss=0.2348, over 29501.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1145, cr_loss=0.3556, attn_decoder_loss=0.2395, over 5797948.24 frames. 
], batch size: 77, lr: 2.89e-03, grad_scale: 8.0 2024-09-19 14:46:03,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=686340.0, ans=0.125 2024-09-19 14:46:04,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=686340.0, ans=0.2 2024-09-19 14:46:05,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0 2024-09-19 14:46:06,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=686340.0, ans=0.2 2024-09-19 14:46:16,880 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2024-09-19 14:46:19,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=686380.0, ans=0.125 2024-09-19 14:46:29,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=686380.0, ans=0.5 2024-09-19 14:46:32,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686420.0, ans=0.1 2024-09-19 14:46:45,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=686460.0, ans=0.0 2024-09-19 14:46:53,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=686460.0, ans=0.1 2024-09-19 14:47:01,643 INFO [train.py:1198] (0/2) Epoch 38, batch 4200, loss[loss=0.2569, ctc_loss=0.1367, cr_loss=0.4165, attn_decoder_loss=0.261, over 29489.00 frames. 
], tot_loss[loss=0.2344, ctc_loss=0.1146, cr_loss=0.356, attn_decoder_loss=0.2398, over 5800034.03 frames. ], batch size: 90, lr: 2.89e-03, grad_scale: 8.0 2024-09-19 14:47:03,141 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.618e+01 9.071e+01 9.625e+01 1.972e+02, threshold=1.814e+02, percent-clipped=1.0 2024-09-19 14:47:07,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=686500.0, ans=0.2 2024-09-19 14:47:12,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686500.0, ans=0.1 2024-09-19 14:47:29,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=686540.0, ans=0.125 2024-09-19 14:47:54,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=686620.0, ans=0.0 2024-09-19 14:48:08,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.71 vs. limit=15.0 2024-09-19 14:48:09,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=686660.0, ans=0.05 2024-09-19 14:48:16,371 INFO [train.py:1198] (0/2) Epoch 38, batch 4250, loss[loss=0.2274, ctc_loss=0.1104, cr_loss=0.3546, attn_decoder_loss=0.2325, over 29517.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1143, cr_loss=0.3552, attn_decoder_loss=0.24, over 5805461.84 frames. ], batch size: 74, lr: 2.89e-03, grad_scale: 8.0 2024-09-19 14:48:17,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.40 vs. 
limit=15.0 2024-09-19 14:48:36,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=686740.0, ans=0.025 2024-09-19 14:48:36,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=686740.0, ans=0.1 2024-09-19 14:48:39,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2024-09-19 14:49:26,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=686860.0, ans=0.125 2024-09-19 14:49:30,555 INFO [train.py:1198] (0/2) Epoch 38, batch 4300, loss[loss=0.249, ctc_loss=0.1179, cr_loss=0.3604, attn_decoder_loss=0.2556, over 29502.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.114, cr_loss=0.3543, attn_decoder_loss=0.2401, over 5795090.04 frames. ], batch size: 87, lr: 2.88e-03, grad_scale: 8.0 2024-09-19 14:49:32,034 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.697e+01 9.242e+01 9.593e+01 9.804e+02, threshold=1.848e+02, percent-clipped=1.0 2024-09-19 14:50:05,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=686980.0, ans=0.125 2024-09-19 14:50:28,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2024-09-19 14:50:45,378 INFO [train.py:1198] (0/2) Epoch 38, batch 4350, loss[loss=0.2439, ctc_loss=0.125, cr_loss=0.378, attn_decoder_loss=0.2487, over 29545.00 frames. ], tot_loss[loss=0.238, ctc_loss=0.1168, cr_loss=0.3605, attn_decoder_loss=0.2435, over 5797332.96 frames. 
], batch size: 97, lr: 2.88e-03, grad_scale: 8.0 2024-09-19 14:50:48,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=687100.0, ans=0.125 2024-09-19 14:50:49,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.11 vs. limit=15.0 2024-09-19 14:51:17,203 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-09-19 14:51:39,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.40 vs. limit=10.0 2024-09-19 14:51:58,779 INFO [train.py:1198] (0/2) Epoch 38, batch 4400, loss[loss=0.2541, ctc_loss=0.1335, cr_loss=0.4015, attn_decoder_loss=0.2585, over 27176.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.1178, cr_loss=0.3625, attn_decoder_loss=0.2454, over 5768542.48 frames. ], batch size: 124, lr: 2.88e-03, grad_scale: 16.0 2024-09-19 14:52:00,221 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.169e+01 8.939e+01 9.261e+01 9.709e+01 1.293e+02, threshold=1.852e+02, percent-clipped=0.0 2024-09-19 14:52:05,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=687300.0, ans=0.2 2024-09-19 14:52:50,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=687420.0, ans=0.2 2024-09-19 14:52:50,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=687420.0, ans=0.2 2024-09-19 14:53:13,723 INFO [train.py:1198] (0/2) Epoch 38, batch 4450, loss[loss=0.2583, ctc_loss=0.1525, cr_loss=0.395, attn_decoder_loss=0.2613, over 20212.00 frames. 
], tot_loss[loss=0.2422, ctc_loss=0.1212, cr_loss=0.3674, attn_decoder_loss=0.2475, over 5574936.87 frames. ], batch size: 209, lr: 2.88e-03, grad_scale: 16.0 2024-09-19 14:53:23,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=687500.0, ans=0.0 2024-09-19 14:53:55,173 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 14:54:11,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=687620.0, ans=0.05 2024-09-19 14:54:15,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=687660.0, ans=0.5 2024-09-19 14:54:22,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=687660.0, ans=0.125 2024-09-19 14:54:28,738 INFO [train.py:1198] (0/2) Epoch 38, batch 4500, loss[loss=0.2488, ctc_loss=0.1334, cr_loss=0.3645, attn_decoder_loss=0.2536, over 20543.00 frames. ], tot_loss[loss=0.244, ctc_loss=0.1241, cr_loss=0.3696, attn_decoder_loss=0.2491, over 5239572.98 frames. 
], batch size: 210, lr: 2.88e-03, grad_scale: 8.0 2024-09-19 14:54:31,692 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.150e+01 9.974e+01 1.104e+02 1.169e+02 2.298e+02, threshold=2.208e+02, percent-clipped=1.0 2024-09-19 14:54:38,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=687700.0, ans=0.025 2024-09-19 14:55:01,570 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 14:55:05,497 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-38.pt 2024-09-19 14:55:50,038 INFO [train.py:1198] (0/2) Epoch 39, batch 0, loss[loss=0.2131, ctc_loss=0.09408, cr_loss=0.3196, attn_decoder_loss=0.2193, over 29604.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.09408, cr_loss=0.3196, attn_decoder_loss=0.2193, over 29604.00 frames. ], batch size: 73, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 14:55:50,039 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 14:56:08,891 INFO [train.py:1230] (0/2) Epoch 39, validation: loss=0.2125, ctc_loss=0.03631, cr_loss=6.129e-15, attn_decoder_loss=0.232, over 944034.00 frames. 
2024-09-19 14:56:08,892 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 14:56:19,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=687800.0, ans=0.125 2024-09-19 14:56:37,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=687840.0, ans=0.125 2024-09-19 14:56:40,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=687880.0, ans=0.125 2024-09-19 14:56:43,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=687880.0, ans=0.125 2024-09-19 14:57:04,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=687920.0, ans=0.125 2024-09-19 14:57:19,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=687960.0, ans=0.125 2024-09-19 14:57:24,316 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-172000.pt 2024-09-19 14:57:32,780 INFO [train.py:1198] (0/2) Epoch 39, batch 50, loss[loss=0.2148, ctc_loss=0.1022, cr_loss=0.324, attn_decoder_loss=0.2201, over 29421.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1151, cr_loss=0.3563, attn_decoder_loss=0.2403, over 1270028.75 frames. ], batch size: 70, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 14:57:38,203 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.46 vs. 
limit=12.0 2024-09-19 14:57:51,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=688040.0, ans=0.07 2024-09-19 14:58:02,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=22.5 2024-09-19 14:58:06,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=688080.0, ans=0.125 2024-09-19 14:58:12,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=688080.0, ans=0.09899494936611666 2024-09-19 14:58:14,953 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.884e+01 9.468e+01 1.073e+02 2.116e+02, threshold=1.894e+02, percent-clipped=0.0 2024-09-19 14:58:42,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=688160.0, ans=0.2 2024-09-19 14:58:47,895 INFO [train.py:1198] (0/2) Epoch 39, batch 100, loss[loss=0.2312, ctc_loss=0.1161, cr_loss=0.3847, attn_decoder_loss=0.2355, over 29539.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1169, cr_loss=0.3608, attn_decoder_loss=0.2428, over 2253958.84 frames. ], batch size: 76, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 14:58:48,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.34 vs. 
limit=22.5 2024-09-19 14:59:01,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=688240.0, ans=0.0 2024-09-19 14:59:09,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=688240.0, ans=0.09899494936611666 2024-09-19 14:59:13,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=688240.0, ans=0.125 2024-09-19 14:59:24,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=15.0 2024-09-19 14:59:30,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=688280.0, ans=0.04949747468305833 2024-09-19 15:00:04,839 INFO [train.py:1198] (0/2) Epoch 39, batch 150, loss[loss=0.2128, ctc_loss=0.1011, cr_loss=0.3304, attn_decoder_loss=0.2178, over 29424.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1149, cr_loss=0.3567, attn_decoder_loss=0.2405, over 3048202.84 frames. ], batch size: 70, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:00:17,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.83 vs. 
limit=22.5 2024-09-19 15:00:33,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=688440.0, ans=0.125 2024-09-19 15:00:37,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=688480.0, ans=0.125 2024-09-19 15:00:40,323 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 15:00:46,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=688480.0, ans=0.125 2024-09-19 15:00:48,859 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.419e+01 8.955e+01 9.625e+01 1.555e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-19 15:01:02,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=688520.0, ans=0.125 2024-09-19 15:01:10,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=688560.0, ans=0.5 2024-09-19 15:01:13,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=688560.0, ans=0.2 2024-09-19 15:01:16,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=688560.0, ans=0.0 2024-09-19 15:01:22,059 INFO [train.py:1198] (0/2) Epoch 39, batch 200, loss[loss=0.2418, ctc_loss=0.1218, cr_loss=0.3628, attn_decoder_loss=0.2471, over 26978.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1141, cr_loss=0.355, attn_decoder_loss=0.2396, over 3659269.98 frames. 
], batch size: 124, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:01:25,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=688600.0, ans=0.125 2024-09-19 15:01:43,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=688640.0, ans=0.125 2024-09-19 15:01:44,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=688640.0, ans=0.05 2024-09-19 15:01:46,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=688640.0, ans=0.125 2024-09-19 15:01:58,746 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.22 vs. limit=22.5 2024-09-19 15:02:01,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=688680.0, ans=0.125 2024-09-19 15:02:05,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=688720.0, ans=0.0 2024-09-19 15:02:07,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=688720.0, ans=0.2 2024-09-19 15:02:11,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688720.0, ans=0.1 2024-09-19 15:02:37,647 INFO [train.py:1198] (0/2) Epoch 39, batch 250, loss[loss=0.2449, ctc_loss=0.1254, cr_loss=0.3742, attn_decoder_loss=0.2499, over 29214.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1145, cr_loss=0.3558, attn_decoder_loss=0.2394, over 4142289.88 frames. 
], batch size: 100, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:02:54,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten.whitening_limit, batch_count=688840.0, ans=22.5 2024-09-19 15:03:06,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=688880.0, ans=0.2 2024-09-19 15:03:19,943 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.603e+01 9.098e+01 9.821e+01 6.363e+02, threshold=1.820e+02, percent-clipped=1.0 2024-09-19 15:03:21,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=688920.0, ans=0.125 2024-09-19 15:03:52,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=688960.0, ans=0.0 2024-09-19 15:03:55,541 INFO [train.py:1198] (0/2) Epoch 39, batch 300, loss[loss=0.2454, ctc_loss=0.1275, cr_loss=0.3994, attn_decoder_loss=0.2496, over 29567.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1141, cr_loss=0.3554, attn_decoder_loss=0.2392, over 4512111.14 frames. ], batch size: 92, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:04:06,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=15.0 2024-09-19 15:04:49,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=689120.0, ans=0.125 2024-09-19 15:05:11,070 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.76 vs. limit=15.0 2024-09-19 15:05:13,153 INFO [train.py:1198] (0/2) Epoch 39, batch 350, loss[loss=0.2101, ctc_loss=0.09211, cr_loss=0.3008, attn_decoder_loss=0.2166, over 29331.00 frames. 
], tot_loss[loss=0.2337, ctc_loss=0.1137, cr_loss=0.3544, attn_decoder_loss=0.2392, over 4797053.09 frames. ], batch size: 71, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:05:13,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=689200.0, ans=0.025 2024-09-19 15:05:16,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=689200.0, ans=0.125 2024-09-19 15:05:23,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=689200.0, ans=0.125 2024-09-19 15:05:24,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=689200.0, ans=0.0 2024-09-19 15:05:50,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=689280.0, ans=0.125 2024-09-19 15:05:55,258 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.459e+01 8.983e+01 9.522e+01 3.712e+02, threshold=1.797e+02, percent-clipped=2.0 2024-09-19 15:05:58,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=689320.0, ans=0.125 2024-09-19 15:06:08,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2024-09-19 15:06:18,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.38 vs. limit=22.5 2024-09-19 15:06:28,537 INFO [train.py:1198] (0/2) Epoch 39, batch 400, loss[loss=0.2328, ctc_loss=0.1126, cr_loss=0.3536, attn_decoder_loss=0.2383, over 29712.00 frames. 
], tot_loss[loss=0.2332, ctc_loss=0.1132, cr_loss=0.353, attn_decoder_loss=0.2387, over 5026081.87 frames. ], batch size: 82, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 15:06:33,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=689400.0, ans=0.025 2024-09-19 15:06:39,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=689400.0, ans=0.0 2024-09-19 15:07:46,832 INFO [train.py:1198] (0/2) Epoch 39, batch 450, loss[loss=0.2426, ctc_loss=0.1116, cr_loss=0.3529, attn_decoder_loss=0.2494, over 29699.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1136, cr_loss=0.354, attn_decoder_loss=0.2392, over 5188118.92 frames. ], batch size: 83, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:07:47,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=689600.0, ans=0.2 2024-09-19 15:08:09,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=689640.0, ans=0.125 2024-09-19 15:08:14,247 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.67 vs. 
limit=12.0 2024-09-19 15:08:15,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=689640.0, ans=0.0 2024-09-19 15:08:27,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=689680.0, ans=0.125 2024-09-19 15:08:32,803 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.422e+01 8.949e+01 9.558e+01 1.384e+02, threshold=1.790e+02, percent-clipped=0.0 2024-09-19 15:08:45,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=689720.0, ans=0.025 2024-09-19 15:08:53,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=689760.0, ans=0.125 2024-09-19 15:09:03,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=689800.0, ans=0.0 2024-09-19 15:09:04,771 INFO [train.py:1198] (0/2) Epoch 39, batch 500, loss[loss=0.2527, ctc_loss=0.1246, cr_loss=0.3895, attn_decoder_loss=0.2583, over 29461.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1136, cr_loss=0.3544, attn_decoder_loss=0.2389, over 5330745.74 frames. 
], batch size: 94, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:09:05,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=689800.0, ans=0.125 2024-09-19 15:09:24,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=689840.0, ans=15.0 2024-09-19 15:09:46,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=689880.0, ans=0.125 2024-09-19 15:09:47,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=689880.0, ans=0.025 2024-09-19 15:09:53,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=689920.0, ans=0.0 2024-09-19 15:09:58,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=689920.0, ans=0.125 2024-09-19 15:10:04,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689960.0, ans=0.1 2024-09-19 15:10:08,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=689960.0, ans=0.125 2024-09-19 15:10:19,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=690000.0, ans=0.125 2024-09-19 15:10:20,422 INFO [train.py:1198] (0/2) Epoch 39, batch 550, loss[loss=0.2338, ctc_loss=0.1057, cr_loss=0.3236, attn_decoder_loss=0.2408, over 28829.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1135, cr_loss=0.3536, attn_decoder_loss=0.2389, over 5422767.64 frames. 
], batch size: 104, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:10:52,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=690080.0, ans=0.125 2024-09-19 15:11:02,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-19 15:11:04,428 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.632e+01 8.977e+01 9.526e+01 2.010e+02, threshold=1.795e+02, percent-clipped=2.0 2024-09-19 15:11:38,686 INFO [train.py:1198] (0/2) Epoch 39, batch 600, loss[loss=0.2535, ctc_loss=0.1241, cr_loss=0.3841, attn_decoder_loss=0.2593, over 29287.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1135, cr_loss=0.354, attn_decoder_loss=0.2391, over 5508124.50 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:11:53,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=690200.0, ans=0.125 2024-09-19 15:12:00,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=690240.0, ans=0.125 2024-09-19 15:12:02,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=690240.0, ans=0.2 2024-09-19 15:12:24,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=690320.0, ans=0.1 2024-09-19 15:12:37,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.44 vs. 
limit=15.0 2024-09-19 15:12:42,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=690360.0, ans=0.0 2024-09-19 15:12:56,387 INFO [train.py:1198] (0/2) Epoch 39, batch 650, loss[loss=0.2404, ctc_loss=0.1235, cr_loss=0.3813, attn_decoder_loss=0.245, over 29758.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.113, cr_loss=0.3529, attn_decoder_loss=0.2385, over 5586032.50 frames. ], batch size: 81, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:13:02,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=690400.0, ans=0.0 2024-09-19 15:13:05,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff2.min_abs, batch_count=690400.0, ans=0.1 2024-09-19 15:13:26,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=690480.0, ans=0.0 2024-09-19 15:13:40,408 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.432e+01 8.976e+01 9.547e+01 1.845e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-19 15:14:00,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=690560.0, ans=0.125 2024-09-19 15:14:12,090 INFO [train.py:1198] (0/2) Epoch 39, batch 700, loss[loss=0.2227, ctc_loss=0.1045, cr_loss=0.3371, attn_decoder_loss=0.2284, over 29531.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1133, cr_loss=0.3532, attn_decoder_loss=0.2388, over 5637095.02 frames. 
], batch size: 76, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:14:12,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=690600.0, ans=0.1 2024-09-19 15:14:16,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=690600.0, ans=0.0 2024-09-19 15:14:22,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=690600.0, ans=0.0 2024-09-19 15:14:24,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=690600.0, ans=0.1 2024-09-19 15:14:32,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=690640.0, ans=6.0 2024-09-19 15:14:33,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=690640.0, ans=0.125 2024-09-19 15:14:34,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=690640.0, ans=0.0 2024-09-19 15:14:42,959 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.68 vs. 
limit=22.5 2024-09-19 15:14:45,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=690680.0, ans=0.0 2024-09-19 15:14:58,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=690720.0, ans=0.0 2024-09-19 15:15:01,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=690720.0, ans=0.125 2024-09-19 15:15:27,466 INFO [train.py:1198] (0/2) Epoch 39, batch 750, loss[loss=0.2413, ctc_loss=0.1154, cr_loss=0.3704, attn_decoder_loss=0.2471, over 29698.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1131, cr_loss=0.3528, attn_decoder_loss=0.2384, over 5675391.50 frames. ], batch size: 82, lr: 2.84e-03, grad_scale: 8.0 2024-09-19 15:15:34,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=690800.0, ans=0.125 2024-09-19 15:15:40,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=690800.0, ans=0.125 2024-09-19 15:15:44,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=690840.0, ans=0.025 2024-09-19 15:15:53,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.64 vs. limit=15.0 2024-09-19 15:16:13,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.83 vs. 
limit=15.0 2024-09-19 15:16:15,751 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.497e+01 9.078e+01 9.651e+01 1.974e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-19 15:16:47,095 INFO [train.py:1198] (0/2) Epoch 39, batch 800, loss[loss=0.2039, ctc_loss=0.09287, cr_loss=0.2991, attn_decoder_loss=0.2096, over 29610.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1131, cr_loss=0.3528, attn_decoder_loss=0.2384, over 5705974.29 frames. ], batch size: 73, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 15:17:14,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=691040.0, ans=0.125 2024-09-19 15:17:19,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=691080.0, ans=0.2 2024-09-19 15:17:25,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=691080.0, ans=0.0 2024-09-19 15:17:37,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=691120.0, ans=0.125 2024-09-19 15:17:45,253 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-09-19 15:17:50,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=691160.0, ans=0.0 2024-09-19 15:18:02,595 INFO [train.py:1198] (0/2) Epoch 39, batch 850, loss[loss=0.2443, ctc_loss=0.1185, cr_loss=0.3588, attn_decoder_loss=0.2503, over 29715.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1128, cr_loss=0.3521, attn_decoder_loss=0.2382, over 5734719.02 frames. 
], batch size: 89, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 15:18:14,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=691200.0, ans=0.05 2024-09-19 15:18:19,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=691240.0, ans=0.0 2024-09-19 15:18:25,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=691240.0, ans=0.125 2024-09-19 15:18:26,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=691240.0, ans=0.0 2024-09-19 15:18:38,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=691280.0, ans=0.0 2024-09-19 15:18:41,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=691280.0, ans=0.125 2024-09-19 15:18:46,234 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.459e+01 8.825e+01 9.402e+01 1.909e+02, threshold=1.765e+02, percent-clipped=1.0 2024-09-19 15:18:58,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=691320.0, ans=0.0 2024-09-19 15:19:06,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=691360.0, ans=0.125 2024-09-19 15:19:18,058 INFO [train.py:1198] (0/2) Epoch 39, batch 900, loss[loss=0.2122, ctc_loss=0.09792, cr_loss=0.3157, attn_decoder_loss=0.2179, over 29630.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1129, cr_loss=0.3524, attn_decoder_loss=0.2387, over 5739054.48 frames. 
], batch size: 73, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 15:19:18,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=691400.0, ans=0.125 2024-09-19 15:20:08,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=691520.0, ans=0.125 2024-09-19 15:20:23,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=691560.0, ans=0.5 2024-09-19 15:20:26,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=691560.0, ans=0.05 2024-09-19 15:20:34,158 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.03 vs. limit=15.0 2024-09-19 15:20:37,730 INFO [train.py:1198] (0/2) Epoch 39, batch 950, loss[loss=0.2186, ctc_loss=0.09537, cr_loss=0.3165, attn_decoder_loss=0.2253, over 29535.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.113, cr_loss=0.3523, attn_decoder_loss=0.2388, over 5742216.84 frames. 
], batch size: 74, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 15:20:39,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=691600.0, ans=0.125 2024-09-19 15:21:11,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=691680.0, ans=0.1 2024-09-19 15:21:17,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=691680.0, ans=0.09899494936611666 2024-09-19 15:21:20,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=691680.0, ans=0.125 2024-09-19 15:21:21,646 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.688e+01 9.370e+01 1.012e+02 2.860e+02, threshold=1.874e+02, percent-clipped=2.0 2024-09-19 15:21:27,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.52 vs. limit=22.5 2024-09-19 15:21:46,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=691760.0, ans=0.2 2024-09-19 15:21:53,224 INFO [train.py:1198] (0/2) Epoch 39, batch 1000, loss[loss=0.2312, ctc_loss=0.1143, cr_loss=0.3451, attn_decoder_loss=0.2366, over 29507.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1142, cr_loss=0.3547, attn_decoder_loss=0.2399, over 5736475.03 frames. 
], batch size: 77, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 15:21:54,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=691800.0, ans=0.025 2024-09-19 15:22:02,487 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 15:22:28,664 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2024-09-19 15:22:53,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-09-19 15:22:58,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=691960.0, ans=0.125 2024-09-19 15:23:08,640 INFO [train.py:1198] (0/2) Epoch 39, batch 1050, loss[loss=0.2456, ctc_loss=0.1213, cr_loss=0.3758, attn_decoder_loss=0.251, over 29678.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1136, cr_loss=0.3535, attn_decoder_loss=0.2393, over 5744592.71 frames. 
], batch size: 85, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 15:23:34,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=692040.0, ans=0.125 2024-09-19 15:23:41,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=692080.0, ans=0.2 2024-09-19 15:23:49,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=692080.0, ans=0.0 2024-09-19 15:23:54,908 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.552e+01 9.121e+01 9.553e+01 1.921e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-19 15:24:03,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=692120.0, ans=0.0 2024-09-19 15:24:05,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=692120.0, ans=0.125 2024-09-19 15:24:26,500 INFO [train.py:1198] (0/2) Epoch 39, batch 1100, loss[loss=0.225, ctc_loss=0.1054, cr_loss=0.3349, attn_decoder_loss=0.2309, over 29457.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1132, cr_loss=0.3527, attn_decoder_loss=0.239, over 5757123.06 frames. ], batch size: 78, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 15:24:37,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=692200.0, ans=0.125 2024-09-19 15:25:17,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.29 vs. limit=22.5 2024-09-19 15:25:23,883 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. 
limit=6.0 2024-09-19 15:25:35,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_positive, batch_count=692360.0, ans=0.05 2024-09-19 15:25:42,571 INFO [train.py:1198] (0/2) Epoch 39, batch 1150, loss[loss=0.2295, ctc_loss=0.1111, cr_loss=0.3489, attn_decoder_loss=0.2349, over 29450.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1134, cr_loss=0.353, attn_decoder_loss=0.239, over 5754981.89 frames. ], batch size: 78, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 15:25:46,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=692400.0, ans=0.125 2024-09-19 15:26:05,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.33 vs. limit=10.0 2024-09-19 15:26:23,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.11 vs. limit=15.0 2024-09-19 15:26:26,561 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 8.488e+01 9.080e+01 9.695e+01 1.564e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 15:26:30,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.51 vs. limit=15.0 2024-09-19 15:26:42,230 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-09-19 15:26:43,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=692560.0, ans=0.125 2024-09-19 15:26:58,174 INFO [train.py:1198] (0/2) Epoch 39, batch 1200, loss[loss=0.2377, ctc_loss=0.1133, cr_loss=0.3424, attn_decoder_loss=0.2439, over 29664.00 frames. 
], tot_loss[loss=0.2342, ctc_loss=0.1138, cr_loss=0.3539, attn_decoder_loss=0.2397, over 5747749.79 frames. ], batch size: 85, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 15:27:31,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=692680.0, ans=0.125 2024-09-19 15:27:52,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692720.0, ans=0.1 2024-09-19 15:27:52,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=692720.0, ans=0.125 2024-09-19 15:27:54,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=12.0 2024-09-19 15:27:57,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=692720.0, ans=0.0 2024-09-19 15:28:05,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.69 vs. limit=15.0 2024-09-19 15:28:09,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=692760.0, ans=0.0 2024-09-19 15:28:18,501 INFO [train.py:1198] (0/2) Epoch 39, batch 1250, loss[loss=0.2602, ctc_loss=0.1393, cr_loss=0.4193, attn_decoder_loss=0.2643, over 29533.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1145, cr_loss=0.3558, attn_decoder_loss=0.2405, over 5775392.73 frames. 
], batch size: 92, lr: 2.83e-03, grad_scale: 8.0 2024-09-19 15:28:35,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=692840.0, ans=0.2 2024-09-19 15:28:46,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=692840.0, ans=0.0 2024-09-19 15:29:05,370 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.604e+01 9.074e+01 9.816e+01 4.150e+02, threshold=1.815e+02, percent-clipped=2.0 2024-09-19 15:29:24,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.93 vs. limit=15.0 2024-09-19 15:29:31,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=692960.0, ans=0.125 2024-09-19 15:29:33,885 INFO [train.py:1198] (0/2) Epoch 39, batch 1300, loss[loss=0.2454, ctc_loss=0.1167, cr_loss=0.3557, attn_decoder_loss=0.2518, over 28193.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1144, cr_loss=0.3557, attn_decoder_loss=0.24, over 5781124.70 frames. 
], batch size: 111, lr: 2.83e-03, grad_scale: 8.0 2024-09-19 15:29:38,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=693000.0, ans=0.0 2024-09-19 15:29:41,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=693000.0, ans=0.2 2024-09-19 15:29:43,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693000.0, ans=0.1 2024-09-19 15:29:43,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=693000.0, ans=0.2 2024-09-19 15:29:49,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.42 vs. limit=15.0 2024-09-19 15:30:17,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2024-09-19 15:30:49,663 INFO [train.py:1198] (0/2) Epoch 39, batch 1350, loss[loss=0.229, ctc_loss=0.1065, cr_loss=0.3481, attn_decoder_loss=0.2349, over 29757.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1136, cr_loss=0.3544, attn_decoder_loss=0.2396, over 5798879.13 frames. 
], batch size: 81, lr: 2.83e-03, grad_scale: 8.0 2024-09-19 15:31:36,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=693280.0, ans=0.125 2024-09-19 15:31:40,745 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.173e+01 8.563e+01 8.987e+01 9.374e+01 1.474e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-19 15:31:57,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=693360.0, ans=0.125 2024-09-19 15:32:02,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=693360.0, ans=10.0 2024-09-19 15:32:09,545 INFO [train.py:1198] (0/2) Epoch 39, batch 1400, loss[loss=0.2032, ctc_loss=0.08939, cr_loss=0.3022, attn_decoder_loss=0.2091, over 29612.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1135, cr_loss=0.3546, attn_decoder_loss=0.2394, over 5808969.58 frames. ], batch size: 69, lr: 2.83e-03, grad_scale: 8.0 2024-09-19 15:32:22,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.29 vs. limit=15.0 2024-09-19 15:32:29,863 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.73 vs. limit=12.0 2024-09-19 15:32:30,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=693440.0, ans=0.0 2024-09-19 15:32:32,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=693440.0, ans=0.125 2024-09-19 15:32:51,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.94 vs. 
limit=15.0 2024-09-19 15:32:52,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=693480.0, ans=0.0 2024-09-19 15:33:03,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.57 vs. limit=22.5 2024-09-19 15:33:17,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=693560.0, ans=0.0 2024-09-19 15:33:22,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.30 vs. limit=15.0 2024-09-19 15:33:24,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693600.0, ans=0.1 2024-09-19 15:33:25,283 INFO [train.py:1198] (0/2) Epoch 39, batch 1450, loss[loss=0.2548, ctc_loss=0.1299, cr_loss=0.4057, attn_decoder_loss=0.2596, over 29452.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1138, cr_loss=0.3554, attn_decoder_loss=0.2398, over 5805491.22 frames. ], batch size: 94, lr: 2.83e-03, grad_scale: 8.0 2024-09-19 15:33:38,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=693640.0, ans=0.125 2024-09-19 15:33:39,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.68 vs. 
limit=15.0 2024-09-19 15:33:43,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=693640.0, ans=0.125 2024-09-19 15:33:46,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693640.0, ans=0.1 2024-09-19 15:33:55,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=15.0 2024-09-19 15:33:58,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=693680.0, ans=0.125 2024-09-19 15:34:00,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693680.0, ans=0.1 2024-09-19 15:34:11,639 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.532e+01 9.213e+01 9.668e+01 2.812e+02, threshold=1.843e+02, percent-clipped=2.0 2024-09-19 15:34:12,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=693720.0, ans=0.125 2024-09-19 15:34:13,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693720.0, ans=0.1 2024-09-19 15:34:15,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=693720.0, ans=0.0 2024-09-19 15:34:31,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=693760.0, ans=0.125 2024-09-19 15:34:38,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=693800.0, ans=0.0 2024-09-19 15:34:40,279 INFO [train.py:1198] (0/2) Epoch 39, batch 1500, loss[loss=0.2326, ctc_loss=0.1126, cr_loss=0.3571, 
attn_decoder_loss=0.238, over 29631.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1137, cr_loss=0.3551, attn_decoder_loss=0.2399, over 5805488.03 frames. ], batch size: 86, lr: 2.83e-03, grad_scale: 8.0 2024-09-19 15:34:48,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=693800.0, ans=0.0 2024-09-19 15:34:48,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=693800.0, ans=0.125 2024-09-19 15:34:57,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693840.0, ans=0.1 2024-09-19 15:35:11,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.42 vs. limit=15.0 2024-09-19 15:35:18,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=693880.0, ans=0.0 2024-09-19 15:35:20,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.49 vs. 
limit=15.0 2024-09-19 15:35:26,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=693880.0, ans=0.125 2024-09-19 15:35:29,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=693920.0, ans=0.125 2024-09-19 15:35:53,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=693960.0, ans=0.2 2024-09-19 15:35:56,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=693960.0, ans=0.0 2024-09-19 15:36:00,504 INFO [train.py:1198] (0/2) Epoch 39, batch 1550, loss[loss=0.2358, ctc_loss=0.118, cr_loss=0.3708, attn_decoder_loss=0.2406, over 29515.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.114, cr_loss=0.3554, attn_decoder_loss=0.2399, over 5781889.75 frames. ], batch size: 90, lr: 2.83e-03, grad_scale: 8.0 2024-09-19 15:36:09,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=694000.0, ans=0.07 2024-09-19 15:36:10,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=694000.0, ans=0.125 2024-09-19 15:36:30,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.05 vs. 
limit=22.5 2024-09-19 15:36:47,241 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.376e+01 8.823e+01 9.525e+01 1.389e+02, threshold=1.765e+02, percent-clipped=0.0 2024-09-19 15:36:50,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=694120.0, ans=0.125 2024-09-19 15:36:55,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=694120.0, ans=0.2 2024-09-19 15:36:58,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=694120.0, ans=0.2 2024-09-19 15:37:02,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=694160.0, ans=0.1 2024-09-19 15:37:13,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=694160.0, ans=0.125 2024-09-19 15:37:16,088 INFO [train.py:1198] (0/2) Epoch 39, batch 1600, loss[loss=0.2278, ctc_loss=0.09962, cr_loss=0.3234, attn_decoder_loss=0.2348, over 29687.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1139, cr_loss=0.3547, attn_decoder_loss=0.2396, over 5764443.69 frames. ], batch size: 85, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:37:18,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.05 vs. limit=15.0 2024-09-19 15:37:19,964 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.27 vs. 
limit=15.0 2024-09-19 15:37:32,943 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 15:37:55,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=694280.0, ans=0.0 2024-09-19 15:38:04,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=694320.0, ans=0.125 2024-09-19 15:38:14,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=694360.0, ans=0.125 2024-09-19 15:38:19,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694360.0, ans=0.1 2024-09-19 15:38:20,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=694360.0, ans=0.0 2024-09-19 15:38:27,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=694360.0, ans=0.0 2024-09-19 15:38:31,547 INFO [train.py:1198] (0/2) Epoch 39, batch 1650, loss[loss=0.2434, ctc_loss=0.1172, cr_loss=0.3673, attn_decoder_loss=0.2493, over 29727.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1133, cr_loss=0.3534, attn_decoder_loss=0.2393, over 5758025.67 frames. 
], batch size: 89, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:38:36,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=694400.0, ans=0.125 2024-09-19 15:38:58,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=694440.0, ans=0.125 2024-09-19 15:39:01,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=694440.0, ans=0.0 2024-09-19 15:39:12,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=694480.0, ans=0.0 2024-09-19 15:39:17,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2024-09-19 15:39:22,826 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.471e+01 8.887e+01 9.578e+01 2.740e+02, threshold=1.777e+02, percent-clipped=2.0 2024-09-19 15:39:33,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=694520.0, ans=0.125 2024-09-19 15:39:33,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=694520.0, ans=0.125 2024-09-19 15:39:49,192 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.28 vs. limit=15.0 2024-09-19 15:39:51,161 INFO [train.py:1198] (0/2) Epoch 39, batch 1700, loss[loss=0.2027, ctc_loss=0.09043, cr_loss=0.3017, attn_decoder_loss=0.2085, over 29588.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1128, cr_loss=0.3523, attn_decoder_loss=0.2389, over 5780843.16 frames. 
], batch size: 69, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:40:02,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=694600.0, ans=0.125 2024-09-19 15:40:02,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=694600.0, ans=0.025 2024-09-19 15:40:39,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=694720.0, ans=0.125 2024-09-19 15:40:45,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=694720.0, ans=0.0 2024-09-19 15:40:51,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.11 vs. limit=8.0 2024-09-19 15:41:06,503 INFO [train.py:1198] (0/2) Epoch 39, batch 1750, loss[loss=0.2092, ctc_loss=0.09943, cr_loss=0.3176, attn_decoder_loss=0.2143, over 29354.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1129, cr_loss=0.353, attn_decoder_loss=0.2388, over 5789273.76 frames. 
], batch size: 67, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:41:26,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=694840.0, ans=0.125 2024-09-19 15:41:34,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=694840.0, ans=0.0 2024-09-19 15:41:37,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=694880.0, ans=0.0 2024-09-19 15:41:38,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=694880.0, ans=0.0 2024-09-19 15:41:53,569 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.741e+01 9.226e+01 9.687e+01 1.772e+02, threshold=1.845e+02, percent-clipped=0.0 2024-09-19 15:42:06,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-09-19 15:42:22,079 INFO [train.py:1198] (0/2) Epoch 39, batch 1800, loss[loss=0.2493, ctc_loss=0.1205, cr_loss=0.3825, attn_decoder_loss=0.2551, over 29709.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1132, cr_loss=0.353, attn_decoder_loss=0.239, over 5791863.09 frames. 
], batch size: 83, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:42:39,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=695040.0, ans=0.125 2024-09-19 15:43:21,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695120.0, ans=0.1 2024-09-19 15:43:31,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=695160.0, ans=0.125 2024-09-19 15:43:41,999 INFO [train.py:1198] (0/2) Epoch 39, batch 1850, loss[loss=0.2399, ctc_loss=0.1163, cr_loss=0.3621, attn_decoder_loss=0.2456, over 29642.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1133, cr_loss=0.3537, attn_decoder_loss=0.2388, over 5799274.73 frames. ], batch size: 86, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:43:50,385 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.68 vs. 
limit=15.0 2024-09-19 15:43:51,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=695200.0, ans=0.125 2024-09-19 15:44:01,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=695240.0, ans=0.125 2024-09-19 15:44:04,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=695240.0, ans=0.125 2024-09-19 15:44:11,185 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 15:44:17,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=695280.0, ans=0.0 2024-09-19 15:44:26,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.41 vs. limit=15.0 2024-09-19 15:44:28,637 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 8.566e+01 9.030e+01 9.513e+01 1.502e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-19 15:44:30,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=695320.0, ans=0.0 2024-09-19 15:44:37,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=695320.0, ans=0.125 2024-09-19 15:44:47,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695360.0, ans=0.1 2024-09-19 15:44:48,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=695360.0, ans=0.125 2024-09-19 15:44:57,268 INFO [train.py:1198] (0/2) Epoch 39, batch 1900, loss[loss=0.2333, ctc_loss=0.1036, cr_loss=0.34, 
attn_decoder_loss=0.2401, over 29713.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1136, cr_loss=0.3546, attn_decoder_loss=0.2394, over 5807173.15 frames. ], batch size: 89, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:45:03,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=695400.0, ans=0.0 2024-09-19 15:45:09,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=695400.0, ans=0.0 2024-09-19 15:45:20,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=695440.0, ans=0.0 2024-09-19 15:46:07,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=695560.0, ans=0.0 2024-09-19 15:46:13,431 INFO [train.py:1198] (0/2) Epoch 39, batch 1950, loss[loss=0.2277, ctc_loss=0.1123, cr_loss=0.34, attn_decoder_loss=0.233, over 29460.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1143, cr_loss=0.3563, attn_decoder_loss=0.2408, over 5821590.84 frames. ], batch size: 78, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:46:13,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=695600.0, ans=0.125 2024-09-19 15:46:21,809 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.86 vs. 
limit=15.0 2024-09-19 15:46:22,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=695600.0, ans=0.125 2024-09-19 15:46:25,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=695600.0, ans=0.1 2024-09-19 15:46:37,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2024-09-19 15:46:56,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=15.0 2024-09-19 15:47:02,236 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-19 15:47:04,134 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.858e+01 9.313e+01 9.741e+01 2.178e+02, threshold=1.863e+02, percent-clipped=1.0 2024-09-19 15:47:07,454 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 15:47:08,043 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2024-09-19 15:47:31,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=695800.0, ans=0.1 2024-09-19 15:47:32,837 INFO [train.py:1198] (0/2) Epoch 39, batch 2000, loss[loss=0.2008, ctc_loss=0.09166, cr_loss=0.2977, attn_decoder_loss=0.2063, over 29303.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1147, cr_loss=0.3566, attn_decoder_loss=0.2413, over 5796932.16 frames. 
], batch size: 67, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 15:48:08,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=695880.0, ans=0.125 2024-09-19 15:48:12,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=695880.0, ans=0.125 2024-09-19 15:48:48,812 INFO [train.py:1198] (0/2) Epoch 39, batch 2050, loss[loss=0.2113, ctc_loss=0.1042, cr_loss=0.3442, attn_decoder_loss=0.2155, over 29416.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1143, cr_loss=0.3556, attn_decoder_loss=0.2405, over 5789189.90 frames. ], batch size: 70, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:49:03,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.29 vs. limit=15.0 2024-09-19 15:49:11,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=696040.0, ans=0.0 2024-09-19 15:49:20,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-19 15:49:31,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=696080.0, ans=0.125 2024-09-19 15:49:37,160 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.542e+01 8.929e+01 9.648e+01 1.386e+02, threshold=1.786e+02, percent-clipped=0.0 2024-09-19 15:49:48,592 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.24 vs. limit=15.0 2024-09-19 15:50:02,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. 
limit=10.0 2024-09-19 15:50:04,425 INFO [train.py:1198] (0/2) Epoch 39, batch 2100, loss[loss=0.2371, ctc_loss=0.1161, cr_loss=0.3557, attn_decoder_loss=0.2426, over 29750.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1139, cr_loss=0.3556, attn_decoder_loss=0.2401, over 5800887.58 frames. ], batch size: 81, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:50:09,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.23 vs. limit=15.0 2024-09-19 15:50:09,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.25 vs. limit=12.0 2024-09-19 15:50:23,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696240.0, ans=0.1 2024-09-19 15:50:28,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.13 vs. limit=10.0 2024-09-19 15:50:32,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=696240.0, ans=0.5 2024-09-19 15:50:56,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=696320.0, ans=0.125 2024-09-19 15:51:09,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=696360.0, ans=0.1 2024-09-19 15:51:23,876 INFO [train.py:1198] (0/2) Epoch 39, batch 2150, loss[loss=0.2329, ctc_loss=0.1177, cr_loss=0.3737, attn_decoder_loss=0.2374, over 29443.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1137, cr_loss=0.3557, attn_decoder_loss=0.2396, over 5814785.15 frames. 
], batch size: 78, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 15:51:25,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=696400.0, ans=0.0 2024-09-19 15:51:28,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=696400.0, ans=0.2 2024-09-19 15:51:28,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=696400.0, ans=0.0 2024-09-19 15:51:45,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=696440.0, ans=0.0 2024-09-19 15:51:54,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=696480.0, ans=0.0 2024-09-19 15:52:12,126 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.545e+01 9.048e+01 9.484e+01 1.799e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-19 15:52:17,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5 2024-09-19 15:52:21,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=696520.0, ans=0.0 2024-09-19 15:52:24,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=696560.0, ans=0.125 2024-09-19 15:52:39,526 INFO [train.py:1198] (0/2) Epoch 39, batch 2200, loss[loss=0.2364, ctc_loss=0.1139, cr_loss=0.3527, attn_decoder_loss=0.2422, over 29641.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.114, cr_loss=0.3558, attn_decoder_loss=0.2398, over 5810928.35 frames. 
], batch size: 86, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:52:53,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=696640.0, ans=0.125
2024-09-19 15:53:09,859 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:53:54,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696800.0, ans=0.1
2024-09-19 15:53:55,433 INFO [train.py:1198] (0/2) Epoch 39, batch 2250, loss[loss=0.2405, ctc_loss=0.1122, cr_loss=0.3498, attn_decoder_loss=0.247, over 29684.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1136, cr_loss=0.3549, attn_decoder_loss=0.2397, over 5810493.44 frames. ], batch size: 82, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:54:04,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696800.0, ans=0.1
2024-09-19 15:54:09,506 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:54:35,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=696880.0, ans=0.0
2024-09-19 15:54:40,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=696880.0, ans=0.0
2024-09-19 15:54:45,788 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.499e+01 9.039e+01 9.530e+01 1.426e+02, threshold=1.808e+02, percent-clipped=0.0
2024-09-19 15:55:04,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=696960.0, ans=0.125
2024-09-19 15:55:15,221 INFO [train.py:1198] (0/2) Epoch 39, batch 2300, loss[loss=0.2061, ctc_loss=0.08873, cr_loss=0.2944, attn_decoder_loss=0.2126, over 29302.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1129, cr_loss=0.353, attn_decoder_loss=0.2386, over 5797936.49 frames. ], batch size: 71, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:55:16,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0
2024-09-19 15:55:51,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=697080.0, ans=0.0
2024-09-19 15:55:54,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=697080.0, ans=0.0
2024-09-19 15:55:55,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0
2024-09-19 15:56:30,109 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.34 vs. limit=12.0
2024-09-19 15:56:30,684 INFO [train.py:1198] (0/2) Epoch 39, batch 2350, loss[loss=0.2368, ctc_loss=0.1157, cr_loss=0.3706, attn_decoder_loss=0.242, over 29686.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1134, cr_loss=0.3538, attn_decoder_loss=0.239, over 5804213.54 frames. ], batch size: 83, lr: 2.83e-03, grad_scale: 8.0
2024-09-19 15:56:42,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=22.5
2024-09-19 15:56:56,490 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:56:58,247 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.62 vs. limit=10.0
2024-09-19 15:57:11,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=697280.0, ans=0.0
2024-09-19 15:57:13,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=697280.0, ans=0.1
2024-09-19 15:57:20,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.672e+01 9.121e+01 9.858e+01 6.738e+02, threshold=1.824e+02, percent-clipped=2.0
2024-09-19 15:57:45,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.34 vs. limit=22.5
2024-09-19 15:57:45,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.60 vs. limit=15.0
2024-09-19 15:57:46,027 INFO [train.py:1198] (0/2) Epoch 39, batch 2400, loss[loss=0.2221, ctc_loss=0.1017, cr_loss=0.3371, attn_decoder_loss=0.228, over 29527.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1139, cr_loss=0.3548, attn_decoder_loss=0.2395, over 5808492.50 frames. ], batch size: 76, lr: 2.83e-03, grad_scale: 16.0
2024-09-19 15:57:49,776 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.72 vs. limit=15.0
2024-09-19 15:58:03,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=697440.0, ans=0.0
2024-09-19 15:58:14,316 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 15:58:20,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=697480.0, ans=0.1
2024-09-19 15:58:46,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=697520.0, ans=0.025
2024-09-19 15:58:52,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0
2024-09-19 15:59:06,350 INFO [train.py:1198] (0/2) Epoch 39, batch 2450, loss[loss=0.2442, ctc_loss=0.1228, cr_loss=0.3692, attn_decoder_loss=0.2495, over 29674.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1143, cr_loss=0.3551, attn_decoder_loss=0.2401, over 5785343.74 frames. ], batch size: 82, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 15:59:08,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=697600.0, ans=0.0
2024-09-19 15:59:36,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=697680.0, ans=0.1
2024-09-19 15:59:48,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=697680.0, ans=0.2
2024-09-19 15:59:55,611 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.313e+01 8.647e+01 9.273e+01 9.890e+01 2.382e+02, threshold=1.855e+02, percent-clipped=2.0
2024-09-19 15:59:59,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.26 vs. limit=10.0
2024-09-19 16:00:09,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=697760.0, ans=0.125
2024-09-19 16:00:14,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=697760.0, ans=0.125
2024-09-19 16:00:20,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=697800.0, ans=0.0
2024-09-19 16:00:21,285 INFO [train.py:1198] (0/2) Epoch 39, batch 2500, loss[loss=0.2392, ctc_loss=0.1093, cr_loss=0.3547, attn_decoder_loss=0.2458, over 29611.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1148, cr_loss=0.3567, attn_decoder_loss=0.2404, over 5795486.30 frames. ], batch size: 86, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:00:21,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=697800.0, ans=0.0
2024-09-19 16:00:32,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=697800.0, ans=0.0
2024-09-19 16:00:53,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=697880.0, ans=0.0
2024-09-19 16:01:08,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=697920.0, ans=0.0
2024-09-19 16:01:20,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=697960.0, ans=0.07
2024-09-19 16:01:37,124 INFO [train.py:1198] (0/2) Epoch 39, batch 2550, loss[loss=0.2085, ctc_loss=0.09364, cr_loss=0.305, attn_decoder_loss=0.2145, over 29340.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1142, cr_loss=0.3554, attn_decoder_loss=0.2399, over 5799297.67 frames. ], batch size: 67, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:01:42,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=698000.0, ans=0.0
2024-09-19 16:01:46,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=698000.0, ans=0.05
2024-09-19 16:01:56,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=698040.0, ans=0.025
2024-09-19 16:02:12,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.55 vs. limit=15.0
2024-09-19 16:02:20,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.88 vs. limit=12.0
2024-09-19 16:02:28,817 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.383e+01 8.876e+01 9.415e+01 4.021e+02, threshold=1.775e+02, percent-clipped=1.0
2024-09-19 16:02:36,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=698120.0, ans=0.0
2024-09-19 16:02:42,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=698160.0, ans=0.0
2024-09-19 16:02:56,979 INFO [train.py:1198] (0/2) Epoch 39, batch 2600, loss[loss=0.2322, ctc_loss=0.1105, cr_loss=0.3465, attn_decoder_loss=0.2381, over 29456.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1143, cr_loss=0.3561, attn_decoder_loss=0.2403, over 5795324.99 frames. ], batch size: 78, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:03:00,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=698200.0, ans=0.1
2024-09-19 16:03:00,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=698200.0, ans=0.125
2024-09-19 16:03:02,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=12.0
2024-09-19 16:03:12,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=698240.0, ans=0.0
2024-09-19 16:03:22,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=698240.0, ans=0.0
2024-09-19 16:03:29,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.19 vs. limit=22.5
2024-09-19 16:03:36,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698280.0, ans=0.1
2024-09-19 16:03:36,614 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:04:12,659 INFO [train.py:1198] (0/2) Epoch 39, batch 2650, loss[loss=0.252, ctc_loss=0.1284, cr_loss=0.3921, attn_decoder_loss=0.257, over 29242.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1143, cr_loss=0.356, attn_decoder_loss=0.2405, over 5801836.55 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:04:27,049 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.22 vs. limit=10.0
2024-09-19 16:04:58,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=698520.0, ans=0.2
2024-09-19 16:05:02,269 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.675e+01 8.983e+01 9.685e+01 2.002e+02, threshold=1.797e+02, percent-clipped=1.0
2024-09-19 16:05:02,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=698520.0, ans=0.0
2024-09-19 16:05:02,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=698520.0, ans=0.025
2024-09-19 16:05:04,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=698520.0, ans=0.0
2024-09-19 16:05:11,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=698560.0, ans=0.1
2024-09-19 16:05:11,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=698560.0, ans=0.0
2024-09-19 16:05:11,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=698560.0, ans=0.5
2024-09-19 16:05:14,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=698560.0, ans=0.125
2024-09-19 16:05:17,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=698560.0, ans=0.1
2024-09-19 16:05:23,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=698560.0, ans=0.0
2024-09-19 16:05:27,840 INFO [train.py:1198] (0/2) Epoch 39, batch 2700, loss[loss=0.2297, ctc_loss=0.1093, cr_loss=0.3434, attn_decoder_loss=0.2355, over 29519.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1146, cr_loss=0.3568, attn_decoder_loss=0.2409, over 5797846.18 frames. ], batch size: 87, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:05:29,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=698600.0, ans=0.2
2024-09-19 16:05:31,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.43 vs. limit=15.0
2024-09-19 16:05:38,619 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:05:48,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=698640.0, ans=0.125
2024-09-19 16:06:18,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=698720.0, ans=0.1
2024-09-19 16:06:27,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=698720.0, ans=0.125
2024-09-19 16:06:38,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698760.0, ans=0.1
2024-09-19 16:06:45,669 INFO [train.py:1198] (0/2) Epoch 39, batch 2750, loss[loss=0.2212, ctc_loss=0.103, cr_loss=0.3507, attn_decoder_loss=0.2266, over 29517.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1138, cr_loss=0.3546, attn_decoder_loss=0.2397, over 5796852.71 frames. ], batch size: 75, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:06:45,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=698800.0, ans=0.2
2024-09-19 16:07:29,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698880.0, ans=0.1
2024-09-19 16:07:34,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=698920.0, ans=0.09899494936611666
2024-09-19 16:07:38,786 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.394e+01 9.092e+01 9.647e+01 2.225e+02, threshold=1.818e+02, percent-clipped=1.0
2024-09-19 16:07:46,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=698960.0, ans=0.07
2024-09-19 16:08:03,250 INFO [train.py:1198] (0/2) Epoch 39, batch 2800, loss[loss=0.2488, ctc_loss=0.1341, cr_loss=0.3448, attn_decoder_loss=0.2539, over 19535.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.114, cr_loss=0.3545, attn_decoder_loss=0.2399, over 5776372.94 frames. ], batch size: 210, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:08:06,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=699000.0, ans=0.025
2024-09-19 16:08:09,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=699000.0, ans=0.0
2024-09-19 16:08:09,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0
2024-09-19 16:08:42,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=699080.0, ans=0.125
2024-09-19 16:08:50,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=699120.0, ans=0.125
2024-09-19 16:08:50,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=699120.0, ans=0.0
2024-09-19 16:09:05,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=699160.0, ans=0.2
2024-09-19 16:09:11,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=699160.0, ans=0.025
2024-09-19 16:09:12,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=699160.0, ans=0.2
2024-09-19 16:09:14,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=699160.0, ans=0.0
2024-09-19 16:09:18,775 INFO [train.py:1198] (0/2) Epoch 39, batch 2850, loss[loss=0.2366, ctc_loss=0.1216, cr_loss=0.3662, attn_decoder_loss=0.2412, over 29510.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1144, cr_loss=0.3554, attn_decoder_loss=0.2404, over 5761953.36 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:09:33,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=699200.0, ans=0.1
2024-09-19 16:09:45,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=699240.0, ans=0.125
2024-09-19 16:10:12,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.94 vs. limit=6.0
2024-09-19 16:10:13,572 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.628e+01 9.119e+01 9.691e+01 3.191e+02, threshold=1.824e+02, percent-clipped=2.0
2024-09-19 16:10:36,328 INFO [train.py:1198] (0/2) Epoch 39, batch 2900, loss[loss=0.2327, ctc_loss=0.1151, cr_loss=0.3544, attn_decoder_loss=0.2379, over 29437.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.115, cr_loss=0.3567, attn_decoder_loss=0.2414, over 5787344.23 frames. ], batch size: 79, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:10:44,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=699400.0, ans=0.04949747468305833
2024-09-19 16:10:59,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.88 vs. limit=15.0
2024-09-19 16:11:18,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=699480.0, ans=0.025
2024-09-19 16:11:30,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699520.0, ans=0.1
2024-09-19 16:11:41,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=699560.0, ans=0.125
2024-09-19 16:11:53,780 INFO [train.py:1198] (0/2) Epoch 39, batch 2950, loss[loss=0.227, ctc_loss=0.112, cr_loss=0.3642, attn_decoder_loss=0.2317, over 29545.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1139, cr_loss=0.3547, attn_decoder_loss=0.24, over 5782862.58 frames. ], batch size: 75, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:11:54,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.10 vs. limit=22.5
2024-09-19 16:12:03,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=699600.0, ans=0.0
2024-09-19 16:12:08,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=12.0
2024-09-19 16:12:09,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=699640.0, ans=0.125
2024-09-19 16:12:17,563 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0
2024-09-19 16:12:21,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0
2024-09-19 16:12:22,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=699680.0, ans=0.1
2024-09-19 16:12:30,998 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0
2024-09-19 16:12:32,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=22.5
2024-09-19 16:12:41,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=699720.0, ans=0.0
2024-09-19 16:12:41,120 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:12:43,125 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0
2024-09-19 16:12:45,557 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:12:46,051 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.38 vs. limit=10.0
2024-09-19 16:12:46,699 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.658e+01 9.205e+01 9.936e+01 3.321e+02, threshold=1.841e+02, percent-clipped=1.0
2024-09-19 16:12:54,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=699760.0, ans=0.0
2024-09-19 16:12:57,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=699760.0, ans=0.05
2024-09-19 16:13:09,516 INFO [train.py:1198] (0/2) Epoch 39, batch 3000, loss[loss=0.2397, ctc_loss=0.114, cr_loss=0.3782, attn_decoder_loss=0.2452, over 29757.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1137, cr_loss=0.3548, attn_decoder_loss=0.24, over 5783556.34 frames. ], batch size: 81, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:13:09,517 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 16:13:28,815 INFO [train.py:1230] (0/2) Epoch 39, validation: loss=0.2123, ctc_loss=0.03671, cr_loss=6.289e-15, attn_decoder_loss=0.2318, over 944034.00 frames.
2024-09-19 16:13:28,816 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-19 16:13:29,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=699800.0, ans=0.0
2024-09-19 16:13:48,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=699840.0, ans=0.0
2024-09-19 16:13:49,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.61 vs. limit=15.0
2024-09-19 16:13:54,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=699840.0, ans=0.0
2024-09-19 16:13:56,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=699840.0, ans=0.125
2024-09-19 16:14:01,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.53 vs. limit=15.0
2024-09-19 16:14:02,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=699880.0, ans=0.0
2024-09-19 16:14:40,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=699960.0, ans=0.025
2024-09-19 16:14:46,919 INFO [train.py:1198] (0/2) Epoch 39, batch 3050, loss[loss=0.2253, ctc_loss=0.1086, cr_loss=0.3556, attn_decoder_loss=0.2303, over 29532.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1144, cr_loss=0.3562, attn_decoder_loss=0.2404, over 5776898.58 frames. ], batch size: 76, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:14:48,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=700000.0, ans=0.125
2024-09-19 16:15:03,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=700040.0, ans=0.025
2024-09-19 16:15:31,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
2024-09-19 16:15:39,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.454e+01 9.058e+01 9.630e+01 1.961e+02, threshold=1.812e+02, percent-clipped=1.0
2024-09-19 16:15:45,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=700160.0, ans=0.2
2024-09-19 16:16:02,121 INFO [train.py:1198] (0/2) Epoch 39, batch 3100, loss[loss=0.2444, ctc_loss=0.1176, cr_loss=0.3627, attn_decoder_loss=0.2504, over 29300.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1141, cr_loss=0.3558, attn_decoder_loss=0.2399, over 5776698.02 frames. ], batch size: 100, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:16:08,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=700200.0, ans=0.025
2024-09-19 16:16:08,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.73 vs. limit=15.0
2024-09-19 16:16:20,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=12.0
2024-09-19 16:16:21,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=700240.0, ans=0.125
2024-09-19 16:16:29,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=700240.0, ans=0.125
2024-09-19 16:16:54,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0
2024-09-19 16:16:58,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=700320.0, ans=0.1
2024-09-19 16:17:19,625 INFO [train.py:1198] (0/2) Epoch 39, batch 3150, loss[loss=0.2567, ctc_loss=0.1351, cr_loss=0.3982, attn_decoder_loss=0.2614, over 28752.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.114, cr_loss=0.3553, attn_decoder_loss=0.24, over 5783160.27 frames. ], batch size: 104, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:17:39,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=700440.0, ans=0.2
2024-09-19 16:17:44,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=700440.0, ans=0.125
2024-09-19 16:18:11,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=700520.0, ans=0.1
2024-09-19 16:18:12,218 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.334e+01 8.732e+01 9.135e+01 9.638e+01 1.512e+02, threshold=1.827e+02, percent-clipped=0.0
2024-09-19 16:18:32,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=700560.0, ans=0.0
2024-09-19 16:18:33,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=700560.0, ans=0.5
2024-09-19 16:18:36,823 INFO [train.py:1198] (0/2) Epoch 39, batch 3200, loss[loss=0.2322, ctc_loss=0.1069, cr_loss=0.3469, attn_decoder_loss=0.2384, over 29392.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1136, cr_loss=0.3544, attn_decoder_loss=0.2394, over 5792188.20 frames. ], batch size: 79, lr: 2.82e-03, grad_scale: 16.0
2024-09-19 16:19:06,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=700680.0, ans=0.0
2024-09-19 16:19:19,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=700680.0, ans=0.125
2024-09-19 16:19:37,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=22.5
2024-09-19 16:19:50,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=700760.0, ans=0.125
2024-09-19 16:19:53,018 INFO [train.py:1198] (0/2) Epoch 39, batch 3250, loss[loss=0.2407, ctc_loss=0.1157, cr_loss=0.357, attn_decoder_loss=0.2467, over 29709.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.114, cr_loss=0.3551, attn_decoder_loss=0.24, over 5799344.29 frames. ], batch size: 84, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:19:56,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=700800.0, ans=0.125
2024-09-19 16:20:02,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=700800.0, ans=0.025
2024-09-19 16:20:15,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=700840.0, ans=0.0
2024-09-19 16:20:15,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=700840.0, ans=0.2
2024-09-19 16:20:21,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=700880.0, ans=0.125
2024-09-19 16:20:42,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=700920.0, ans=0.2
2024-09-19 16:20:46,553 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.633e+01 8.604e+01 9.197e+01 9.698e+01 1.830e+02, threshold=1.839e+02, percent-clipped=1.0
2024-09-19 16:21:09,879 INFO [train.py:1198] (0/2) Epoch 39, batch 3300, loss[loss=0.2449, ctc_loss=0.12, cr_loss=0.3559, attn_decoder_loss=0.2509, over 28477.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1131, cr_loss=0.353, attn_decoder_loss=0.2389, over 5795699.89 frames. ], batch size: 112, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:21:34,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=12.0
2024-09-19 16:21:36,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0
2024-09-19 16:21:38,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=701080.0, ans=0.0
2024-09-19 16:21:44,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=701080.0, ans=0.125
2024-09-19 16:22:05,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=701120.0, ans=0.0
2024-09-19 16:22:27,046 INFO [train.py:1198] (0/2) Epoch 39, batch 3350, loss[loss=0.2443, ctc_loss=0.1117, cr_loss=0.3412, attn_decoder_loss=0.2515, over 28887.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1137, cr_loss=0.3539, attn_decoder_loss=0.2396, over 5774165.53 frames. ], batch size: 104, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:22:42,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=701240.0, ans=0.0
2024-09-19 16:23:08,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=701280.0, ans=0.2
2024-09-19 16:23:21,142 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.630e+01 9.121e+01 9.700e+01 6.720e+02, threshold=1.824e+02, percent-clipped=1.0
2024-09-19 16:23:22,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=701320.0, ans=0.015
2024-09-19 16:23:42,603 INFO [train.py:1198] (0/2) Epoch 39, batch 3400, loss[loss=0.2059, ctc_loss=0.09364, cr_loss=0.3127, attn_decoder_loss=0.2114, over 29345.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.114, cr_loss=0.3544, attn_decoder_loss=0.2394, over 5766307.54 frames. ], batch size: 67, lr: 2.82e-03, grad_scale: 8.0
2024-09-19 16:23:53,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=701400.0, ans=0.1
2024-09-19 16:23:56,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=701440.0, ans=0.125
2024-09-19 16:24:14,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=701480.0, ans=0.1
2024-09-19 16:24:53,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.04 vs. limit=12.0
2024-09-19 16:24:55,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=701560.0, ans=0.125
2024-09-19 16:25:00,312 INFO [train.py:1198] (0/2) Epoch 39, batch 3450, loss[loss=0.2433, ctc_loss=0.1156, cr_loss=0.3668, attn_decoder_loss=0.2494, over 28567.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1139, cr_loss=0.3543, attn_decoder_loss=0.2397, over 5774616.31 frames.
], batch size: 112, lr: 2.82e-03, grad_scale: 8.0 2024-09-19 16:25:17,240 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:25:18,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=701640.0, ans=0.025 2024-09-19 16:25:24,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=701640.0, ans=0.1 2024-09-19 16:25:48,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=701720.0, ans=0.2 2024-09-19 16:25:54,481 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.636e+01 9.201e+01 9.668e+01 2.196e+02, threshold=1.840e+02, percent-clipped=2.0 2024-09-19 16:25:56,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=701720.0, ans=0.125 2024-09-19 16:26:11,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=701760.0, ans=0.0 2024-09-19 16:26:17,579 INFO [train.py:1198] (0/2) Epoch 39, batch 3500, loss[loss=0.2176, ctc_loss=0.1044, cr_loss=0.3262, attn_decoder_loss=0.2229, over 29737.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1136, cr_loss=0.3541, attn_decoder_loss=0.2391, over 5776677.68 frames. 
], batch size: 72, lr: 2.82e-03, grad_scale: 8.0 2024-09-19 16:26:23,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=701800.0, ans=0.2 2024-09-19 16:26:34,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=701840.0, ans=0.125 2024-09-19 16:27:08,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=701920.0, ans=0.0 2024-09-19 16:27:14,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=701920.0, ans=0.125 2024-09-19 16:27:31,753 INFO [train.py:1198] (0/2) Epoch 39, batch 3550, loss[loss=0.2539, ctc_loss=0.1316, cr_loss=0.3915, attn_decoder_loss=0.2588, over 29716.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1133, cr_loss=0.3533, attn_decoder_loss=0.2391, over 5783508.33 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 8.0 2024-09-19 16:27:33,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=702000.0, ans=0.0 2024-09-19 16:27:42,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=702000.0, ans=0.0 2024-09-19 16:27:42,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=702000.0, ans=0.0 2024-09-19 16:28:08,889 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:28:24,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.449e+01 9.039e+01 9.569e+01 2.236e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-19 16:28:35,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=702160.0, ans=0.125 
2024-09-19 16:28:45,267 INFO [train.py:1198] (0/2) Epoch 39, batch 3600, loss[loss=0.2313, ctc_loss=0.1168, cr_loss=0.3494, attn_decoder_loss=0.2362, over 29514.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1134, cr_loss=0.3533, attn_decoder_loss=0.2393, over 5792342.23 frames. ], batch size: 77, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 16:29:16,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=702280.0, ans=0.125 2024-09-19 16:29:47,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.32 vs. limit=12.0 2024-09-19 16:29:55,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=702360.0, ans=0.05 2024-09-19 16:29:57,354 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:30:01,926 INFO [train.py:1198] (0/2) Epoch 39, batch 3650, loss[loss=0.2533, ctc_loss=0.129, cr_loss=0.3976, attn_decoder_loss=0.2583, over 29508.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.113, cr_loss=0.3529, attn_decoder_loss=0.2387, over 5794526.90 frames. 
], batch size: 90, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 16:30:18,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=702440.0, ans=0.2 2024-09-19 16:30:48,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=702520.0, ans=0.2 2024-09-19 16:30:55,337 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.559e+01 9.136e+01 9.465e+01 1.942e+02, threshold=1.827e+02, percent-clipped=1.0 2024-09-19 16:31:15,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=702600.0, ans=0.0 2024-09-19 16:31:16,132 INFO [train.py:1198] (0/2) Epoch 39, batch 3700, loss[loss=0.2469, ctc_loss=0.1211, cr_loss=0.3557, attn_decoder_loss=0.253, over 29707.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1135, cr_loss=0.3541, attn_decoder_loss=0.2391, over 5804663.90 frames. ], batch size: 84, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:31:45,258 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0 2024-09-19 16:31:48,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.23 vs. limit=22.5 2024-09-19 16:32:08,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.21 vs. 
limit=15.0 2024-09-19 16:32:16,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=702760.0, ans=0.125 2024-09-19 16:32:22,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702760.0, ans=0.1 2024-09-19 16:32:31,624 INFO [train.py:1198] (0/2) Epoch 39, batch 3750, loss[loss=0.2082, ctc_loss=0.09694, cr_loss=0.3104, attn_decoder_loss=0.2137, over 29322.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1133, cr_loss=0.3539, attn_decoder_loss=0.239, over 5808704.86 frames. ], batch size: 67, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:32:34,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=702800.0, ans=0.125 2024-09-19 16:32:34,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=702800.0, ans=0.0 2024-09-19 16:32:35,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5 2024-09-19 16:32:47,266 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.06 vs. limit=6.0 2024-09-19 16:33:08,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=702880.0, ans=0.0 2024-09-19 16:33:09,494 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.82 vs. 
limit=15.0 2024-09-19 16:33:20,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=702920.0, ans=0.125 2024-09-19 16:33:26,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.464e+01 8.930e+01 9.588e+01 2.704e+02, threshold=1.786e+02, percent-clipped=2.0 2024-09-19 16:33:28,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=702920.0, ans=0.125 2024-09-19 16:33:28,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702920.0, ans=0.1 2024-09-19 16:33:37,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=702960.0, ans=0.0 2024-09-19 16:33:45,858 INFO [train.py:1198] (0/2) Epoch 39, batch 3800, loss[loss=0.2479, ctc_loss=0.124, cr_loss=0.3631, attn_decoder_loss=0.2536, over 29621.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.113, cr_loss=0.3527, attn_decoder_loss=0.2384, over 5798841.12 frames. ], batch size: 86, lr: 2.81e-03, grad_scale: 8.0 2024-09-19 16:33:47,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=703000.0, ans=0.0 2024-09-19 16:33:50,144 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.64 vs. limit=5.0 2024-09-19 16:33:58,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.89 vs. 
limit=15.0 2024-09-19 16:34:25,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=703080.0, ans=0.2 2024-09-19 16:34:42,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=703120.0, ans=0.1 2024-09-19 16:34:45,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703160.0, ans=0.125 2024-09-19 16:34:57,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703160.0, ans=0.125 2024-09-19 16:35:00,350 INFO [train.py:1198] (0/2) Epoch 39, batch 3850, loss[loss=0.2524, ctc_loss=0.1271, cr_loss=0.3889, attn_decoder_loss=0.2577, over 29214.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1126, cr_loss=0.3521, attn_decoder_loss=0.2383, over 5813585.64 frames. ], batch size: 100, lr: 2.81e-03, grad_scale: 8.0 2024-09-19 16:35:00,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=703200.0, ans=0.125 2024-09-19 16:35:26,243 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.72 vs. 
limit=15.0 2024-09-19 16:35:49,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=703320.0, ans=0.04949747468305833 2024-09-19 16:35:56,712 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.648e+01 9.079e+01 9.833e+01 2.007e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-19 16:36:14,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=703400.0, ans=0.0 2024-09-19 16:36:15,986 INFO [train.py:1198] (0/2) Epoch 39, batch 3900, loss[loss=0.2484, ctc_loss=0.1186, cr_loss=0.3704, attn_decoder_loss=0.2546, over 29623.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1131, cr_loss=0.3538, attn_decoder_loss=0.239, over 5817611.17 frames. ], batch size: 86, lr: 2.81e-03, grad_scale: 8.0 2024-09-19 16:36:40,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=15.0 2024-09-19 16:36:47,744 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2024-09-19 16:36:51,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=703480.0, ans=0.125 2024-09-19 16:37:06,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=703520.0, ans=0.0 2024-09-19 16:37:29,853 INFO [train.py:1198] (0/2) Epoch 39, batch 3950, loss[loss=0.2589, ctc_loss=0.1213, cr_loss=0.3857, attn_decoder_loss=0.2656, over 29485.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1129, cr_loss=0.3532, attn_decoder_loss=0.2391, over 5836612.93 frames. 
], batch size: 97, lr: 2.81e-03, grad_scale: 8.0 2024-09-19 16:37:55,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=703640.0, ans=0.125 2024-09-19 16:38:11,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=703680.0, ans=0.125 2024-09-19 16:38:12,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=703680.0, ans=0.2 2024-09-19 16:38:16,435 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.75 vs. limit=15.0 2024-09-19 16:38:24,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=703720.0, ans=0.2 2024-09-19 16:38:25,891 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.999e+01 8.633e+01 9.078e+01 9.598e+01 1.411e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 16:38:33,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=703760.0, ans=0.0 2024-09-19 16:38:42,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=703760.0, ans=0.0 2024-09-19 16:38:44,547 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.03 vs. limit=22.5 2024-09-19 16:38:44,854 INFO [train.py:1198] (0/2) Epoch 39, batch 4000, loss[loss=0.2151, ctc_loss=0.0952, cr_loss=0.3197, attn_decoder_loss=0.2213, over 29499.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1129, cr_loss=0.3529, attn_decoder_loss=0.2389, over 5813254.49 frames. 
], batch size: 74, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:38:51,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.03 vs. limit=10.0 2024-09-19 16:39:01,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0 2024-09-19 16:39:04,780 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.32 vs. limit=15.0 2024-09-19 16:39:13,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0 2024-09-19 16:39:29,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=703920.0, ans=0.0 2024-09-19 16:39:30,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=703920.0, ans=0.09899494936611666 2024-09-19 16:39:39,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=703920.0, ans=0.0 2024-09-19 16:39:55,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=703960.0, ans=0.125 2024-09-19 16:39:57,592 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-176000.pt 2024-09-19 16:40:06,228 INFO [train.py:1198] (0/2) Epoch 39, batch 4050, loss[loss=0.2542, ctc_loss=0.1412, cr_loss=0.3726, attn_decoder_loss=0.2585, over 20018.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1131, cr_loss=0.353, attn_decoder_loss=0.2388, over 5796775.96 frames. 
], batch size: 209, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:40:15,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=704000.0, ans=0.125 2024-09-19 16:40:33,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=704040.0, ans=0.125 2024-09-19 16:40:47,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=704080.0, ans=0.2 2024-09-19 16:41:01,440 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 8.633e+01 9.112e+01 9.845e+01 1.931e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-19 16:41:04,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=704160.0, ans=0.1 2024-09-19 16:41:18,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2024-09-19 16:41:20,591 INFO [train.py:1198] (0/2) Epoch 39, batch 4100, loss[loss=0.2395, ctc_loss=0.1125, cr_loss=0.3529, attn_decoder_loss=0.2458, over 29492.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1134, cr_loss=0.354, attn_decoder_loss=0.2394, over 5793187.42 frames. 
], batch size: 90, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:41:28,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=704200.0, ans=0.125 2024-09-19 16:41:32,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=704200.0, ans=0.2 2024-09-19 16:41:38,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=704240.0, ans=0.025 2024-09-19 16:41:53,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=704280.0, ans=0.125 2024-09-19 16:42:13,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=704320.0, ans=0.125 2024-09-19 16:42:15,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=704320.0, ans=0.125 2024-09-19 16:42:16,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704320.0, ans=0.1 2024-09-19 16:42:35,209 INFO [train.py:1198] (0/2) Epoch 39, batch 4150, loss[loss=0.232, ctc_loss=0.1208, cr_loss=0.3815, attn_decoder_loss=0.2359, over 29490.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1133, cr_loss=0.3539, attn_decoder_loss=0.2391, over 5798153.41 frames. 
], batch size: 77, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:42:37,115 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:42:38,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=704400.0, ans=0.125 2024-09-19 16:42:47,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=704400.0, ans=0.07 2024-09-19 16:42:50,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=704440.0, ans=0.0 2024-09-19 16:42:50,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=704440.0, ans=0.1 2024-09-19 16:43:03,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=704480.0, ans=0.0 2024-09-19 16:43:22,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=704520.0, ans=0.125 2024-09-19 16:43:29,362 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.507e+01 9.056e+01 9.500e+01 2.477e+02, threshold=1.811e+02, percent-clipped=1.0 2024-09-19 16:43:43,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.69 vs. limit=15.0 2024-09-19 16:43:48,514 INFO [train.py:1198] (0/2) Epoch 39, batch 4200, loss[loss=0.2454, ctc_loss=0.1203, cr_loss=0.3706, attn_decoder_loss=0.2511, over 29526.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1135, cr_loss=0.3546, attn_decoder_loss=0.2393, over 5800715.15 frames. 
], batch size: 90, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:43:50,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.91 vs. limit=15.0 2024-09-19 16:43:54,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=704600.0, ans=0.0 2024-09-19 16:44:06,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=704640.0, ans=0.0 2024-09-19 16:44:06,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=704640.0, ans=0.04949747468305833 2024-09-19 16:44:13,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=704640.0, ans=0.0 2024-09-19 16:44:38,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=704720.0, ans=0.125 2024-09-19 16:45:03,389 INFO [train.py:1198] (0/2) Epoch 39, batch 4250, loss[loss=0.2154, ctc_loss=0.1006, cr_loss=0.3256, attn_decoder_loss=0.2209, over 29522.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1129, cr_loss=0.3535, attn_decoder_loss=0.2394, over 5806071.03 frames. ], batch size: 74, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:45:10,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=704800.0, ans=0.2 2024-09-19 16:45:21,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=704840.0, ans=0.125 2024-09-19 16:45:23,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.54 vs. 
limit=15.0 2024-09-19 16:45:30,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.02 vs. limit=10.0 2024-09-19 16:45:35,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=704880.0, ans=0.1 2024-09-19 16:45:44,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=12.0 2024-09-19 16:45:46,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=704920.0, ans=0.125 2024-09-19 16:45:57,876 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.522e+01 9.039e+01 9.490e+01 2.336e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-19 16:45:58,687 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.00 vs. limit=6.0 2024-09-19 16:46:17,663 INFO [train.py:1198] (0/2) Epoch 39, batch 4300, loss[loss=0.2491, ctc_loss=0.1201, cr_loss=0.3713, attn_decoder_loss=0.2552, over 29506.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1129, cr_loss=0.3531, attn_decoder_loss=0.2394, over 5794970.55 frames. ], batch size: 87, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:46:36,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=705040.0, ans=0.125 2024-09-19 16:46:46,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=705080.0, ans=0.5 2024-09-19 16:46:56,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. 
limit=15.0 2024-09-19 16:47:14,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=705120.0, ans=0.125 2024-09-19 16:47:20,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=705160.0, ans=0.2 2024-09-19 16:47:24,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2024-09-19 16:47:32,381 INFO [train.py:1198] (0/2) Epoch 39, batch 4350, loss[loss=0.2506, ctc_loss=0.1231, cr_loss=0.3978, attn_decoder_loss=0.2559, over 29489.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1157, cr_loss=0.3596, attn_decoder_loss=0.2427, over 5797094.98 frames. ], batch size: 97, lr: 2.81e-03, grad_scale: 8.0 2024-09-19 16:48:13,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=15.0 2024-09-19 16:48:27,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 8.862e+01 9.354e+01 9.777e+01 1.379e+02, threshold=1.871e+02, percent-clipped=0.0 2024-09-19 16:48:29,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=705360.0, ans=0.0 2024-09-19 16:48:45,016 INFO [train.py:1198] (0/2) Epoch 39, batch 4400, loss[loss=0.2471, ctc_loss=0.1314, cr_loss=0.3913, attn_decoder_loss=0.2513, over 27543.00 frames. ], tot_loss[loss=0.2389, ctc_loss=0.1165, cr_loss=0.3611, attn_decoder_loss=0.2445, over 5768802.71 frames. ], batch size: 125, lr: 2.81e-03, grad_scale: 16.0 2024-09-19 16:48:47,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. 
limit=10.0
2024-09-19 16:49:00,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=705440.0, ans=0.125
2024-09-19 16:49:35,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=705520.0, ans=0.025
2024-09-19 16:50:00,208 INFO [train.py:1198] (0/2) Epoch 39, batch 4450, loss[loss=0.2529, ctc_loss=0.1392, cr_loss=0.3908, attn_decoder_loss=0.2568, over 20508.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1199, cr_loss=0.3661, attn_decoder_loss=0.2465, over 5574189.99 frames. ], batch size: 209, lr: 2.81e-03, grad_scale: 8.0
2024-09-19 16:50:11,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=705600.0, ans=0.125
2024-09-19 16:50:19,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.53 vs. limit=15.0
2024-09-19 16:50:21,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=705640.0, ans=0.125
2024-09-19 16:50:33,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.47 vs. limit=15.0
2024-09-19 16:50:35,395 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.04 vs. limit=15.0
2024-09-19 16:50:39,699 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:50:41,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=705680.0, ans=0.0
2024-09-19 16:50:42,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=705680.0, ans=0.125
2024-09-19 16:50:48,994 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.07 vs. limit=10.0
2024-09-19 16:50:52,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=705720.0, ans=0.035
2024-09-19 16:50:57,516 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:50:58,884 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.301e+01 9.418e+01 1.029e+02 1.185e+02 3.823e+02, threshold=2.058e+02, percent-clipped=1.0
2024-09-19 16:50:59,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705760.0, ans=0.1
2024-09-19 16:51:03,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=705760.0, ans=0.025
2024-09-19 16:51:07,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=15.0
2024-09-19 16:51:15,114 INFO [train.py:1198] (0/2) Epoch 39, batch 4500, loss[loss=0.2514, ctc_loss=0.1363, cr_loss=0.3667, attn_decoder_loss=0.256, over 20140.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.1226, cr_loss=0.368, attn_decoder_loss=0.2481, over 5233209.78 frames. ], batch size: 209, lr: 2.81e-03, grad_scale: 8.0
2024-09-19 16:51:16,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=705800.0, ans=0.0
2024-09-19 16:51:27,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=705800.0, ans=0.2
2024-09-19 16:51:45,461 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:51:52,033 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-39.pt
2024-09-19 16:52:42,174 INFO [train.py:1198] (0/2) Epoch 40, batch 0, loss[loss=0.2138, ctc_loss=0.1006, cr_loss=0.3157, attn_decoder_loss=0.2194, over 29615.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1006, cr_loss=0.3157, attn_decoder_loss=0.2194, over 29615.00 frames. ], batch size: 73, lr: 2.77e-03, grad_scale: 16.0
2024-09-19 16:52:42,175 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 16:53:00,473 INFO [train.py:1230] (0/2) Epoch 40, validation: loss=0.2128, ctc_loss=0.03605, cr_loss=6.84e-15, attn_decoder_loss=0.2324, over 944034.00 frames.
2024-09-19 16:53:00,473 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-19 16:53:20,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705940.0, ans=0.1
2024-09-19 16:53:35,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=705980.0, ans=0.125
2024-09-19 16:54:03,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=15.0
2024-09-19 16:54:06,005 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.82 vs. limit=22.5
2024-09-19 16:54:13,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=706060.0, ans=0.025
2024-09-19 16:54:17,753 INFO [train.py:1198] (0/2) Epoch 40, batch 50, loss[loss=0.2054, ctc_loss=0.08898, cr_loss=0.2879, attn_decoder_loss=0.212, over 29406.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1152, cr_loss=0.3579, attn_decoder_loss=0.2401, over 1267750.84 frames. ], batch size: 70, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 16:54:42,488 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.881e+01 9.876e+01 1.118e+02 1.337e+02, threshold=1.975e+02, percent-clipped=0.0
2024-09-19 16:54:47,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=706140.0, ans=0.125
2024-09-19 16:54:51,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=706180.0, ans=0.125
2024-09-19 16:54:54,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=706180.0, ans=0.125
2024-09-19 16:54:56,686 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 16:55:16,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=706220.0, ans=0.2
2024-09-19 16:55:25,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=15.0
2024-09-19 16:55:27,257 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0
2024-09-19 16:55:35,517 INFO [train.py:1198] (0/2) Epoch 40, batch 100, loss[loss=0.2254, ctc_loss=0.1069, cr_loss=0.3525, attn_decoder_loss=0.2308, over 29525.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1166, cr_loss=0.3615, attn_decoder_loss=0.2425, over 2251992.36 frames. ], batch size: 76, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 16:55:37,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=706300.0, ans=0.0
2024-09-19 16:55:48,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.31 vs. limit=22.5
2024-09-19 16:55:52,892 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.73 vs. limit=15.0
2024-09-19 16:56:01,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=706340.0, ans=0.0
2024-09-19 16:56:07,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=706380.0, ans=0.125
2024-09-19 16:56:10,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.49 vs. limit=15.0
2024-09-19 16:56:28,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=706420.0, ans=0.0
2024-09-19 16:56:34,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706460.0, ans=0.1
2024-09-19 16:56:50,196 INFO [train.py:1198] (0/2) Epoch 40, batch 150, loss[loss=0.2097, ctc_loss=0.08963, cr_loss=0.3019, attn_decoder_loss=0.2163, over 29402.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1147, cr_loss=0.3563, attn_decoder_loss=0.2407, over 3047538.29 frames. ], batch size: 70, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 16:57:04,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=706540.0, ans=0.125
2024-09-19 16:57:04,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=706540.0, ans=0.125
2024-09-19 16:57:05,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=706540.0, ans=0.0
2024-09-19 16:57:10,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0
2024-09-19 16:57:12,851 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.727e+01 9.012e+01 9.533e+01 1.739e+02, threshold=1.802e+02, percent-clipped=0.0
2024-09-19 16:57:19,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=706580.0, ans=0.0
2024-09-19 16:57:22,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=22.5
2024-09-19 16:57:23,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=706580.0, ans=0.125
2024-09-19 16:57:24,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=15.0
2024-09-19 16:57:41,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=706620.0, ans=0.125
2024-09-19 16:57:54,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0
2024-09-19 16:58:05,140 INFO [train.py:1198] (0/2) Epoch 40, batch 200, loss[loss=0.253, ctc_loss=0.1314, cr_loss=0.4206, attn_decoder_loss=0.2572, over 27376.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1141, cr_loss=0.3552, attn_decoder_loss=0.2399, over 3659524.53 frames. ], batch size: 124, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 16:59:13,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=706860.0, ans=0.125
2024-09-19 16:59:19,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=706860.0, ans=0.0
2024-09-19 16:59:25,392 INFO [train.py:1198] (0/2) Epoch 40, batch 250, loss[loss=0.2397, ctc_loss=0.1089, cr_loss=0.3473, attn_decoder_loss=0.2465, over 29217.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1139, cr_loss=0.3559, attn_decoder_loss=0.24, over 4142219.56 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 16:59:28,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=706900.0, ans=0.0
2024-09-19 16:59:30,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=706900.0, ans=0.125
2024-09-19 16:59:42,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706940.0, ans=0.1
2024-09-19 16:59:47,893 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.510e+01 9.023e+01 9.427e+01 1.559e+02, threshold=1.805e+02, percent-clipped=0.0
2024-09-19 17:00:40,602 INFO [train.py:1198] (0/2) Epoch 40, batch 300, loss[loss=0.2505, ctc_loss=0.1272, cr_loss=0.3976, attn_decoder_loss=0.2554, over 29516.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1134, cr_loss=0.3545, attn_decoder_loss=0.2393, over 4511658.97 frames. ], batch size: 92, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 17:00:44,501 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.56 vs. limit=15.0
2024-09-19 17:00:51,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=707100.0, ans=10.0
2024-09-19 17:00:56,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=707140.0, ans=0.0
2024-09-19 17:01:14,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=707180.0, ans=0.125
2024-09-19 17:01:17,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=707180.0, ans=0.1
2024-09-19 17:01:23,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=707180.0, ans=0.0
2024-09-19 17:01:37,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=707220.0, ans=0.025
2024-09-19 17:01:37,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=707220.0, ans=0.125
2024-09-19 17:01:39,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=707220.0, ans=0.1
2024-09-19 17:01:46,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=707260.0, ans=0.0
2024-09-19 17:01:46,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=707260.0, ans=0.2
2024-09-19 17:01:56,848 INFO [train.py:1198] (0/2) Epoch 40, batch 350, loss[loss=0.2084, ctc_loss=0.09492, cr_loss=0.3157, attn_decoder_loss=0.2139, over 29315.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1135, cr_loss=0.3548, attn_decoder_loss=0.2396, over 4795897.10 frames. ], batch size: 71, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 17:01:57,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=707300.0, ans=0.1
2024-09-19 17:02:14,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=707340.0, ans=0.0
2024-09-19 17:02:21,767 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.445e+01 8.881e+01 9.307e+01 1.282e+02, threshold=1.776e+02, percent-clipped=0.0
2024-09-19 17:02:37,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=707380.0, ans=0.2
2024-09-19 17:02:44,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=707420.0, ans=0.0
2024-09-19 17:02:49,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=707420.0, ans=0.0
2024-09-19 17:02:55,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=707420.0, ans=0.125
2024-09-19 17:03:02,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=707460.0, ans=0.125
2024-09-19 17:03:14,413 INFO [train.py:1198] (0/2) Epoch 40, batch 400, loss[loss=0.2332, ctc_loss=0.1101, cr_loss=0.3408, attn_decoder_loss=0.2393, over 29690.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1131, cr_loss=0.3536, attn_decoder_loss=0.2391, over 5025011.80 frames. ], batch size: 82, lr: 2.77e-03, grad_scale: 16.0
2024-09-19 17:03:47,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=707580.0, ans=0.025
2024-09-19 17:03:57,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=707580.0, ans=0.125
2024-09-19 17:03:57,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=707580.0, ans=0.1
2024-09-19 17:04:04,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0
2024-09-19 17:04:27,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=707660.0, ans=0.125
2024-09-19 17:04:30,151 INFO [train.py:1198] (0/2) Epoch 40, batch 450, loss[loss=0.2366, ctc_loss=0.1117, cr_loss=0.3591, attn_decoder_loss=0.2425, over 29679.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1131, cr_loss=0.3533, attn_decoder_loss=0.2391, over 5187797.34 frames. ], batch size: 83, lr: 2.77e-03, grad_scale: 16.0
2024-09-19 17:04:30,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=707700.0, ans=0.125
2024-09-19 17:04:52,854 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.522e+01 8.945e+01 9.353e+01 2.975e+02, threshold=1.789e+02, percent-clipped=1.0
2024-09-19 17:05:41,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=707860.0, ans=0.125
2024-09-19 17:05:41,929 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0
2024-09-19 17:05:43,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=707860.0, ans=0.125
2024-09-19 17:05:45,818 INFO [train.py:1198] (0/2) Epoch 40, batch 500, loss[loss=0.247, ctc_loss=0.1186, cr_loss=0.3741, attn_decoder_loss=0.2529, over 29434.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1124, cr_loss=0.3524, attn_decoder_loss=0.2383, over 5329389.88 frames. ], batch size: 94, lr: 2.77e-03, grad_scale: 16.0
2024-09-19 17:05:50,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=707900.0, ans=0.025
2024-09-19 17:05:53,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=707900.0, ans=0.2
2024-09-19 17:06:01,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=707940.0, ans=0.125
2024-09-19 17:06:14,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=707940.0, ans=0.125
2024-09-19 17:06:22,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707980.0, ans=0.1
2024-09-19 17:06:37,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=708020.0, ans=0.125
2024-09-19 17:06:48,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=708020.0, ans=0.125
2024-09-19 17:07:00,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=22.5
2024-09-19 17:07:06,553 INFO [train.py:1198] (0/2) Epoch 40, batch 550, loss[loss=0.2473, ctc_loss=0.1214, cr_loss=0.366, attn_decoder_loss=0.2531, over 28898.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1122, cr_loss=0.3515, attn_decoder_loss=0.2384, over 5423755.80 frames. ], batch size: 104, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 17:07:24,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0
2024-09-19 17:07:26,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=708140.0, ans=0.125
2024-09-19 17:07:30,881 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.557e+01 8.930e+01 9.623e+01 2.134e+02, threshold=1.786e+02, percent-clipped=1.0
2024-09-19 17:07:53,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=708220.0, ans=0.0
2024-09-19 17:07:59,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=708220.0, ans=0.125
2024-09-19 17:08:08,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.62 vs. limit=15.0
2024-09-19 17:08:15,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=708260.0, ans=0.09899494936611666
2024-09-19 17:08:16,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=708260.0, ans=0.0
2024-09-19 17:08:21,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.27 vs. limit=15.0
2024-09-19 17:08:22,277 INFO [train.py:1198] (0/2) Epoch 40, batch 600, loss[loss=0.2453, ctc_loss=0.1204, cr_loss=0.3625, attn_decoder_loss=0.2511, over 29292.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1123, cr_loss=0.3519, attn_decoder_loss=0.2388, over 5509728.67 frames. ], batch size: 100, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 17:08:54,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=708380.0, ans=0.125
2024-09-19 17:08:58,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=708380.0, ans=0.125
2024-09-19 17:09:13,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=708420.0, ans=0.125
2024-09-19 17:09:25,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=708460.0, ans=0.125
2024-09-19 17:09:37,338 INFO [train.py:1198] (0/2) Epoch 40, batch 650, loss[loss=0.2383, ctc_loss=0.1206, cr_loss=0.3761, attn_decoder_loss=0.243, over 29742.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.112, cr_loss=0.3515, attn_decoder_loss=0.2381, over 5587030.03 frames. ], batch size: 81, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 17:10:03,863 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.522e+01 8.894e+01 9.367e+01 2.518e+02, threshold=1.779e+02, percent-clipped=2.0
2024-09-19 17:10:07,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=708540.0, ans=0.1
2024-09-19 17:10:11,702 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 17:10:12,125 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=12.0
2024-09-19 17:10:43,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=708660.0, ans=0.2
2024-09-19 17:10:57,360 INFO [train.py:1198] (0/2) Epoch 40, batch 700, loss[loss=0.2353, ctc_loss=0.1185, cr_loss=0.3802, attn_decoder_loss=0.2398, over 29533.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1123, cr_loss=0.3523, attn_decoder_loss=0.2387, over 5636691.97 frames. ], batch size: 76, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 17:11:00,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=708700.0, ans=0.0
2024-09-19 17:11:11,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=708740.0, ans=0.125
2024-09-19 17:11:34,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=708780.0, ans=0.025
2024-09-19 17:11:49,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=708820.0, ans=0.025
2024-09-19 17:12:06,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708860.0, ans=0.1
2024-09-19 17:12:13,437 INFO [train.py:1198] (0/2) Epoch 40, batch 750, loss[loss=0.2336, ctc_loss=0.1096, cr_loss=0.3343, attn_decoder_loss=0.24, over 29721.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1124, cr_loss=0.3527, attn_decoder_loss=0.2387, over 5675988.10 frames. ], batch size: 82, lr: 2.77e-03, grad_scale: 8.0
2024-09-19 17:12:13,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=708900.0, ans=0.125
2024-09-19 17:12:18,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708900.0, ans=0.1
2024-09-19 17:12:20,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.14 vs. limit=10.0
2024-09-19 17:12:25,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=708900.0, ans=0.5
2024-09-19 17:12:37,371 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.374e+01 9.046e+01 9.655e+01 1.904e+02, threshold=1.809e+02, percent-clipped=1.0
2024-09-19 17:12:46,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=708980.0, ans=0.0
2024-09-19 17:12:56,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5
2024-09-19 17:12:57,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=709020.0, ans=0.0
2024-09-19 17:13:20,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=709060.0, ans=0.0
2024-09-19 17:13:28,941 INFO [train.py:1198] (0/2) Epoch 40, batch 800, loss[loss=0.2185, ctc_loss=0.1008, cr_loss=0.3129, attn_decoder_loss=0.2246, over 29593.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1126, cr_loss=0.3531, attn_decoder_loss=0.2388, over 5706526.95 frames. ], batch size: 73, lr: 2.77e-03, grad_scale: 16.0
2024-09-19 17:13:29,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=709100.0, ans=0.1
2024-09-19 17:13:32,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=709100.0, ans=0.125
2024-09-19 17:13:41,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=709100.0, ans=0.0
2024-09-19 17:13:41,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=709100.0, ans=0.125
2024-09-19 17:13:59,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=22.5
2024-09-19 17:14:03,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=709180.0, ans=0.0
2024-09-19 17:14:04,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=709180.0, ans=0.1
2024-09-19 17:14:28,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=709220.0, ans=0.125
2024-09-19 17:14:28,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=709220.0, ans=0.0
2024-09-19 17:14:32,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=709260.0, ans=0.0
2024-09-19 17:14:34,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=709260.0, ans=0.125
2024-09-19 17:14:47,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=709300.0, ans=0.125
2024-09-19 17:14:48,727 INFO [train.py:1198] (0/2) Epoch 40, batch 850, loss[loss=0.2382, ctc_loss=0.111, cr_loss=0.3641, attn_decoder_loss=0.2442, over 29719.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1124, cr_loss=0.3527, attn_decoder_loss=0.2384, over 5736360.52 frames. ], batch size: 89, lr: 2.77e-03, grad_scale: 16.0
2024-09-19 17:14:57,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.16 vs. limit=22.5
2024-09-19 17:14:59,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=709300.0, ans=0.125
2024-09-19 17:15:00,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=709300.0, ans=0.2
2024-09-19 17:15:12,635 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.469e+01 8.929e+01 9.566e+01 2.198e+02, threshold=1.786e+02, percent-clipped=1.0
2024-09-19 17:16:03,933 INFO [train.py:1198] (0/2) Epoch 40, batch 900, loss[loss=0.2154, ctc_loss=0.09639, cr_loss=0.3132, attn_decoder_loss=0.2216, over 29607.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.113, cr_loss=0.3537, attn_decoder_loss=0.239, over 5741201.72 frames. ], batch size: 73, lr: 2.77e-03, grad_scale: 16.0
2024-09-19 17:16:10,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=709500.0, ans=15.0
2024-09-19 17:17:00,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=709620.0, ans=0.95
2024-09-19 17:17:16,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=709660.0, ans=0.125
2024-09-19 17:17:19,239 INFO [train.py:1198] (0/2) Epoch 40, batch 950, loss[loss=0.2108, ctc_loss=0.08825, cr_loss=0.2947, attn_decoder_loss=0.2179, over 29531.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1131, cr_loss=0.3537, attn_decoder_loss=0.239, over 5744431.46 frames. ], batch size: 74, lr: 2.76e-03, grad_scale: 16.0
2024-09-19 17:17:34,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=709740.0, ans=0.1
2024-09-19 17:17:45,404 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.548e+01 9.083e+01 9.830e+01 2.215e+02, threshold=1.817e+02, percent-clipped=1.0
2024-09-19 17:17:50,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=709780.0, ans=0.0
2024-09-19 17:17:56,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=709780.0, ans=0.1
2024-09-19 17:18:22,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=709860.0, ans=0.0
2024-09-19 17:18:36,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=709860.0, ans=0.0
2024-09-19 17:18:39,002 INFO [train.py:1198] (0/2) Epoch 40, batch 1000, loss[loss=0.2263, ctc_loss=0.1011, cr_loss=0.3162, attn_decoder_loss=0.2332, over 29502.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1134, cr_loss=0.3549, attn_decoder_loss=0.2395, over 5737489.44 frames. ], batch size: 77, lr: 2.76e-03, grad_scale: 8.0
2024-09-19 17:18:52,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=709940.0, ans=0.125
2024-09-19 17:19:09,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=709980.0, ans=0.0
2024-09-19 17:19:11,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.39 vs. limit=15.0
2024-09-19 17:19:33,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=710020.0, ans=0.125
2024-09-19 17:19:51,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=710060.0, ans=0.2
2024-09-19 17:19:52,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=710100.0, ans=0.0
2024-09-19 17:19:54,081 INFO [train.py:1198] (0/2) Epoch 40, batch 1050, loss[loss=0.2432, ctc_loss=0.1133, cr_loss=0.3584, attn_decoder_loss=0.2496, over 29684.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.113, cr_loss=0.354, attn_decoder_loss=0.2386, over 5745774.36 frames. ], batch size: 85, lr: 2.76e-03, grad_scale: 8.0
2024-09-19 17:20:12,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=710140.0, ans=0.125
2024-09-19 17:20:15,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=710140.0, ans=0.125
2024-09-19 17:20:15,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=710140.0, ans=0.0
2024-09-19 17:20:20,065 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.590e+01 9.048e+01 9.519e+01 1.628e+02, threshold=1.810e+02, percent-clipped=0.0
2024-09-19 17:20:40,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=710220.0, ans=0.07
2024-09-19 17:20:54,244 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0
2024-09-19 17:20:56,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=710260.0, ans=0.0
2024-09-19 17:21:09,816 INFO [train.py:1198] (0/2) Epoch 40, batch 1100, loss[loss=0.2298, ctc_loss=0.1084, cr_loss=0.3415, attn_decoder_loss=0.2357, over 29461.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1122, cr_loss=0.3522, attn_decoder_loss=0.2381, over 5758056.75 frames. ], batch size: 78, lr: 2.76e-03, grad_scale: 8.0
2024-09-19 17:21:19,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=710300.0, ans=0.025
2024-09-19 17:21:20,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=710300.0, ans=0.0
2024-09-19 17:21:26,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=710340.0, ans=0.125
2024-09-19 17:21:39,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=710340.0, ans=0.125
2024-09-19 17:21:58,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=710420.0, ans=0.125
2024-09-19 17:22:20,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.71 vs. limit=10.0
2024-09-19 17:22:22,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=710460.0, ans=0.125
2024-09-19 17:22:29,989 INFO [train.py:1198] (0/2) Epoch 40, batch 1150, loss[loss=0.2273, ctc_loss=0.1053, cr_loss=0.3412, attn_decoder_loss=0.2333, over 29466.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1125, cr_loss=0.3528, attn_decoder_loss=0.2383, over 5755516.35 frames. ], batch size: 78, lr: 2.76e-03, grad_scale: 8.0
2024-09-19 17:22:36,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=710500.0, ans=0.0
2024-09-19 17:22:36,905 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=12.0
2024-09-19 17:22:38,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=710500.0, ans=0.125
2024-09-19 17:22:42,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=710500.0, ans=0.125
2024-09-19 17:22:48,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=710540.0, ans=0.07
2024-09-19 17:22:55,728 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.424e+01 8.898e+01 9.617e+01 1.555e+02, threshold=1.780e+02, percent-clipped=0.0
2024-09-19 17:23:25,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710620.0, ans=0.1
2024-09-19 17:23:45,227 INFO [train.py:1198] (0/2) Epoch 40, batch 1200, loss[loss=0.2434, ctc_loss=0.1131, cr_loss=0.3464, attn_decoder_loss=0.2502, over 29674.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1126, cr_loss=0.3525, attn_decoder_loss=0.2388, over 5748925.05 frames. ], batch size: 85, lr: 2.76e-03, grad_scale: 16.0
2024-09-19 17:24:11,854 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.87 vs. limit=15.0
2024-09-19 17:24:25,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=710780.0, ans=15.0
2024-09-19 17:24:29,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=710820.0, ans=0.0
2024-09-19 17:24:30,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.82 vs.
limit=15.0 2024-09-19 17:24:38,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=710820.0, ans=0.05 2024-09-19 17:24:44,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=710860.0, ans=0.2 2024-09-19 17:25:01,056 INFO [train.py:1198] (0/2) Epoch 40, batch 1250, loss[loss=0.2497, ctc_loss=0.1199, cr_loss=0.3723, attn_decoder_loss=0.2559, over 29556.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1132, cr_loss=0.354, attn_decoder_loss=0.2397, over 5776061.68 frames. ], batch size: 92, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:25:07,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=710900.0, ans=0.125 2024-09-19 17:25:12,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=710900.0, ans=0.125 2024-09-19 17:25:21,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=710940.0, ans=0.125 2024-09-19 17:25:26,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710940.0, ans=0.1 2024-09-19 17:25:29,036 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 8.708e+01 9.133e+01 9.581e+01 1.854e+02, threshold=1.827e+02, percent-clipped=1.0 2024-09-19 17:26:19,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2024-09-19 17:26:21,599 INFO [train.py:1198] (0/2) Epoch 40, batch 1300, loss[loss=0.2388, ctc_loss=0.1077, cr_loss=0.3405, attn_decoder_loss=0.2458, over 28207.00 frames. 
], tot_loss[loss=0.2334, ctc_loss=0.1128, cr_loss=0.353, attn_decoder_loss=0.239, over 5780561.15 frames. ], batch size: 111, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:26:21,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=711100.0, ans=0.125 2024-09-19 17:27:05,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=711180.0, ans=0.2 2024-09-19 17:27:08,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=711220.0, ans=0.2 2024-09-19 17:27:08,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=711220.0, ans=0.2 2024-09-19 17:27:13,534 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-09-19 17:27:38,126 INFO [train.py:1198] (0/2) Epoch 40, batch 1350, loss[loss=0.2344, ctc_loss=0.1144, cr_loss=0.3441, attn_decoder_loss=0.24, over 29755.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1124, cr_loss=0.3524, attn_decoder_loss=0.2387, over 5797453.13 frames. ], batch size: 81, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:27:39,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.39 vs. 
limit=22.5 2024-09-19 17:28:03,656 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.275e+01 9.002e+01 9.355e+01 2.084e+02, threshold=1.800e+02, percent-clipped=1.0 2024-09-19 17:28:07,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=711380.0, ans=0.025 2024-09-19 17:28:11,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=711380.0, ans=0.0 2024-09-19 17:28:25,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=711420.0, ans=0.125 2024-09-19 17:28:25,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.13 vs. limit=15.0 2024-09-19 17:28:28,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=12.0 2024-09-19 17:28:42,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=711460.0, ans=0.125 2024-09-19 17:28:45,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=711460.0, ans=0.125 2024-09-19 17:28:52,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=711500.0, ans=0.0 2024-09-19 17:28:53,204 INFO [train.py:1198] (0/2) Epoch 40, batch 1400, loss[loss=0.2022, ctc_loss=0.08693, cr_loss=0.2963, attn_decoder_loss=0.2085, over 29583.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1127, cr_loss=0.3533, attn_decoder_loss=0.2389, over 5808354.62 frames. 
], batch size: 69, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:28:59,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=711500.0, ans=0.025 2024-09-19 17:29:12,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=12.0 2024-09-19 17:29:13,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=711540.0, ans=0.0 2024-09-19 17:29:14,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=711540.0, ans=0.125 2024-09-19 17:29:17,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=711540.0, ans=0.125 2024-09-19 17:29:17,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=711540.0, ans=0.125 2024-09-19 17:29:50,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=711620.0, ans=0.025 2024-09-19 17:29:53,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711620.0, ans=0.1 2024-09-19 17:29:56,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=711660.0, ans=0.1 2024-09-19 17:30:13,141 INFO [train.py:1198] (0/2) Epoch 40, batch 1450, loss[loss=0.2509, ctc_loss=0.1299, cr_loss=0.3889, attn_decoder_loss=0.2557, over 29423.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1127, cr_loss=0.353, attn_decoder_loss=0.2392, over 5804947.99 frames. 
], batch size: 94, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:30:16,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=711700.0, ans=0.125 2024-09-19 17:30:38,764 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.710e+01 9.115e+01 9.620e+01 3.738e+02, threshold=1.823e+02, percent-clipped=1.0 2024-09-19 17:30:52,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=711780.0, ans=0.125 2024-09-19 17:31:10,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0 2024-09-19 17:31:26,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2024-09-19 17:31:27,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=711900.0, ans=0.0 2024-09-19 17:31:28,345 INFO [train.py:1198] (0/2) Epoch 40, batch 1500, loss[loss=0.2406, ctc_loss=0.1204, cr_loss=0.3716, attn_decoder_loss=0.2457, over 29637.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1131, cr_loss=0.3541, attn_decoder_loss=0.2397, over 5804773.98 frames. ], batch size: 86, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:31:39,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=711900.0, ans=0.0 2024-09-19 17:31:48,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.30 vs. 
limit=12.0 2024-09-19 17:32:00,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=711980.0, ans=0.125 2024-09-19 17:32:11,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711980.0, ans=0.1 2024-09-19 17:32:22,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=712020.0, ans=0.125 2024-09-19 17:32:40,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=712060.0, ans=0.1 2024-09-19 17:32:44,594 INFO [train.py:1198] (0/2) Epoch 40, batch 1550, loss[loss=0.252, ctc_loss=0.1374, cr_loss=0.416, attn_decoder_loss=0.2555, over 29504.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1135, cr_loss=0.3544, attn_decoder_loss=0.2397, over 5780995.63 frames. ], batch size: 90, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:33:11,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=712140.0, ans=0.125 2024-09-19 17:33:14,041 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.677e+01 9.047e+01 9.758e+01 3.580e+02, threshold=1.809e+02, percent-clipped=1.0 2024-09-19 17:33:14,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=712140.0, ans=0.2 2024-09-19 17:33:21,949 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:33:26,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=712180.0, ans=0.025 2024-09-19 17:33:38,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712220.0, 
ans=0.1 2024-09-19 17:33:41,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=712220.0, ans=0.125 2024-09-19 17:34:01,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=712260.0, ans=0.2 2024-09-19 17:34:04,515 INFO [train.py:1198] (0/2) Epoch 40, batch 1600, loss[loss=0.2418, ctc_loss=0.1244, cr_loss=0.3744, attn_decoder_loss=0.2466, over 29657.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1134, cr_loss=0.3538, attn_decoder_loss=0.2394, over 5764385.40 frames. ], batch size: 85, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:34:13,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=712300.0, ans=0.0 2024-09-19 17:34:26,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=712340.0, ans=0.125 2024-09-19 17:34:30,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=712340.0, ans=0.0 2024-09-19 17:34:33,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712380.0, ans=0.1 2024-09-19 17:34:47,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=712380.0, ans=0.0 2024-09-19 17:34:56,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712420.0, ans=0.1 2024-09-19 17:34:56,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=712420.0, ans=0.025 2024-09-19 17:35:09,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=712460.0, ans=0.0 2024-09-19 17:35:11,493 INFO [scaling.py:214] (0/2) 
ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=712460.0, ans=0.125 2024-09-19 17:35:20,310 INFO [train.py:1198] (0/2) Epoch 40, batch 1650, loss[loss=0.2461, ctc_loss=0.116, cr_loss=0.3698, attn_decoder_loss=0.2523, over 29704.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1129, cr_loss=0.3529, attn_decoder_loss=0.2389, over 5758515.04 frames. ], batch size: 89, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:35:41,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=712540.0, ans=0.025 2024-09-19 17:35:46,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712540.0, ans=0.1 2024-09-19 17:35:48,736 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.111e+01 8.375e+01 9.140e+01 9.741e+01 3.230e+02, threshold=1.828e+02, percent-clipped=2.0 2024-09-19 17:35:50,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=712580.0, ans=0.2 2024-09-19 17:36:09,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.20 vs. limit=15.0 2024-09-19 17:36:09,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.60 vs. limit=22.5 2024-09-19 17:36:16,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=712620.0, ans=0.1 2024-09-19 17:36:35,504 INFO [train.py:1198] (0/2) Epoch 40, batch 1700, loss[loss=0.2125, ctc_loss=0.1012, cr_loss=0.3208, attn_decoder_loss=0.2177, over 29604.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1129, cr_loss=0.3533, attn_decoder_loss=0.2388, over 5780718.43 frames. 
], batch size: 69, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:36:40,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=712700.0, ans=0.125 2024-09-19 17:36:46,111 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:36:52,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=712740.0, ans=0.0 2024-09-19 17:37:08,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=8.0 2024-09-19 17:37:55,713 INFO [train.py:1198] (0/2) Epoch 40, batch 1750, loss[loss=0.2068, ctc_loss=0.1001, cr_loss=0.3344, attn_decoder_loss=0.2112, over 29382.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1125, cr_loss=0.3524, attn_decoder_loss=0.2384, over 5788133.09 frames. ], batch size: 67, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:38:24,561 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.436e+01 8.990e+01 9.570e+01 1.574e+02, threshold=1.798e+02, percent-clipped=0.0 2024-09-19 17:38:29,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=712980.0, ans=0.125 2024-09-19 17:38:54,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=713060.0, ans=0.025 2024-09-19 17:39:03,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=713060.0, ans=0.125 2024-09-19 17:39:04,401 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.87 vs. 
limit=15.0 2024-09-19 17:39:10,864 INFO [train.py:1198] (0/2) Epoch 40, batch 1800, loss[loss=0.2438, ctc_loss=0.1162, cr_loss=0.367, attn_decoder_loss=0.2498, over 29708.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1126, cr_loss=0.3523, attn_decoder_loss=0.2385, over 5790713.70 frames. ], batch size: 83, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:39:17,322 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:39:24,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=713140.0, ans=0.125 2024-09-19 17:39:42,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=713180.0, ans=0.0 2024-09-19 17:39:51,871 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:40:02,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=713220.0, ans=0.0 2024-09-19 17:40:15,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=15.0 2024-09-19 17:40:26,410 INFO [train.py:1198] (0/2) Epoch 40, batch 1850, loss[loss=0.243, ctc_loss=0.1147, cr_loss=0.3575, attn_decoder_loss=0.2493, over 29622.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1128, cr_loss=0.3529, attn_decoder_loss=0.2385, over 5799032.95 frames. 
], batch size: 86, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:40:57,143 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.615e+01 9.088e+01 9.758e+01 2.205e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-19 17:40:59,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=713380.0, ans=0.2 2024-09-19 17:41:15,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=713420.0, ans=0.125 2024-09-19 17:41:28,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=713460.0, ans=0.2 2024-09-19 17:41:30,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=713460.0, ans=0.125 2024-09-19 17:41:33,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.97 vs. limit=10.0 2024-09-19 17:41:43,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=22.5 2024-09-19 17:41:43,527 INFO [train.py:1198] (0/2) Epoch 40, batch 1900, loss[loss=0.2385, ctc_loss=0.1142, cr_loss=0.3594, attn_decoder_loss=0.2444, over 29708.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1132, cr_loss=0.3536, attn_decoder_loss=0.239, over 5805992.68 frames. ], batch size: 89, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:41:44,259 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.36 vs. 
limit=15.0 2024-09-19 17:41:49,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=713500.0, ans=0.125 2024-09-19 17:42:02,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=713540.0, ans=10.0 2024-09-19 17:42:07,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=713540.0, ans=0.1 2024-09-19 17:42:13,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.86 vs. limit=15.0 2024-09-19 17:42:18,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.51 vs. limit=15.0 2024-09-19 17:43:01,613 INFO [train.py:1198] (0/2) Epoch 40, batch 1950, loss[loss=0.2279, ctc_loss=0.1079, cr_loss=0.3546, attn_decoder_loss=0.2333, over 29453.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1138, cr_loss=0.3549, attn_decoder_loss=0.2403, over 5820376.68 frames. ], batch size: 78, lr: 2.76e-03, grad_scale: 8.0 2024-09-19 17:43:07,398 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.43 vs. 
limit=15.0 2024-09-19 17:43:09,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=713700.0, ans=0.0 2024-09-19 17:43:27,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=713740.0, ans=0.1 2024-09-19 17:43:30,165 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.694e+01 9.094e+01 9.637e+01 1.422e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-19 17:43:36,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=713780.0, ans=0.025 2024-09-19 17:43:41,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=713780.0, ans=0.125 2024-09-19 17:43:53,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-19 17:44:16,998 INFO [train.py:1198] (0/2) Epoch 40, batch 2000, loss[loss=0.2108, ctc_loss=0.09089, cr_loss=0.311, attn_decoder_loss=0.2172, over 29348.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1139, cr_loss=0.3549, attn_decoder_loss=0.2407, over 5799444.75 frames. ], batch size: 67, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:44:23,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=713900.0, ans=0.2 2024-09-19 17:44:44,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=713940.0, ans=0.1 2024-09-19 17:45:02,776 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.13 vs. 
limit=15.0 2024-09-19 17:45:08,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.38 vs. limit=10.0 2024-09-19 17:45:13,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2024-09-19 17:45:30,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=714060.0, ans=0.125 2024-09-19 17:45:33,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=714100.0, ans=0.025 2024-09-19 17:45:34,773 INFO [train.py:1198] (0/2) Epoch 40, batch 2050, loss[loss=0.2092, ctc_loss=0.09001, cr_loss=0.3153, attn_decoder_loss=0.2154, over 29451.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1131, cr_loss=0.3533, attn_decoder_loss=0.2398, over 5791572.21 frames. 
], batch size: 70, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:45:52,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=714140.0, ans=0.04949747468305833 2024-09-19 17:46:00,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=714140.0, ans=0.125 2024-09-19 17:46:05,842 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.392e+01 8.898e+01 9.558e+01 3.245e+02, threshold=1.780e+02, percent-clipped=2.0 2024-09-19 17:46:15,179 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:46:16,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=714180.0, ans=0.2 2024-09-19 17:46:34,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=714220.0, ans=0.0 2024-09-19 17:46:52,706 INFO [train.py:1198] (0/2) Epoch 40, batch 2100, loss[loss=0.2332, ctc_loss=0.1099, cr_loss=0.3612, attn_decoder_loss=0.2389, over 29757.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1129, cr_loss=0.3533, attn_decoder_loss=0.2393, over 5803094.03 frames. ], batch size: 81, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:46:52,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=714300.0, ans=0.025 2024-09-19 17:47:21,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=714380.0, ans=0.0 2024-09-19 17:47:26,259 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.74 vs. 
limit=15.0 2024-09-19 17:48:04,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=714460.0, ans=0.0 2024-09-19 17:48:07,616 INFO [train.py:1198] (0/2) Epoch 40, batch 2150, loss[loss=0.2344, ctc_loss=0.1046, cr_loss=0.3303, attn_decoder_loss=0.2415, over 29457.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1123, cr_loss=0.3524, attn_decoder_loss=0.2387, over 5817303.50 frames. ], batch size: 78, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:48:09,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=714500.0, ans=0.05 2024-09-19 17:48:37,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=714540.0, ans=0.025 2024-09-19 17:48:38,291 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.587e+01 9.010e+01 9.804e+01 2.260e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-19 17:49:01,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=714620.0, ans=0.2 2024-09-19 17:49:02,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=714620.0, ans=0.0 2024-09-19 17:49:05,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=714620.0, ans=0.125 2024-09-19 17:49:07,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=714620.0, ans=0.125 2024-09-19 17:49:22,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=714660.0, ans=0.125 2024-09-19 17:49:24,944 INFO [train.py:1198] (0/2) Epoch 40, batch 2200, loss[loss=0.2488, 
ctc_loss=0.123, cr_loss=0.377, attn_decoder_loss=0.2544, over 29613.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1127, cr_loss=0.3525, attn_decoder_loss=0.2388, over 5812614.49 frames. ], batch size: 86, lr: 2.76e-03, grad_scale: 16.0 2024-09-19 17:49:54,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=714740.0, ans=0.04949747468305833 2024-09-19 17:49:56,588 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.13 vs. limit=15.0 2024-09-19 17:49:57,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=714780.0, ans=0.125 2024-09-19 17:50:15,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=714820.0, ans=0.0 2024-09-19 17:50:21,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=714820.0, ans=0.125 2024-09-19 17:50:42,696 INFO [train.py:1198] (0/2) Epoch 40, batch 2250, loss[loss=0.2365, ctc_loss=0.1136, cr_loss=0.3618, attn_decoder_loss=0.2422, over 29715.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1126, cr_loss=0.3528, attn_decoder_loss=0.2389, over 5811883.69 frames. 
], batch size: 82, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 17:50:50,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=714900.0, ans=0.0 2024-09-19 17:51:08,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=714940.0, ans=0.125 2024-09-19 17:51:12,518 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.488e+01 9.052e+01 9.511e+01 5.082e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-19 17:51:12,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=714980.0, ans=0.0 2024-09-19 17:51:16,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=714980.0, ans=0.125 2024-09-19 17:51:40,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=715020.0, ans=0.09899494936611666 2024-09-19 17:51:57,717 INFO [train.py:1198] (0/2) Epoch 40, batch 2300, loss[loss=0.2161, ctc_loss=0.09541, cr_loss=0.3204, attn_decoder_loss=0.2224, over 29305.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1114, cr_loss=0.35, attn_decoder_loss=0.2377, over 5798079.64 frames. 
], batch size: 71, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 17:52:17,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=715140.0, ans=0.125 2024-09-19 17:52:26,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715140.0, ans=0.1 2024-09-19 17:52:45,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=715220.0, ans=0.2 2024-09-19 17:52:48,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=715220.0, ans=0.0 2024-09-19 17:52:56,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=715220.0, ans=0.125 2024-09-19 17:53:00,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=715260.0, ans=0.07 2024-09-19 17:53:02,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=715260.0, ans=0.0 2024-09-19 17:53:06,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=715260.0, ans=0.0 2024-09-19 17:53:13,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=715300.0, ans=0.125 2024-09-19 17:53:15,284 INFO [train.py:1198] (0/2) Epoch 40, batch 2350, loss[loss=0.2268, ctc_loss=0.1028, cr_loss=0.3235, attn_decoder_loss=0.2333, over 29696.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1119, cr_loss=0.3515, attn_decoder_loss=0.238, over 5803152.83 frames. 
], batch size: 83, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 17:53:17,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=715300.0, ans=0.0 2024-09-19 17:53:33,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=715340.0, ans=0.025 2024-09-19 17:53:38,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=715340.0, ans=0.0 2024-09-19 17:53:47,274 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.558e+01 9.025e+01 9.597e+01 1.404e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-19 17:54:02,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=715420.0, ans=0.0 2024-09-19 17:54:09,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0 2024-09-19 17:54:32,768 INFO [train.py:1198] (0/2) Epoch 40, batch 2400, loss[loss=0.2146, ctc_loss=0.09992, cr_loss=0.3384, attn_decoder_loss=0.2198, over 29526.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1124, cr_loss=0.3529, attn_decoder_loss=0.2388, over 5806300.73 frames. ], batch size: 76, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 17:54:46,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=715540.0, ans=0.025 2024-09-19 17:54:51,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=715540.0, ans=0.1 2024-09-19 17:55:03,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.70 vs. 
limit=22.5 2024-09-19 17:55:18,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=715620.0, ans=0.5 2024-09-19 17:55:18,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=715620.0, ans=0.125 2024-09-19 17:55:18,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=715620.0, ans=0.125 2024-09-19 17:55:48,040 INFO [train.py:1198] (0/2) Epoch 40, batch 2450, loss[loss=0.2307, ctc_loss=0.1058, cr_loss=0.3295, attn_decoder_loss=0.2373, over 29716.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1132, cr_loss=0.3546, attn_decoder_loss=0.2395, over 5783726.91 frames. ], batch size: 82, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 17:55:55,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=715700.0, ans=0.125 2024-09-19 17:56:03,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=715740.0, ans=0.0 2024-09-19 17:56:20,252 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.714e+01 9.274e+01 9.862e+01 1.579e+02, threshold=1.855e+02, percent-clipped=0.0 2024-09-19 17:56:42,983 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:57:05,504 INFO [train.py:1198] (0/2) Epoch 40, batch 2500, loss[loss=0.239, ctc_loss=0.108, cr_loss=0.3416, attn_decoder_loss=0.246, over 29608.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1131, cr_loss=0.3541, attn_decoder_loss=0.2395, over 5794633.90 frames. 
], batch size: 86, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 17:57:13,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=715900.0, ans=0.125 2024-09-19 17:57:19,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=715940.0, ans=0.0 2024-09-19 17:57:41,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=715980.0, ans=0.1 2024-09-19 17:58:13,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=716060.0, ans=0.2 2024-09-19 17:58:24,138 INFO [train.py:1198] (0/2) Epoch 40, batch 2550, loss[loss=0.202, ctc_loss=0.0945, cr_loss=0.3191, attn_decoder_loss=0.2069, over 29351.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1126, cr_loss=0.3534, attn_decoder_loss=0.2392, over 5798417.42 frames. ], batch size: 67, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 17:58:36,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=716100.0, ans=0.5 2024-09-19 17:58:40,808 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:58:43,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=716140.0, ans=0.0 2024-09-19 17:58:48,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=716140.0, ans=0.1 2024-09-19 17:58:53,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.529e+01 8.996e+01 9.557e+01 1.715e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-19 17:59:07,759 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=716220.0, ans=0.125 2024-09-19 17:59:30,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=716260.0, ans=0.125 2024-09-19 17:59:39,642 INFO [train.py:1198] (0/2) Epoch 40, batch 2600, loss[loss=0.2226, ctc_loss=0.1038, cr_loss=0.3361, attn_decoder_loss=0.2284, over 29465.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1127, cr_loss=0.3535, attn_decoder_loss=0.2395, over 5794737.76 frames. ], batch size: 78, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 17:59:58,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=716340.0, ans=0.1 2024-09-19 18:00:54,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=716500.0, ans=0.125 2024-09-19 18:00:56,059 INFO [train.py:1198] (0/2) Epoch 40, batch 2650, loss[loss=0.2458, ctc_loss=0.1193, cr_loss=0.3573, attn_decoder_loss=0.2519, over 29194.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1129, cr_loss=0.3542, attn_decoder_loss=0.2397, over 5801569.33 frames. 
], batch size: 100, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 18:00:57,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=716500.0, ans=0.1 2024-09-19 18:01:05,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=716500.0, ans=0.125 2024-09-19 18:01:28,231 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.493e+01 9.009e+01 9.595e+01 1.150e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-19 18:01:28,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716580.0, ans=0.1 2024-09-19 18:01:46,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=716620.0, ans=0.1 2024-09-19 18:01:51,990 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0 2024-09-19 18:02:06,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=716660.0, ans=0.0 2024-09-19 18:02:13,715 INFO [train.py:1198] (0/2) Epoch 40, batch 2700, loss[loss=0.2432, ctc_loss=0.118, cr_loss=0.3716, attn_decoder_loss=0.2489, over 29509.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1131, cr_loss=0.3541, attn_decoder_loss=0.2402, over 5797132.96 frames. 
], batch size: 87, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:02:21,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=716700.0, ans=0.07 2024-09-19 18:02:31,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=716740.0, ans=0.0 2024-09-19 18:02:33,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=716740.0, ans=0.125 2024-09-19 18:02:35,659 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.16 vs. limit=15.0 2024-09-19 18:02:37,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=716740.0, ans=0.0 2024-09-19 18:02:42,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=716780.0, ans=0.0 2024-09-19 18:03:03,677 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-09-19 18:03:07,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=716820.0, ans=0.1 2024-09-19 18:03:13,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=716860.0, ans=0.125 2024-09-19 18:03:19,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716860.0, ans=0.1 2024-09-19 18:03:29,462 INFO [train.py:1198] (0/2) Epoch 40, batch 2750, loss[loss=0.2108, ctc_loss=0.09994, cr_loss=0.3351, attn_decoder_loss=0.2157, over 29515.00 frames. 
], tot_loss[loss=0.2331, ctc_loss=0.112, cr_loss=0.3522, attn_decoder_loss=0.2388, over 5795807.17 frames. ], batch size: 75, lr: 2.75e-03, grad_scale: 4.0 2024-09-19 18:03:41,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=12.0 2024-09-19 18:03:41,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=716900.0, ans=0.125 2024-09-19 18:04:04,577 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.418e+01 8.972e+01 9.467e+01 1.420e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-19 18:04:17,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=717020.0, ans=0.125 2024-09-19 18:04:18,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=717020.0, ans=0.125 2024-09-19 18:04:29,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=717020.0, ans=0.125 2024-09-19 18:04:40,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2024-09-19 18:04:46,937 INFO [train.py:1198] (0/2) Epoch 40, batch 2800, loss[loss=0.2489, ctc_loss=0.1373, cr_loss=0.3897, attn_decoder_loss=0.2526, over 19804.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1125, cr_loss=0.353, attn_decoder_loss=0.2389, over 5776838.83 frames. 
], batch size: 209, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:04:57,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=717100.0, ans=0.125 2024-09-19 18:05:00,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=717140.0, ans=0.2 2024-09-19 18:05:08,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=717140.0, ans=0.125 2024-09-19 18:05:14,387 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:05:20,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=717180.0, ans=0.025 2024-09-19 18:05:22,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=717180.0, ans=0.2 2024-09-19 18:05:29,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=717180.0, ans=0.125 2024-09-19 18:06:03,963 INFO [train.py:1198] (0/2) Epoch 40, batch 2850, loss[loss=0.2219, ctc_loss=0.1017, cr_loss=0.3343, attn_decoder_loss=0.2278, over 29496.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1129, cr_loss=0.3537, attn_decoder_loss=0.2395, over 5762779.30 frames. 
], batch size: 77, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:06:05,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=717300.0, ans=0.0 2024-09-19 18:06:17,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=717340.0, ans=0.125 2024-09-19 18:06:29,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.69 vs. limit=15.0 2024-09-19 18:06:37,388 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 8.590e+01 9.012e+01 9.613e+01 1.852e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-19 18:06:42,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=717380.0, ans=0.125 2024-09-19 18:07:19,947 INFO [train.py:1198] (0/2) Epoch 40, batch 2900, loss[loss=0.2342, ctc_loss=0.1153, cr_loss=0.3533, attn_decoder_loss=0.2395, over 29456.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1136, cr_loss=0.3551, attn_decoder_loss=0.2407, over 5788023.15 frames. 
], batch size: 79, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:07:24,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=717500.0, ans=0.04949747468305833 2024-09-19 18:07:34,285 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:07:46,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=717540.0, ans=0.125 2024-09-19 18:08:06,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=717620.0, ans=0.125 2024-09-19 18:08:07,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=717620.0, ans=0.2 2024-09-19 18:08:07,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=717620.0, ans=0.125 2024-09-19 18:08:37,796 INFO [train.py:1198] (0/2) Epoch 40, batch 2950, loss[loss=0.2166, ctc_loss=0.09499, cr_loss=0.3023, attn_decoder_loss=0.2234, over 29490.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1129, cr_loss=0.3533, attn_decoder_loss=0.2395, over 5781375.83 frames. ], batch size: 75, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:08:41,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.97 vs. 
limit=15.0 2024-09-19 18:09:02,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=717740.0, ans=0.125 2024-09-19 18:09:11,438 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.443e+01 9.079e+01 9.666e+01 1.457e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 18:09:56,123 INFO [train.py:1198] (0/2) Epoch 40, batch 3000, loss[loss=0.2263, ctc_loss=0.1033, cr_loss=0.334, attn_decoder_loss=0.2326, over 29754.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1127, cr_loss=0.3531, attn_decoder_loss=0.2393, over 5783270.48 frames. ], batch size: 81, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:09:56,124 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 18:10:14,432 INFO [train.py:1230] (0/2) Epoch 40, validation: loss=0.2122, ctc_loss=0.03685, cr_loss=5.615e-15, attn_decoder_loss=0.2317, over 944034.00 frames. 2024-09-19 18:10:14,432 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 18:10:28,593 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:11:03,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=718020.0, ans=0.0 2024-09-19 18:11:12,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=718020.0, ans=0.02 2024-09-19 18:11:17,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=718060.0, ans=0.0 2024-09-19 18:11:32,566 INFO [train.py:1198] (0/2) Epoch 40, batch 3050, loss[loss=0.2252, ctc_loss=0.1114, cr_loss=0.3369, attn_decoder_loss=0.2303, over 29535.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1132, cr_loss=0.3536, attn_decoder_loss=0.2401, over 5776459.84 frames. 
], batch size: 76, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:11:50,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=718140.0, ans=0.125 2024-09-19 18:11:56,050 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.59 vs. limit=15.0 2024-09-19 18:11:58,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=718140.0, ans=0.125 2024-09-19 18:12:05,628 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 8.520e+01 9.084e+01 9.934e+01 1.461e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-19 18:12:12,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=718180.0, ans=0.025 2024-09-19 18:12:47,572 INFO [train.py:1198] (0/2) Epoch 40, batch 3100, loss[loss=0.2497, ctc_loss=0.1272, cr_loss=0.381, attn_decoder_loss=0.2549, over 29271.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.113, cr_loss=0.3531, attn_decoder_loss=0.24, over 5777258.43 frames. 
], batch size: 100, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:12:52,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=718300.0, ans=0.125 2024-09-19 18:12:58,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=718300.0, ans=0.0 2024-09-19 18:13:06,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=718340.0, ans=0.125 2024-09-19 18:13:17,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=718340.0, ans=0.0 2024-09-19 18:13:20,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=718380.0, ans=0.0 2024-09-19 18:13:58,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=718460.0, ans=0.2 2024-09-19 18:14:06,043 INFO [train.py:1198] (0/2) Epoch 40, batch 3150, loss[loss=0.242, ctc_loss=0.1166, cr_loss=0.366, attn_decoder_loss=0.2478, over 28870.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1131, cr_loss=0.3532, attn_decoder_loss=0.2399, over 5783856.70 frames. ], batch size: 104, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:14:06,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=718500.0, ans=0.125 2024-09-19 18:14:07,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=718500.0, ans=0.0 2024-09-19 18:14:13,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. 
limit=6.0 2024-09-19 18:14:16,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=718500.0, ans=0.025 2024-09-19 18:14:18,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=718500.0, ans=0.1 2024-09-19 18:14:30,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=718540.0, ans=0.0 2024-09-19 18:14:36,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=718580.0, ans=0.2 2024-09-19 18:14:39,242 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.506e+01 9.197e+01 9.540e+01 2.562e+02, threshold=1.839e+02, percent-clipped=1.0 2024-09-19 18:14:46,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.25 vs. limit=12.0 2024-09-19 18:14:47,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=718580.0, ans=0.0 2024-09-19 18:15:23,524 INFO [train.py:1198] (0/2) Epoch 40, batch 3200, loss[loss=0.2374, ctc_loss=0.1121, cr_loss=0.3476, attn_decoder_loss=0.2436, over 29389.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1131, cr_loss=0.3537, attn_decoder_loss=0.2395, over 5794368.89 frames. 
], batch size: 79, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 18:16:04,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=718780.0, ans=0.125 2024-09-19 18:16:07,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=718820.0, ans=0.07 2024-09-19 18:16:12,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=718820.0, ans=0.125 2024-09-19 18:16:14,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=718820.0, ans=0.0 2024-09-19 18:16:31,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=718860.0, ans=0.95 2024-09-19 18:16:38,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.24 vs. limit=22.5 2024-09-19 18:16:38,752 INFO [train.py:1198] (0/2) Epoch 40, batch 3250, loss[loss=0.23, ctc_loss=0.1084, cr_loss=0.3495, attn_decoder_loss=0.2357, over 29717.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.113, cr_loss=0.3538, attn_decoder_loss=0.2397, over 5801180.60 frames. 
], batch size: 84, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 18:16:42,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=718900.0, ans=0.0 2024-09-19 18:16:45,175 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:16:54,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=718940.0, ans=0.125 2024-09-19 18:16:58,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=718940.0, ans=0.125 2024-09-19 18:17:09,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=718980.0, ans=0.125 2024-09-19 18:17:13,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.518e+01 9.005e+01 9.479e+01 1.398e+02, threshold=1.801e+02, percent-clipped=0.0 2024-09-19 18:17:29,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=719020.0, ans=0.04949747468305833 2024-09-19 18:17:55,808 INFO [train.py:1198] (0/2) Epoch 40, batch 3300, loss[loss=0.2477, ctc_loss=0.1206, cr_loss=0.3604, attn_decoder_loss=0.2538, over 28342.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1122, cr_loss=0.352, attn_decoder_loss=0.2385, over 5798021.47 frames. 
], batch size: 111, lr: 2.75e-03, grad_scale: 16.0 2024-09-19 18:17:57,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719100.0, ans=0.1 2024-09-19 18:18:12,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=719140.0, ans=0.2 2024-09-19 18:18:37,289 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:19:01,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=719260.0, ans=0.125 2024-09-19 18:19:13,643 INFO [train.py:1198] (0/2) Epoch 40, batch 3350, loss[loss=0.254, ctc_loss=0.1258, cr_loss=0.3693, attn_decoder_loss=0.26, over 28855.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1134, cr_loss=0.3546, attn_decoder_loss=0.2393, over 5774274.70 frames. ], batch size: 104, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:19:27,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=719340.0, ans=0.95 2024-09-19 18:19:41,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=719340.0, ans=0.125 2024-09-19 18:19:48,474 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.582e+01 9.036e+01 9.650e+01 6.119e+02, threshold=1.807e+02, percent-clipped=2.0 2024-09-19 18:19:50,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=719380.0, ans=0.0 2024-09-19 18:19:51,000 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.82 vs. 
limit=15.0 2024-09-19 18:20:26,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=719460.0, ans=0.1 2024-09-19 18:20:29,156 INFO [train.py:1198] (0/2) Epoch 40, batch 3400, loss[loss=0.2099, ctc_loss=0.09878, cr_loss=0.3238, attn_decoder_loss=0.215, over 29392.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1138, cr_loss=0.3553, attn_decoder_loss=0.2394, over 5767492.07 frames. ], batch size: 67, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:20:46,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=719540.0, ans=0.0 2024-09-19 18:21:39,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=719660.0, ans=0.0 2024-09-19 18:21:41,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=719660.0, ans=0.2 2024-09-19 18:21:41,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=719660.0, ans=0.125 2024-09-19 18:21:46,898 INFO [train.py:1198] (0/2) Epoch 40, batch 3450, loss[loss=0.2453, ctc_loss=0.1225, cr_loss=0.3743, attn_decoder_loss=0.2507, over 28327.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1135, cr_loss=0.3542, attn_decoder_loss=0.2395, over 5775310.95 frames. ], batch size: 111, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:21:52,393 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.95 vs. 
limit=10.0 2024-09-19 18:22:09,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=719740.0, ans=0.025 2024-09-19 18:22:21,311 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 8.580e+01 9.014e+01 9.618e+01 1.900e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-19 18:22:23,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=719780.0, ans=0.2 2024-09-19 18:22:28,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.25 vs. limit=15.0 2024-09-19 18:22:43,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-09-19 18:22:46,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=719820.0, ans=0.0 2024-09-19 18:23:04,446 INFO [train.py:1198] (0/2) Epoch 40, batch 3500, loss[loss=0.2093, ctc_loss=0.0908, cr_loss=0.3126, attn_decoder_loss=0.2155, over 29303.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1131, cr_loss=0.3533, attn_decoder_loss=0.2389, over 5777795.50 frames. ], batch size: 71, lr: 2.75e-03, grad_scale: 8.0 2024-09-19 18:23:21,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=719940.0, ans=0.125 2024-09-19 18:23:32,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. 
limit=6.0 2024-09-19 18:23:33,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=719980.0, ans=0.125 2024-09-19 18:23:40,939 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-180000.pt 2024-09-19 18:23:51,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=719980.0, ans=0.125 2024-09-19 18:23:51,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=719980.0, ans=0.07 2024-09-19 18:23:55,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=720020.0, ans=0.125 2024-09-19 18:24:04,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=720020.0, ans=0.04949747468305833 2024-09-19 18:24:09,736 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.34 vs. limit=22.5 2024-09-19 18:24:26,508 INFO [train.py:1198] (0/2) Epoch 40, batch 3550, loss[loss=0.241, ctc_loss=0.1111, cr_loss=0.3365, attn_decoder_loss=0.248, over 29719.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1129, cr_loss=0.3529, attn_decoder_loss=0.239, over 5782977.12 frames. ], batch size: 89, lr: 2.74e-03, grad_scale: 8.0 2024-09-19 18:24:29,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=15.0 2024-09-19 18:24:30,344 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.91 vs. 
limit=6.0 2024-09-19 18:24:34,778 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.49 vs. limit=15.0 2024-09-19 18:24:35,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=720100.0, ans=0.125 2024-09-19 18:24:36,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2024-09-19 18:24:40,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=720140.0, ans=0.0 2024-09-19 18:24:40,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=720140.0, ans=0.025 2024-09-19 18:24:45,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.56 vs. 
limit=15.0 2024-09-19 18:24:45,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=720140.0, ans=0.1 2024-09-19 18:24:56,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=720180.0, ans=0.1 2024-09-19 18:24:58,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=720180.0, ans=0.125 2024-09-19 18:24:59,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=720180.0, ans=0.125 2024-09-19 18:25:00,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.600e+01 9.034e+01 9.634e+01 4.593e+02, threshold=1.807e+02, percent-clipped=2.0 2024-09-19 18:25:03,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=720180.0, ans=0.2 2024-09-19 18:25:09,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=720220.0, ans=0.0 2024-09-19 18:25:13,517 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2024-09-19 18:25:14,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=720220.0, ans=0.125 2024-09-19 18:25:22,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=720220.0, ans=0.0 2024-09-19 18:25:40,388 INFO [train.py:1198] (0/2) Epoch 40, batch 3600, loss[loss=0.2315, ctc_loss=0.1086, cr_loss=0.3454, attn_decoder_loss=0.2374, over 29500.00 frames. 
], tot_loss[loss=0.2337, ctc_loss=0.113, cr_loss=0.3536, attn_decoder_loss=0.2393, over 5792287.83 frames. ], batch size: 77, lr: 2.74e-03, grad_scale: 16.0 2024-09-19 18:25:47,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=720300.0, ans=0.125 2024-09-19 18:25:48,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=720300.0, ans=0.0 2024-09-19 18:26:10,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=720380.0, ans=0.025 2024-09-19 18:26:22,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=720380.0, ans=0.125 2024-09-19 18:26:24,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=720380.0, ans=0.125 2024-09-19 18:26:56,394 INFO [train.py:1198] (0/2) Epoch 40, batch 3650, loss[loss=0.2405, ctc_loss=0.1165, cr_loss=0.35, attn_decoder_loss=0.2465, over 29514.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1122, cr_loss=0.3522, attn_decoder_loss=0.2384, over 5793222.66 frames. ], batch size: 90, lr: 2.74e-03, grad_scale: 16.0 2024-09-19 18:27:21,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=720540.0, ans=0.0 2024-09-19 18:27:24,816 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:27:30,425 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.608e+01 9.210e+01 9.736e+01 1.315e+02, threshold=1.842e+02, percent-clipped=0.0 2024-09-19 18:27:56,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.53 vs. 
limit=15.0 2024-09-19 18:27:59,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=720660.0, ans=0.1 2024-09-19 18:28:10,685 INFO [train.py:1198] (0/2) Epoch 40, batch 3700, loss[loss=0.2449, ctc_loss=0.1165, cr_loss=0.3686, attn_decoder_loss=0.251, over 29720.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1121, cr_loss=0.3519, attn_decoder_loss=0.2385, over 5804284.66 frames. ], batch size: 84, lr: 2.74e-03, grad_scale: 16.0 2024-09-19 18:28:21,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=720700.0, ans=0.04949747468305833 2024-09-19 18:28:30,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=720740.0, ans=0.125 2024-09-19 18:28:44,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=720780.0, ans=0.0 2024-09-19 18:29:16,892 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.60 vs. limit=15.0 2024-09-19 18:29:18,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=12.0 2024-09-19 18:29:26,574 INFO [train.py:1198] (0/2) Epoch 40, batch 3750, loss[loss=0.2147, ctc_loss=0.09589, cr_loss=0.3043, attn_decoder_loss=0.2212, over 29361.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1122, cr_loss=0.3521, attn_decoder_loss=0.2385, over 5808559.94 frames. ], batch size: 67, lr: 2.74e-03, grad_scale: 8.0 2024-09-19 18:29:30,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.33 vs. 
limit=15.0 2024-09-19 18:29:34,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=720900.0, ans=0.125 2024-09-19 18:29:43,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=720940.0, ans=0.125 2024-09-19 18:30:01,966 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.539e+01 9.071e+01 9.494e+01 1.651e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-19 18:30:13,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=721020.0, ans=0.0 2024-09-19 18:30:18,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=721020.0, ans=0.125 2024-09-19 18:30:30,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=721060.0, ans=0.2 2024-09-19 18:30:40,981 INFO [train.py:1198] (0/2) Epoch 40, batch 3800, loss[loss=0.2386, ctc_loss=0.1057, cr_loss=0.3522, attn_decoder_loss=0.2456, over 29623.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1121, cr_loss=0.3516, attn_decoder_loss=0.2379, over 5799320.15 frames. 
], batch size: 86, lr: 2.74e-03, grad_scale: 8.0 2024-09-19 18:30:44,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=721100.0, ans=0.1 2024-09-19 18:31:18,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=721180.0, ans=0.125 2024-09-19 18:31:32,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=721220.0, ans=0.1 2024-09-19 18:31:32,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=721220.0, ans=0.125 2024-09-19 18:31:40,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=721260.0, ans=0.125 2024-09-19 18:31:56,313 INFO [train.py:1198] (0/2) Epoch 40, batch 3850, loss[loss=0.2483, ctc_loss=0.1204, cr_loss=0.3606, attn_decoder_loss=0.2545, over 29316.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1116, cr_loss=0.3507, attn_decoder_loss=0.2378, over 5812392.26 frames. ], batch size: 100, lr: 2.74e-03, grad_scale: 8.0 2024-09-19 18:32:04,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. 
limit=15.0 2024-09-19 18:32:17,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=721340.0, ans=0.2 2024-09-19 18:32:17,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721340.0, ans=0.1 2024-09-19 18:32:31,699 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.429e+01 8.857e+01 9.400e+01 1.753e+02, threshold=1.771e+02, percent-clipped=0.0 2024-09-19 18:32:36,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=721380.0, ans=0.1 2024-09-19 18:32:51,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=721420.0, ans=0.05 2024-09-19 18:32:55,908 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.53 vs. limit=22.5 2024-09-19 18:32:59,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=721460.0, ans=0.125 2024-09-19 18:33:02,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=721460.0, ans=0.0 2024-09-19 18:33:05,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=721460.0, ans=0.125 2024-09-19 18:33:10,021 INFO [train.py:1198] (0/2) Epoch 40, batch 3900, loss[loss=0.2458, ctc_loss=0.1206, cr_loss=0.3758, attn_decoder_loss=0.2514, over 29635.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1122, cr_loss=0.3524, attn_decoder_loss=0.2387, over 5816778.24 frames. 
], batch size: 86, lr: 2.74e-03, grad_scale: 8.0 2024-09-19 18:33:19,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.31 vs. limit=10.0 2024-09-19 18:33:37,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=721580.0, ans=0.0 2024-09-19 18:33:39,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=721580.0, ans=0.125 2024-09-19 18:33:50,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=721580.0, ans=0.2 2024-09-19 18:34:09,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=721660.0, ans=0.04949747468305833 2024-09-19 18:34:10,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2024-09-19 18:34:23,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=721660.0, ans=15.0 2024-09-19 18:34:25,656 INFO [train.py:1198] (0/2) Epoch 40, batch 3950, loss[loss=0.2459, ctc_loss=0.1222, cr_loss=0.3686, attn_decoder_loss=0.2514, over 29447.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1122, cr_loss=0.3527, attn_decoder_loss=0.2389, over 5835855.39 frames. 
], batch size: 97, lr: 2.74e-03, grad_scale: 8.0 2024-09-19 18:34:29,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=721700.0, ans=0.125 2024-09-19 18:34:42,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=721740.0, ans=0.125 2024-09-19 18:34:52,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=721740.0, ans=0.0 2024-09-19 18:34:59,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721780.0, ans=0.1 2024-09-19 18:35:00,912 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.605e+01 9.141e+01 9.620e+01 2.736e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-19 18:35:12,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=721820.0, ans=0.0 2024-09-19 18:35:21,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=721820.0, ans=0.0 2024-09-19 18:35:26,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=721860.0, ans=0.125 2024-09-19 18:35:31,376 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.24 vs. limit=12.0 2024-09-19 18:35:39,103 INFO [train.py:1198] (0/2) Epoch 40, batch 4000, loss[loss=0.2241, ctc_loss=0.1082, cr_loss=0.3496, attn_decoder_loss=0.2292, over 29523.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1122, cr_loss=0.3528, attn_decoder_loss=0.239, over 5812913.53 frames. 
], batch size: 74, lr: 2.74e-03, grad_scale: 16.0 2024-09-19 18:36:02,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=721940.0, ans=0.125 2024-09-19 18:36:06,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=721940.0, ans=0.2 2024-09-19 18:36:17,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=721980.0, ans=0.2 2024-09-19 18:36:23,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=722020.0, ans=0.0 2024-09-19 18:36:23,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=722020.0, ans=0.1 2024-09-19 18:36:41,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=722060.0, ans=0.1 2024-09-19 18:36:41,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=722060.0, ans=0.2 2024-09-19 18:36:54,439 INFO [train.py:1198] (0/2) Epoch 40, batch 4050, loss[loss=0.2487, ctc_loss=0.1355, cr_loss=0.3757, attn_decoder_loss=0.2529, over 19798.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1127, cr_loss=0.3535, attn_decoder_loss=0.2389, over 5796045.58 frames. 
], batch size: 209, lr: 2.74e-03, grad_scale: 16.0 2024-09-19 18:36:56,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=722100.0, ans=0.125 2024-09-19 18:37:02,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=722100.0, ans=0.2 2024-09-19 18:37:22,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=722180.0, ans=0.0 2024-09-19 18:37:29,855 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.666e+01 9.149e+01 9.737e+01 4.805e+02, threshold=1.830e+02, percent-clipped=1.0 2024-09-19 18:37:55,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.69 vs. limit=15.0 2024-09-19 18:37:57,317 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0 2024-09-19 18:37:59,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=722260.0, ans=0.0 2024-09-19 18:38:08,049 INFO [train.py:1198] (0/2) Epoch 40, batch 4100, loss[loss=0.2504, ctc_loss=0.1269, cr_loss=0.3861, attn_decoder_loss=0.2555, over 29517.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1131, cr_loss=0.3546, attn_decoder_loss=0.2392, over 5792544.01 frames. ], batch size: 90, lr: 2.74e-03, grad_scale: 16.0 2024-09-19 18:38:26,385 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.81 vs. 
limit=15.0 2024-09-19 18:38:46,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=722380.0, ans=0.025 2024-09-19 18:38:49,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=722380.0, ans=0.2 2024-09-19 18:38:50,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=722380.0, ans=0.125 2024-09-19 18:38:53,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=722420.0, ans=0.125 2024-09-19 18:39:13,283 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.67 vs. limit=15.0 2024-09-19 18:39:14,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-09-19 18:39:23,036 INFO [train.py:1198] (0/2) Epoch 40, batch 4150, loss[loss=0.2211, ctc_loss=0.1035, cr_loss=0.3206, attn_decoder_loss=0.227, over 29488.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1128, cr_loss=0.3539, attn_decoder_loss=0.239, over 5797393.96 frames. ], batch size: 77, lr: 2.74e-03, grad_scale: 16.0 2024-09-19 18:39:31,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.08 vs. 
limit=22.5 2024-09-19 18:39:36,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=722540.0, ans=0.0 2024-09-19 18:39:39,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=722540.0, ans=0.0 2024-09-19 18:39:57,903 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.418e+01 8.915e+01 9.615e+01 1.835e+02, threshold=1.783e+02, percent-clipped=1.0 2024-09-19 18:40:21,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=722660.0, ans=0.125 2024-09-19 18:40:36,013 INFO [train.py:1198] (0/2) Epoch 40, batch 4200, loss[loss=0.2501, ctc_loss=0.125, cr_loss=0.3816, attn_decoder_loss=0.2555, over 29478.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1127, cr_loss=0.3539, attn_decoder_loss=0.2391, over 5799917.79 frames. ], batch size: 90, lr: 2.74e-03, grad_scale: 8.0 2024-09-19 18:40:42,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. limit=10.0 2024-09-19 18:41:22,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=722820.0, ans=0.04949747468305833 2024-09-19 18:41:34,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=722860.0, ans=0.02 2024-09-19 18:41:37,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=722860.0, ans=0.125 2024-09-19 18:41:37,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.98 vs. 
limit=15.0 2024-09-19 18:41:50,104 INFO [train.py:1198] (0/2) Epoch 40, batch 4250, loss[loss=0.2214, ctc_loss=0.1014, cr_loss=0.3211, attn_decoder_loss=0.2276, over 29516.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1123, cr_loss=0.3528, attn_decoder_loss=0.239, over 5805688.66 frames. ], batch size: 74, lr: 2.74e-03, grad_scale: 8.0 2024-09-19 18:42:27,450 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.534e+01 9.089e+01 9.722e+01 3.339e+02, threshold=1.818e+02, percent-clipped=2.0 2024-09-19 18:42:28,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.82 vs. limit=15.0 2024-09-19 18:42:35,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=723020.0, ans=0.125 2024-09-19 18:42:52,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=723060.0, ans=0.125 2024-09-19 18:43:04,240 INFO [train.py:1198] (0/2) Epoch 40, batch 4300, loss[loss=0.24, ctc_loss=0.1058, cr_loss=0.3354, attn_decoder_loss=0.2475, over 29528.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1116, cr_loss=0.3511, attn_decoder_loss=0.2389, over 5795303.32 frames. ], batch size: 87, lr: 2.74e-03, grad_scale: 8.0 2024-09-19 18:43:19,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723140.0, ans=0.1 2024-09-19 18:43:22,436 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:43:55,935 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. 
limit=12.0 2024-09-19 18:44:11,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=723260.0, ans=0.025 2024-09-19 18:44:19,231 INFO [train.py:1198] (0/2) Epoch 40, batch 4350, loss[loss=0.2438, ctc_loss=0.1119, cr_loss=0.3423, attn_decoder_loss=0.2508, over 29483.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1143, cr_loss=0.3567, attn_decoder_loss=0.2421, over 5798401.02 frames. ], batch size: 97, lr: 2.74e-03, grad_scale: 8.0 2024-09-19 18:44:55,945 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 8.976e+01 9.434e+01 1.012e+02 1.882e+02, threshold=1.887e+02, percent-clipped=1.0 2024-09-19 18:45:24,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=723460.0, ans=0.0 2024-09-19 18:45:32,919 INFO [train.py:1198] (0/2) Epoch 40, batch 4400, loss[loss=0.2443, ctc_loss=0.1219, cr_loss=0.3919, attn_decoder_loss=0.2492, over 27121.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1157, cr_loss=0.3597, attn_decoder_loss=0.2444, over 5768628.91 frames. ], batch size: 124, lr: 2.74e-03, grad_scale: 16.0 2024-09-19 18:45:41,097 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.49 vs. limit=10.0 2024-09-19 18:46:01,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2024-09-19 18:46:30,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.46 vs. 
limit=15.0
2024-09-19 18:46:34,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=723660.0, ans=0.2
2024-09-19 18:46:46,892 INFO [train.py:1198] (0/2) Epoch 40, batch 4450, loss[loss=0.2606, ctc_loss=0.1472, cr_loss=0.4113, attn_decoder_loss=0.2641, over 20060.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1194, cr_loss=0.3659, attn_decoder_loss=0.2466, over 5583481.43 frames. ], batch size: 210, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:46:50,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723700.0, ans=0.1
2024-09-19 18:47:07,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=723740.0, ans=0.125
2024-09-19 18:47:13,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=723740.0, ans=0.0
2024-09-19 18:47:26,345 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.232e+01 9.186e+01 1.020e+02 1.192e+02 3.727e+02, threshold=2.040e+02, percent-clipped=2.0
2024-09-19 18:47:29,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=723780.0, ans=0.125
2024-09-19 18:47:52,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=723860.0, ans=0.95
2024-09-19 18:47:56,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=723860.0, ans=0.125
2024-09-19 18:47:59,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723860.0, ans=0.1
2024-09-19 18:48:02,467 INFO [train.py:1198] (0/2) Epoch 40, batch 4500, loss[loss=0.2531, ctc_loss=0.1323, cr_loss=0.3837, attn_decoder_loss=0.258, over 20665.00 frames. ], tot_loss[loss=0.2432, ctc_loss=0.1226, cr_loss=0.3685, attn_decoder_loss=0.2484, over 5242714.70 frames. ], batch size: 210, lr: 2.74e-03, grad_scale: 8.0
2024-09-19 18:48:02,898 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 18:48:23,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=723940.0, ans=0.125
2024-09-19 18:48:24,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=723940.0, ans=0.0
2024-09-19 18:48:27,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=723940.0, ans=0.125
2024-09-19 18:48:39,560 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-40.pt
2024-09-19 18:49:17,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=22.5
2024-09-19 18:49:17,628 INFO [train.py:1198] (0/2) Epoch 41, batch 0, loss[loss=0.2146, ctc_loss=0.09514, cr_loss=0.3081, attn_decoder_loss=0.2211, over 29626.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.09514, cr_loss=0.3081, attn_decoder_loss=0.2211, over 29626.00 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 18:49:17,629 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 18:49:35,953 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9509, 3.5523, 3.7480, 3.8470], device='cuda:0')
2024-09-19 18:49:36,956 INFO [train.py:1230] (0/2) Epoch 41, validation: loss=0.2123, ctc_loss=0.03622, cr_loss=6.741e-15, attn_decoder_loss=0.2319, over 944034.00 frames.
2024-09-19 18:49:36,956 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-19 18:50:10,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.02 vs. limit=22.5
2024-09-19 18:50:34,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=724120.0, ans=0.025
2024-09-19 18:50:36,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=724160.0, ans=0.125
2024-09-19 18:50:48,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.26 vs. limit=15.0
2024-09-19 18:50:52,550 INFO [train.py:1198] (0/2) Epoch 41, batch 50, loss[loss=0.2084, ctc_loss=0.09023, cr_loss=0.3064, attn_decoder_loss=0.2147, over 29438.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1149, cr_loss=0.3581, attn_decoder_loss=0.2404, over 1268527.95 frames. ], batch size: 70, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 18:50:54,028 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 9.153e+01 1.062e+02 1.232e+02 3.092e+02, threshold=2.125e+02, percent-clipped=2.0
2024-09-19 18:51:00,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=724200.0, ans=0.125
2024-09-19 18:51:06,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=724240.0, ans=0.5
2024-09-19 18:51:07,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=724240.0, ans=0.0
2024-09-19 18:51:18,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0
2024-09-19 18:51:18,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.73 vs. limit=15.0
2024-09-19 18:51:27,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=724280.0, ans=0.025
2024-09-19 18:51:29,581 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.49 vs. limit=15.0
2024-09-19 18:51:30,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=724280.0, ans=0.0
2024-09-19 18:51:33,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=724280.0, ans=0.2
2024-09-19 18:51:34,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=724280.0, ans=0.2
2024-09-19 18:51:34,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=724280.0, ans=0.0
2024-09-19 18:51:57,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724360.0, ans=0.1
2024-09-19 18:52:00,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=724360.0, ans=0.0
2024-09-19 18:52:07,799 INFO [train.py:1198] (0/2) Epoch 41, batch 100, loss[loss=0.2252, ctc_loss=0.1153, cr_loss=0.3636, attn_decoder_loss=0.2293, over 29534.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1161, cr_loss=0.3623, attn_decoder_loss=0.2421, over 2252710.34 frames. ], batch size: 76, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 18:52:17,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724400.0, ans=0.1
2024-09-19 18:52:57,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=724520.0, ans=0.0
2024-09-19 18:53:07,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=724520.0, ans=0.0
2024-09-19 18:53:10,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=724560.0, ans=0.0
2024-09-19 18:53:23,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=724560.0, ans=0.125
2024-09-19 18:53:27,432 INFO [train.py:1198] (0/2) Epoch 41, batch 150, loss[loss=0.2076, ctc_loss=0.09995, cr_loss=0.3373, attn_decoder_loss=0.212, over 29397.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1136, cr_loss=0.3557, attn_decoder_loss=0.2395, over 3047738.22 frames. ], batch size: 70, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 18:53:30,374 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 8.681e+01 9.088e+01 9.657e+01 1.697e+02, threshold=1.818e+02, percent-clipped=0.0
2024-09-19 18:53:42,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=724640.0, ans=0.125
2024-09-19 18:54:00,781 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 18:54:07,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0
2024-09-19 18:54:17,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724720.0, ans=0.1
2024-09-19 18:54:23,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.45 vs. limit=15.0
2024-09-19 18:54:37,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0
2024-09-19 18:54:42,659 INFO [train.py:1198] (0/2) Epoch 41, batch 200, loss[loss=0.2464, ctc_loss=0.1239, cr_loss=0.3676, attn_decoder_loss=0.2519, over 27668.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.113, cr_loss=0.3543, attn_decoder_loss=0.239, over 3658396.96 frames. ], batch size: 125, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 18:54:44,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=724800.0, ans=0.125
2024-09-19 18:55:16,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=724880.0, ans=0.125
2024-09-19 18:55:26,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=724920.0, ans=0.5
2024-09-19 18:55:57,744 INFO [train.py:1198] (0/2) Epoch 41, batch 250, loss[loss=0.2548, ctc_loss=0.1286, cr_loss=0.4022, attn_decoder_loss=0.2598, over 29313.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1125, cr_loss=0.3539, attn_decoder_loss=0.2388, over 4141526.74 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 18:56:00,844 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.416e+01 8.964e+01 9.351e+01 1.561e+02, threshold=1.793e+02, percent-clipped=0.0
2024-09-19 18:56:31,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=725080.0, ans=0.125
2024-09-19 18:57:17,570 INFO [train.py:1198] (0/2) Epoch 41, batch 300, loss[loss=0.2446, ctc_loss=0.1264, cr_loss=0.3907, attn_decoder_loss=0.2491, over 29548.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1123, cr_loss=0.353, attn_decoder_loss=0.2383, over 4509944.96 frames. ], batch size: 92, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 18:57:27,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=725200.0, ans=0.125
2024-09-19 18:57:56,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=725280.0, ans=0.0
2024-09-19 18:58:10,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=725320.0, ans=0.04949747468305833
2024-09-19 18:58:19,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=725360.0, ans=0.0
2024-09-19 18:58:21,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0
2024-09-19 18:58:23,394 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.00 vs. limit=22.5
2024-09-19 18:58:33,157 INFO [train.py:1198] (0/2) Epoch 41, batch 350, loss[loss=0.2076, ctc_loss=0.08828, cr_loss=0.2996, attn_decoder_loss=0.2142, over 29324.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1129, cr_loss=0.3547, attn_decoder_loss=0.2388, over 4796383.44 frames. ], batch size: 71, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 18:58:34,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=725400.0, ans=0.125
2024-09-19 18:58:36,040 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.397e+01 8.852e+01 9.608e+01 1.644e+02, threshold=1.770e+02, percent-clipped=0.0
2024-09-19 18:58:39,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=725400.0, ans=0.2
2024-09-19 18:58:42,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=725400.0, ans=0.0
2024-09-19 18:59:24,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=725520.0, ans=0.0
2024-09-19 18:59:25,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=725520.0, ans=0.0
2024-09-19 18:59:46,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.97 vs. limit=10.0
2024-09-19 18:59:48,519 INFO [train.py:1198] (0/2) Epoch 41, batch 400, loss[loss=0.2292, ctc_loss=0.1061, cr_loss=0.3445, attn_decoder_loss=0.2352, over 29698.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1122, cr_loss=0.3537, attn_decoder_loss=0.2384, over 5026577.42 frames. ], batch size: 82, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 18:59:48,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=725600.0, ans=0.125
2024-09-19 18:59:59,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=725600.0, ans=0.125
2024-09-19 19:00:05,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=725640.0, ans=0.2
2024-09-19 19:00:05,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=725640.0, ans=0.2
2024-09-19 19:00:15,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=725640.0, ans=0.125
2024-09-19 19:00:19,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=725680.0, ans=0.0
2024-09-19 19:01:08,850 INFO [train.py:1198] (0/2) Epoch 41, batch 450, loss[loss=0.2264, ctc_loss=0.1013, cr_loss=0.334, attn_decoder_loss=0.2329, over 29691.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1121, cr_loss=0.353, attn_decoder_loss=0.2385, over 5189824.24 frames. ], batch size: 83, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:01:11,781 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.467e+01 8.907e+01 9.504e+01 2.028e+02, threshold=1.781e+02, percent-clipped=1.0
2024-09-19 19:01:20,727 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.07 vs. limit=12.0
2024-09-19 19:01:29,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0
2024-09-19 19:01:36,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=725840.0, ans=0.125
2024-09-19 19:02:18,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=725960.0, ans=0.0
2024-09-19 19:02:21,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=725960.0, ans=0.0
2024-09-19 19:02:24,452 INFO [train.py:1198] (0/2) Epoch 41, batch 500, loss[loss=0.2444, ctc_loss=0.1148, cr_loss=0.362, attn_decoder_loss=0.2507, over 29428.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1118, cr_loss=0.3522, attn_decoder_loss=0.2381, over 5332075.58 frames. ], batch size: 94, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:02:29,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=726000.0, ans=0.125
2024-09-19 19:02:51,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=726040.0, ans=0.09899494936611666
2024-09-19 19:03:02,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=726080.0, ans=0.0
2024-09-19 19:03:05,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=726080.0, ans=0.125
2024-09-19 19:03:10,738 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=15.0
2024-09-19 19:03:25,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=726160.0, ans=0.125
2024-09-19 19:03:37,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=726160.0, ans=0.125
2024-09-19 19:03:37,381 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0
2024-09-19 19:03:39,812 INFO [train.py:1198] (0/2) Epoch 41, batch 550, loss[loss=0.245, ctc_loss=0.1185, cr_loss=0.3792, attn_decoder_loss=0.2506, over 28702.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1118, cr_loss=0.352, attn_decoder_loss=0.2382, over 5424417.03 frames. ], batch size: 104, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:03:42,905 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.739e+01 9.193e+01 9.957e+01 2.783e+02, threshold=1.839e+02, percent-clipped=3.0
2024-09-19 19:03:45,124 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.51 vs. limit=15.0
2024-09-19 19:04:16,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=726280.0, ans=0.125
2024-09-19 19:04:17,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=726280.0, ans=0.0
2024-09-19 19:04:18,746 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.29 vs. limit=15.0
2024-09-19 19:04:28,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=726320.0, ans=0.2
2024-09-19 19:04:30,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=726320.0, ans=0.1
2024-09-19 19:04:40,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=726360.0, ans=0.0
2024-09-19 19:04:58,290 INFO [train.py:1198] (0/2) Epoch 41, batch 600, loss[loss=0.2477, ctc_loss=0.1249, cr_loss=0.388, attn_decoder_loss=0.2527, over 29284.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.112, cr_loss=0.3524, attn_decoder_loss=0.2384, over 5510380.86 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 19:04:59,361 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=22.5
2024-09-19 19:05:03,069 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.16 vs. limit=15.0
2024-09-19 19:05:20,160 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 19:05:33,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=726480.0, ans=0.1
2024-09-19 19:05:39,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=726480.0, ans=0.125
2024-09-19 19:05:40,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=726480.0, ans=0.07
2024-09-19 19:05:50,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=726520.0, ans=0.125
2024-09-19 19:05:51,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=726520.0, ans=0.125
2024-09-19 19:06:15,355 INFO [train.py:1198] (0/2) Epoch 41, batch 650, loss[loss=0.2356, ctc_loss=0.1044, cr_loss=0.3253, attn_decoder_loss=0.2429, over 29775.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1109, cr_loss=0.3502, attn_decoder_loss=0.2378, over 5586482.20 frames. ], batch size: 81, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 19:06:19,865 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.350e+01 8.880e+01 9.262e+01 1.448e+02, threshold=1.776e+02, percent-clipped=0.0
2024-09-19 19:06:36,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=12.0
2024-09-19 19:06:41,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726640.0, ans=0.1
2024-09-19 19:06:45,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=726680.0, ans=0.05
2024-09-19 19:07:02,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=726720.0, ans=0.125
2024-09-19 19:07:17,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=726760.0, ans=0.2
2024-09-19 19:07:29,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=726800.0, ans=0.125
2024-09-19 19:07:30,717 INFO [train.py:1198] (0/2) Epoch 41, batch 700, loss[loss=0.2251, ctc_loss=0.1052, cr_loss=0.3381, attn_decoder_loss=0.2309, over 29536.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1115, cr_loss=0.3515, attn_decoder_loss=0.2387, over 5636945.53 frames. ], batch size: 76, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 19:07:34,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=726800.0, ans=0.125
2024-09-19 19:07:35,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=726800.0, ans=0.2
2024-09-19 19:07:37,575 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.50 vs. limit=15.0
2024-09-19 19:07:49,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0
2024-09-19 19:07:54,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=726840.0, ans=0.125
2024-09-19 19:08:01,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=726880.0, ans=0.125
2024-09-19 19:08:11,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=726880.0, ans=0.125
2024-09-19 19:08:22,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=726920.0, ans=0.125
2024-09-19 19:08:22,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=726920.0, ans=0.125
2024-09-19 19:08:23,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=726920.0, ans=0.0
2024-09-19 19:08:25,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=726920.0, ans=0.07
2024-09-19 19:08:26,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=726920.0, ans=0.0
2024-09-19 19:08:46,095 INFO [train.py:1198] (0/2) Epoch 41, batch 750, loss[loss=0.2442, ctc_loss=0.1174, cr_loss=0.3612, attn_decoder_loss=0.2502, over 29723.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1113, cr_loss=0.3511, attn_decoder_loss=0.2383, over 5677086.91 frames. ], batch size: 82, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 19:08:52,746 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.416e+01 8.976e+01 9.718e+01 1.767e+02, threshold=1.795e+02, percent-clipped=0.0
2024-09-19 19:08:54,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=727000.0, ans=0.125
2024-09-19 19:08:56,583 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.02 vs. limit=15.0
2024-09-19 19:09:00,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0
2024-09-19 19:09:30,364 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0
2024-09-19 19:09:36,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0
2024-09-19 19:09:45,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=727120.0, ans=0.0
2024-09-19 19:10:06,047 INFO [train.py:1198] (0/2) Epoch 41, batch 800, loss[loss=0.2098, ctc_loss=0.09009, cr_loss=0.2969, attn_decoder_loss=0.2165, over 29624.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1117, cr_loss=0.3518, attn_decoder_loss=0.2383, over 5707766.77 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:10:27,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=727240.0, ans=0.2
2024-09-19 19:10:29,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0
2024-09-19 19:10:30,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=727240.0, ans=0.035
2024-09-19 19:10:30,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=727240.0, ans=0.025
2024-09-19 19:10:30,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0
2024-09-19 19:10:38,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727280.0, ans=0.1
2024-09-19 19:10:51,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=727320.0, ans=0.125
2024-09-19 19:10:58,405 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.32 vs. limit=22.5
2024-09-19 19:11:00,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=727320.0, ans=0.05
2024-09-19 19:11:06,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=727360.0, ans=0.125
2024-09-19 19:11:21,347 INFO [train.py:1198] (0/2) Epoch 41, batch 850, loss[loss=0.2343, ctc_loss=0.111, cr_loss=0.3458, attn_decoder_loss=0.2404, over 29692.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1114, cr_loss=0.3508, attn_decoder_loss=0.2379, over 5737609.00 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:11:25,689 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.437e+01 9.040e+01 9.490e+01 1.672e+02, threshold=1.808e+02, percent-clipped=0.0
2024-09-19 19:11:56,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=727480.0, ans=0.025
2024-09-19 19:11:57,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=727480.0, ans=0.125
2024-09-19 19:12:17,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=727520.0, ans=0.125
2024-09-19 19:12:17,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=727520.0, ans=0.125
2024-09-19 19:12:31,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=727560.0, ans=0.125
2024-09-19 19:12:37,296 INFO [train.py:1198] (0/2) Epoch 41, batch 900, loss[loss=0.2031, ctc_loss=0.09108, cr_loss=0.3138, attn_decoder_loss=0.2085, over 29584.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1116, cr_loss=0.3513, attn_decoder_loss=0.2381, over 5742257.53 frames. ], batch size: 73, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:12:45,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=727600.0, ans=0.0
2024-09-19 19:12:49,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.00 vs. limit=6.0
2024-09-19 19:12:54,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=727640.0, ans=0.125
2024-09-19 19:13:26,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=727720.0, ans=0.5
2024-09-19 19:13:38,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=727720.0, ans=0.1
2024-09-19 19:13:38,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727720.0, ans=0.1
2024-09-19 19:13:56,361 INFO [train.py:1198] (0/2) Epoch 41, batch 950, loss[loss=0.2202, ctc_loss=0.1, cr_loss=0.3232, attn_decoder_loss=0.2264, over 29494.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1118, cr_loss=0.3514, attn_decoder_loss=0.2382, over 5744592.35 frames. ], batch size: 74, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:14:00,868 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.606e+01 9.118e+01 9.826e+01 2.095e+02, threshold=1.824e+02, percent-clipped=1.0
2024-09-19 19:14:01,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=727800.0, ans=0.0
2024-09-19 19:14:08,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=727800.0, ans=0.1
2024-09-19 19:14:17,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=727840.0, ans=0.025
2024-09-19 19:14:37,965 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.73 vs. limit=22.5
2024-09-19 19:14:39,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.54 vs. limit=6.0
2024-09-19 19:14:54,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.15 vs. limit=22.5
2024-09-19 19:15:07,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=727960.0, ans=0.0
2024-09-19 19:15:12,356 INFO [train.py:1198] (0/2) Epoch 41, batch 1000, loss[loss=0.2183, ctc_loss=0.1068, cr_loss=0.3313, attn_decoder_loss=0.2234, over 29488.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1127, cr_loss=0.3533, attn_decoder_loss=0.2389, over 5737124.46 frames. ], batch size: 77, lr: 2.70e-03, grad_scale: 16.0
2024-09-19 19:15:38,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=728040.0, ans=0.125
2024-09-19 19:15:49,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=728080.0, ans=0.125
2024-09-19 19:15:49,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=728080.0, ans=0.2
2024-09-19 19:16:02,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=728120.0, ans=0.125
2024-09-19 19:16:05,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=728120.0, ans=0.1
2024-09-19 19:16:11,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=728160.0, ans=0.125
2024-09-19 19:16:29,739 INFO [train.py:1198] (0/2) Epoch 41, batch 1050, loss[loss=0.2466, ctc_loss=0.1297, cr_loss=0.3965, attn_decoder_loss=0.2508, over 29684.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1123, cr_loss=0.3525, attn_decoder_loss=0.2386, over 5744228.47 frames. ], batch size: 85, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 19:16:33,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728200.0, ans=0.1
2024-09-19 19:16:34,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=728200.0, ans=0.0
2024-09-19 19:16:35,723 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.570e+01 9.055e+01 9.661e+01 1.822e+02, threshold=1.811e+02, percent-clipped=0.0
2024-09-19 19:16:40,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=728200.0, ans=0.0
2024-09-19 19:16:42,246 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 19:16:52,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=728240.0, ans=0.125
2024-09-19 19:17:00,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=728280.0, ans=0.125
2024-09-19 19:17:11,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=728280.0, ans=0.2
2024-09-19 19:17:13,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=728280.0, ans=0.0
2024-09-19 19:17:41,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.75 vs. limit=10.0
2024-09-19 19:17:47,176 INFO [train.py:1198] (0/2) Epoch 41, batch 1100, loss[loss=0.2269, ctc_loss=0.1073, cr_loss=0.3444, attn_decoder_loss=0.2325, over 29454.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1123, cr_loss=0.3524, attn_decoder_loss=0.2384, over 5756379.99 frames. ], batch size: 78, lr: 2.70e-03, grad_scale: 8.0
2024-09-19 19:17:51,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=728400.0, ans=0.125
2024-09-19 19:18:02,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=728440.0, ans=0.125
2024-09-19 19:18:14,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728440.0, ans=0.1
2024-09-19 19:18:23,018 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=12.0
2024-09-19 19:18:25,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0
2024-09-19 19:18:45,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.08 vs. limit=6.0
2024-09-19 19:18:50,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=728560.0, ans=0.125
2024-09-19 19:19:02,689 INFO [train.py:1198] (0/2) Epoch 41, batch 1150, loss[loss=0.226, ctc_loss=0.1107, cr_loss=0.3458, attn_decoder_loss=0.2311, over 29422.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1122, cr_loss=0.3523, attn_decoder_loss=0.2382, over 5754490.22 frames. ], batch size: 78, lr: 2.69e-03, grad_scale: 8.0
2024-09-19 19:19:08,825 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.493e+01 8.986e+01 9.432e+01 3.581e+02, threshold=1.797e+02, percent-clipped=4.0
2024-09-19 19:19:25,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=728640.0, ans=0.2
2024-09-19 19:19:28,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=728640.0, ans=0.0
2024-09-19 19:19:41,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.34 vs. limit=10.0
2024-09-19 19:20:08,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=728760.0, ans=0.2
2024-09-19 19:20:20,770 INFO [train.py:1198] (0/2) Epoch 41, batch 1200, loss[loss=0.2268, ctc_loss=0.0965, cr_loss=0.312, attn_decoder_loss=0.2344, over 29662.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1125, cr_loss=0.3527, attn_decoder_loss=0.2389, over 5746539.84 frames. ], batch size: 85, lr: 2.69e-03, grad_scale: 16.0
2024-09-19 19:20:42,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=728840.0, ans=0.1
2024-09-19 19:20:55,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.56 vs. limit=22.5
2024-09-19 19:21:08,070 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.20 vs.
limit=15.0 2024-09-19 19:21:16,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=728920.0, ans=0.125 2024-09-19 19:21:33,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=728960.0, ans=0.2 2024-09-19 19:21:38,650 INFO [train.py:1198] (0/2) Epoch 41, batch 1250, loss[loss=0.2507, ctc_loss=0.1263, cr_loss=0.4046, attn_decoder_loss=0.2555, over 29561.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1129, cr_loss=0.3537, attn_decoder_loss=0.2396, over 5774193.70 frames. ], batch size: 92, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:21:40,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=729000.0, ans=0.0 2024-09-19 19:21:44,545 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.620e+01 9.115e+01 9.641e+01 1.627e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-19 19:21:48,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=729000.0, ans=0.2 2024-09-19 19:21:52,743 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:21:54,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=729040.0, ans=0.2 2024-09-19 19:21:55,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=729040.0, ans=0.0 2024-09-19 19:22:03,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=729040.0, ans=0.2 2024-09-19 19:22:26,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=729120.0, ans=0.0 2024-09-19 19:22:31,976 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=729120.0, ans=0.0 2024-09-19 19:22:32,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=729120.0, ans=0.0 2024-09-19 19:22:47,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=729160.0, ans=0.125 2024-09-19 19:22:54,565 INFO [train.py:1198] (0/2) Epoch 41, batch 1300, loss[loss=0.2351, ctc_loss=0.1027, cr_loss=0.3207, attn_decoder_loss=0.2427, over 28206.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1125, cr_loss=0.353, attn_decoder_loss=0.2393, over 5778630.18 frames. ], batch size: 111, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:23:02,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=729200.0, ans=0.125 2024-09-19 19:23:02,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=729200.0, ans=0.125 2024-09-19 19:23:22,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=729240.0, ans=0.125 2024-09-19 19:23:32,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=22.5 2024-09-19 19:23:54,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=729360.0, ans=0.125 2024-09-19 19:23:55,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=729360.0, ans=0.0 2024-09-19 19:24:10,603 INFO [train.py:1198] (0/2) Epoch 41, batch 1350, loss[loss=0.2344, ctc_loss=0.1126, cr_loss=0.3506, attn_decoder_loss=0.2402, over 29737.00 frames. 
], tot_loss[loss=0.233, ctc_loss=0.1119, cr_loss=0.3521, attn_decoder_loss=0.2387, over 5795556.01 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:24:17,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=729400.0, ans=0.125 2024-09-19 19:24:18,638 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.406e+01 8.862e+01 9.438e+01 1.295e+02, threshold=1.772e+02, percent-clipped=0.0 2024-09-19 19:24:22,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=729400.0, ans=0.125 2024-09-19 19:24:24,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.93 vs. limit=10.0 2024-09-19 19:24:29,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=729440.0, ans=0.1 2024-09-19 19:25:05,574 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2024-09-19 19:25:30,641 INFO [train.py:1198] (0/2) Epoch 41, batch 1400, loss[loss=0.2107, ctc_loss=0.09775, cr_loss=0.3177, attn_decoder_loss=0.2162, over 29588.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1117, cr_loss=0.3515, attn_decoder_loss=0.2383, over 5807095.24 frames. ], batch size: 69, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:26:18,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=729720.0, ans=0.0 2024-09-19 19:26:19,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.29 vs. 
limit=12.0 2024-09-19 19:26:45,630 INFO [train.py:1198] (0/2) Epoch 41, batch 1450, loss[loss=0.2412, ctc_loss=0.1154, cr_loss=0.3434, attn_decoder_loss=0.2476, over 29422.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1119, cr_loss=0.3519, attn_decoder_loss=0.2388, over 5805047.48 frames. ], batch size: 94, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:26:51,370 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.557e+01 9.068e+01 9.745e+01 1.592e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-19 19:27:02,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=729840.0, ans=0.025 2024-09-19 19:27:37,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=729920.0, ans=0.5 2024-09-19 19:27:40,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=729920.0, ans=0.025 2024-09-19 19:27:46,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=729960.0, ans=0.02 2024-09-19 19:27:47,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=729960.0, ans=0.0 2024-09-19 19:28:03,357 INFO [train.py:1198] (0/2) Epoch 41, batch 1500, loss[loss=0.2402, ctc_loss=0.1164, cr_loss=0.3627, attn_decoder_loss=0.2459, over 29630.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1118, cr_loss=0.3515, attn_decoder_loss=0.239, over 5804787.69 frames. 
], batch size: 86, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:28:49,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=730120.0, ans=0.125 2024-09-19 19:29:05,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-09-19 19:29:15,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=730160.0, ans=0.025 2024-09-19 19:29:21,291 INFO [train.py:1198] (0/2) Epoch 41, batch 1550, loss[loss=0.2575, ctc_loss=0.1304, cr_loss=0.4053, attn_decoder_loss=0.2626, over 29504.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1123, cr_loss=0.3521, attn_decoder_loss=0.2389, over 5779722.47 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:29:27,257 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.596e+01 9.016e+01 9.921e+01 2.014e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-19 19:29:35,136 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:29:42,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=730240.0, ans=0.0 2024-09-19 19:29:58,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=730280.0, ans=0.125 2024-09-19 19:30:09,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=730320.0, ans=0.2 2024-09-19 19:30:36,416 INFO [train.py:1198] (0/2) Epoch 41, batch 1600, loss[loss=0.2411, ctc_loss=0.1174, cr_loss=0.3575, attn_decoder_loss=0.2469, over 29684.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1125, cr_loss=0.3525, attn_decoder_loss=0.2388, over 5762624.43 frames. 
], batch size: 85, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:30:42,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=730400.0, ans=0.125 2024-09-19 19:30:44,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=730400.0, ans=0.125 2024-09-19 19:31:10,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=730480.0, ans=0.0 2024-09-19 19:31:13,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=730480.0, ans=0.5 2024-09-19 19:31:13,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=730480.0, ans=0.2 2024-09-19 19:31:23,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=730520.0, ans=0.0 2024-09-19 19:31:23,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=730520.0, ans=0.2 2024-09-19 19:31:37,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=730560.0, ans=0.125 2024-09-19 19:31:48,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=730560.0, ans=0.125 2024-09-19 19:31:51,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=730560.0, ans=0.125 2024-09-19 19:31:54,179 INFO [train.py:1198] (0/2) Epoch 41, batch 1650, loss[loss=0.2437, ctc_loss=0.1172, cr_loss=0.3612, attn_decoder_loss=0.2497, over 29727.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1118, cr_loss=0.3512, attn_decoder_loss=0.2384, over 5757302.44 frames. 
], batch size: 89, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:32:03,245 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.587e+01 9.228e+01 9.861e+01 2.680e+02, threshold=1.846e+02, percent-clipped=1.0 2024-09-19 19:32:18,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=730640.0, ans=0.0 2024-09-19 19:32:40,403 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:33:01,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=730760.0, ans=0.125 2024-09-19 19:33:11,335 INFO [train.py:1198] (0/2) Epoch 41, batch 1700, loss[loss=0.2131, ctc_loss=0.1037, cr_loss=0.3434, attn_decoder_loss=0.2176, over 29578.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1115, cr_loss=0.3503, attn_decoder_loss=0.2384, over 5779870.64 frames. ], batch size: 69, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:33:11,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=730800.0, ans=0.1 2024-09-19 19:33:26,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=730840.0, ans=0.125 2024-09-19 19:33:53,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=730880.0, ans=0.1 2024-09-19 19:33:59,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=730920.0, ans=0.125 2024-09-19 19:33:59,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=730920.0, ans=0.125 2024-09-19 19:34:26,871 INFO [train.py:1198] (0/2) Epoch 41, batch 1750, loss[loss=0.2022, ctc_loss=0.08466, 
cr_loss=0.2766, attn_decoder_loss=0.2091, over 29308.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1113, cr_loss=0.3496, attn_decoder_loss=0.2379, over 5788345.34 frames. ], batch size: 67, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:34:35,973 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.612e+01 9.117e+01 9.709e+01 1.098e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-19 19:34:47,123 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.27 vs. limit=10.0 2024-09-19 19:34:54,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=731040.0, ans=0.125 2024-09-19 19:35:06,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=731080.0, ans=0.125 2024-09-19 19:35:21,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.84 vs. limit=22.5 2024-09-19 19:35:22,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=731120.0, ans=0.125 2024-09-19 19:35:35,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5 2024-09-19 19:35:44,369 INFO [train.py:1198] (0/2) Epoch 41, batch 1800, loss[loss=0.2425, ctc_loss=0.1144, cr_loss=0.3708, attn_decoder_loss=0.2485, over 29685.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1117, cr_loss=0.3508, attn_decoder_loss=0.2383, over 5790314.90 frames. 
], batch size: 83, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:35:46,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=731200.0, ans=0.0 2024-09-19 19:35:55,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=731200.0, ans=0.0 2024-09-19 19:35:55,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=731200.0, ans=0.2 2024-09-19 19:35:56,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=731200.0, ans=0.0 2024-09-19 19:36:00,200 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=167.07 vs. limit=15.0 2024-09-19 19:36:07,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=731240.0, ans=0.0 2024-09-19 19:36:12,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=731240.0, ans=0.0 2024-09-19 19:36:24,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=731280.0, ans=0.2 2024-09-19 19:36:28,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=731320.0, ans=0.0 2024-09-19 19:36:34,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=731320.0, ans=0.125 2024-09-19 19:36:47,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. 
limit=15.0 2024-09-19 19:37:02,122 INFO [train.py:1198] (0/2) Epoch 41, batch 1850, loss[loss=0.2507, ctc_loss=0.1208, cr_loss=0.3871, attn_decoder_loss=0.2565, over 29625.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1115, cr_loss=0.3506, attn_decoder_loss=0.2381, over 5796257.21 frames. ], batch size: 86, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:37:10,992 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.675e+01 9.084e+01 9.615e+01 1.395e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-19 19:37:39,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=731480.0, ans=0.125 2024-09-19 19:37:46,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=731520.0, ans=0.1 2024-09-19 19:37:48,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=731520.0, ans=0.05 2024-09-19 19:37:49,528 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.70 vs. limit=10.0 2024-09-19 19:38:00,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=731560.0, ans=0.025 2024-09-19 19:38:08,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=731560.0, ans=0.125 2024-09-19 19:38:14,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=731560.0, ans=0.125 2024-09-19 19:38:14,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.49 vs. 
limit=22.5 2024-09-19 19:38:17,062 INFO [train.py:1198] (0/2) Epoch 41, batch 1900, loss[loss=0.2478, ctc_loss=0.12, cr_loss=0.3755, attn_decoder_loss=0.2536, over 29695.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1118, cr_loss=0.3517, attn_decoder_loss=0.239, over 5804709.65 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:38:32,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=731640.0, ans=0.1 2024-09-19 19:38:38,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=731640.0, ans=0.125 2024-09-19 19:38:46,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=731680.0, ans=0.125 2024-09-19 19:39:31,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=731760.0, ans=0.125 2024-09-19 19:39:34,508 INFO [train.py:1198] (0/2) Epoch 41, batch 1950, loss[loss=0.23, ctc_loss=0.1162, cr_loss=0.3572, attn_decoder_loss=0.2347, over 29470.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.113, cr_loss=0.3542, attn_decoder_loss=0.2403, over 5819411.55 frames. 
], batch size: 78, lr: 2.69e-03, grad_scale: 8.0 2024-09-19 19:39:43,508 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.775e+01 9.303e+01 9.846e+01 2.591e+02, threshold=1.861e+02, percent-clipped=0.0 2024-09-19 19:40:16,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=731880.0, ans=0.125 2024-09-19 19:40:18,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=731920.0, ans=0.0 2024-09-19 19:40:36,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=731960.0, ans=0.125 2024-09-19 19:40:40,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=731960.0, ans=0.125 2024-09-19 19:40:44,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2024-09-19 19:40:47,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2024-09-19 19:40:51,606 INFO [train.py:1198] (0/2) Epoch 41, batch 2000, loss[loss=0.2124, ctc_loss=0.1015, cr_loss=0.3245, attn_decoder_loss=0.2175, over 29373.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1131, cr_loss=0.3537, attn_decoder_loss=0.2405, over 5796017.08 frames. ], batch size: 67, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:40:59,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=732000.0, ans=0.125 2024-09-19 19:41:36,467 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. 
limit=15.0 2024-09-19 19:41:37,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=732120.0, ans=0.125 2024-09-19 19:42:07,266 INFO [train.py:1198] (0/2) Epoch 41, batch 2050, loss[loss=0.2122, ctc_loss=0.09986, cr_loss=0.3161, attn_decoder_loss=0.2177, over 29444.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1128, cr_loss=0.3528, attn_decoder_loss=0.2395, over 5788810.10 frames. ], batch size: 70, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:42:16,365 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 8.645e+01 9.096e+01 9.473e+01 4.528e+02, threshold=1.819e+02, percent-clipped=2.0 2024-09-19 19:42:16,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=732200.0, ans=0.125 2024-09-19 19:42:25,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=732240.0, ans=0.0 2024-09-19 19:42:28,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=732240.0, ans=0.0 2024-09-19 19:42:41,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=732280.0, ans=0.0 2024-09-19 19:42:45,671 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:42:48,815 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:42:52,460 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.72 vs. 
limit=15.0 2024-09-19 19:42:54,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=732320.0, ans=0.05 2024-09-19 19:43:12,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=732360.0, ans=0.125 2024-09-19 19:43:24,684 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.43 vs. limit=22.5 2024-09-19 19:43:25,409 INFO [train.py:1198] (0/2) Epoch 41, batch 2100, loss[loss=0.2295, ctc_loss=0.1109, cr_loss=0.3556, attn_decoder_loss=0.2348, over 29769.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1119, cr_loss=0.3515, attn_decoder_loss=0.2389, over 5801428.81 frames. ], batch size: 81, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:43:43,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=732440.0, ans=0.125 2024-09-19 19:43:43,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=732440.0, ans=0.125 2024-09-19 19:43:52,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=732440.0, ans=0.125 2024-09-19 19:44:00,652 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. 
limit=15.0 2024-09-19 19:44:03,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=732480.0, ans=0.125 2024-09-19 19:44:33,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=732560.0, ans=0.0 2024-09-19 19:44:34,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.88 vs. limit=22.5 2024-09-19 19:44:42,400 INFO [train.py:1198] (0/2) Epoch 41, batch 2150, loss[loss=0.2297, ctc_loss=0.1109, cr_loss=0.3591, attn_decoder_loss=0.2349, over 29444.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1114, cr_loss=0.3504, attn_decoder_loss=0.2382, over 5816373.07 frames. ], batch size: 78, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:44:48,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=732600.0, ans=0.0 2024-09-19 19:44:50,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=732600.0, ans=0.0 2024-09-19 19:44:51,571 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.227e+01 8.830e+01 9.472e+01 1.149e+02, threshold=1.766e+02, percent-clipped=0.0 2024-09-19 19:44:51,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=732600.0, ans=0.125 2024-09-19 19:45:02,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=732640.0, ans=0.0 2024-09-19 19:45:13,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=732680.0, ans=0.0 2024-09-19 19:45:55,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=732760.0, ans=0.0 2024-09-19 19:45:58,476 INFO 
[train.py:1198] (0/2) Epoch 41, batch 2200, loss[loss=0.2403, ctc_loss=0.1153, cr_loss=0.3581, attn_decoder_loss=0.2462, over 29610.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1117, cr_loss=0.351, attn_decoder_loss=0.2385, over 5812369.48 frames. ], batch size: 86, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:46:12,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=732840.0, ans=0.0 2024-09-19 19:46:51,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=732920.0, ans=0.125 2024-09-19 19:47:16,340 INFO [train.py:1198] (0/2) Epoch 41, batch 2250, loss[loss=0.2354, ctc_loss=0.1137, cr_loss=0.3606, attn_decoder_loss=0.2409, over 29688.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1114, cr_loss=0.351, attn_decoder_loss=0.2383, over 5811424.46 frames. ], batch size: 82, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:47:19,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=733000.0, ans=0.5 2024-09-19 19:47:25,242 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.546e+01 9.093e+01 9.694e+01 2.560e+02, threshold=1.819e+02, percent-clipped=3.0 2024-09-19 19:47:25,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=733000.0, ans=0.125 2024-09-19 19:47:25,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=733000.0, ans=0.0 2024-09-19 19:47:53,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=733080.0, ans=0.125 2024-09-19 19:48:03,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=733120.0, ans=0.1 
2024-09-19 19:48:06,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=733120.0, ans=0.0 2024-09-19 19:48:10,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=733120.0, ans=0.125 2024-09-19 19:48:33,637 INFO [train.py:1198] (0/2) Epoch 41, batch 2300, loss[loss=0.2072, ctc_loss=0.09647, cr_loss=0.3145, attn_decoder_loss=0.2125, over 29322.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1111, cr_loss=0.3503, attn_decoder_loss=0.2375, over 5798408.92 frames. ], batch size: 71, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:48:34,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-09-19 19:48:55,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.44 vs. limit=10.0 2024-09-19 19:49:08,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=733280.0, ans=0.95 2024-09-19 19:49:16,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=733280.0, ans=0.125 2024-09-19 19:49:19,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=733320.0, ans=0.0 2024-09-19 19:49:49,348 INFO [train.py:1198] (0/2) Epoch 41, batch 2350, loss[loss=0.249, ctc_loss=0.1226, cr_loss=0.3793, attn_decoder_loss=0.2547, over 29674.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1112, cr_loss=0.3501, attn_decoder_loss=0.2376, over 5804755.30 frames. 
], batch size: 83, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:49:58,164 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.660e+01 9.088e+01 9.774e+01 1.601e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-19 19:50:04,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=733440.0, ans=0.1 2024-09-19 19:50:16,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=733440.0, ans=0.0 2024-09-19 19:50:19,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.12 vs. limit=22.5 2024-09-19 19:50:21,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=733480.0, ans=0.125 2024-09-19 19:50:24,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=733480.0, ans=0.125 2024-09-19 19:50:28,702 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:50:39,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=733520.0, ans=0.0 2024-09-19 19:51:06,692 INFO [train.py:1198] (0/2) Epoch 41, batch 2400, loss[loss=0.2181, ctc_loss=0.1011, cr_loss=0.3239, attn_decoder_loss=0.2239, over 29539.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1119, cr_loss=0.3517, attn_decoder_loss=0.2383, over 5807978.56 frames. 
], batch size: 76, lr: 2.69e-03, grad_scale: 32.0 2024-09-19 19:51:12,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=733600.0, ans=0.125 2024-09-19 19:51:34,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=733640.0, ans=0.95 2024-09-19 19:52:01,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=733720.0, ans=0.125 2024-09-19 19:52:02,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=733720.0, ans=0.125 2024-09-19 19:52:09,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=733760.0, ans=0.0 2024-09-19 19:52:18,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=733760.0, ans=0.125 2024-09-19 19:52:24,348 INFO [train.py:1198] (0/2) Epoch 41, batch 2450, loss[loss=0.256, ctc_loss=0.1339, cr_loss=0.4267, attn_decoder_loss=0.26, over 29680.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1123, cr_loss=0.3527, attn_decoder_loss=0.2389, over 5783457.31 frames. 
], batch size: 82, lr: 2.69e-03, grad_scale: 16.0 2024-09-19 19:52:33,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=733800.0, ans=0.125 2024-09-19 19:52:34,693 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.655e+01 9.209e+01 9.754e+01 2.010e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-19 19:53:12,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=733920.0, ans=0.1 2024-09-19 19:53:33,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=733960.0, ans=0.1 2024-09-19 19:53:39,343 INFO [train.py:1198] (0/2) Epoch 41, batch 2500, loss[loss=0.2502, ctc_loss=0.1252, cr_loss=0.3802, attn_decoder_loss=0.2556, over 29619.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1125, cr_loss=0.3536, attn_decoder_loss=0.239, over 5793375.91 frames. 
], batch size: 86, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 19:53:42,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=734000.0, ans=0.0 2024-09-19 19:53:52,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=734000.0, ans=0.0 2024-09-19 19:54:17,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=734080.0, ans=0.125 2024-09-19 19:54:22,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=734080.0, ans=0.035 2024-09-19 19:54:41,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=734160.0, ans=0.125 2024-09-19 19:54:45,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.28 vs. limit=15.0 2024-09-19 19:54:46,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=734160.0, ans=0.0 2024-09-19 19:54:50,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=734160.0, ans=0.5 2024-09-19 19:54:57,246 INFO [train.py:1198] (0/2) Epoch 41, batch 2550, loss[loss=0.2014, ctc_loss=0.09104, cr_loss=0.3116, attn_decoder_loss=0.2067, over 29368.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1122, cr_loss=0.3533, attn_decoder_loss=0.239, over 5797163.21 frames. 
], batch size: 67, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 19:55:03,622 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:55:07,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.421e+01 8.984e+01 9.489e+01 4.917e+02, threshold=1.797e+02, percent-clipped=1.0 2024-09-19 19:55:24,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734240.0, ans=0.1 2024-09-19 19:55:25,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=734280.0, ans=0.125 2024-09-19 19:55:33,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=734280.0, ans=0.025 2024-09-19 19:55:51,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=734320.0, ans=0.0 2024-09-19 19:55:58,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=734360.0, ans=0.125 2024-09-19 19:55:59,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=734360.0, ans=0.125 2024-09-19 19:56:13,327 INFO [train.py:1198] (0/2) Epoch 41, batch 2600, loss[loss=0.229, ctc_loss=0.1118, cr_loss=0.3474, attn_decoder_loss=0.2343, over 29430.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1122, cr_loss=0.353, attn_decoder_loss=0.239, over 5793901.17 frames. ], batch size: 78, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 19:56:14,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.72 vs. 
limit=15.0 2024-09-19 19:56:33,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=734440.0, ans=0.125 2024-09-19 19:56:34,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=734440.0, ans=10.0 2024-09-19 19:56:38,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=734440.0, ans=0.0 2024-09-19 19:56:44,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=734480.0, ans=0.125 2024-09-19 19:57:02,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=734520.0, ans=0.125 2024-09-19 19:57:09,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2024-09-19 19:57:15,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=734560.0, ans=0.125 2024-09-19 19:57:20,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=734560.0, ans=0.125 2024-09-19 19:57:30,508 INFO [train.py:1198] (0/2) Epoch 41, batch 2650, loss[loss=0.2403, ctc_loss=0.1108, cr_loss=0.3417, attn_decoder_loss=0.2471, over 29266.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1121, cr_loss=0.3532, attn_decoder_loss=0.2393, over 5799790.18 frames. 
], batch size: 100, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 19:57:35,544 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:57:36,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=734600.0, ans=0.2 2024-09-19 19:57:38,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=734600.0, ans=0.0 2024-09-19 19:57:41,310 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.633e+01 9.136e+01 9.710e+01 1.315e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-19 19:57:52,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=734640.0, ans=0.04949747468305833 2024-09-19 19:58:22,675 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5 2024-09-19 19:58:35,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734760.0, ans=0.1 2024-09-19 19:58:38,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=734760.0, ans=0.125 2024-09-19 19:58:48,358 INFO [train.py:1198] (0/2) Epoch 41, batch 2700, loss[loss=0.2361, ctc_loss=0.1033, cr_loss=0.3239, attn_decoder_loss=0.2436, over 29516.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1122, cr_loss=0.3529, attn_decoder_loss=0.2394, over 5795354.23 frames. 
], batch size: 87, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 19:58:56,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=734800.0, ans=0.2 2024-09-19 19:59:00,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=734800.0, ans=0.125 2024-09-19 19:59:31,153 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.64 vs. limit=15.0 2024-09-19 19:59:32,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=734920.0, ans=0.125 2024-09-19 19:59:35,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=734920.0, ans=0.125 2024-09-19 19:59:36,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=734920.0, ans=0.0 2024-09-19 19:59:49,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.68 vs. limit=15.0 2024-09-19 20:00:03,693 INFO [train.py:1198] (0/2) Epoch 41, batch 2750, loss[loss=0.2168, ctc_loss=0.09663, cr_loss=0.3127, attn_decoder_loss=0.2233, over 29530.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1114, cr_loss=0.3508, attn_decoder_loss=0.2384, over 5795139.98 frames. 
], batch size: 75, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:00:12,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=735000.0, ans=0.0 2024-09-19 20:00:14,116 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.495e+01 8.920e+01 9.727e+01 1.790e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-19 20:00:28,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=735040.0, ans=0.04949747468305833 2024-09-19 20:00:32,387 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2024-09-19 20:01:20,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=735200.0, ans=0.125 2024-09-19 20:01:21,645 INFO [train.py:1198] (0/2) Epoch 41, batch 2800, loss[loss=0.2517, ctc_loss=0.1334, cr_loss=0.3706, attn_decoder_loss=0.2566, over 20152.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1119, cr_loss=0.3519, attn_decoder_loss=0.2387, over 5776301.33 frames. 
], batch size: 209, lr: 2.68e-03, grad_scale: 32.0 2024-09-19 20:01:34,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=735200.0, ans=0.0 2024-09-19 20:01:44,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=735240.0, ans=0.125 2024-09-19 20:01:47,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=735240.0, ans=0.125 2024-09-19 20:01:48,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=735240.0, ans=0.2 2024-09-19 20:01:54,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735280.0, ans=0.1 2024-09-19 20:02:06,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735320.0, ans=0.1 2024-09-19 20:02:08,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=735320.0, ans=0.0 2024-09-19 20:02:38,653 INFO [train.py:1198] (0/2) Epoch 41, batch 2850, loss[loss=0.2214, ctc_loss=0.1019, cr_loss=0.3253, attn_decoder_loss=0.2274, over 29504.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1121, cr_loss=0.352, attn_decoder_loss=0.2392, over 5761712.31 frames. 
], batch size: 77, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:02:48,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=735400.0, ans=0.025 2024-09-19 20:02:50,642 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.779e+01 8.742e+01 9.309e+01 1.007e+02 1.847e+02, threshold=1.862e+02, percent-clipped=1.0 2024-09-19 20:03:04,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=735440.0, ans=0.125 2024-09-19 20:03:25,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=735520.0, ans=0.125 2024-09-19 20:03:53,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=15.0 2024-09-19 20:03:53,522 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.07 vs. limit=15.0 2024-09-19 20:03:54,114 INFO [train.py:1198] (0/2) Epoch 41, batch 2900, loss[loss=0.2282, ctc_loss=0.1077, cr_loss=0.3408, attn_decoder_loss=0.2341, over 29423.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1131, cr_loss=0.3545, attn_decoder_loss=0.2405, over 5787572.51 frames. ], batch size: 79, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:04:05,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.39 vs. 
limit=15.0 2024-09-19 20:04:22,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=735640.0, ans=0.125 2024-09-19 20:04:46,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=735720.0, ans=0.125 2024-09-19 20:05:01,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=735760.0, ans=0.025 2024-09-19 20:05:12,053 INFO [train.py:1198] (0/2) Epoch 41, batch 2950, loss[loss=0.231, ctc_loss=0.113, cr_loss=0.3633, attn_decoder_loss=0.2361, over 29522.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1124, cr_loss=0.3527, attn_decoder_loss=0.2394, over 5781734.07 frames. ], batch size: 75, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:05:15,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=735800.0, ans=0.125 2024-09-19 20:05:24,172 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.387e+01 8.869e+01 9.638e+01 2.369e+02, threshold=1.774e+02, percent-clipped=2.0 2024-09-19 20:05:30,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=735840.0, ans=0.0 2024-09-19 20:05:48,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=735880.0, ans=0.0 2024-09-19 20:05:48,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=735880.0, ans=0.125 2024-09-19 20:06:02,917 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.52 vs. 
limit=22.5 2024-09-19 20:06:11,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=735960.0, ans=0.125 2024-09-19 20:06:28,787 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-184000.pt 2024-09-19 20:06:37,443 INFO [train.py:1198] (0/2) Epoch 41, batch 3000, loss[loss=0.2456, ctc_loss=0.1209, cr_loss=0.3703, attn_decoder_loss=0.2513, over 29754.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1128, cr_loss=0.3536, attn_decoder_loss=0.2394, over 5781709.65 frames. ], batch size: 81, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:06:37,443 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 20:06:55,725 INFO [train.py:1230] (0/2) Epoch 41, validation: loss=0.2123, ctc_loss=0.03697, cr_loss=6.466e-15, attn_decoder_loss=0.2318, over 944034.00 frames. 2024-09-19 20:06:55,725 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 20:07:12,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=736040.0, ans=0.125 2024-09-19 20:07:26,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=736080.0, ans=0.125 2024-09-19 20:07:32,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=736080.0, ans=0.125 2024-09-19 20:07:38,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=736080.0, ans=0.0 2024-09-19 20:07:40,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=736120.0, ans=0.07 2024-09-19 20:07:52,042 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=736120.0, ans=0.07 2024-09-19 20:08:13,492 INFO [train.py:1198] (0/2) Epoch 41, batch 3050, loss[loss=0.2354, ctc_loss=0.1199, cr_loss=0.3596, attn_decoder_loss=0.2403, over 29509.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1134, cr_loss=0.3551, attn_decoder_loss=0.2401, over 5775946.74 frames. ], batch size: 76, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:08:14,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.30 vs. limit=22.5 2024-09-19 20:08:25,659 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.668e+01 9.193e+01 9.788e+01 2.004e+02, threshold=1.839e+02, percent-clipped=1.0 2024-09-19 20:08:34,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=736240.0, ans=0.125 2024-09-19 20:08:38,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=736240.0, ans=0.125 2024-09-19 20:09:23,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5 2024-09-19 20:09:28,914 INFO [train.py:1198] (0/2) Epoch 41, batch 3100, loss[loss=0.2426, ctc_loss=0.1186, cr_loss=0.366, attn_decoder_loss=0.2483, over 29273.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.113, cr_loss=0.3539, attn_decoder_loss=0.2396, over 5775907.98 frames. 
], batch size: 100, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:09:42,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=736440.0, ans=0.1 2024-09-19 20:09:45,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=736440.0, ans=0.0 2024-09-19 20:09:53,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=736440.0, ans=0.2 2024-09-19 20:10:11,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736480.0, ans=0.1 2024-09-19 20:10:25,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=736520.0, ans=0.125 2024-09-19 20:10:31,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=736560.0, ans=0.1 2024-09-19 20:10:46,493 INFO [train.py:1198] (0/2) Epoch 41, batch 3150, loss[loss=0.237, ctc_loss=0.1126, cr_loss=0.3494, attn_decoder_loss=0.243, over 28926.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1131, cr_loss=0.3541, attn_decoder_loss=0.2398, over 5782412.71 frames. 
], batch size: 104, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:10:58,490 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.553e+01 9.133e+01 9.719e+01 1.833e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-19 20:10:58,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736600.0, ans=0.1 2024-09-19 20:11:03,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=736640.0, ans=0.125 2024-09-19 20:11:12,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=736640.0, ans=0.125 2024-09-19 20:11:27,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=736680.0, ans=0.0 2024-09-19 20:11:49,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=736760.0, ans=0.125 2024-09-19 20:12:04,119 INFO [train.py:1198] (0/2) Epoch 41, batch 3200, loss[loss=0.2304, ctc_loss=0.1043, cr_loss=0.3269, attn_decoder_loss=0.2371, over 29432.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1125, cr_loss=0.3532, attn_decoder_loss=0.2392, over 5793452.91 frames. 
], batch size: 79, lr: 2.68e-03, grad_scale: 32.0 2024-09-19 20:12:07,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736800.0, ans=0.1 2024-09-19 20:12:21,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=736840.0, ans=0.125 2024-09-19 20:12:22,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=736840.0, ans=0.125 2024-09-19 20:12:25,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=736840.0, ans=0.125 2024-09-19 20:12:52,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.43 vs. limit=12.0 2024-09-19 20:12:57,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=736920.0, ans=0.125 2024-09-19 20:13:20,182 INFO [train.py:1198] (0/2) Epoch 41, batch 3250, loss[loss=0.232, ctc_loss=0.1125, cr_loss=0.3434, attn_decoder_loss=0.2377, over 29692.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1128, cr_loss=0.3538, attn_decoder_loss=0.2398, over 5799548.25 frames. 
], batch size: 84, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:13:33,816 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.531e+01 9.147e+01 9.717e+01 1.259e+02, threshold=1.829e+02, percent-clipped=0.0 2024-09-19 20:13:40,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=737040.0, ans=0.2 2024-09-19 20:13:53,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=737080.0, ans=0.0 2024-09-19 20:14:09,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=737120.0, ans=0.125 2024-09-19 20:14:09,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=737120.0, ans=0.125 2024-09-19 20:14:27,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=737160.0, ans=0.125 2024-09-19 20:14:37,474 INFO [train.py:1198] (0/2) Epoch 41, batch 3300, loss[loss=0.242, ctc_loss=0.1084, cr_loss=0.3539, attn_decoder_loss=0.249, over 28614.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1118, cr_loss=0.3516, attn_decoder_loss=0.2385, over 5797120.50 frames. 
], batch size: 112, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:14:44,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=737200.0, ans=0.125 2024-09-19 20:15:07,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=737280.0, ans=0.125 2024-09-19 20:15:52,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=737360.0, ans=0.0 2024-09-19 20:15:52,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.39 vs. limit=10.0 2024-09-19 20:15:53,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=737400.0, ans=0.125 2024-09-19 20:15:54,696 INFO [train.py:1198] (0/2) Epoch 41, batch 3350, loss[loss=0.2383, ctc_loss=0.1129, cr_loss=0.3516, attn_decoder_loss=0.2444, over 28886.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1125, cr_loss=0.3529, attn_decoder_loss=0.2391, over 5773908.28 frames. 
], batch size: 104, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:15:59,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=737400.0, ans=0.0 2024-09-19 20:15:59,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=737400.0, ans=0.2 2024-09-19 20:16:08,347 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.656e+01 9.093e+01 9.789e+01 1.911e+02, threshold=1.819e+02, percent-clipped=2.0 2024-09-19 20:16:35,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=737480.0, ans=0.125 2024-09-19 20:16:46,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=12.0 2024-09-19 20:17:06,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.39 vs. limit=15.0 2024-09-19 20:17:10,491 INFO [train.py:1198] (0/2) Epoch 41, batch 3400, loss[loss=0.2086, ctc_loss=0.09939, cr_loss=0.3146, attn_decoder_loss=0.2137, over 29358.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1125, cr_loss=0.3527, attn_decoder_loss=0.239, over 5766010.78 frames. 
], batch size: 67, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:17:15,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=737600.0, ans=0.0 2024-09-19 20:17:15,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737600.0, ans=0.1 2024-09-19 20:17:19,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=737600.0, ans=0.0 2024-09-19 20:17:56,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.71 vs. limit=15.0 2024-09-19 20:17:56,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=737720.0, ans=0.125 2024-09-19 20:18:21,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=737760.0, ans=0.125 2024-09-19 20:18:25,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=12.0 2024-09-19 20:18:26,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=737800.0, ans=10.0 2024-09-19 20:18:28,097 INFO [train.py:1198] (0/2) Epoch 41, batch 3450, loss[loss=0.2445, ctc_loss=0.107, cr_loss=0.3405, attn_decoder_loss=0.2522, over 28187.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1124, cr_loss=0.3526, attn_decoder_loss=0.2392, over 5775145.21 frames. 
], batch size: 111, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:18:28,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=737800.0, ans=0.07 2024-09-19 20:18:41,841 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.606e+01 8.497e+01 9.130e+01 9.574e+01 2.613e+02, threshold=1.826e+02, percent-clipped=1.0 2024-09-19 20:18:46,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=737840.0, ans=0.125 2024-09-19 20:18:58,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737880.0, ans=0.1 2024-09-19 20:19:20,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=15.0 2024-09-19 20:19:27,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.64 vs. limit=22.5 2024-09-19 20:19:33,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=737960.0, ans=0.125 2024-09-19 20:19:43,464 INFO [train.py:1198] (0/2) Epoch 41, batch 3500, loss[loss=0.2161, ctc_loss=0.102, cr_loss=0.3541, attn_decoder_loss=0.2209, over 29335.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1122, cr_loss=0.3524, attn_decoder_loss=0.2388, over 5777714.49 frames. 
], batch size: 71, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:19:50,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=738000.0, ans=0.2 2024-09-19 20:20:00,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=738040.0, ans=0.0 2024-09-19 20:20:17,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_positive, batch_count=738080.0, ans=0.05 2024-09-19 20:20:28,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2024-09-19 20:20:35,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.71 vs. limit=15.0 2024-09-19 20:20:42,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738120.0, ans=0.1 2024-09-19 20:20:46,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=738160.0, ans=0.125 2024-09-19 20:20:49,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=738160.0, ans=0.0 2024-09-19 20:20:51,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=738160.0, ans=0.0 2024-09-19 20:20:59,911 INFO [train.py:1198] (0/2) Epoch 41, batch 3550, loss[loss=0.2389, ctc_loss=0.1091, cr_loss=0.3343, attn_decoder_loss=0.2459, over 29708.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1117, cr_loss=0.3513, attn_decoder_loss=0.2384, over 5784314.26 frames. 
], batch size: 89, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:21:03,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=738200.0, ans=0.025 2024-09-19 20:21:04,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=738200.0, ans=0.05 2024-09-19 20:21:14,692 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 8.523e+01 8.996e+01 9.507e+01 2.339e+02, threshold=1.799e+02, percent-clipped=2.0 2024-09-19 20:21:28,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=738280.0, ans=0.0 2024-09-19 20:21:34,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.40 vs. limit=22.5 2024-09-19 20:21:47,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=738320.0, ans=0.125 2024-09-19 20:22:03,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738360.0, ans=0.1 2024-09-19 20:22:14,208 INFO [train.py:1198] (0/2) Epoch 41, batch 3600, loss[loss=0.228, ctc_loss=0.1123, cr_loss=0.3573, attn_decoder_loss=0.2329, over 29518.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1116, cr_loss=0.3513, attn_decoder_loss=0.2386, over 5792162.69 frames. ], batch size: 77, lr: 2.68e-03, grad_scale: 16.0 2024-09-19 20:22:33,362 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.97 vs. 
limit=10.0 2024-09-19 20:22:33,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=738440.0, ans=0.0 2024-09-19 20:22:50,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=738480.0, ans=0.0 2024-09-19 20:22:55,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=738480.0, ans=0.1 2024-09-19 20:23:07,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=738520.0, ans=0.125 2024-09-19 20:23:17,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=738560.0, ans=0.125 2024-09-19 20:23:30,391 INFO [train.py:1198] (0/2) Epoch 41, batch 3650, loss[loss=0.2548, ctc_loss=0.135, cr_loss=0.4031, attn_decoder_loss=0.2591, over 29507.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1118, cr_loss=0.3516, attn_decoder_loss=0.2382, over 5794309.16 frames. ], batch size: 90, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:23:44,809 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0 2024-09-19 20:23:46,680 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.451e+01 9.065e+01 9.454e+01 1.125e+02, threshold=1.813e+02, percent-clipped=0.0 2024-09-19 20:23:49,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.54 vs. limit=15.0 2024-09-19 20:23:58,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.32 vs. 
limit=15.0 2024-09-19 20:24:16,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=738720.0, ans=0.125 2024-09-19 20:24:16,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=738720.0, ans=0.125 2024-09-19 20:24:17,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0 2024-09-19 20:24:22,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738720.0, ans=0.1 2024-09-19 20:24:31,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=738760.0, ans=0.0 2024-09-19 20:24:43,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=738800.0, ans=0.0 2024-09-19 20:24:44,919 INFO [train.py:1198] (0/2) Epoch 41, batch 3700, loss[loss=0.2328, ctc_loss=0.11, cr_loss=0.3408, attn_decoder_loss=0.2388, over 29698.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.112, cr_loss=0.3521, attn_decoder_loss=0.2385, over 5804851.69 frames. ], batch size: 84, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:24:48,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=738800.0, ans=0.0 2024-09-19 20:24:48,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.39 vs. 
limit=22.5 2024-09-19 20:25:53,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=738960.0, ans=0.0 2024-09-19 20:25:58,933 INFO [train.py:1198] (0/2) Epoch 41, batch 3750, loss[loss=0.2126, ctc_loss=0.1055, cr_loss=0.3478, attn_decoder_loss=0.2168, over 29353.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1118, cr_loss=0.3519, attn_decoder_loss=0.2382, over 5808283.97 frames. ], batch size: 67, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:26:11,533 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:26:17,099 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.549e+01 9.026e+01 9.637e+01 1.696e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-19 20:26:33,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=739080.0, ans=0.0 2024-09-19 20:26:35,009 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:26:35,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=739080.0, ans=0.125 2024-09-19 20:26:44,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=739120.0, ans=0.0 2024-09-19 20:26:50,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.54 vs. 
limit=15.0 2024-09-19 20:26:51,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=739120.0, ans=0.125 2024-09-19 20:26:51,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=739120.0, ans=0.125 2024-09-19 20:26:53,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=739120.0, ans=0.05 2024-09-19 20:26:56,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739120.0, ans=0.1 2024-09-19 20:27:14,354 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:27:15,520 INFO [train.py:1198] (0/2) Epoch 41, batch 3800, loss[loss=0.236, ctc_loss=0.1047, cr_loss=0.3276, attn_decoder_loss=0.2433, over 29651.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1117, cr_loss=0.3512, attn_decoder_loss=0.2379, over 5797868.65 frames. ], batch size: 86, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:27:15,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=739200.0, ans=0.025 2024-09-19 20:27:17,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=739200.0, ans=0.0 2024-09-19 20:27:30,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=739240.0, ans=0.125 2024-09-19 20:27:41,222 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:27:53,836 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.54 vs. 
limit=15.0 2024-09-19 20:28:11,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=739320.0, ans=0.125 2024-09-19 20:28:17,695 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-09-19 20:28:29,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=739400.0, ans=0.2 2024-09-19 20:28:30,237 INFO [train.py:1198] (0/2) Epoch 41, batch 3850, loss[loss=0.2334, ctc_loss=0.1038, cr_loss=0.3285, attn_decoder_loss=0.2405, over 29202.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1114, cr_loss=0.3507, attn_decoder_loss=0.2378, over 5809614.31 frames. ], batch size: 100, lr: 2.68e-03, grad_scale: 8.0 2024-09-19 20:28:37,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=739400.0, ans=0.2 2024-09-19 20:28:45,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.74 vs. 
limit=12.0 2024-09-19 20:28:47,850 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.446e+01 9.109e+01 9.536e+01 1.999e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-19 20:28:49,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=739440.0, ans=0.0 2024-09-19 20:28:51,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=739440.0, ans=0.025 2024-09-19 20:28:57,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=739440.0, ans=0.125 2024-09-19 20:29:03,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=739480.0, ans=0.2 2024-09-19 20:29:04,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=739480.0, ans=0.125 2024-09-19 20:29:08,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=739480.0, ans=0.125 2024-09-19 20:29:15,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=739520.0, ans=0.125 2024-09-19 20:29:46,225 INFO [train.py:1198] (0/2) Epoch 41, batch 3900, loss[loss=0.2392, ctc_loss=0.1087, cr_loss=0.3513, attn_decoder_loss=0.2458, over 29641.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1114, cr_loss=0.3507, attn_decoder_loss=0.2382, over 5814794.75 frames. 
], batch size: 86, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:29:46,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=739600.0, ans=0.2 2024-09-19 20:29:52,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=739600.0, ans=0.0 2024-09-19 20:30:13,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=15.0 2024-09-19 20:30:33,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=739720.0, ans=0.125 2024-09-19 20:30:46,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=739760.0, ans=0.0 2024-09-19 20:30:49,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2024-09-19 20:30:55,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=739760.0, ans=0.07 2024-09-19 20:30:57,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=739760.0, ans=0.0 2024-09-19 20:30:59,903 INFO [train.py:1198] (0/2) Epoch 41, batch 3950, loss[loss=0.2506, ctc_loss=0.1262, cr_loss=0.383, attn_decoder_loss=0.256, over 29454.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1113, cr_loss=0.3504, attn_decoder_loss=0.2381, over 5834225.96 frames. 
], batch size: 97, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:31:16,084 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.615e+01 9.061e+01 9.543e+01 2.103e+02, threshold=1.812e+02, percent-clipped=1.0 2024-09-19 20:31:20,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=739840.0, ans=0.125 2024-09-19 20:31:45,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739920.0, ans=0.1 2024-09-19 20:31:51,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=739920.0, ans=0.125 2024-09-19 20:32:08,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=739960.0, ans=0.0 2024-09-19 20:32:12,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=739960.0, ans=0.0 2024-09-19 20:32:15,341 INFO [train.py:1198] (0/2) Epoch 41, batch 4000, loss[loss=0.2185, ctc_loss=0.09627, cr_loss=0.3143, attn_decoder_loss=0.2251, over 29495.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1114, cr_loss=0.3505, attn_decoder_loss=0.2383, over 5810132.87 frames. 
], batch size: 74, lr: 2.67e-03, grad_scale: 16.0 2024-09-19 20:32:21,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=740000.0, ans=0.0 2024-09-19 20:32:22,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=740000.0, ans=0.125 2024-09-19 20:32:27,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=740000.0, ans=0.125 2024-09-19 20:32:37,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=740040.0, ans=0.125 2024-09-19 20:32:42,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=740040.0, ans=0.0 2024-09-19 20:33:01,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=740120.0, ans=0.0 2024-09-19 20:33:01,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740120.0, ans=0.1 2024-09-19 20:33:02,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.68 vs. limit=15.0 2024-09-19 20:33:08,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=740120.0, ans=0.0 2024-09-19 20:33:22,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=740160.0, ans=0.0 2024-09-19 20:33:30,563 INFO [train.py:1198] (0/2) Epoch 41, batch 4050, loss[loss=0.2544, ctc_loss=0.1372, cr_loss=0.399, attn_decoder_loss=0.2585, over 20028.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1113, cr_loss=0.3504, attn_decoder_loss=0.2382, over 5795065.28 frames. 
], batch size: 209, lr: 2.67e-03, grad_scale: 16.0 2024-09-19 20:33:35,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. limit=12.0 2024-09-19 20:33:44,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=740240.0, ans=0.125 2024-09-19 20:33:46,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740240.0, ans=0.1 2024-09-19 20:33:48,071 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.566e+01 9.117e+01 9.789e+01 2.862e+02, threshold=1.823e+02, percent-clipped=4.0 2024-09-19 20:33:52,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=740240.0, ans=0.09899494936611666 2024-09-19 20:34:04,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=740280.0, ans=0.025 2024-09-19 20:34:17,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.22 vs. 
limit=22.5 2024-09-19 20:34:27,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=740360.0, ans=0.2 2024-09-19 20:34:30,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=740360.0, ans=0.125 2024-09-19 20:34:37,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=740360.0, ans=0.2 2024-09-19 20:34:42,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=740400.0, ans=0.125 2024-09-19 20:34:43,681 INFO [train.py:1198] (0/2) Epoch 41, batch 4100, loss[loss=0.2426, ctc_loss=0.1137, cr_loss=0.3564, attn_decoder_loss=0.249, over 29477.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1117, cr_loss=0.351, attn_decoder_loss=0.2388, over 5790568.40 frames. ], batch size: 90, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:34:54,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.51 vs. limit=15.0 2024-09-19 20:35:16,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=740480.0, ans=0.0 2024-09-19 20:35:25,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=740480.0, ans=6.0 2024-09-19 20:35:50,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=740560.0, ans=0.1 2024-09-19 20:35:52,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.41 vs. 
limit=10.0 2024-09-19 20:35:53,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=740560.0, ans=0.025 2024-09-19 20:35:57,578 INFO [train.py:1198] (0/2) Epoch 41, batch 4150, loss[loss=0.2226, ctc_loss=0.1061, cr_loss=0.3337, attn_decoder_loss=0.2281, over 29525.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1115, cr_loss=0.3502, attn_decoder_loss=0.2382, over 5796506.38 frames. ], batch size: 77, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:36:02,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=740600.0, ans=0.0 2024-09-19 20:36:16,236 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.325e+01 8.604e+01 9.031e+01 9.625e+01 1.845e+02, threshold=1.806e+02, percent-clipped=1.0 2024-09-19 20:36:20,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=740640.0, ans=0.2 2024-09-19 20:36:30,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=740680.0, ans=0.0 2024-09-19 20:36:37,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.29 vs. limit=22.5 2024-09-19 20:37:04,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=740760.0, ans=0.0 2024-09-19 20:37:12,297 INFO [train.py:1198] (0/2) Epoch 41, batch 4200, loss[loss=0.2513, ctc_loss=0.134, cr_loss=0.4121, attn_decoder_loss=0.2552, over 29520.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1124, cr_loss=0.352, attn_decoder_loss=0.2389, over 5798180.06 frames. 
], batch size: 90, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:37:30,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=740840.0, ans=0.0 2024-09-19 20:37:33,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=740840.0, ans=0.125 2024-09-19 20:37:34,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=740840.0, ans=0.0 2024-09-19 20:37:37,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=740840.0, ans=0.125 2024-09-19 20:37:37,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=740840.0, ans=0.125 2024-09-19 20:38:26,690 INFO [train.py:1198] (0/2) Epoch 41, batch 4250, loss[loss=0.2273, ctc_loss=0.1036, cr_loss=0.3332, attn_decoder_loss=0.2337, over 29523.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.112, cr_loss=0.351, attn_decoder_loss=0.2388, over 5804132.99 frames. ], batch size: 74, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:38:37,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=741000.0, ans=0.2 2024-09-19 20:38:44,065 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 8.665e+01 9.196e+01 9.683e+01 5.015e+02, threshold=1.839e+02, percent-clipped=1.0 2024-09-19 20:38:46,695 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.34 vs. 
limit=15.0 2024-09-19 20:38:54,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=741080.0, ans=0.0 2024-09-19 20:38:57,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=741080.0, ans=0.0 2024-09-19 20:39:05,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=741080.0, ans=0.125 2024-09-19 20:39:15,808 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2024-09-19 20:39:20,575 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.42 vs. limit=22.5 2024-09-19 20:39:40,138 INFO [train.py:1198] (0/2) Epoch 41, batch 4300, loss[loss=0.238, ctc_loss=0.1151, cr_loss=0.3399, attn_decoder_loss=0.2441, over 29512.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1121, cr_loss=0.3512, attn_decoder_loss=0.239, over 5792352.73 frames. ], batch size: 87, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:39:49,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.45 vs. 
limit=15.0 2024-09-19 20:39:57,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=741240.0, ans=0.125 2024-09-19 20:40:12,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=741280.0, ans=0.125 2024-09-19 20:40:16,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=741280.0, ans=0.125 2024-09-19 20:40:22,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=741280.0, ans=0.025 2024-09-19 20:40:33,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=741320.0, ans=0.125 2024-09-19 20:40:55,598 INFO [train.py:1198] (0/2) Epoch 41, batch 4350, loss[loss=0.2515, ctc_loss=0.124, cr_loss=0.3899, attn_decoder_loss=0.257, over 29519.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1146, cr_loss=0.357, attn_decoder_loss=0.2422, over 5795744.81 frames. ], batch size: 97, lr: 2.67e-03, grad_scale: 8.0 2024-09-19 20:40:58,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.47 vs. 
limit=15.0
2024-09-19 20:41:06,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=741400.0, ans=0.0
2024-09-19 20:41:13,107 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.893e+01 9.255e+01 9.747e+01 1.701e+02, threshold=1.851e+02, percent-clipped=0.0
2024-09-19 20:41:36,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=741480.0, ans=0.125
2024-09-19 20:41:52,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=741560.0, ans=0.07
2024-09-19 20:41:59,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741560.0, ans=0.1
2024-09-19 20:42:08,702 INFO [train.py:1198] (0/2) Epoch 41, batch 4400, loss[loss=0.2443, ctc_loss=0.1307, cr_loss=0.3968, attn_decoder_loss=0.2481, over 27330.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1156, cr_loss=0.3596, attn_decoder_loss=0.2441, over 5767103.52 frames. ], batch size: 124, lr: 2.67e-03, grad_scale: 16.0
2024-09-19 20:42:13,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=741600.0, ans=0.125
2024-09-19 20:42:27,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=741640.0, ans=0.125
2024-09-19 20:42:40,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=741680.0, ans=0.09899494936611666
2024-09-19 20:43:10,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=741760.0, ans=0.125
2024-09-19 20:43:16,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=741760.0, ans=0.2
2024-09-19 20:43:22,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=741800.0, ans=0.025
2024-09-19 20:43:23,298 INFO [train.py:1198] (0/2) Epoch 41, batch 4450, loss[loss=0.2496, ctc_loss=0.1312, cr_loss=0.3865, attn_decoder_loss=0.2541, over 20660.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1188, cr_loss=0.3649, attn_decoder_loss=0.2461, over 5579401.19 frames. ], batch size: 209, lr: 2.67e-03, grad_scale: 16.0
2024-09-19 20:43:37,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=741840.0, ans=0.2
2024-09-19 20:43:41,189 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.090e+01 9.304e+01 9.971e+01 1.121e+02 2.265e+02, threshold=1.994e+02, percent-clipped=2.0
2024-09-19 20:44:38,327 INFO [train.py:1198] (0/2) Epoch 41, batch 4500, loss[loss=0.249, ctc_loss=0.134, cr_loss=0.368, attn_decoder_loss=0.2536, over 20785.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1215, cr_loss=0.3661, attn_decoder_loss=0.2476, over 5239476.48 frames. ], batch size: 210, lr: 2.67e-03, grad_scale: 8.0
2024-09-19 20:44:59,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=742040.0, ans=0.125
2024-09-19 20:45:15,160 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-41.pt
2024-09-19 20:46:06,158 INFO [train.py:1198] (0/2) Epoch 42, batch 0, loss[loss=0.2098, ctc_loss=0.09304, cr_loss=0.304, attn_decoder_loss=0.216, over 29618.00 frames. ], tot_loss[loss=0.2098, ctc_loss=0.09304, cr_loss=0.304, attn_decoder_loss=0.216, over 29618.00 frames. ], batch size: 73, lr: 2.64e-03, grad_scale: 16.0
2024-09-19 20:46:06,159 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 20:46:24,583 INFO [train.py:1230] (0/2) Epoch 42, validation: loss=0.2127, ctc_loss=0.03579, cr_loss=6.428e-15, attn_decoder_loss=0.2324, over 944034.00 frames.
2024-09-19 20:46:24,584 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-19 20:46:29,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=742100.0, ans=0.0
2024-09-19 20:46:32,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=742100.0, ans=0.125
2024-09-19 20:46:36,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=742100.0, ans=0.2
2024-09-19 20:46:55,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.43 vs. limit=22.5
2024-09-19 20:47:10,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=742220.0, ans=0.125
2024-09-19 20:47:19,979 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=22.5
2024-09-19 20:47:21,374 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0
2024-09-19 20:47:21,856 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 9.381e+01 1.084e+02 1.178e+02 1.554e+02, threshold=2.167e+02, percent-clipped=0.0
2024-09-19 20:47:25,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.27 vs. limit=15.0
2024-09-19 20:47:42,178 INFO [train.py:1198] (0/2) Epoch 42, batch 50, loss[loss=0.2102, ctc_loss=0.09328, cr_loss=0.3127, attn_decoder_loss=0.2162, over 29452.00 frames. ], tot_loss[loss=0.2345, ctc_loss=0.1139, cr_loss=0.3564, attn_decoder_loss=0.24, over 1267510.91 frames. ], batch size: 70, lr: 2.64e-03, grad_scale: 16.0
2024-09-19 20:47:50,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=742300.0, ans=0.125
2024-09-19 20:48:13,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.63 vs. limit=15.0
2024-09-19 20:48:30,110 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 20:48:59,798 INFO [train.py:1198] (0/2) Epoch 42, batch 100, loss[loss=0.2223, ctc_loss=0.1037, cr_loss=0.3385, attn_decoder_loss=0.2279, over 29533.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1148, cr_loss=0.3578, attn_decoder_loss=0.2418, over 2251441.59 frames. ], batch size: 76, lr: 2.64e-03, grad_scale: 16.0
2024-09-19 20:49:03,458 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0
2024-09-19 20:49:10,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=742500.0, ans=0.0
2024-09-19 20:49:20,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=742540.0, ans=0.0
2024-09-19 20:49:28,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=742580.0, ans=0.2
2024-09-19 20:49:30,361 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.64 vs. limit=22.5
2024-09-19 20:49:56,419 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 8.687e+01 8.987e+01 9.639e+01 1.254e+02, threshold=1.797e+02, percent-clipped=0.0
2024-09-19 20:50:09,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=742660.0, ans=0.0
2024-09-19 20:50:14,292 INFO [train.py:1198] (0/2) Epoch 42, batch 150, loss[loss=0.2129, ctc_loss=0.0977, cr_loss=0.3264, attn_decoder_loss=0.2185, over 29403.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1131, cr_loss=0.3548, attn_decoder_loss=0.24, over 3045153.91 frames. ], batch size: 70, lr: 2.64e-03, grad_scale: 16.0
2024-09-19 20:50:14,689 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-19 20:50:20,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.71 vs. limit=10.0
2024-09-19 20:50:38,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=742740.0, ans=0.125
2024-09-19 20:50:44,617 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 20:50:47,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=742780.0, ans=0.125
2024-09-19 20:50:57,132 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.31 vs. limit=10.0
2024-09-19 20:50:59,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=742820.0, ans=0.125
2024-09-19 20:51:05,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=742820.0, ans=0.1
2024-09-19 20:51:06,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=742820.0, ans=0.0
2024-09-19 20:51:21,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=742860.0, ans=0.0
2024-09-19 20:51:31,401 INFO [train.py:1198] (0/2) Epoch 42, batch 200, loss[loss=0.2486, ctc_loss=0.1238, cr_loss=0.3801, attn_decoder_loss=0.2541, over 27508.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1118, cr_loss=0.3515, attn_decoder_loss=0.2383, over 3657236.37 frames. ], batch size: 124, lr: 2.64e-03, grad_scale: 16.0
2024-09-19 20:51:57,628 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.81 vs. limit=12.0
2024-09-19 20:52:15,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.72 vs. limit=15.0
2024-09-19 20:52:31,003 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 8.542e+01 9.078e+01 9.443e+01 1.255e+02, threshold=1.816e+02, percent-clipped=0.0
2024-09-19 20:52:31,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=743020.0, ans=0.0
2024-09-19 20:52:49,282 INFO [train.py:1198] (0/2) Epoch 42, batch 250, loss[loss=0.2424, ctc_loss=0.1104, cr_loss=0.3401, attn_decoder_loss=0.2495, over 29306.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1117, cr_loss=0.3513, attn_decoder_loss=0.2382, over 4139721.23 frames. ], batch size: 100, lr: 2.64e-03, grad_scale: 16.0
2024-09-19 20:52:51,681 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.90 vs. limit=15.0
2024-09-19 20:53:13,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=743140.0, ans=10.0
2024-09-19 20:53:29,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=743180.0, ans=0.0
2024-09-19 20:53:46,096 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.33 vs. limit=22.5
2024-09-19 20:53:47,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=743220.0, ans=0.125
2024-09-19 20:53:53,081 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 20:54:04,815 INFO [train.py:1198] (0/2) Epoch 42, batch 300, loss[loss=0.2483, ctc_loss=0.1226, cr_loss=0.3916, attn_decoder_loss=0.2536, over 29519.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1114, cr_loss=0.3512, attn_decoder_loss=0.2382, over 4509588.99 frames. ], batch size: 92, lr: 2.64e-03, grad_scale: 16.0
2024-09-19 20:54:05,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=743300.0, ans=0.0
2024-09-19 20:54:06,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743300.0, ans=0.1
2024-09-19 20:54:27,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=743340.0, ans=0.125
2024-09-19 20:54:43,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.59 vs. limit=15.0
2024-09-19 20:54:44,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=743380.0, ans=0.07
2024-09-19 20:54:49,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.62 vs. limit=15.0
2024-09-19 20:54:58,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=743420.0, ans=0.0
2024-09-19 20:55:03,817 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.584e+01 8.625e+01 9.047e+01 9.646e+01 1.583e+02, threshold=1.809e+02, percent-clipped=0.0
2024-09-19 20:55:07,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=743460.0, ans=0.025
2024-09-19 20:55:17,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=743460.0, ans=0.2
2024-09-19 20:55:22,738 INFO [train.py:1198] (0/2) Epoch 42, batch 350, loss[loss=0.2083, ctc_loss=0.08809, cr_loss=0.3052, attn_decoder_loss=0.2149, over 29319.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1118, cr_loss=0.3521, attn_decoder_loss=0.2385, over 4795361.75 frames. ], batch size: 71, lr: 2.64e-03, grad_scale: 8.0
2024-09-19 20:55:23,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=743500.0, ans=0.125
2024-09-19 20:55:36,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=743540.0, ans=0.09899494936611666
2024-09-19 20:55:58,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=743580.0, ans=0.125
2024-09-19 20:56:14,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=743620.0, ans=0.125
2024-09-19 20:56:17,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=743620.0, ans=0.0
2024-09-19 20:56:19,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0
2024-09-19 20:56:32,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=743660.0, ans=0.2
2024-09-19 20:56:40,157 INFO [train.py:1198] (0/2) Epoch 42, batch 400, loss[loss=0.2411, ctc_loss=0.1157, cr_loss=0.3618, attn_decoder_loss=0.247, over 29710.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1114, cr_loss=0.3516, attn_decoder_loss=0.2382, over 5025712.49 frames. ], batch size: 82, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 20:56:42,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0
2024-09-19 20:56:47,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0
2024-09-19 20:57:00,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=743740.0, ans=0.0
2024-09-19 20:57:39,493 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.837e+01 8.484e+01 8.956e+01 9.498e+01 1.659e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-19 20:57:51,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.30 vs. limit=15.0
2024-09-19 20:57:56,177 INFO [train.py:1198] (0/2) Epoch 42, batch 450, loss[loss=0.2375, ctc_loss=0.1116, cr_loss=0.361, attn_decoder_loss=0.2435, over 29688.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1115, cr_loss=0.3512, attn_decoder_loss=0.2383, over 5185851.18 frames. ], batch size: 83, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 20:58:09,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=743940.0, ans=0.025
2024-09-19 20:58:33,981 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0
2024-09-19 20:58:34,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=743980.0, ans=0.025
2024-09-19 20:58:56,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0
2024-09-19 20:58:58,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744060.0, ans=0.1
2024-09-19 20:59:12,136 INFO [train.py:1198] (0/2) Epoch 42, batch 500, loss[loss=0.2548, ctc_loss=0.1315, cr_loss=0.3931, attn_decoder_loss=0.2598, over 29441.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1111, cr_loss=0.3504, attn_decoder_loss=0.2375, over 5330611.37 frames. ], batch size: 94, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 20:59:15,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=744100.0, ans=0.125
2024-09-19 20:59:20,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744100.0, ans=0.125
2024-09-19 20:59:27,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0
2024-09-19 20:59:43,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=744180.0, ans=0.025
2024-09-19 20:59:44,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0
2024-09-19 20:59:49,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=744180.0, ans=0.125
2024-09-19 20:59:57,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744180.0, ans=0.1
2024-09-19 21:00:01,025 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:00:07,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744220.0, ans=0.1
2024-09-19 21:00:15,762 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.359e+01 8.854e+01 9.452e+01 4.385e+02, threshold=1.771e+02, percent-clipped=2.0
2024-09-19 21:00:23,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=744260.0, ans=0.125
2024-09-19 21:00:24,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=744260.0, ans=0.0
2024-09-19 21:00:25,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.68 vs. limit=15.0
2024-09-19 21:00:32,320 INFO [train.py:1198] (0/2) Epoch 42, batch 550, loss[loss=0.2495, ctc_loss=0.1187, cr_loss=0.3615, attn_decoder_loss=0.256, over 28899.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1112, cr_loss=0.3508, attn_decoder_loss=0.2377, over 5423246.40 frames. ], batch size: 104, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 21:00:34,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744300.0, ans=0.1
2024-09-19 21:00:35,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=744300.0, ans=0.015
2024-09-19 21:00:43,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=744300.0, ans=0.2
2024-09-19 21:00:53,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=744340.0, ans=0.05
2024-09-19 21:00:56,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=744340.0, ans=0.125
2024-09-19 21:00:58,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=744340.0, ans=0.1
2024-09-19 21:01:20,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=744420.0, ans=0.1
2024-09-19 21:01:24,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.45 vs. limit=15.0
2024-09-19 21:01:26,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=15.0
2024-09-19 21:01:42,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=744460.0, ans=0.2
2024-09-19 21:01:47,810 INFO [train.py:1198] (0/2) Epoch 42, batch 600, loss[loss=0.2522, ctc_loss=0.1275, cr_loss=0.384, attn_decoder_loss=0.2575, over 29263.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1113, cr_loss=0.3511, attn_decoder_loss=0.2381, over 5510875.27 frames. ], batch size: 100, lr: 2.63e-03, grad_scale: 16.0
2024-09-19 21:02:13,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0
2024-09-19 21:02:15,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=744540.0, ans=0.1
2024-09-19 21:02:22,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=744580.0, ans=0.1
2024-09-19 21:02:47,683 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.400e+01 8.795e+01 9.486e+01 1.602e+02, threshold=1.759e+02, percent-clipped=0.0
2024-09-19 21:03:02,674 INFO [train.py:1198] (0/2) Epoch 42, batch 650, loss[loss=0.2315, ctc_loss=0.1135, cr_loss=0.359, attn_decoder_loss=0.2366, over 29757.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1107, cr_loss=0.3499, attn_decoder_loss=0.2374, over 5587567.90 frames. ], batch size: 81, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:03:10,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=744700.0, ans=0.125
2024-09-19 21:03:36,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=744780.0, ans=0.025
2024-09-19 21:03:41,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=744780.0, ans=0.125
2024-09-19 21:03:44,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744780.0, ans=0.1
2024-09-19 21:03:46,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=744780.0, ans=0.025
2024-09-19 21:03:47,927 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0
2024-09-19 21:03:53,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=744820.0, ans=0.025
2024-09-19 21:04:09,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=744860.0, ans=0.025
2024-09-19 21:04:22,721 INFO [train.py:1198] (0/2) Epoch 42, batch 700, loss[loss=0.2179, ctc_loss=0.09945, cr_loss=0.3067, attn_decoder_loss=0.2243, over 29528.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1107, cr_loss=0.3499, attn_decoder_loss=0.2378, over 5638279.80 frames. ], batch size: 76, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:04:29,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0
2024-09-19 21:04:32,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0
2024-09-19 21:04:54,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=744980.0, ans=0.125
2024-09-19 21:05:23,216 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.486e+01 9.011e+01 9.700e+01 3.654e+02, threshold=1.802e+02, percent-clipped=4.0
2024-09-19 21:05:23,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0
2024-09-19 21:05:38,323 INFO [train.py:1198] (0/2) Epoch 42, batch 750, loss[loss=0.2422, ctc_loss=0.1126, cr_loss=0.3744, attn_decoder_loss=0.2483, over 29680.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1105, cr_loss=0.3495, attn_decoder_loss=0.2376, over 5676198.42 frames. ], batch size: 82, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:05:43,693 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2024-09-19 21:05:44,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=745100.0, ans=15.0
2024-09-19 21:06:05,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=12.0
2024-09-19 21:06:39,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=745260.0, ans=0.025
2024-09-19 21:06:53,497 INFO [train.py:1198] (0/2) Epoch 42, batch 800, loss[loss=0.2099, ctc_loss=0.0875, cr_loss=0.2931, attn_decoder_loss=0.217, over 29599.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1108, cr_loss=0.3504, attn_decoder_loss=0.2378, over 5707084.56 frames. ], batch size: 73, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:07:02,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=745300.0, ans=10.0
2024-09-19 21:07:05,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=745300.0, ans=0.1
2024-09-19 21:07:25,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745380.0, ans=0.1
2024-09-19 21:07:26,631 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5
2024-09-19 21:07:59,498 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.594e+01 9.081e+01 9.628e+01 1.457e+02, threshold=1.816e+02, percent-clipped=0.0
2024-09-19 21:08:07,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=745460.0, ans=0.07
2024-09-19 21:08:12,821 INFO [train.py:1198] (0/2) Epoch 42, batch 850, loss[loss=0.2547, ctc_loss=0.1244, cr_loss=0.388, attn_decoder_loss=0.2605, over 29717.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1106, cr_loss=0.3497, attn_decoder_loss=0.2377, over 5735788.04 frames. ], batch size: 89, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:08:14,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=745500.0, ans=0.125
2024-09-19 21:08:20,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=745500.0, ans=0.025
2024-09-19 21:08:33,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=745540.0, ans=0.0
2024-09-19 21:08:34,308 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0
2024-09-19 21:08:39,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=745540.0, ans=0.0
2024-09-19 21:08:41,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=745580.0, ans=0.125
2024-09-19 21:08:51,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=745580.0, ans=0.1
2024-09-19 21:08:51,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=745580.0, ans=0.0
2024-09-19 21:09:28,713 INFO [train.py:1198] (0/2) Epoch 42, batch 900, loss[loss=0.222, ctc_loss=0.1062, cr_loss=0.3506, attn_decoder_loss=0.2271, over 29632.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.111, cr_loss=0.3505, attn_decoder_loss=0.238, over 5740733.25 frames. ], batch size: 73, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:09:29,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=745700.0, ans=0.09899494936611666
2024-09-19 21:09:29,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=745700.0, ans=0.0
2024-09-19 21:09:33,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=745700.0, ans=0.0
2024-09-19 21:09:33,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=745700.0, ans=0.125
2024-09-19 21:09:45,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=745740.0, ans=0.0
2024-09-19 21:10:02,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=745780.0, ans=0.125
2024-09-19 21:10:14,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745820.0, ans=0.1
2024-09-19 21:10:17,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=745820.0, ans=0.07
2024-09-19 21:10:21,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=745820.0, ans=0.125
2024-09-19 21:10:23,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=745820.0, ans=0.125
2024-09-19 21:10:30,383 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.573e+01 9.060e+01 9.874e+01 1.680e+02, threshold=1.812e+02, percent-clipped=0.0
2024-09-19 21:10:39,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=745860.0, ans=0.125
2024-09-19 21:10:43,724 INFO [train.py:1198] (0/2) Epoch 42, batch 950, loss[loss=0.2184, ctc_loss=0.09903, cr_loss=0.3335, attn_decoder_loss=0.2242, over 29503.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1106, cr_loss=0.3494, attn_decoder_loss=0.2379, over 5744311.20 frames. ], batch size: 74, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:10:44,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=745900.0, ans=0.0
2024-09-19 21:11:17,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=745980.0, ans=0.125
2024-09-19 21:11:39,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0
2024-09-19 21:12:03,071 INFO [train.py:1198] (0/2) Epoch 42, batch 1000, loss[loss=0.2372, ctc_loss=0.1209, cr_loss=0.3881, attn_decoder_loss=0.2415, over 29516.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1112, cr_loss=0.3506, attn_decoder_loss=0.2385, over 5737780.72 frames. ], batch size: 77, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:12:27,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=746140.0, ans=0.125
2024-09-19 21:12:36,133 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.81 vs. limit=10.0
2024-09-19 21:13:05,356 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 8.540e+01 9.060e+01 9.719e+01 2.106e+02, threshold=1.812e+02, percent-clipped=1.0
2024-09-19 21:13:11,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=746260.0, ans=0.125
2024-09-19 21:13:19,000 INFO [train.py:1198] (0/2) Epoch 42, batch 1050, loss[loss=0.2249, ctc_loss=0.1039, cr_loss=0.3351, attn_decoder_loss=0.2309, over 29654.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1107, cr_loss=0.3497, attn_decoder_loss=0.2375, over 5746351.38 frames. ], batch size: 85, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:13:23,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=746300.0, ans=0.125
2024-09-19 21:13:35,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746340.0, ans=0.1
2024-09-19 21:13:44,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=746340.0, ans=0.2
2024-09-19 21:14:23,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=746460.0, ans=0.125
2024-09-19 21:14:23,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5
2024-09-19 21:14:35,086 INFO [train.py:1198] (0/2) Epoch 42, batch 1100, loss[loss=0.2213, ctc_loss=0.1062, cr_loss=0.3405, attn_decoder_loss=0.2265, over 29457.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1106, cr_loss=0.3492, attn_decoder_loss=0.2375, over 5757143.64 frames. ], batch size: 78, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:14:46,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=746500.0, ans=0.125
2024-09-19 21:15:06,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=746580.0, ans=0.0
2024-09-19 21:15:15,923 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.95 vs. limit=10.0
2024-09-19 21:15:19,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. limit=10.0
2024-09-19 21:15:39,240 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 8.586e+01 9.042e+01 9.812e+01 2.400e+02, threshold=1.808e+02, percent-clipped=1.0
2024-09-19 21:15:39,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=746660.0, ans=0.125
2024-09-19 21:15:42,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=746660.0, ans=0.2
2024-09-19 21:15:50,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=746660.0, ans=0.0
2024-09-19 21:15:50,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=746660.0, ans=0.125
2024-09-19 21:15:55,103 INFO [train.py:1198] (0/2) Epoch 42, batch 1150, loss[loss=0.2283, ctc_loss=0.1162, cr_loss=0.3628, attn_decoder_loss=0.2327, over 29457.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1111, cr_loss=0.3501, attn_decoder_loss=0.2378, over 5756536.41 frames. ], batch size: 78, lr: 2.63e-03, grad_scale: 8.0
2024-09-19 21:15:59,268 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0
2024-09-19 21:15:59,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=746700.0, ans=0.2
2024-09-19 21:16:07,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=746700.0, ans=0.125
2024-09-19 21:16:21,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=746740.0, ans=0.025
2024-09-19 21:16:25,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0
2024-09-19 21:16:37,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=746780.0, ans=0.0
2024-09-19 21:16:40,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=746820.0, ans=0.0
2024-09-19 21:17:05,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=746860.0, ans=0.125
2024-09-19 21:17:10,789 INFO [train.py:1198] (0/2) Epoch 42, batch 1200, loss[loss=0.2386, ctc_loss=0.1135, cr_loss=0.3529, attn_decoder_loss=0.2447, over 29681.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1115, cr_loss=0.3511, attn_decoder_loss=0.2385, over 5749582.75 frames.
], batch size: 85, lr: 2.63e-03, grad_scale: 16.0 2024-09-19 21:17:12,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=746900.0, ans=0.125 2024-09-19 21:17:14,171 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:17:14,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=746900.0, ans=10.0 2024-09-19 21:17:18,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=746900.0, ans=0.125 2024-09-19 21:17:19,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=746900.0, ans=0.125 2024-09-19 21:17:29,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=746940.0, ans=0.0 2024-09-19 21:17:40,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=746980.0, ans=0.0 2024-09-19 21:18:08,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=747020.0, ans=0.0 2024-09-19 21:18:13,019 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.675e+01 9.072e+01 9.806e+01 1.661e+02, threshold=1.814e+02, percent-clipped=1.0 2024-09-19 21:18:16,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=747060.0, ans=0.2 2024-09-19 21:18:26,691 INFO [train.py:1198] (0/2) Epoch 42, batch 1250, loss[loss=0.2529, ctc_loss=0.1277, cr_loss=0.3873, attn_decoder_loss=0.2582, over 29506.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.112, cr_loss=0.3526, attn_decoder_loss=0.2393, over 5776729.50 frames. 
], batch size: 92, lr: 2.63e-03, grad_scale: 16.0 2024-09-19 21:18:48,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=747140.0, ans=0.0 2024-09-19 21:19:02,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=747180.0, ans=0.125 2024-09-19 21:19:10,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=747180.0, ans=0.0 2024-09-19 21:19:15,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=747220.0, ans=0.0 2024-09-19 21:19:38,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=747260.0, ans=0.0 2024-09-19 21:19:47,438 INFO [train.py:1198] (0/2) Epoch 42, batch 1300, loss[loss=0.2365, ctc_loss=0.1106, cr_loss=0.3562, attn_decoder_loss=0.2426, over 28169.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1117, cr_loss=0.3522, attn_decoder_loss=0.2389, over 5780111.14 frames. ], batch size: 111, lr: 2.63e-03, grad_scale: 16.0 2024-09-19 21:20:50,662 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.538e+01 9.081e+01 9.476e+01 1.507e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-19 21:20:59,451 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2024-09-19 21:21:02,966 INFO [train.py:1198] (0/2) Epoch 42, batch 1350, loss[loss=0.2298, ctc_loss=0.1121, cr_loss=0.3579, attn_decoder_loss=0.235, over 29748.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1112, cr_loss=0.3514, attn_decoder_loss=0.2384, over 5795691.94 frames. 
], batch size: 81, lr: 2.63e-03, grad_scale: 8.0 2024-09-19 21:21:03,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.61 vs. limit=15.0 2024-09-19 21:21:14,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.16 vs. limit=12.0 2024-09-19 21:21:16,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=747540.0, ans=0.125 2024-09-19 21:21:46,221 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:21:46,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=747620.0, ans=0.125 2024-09-19 21:21:56,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=747620.0, ans=0.125 2024-09-19 21:22:16,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=747700.0, ans=0.0 2024-09-19 21:22:17,780 INFO [train.py:1198] (0/2) Epoch 42, batch 1400, loss[loss=0.2021, ctc_loss=0.08377, cr_loss=0.2874, attn_decoder_loss=0.2089, over 29587.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1112, cr_loss=0.3513, attn_decoder_loss=0.2384, over 5806863.25 frames. 
], batch size: 69, lr: 2.63e-03, grad_scale: 8.0 2024-09-19 21:22:19,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=747700.0, ans=0.2 2024-09-19 21:22:45,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=747740.0, ans=0.2 2024-09-19 21:22:50,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=747780.0, ans=0.1 2024-09-19 21:23:03,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.45 vs. limit=22.5 2024-09-19 21:23:14,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=747820.0, ans=0.125 2024-09-19 21:23:16,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=747820.0, ans=0.2 2024-09-19 21:23:23,217 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.442e+01 9.058e+01 9.585e+01 2.575e+02, threshold=1.812e+02, percent-clipped=1.0 2024-09-19 21:23:35,307 INFO [train.py:1198] (0/2) Epoch 42, batch 1450, loss[loss=0.2524, ctc_loss=0.1289, cr_loss=0.4019, attn_decoder_loss=0.2572, over 29440.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1114, cr_loss=0.3517, attn_decoder_loss=0.2387, over 5803137.06 frames. ], batch size: 94, lr: 2.63e-03, grad_scale: 8.0 2024-09-19 21:23:39,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.62 vs. 
limit=22.5 2024-09-19 21:23:52,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=747940.0, ans=10.0 2024-09-19 21:23:56,055 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:24:05,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=747940.0, ans=0.0 2024-09-19 21:24:07,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=747980.0, ans=0.125 2024-09-19 21:24:12,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747980.0, ans=0.1 2024-09-19 21:24:12,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=747980.0, ans=0.025 2024-09-19 21:24:17,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=747980.0, ans=0.0 2024-09-19 21:24:18,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=747980.0, ans=0.0 2024-09-19 21:24:48,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=748060.0, ans=0.125 2024-09-19 21:24:53,283 INFO [train.py:1198] (0/2) Epoch 42, batch 1500, loss[loss=0.2412, ctc_loss=0.1082, cr_loss=0.335, attn_decoder_loss=0.2485, over 29649.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1115, cr_loss=0.3515, attn_decoder_loss=0.2391, over 5804244.94 frames. 
], batch size: 86, lr: 2.63e-03, grad_scale: 8.0 2024-09-19 21:25:57,691 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.590e+01 8.992e+01 9.499e+01 3.130e+02, threshold=1.798e+02, percent-clipped=2.0 2024-09-19 21:26:00,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.13 vs. limit=15.0 2024-09-19 21:26:09,801 INFO [train.py:1198] (0/2) Epoch 42, batch 1550, loss[loss=0.251, ctc_loss=0.1291, cr_loss=0.4052, attn_decoder_loss=0.2556, over 29475.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1118, cr_loss=0.3516, attn_decoder_loss=0.2392, over 5780668.75 frames. ], batch size: 90, lr: 2.63e-03, grad_scale: 8.0 2024-09-19 21:26:16,846 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=12.0 2024-09-19 21:26:19,017 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:26:25,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=748340.0, ans=0.125 2024-09-19 21:26:34,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=748340.0, ans=0.0 2024-09-19 21:26:37,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=748340.0, ans=0.125 2024-09-19 21:26:42,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=748380.0, ans=0.125 2024-09-19 21:26:45,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748380.0, ans=0.1 2024-09-19 21:27:09,355 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=748420.0, ans=0.125 2024-09-19 21:27:22,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=748460.0, ans=0.025 2024-09-19 21:27:26,805 INFO [train.py:1198] (0/2) Epoch 42, batch 1600, loss[loss=0.2404, ctc_loss=0.1137, cr_loss=0.3666, attn_decoder_loss=0.2463, over 29688.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1116, cr_loss=0.3509, attn_decoder_loss=0.2389, over 5763146.36 frames. ], batch size: 85, lr: 2.63e-03, grad_scale: 16.0 2024-09-19 21:27:27,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=748500.0, ans=0.125 2024-09-19 21:27:28,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=748500.0, ans=0.0 2024-09-19 21:27:50,110 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:27:51,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748540.0, ans=0.1 2024-09-19 21:27:52,024 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-09-19 21:28:08,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.66 vs. 
limit=15.0 2024-09-19 21:28:15,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=748620.0, ans=0.125 2024-09-19 21:28:23,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=748620.0, ans=0.125 2024-09-19 21:28:32,061 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.535e+01 9.042e+01 9.603e+01 1.807e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-19 21:28:43,819 INFO [train.py:1198] (0/2) Epoch 42, batch 1650, loss[loss=0.2301, ctc_loss=0.09968, cr_loss=0.3373, attn_decoder_loss=0.2371, over 29712.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1112, cr_loss=0.35, attn_decoder_loss=0.2385, over 5759972.55 frames. ], batch size: 89, lr: 2.63e-03, grad_scale: 16.0 2024-09-19 21:29:04,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=748740.0, ans=0.125 2024-09-19 21:29:13,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=748780.0, ans=0.125 2024-09-19 21:29:20,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=748780.0, ans=0.0 2024-09-19 21:29:47,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=748860.0, ans=0.0 2024-09-19 21:29:50,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.68 vs. limit=22.5 2024-09-19 21:29:51,259 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.82 vs. 
limit=22.5 2024-09-19 21:29:51,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=748860.0, ans=0.0 2024-09-19 21:29:54,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2024-09-19 21:29:59,147 INFO [train.py:1198] (0/2) Epoch 42, batch 1700, loss[loss=0.2115, ctc_loss=0.1016, cr_loss=0.3279, attn_decoder_loss=0.2164, over 29563.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1109, cr_loss=0.3497, attn_decoder_loss=0.2382, over 5781003.62 frames. ], batch size: 69, lr: 2.63e-03, grad_scale: 16.0 2024-09-19 21:29:59,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=748900.0, ans=0.2 2024-09-19 21:31:04,408 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.510e+01 9.136e+01 9.466e+01 1.659e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-19 21:31:16,664 INFO [train.py:1198] (0/2) Epoch 42, batch 1750, loss[loss=0.21, ctc_loss=0.09229, cr_loss=0.3041, attn_decoder_loss=0.2164, over 29334.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1107, cr_loss=0.3494, attn_decoder_loss=0.2378, over 5788122.25 frames. ], batch size: 67, lr: 2.63e-03, grad_scale: 16.0 2024-09-19 21:31:30,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=749140.0, ans=0.1 2024-09-19 21:31:32,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=749140.0, ans=0.1 2024-09-19 21:31:40,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. 
limit=15.0 2024-09-19 21:31:44,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=749140.0, ans=0.125 2024-09-19 21:31:52,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=749180.0, ans=0.125 2024-09-19 21:31:54,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=749180.0, ans=0.125 2024-09-19 21:32:00,092 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=749180.0, ans=0.2 2024-09-19 21:32:09,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=749220.0, ans=0.125 2024-09-19 21:32:16,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=749220.0, ans=0.125 2024-09-19 21:32:27,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0 2024-09-19 21:32:34,212 INFO [train.py:1198] (0/2) Epoch 42, batch 1800, loss[loss=0.2357, ctc_loss=0.1138, cr_loss=0.3623, attn_decoder_loss=0.2412, over 29672.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1109, cr_loss=0.3496, attn_decoder_loss=0.2379, over 5791465.46 frames. 
], batch size: 83, lr: 2.62e-03, grad_scale: 8.0 2024-09-19 21:32:34,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=749300.0, ans=0.0 2024-09-19 21:33:13,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=749380.0, ans=0.1 2024-09-19 21:33:20,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=749420.0, ans=0.02 2024-09-19 21:33:35,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=749460.0, ans=0.0 2024-09-19 21:33:36,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=749460.0, ans=0.125 2024-09-19 21:33:39,223 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.440e+01 8.862e+01 9.428e+01 1.419e+02, threshold=1.772e+02, percent-clipped=0.0 2024-09-19 21:33:47,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5 2024-09-19 21:33:49,906 INFO [train.py:1198] (0/2) Epoch 42, batch 1850, loss[loss=0.2486, ctc_loss=0.1166, cr_loss=0.3579, attn_decoder_loss=0.2553, over 29643.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1107, cr_loss=0.3493, attn_decoder_loss=0.2378, over 5796765.49 frames. ], batch size: 86, lr: 2.62e-03, grad_scale: 8.0 2024-09-19 21:33:50,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.84 vs. 
limit=15.0 2024-09-19 21:34:23,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=749580.0, ans=0.125 2024-09-19 21:34:35,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.01 vs. limit=22.5 2024-09-19 21:34:38,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.57 vs. limit=10.0 2024-09-19 21:34:45,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=749620.0, ans=0.0 2024-09-19 21:35:01,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=749660.0, ans=0.125 2024-09-19 21:35:07,214 INFO [train.py:1198] (0/2) Epoch 42, batch 1900, loss[loss=0.2425, ctc_loss=0.1143, cr_loss=0.3543, attn_decoder_loss=0.2488, over 29686.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1112, cr_loss=0.3501, attn_decoder_loss=0.2384, over 5804347.85 frames. ], batch size: 89, lr: 2.62e-03, grad_scale: 8.0 2024-09-19 21:35:07,674 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:35:13,671 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:35:14,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=749700.0, ans=0.125 2024-09-19 21:35:49,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. 
limit=15.0 2024-09-19 21:35:57,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs. limit=6.0 2024-09-19 21:35:59,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=749820.0, ans=0.125 2024-09-19 21:36:14,505 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.827e+01 8.670e+01 9.049e+01 9.659e+01 1.303e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-19 21:36:22,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=749860.0, ans=0.125 2024-09-19 21:36:24,972 INFO [train.py:1198] (0/2) Epoch 42, batch 1950, loss[loss=0.2319, ctc_loss=0.1102, cr_loss=0.3595, attn_decoder_loss=0.2374, over 29445.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1118, cr_loss=0.3522, attn_decoder_loss=0.2396, over 5819765.80 frames. 
], batch size: 78, lr: 2.62e-03, grad_scale: 8.0 2024-09-19 21:36:28,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749900.0, ans=0.1 2024-09-19 21:36:40,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=749940.0, ans=0.025 2024-09-19 21:36:49,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=749940.0, ans=0.95 2024-09-19 21:36:58,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=749980.0, ans=0.0 2024-09-19 21:36:58,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=749980.0, ans=0.125 2024-09-19 21:37:08,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=750020.0, ans=0.0 2024-09-19 21:37:40,250 INFO [train.py:1198] (0/2) Epoch 42, batch 2000, loss[loss=0.204, ctc_loss=0.08689, cr_loss=0.2763, attn_decoder_loss=0.2108, over 29326.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1125, cr_loss=0.353, attn_decoder_loss=0.2399, over 5796244.49 frames. ], batch size: 67, lr: 2.62e-03, grad_scale: 16.0 2024-09-19 21:38:04,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=750140.0, ans=0.125 2024-09-19 21:38:15,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.99 vs. 
limit=10.0 2024-09-19 21:38:25,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=750180.0, ans=0.125 2024-09-19 21:38:36,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0 2024-09-19 21:38:39,258 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2024-09-19 21:38:47,687 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.819e+01 8.670e+01 9.136e+01 9.850e+01 1.573e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-19 21:38:58,270 INFO [train.py:1198] (0/2) Epoch 42, batch 2050, loss[loss=0.2182, ctc_loss=0.1036, cr_loss=0.328, attn_decoder_loss=0.2237, over 29473.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1119, cr_loss=0.3518, attn_decoder_loss=0.2389, over 5787086.27 frames. ], batch size: 70, lr: 2.62e-03, grad_scale: 16.0 2024-09-19 21:39:19,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=750340.0, ans=0.0 2024-09-19 21:39:50,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=8.0 2024-09-19 21:39:59,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=750460.0, ans=0.125 2024-09-19 21:40:16,258 INFO [train.py:1198] (0/2) Epoch 42, batch 2100, loss[loss=0.2282, ctc_loss=0.1096, cr_loss=0.358, attn_decoder_loss=0.2334, over 29772.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1115, cr_loss=0.351, attn_decoder_loss=0.2384, over 5799344.87 frames. 
], batch size: 81, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:40:18,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0
2024-09-19 21:40:39,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=750540.0, ans=0.1
2024-09-19 21:41:17,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=750660.0, ans=0.125
2024-09-19 21:41:20,506 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.604e+01 9.019e+01 9.390e+01 1.185e+02, threshold=1.804e+02, percent-clipped=0.0
2024-09-19 21:41:20,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=750660.0, ans=0.125
2024-09-19 21:41:22,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=750660.0, ans=0.2
2024-09-19 21:41:31,113 INFO [train.py:1198] (0/2) Epoch 42, batch 2150, loss[loss=0.2189, ctc_loss=0.1024, cr_loss=0.3288, attn_decoder_loss=0.2245, over 29460.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1106, cr_loss=0.3494, attn_decoder_loss=0.2376, over 5814307.82 frames. ], batch size: 78, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:41:57,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=750740.0, ans=0.0
2024-09-19 21:42:11,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750780.0, ans=0.1
2024-09-19 21:42:23,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.28 vs. limit=12.0
2024-09-19 21:42:41,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=750860.0, ans=0.125
2024-09-19 21:42:48,194 INFO [train.py:1198] (0/2) Epoch 42, batch 2200, loss[loss=0.2432, ctc_loss=0.118, cr_loss=0.3714, attn_decoder_loss=0.2488, over 29603.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1108, cr_loss=0.3497, attn_decoder_loss=0.2376, over 5810960.49 frames. ], batch size: 86, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:42:56,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=750900.0, ans=0.2
2024-09-19 21:43:06,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=750940.0, ans=0.125
2024-09-19 21:43:08,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=750940.0, ans=15.0
2024-09-19 21:43:09,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=750940.0, ans=0.125
2024-09-19 21:43:46,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=751020.0, ans=0.125
2024-09-19 21:43:55,415 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.242e+01 8.649e+01 8.991e+01 9.667e+01 4.201e+02, threshold=1.798e+02, percent-clipped=2.0
2024-09-19 21:44:01,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=751060.0, ans=0.0
2024-09-19 21:44:06,040 INFO [train.py:1198] (0/2) Epoch 42, batch 2250, loss[loss=0.2317, ctc_loss=0.107, cr_loss=0.335, attn_decoder_loss=0.2381, over 29700.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1109, cr_loss=0.35, attn_decoder_loss=0.2379, over 5811247.25 frames. ], batch size: 82, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:44:39,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=751180.0, ans=0.125
2024-09-19 21:45:04,213 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=12.0
2024-09-19 21:45:09,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=751260.0, ans=0.125
2024-09-19 21:45:11,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=751260.0, ans=0.125
2024-09-19 21:45:21,542 INFO [train.py:1198] (0/2) Epoch 42, batch 2300, loss[loss=0.2105, ctc_loss=0.09396, cr_loss=0.3023, attn_decoder_loss=0.2167, over 29321.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1105, cr_loss=0.3486, attn_decoder_loss=0.2372, over 5797525.02 frames. ], batch size: 71, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:45:23,904 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0
2024-09-19 21:45:32,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751300.0, ans=0.1
2024-09-19 21:45:41,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=751340.0, ans=0.125
2024-09-19 21:46:28,380 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 8.459e+01 9.034e+01 9.702e+01 2.715e+02, threshold=1.807e+02, percent-clipped=2.0
2024-09-19 21:46:39,169 INFO [train.py:1198] (0/2) Epoch 42, batch 2350, loss[loss=0.2411, ctc_loss=0.1139, cr_loss=0.3556, attn_decoder_loss=0.2473, over 29684.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.111, cr_loss=0.3498, attn_decoder_loss=0.2375, over 5802071.55 frames. ], batch size: 83, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:46:55,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.74 vs. limit=10.0
2024-09-19 21:47:09,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=751580.0, ans=0.5
2024-09-19 21:47:14,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=751580.0, ans=0.125
2024-09-19 21:47:29,873 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:47:44,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=751660.0, ans=0.125
2024-09-19 21:47:56,843 INFO [train.py:1198] (0/2) Epoch 42, batch 2400, loss[loss=0.2237, ctc_loss=0.1045, cr_loss=0.337, attn_decoder_loss=0.2295, over 29526.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1107, cr_loss=0.3495, attn_decoder_loss=0.2376, over 5806722.72 frames. ], batch size: 76, lr: 2.62e-03, grad_scale: 32.0
2024-09-19 21:48:27,812 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.98 vs. limit=22.5
2024-09-19 21:49:02,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=751860.0, ans=0.0
2024-09-19 21:49:03,289 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.663e+01 9.186e+01 9.777e+01 4.524e+02, threshold=1.837e+02, percent-clipped=1.0
2024-09-19 21:49:12,375 INFO [train.py:1198] (0/2) Epoch 42, batch 2450, loss[loss=0.2321, ctc_loss=0.1093, cr_loss=0.3427, attn_decoder_loss=0.2381, over 29730.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1112, cr_loss=0.3506, attn_decoder_loss=0.2382, over 5784150.28 frames. ], batch size: 82, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:49:14,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.20 vs. limit=15.0
2024-09-19 21:49:19,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=751900.0, ans=0.125
2024-09-19 21:49:21,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=751900.0, ans=0.0
2024-09-19 21:49:51,019 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-188000.pt
2024-09-19 21:50:07,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=752020.0, ans=0.125
2024-09-19 21:50:09,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.05 vs. limit=10.0
2024-09-19 21:50:11,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=752020.0, ans=0.125
2024-09-19 21:50:19,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=752020.0, ans=0.125
2024-09-19 21:50:26,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.38 vs. limit=22.5
2024-09-19 21:50:31,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=752060.0, ans=0.2
2024-09-19 21:50:37,334 INFO [train.py:1198] (0/2) Epoch 42, batch 2500, loss[loss=0.2411, ctc_loss=0.1101, cr_loss=0.3404, attn_decoder_loss=0.2481, over 29646.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1113, cr_loss=0.3507, attn_decoder_loss=0.2382, over 5794787.95 frames. ], batch size: 86, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:50:40,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=752100.0, ans=0.125
2024-09-19 21:50:54,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752140.0, ans=0.1
2024-09-19 21:50:54,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=752140.0, ans=0.2
2024-09-19 21:51:03,361 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:51:04,018 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.28 vs. limit=15.0
2024-09-19 21:51:16,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=22.5
2024-09-19 21:51:29,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=752220.0, ans=0.125
2024-09-19 21:51:46,105 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 8.685e+01 9.215e+01 9.799e+01 2.260e+02, threshold=1.843e+02, percent-clipped=2.0
2024-09-19 21:51:47,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=752260.0, ans=0.125
2024-09-19 21:51:55,267 INFO [train.py:1198] (0/2) Epoch 42, batch 2550, loss[loss=0.2161, ctc_loss=0.1068, cr_loss=0.3292, attn_decoder_loss=0.2209, over 29396.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1115, cr_loss=0.351, attn_decoder_loss=0.2386, over 5797389.70 frames. ], batch size: 67, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:51:58,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=752300.0, ans=0.125
2024-09-19 21:52:03,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.45 vs. limit=15.0
2024-09-19 21:52:18,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.71 vs. limit=15.0
2024-09-19 21:52:31,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=752380.0, ans=0.025
2024-09-19 21:52:38,091 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.07 vs. limit=22.5
2024-09-19 21:52:45,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=752420.0, ans=0.125
2024-09-19 21:52:52,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=752420.0, ans=0.0
2024-09-19 21:53:01,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=752460.0, ans=0.125
2024-09-19 21:53:04,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0
2024-09-19 21:53:10,741 INFO [train.py:1198] (0/2) Epoch 42, batch 2600, loss[loss=0.2296, ctc_loss=0.1128, cr_loss=0.3579, attn_decoder_loss=0.2346, over 29446.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1115, cr_loss=0.3509, attn_decoder_loss=0.2389, over 5794840.18 frames. ], batch size: 78, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:53:11,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=752500.0, ans=0.2
2024-09-19 21:53:12,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=752500.0, ans=0.0
2024-09-19 21:53:13,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=752500.0, ans=0.125
2024-09-19 21:53:14,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=752500.0, ans=0.125
2024-09-19 21:53:35,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=752540.0, ans=0.125
2024-09-19 21:53:35,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=752540.0, ans=0.1
2024-09-19 21:53:43,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=752580.0, ans=0.125
2024-09-19 21:53:56,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=752620.0, ans=0.025
2024-09-19 21:54:10,251 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:54:10,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=752620.0, ans=0.125
2024-09-19 21:54:14,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=752660.0, ans=0.125
2024-09-19 21:54:18,979 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 8.622e+01 9.143e+01 9.724e+01 1.437e+02, threshold=1.829e+02, percent-clipped=0.0
2024-09-19 21:54:26,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=752700.0, ans=0.125
2024-09-19 21:54:27,800 INFO [train.py:1198] (0/2) Epoch 42, batch 2650, loss[loss=0.2486, ctc_loss=0.1187, cr_loss=0.3773, attn_decoder_loss=0.2546, over 29294.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1115, cr_loss=0.3512, attn_decoder_loss=0.239, over 5801135.67 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:54:34,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=752700.0, ans=0.035
2024-09-19 21:54:41,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=752740.0, ans=0.125
2024-09-19 21:54:58,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=752780.0, ans=0.125
2024-09-19 21:55:00,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=15.0
2024-09-19 21:55:04,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=752780.0, ans=0.0
2024-09-19 21:55:06,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=752780.0, ans=0.125
2024-09-19 21:55:18,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=752820.0, ans=0.125
2024-09-19 21:55:45,576 INFO [train.py:1198] (0/2) Epoch 42, batch 2700, loss[loss=0.2409, ctc_loss=0.1156, cr_loss=0.3702, attn_decoder_loss=0.2466, over 29525.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1121, cr_loss=0.3524, attn_decoder_loss=0.2396, over 5797041.63 frames. ], batch size: 87, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:56:08,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752940.0, ans=0.1
2024-09-19 21:56:08,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=752940.0, ans=0.0
2024-09-19 21:56:20,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752980.0, ans=0.1
2024-09-19 21:56:26,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=752980.0, ans=0.125
2024-09-19 21:56:48,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=753060.0, ans=0.125
2024-09-19 21:56:50,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=753060.0, ans=0.2
2024-09-19 21:56:51,999 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 8.707e+01 9.259e+01 9.781e+01 2.020e+02, threshold=1.852e+02, percent-clipped=1.0
2024-09-19 21:56:55,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.93 vs. limit=6.0
2024-09-19 21:57:01,154 INFO [train.py:1198] (0/2) Epoch 42, batch 2750, loss[loss=0.2331, ctc_loss=0.115, cr_loss=0.3678, attn_decoder_loss=0.2381, over 29524.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1113, cr_loss=0.3511, attn_decoder_loss=0.2385, over 5796874.40 frames. ], batch size: 75, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 21:57:03,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=753100.0, ans=0.1
2024-09-19 21:57:16,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0
2024-09-19 21:57:21,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0
2024-09-19 21:57:23,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=753140.0, ans=0.125
2024-09-19 21:57:25,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=12.0
2024-09-19 21:57:39,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=753180.0, ans=0.1
2024-09-19 21:57:40,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=15.0
2024-09-19 21:58:03,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=753260.0, ans=0.0
2024-09-19 21:58:05,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=753260.0, ans=0.0
2024-09-19 21:58:05,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=753260.0, ans=0.0
2024-09-19 21:58:18,578 INFO [train.py:1198] (0/2) Epoch 42, batch 2800, loss[loss=0.2636, ctc_loss=0.1503, cr_loss=0.4094, attn_decoder_loss=0.2671, over 19953.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1115, cr_loss=0.3513, attn_decoder_loss=0.2385, over 5778317.23 frames. ], batch size: 209, lr: 2.62e-03, grad_scale: 32.0
2024-09-19 21:58:26,722 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.59 vs. limit=15.0
2024-09-19 21:58:44,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=753340.0, ans=0.1
2024-09-19 21:58:47,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=753380.0, ans=0.1
2024-09-19 21:58:51,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=753380.0, ans=0.125
2024-09-19 21:59:05,848 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 21:59:11,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=753420.0, ans=0.035
2024-09-19 21:59:13,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=753420.0, ans=0.0
2024-09-19 21:59:15,736 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=15.0
2024-09-19 21:59:22,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=753460.0, ans=0.125
2024-09-19 21:59:29,300 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.790e+01 9.273e+01 9.887e+01 2.081e+02, threshold=1.855e+02, percent-clipped=1.0
2024-09-19 21:59:32,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=753460.0, ans=0.125
2024-09-19 21:59:34,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=753500.0, ans=0.125
2024-09-19 21:59:35,331 INFO [train.py:1198] (0/2) Epoch 42, batch 2850, loss[loss=0.2266, ctc_loss=0.1031, cr_loss=0.3346, attn_decoder_loss=0.2329, over 29516.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1117, cr_loss=0.3517, attn_decoder_loss=0.2389, over 5763642.41 frames. ], batch size: 77, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 21:59:47,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=753500.0, ans=0.125
2024-09-19 21:59:47,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=753500.0, ans=0.2
2024-09-19 21:59:49,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=753540.0, ans=0.0
2024-09-19 22:00:06,552 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0
2024-09-19 22:00:33,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=753620.0, ans=0.125
2024-09-19 22:00:37,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.76 vs. limit=15.0
2024-09-19 22:00:41,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=753660.0, ans=0.07
2024-09-19 22:00:46,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=753660.0, ans=0.1
2024-09-19 22:00:51,145 INFO [train.py:1198] (0/2) Epoch 42, batch 2900, loss[loss=0.2267, ctc_loss=0.1032, cr_loss=0.3418, attn_decoder_loss=0.2328, over 29425.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1125, cr_loss=0.3539, attn_decoder_loss=0.2401, over 5788719.71 frames. ], batch size: 79, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:01:16,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=753740.0, ans=0.2
2024-09-19 22:01:31,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=753780.0, ans=0.2
2024-09-19 22:01:43,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=753820.0, ans=0.125
2024-09-19 22:01:45,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=753820.0, ans=0.0
2024-09-19 22:01:46,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=753820.0, ans=0.0
2024-09-19 22:01:54,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0
2024-09-19 22:02:02,879 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.715e+01 9.227e+01 9.833e+01 2.599e+02, threshold=1.845e+02, percent-clipped=1.0
2024-09-19 22:02:08,881 INFO [train.py:1198] (0/2) Epoch 42, batch 2950, loss[loss=0.2166, ctc_loss=0.1042, cr_loss=0.3422, attn_decoder_loss=0.2215, over 29513.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1118, cr_loss=0.3522, attn_decoder_loss=0.2389, over 5783447.02 frames. ], batch size: 75, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:02:18,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=753900.0, ans=0.125
2024-09-19 22:02:33,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=753940.0, ans=0.1
2024-09-19 22:02:39,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=753980.0, ans=0.025
2024-09-19 22:02:55,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=754020.0, ans=0.125
2024-09-19 22:03:08,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0
2024-09-19 22:03:26,591 INFO [train.py:1198] (0/2) Epoch 42, batch 3000, loss[loss=0.2382, ctc_loss=0.117, cr_loss=0.3576, attn_decoder_loss=0.2438, over 29755.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1119, cr_loss=0.3523, attn_decoder_loss=0.2388, over 5783881.88 frames. ], batch size: 81, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:03:26,592 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-19 22:03:36,046 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2613, 3.9431, 3.7381, 3.3029], device='cuda:0')
2024-09-19 22:03:44,994 INFO [train.py:1230] (0/2) Epoch 42, validation: loss=0.212, ctc_loss=0.03659, cr_loss=6.044e-15, attn_decoder_loss=0.2315, over 944034.00 frames.
2024-09-19 22:03:44,995 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-19 22:04:18,002 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=22.5
2024-09-19 22:04:38,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=754220.0, ans=0.0
2024-09-19 22:04:40,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=754220.0, ans=0.05
2024-09-19 22:04:56,824 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 8.635e+01 9.210e+01 9.879e+01 1.269e+02, threshold=1.842e+02, percent-clipped=0.0
2024-09-19 22:05:03,034 INFO [train.py:1198] (0/2) Epoch 42, batch 3050, loss[loss=0.2186, ctc_loss=0.09853, cr_loss=0.3277, attn_decoder_loss=0.2246, over 29533.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1121, cr_loss=0.3527, attn_decoder_loss=0.2394, over 5777161.23 frames. ], batch size: 76, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:05:04,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=754300.0, ans=0.125
2024-09-19 22:05:19,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=754340.0, ans=0.125
2024-09-19 22:05:52,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0
2024-09-19 22:06:18,289 INFO [train.py:1198] (0/2) Epoch 42, batch 3100, loss[loss=0.2378, ctc_loss=0.1139, cr_loss=0.33, attn_decoder_loss=0.2442, over 29304.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1119, cr_loss=0.352, attn_decoder_loss=0.239, over 5776917.96 frames. ], batch size: 100, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:06:29,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.06 vs. limit=15.0
2024-09-19 22:07:05,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=754620.0, ans=0.0
2024-09-19 22:07:12,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=22.5
2024-09-19 22:07:16,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=754620.0, ans=0.125
2024-09-19 22:07:20,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=754660.0, ans=0.125
2024-09-19 22:07:30,014 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.575e+01 9.075e+01 9.708e+01 6.330e+02, threshold=1.815e+02, percent-clipped=2.0
2024-09-19 22:07:36,120 INFO [train.py:1198] (0/2) Epoch 42, batch 3150, loss[loss=0.2437, ctc_loss=0.118, cr_loss=0.3659, attn_decoder_loss=0.2496, over 28882.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1117, cr_loss=0.3515, attn_decoder_loss=0.2389, over 5783706.09 frames. ], batch size: 104, lr: 2.62e-03, grad_scale: 8.0
2024-09-19 22:07:49,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754740.0, ans=0.1
2024-09-19 22:08:03,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=754740.0, ans=0.0
2024-09-19 22:08:14,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=754780.0, ans=0.125
2024-09-19 22:08:14,742 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0
2024-09-19 22:08:26,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.93 vs. limit=22.5
2024-09-19 22:08:40,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=754860.0, ans=0.0
2024-09-19 22:08:46,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=754860.0, ans=0.0
2024-09-19 22:08:47,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=754860.0, ans=0.0
2024-09-19 22:08:52,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754900.0, ans=0.1
2024-09-19 22:08:53,271 INFO [train.py:1198] (0/2) Epoch 42, batch 3200, loss[loss=0.2227, ctc_loss=0.1084, cr_loss=0.3485, attn_decoder_loss=0.2276, over 29766.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1111, cr_loss=0.3502, attn_decoder_loss=0.238, over 5794924.87 frames. ], batch size: 80, lr: 2.62e-03, grad_scale: 16.0
2024-09-19 22:09:02,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=754900.0, ans=0.125
2024-09-19 22:09:11,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=754940.0, ans=0.125
2024-09-19 22:09:17,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=754940.0, ans=0.125
2024-09-19 22:09:32,208 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
2024-09-19 22:09:46,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=755020.0, ans=0.125
2024-09-19 22:10:03,487 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.581e+01 9.115e+01 9.616e+01 1.393e+02, threshold=1.823e+02, percent-clipped=0.0
2024-09-19 22:10:09,484 INFO [train.py:1198] (0/2) Epoch 42, batch 3250, loss[loss=0.2404, ctc_loss=0.1229, cr_loss=0.388, attn_decoder_loss=0.2448, over 29712.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1116, cr_loss=0.3514, attn_decoder_loss=0.2386, over 5801375.90 frames. ], batch size: 84, lr: 2.61e-03, grad_scale: 16.0
2024-09-19 22:10:12,219 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0
2024-09-19 22:10:17,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=755100.0, ans=0.125
2024-09-19 22:10:20,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=755100.0, ans=0.2
2024-09-19 22:10:24,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=755140.0, ans=0.07
2024-09-19 22:10:58,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=12.0
2024-09-19 22:11:06,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0
2024-09-19 22:11:14,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=755260.0, ans=0.2
2024-09-19 22:11:26,684 INFO [train.py:1198] (0/2) Epoch 42, batch 3300, loss[loss=0.2467, ctc_loss=0.1179, cr_loss=0.3521, attn_decoder_loss=0.2532, over 28235.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1107, cr_loss=0.3497, attn_decoder_loss=0.2375, over 5797790.37 frames. ], batch size: 111, lr: 2.61e-03, grad_scale: 8.0
2024-09-19 22:11:28,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=755300.0, ans=0.125
2024-09-19 22:11:54,767 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5
2024-09-19 22:12:11,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=22.5
2024-09-19 22:12:39,471 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 8.624e+01 9.226e+01 9.886e+01 3.496e+02, threshold=1.845e+02, percent-clipped=4.0
2024-09-19 22:12:39,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=755460.0, ans=0.125
2024-09-19 22:12:44,137 INFO [train.py:1198] (0/2) Epoch 42, batch 3350, loss[loss=0.2426, ctc_loss=0.1143, cr_loss=0.3629, attn_decoder_loss=0.2487, over 28854.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1114, cr_loss=0.3511, attn_decoder_loss=0.2383, over 5773960.32 frames. ], batch size: 104, lr: 2.61e-03, grad_scale: 8.0
2024-09-19 22:13:02,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=755540.0, ans=0.125
2024-09-19 22:13:07,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=22.5
2024-09-19 22:13:10,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=755540.0, ans=0.1
2024-09-19 22:13:29,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=755620.0, ans=0.2
2024-09-19 22:13:34,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=755620.0, ans=0.2
2024-09-19 22:13:40,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=755620.0, ans=0.0
2024-09-19 22:13:41,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=755620.0, ans=0.0
2024-09-19 22:13:49,968 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.68 vs. limit=15.0
2024-09-19 22:13:59,651 INFO [train.py:1198] (0/2) Epoch 42, batch 3400, loss[loss=0.2066, ctc_loss=0.09239, cr_loss=0.3126, attn_decoder_loss=0.2123, over 29343.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1117, cr_loss=0.3517, attn_decoder_loss=0.2384, over 5764968.46 frames.
], batch size: 67, lr: 2.61e-03, grad_scale: 8.0 2024-09-19 22:14:13,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=755740.0, ans=0.025 2024-09-19 22:14:49,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755820.0, ans=0.1 2024-09-19 22:14:49,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-09-19 22:14:50,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755820.0, ans=0.1 2024-09-19 22:15:10,682 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=4.87 vs. limit=15.0 2024-09-19 22:15:12,722 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.618e+01 8.954e+01 9.599e+01 1.831e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-19 22:15:17,198 INFO [train.py:1198] (0/2) Epoch 42, batch 3450, loss[loss=0.2434, ctc_loss=0.1251, cr_loss=0.3875, attn_decoder_loss=0.248, over 28502.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1119, cr_loss=0.3523, attn_decoder_loss=0.2386, over 5772882.16 frames. 
], batch size: 112, lr: 2.61e-03, grad_scale: 8.0 2024-09-19 22:15:25,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=755900.0, ans=0.0 2024-09-19 22:15:47,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=755980.0, ans=0.035 2024-09-19 22:15:50,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755980.0, ans=0.1 2024-09-19 22:15:58,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=755980.0, ans=0.0 2024-09-19 22:16:03,453 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.33 vs. limit=15.0 2024-09-19 22:16:06,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.66 vs. limit=15.0 2024-09-19 22:16:14,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=756020.0, ans=0.0 2024-09-19 22:16:21,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=756060.0, ans=0.025 2024-09-19 22:16:34,972 INFO [train.py:1198] (0/2) Epoch 42, batch 3500, loss[loss=0.2121, ctc_loss=0.09119, cr_loss=0.2943, attn_decoder_loss=0.219, over 29282.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1116, cr_loss=0.3517, attn_decoder_loss=0.2382, over 5775042.91 frames. 
], batch size: 71, lr: 2.61e-03, grad_scale: 8.0 2024-09-19 22:16:36,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=756100.0, ans=0.0 2024-09-19 22:16:36,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=756100.0, ans=0.0 2024-09-19 22:17:02,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0 2024-09-19 22:17:03,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=756180.0, ans=0.025 2024-09-19 22:17:03,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=756180.0, ans=0.2 2024-09-19 22:17:15,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=756180.0, ans=0.125 2024-09-19 22:17:26,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.09 vs. limit=15.0 2024-09-19 22:17:37,044 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.22 vs. limit=12.0 2024-09-19 22:17:43,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756260.0, ans=0.1 2024-09-19 22:17:44,786 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.622e+01 9.000e+01 9.662e+01 3.411e+02, threshold=1.800e+02, percent-clipped=1.0 2024-09-19 22:17:49,260 INFO [train.py:1198] (0/2) Epoch 42, batch 3550, loss[loss=0.2513, ctc_loss=0.1207, cr_loss=0.3608, attn_decoder_loss=0.2578, over 29704.00 frames. 
], tot_loss[loss=0.2328, ctc_loss=0.1116, cr_loss=0.3517, attn_decoder_loss=0.2384, over 5782461.41 frames. ], batch size: 89, lr: 2.61e-03, grad_scale: 8.0 2024-09-19 22:17:49,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=756300.0, ans=0.0 2024-09-19 22:18:02,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=756340.0, ans=0.125 2024-09-19 22:18:02,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=756340.0, ans=0.125 2024-09-19 22:18:07,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=756340.0, ans=0.0 2024-09-19 22:18:39,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=756420.0, ans=0.05 2024-09-19 22:18:51,892 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.90 vs. limit=10.0 2024-09-19 22:18:52,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756460.0, ans=0.1 2024-09-19 22:19:02,903 INFO [train.py:1198] (0/2) Epoch 42, batch 3600, loss[loss=0.2213, ctc_loss=0.1039, cr_loss=0.3269, attn_decoder_loss=0.2271, over 29525.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1116, cr_loss=0.352, attn_decoder_loss=0.2385, over 5792065.62 frames. 
], batch size: 77, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:19:21,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=756540.0, ans=0.09899494936611666 2024-09-19 22:19:22,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=756540.0, ans=0.125 2024-09-19 22:19:25,007 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.87 vs. limit=15.0 2024-09-19 22:19:27,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=756540.0, ans=0.125 2024-09-19 22:19:32,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2024-09-19 22:19:38,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=756580.0, ans=0.0 2024-09-19 22:19:44,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2024-09-19 22:19:57,594 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2024-09-19 22:20:04,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756660.0, ans=0.1 2024-09-19 22:20:05,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=756660.0, ans=0.0 2024-09-19 22:20:14,584 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.526e+01 8.930e+01 9.587e+01 1.613e+02, threshold=1.786e+02, percent-clipped=0.0 2024-09-19 22:20:19,051 INFO [train.py:1198] (0/2) Epoch 42, batch 3650, loss[loss=0.2365, ctc_loss=0.1246, cr_loss=0.381, attn_decoder_loss=0.2404, over 29489.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.111, cr_loss=0.3507, attn_decoder_loss=0.2379, over 5793452.74 frames. ], batch size: 90, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:20:34,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=756740.0, ans=0.125 2024-09-19 22:20:46,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=756740.0, ans=0.125 2024-09-19 22:21:11,454 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.61 vs. limit=15.0 2024-09-19 22:21:19,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=756860.0, ans=0.07 2024-09-19 22:21:20,494 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.97 vs. 
limit=5.0 2024-09-19 22:21:24,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=756860.0, ans=15.0 2024-09-19 22:21:28,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=756860.0, ans=0.125 2024-09-19 22:21:33,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0 2024-09-19 22:21:34,296 INFO [train.py:1198] (0/2) Epoch 42, batch 3700, loss[loss=0.2398, ctc_loss=0.1102, cr_loss=0.35, attn_decoder_loss=0.2464, over 29715.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1108, cr_loss=0.3502, attn_decoder_loss=0.2381, over 5803661.14 frames. ], batch size: 84, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:21:44,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756900.0, ans=0.1 2024-09-19 22:21:50,319 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2024-09-19 22:22:16,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=756980.0, ans=0.125 2024-09-19 22:22:27,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.30 vs. 
limit=15.0 2024-09-19 22:22:33,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=757020.0, ans=0.0 2024-09-19 22:22:46,054 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.506e+01 9.125e+01 9.829e+01 2.175e+02, threshold=1.825e+02, percent-clipped=1.0 2024-09-19 22:22:50,495 INFO [train.py:1198] (0/2) Epoch 42, batch 3750, loss[loss=0.2055, ctc_loss=0.09267, cr_loss=0.3168, attn_decoder_loss=0.211, over 29308.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1102, cr_loss=0.349, attn_decoder_loss=0.2376, over 5806050.15 frames. ], batch size: 67, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:23:11,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=757140.0, ans=0.125 2024-09-19 22:23:24,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=757180.0, ans=0.125 2024-09-19 22:23:26,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=757180.0, ans=0.125 2024-09-19 22:23:38,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=757220.0, ans=10.0 2024-09-19 22:23:45,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=757220.0, ans=0.0 2024-09-19 22:24:04,581 INFO [train.py:1198] (0/2) Epoch 42, batch 3800, loss[loss=0.2303, ctc_loss=0.09818, cr_loss=0.3156, attn_decoder_loss=0.238, over 29621.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1101, cr_loss=0.3485, attn_decoder_loss=0.2374, over 5797246.97 frames. 
], batch size: 86, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:24:10,018 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2024-09-19 22:24:11,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.58 vs. limit=22.5 2024-09-19 22:24:16,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=757300.0, ans=0.125 2024-09-19 22:24:33,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=757380.0, ans=0.125 2024-09-19 22:24:39,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.78 vs. limit=22.5 2024-09-19 22:25:13,885 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.649e+01 9.029e+01 9.772e+01 5.131e+02, threshold=1.806e+02, percent-clipped=1.0 2024-09-19 22:25:16,217 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.10 vs. limit=22.5 2024-09-19 22:25:17,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=757500.0, ans=0.125 2024-09-19 22:25:18,240 INFO [train.py:1198] (0/2) Epoch 42, batch 3850, loss[loss=0.2599, ctc_loss=0.132, cr_loss=0.3847, attn_decoder_loss=0.2655, over 29292.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1103, cr_loss=0.3491, attn_decoder_loss=0.2376, over 5810881.80 frames. 
], batch size: 100, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:25:20,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=757500.0, ans=0.0 2024-09-19 22:25:20,501 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.53 vs. limit=15.0 2024-09-19 22:25:33,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=757540.0, ans=0.125 2024-09-19 22:25:39,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=757540.0, ans=0.125 2024-09-19 22:25:46,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=757580.0, ans=0.1 2024-09-19 22:25:47,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=757580.0, ans=0.0 2024-09-19 22:25:55,960 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:26:06,767 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.00 vs. limit=10.0 2024-09-19 22:26:33,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=757700.0, ans=15.0 2024-09-19 22:26:33,899 INFO [train.py:1198] (0/2) Epoch 42, batch 3900, loss[loss=0.2468, ctc_loss=0.1165, cr_loss=0.3668, attn_decoder_loss=0.2531, over 29630.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1107, cr_loss=0.3501, attn_decoder_loss=0.238, over 5815839.82 frames. 
], batch size: 86, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:26:37,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=757700.0, ans=0.0 2024-09-19 22:26:38,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=757700.0, ans=0.125 2024-09-19 22:26:38,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=757700.0, ans=0.0 2024-09-19 22:26:40,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=757700.0, ans=0.125 2024-09-19 22:26:54,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=757740.0, ans=0.0 2024-09-19 22:27:03,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=757780.0, ans=0.5 2024-09-19 22:27:11,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=757780.0, ans=0.125 2024-09-19 22:27:30,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=757820.0, ans=10.0 2024-09-19 22:27:34,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=757860.0, ans=0.2 2024-09-19 22:27:41,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=757860.0, ans=0.0 2024-09-19 22:27:44,479 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.680e+01 8.643e+01 9.033e+01 9.490e+01 1.279e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-19 22:27:49,121 INFO [train.py:1198] (0/2) Epoch 42, batch 3950, loss[loss=0.2432, ctc_loss=0.1144, cr_loss=0.3693, 
attn_decoder_loss=0.2493, over 29461.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1104, cr_loss=0.3495, attn_decoder_loss=0.2378, over 5835145.76 frames. ], batch size: 97, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:27:51,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=15.0 2024-09-19 22:27:52,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757900.0, ans=0.1 2024-09-19 22:27:58,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=757900.0, ans=0.0 2024-09-19 22:28:18,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=757980.0, ans=0.0 2024-09-19 22:29:02,376 INFO [train.py:1198] (0/2) Epoch 42, batch 4000, loss[loss=0.215, ctc_loss=0.09504, cr_loss=0.3007, attn_decoder_loss=0.2216, over 29516.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1107, cr_loss=0.3499, attn_decoder_loss=0.2381, over 5812832.02 frames. 
], batch size: 74, lr: 2.61e-03, grad_scale: 32.0 2024-09-19 22:29:20,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=758140.0, ans=0.0 2024-09-19 22:29:20,235 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:29:32,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=758180.0, ans=0.125 2024-09-19 22:29:40,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=758180.0, ans=0.1 2024-09-19 22:29:40,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=758180.0, ans=0.125 2024-09-19 22:29:48,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=758220.0, ans=0.125 2024-09-19 22:29:49,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=758220.0, ans=0.125 2024-09-19 22:29:55,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=758220.0, ans=0.125 2024-09-19 22:29:56,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. 
limit=15.0 2024-09-19 22:30:04,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=758260.0, ans=10.0 2024-09-19 22:30:13,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.535e+01 8.990e+01 9.708e+01 1.890e+02, threshold=1.798e+02, percent-clipped=2.0 2024-09-19 22:30:16,060 INFO [train.py:1198] (0/2) Epoch 42, batch 4050, loss[loss=0.2612, ctc_loss=0.1518, cr_loss=0.4075, attn_decoder_loss=0.2643, over 19972.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1113, cr_loss=0.351, attn_decoder_loss=0.2383, over 5795931.20 frames. ], batch size: 210, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:30:23,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=758300.0, ans=0.125 2024-09-19 22:31:31,088 INFO [train.py:1198] (0/2) Epoch 42, batch 4100, loss[loss=0.2422, ctc_loss=0.1218, cr_loss=0.3598, attn_decoder_loss=0.2476, over 29519.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1116, cr_loss=0.3515, attn_decoder_loss=0.2385, over 5791043.64 frames. 
], batch size: 90, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:31:43,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=758500.0, ans=0.125 2024-09-19 22:32:09,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=758580.0, ans=0.125 2024-09-19 22:32:16,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=758620.0, ans=0.2 2024-09-19 22:32:42,743 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.799e+01 9.217e+01 9.992e+01 1.793e+02, threshold=1.843e+02, percent-clipped=0.0 2024-09-19 22:32:45,671 INFO [train.py:1198] (0/2) Epoch 42, batch 4150, loss[loss=0.2351, ctc_loss=0.1137, cr_loss=0.3648, attn_decoder_loss=0.2405, over 29496.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1116, cr_loss=0.3516, attn_decoder_loss=0.2384, over 5797299.54 frames. ], batch size: 77, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:32:57,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=758700.0, ans=0.1 2024-09-19 22:32:57,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=758700.0, ans=0.07 2024-09-19 22:33:16,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=758780.0, ans=0.125 2024-09-19 22:33:22,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=758780.0, ans=0.125 2024-09-19 22:33:58,999 INFO [train.py:1198] (0/2) Epoch 42, batch 4200, loss[loss=0.2577, ctc_loss=0.1402, cr_loss=0.4327, attn_decoder_loss=0.2611, over 29484.00 frames. 
], tot_loss[loss=0.2328, ctc_loss=0.1116, cr_loss=0.352, attn_decoder_loss=0.2385, over 5799671.92 frames. ], batch size: 90, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:34:11,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=758900.0, ans=0.125 2024-09-19 22:34:30,790 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.23 vs. limit=22.5 2024-09-19 22:34:31,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=758980.0, ans=0.125 2024-09-19 22:34:33,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.20 vs. limit=15.0 2024-09-19 22:35:10,285 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.752e+01 9.321e+01 9.976e+01 3.736e+02, threshold=1.864e+02, percent-clipped=1.0 2024-09-19 22:35:10,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=759060.0, ans=0.125 2024-09-19 22:35:13,180 INFO [train.py:1198] (0/2) Epoch 42, batch 4250, loss[loss=0.2114, ctc_loss=0.0876, cr_loss=0.3024, attn_decoder_loss=0.2185, over 29499.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1117, cr_loss=0.3518, attn_decoder_loss=0.2388, over 5806299.39 frames. 
], batch size: 74, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:35:23,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=759100.0, ans=0.1 2024-09-19 22:35:32,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=759140.0, ans=0.2 2024-09-19 22:35:48,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.73 vs. limit=10.0 2024-09-19 22:35:52,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=759180.0, ans=0.125 2024-09-19 22:36:11,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.54 vs. limit=10.0 2024-09-19 22:36:27,662 INFO [train.py:1198] (0/2) Epoch 42, batch 4300, loss[loss=0.2399, ctc_loss=0.1161, cr_loss=0.368, attn_decoder_loss=0.2455, over 29525.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1115, cr_loss=0.3512, attn_decoder_loss=0.2388, over 5795488.81 frames. ], batch size: 87, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:36:30,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=759300.0, ans=0.1 2024-09-19 22:36:34,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2024-09-19 22:36:44,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=759340.0, ans=0.1 2024-09-19 22:36:49,395 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.68 vs. 
limit=15.0 2024-09-19 22:37:00,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=759380.0, ans=0.125 2024-09-19 22:37:22,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=759420.0, ans=0.0 2024-09-19 22:37:38,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=15.0 2024-09-19 22:37:38,940 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.652e+01 9.323e+01 9.871e+01 1.907e+02, threshold=1.865e+02, percent-clipped=1.0 2024-09-19 22:37:41,935 INFO [train.py:1198] (0/2) Epoch 42, batch 4350, loss[loss=0.2557, ctc_loss=0.1235, cr_loss=0.379, attn_decoder_loss=0.2619, over 29525.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1141, cr_loss=0.3565, attn_decoder_loss=0.2421, over 5797781.91 frames. ], batch size: 97, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:37:42,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=759500.0, ans=0.125 2024-09-19 22:38:17,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=759580.0, ans=0.125 2024-09-19 22:38:31,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=759620.0, ans=0.2 2024-09-19 22:38:39,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=759620.0, ans=0.125 2024-09-19 22:38:55,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.75 vs. 
limit=10.0 2024-09-19 22:38:56,125 INFO [train.py:1198] (0/2) Epoch 42, batch 4400, loss[loss=0.2399, ctc_loss=0.1204, cr_loss=0.3721, attn_decoder_loss=0.2449, over 27398.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1151, cr_loss=0.3585, attn_decoder_loss=0.244, over 5768697.79 frames. ], batch size: 124, lr: 2.61e-03, grad_scale: 32.0 2024-09-19 22:39:01,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=9.00 vs. limit=10.0 2024-09-19 22:39:32,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=759780.0, ans=0.1 2024-09-19 22:39:42,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0 2024-09-19 22:40:06,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=759860.0, ans=0.1 2024-09-19 22:40:07,764 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.246e+01 9.179e+01 9.547e+01 1.014e+02 2.970e+02, threshold=1.909e+02, percent-clipped=2.0 2024-09-19 22:40:09,216 INFO [train.py:1198] (0/2) Epoch 42, batch 4450, loss[loss=0.2529, ctc_loss=0.1369, cr_loss=0.3714, attn_decoder_loss=0.2575, over 20018.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1182, cr_loss=0.3636, attn_decoder_loss=0.2458, over 5582453.23 frames. 
], batch size: 210, lr: 2.61e-03, grad_scale: 16.0 2024-09-19 22:40:10,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=759900.0, ans=0.125 2024-09-19 22:40:34,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=759940.0, ans=0.0 2024-09-19 22:40:42,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=759980.0, ans=0.2 2024-09-19 22:40:59,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=760020.0, ans=0.0 2024-09-19 22:40:59,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=15.0 2024-09-19 22:41:04,036 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0 2024-09-19 22:41:25,586 INFO [train.py:1198] (0/2) Epoch 42, batch 4500, loss[loss=0.2598, ctc_loss=0.1444, cr_loss=0.402, attn_decoder_loss=0.2637, over 20598.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1212, cr_loss=0.3664, attn_decoder_loss=0.2476, over 5241073.03 frames. 
], batch size: 210, lr: 2.61e-03, grad_scale: 8.0 2024-09-19 22:41:32,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=760100.0, ans=0.2 2024-09-19 22:41:39,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=760140.0, ans=0.2 2024-09-19 22:41:41,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=760140.0, ans=0.125 2024-09-19 22:41:54,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=760180.0, ans=0.015 2024-09-19 22:41:57,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=760180.0, ans=0.2 2024-09-19 22:42:03,068 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-42.pt 2024-09-19 22:42:41,134 INFO [train.py:1198] (0/2) Epoch 43, batch 0, loss[loss=0.2039, ctc_loss=0.08759, cr_loss=0.293, attn_decoder_loss=0.2103, over 29619.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.08759, cr_loss=0.293, attn_decoder_loss=0.2103, over 29619.00 frames. ], batch size: 73, lr: 2.58e-03, grad_scale: 16.0 2024-09-19 22:42:41,135 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-19 22:42:44,316 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3358, 3.8196, 4.1357, 4.2389], device='cuda:0') 2024-09-19 22:43:00,148 INFO [train.py:1230] (0/2) Epoch 43, validation: loss=0.2125, ctc_loss=0.03634, cr_loss=6.648e-15, attn_decoder_loss=0.2321, over 944034.00 frames. 
2024-09-19 22:43:00,148 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-19 22:43:07,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-09-19 22:43:09,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760200.0, ans=0.1 2024-09-19 22:43:11,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=760200.0, ans=0.5 2024-09-19 22:43:26,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=760240.0, ans=0.1 2024-09-19 22:43:39,928 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.895e+01 1.042e+02 1.140e+02 1.225e+02 1.755e+02, threshold=2.281e+02, percent-clipped=0.0 2024-09-19 22:43:49,795 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2024-09-19 22:44:11,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=760360.0, ans=0.125 2024-09-19 22:44:17,477 INFO [train.py:1198] (0/2) Epoch 43, batch 50, loss[loss=0.2085, ctc_loss=0.0911, cr_loss=0.3043, attn_decoder_loss=0.2148, over 29440.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1121, cr_loss=0.3516, attn_decoder_loss=0.2389, over 1267312.63 frames. 
], batch size: 70, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:44:55,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=760480.0, ans=0.2 2024-09-19 22:45:05,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-09-19 22:45:06,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=760520.0, ans=0.035 2024-09-19 22:45:24,798 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2024-09-19 22:45:33,227 INFO [train.py:1198] (0/2) Epoch 43, batch 100, loss[loss=0.2203, ctc_loss=0.1015, cr_loss=0.3325, attn_decoder_loss=0.2261, over 29555.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1135, cr_loss=0.3549, attn_decoder_loss=0.2411, over 2252337.85 frames. 
], batch size: 76, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:45:34,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=760600.0, ans=0.2 2024-09-19 22:45:53,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=760640.0, ans=0.125 2024-09-19 22:46:10,561 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.246e+01 8.774e+01 9.184e+01 9.707e+01 2.214e+02, threshold=1.837e+02, percent-clipped=0.0 2024-09-19 22:46:10,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=760680.0, ans=0.1 2024-09-19 22:46:12,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=760680.0, ans=0.1 2024-09-19 22:46:53,114 INFO [train.py:1198] (0/2) Epoch 43, batch 150, loss[loss=0.2101, ctc_loss=0.09194, cr_loss=0.3154, attn_decoder_loss=0.2162, over 29421.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1121, cr_loss=0.3528, attn_decoder_loss=0.2394, over 3047524.24 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:47:07,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=760840.0, ans=0.125 2024-09-19 22:47:13,253 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:47:17,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=760840.0, ans=0.0 2024-09-19 22:47:28,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.36 vs. 
limit=15.0 2024-09-19 22:47:30,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=760880.0, ans=0.0 2024-09-19 22:47:42,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=760920.0, ans=0.0 2024-09-19 22:48:07,816 INFO [train.py:1198] (0/2) Epoch 43, batch 200, loss[loss=0.2424, ctc_loss=0.1197, cr_loss=0.366, attn_decoder_loss=0.2479, over 27593.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1114, cr_loss=0.352, attn_decoder_loss=0.2385, over 3659758.62 frames. ], batch size: 125, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:48:15,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=761000.0, ans=0.1 2024-09-19 22:48:28,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=761040.0, ans=0.125 2024-09-19 22:48:32,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=761040.0, ans=0.05 2024-09-19 22:48:34,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.93 vs. limit=10.0 2024-09-19 22:48:44,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=761080.0, ans=0.2 2024-09-19 22:48:45,409 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.451e+01 8.919e+01 9.338e+01 1.606e+02, threshold=1.784e+02, percent-clipped=0.0 2024-09-19 22:49:23,061 INFO [train.py:1198] (0/2) Epoch 43, batch 250, loss[loss=0.2447, ctc_loss=0.1194, cr_loss=0.3815, attn_decoder_loss=0.2502, over 29275.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1115, cr_loss=0.3517, attn_decoder_loss=0.2383, over 4141175.69 frames. 
], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:49:25,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=761200.0, ans=0.1 2024-09-19 22:50:08,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761320.0, ans=0.1 2024-09-19 22:50:35,101 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:50:40,744 INFO [train.py:1198] (0/2) Epoch 43, batch 300, loss[loss=0.2406, ctc_loss=0.1178, cr_loss=0.3813, attn_decoder_loss=0.2457, over 29517.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1109, cr_loss=0.351, attn_decoder_loss=0.2378, over 4510601.90 frames. ], batch size: 92, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:50:44,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=761400.0, ans=0.125 2024-09-19 22:50:55,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=761400.0, ans=0.0 2024-09-19 22:51:14,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=761480.0, ans=0.125 2024-09-19 22:51:20,733 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.675e+01 9.148e+01 9.609e+01 2.085e+02, threshold=1.830e+02, percent-clipped=1.0 2024-09-19 22:51:25,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=761480.0, ans=0.125 2024-09-19 22:51:34,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=761520.0, ans=0.125 2024-09-19 22:51:59,187 INFO [train.py:1198] (0/2) Epoch 43, batch 350, loss[loss=0.2076, ctc_loss=0.09594, cr_loss=0.3221, 
attn_decoder_loss=0.2129, over 29324.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1112, cr_loss=0.3519, attn_decoder_loss=0.2382, over 4795857.38 frames. ], batch size: 71, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:52:12,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.29 vs. limit=22.5 2024-09-19 22:52:41,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=761680.0, ans=0.125 2024-09-19 22:53:14,431 INFO [train.py:1198] (0/2) Epoch 43, batch 400, loss[loss=0.2349, ctc_loss=0.1077, cr_loss=0.3404, attn_decoder_loss=0.2415, over 29728.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1106, cr_loss=0.3508, attn_decoder_loss=0.2379, over 5026435.87 frames. ], batch size: 82, lr: 2.57e-03, grad_scale: 32.0 2024-09-19 22:53:14,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=761800.0, ans=0.125 2024-09-19 22:53:16,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=761800.0, ans=0.2 2024-09-19 22:53:29,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=761840.0, ans=0.025 2024-09-19 22:53:31,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=761840.0, ans=0.125 2024-09-19 22:53:33,572 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. 
limit=6.0 2024-09-19 22:53:38,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=761840.0, ans=0.125 2024-09-19 22:53:52,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=761880.0, ans=0.0 2024-09-19 22:53:53,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=12.0 2024-09-19 22:53:53,724 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.678e+01 9.168e+01 9.670e+01 1.497e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-19 22:53:58,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=761920.0, ans=0.125 2024-09-19 22:54:04,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=761920.0, ans=0.125 2024-09-19 22:54:10,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=761920.0, ans=0.2 2024-09-19 22:54:20,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=761960.0, ans=0.025 2024-09-19 22:54:22,767 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.35 vs. limit=15.0 2024-09-19 22:54:29,064 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.97 vs. limit=10.0 2024-09-19 22:54:32,428 INFO [train.py:1198] (0/2) Epoch 43, batch 450, loss[loss=0.2361, ctc_loss=0.1105, cr_loss=0.345, attn_decoder_loss=0.2424, over 29667.00 frames. 
], tot_loss[loss=0.2324, ctc_loss=0.1109, cr_loss=0.3509, attn_decoder_loss=0.2381, over 5186107.89 frames. ], batch size: 83, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:54:46,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762040.0, ans=0.1 2024-09-19 22:55:09,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=762080.0, ans=0.07 2024-09-19 22:55:15,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=762080.0, ans=0.125 2024-09-19 22:55:21,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=15.0 2024-09-19 22:55:21,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=762120.0, ans=0.025 2024-09-19 22:55:23,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.56 vs. limit=10.0 2024-09-19 22:55:50,316 INFO [train.py:1198] (0/2) Epoch 43, batch 500, loss[loss=0.2503, ctc_loss=0.1271, cr_loss=0.389, attn_decoder_loss=0.2553, over 29474.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1107, cr_loss=0.3498, attn_decoder_loss=0.2377, over 5329135.46 frames. 
], batch size: 94, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:55:55,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=762200.0, ans=0.125 2024-09-19 22:55:56,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=762200.0, ans=0.125 2024-09-19 22:56:01,427 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:56:19,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=762280.0, ans=0.0 2024-09-19 22:56:27,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=762280.0, ans=0.2 2024-09-19 22:56:29,956 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.617e+01 8.998e+01 9.696e+01 3.544e+02, threshold=1.800e+02, percent-clipped=2.0 2024-09-19 22:56:33,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=762280.0, ans=0.125 2024-09-19 22:56:38,324 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.98 vs. limit=15.0 2024-09-19 22:56:50,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-09-19 22:56:50,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0 2024-09-19 22:56:56,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.77 vs. 
limit=22.5 2024-09-19 22:57:06,486 INFO [train.py:1198] (0/2) Epoch 43, batch 550, loss[loss=0.2413, ctc_loss=0.1158, cr_loss=0.3575, attn_decoder_loss=0.2473, over 28785.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1112, cr_loss=0.351, attn_decoder_loss=0.2379, over 5421798.15 frames. ], batch size: 104, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:57:42,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=762480.0, ans=0.125 2024-09-19 22:57:59,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=762520.0, ans=0.125 2024-09-19 22:58:02,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=762520.0, ans=0.125 2024-09-19 22:58:16,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=762560.0, ans=0.2 2024-09-19 22:58:16,886 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:58:19,980 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:58:24,033 INFO [train.py:1198] (0/2) Epoch 43, batch 600, loss[loss=0.2388, ctc_loss=0.1132, cr_loss=0.3587, attn_decoder_loss=0.2448, over 29198.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1113, cr_loss=0.3513, attn_decoder_loss=0.2382, over 5507763.26 frames. ], batch size: 100, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 22:58:43,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=762640.0, ans=0.2 2024-09-19 22:58:52,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.05 vs. 
limit=22.5 2024-09-19 22:59:01,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=762680.0, ans=0.0 2024-09-19 22:59:05,155 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 8.510e+01 8.971e+01 9.586e+01 1.722e+02, threshold=1.794e+02, percent-clipped=0.0 2024-09-19 22:59:20,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=762720.0, ans=0.5 2024-09-19 22:59:24,035 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.47 vs. limit=15.0 2024-09-19 22:59:29,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=762760.0, ans=0.2 2024-09-19 22:59:38,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.59 vs. limit=10.0 2024-09-19 22:59:40,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.91 vs. limit=6.0 2024-09-19 22:59:41,223 INFO [train.py:1198] (0/2) Epoch 43, batch 650, loss[loss=0.2291, ctc_loss=0.1051, cr_loss=0.3337, attn_decoder_loss=0.2355, over 29770.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1106, cr_loss=0.3499, attn_decoder_loss=0.2377, over 5585268.95 frames. ], batch size: 81, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:00:36,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.98 vs. 
limit=15.0 2024-09-19 23:00:37,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=762920.0, ans=0.0 2024-09-19 23:00:41,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=762960.0, ans=0.025 2024-09-19 23:00:43,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=762960.0, ans=0.125 2024-09-19 23:00:43,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=762960.0, ans=0.025 2024-09-19 23:00:46,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=762960.0, ans=0.125 2024-09-19 23:00:56,555 INFO [train.py:1198] (0/2) Epoch 43, batch 700, loss[loss=0.2313, ctc_loss=0.1153, cr_loss=0.3738, attn_decoder_loss=0.2359, over 29542.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1109, cr_loss=0.3507, attn_decoder_loss=0.2381, over 5636719.42 frames. ], batch size: 76, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:01:10,951 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.88 vs. 
limit=10.0 2024-09-19 23:01:19,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=763040.0, ans=0.2 2024-09-19 23:01:28,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=763080.0, ans=0.125 2024-09-19 23:01:34,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=763080.0, ans=0.125 2024-09-19 23:01:35,727 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.576e+01 9.155e+01 9.558e+01 1.416e+02, threshold=1.831e+02, percent-clipped=0.0 2024-09-19 23:01:46,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=763120.0, ans=0.125 2024-09-19 23:01:58,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=763160.0, ans=0.125 2024-09-19 23:01:58,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=763160.0, ans=0.2 2024-09-19 23:02:00,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=763160.0, ans=0.025 2024-09-19 23:02:04,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=763160.0, ans=0.125 2024-09-19 23:02:14,578 INFO [train.py:1198] (0/2) Epoch 43, batch 750, loss[loss=0.2326, ctc_loss=0.1062, cr_loss=0.3555, attn_decoder_loss=0.2387, over 29723.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1108, cr_loss=0.3505, attn_decoder_loss=0.238, over 5675308.93 frames. 
], batch size: 82, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:02:20,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=763200.0, ans=0.125 2024-09-19 23:02:57,011 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.38 vs. limit=15.0 2024-09-19 23:03:02,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=763320.0, ans=0.125 2024-09-19 23:03:03,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=763320.0, ans=0.2 2024-09-19 23:03:17,341 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:03:31,977 INFO [train.py:1198] (0/2) Epoch 43, batch 800, loss[loss=0.2117, ctc_loss=0.0904, cr_loss=0.2983, attn_decoder_loss=0.2185, over 29594.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1107, cr_loss=0.3497, attn_decoder_loss=0.238, over 5706822.83 frames. 
], batch size: 73, lr: 2.57e-03, grad_scale: 32.0 2024-09-19 23:03:44,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=763400.0, ans=0.1 2024-09-19 23:03:44,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=763400.0, ans=0.1 2024-09-19 23:03:50,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=763440.0, ans=0.0 2024-09-19 23:04:13,923 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.564e+01 8.973e+01 9.746e+01 2.709e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-19 23:04:17,755 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.35 vs. limit=15.0 2024-09-19 23:04:33,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=763560.0, ans=0.0 2024-09-19 23:04:36,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=763560.0, ans=0.125 2024-09-19 23:04:41,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=763560.0, ans=0.05 2024-09-19 23:04:47,006 INFO [train.py:1198] (0/2) Epoch 43, batch 850, loss[loss=0.2489, ctc_loss=0.1271, cr_loss=0.3768, attn_decoder_loss=0.2541, over 29680.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1105, cr_loss=0.3496, attn_decoder_loss=0.2377, over 5735965.13 frames. 
], batch size: 89, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:04:53,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=763600.0, ans=0.0 2024-09-19 23:05:33,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=763720.0, ans=0.125 2024-09-19 23:05:41,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=763720.0, ans=10.0 2024-09-19 23:05:49,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=763760.0, ans=0.125 2024-09-19 23:05:52,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=763760.0, ans=0.125 2024-09-19 23:06:04,721 INFO [train.py:1198] (0/2) Epoch 43, batch 900, loss[loss=0.2156, ctc_loss=0.09557, cr_loss=0.3117, attn_decoder_loss=0.222, over 29596.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1106, cr_loss=0.3495, attn_decoder_loss=0.238, over 5741732.52 frames. 
], batch size: 73, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:06:10,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=763800.0, ans=0.05 2024-09-19 23:06:16,933 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:06:16,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=763800.0, ans=0.0 2024-09-19 23:06:32,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=763840.0, ans=0.0 2024-09-19 23:06:33,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=763880.0, ans=0.0 2024-09-19 23:06:46,895 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.546e+01 9.046e+01 9.640e+01 1.475e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-19 23:06:47,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=763880.0, ans=0.125 2024-09-19 23:07:08,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=763960.0, ans=0.125 2024-09-19 23:07:22,193 INFO [train.py:1198] (0/2) Epoch 43, batch 950, loss[loss=0.2173, ctc_loss=0.095, cr_loss=0.3149, attn_decoder_loss=0.2239, over 29489.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1108, cr_loss=0.3505, attn_decoder_loss=0.2381, over 5742841.23 frames. 
], batch size: 74, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:07:58,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=764080.0, ans=0.025 2024-09-19 23:08:02,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=764080.0, ans=0.125 2024-09-19 23:08:03,185 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.48 vs. limit=12.0 2024-09-19 23:08:20,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=764160.0, ans=0.125 2024-09-19 23:08:28,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=764160.0, ans=0.0 2024-09-19 23:08:36,653 INFO [train.py:1198] (0/2) Epoch 43, batch 1000, loss[loss=0.2261, ctc_loss=0.1071, cr_loss=0.3325, attn_decoder_loss=0.232, over 29529.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1114, cr_loss=0.3519, attn_decoder_loss=0.2388, over 5737998.25 frames. ], batch size: 77, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:08:37,797 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.87 vs. 
limit=15.0 2024-09-19 23:08:39,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=764200.0, ans=0.125 2024-09-19 23:09:05,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=764280.0, ans=0.125 2024-09-19 23:09:18,835 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.366e+01 8.655e+01 9.178e+01 9.837e+01 2.417e+02, threshold=1.836e+02, percent-clipped=2.0 2024-09-19 23:09:19,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=764280.0, ans=0.0 2024-09-19 23:09:30,387 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.01 vs. limit=15.0 2024-09-19 23:09:51,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=764360.0, ans=0.125 2024-09-19 23:09:54,202 INFO [train.py:1198] (0/2) Epoch 43, batch 1050, loss[loss=0.2372, ctc_loss=0.113, cr_loss=0.3557, attn_decoder_loss=0.2431, over 29683.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1112, cr_loss=0.3504, attn_decoder_loss=0.238, over 5745468.84 frames. ], batch size: 85, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:10:01,134 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=5.19 vs. 
limit=15.0 2024-09-19 23:10:18,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=764440.0, ans=0.025 2024-09-19 23:10:25,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=764480.0, ans=0.125 2024-09-19 23:10:41,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=764520.0, ans=0.125 2024-09-19 23:11:11,877 INFO [train.py:1198] (0/2) Epoch 43, batch 1100, loss[loss=0.2399, ctc_loss=0.1184, cr_loss=0.3663, attn_decoder_loss=0.2453, over 29457.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.111, cr_loss=0.3502, attn_decoder_loss=0.238, over 5757734.75 frames. ], batch size: 78, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:11:22,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=764600.0, ans=0.125 2024-09-19 23:11:39,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=764640.0, ans=0.125 2024-09-19 23:11:39,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=764640.0, ans=0.025 2024-09-19 23:11:54,299 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.378e+01 8.891e+01 9.353e+01 1.322e+02, threshold=1.778e+02, percent-clipped=0.0 2024-09-19 23:12:27,892 INFO [train.py:1198] (0/2) Epoch 43, batch 1150, loss[loss=0.2231, ctc_loss=0.1045, cr_loss=0.3375, attn_decoder_loss=0.2288, over 29429.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1109, cr_loss=0.3502, attn_decoder_loss=0.2379, over 5756916.04 frames. 
], batch size: 78, lr: 2.57e-03, grad_scale: 8.0 2024-09-19 23:12:28,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=764800.0, ans=0.0 2024-09-19 23:12:45,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=764840.0, ans=0.1 2024-09-19 23:12:48,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=12.0 2024-09-19 23:12:51,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=764840.0, ans=0.07 2024-09-19 23:12:54,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=764840.0, ans=0.125 2024-09-19 23:13:15,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=764920.0, ans=0.1 2024-09-19 23:13:21,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=764920.0, ans=0.125 2024-09-19 23:13:45,766 INFO [train.py:1198] (0/2) Epoch 43, batch 1200, loss[loss=0.2476, ctc_loss=0.1203, cr_loss=0.3663, attn_decoder_loss=0.2536, over 29653.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1115, cr_loss=0.3515, attn_decoder_loss=0.2385, over 5750012.39 frames. 
], batch size: 85, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:13:49,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=765000.0, ans=0.125 2024-09-19 23:14:28,141 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.366e+01 8.714e+01 9.128e+01 9.687e+01 4.379e+02, threshold=1.826e+02, percent-clipped=2.0 2024-09-19 23:14:44,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=765120.0, ans=0.125 2024-09-19 23:14:54,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=765160.0, ans=0.0 2024-09-19 23:14:56,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=765160.0, ans=0.125 2024-09-19 23:15:03,375 INFO [train.py:1198] (0/2) Epoch 43, batch 1250, loss[loss=0.2408, ctc_loss=0.1175, cr_loss=0.3653, attn_decoder_loss=0.2464, over 29521.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1118, cr_loss=0.352, attn_decoder_loss=0.239, over 5777671.53 frames. ], batch size: 92, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:15:17,165 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:15:28,068 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:15:28,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.93 vs. 
limit=12.0 2024-09-19 23:15:32,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=765280.0, ans=0.0 2024-09-19 23:15:39,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=765280.0, ans=0.09899494936611666 2024-09-19 23:15:53,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=765320.0, ans=0.025 2024-09-19 23:16:02,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=765360.0, ans=0.035 2024-09-19 23:16:07,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=765360.0, ans=0.1 2024-09-19 23:16:13,480 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:16:17,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=765400.0, ans=0.0 2024-09-19 23:16:19,057 INFO [train.py:1198] (0/2) Epoch 43, batch 1300, loss[loss=0.2413, ctc_loss=0.1148, cr_loss=0.3656, attn_decoder_loss=0.2472, over 28195.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1112, cr_loss=0.3511, attn_decoder_loss=0.2381, over 5782469.89 frames. 
], batch size: 111, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:16:22,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=765400.0, ans=0.125 2024-09-19 23:16:32,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=765440.0, ans=0.0 2024-09-19 23:16:52,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=765480.0, ans=0.125 2024-09-19 23:16:56,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2024-09-19 23:16:57,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=765480.0, ans=0.125 2024-09-19 23:17:01,388 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.583e+01 9.046e+01 9.582e+01 1.774e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-19 23:17:04,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=765520.0, ans=0.125 2024-09-19 23:17:13,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=765520.0, ans=0.025 2024-09-19 23:17:19,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765560.0, ans=0.1 2024-09-19 23:17:37,164 INFO [train.py:1198] (0/2) Epoch 43, batch 1350, loss[loss=0.2338, ctc_loss=0.113, cr_loss=0.3499, attn_decoder_loss=0.2394, over 29754.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1109, cr_loss=0.3507, attn_decoder_loss=0.238, over 5798690.41 frames. 
], batch size: 81, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:18:54,305 INFO [train.py:1198] (0/2) Epoch 43, batch 1400, loss[loss=0.2063, ctc_loss=0.09094, cr_loss=0.2938, attn_decoder_loss=0.2125, over 29599.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1109, cr_loss=0.3506, attn_decoder_loss=0.238, over 5809586.42 frames. ], batch size: 69, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:19:02,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=765800.0, ans=0.2 2024-09-19 23:19:03,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=765800.0, ans=0.125 2024-09-19 23:19:16,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=765840.0, ans=15.0 2024-09-19 23:19:23,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=765880.0, ans=0.0 2024-09-19 23:19:29,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=765880.0, ans=0.1 2024-09-19 23:19:32,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=765880.0, ans=0.0 2024-09-19 23:19:36,335 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.462e+01 9.127e+01 9.642e+01 1.340e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-19 23:19:56,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=765960.0, ans=0.1 2024-09-19 23:20:02,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=765960.0, ans=0.125 2024-09-19 23:20:09,476 INFO [train.py:1198] (0/2) Epoch 43, batch 1450, loss[loss=0.2498, 
ctc_loss=0.1259, cr_loss=0.3849, attn_decoder_loss=0.255, over 29464.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1107, cr_loss=0.3508, attn_decoder_loss=0.2382, over 5806186.72 frames. ], batch size: 94, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:20:21,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=766000.0, ans=0.0 2024-09-19 23:20:23,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=766040.0, ans=0.1 2024-09-19 23:20:33,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=766040.0, ans=0.0 2024-09-19 23:20:39,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=766080.0, ans=0.2 2024-09-19 23:20:44,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=766080.0, ans=0.025 2024-09-19 23:20:47,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=766080.0, ans=0.0 2024-09-19 23:20:48,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=766080.0, ans=0.125 2024-09-19 23:21:00,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.25 vs. limit=6.0 2024-09-19 23:21:26,807 INFO [train.py:1198] (0/2) Epoch 43, batch 1500, loss[loss=0.2406, ctc_loss=0.1156, cr_loss=0.3652, attn_decoder_loss=0.2464, over 29625.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1108, cr_loss=0.3507, attn_decoder_loss=0.2384, over 5806562.00 frames. 
], batch size: 86, lr: 2.57e-03, grad_scale: 16.0 2024-09-19 23:21:30,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=766200.0, ans=0.125 2024-09-19 23:21:40,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=766240.0, ans=0.2 2024-09-19 23:21:51,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=766240.0, ans=0.125 2024-09-19 23:22:09,443 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.602e+01 9.131e+01 9.560e+01 1.543e+02, threshold=1.826e+02, percent-clipped=0.0 2024-09-19 23:22:23,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=766320.0, ans=0.0 2024-09-19 23:22:29,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.41 vs. limit=22.5 2024-09-19 23:22:45,491 INFO [train.py:1198] (0/2) Epoch 43, batch 1550, loss[loss=0.2446, ctc_loss=0.1254, cr_loss=0.3856, attn_decoder_loss=0.2493, over 29518.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1109, cr_loss=0.3507, attn_decoder_loss=0.2384, over 5782339.06 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:23:36,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=766520.0, ans=0.0 2024-09-19 23:23:42,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766520.0, ans=0.1 2024-09-19 23:23:55,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.34 vs. 
limit=22.5 2024-09-19 23:24:00,262 INFO [train.py:1198] (0/2) Epoch 43, batch 1600, loss[loss=0.2363, ctc_loss=0.1056, cr_loss=0.3287, attn_decoder_loss=0.2435, over 29670.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1109, cr_loss=0.3506, attn_decoder_loss=0.2381, over 5764558.48 frames. ], batch size: 85, lr: 2.56e-03, grad_scale: 32.0 2024-09-19 23:24:30,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=766680.0, ans=0.0 2024-09-19 23:24:44,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 8.578e+01 9.126e+01 9.935e+01 1.775e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-19 23:24:45,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=766720.0, ans=0.0 2024-09-19 23:24:49,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=766720.0, ans=0.125 2024-09-19 23:24:51,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=12.0 2024-09-19 23:25:17,716 INFO [train.py:1198] (0/2) Epoch 43, batch 1650, loss[loss=0.2445, ctc_loss=0.1184, cr_loss=0.3613, attn_decoder_loss=0.2504, over 29738.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.111, cr_loss=0.3504, attn_decoder_loss=0.2379, over 5756783.66 frames. 
], batch size: 89, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:25:24,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=766800.0, ans=0.1 2024-09-19 23:25:34,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=766840.0, ans=0.2 2024-09-19 23:25:36,837 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.40 vs. limit=22.5 2024-09-19 23:25:41,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=766840.0, ans=0.0 2024-09-19 23:25:52,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=766880.0, ans=0.0 2024-09-19 23:26:21,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=766960.0, ans=0.125 2024-09-19 23:26:34,723 INFO [train.py:1198] (0/2) Epoch 43, batch 1700, loss[loss=0.2114, ctc_loss=0.09423, cr_loss=0.3103, attn_decoder_loss=0.2175, over 29616.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1103, cr_loss=0.3488, attn_decoder_loss=0.2376, over 5779001.63 frames. 
], batch size: 69, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:26:39,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=767000.0, ans=0.1 2024-09-19 23:26:39,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=767000.0, ans=0.0 2024-09-19 23:26:47,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=767000.0, ans=0.125 2024-09-19 23:26:49,743 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=15.0 2024-09-19 23:26:59,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=767040.0, ans=0.2 2024-09-19 23:27:18,348 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.484e+01 9.017e+01 9.514e+01 1.146e+02, threshold=1.803e+02, percent-clipped=0.0 2024-09-19 23:27:44,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=767160.0, ans=0.0 2024-09-19 23:27:50,588 INFO [train.py:1198] (0/2) Epoch 43, batch 1750, loss[loss=0.2125, ctc_loss=0.09853, cr_loss=0.3231, attn_decoder_loss=0.218, over 29332.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1101, cr_loss=0.3486, attn_decoder_loss=0.2373, over 5788026.91 frames. 
], batch size: 67, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:27:59,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=767200.0, ans=0.125 2024-09-19 23:28:10,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=767240.0, ans=0.125 2024-09-19 23:28:24,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=767280.0, ans=0.1 2024-09-19 23:28:47,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=767320.0, ans=0.125 2024-09-19 23:29:07,603 INFO [train.py:1198] (0/2) Epoch 43, batch 1800, loss[loss=0.229, ctc_loss=0.1011, cr_loss=0.3269, attn_decoder_loss=0.2359, over 29679.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1101, cr_loss=0.3484, attn_decoder_loss=0.2374, over 5790689.97 frames. ], batch size: 83, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:29:11,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. 
limit=6.0 2024-09-19 23:29:48,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=767480.0, ans=0.1 2024-09-19 23:29:50,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=767480.0, ans=0.0 2024-09-19 23:29:51,300 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.348e+01 8.939e+01 9.600e+01 1.459e+02, threshold=1.788e+02, percent-clipped=0.0 2024-09-19 23:29:53,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=767520.0, ans=0.025 2024-09-19 23:30:05,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=767520.0, ans=0.0 2024-09-19 23:30:17,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=767560.0, ans=0.125 2024-09-19 23:30:20,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=767560.0, ans=0.125 2024-09-19 23:30:23,259 INFO [train.py:1198] (0/2) Epoch 43, batch 1850, loss[loss=0.23, ctc_loss=0.1009, cr_loss=0.3018, attn_decoder_loss=0.2376, over 29639.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1102, cr_loss=0.3489, attn_decoder_loss=0.2373, over 5795241.88 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:30:31,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=767600.0, ans=0.0 2024-09-19 23:30:39,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=767640.0, ans=0.0 2024-09-19 23:30:39,501 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.46 vs. 
limit=22.5 2024-09-19 23:31:35,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2024-09-19 23:31:40,271 INFO [train.py:1198] (0/2) Epoch 43, batch 1900, loss[loss=0.2452, ctc_loss=0.116, cr_loss=0.349, attn_decoder_loss=0.2518, over 29710.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1105, cr_loss=0.3494, attn_decoder_loss=0.2379, over 5803330.94 frames. ], batch size: 89, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:31:49,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=767800.0, ans=0.0 2024-09-19 23:31:55,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=767840.0, ans=0.125 2024-09-19 23:32:09,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=767880.0, ans=0.2 2024-09-19 23:32:13,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. 
limit=6.0 2024-09-19 23:32:24,371 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 8.779e+01 9.176e+01 9.742e+01 1.549e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-19 23:32:30,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=767920.0, ans=0.125 2024-09-19 23:32:45,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=767960.0, ans=0.125 2024-09-19 23:32:49,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=767960.0, ans=0.125 2024-09-19 23:32:56,691 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-192000.pt 2024-09-19 23:33:04,946 INFO [train.py:1198] (0/2) Epoch 43, batch 1950, loss[loss=0.2188, ctc_loss=0.1006, cr_loss=0.3224, attn_decoder_loss=0.2248, over 29471.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1111, cr_loss=0.3512, attn_decoder_loss=0.239, over 5818038.45 frames. 
], batch size: 78, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:33:05,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=768000.0, ans=0.125 2024-09-19 23:33:16,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=768000.0, ans=0.0 2024-09-19 23:33:26,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768040.0, ans=0.1 2024-09-19 23:34:07,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=768160.0, ans=0.2 2024-09-19 23:34:18,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.91 vs. limit=15.0 2024-09-19 23:34:20,478 INFO [train.py:1198] (0/2) Epoch 43, batch 2000, loss[loss=0.2135, ctc_loss=0.105, cr_loss=0.3483, attn_decoder_loss=0.2178, over 29336.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1117, cr_loss=0.3518, attn_decoder_loss=0.2393, over 5796746.78 frames. ], batch size: 67, lr: 2.56e-03, grad_scale: 32.0 2024-09-19 23:34:27,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=768200.0, ans=0.125 2024-09-19 23:34:34,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.08 vs. 
limit=12.0 2024-09-19 23:34:36,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=768240.0, ans=0.125 2024-09-19 23:35:07,942 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.936e+01 8.662e+01 9.256e+01 9.828e+01 2.553e+02, threshold=1.851e+02, percent-clipped=3.0 2024-09-19 23:35:17,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=768320.0, ans=0.0 2024-09-19 23:35:38,235 INFO [train.py:1198] (0/2) Epoch 43, batch 2050, loss[loss=0.2115, ctc_loss=0.09723, cr_loss=0.3171, attn_decoder_loss=0.2172, over 29442.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1111, cr_loss=0.3502, attn_decoder_loss=0.2383, over 5789279.72 frames. ], batch size: 70, lr: 2.56e-03, grad_scale: 16.0 2024-09-19 23:36:09,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.23 vs. limit=15.0 2024-09-19 23:36:21,344 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-09-19 23:36:26,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768520.0, ans=0.1 2024-09-19 23:36:27,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.83 vs. 
limit=15.0
2024-09-19 23:36:43,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768560.0, ans=0.1
2024-09-19 23:36:43,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=768560.0, ans=0.125
2024-09-19 23:36:45,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=768560.0, ans=0.0
2024-09-19 23:36:55,473 INFO [train.py:1198] (0/2) Epoch 43, batch 2100, loss[loss=0.235, ctc_loss=0.1028, cr_loss=0.3492, attn_decoder_loss=0.242, over 29762.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1107, cr_loss=0.3498, attn_decoder_loss=0.2379, over 5800944.88 frames. ], batch size: 81, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:36:57,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=768600.0, ans=0.2
2024-09-19 23:37:03,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=768600.0, ans=0.125
2024-09-19 23:37:10,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=768640.0, ans=0.2
2024-09-19 23:37:15,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=768640.0, ans=0.125
2024-09-19 23:37:16,930 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 23:37:24,611 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0
2024-09-19 23:37:32,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.67 vs. limit=15.0
2024-09-19 23:37:39,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=768720.0, ans=0.1
2024-09-19 23:37:41,776 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.395e+01 8.911e+01 9.448e+01 1.160e+02, threshold=1.782e+02, percent-clipped=0.0
2024-09-19 23:37:53,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=768760.0, ans=10.0
2024-09-19 23:37:56,296 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2024-09-19 23:38:10,685 INFO [train.py:1198] (0/2) Epoch 43, batch 2150, loss[loss=0.2334, ctc_loss=0.1167, cr_loss=0.3579, attn_decoder_loss=0.2384, over 29456.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1106, cr_loss=0.3495, attn_decoder_loss=0.2375, over 5815829.34 frames. ], batch size: 78, lr: 2.56e-03, grad_scale: 8.0
2024-09-19 23:38:18,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=768800.0, ans=0.0
2024-09-19 23:38:40,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=768840.0, ans=0.0
2024-09-19 23:39:04,554 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 23:39:28,385 INFO [train.py:1198] (0/2) Epoch 43, batch 2200, loss[loss=0.2412, ctc_loss=0.1162, cr_loss=0.3659, attn_decoder_loss=0.2469, over 29625.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1105, cr_loss=0.3494, attn_decoder_loss=0.2375, over 5812053.61 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 8.0
2024-09-19 23:39:40,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=769000.0, ans=0.025
2024-09-19 23:39:45,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=12.0
2024-09-19 23:39:55,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=769040.0, ans=0.125
2024-09-19 23:40:10,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=769080.0, ans=0.125
2024-09-19 23:40:14,955 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.529e+01 9.034e+01 9.598e+01 1.063e+03, threshold=1.807e+02, percent-clipped=3.0
2024-09-19 23:40:46,046 INFO [train.py:1198] (0/2) Epoch 43, batch 2250, loss[loss=0.239, ctc_loss=0.1114, cr_loss=0.3491, attn_decoder_loss=0.2454, over 29700.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1104, cr_loss=0.3492, attn_decoder_loss=0.2376, over 5812170.43 frames. ], batch size: 82, lr: 2.56e-03, grad_scale: 8.0
2024-09-19 23:40:47,953 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 23:41:10,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=769240.0, ans=0.2
2024-09-19 23:41:11,239 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.45 vs. limit=22.5
2024-09-19 23:41:14,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=769280.0, ans=0.1
2024-09-19 23:41:20,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.50 vs. limit=15.0
2024-09-19 23:41:46,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=769360.0, ans=0.0
2024-09-19 23:42:01,235 INFO [train.py:1198] (0/2) Epoch 43, batch 2300, loss[loss=0.2034, ctc_loss=0.08519, cr_loss=0.2863, attn_decoder_loss=0.2102, over 29308.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1099, cr_loss=0.3478, attn_decoder_loss=0.2368, over 5800405.72 frames. ], batch size: 71, lr: 2.56e-03, grad_scale: 8.0
2024-09-19 23:42:29,272 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=15.0
2024-09-19 23:42:49,953 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.425e+01 9.007e+01 9.590e+01 1.483e+02, threshold=1.801e+02, percent-clipped=0.0
2024-09-19 23:42:54,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769520.0, ans=0.1
2024-09-19 23:43:12,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=769560.0, ans=0.1
2024-09-19 23:43:19,011 INFO [train.py:1198] (0/2) Epoch 43, batch 2350, loss[loss=0.2377, ctc_loss=0.1082, cr_loss=0.3338, attn_decoder_loss=0.2447, over 29690.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1099, cr_loss=0.3479, attn_decoder_loss=0.237, over 5805140.13 frames. ], batch size: 83, lr: 2.56e-03, grad_scale: 8.0
2024-09-19 23:43:32,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769640.0, ans=0.1
2024-09-19 23:44:14,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=769720.0, ans=0.5
2024-09-19 23:44:23,311 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0
2024-09-19 23:44:27,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=769760.0, ans=0.2
2024-09-19 23:44:33,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=769760.0, ans=0.025
2024-09-19 23:44:36,555 INFO [train.py:1198] (0/2) Epoch 43, batch 2400, loss[loss=0.2221, ctc_loss=0.103, cr_loss=0.327, attn_decoder_loss=0.228, over 29507.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1102, cr_loss=0.3488, attn_decoder_loss=0.2375, over 5808895.85 frames. ], batch size: 76, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:44:49,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.50 vs. limit=15.0
2024-09-19 23:44:51,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=769840.0, ans=15.0
2024-09-19 23:44:52,649 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5
2024-09-19 23:45:17,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=769880.0, ans=0.025
2024-09-19 23:45:23,259 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.858e+01 8.868e+01 9.245e+01 1.005e+02 2.989e+02, threshold=1.849e+02, percent-clipped=3.0
2024-09-19 23:45:31,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=769920.0, ans=0.0
2024-09-19 23:45:38,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=769960.0, ans=0.125
2024-09-19 23:45:41,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=769960.0, ans=0.1
2024-09-19 23:45:49,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=769960.0, ans=0.0
2024-09-19 23:45:52,121 INFO [train.py:1198] (0/2) Epoch 43, batch 2450, loss[loss=0.2326, ctc_loss=0.1128, cr_loss=0.362, attn_decoder_loss=0.2379, over 29692.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1108, cr_loss=0.3503, attn_decoder_loss=0.2382, over 5785894.37 frames. ], batch size: 82, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:45:59,987 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 23:46:22,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
2024-09-19 23:46:41,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=770120.0, ans=0.1
2024-09-19 23:47:09,633 INFO [train.py:1198] (0/2) Epoch 43, batch 2500, loss[loss=0.2421, ctc_loss=0.1162, cr_loss=0.3753, attn_decoder_loss=0.2477, over 29597.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1107, cr_loss=0.3505, attn_decoder_loss=0.2381, over 5795959.21 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:47:11,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=770200.0, ans=0.125
2024-09-19 23:47:17,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=770200.0, ans=0.125
2024-09-19 23:47:46,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=770280.0, ans=0.125
2024-09-19 23:47:56,926 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.730e+01 9.095e+01 9.659e+01 1.544e+02, threshold=1.819e+02, percent-clipped=0.0
2024-09-19 23:47:58,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=770320.0, ans=0.1
2024-09-19 23:48:06,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.16 vs. limit=15.0
2024-09-19 23:48:07,242 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.48 vs. limit=15.0
2024-09-19 23:48:28,168 INFO [train.py:1198] (0/2) Epoch 43, batch 2550, loss[loss=0.2064, ctc_loss=0.09794, cr_loss=0.3185, attn_decoder_loss=0.2114, over 29369.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1107, cr_loss=0.3504, attn_decoder_loss=0.2381, over 5796930.55 frames. ], batch size: 67, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:48:28,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=770400.0, ans=0.05
2024-09-19 23:48:29,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.43 vs. limit=5.0
2024-09-19 23:48:50,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=770440.0, ans=0.125
2024-09-19 23:49:04,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=770480.0, ans=0.125
2024-09-19 23:49:04,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=12.0
2024-09-19 23:49:18,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=770520.0, ans=0.125
2024-09-19 23:49:43,748 INFO [train.py:1198] (0/2) Epoch 43, batch 2600, loss[loss=0.2289, ctc_loss=0.1044, cr_loss=0.3355, attn_decoder_loss=0.2353, over 29458.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1108, cr_loss=0.3503, attn_decoder_loss=0.2384, over 5794572.41 frames. ], batch size: 78, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:50:03,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=770640.0, ans=0.09899494936611666
2024-09-19 23:50:06,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=770640.0, ans=0.125
2024-09-19 23:50:16,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=770680.0, ans=0.125
2024-09-19 23:50:17,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=770680.0, ans=0.125
2024-09-19 23:50:19,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=770680.0, ans=0.0
2024-09-19 23:50:32,557 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.619e+01 9.177e+01 9.694e+01 1.714e+02, threshold=1.835e+02, percent-clipped=0.0
2024-09-19 23:50:33,190 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0
2024-09-19 23:50:37,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.73 vs. limit=12.0
2024-09-19 23:50:44,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=770760.0, ans=0.2
2024-09-19 23:50:55,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=770760.0, ans=0.125
2024-09-19 23:51:01,472 INFO [train.py:1198] (0/2) Epoch 43, batch 2650, loss[loss=0.2409, ctc_loss=0.104, cr_loss=0.3345, attn_decoder_loss=0.2486, over 29313.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1111, cr_loss=0.351, attn_decoder_loss=0.2388, over 5800930.99 frames. ], batch size: 100, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:51:10,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=770800.0, ans=0.09899494936611666
2024-09-19 23:51:12,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=770800.0, ans=0.125
2024-09-19 23:51:15,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770840.0, ans=0.1
2024-09-19 23:51:18,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=770840.0, ans=0.0
2024-09-19 23:51:27,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=770840.0, ans=0.1
2024-09-19 23:51:27,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=770840.0, ans=0.125
2024-09-19 23:51:31,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=770880.0, ans=0.0
2024-09-19 23:51:41,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=770880.0, ans=0.125
2024-09-19 23:52:18,210 INFO [train.py:1198] (0/2) Epoch 43, batch 2700, loss[loss=0.2426, ctc_loss=0.1136, cr_loss=0.3686, attn_decoder_loss=0.2488, over 29529.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1113, cr_loss=0.3514, attn_decoder_loss=0.2391, over 5797342.03 frames. ], batch size: 87, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:52:26,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=771000.0, ans=0.0
2024-09-19 23:52:30,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=771000.0, ans=0.125
2024-09-19 23:52:32,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=771040.0, ans=0.125
2024-09-19 23:52:35,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=771040.0, ans=0.0
2024-09-19 23:52:54,028 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.71 vs. limit=15.0
2024-09-19 23:53:05,377 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.492e+01 9.068e+01 9.521e+01 1.768e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-19 23:53:23,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=771160.0, ans=0.5
2024-09-19 23:53:34,516 INFO [train.py:1198] (0/2) Epoch 43, batch 2750, loss[loss=0.2292, ctc_loss=0.1084, cr_loss=0.3468, attn_decoder_loss=0.2349, over 29518.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1108, cr_loss=0.3505, attn_decoder_loss=0.2383, over 5796011.02 frames. ], batch size: 75, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:53:44,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.83 vs. limit=22.5
2024-09-19 23:53:47,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=771200.0, ans=15.0
2024-09-19 23:53:52,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=771240.0, ans=0.07
2024-09-19 23:54:05,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=771280.0, ans=0.0
2024-09-19 23:54:18,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=771280.0, ans=0.0
2024-09-19 23:54:38,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=771360.0, ans=0.1
2024-09-19 23:54:40,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=771360.0, ans=0.125
2024-09-19 23:54:40,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=771360.0, ans=0.2
2024-09-19 23:54:52,209 INFO [train.py:1198] (0/2) Epoch 43, batch 2800, loss[loss=0.2513, ctc_loss=0.1373, cr_loss=0.3869, attn_decoder_loss=0.2553, over 20270.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1112, cr_loss=0.3512, attn_decoder_loss=0.2385, over 5776065.65 frames. ], batch size: 209, lr: 2.56e-03, grad_scale: 32.0
2024-09-19 23:54:54,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=771400.0, ans=0.2
2024-09-19 23:54:59,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=771400.0, ans=0.05
2024-09-19 23:55:01,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=771400.0, ans=0.2
2024-09-19 23:55:25,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=771480.0, ans=0.0
2024-09-19 23:55:30,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=771480.0, ans=0.125
2024-09-19 23:55:30,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=771480.0, ans=0.1
2024-09-19 23:55:34,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=771480.0, ans=0.125
2024-09-19 23:55:40,240 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.712e+01 9.201e+01 9.753e+01 5.037e+02, threshold=1.840e+02, percent-clipped=2.0
2024-09-19 23:55:58,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=771560.0, ans=0.125
2024-09-19 23:56:08,851 INFO [train.py:1198] (0/2) Epoch 43, batch 2850, loss[loss=0.2242, ctc_loss=0.1052, cr_loss=0.3417, attn_decoder_loss=0.2298, over 29479.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1118, cr_loss=0.352, attn_decoder_loss=0.2391, over 5761160.95 frames. ], batch size: 77, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:56:09,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=771600.0, ans=0.5
2024-09-19 23:56:12,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=771600.0, ans=0.025
2024-09-19 23:56:19,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=771600.0, ans=0.125
2024-09-19 23:56:24,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=771640.0, ans=0.2
2024-09-19 23:56:41,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=771680.0, ans=0.125
2024-09-19 23:56:41,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0
2024-09-19 23:56:51,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=771680.0, ans=0.0
2024-09-19 23:57:05,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771720.0, ans=0.1
2024-09-19 23:57:06,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=771720.0, ans=0.125
2024-09-19 23:57:14,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=771760.0, ans=0.0
2024-09-19 23:57:16,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0
2024-09-19 23:57:24,707 INFO [train.py:1198] (0/2) Epoch 43, batch 2900, loss[loss=0.2267, ctc_loss=0.1033, cr_loss=0.3332, attn_decoder_loss=0.233, over 29419.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1118, cr_loss=0.353, attn_decoder_loss=0.2399, over 5786662.46 frames. ], batch size: 79, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:57:28,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.91 vs. limit=6.0
2024-09-19 23:57:30,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=771800.0, ans=0.95
2024-09-19 23:57:39,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=15.0
2024-09-19 23:58:03,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=771880.0, ans=0.95
2024-09-19 23:58:14,811 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.451e+01 8.980e+01 9.523e+01 1.534e+02, threshold=1.796e+02, percent-clipped=0.0
2024-09-19 23:58:24,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=771920.0, ans=0.125
2024-09-19 23:58:27,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=771960.0, ans=0.025
2024-09-19 23:58:33,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=771960.0, ans=0.2
2024-09-19 23:58:42,029 INFO [train.py:1198] (0/2) Epoch 43, batch 2950, loss[loss=0.2276, ctc_loss=0.1106, cr_loss=0.3541, attn_decoder_loss=0.2327, over 29511.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1106, cr_loss=0.3504, attn_decoder_loss=0.2384, over 5782433.07 frames. ], batch size: 75, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:58:51,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=772000.0, ans=0.125
2024-09-19 23:58:54,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=772000.0, ans=0.125
2024-09-19 23:59:05,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=772040.0, ans=0.125
2024-09-19 23:59:17,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=772080.0, ans=0.125
2024-09-19 23:59:30,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=772120.0, ans=0.125
2024-09-19 23:59:36,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=772120.0, ans=0.0
2024-09-19 23:59:51,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=15.0
2024-09-19 23:59:59,897 INFO [train.py:1198] (0/2) Epoch 43, batch 3000, loss[loss=0.2337, ctc_loss=0.1098, cr_loss=0.3518, attn_decoder_loss=0.2396, over 29773.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.11, cr_loss=0.3494, attn_decoder_loss=0.238, over 5783618.13 frames. ], batch size: 81, lr: 2.56e-03, grad_scale: 16.0
2024-09-19 23:59:59,897 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-20 00:00:10,486 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.7358, 5.5045, 5.3153, 4.9204], device='cuda:0')
2024-09-20 00:00:10,937 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7649, 3.2162, 3.3404, 3.4658], device='cuda:0')
2024-09-20 00:00:18,198 INFO [train.py:1230] (0/2) Epoch 43, validation: loss=0.2118, ctc_loss=0.03672, cr_loss=6.551e-15, attn_decoder_loss=0.2313, over 944034.00 frames.
2024-09-20 00:00:18,199 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-20 00:00:35,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=772240.0, ans=0.07
2024-09-20 00:00:35,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=772240.0, ans=0.0
2024-09-20 00:00:42,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=15.0
2024-09-20 00:01:00,368 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0
2024-09-20 00:01:06,794 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.607e+01 9.085e+01 9.850e+01 2.122e+02, threshold=1.817e+02, percent-clipped=1.0
2024-09-20 00:01:08,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772320.0, ans=0.1
2024-09-20 00:01:22,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772360.0, ans=0.1
2024-09-20 00:01:28,204 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 00:01:31,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=772360.0, ans=0.0
2024-09-20 00:01:34,008 INFO [train.py:1198] (0/2) Epoch 43, batch 3050, loss[loss=0.2206, ctc_loss=0.1072, cr_loss=0.3534, attn_decoder_loss=0.2254, over 29523.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1105, cr_loss=0.3505, attn_decoder_loss=0.2386, over 5777923.30 frames. ], batch size: 76, lr: 2.55e-03, grad_scale: 16.0
2024-09-20 00:01:48,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=772400.0, ans=0.0
2024-09-20 00:01:59,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=772440.0, ans=0.2
2024-09-20 00:02:11,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0
2024-09-20 00:02:14,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=772480.0, ans=0.125
2024-09-20 00:02:17,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=772480.0, ans=15.0
2024-09-20 00:02:29,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=772520.0, ans=0.1
2024-09-20 00:02:33,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=772520.0, ans=0.015
2024-09-20 00:02:42,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=772560.0, ans=0.0
2024-09-20 00:02:51,506 INFO [train.py:1198] (0/2) Epoch 43, batch 3100, loss[loss=0.2508, ctc_loss=0.125, cr_loss=0.3784, attn_decoder_loss=0.2563, over 29322.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1107, cr_loss=0.351, attn_decoder_loss=0.2386, over 5777143.46 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 16.0
2024-09-20 00:02:55,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.56 vs. limit=22.5
2024-09-20 00:03:41,807 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.560e+01 8.944e+01 9.719e+01 1.343e+02, threshold=1.789e+02, percent-clipped=0.0
2024-09-20 00:04:09,809 INFO [train.py:1198] (0/2) Epoch 43, batch 3150, loss[loss=0.2482, ctc_loss=0.1197, cr_loss=0.3521, attn_decoder_loss=0.2547, over 28850.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1105, cr_loss=0.3502, attn_decoder_loss=0.2385, over 5784555.17 frames. ], batch size: 104, lr: 2.55e-03, grad_scale: 8.0
2024-09-20 00:04:15,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0
2024-09-20 00:04:49,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=772880.0, ans=0.125
2024-09-20 00:04:54,443 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0
2024-09-20 00:05:18,721 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.36 vs. limit=15.0
2024-09-20 00:05:25,183 INFO [train.py:1198] (0/2) Epoch 43, batch 3200, loss[loss=0.2262, ctc_loss=0.104, cr_loss=0.3337, attn_decoder_loss=0.2323, over 29408.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1103, cr_loss=0.3498, attn_decoder_loss=0.2381, over 5795604.45 frames. ], batch size: 79, lr: 2.55e-03, grad_scale: 16.0
2024-09-20 00:05:25,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773000.0, ans=0.1
2024-09-20 00:05:42,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.06 vs. limit=15.0
2024-09-20 00:05:53,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=773040.0, ans=0.0
2024-09-20 00:06:11,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=773120.0, ans=0.025
2024-09-20 00:06:17,443 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 8.459e+01 9.068e+01 9.712e+01 1.068e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-20 00:06:18,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0
2024-09-20 00:06:29,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773160.0, ans=0.1
2024-09-20 00:06:43,189 INFO [train.py:1198] (0/2) Epoch 43, batch 3250, loss[loss=0.2429, ctc_loss=0.1185, cr_loss=0.3657, attn_decoder_loss=0.2486, over 29694.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1109, cr_loss=0.3514, attn_decoder_loss=0.2388, over 5801269.20 frames. ], batch size: 84, lr: 2.55e-03, grad_scale: 16.0
2024-09-20 00:06:56,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.01 vs. limit=22.5
2024-09-20 00:07:03,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=773240.0, ans=0.0
2024-09-20 00:07:04,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=773240.0, ans=0.1
2024-09-20 00:07:26,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=773280.0, ans=0.125
2024-09-20 00:07:46,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=773360.0, ans=0.125
2024-09-20 00:07:55,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=773360.0, ans=0.0
2024-09-20 00:08:00,882 INFO [train.py:1198] (0/2) Epoch 43, batch 3300, loss[loss=0.2404, ctc_loss=0.1167, cr_loss=0.3724, attn_decoder_loss=0.2458, over 28221.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1106, cr_loss=0.3505, attn_decoder_loss=0.2376, over 5798768.37 frames. ], batch size: 111, lr: 2.55e-03, grad_scale: 8.0
2024-09-20 00:08:49,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=773520.0, ans=0.125
2024-09-20 00:08:50,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=773520.0, ans=0.125
2024-09-20 00:08:52,106 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.624e+01 9.248e+01 9.741e+01 2.844e+02, threshold=1.850e+02, percent-clipped=2.0
2024-09-20 00:08:55,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.53 vs. limit=12.0
2024-09-20 00:09:11,743 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 00:09:15,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=773600.0, ans=0.125
2024-09-20 00:09:16,242 INFO [train.py:1198] (0/2) Epoch 43, batch 3350, loss[loss=0.2461, ctc_loss=0.1194, cr_loss=0.3619, attn_decoder_loss=0.2521, over 28847.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1108, cr_loss=0.3503, attn_decoder_loss=0.238, over 5775840.20 frames. ], batch size: 104, lr: 2.55e-03, grad_scale: 8.0
2024-09-20 00:09:18,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=773600.0, ans=0.0
2024-09-20 00:09:23,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=22.5
2024-09-20 00:10:05,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773720.0, ans=0.1
2024-09-20 00:10:14,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=773720.0, ans=0.07
2024-09-20 00:10:34,063 INFO [train.py:1198] (0/2) Epoch 43, batch 3400, loss[loss=0.2127, ctc_loss=0.104, cr_loss=0.34, attn_decoder_loss=0.2172, over 29396.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1113, cr_loss=0.3513, attn_decoder_loss=0.2383, over 5767080.38 frames. ], batch size: 67, lr: 2.55e-03, grad_scale: 8.0
2024-09-20 00:10:37,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=773800.0, ans=0.025
2024-09-20 00:10:41,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.53 vs.
limit=15.0 2024-09-20 00:10:43,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=773800.0, ans=0.125 2024-09-20 00:11:01,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.38 vs. limit=15.0 2024-09-20 00:11:12,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=773880.0, ans=0.125 2024-09-20 00:11:20,247 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:11:26,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=773920.0, ans=0.125 2024-09-20 00:11:27,406 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.527e+01 9.240e+01 9.845e+01 1.909e+02, threshold=1.848e+02, percent-clipped=1.0 2024-09-20 00:11:30,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=773920.0, ans=0.2 2024-09-20 00:11:39,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=773960.0, ans=0.2 2024-09-20 00:11:43,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=12.0 2024-09-20 00:11:47,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=773960.0, ans=0.2 2024-09-20 00:11:51,437 INFO [train.py:1198] (0/2) Epoch 43, batch 3450, loss[loss=0.2399, ctc_loss=0.1121, cr_loss=0.3625, attn_decoder_loss=0.246, over 28345.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1109, cr_loss=0.3503, attn_decoder_loss=0.2383, over 5775069.86 frames. 
], batch size: 111, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:12:25,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=774080.0, ans=0.125 2024-09-20 00:12:52,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=774160.0, ans=0.125 2024-09-20 00:12:58,941 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.04 vs. limit=22.5 2024-09-20 00:12:59,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=774160.0, ans=0.0 2024-09-20 00:13:02,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=774160.0, ans=0.0 2024-09-20 00:13:06,966 INFO [train.py:1198] (0/2) Epoch 43, batch 3500, loss[loss=0.2111, ctc_loss=0.0888, cr_loss=0.3109, attn_decoder_loss=0.2178, over 29349.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1106, cr_loss=0.3492, attn_decoder_loss=0.2377, over 5776399.36 frames. 
], batch size: 71, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:13:10,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=774200.0, ans=0.0 2024-09-20 00:13:20,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=774240.0, ans=0.025 2024-09-20 00:13:22,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=774240.0, ans=0.125 2024-09-20 00:13:40,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=774280.0, ans=0.025 2024-09-20 00:13:42,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=774280.0, ans=0.025 2024-09-20 00:13:58,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=774320.0, ans=0.025 2024-09-20 00:13:59,839 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.502e+01 8.947e+01 9.671e+01 2.846e+02, threshold=1.789e+02, percent-clipped=1.0 2024-09-20 00:14:04,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=774320.0, ans=0.125 2024-09-20 00:14:20,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=774360.0, ans=0.125 2024-09-20 00:14:23,853 INFO [train.py:1198] (0/2) Epoch 43, batch 3550, loss[loss=0.2385, ctc_loss=0.1047, cr_loss=0.3345, attn_decoder_loss=0.2459, over 29719.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1105, cr_loss=0.3493, attn_decoder_loss=0.2378, over 5784398.12 frames. 
], batch size: 89, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:14:34,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=774400.0, ans=0.0 2024-09-20 00:14:50,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=774440.0, ans=0.125 2024-09-20 00:14:52,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=774480.0, ans=0.125 2024-09-20 00:15:05,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=774480.0, ans=0.09899494936611666 2024-09-20 00:15:36,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=774600.0, ans=0.125 2024-09-20 00:15:37,574 INFO [train.py:1198] (0/2) Epoch 43, batch 3600, loss[loss=0.2209, ctc_loss=0.103, cr_loss=0.3453, attn_decoder_loss=0.2264, over 29516.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1111, cr_loss=0.3513, attn_decoder_loss=0.2383, over 5793736.97 frames. ], batch size: 77, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:15:46,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=774600.0, ans=0.125 2024-09-20 00:15:47,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=12.0 2024-09-20 00:15:55,185 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.85 vs. 
limit=22.5 2024-09-20 00:16:20,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=774680.0, ans=0.1 2024-09-20 00:16:25,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=774720.0, ans=0.125 2024-09-20 00:16:30,157 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.541e+01 9.175e+01 9.569e+01 2.464e+02, threshold=1.835e+02, percent-clipped=1.0 2024-09-20 00:16:39,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=774760.0, ans=0.125 2024-09-20 00:16:42,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=774760.0, ans=0.125 2024-09-20 00:16:51,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=774760.0, ans=0.0 2024-09-20 00:16:52,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=774800.0, ans=0.2 2024-09-20 00:16:54,084 INFO [train.py:1198] (0/2) Epoch 43, batch 3650, loss[loss=0.2429, ctc_loss=0.1235, cr_loss=0.3826, attn_decoder_loss=0.2476, over 29513.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1104, cr_loss=0.35, attn_decoder_loss=0.2376, over 5794418.80 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:17:02,296 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.03 vs. 
limit=15.0 2024-09-20 00:17:43,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=774920.0, ans=0.125 2024-09-20 00:17:49,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=774920.0, ans=0.0 2024-09-20 00:18:04,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=774960.0, ans=10.0 2024-09-20 00:18:06,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.40 vs. limit=15.0 2024-09-20 00:18:08,750 INFO [train.py:1198] (0/2) Epoch 43, batch 3700, loss[loss=0.2352, ctc_loss=0.1161, cr_loss=0.371, attn_decoder_loss=0.2402, over 29705.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1103, cr_loss=0.3501, attn_decoder_loss=0.2378, over 5804658.18 frames. ], batch size: 84, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:18:29,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=775040.0, ans=0.025 2024-09-20 00:18:59,090 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 8.457e+01 9.128e+01 9.477e+01 6.609e+02, threshold=1.826e+02, percent-clipped=1.0 2024-09-20 00:19:23,197 INFO [train.py:1198] (0/2) Epoch 43, batch 3750, loss[loss=0.1992, ctc_loss=0.08377, cr_loss=0.2782, attn_decoder_loss=0.2059, over 29311.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1104, cr_loss=0.35, attn_decoder_loss=0.2377, over 5808015.09 frames. 
], batch size: 67, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:19:35,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=775200.0, ans=0.5 2024-09-20 00:19:37,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=775240.0, ans=0.125 2024-09-20 00:19:39,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=775240.0, ans=0.125 2024-09-20 00:19:46,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=15.0 2024-09-20 00:20:15,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=775320.0, ans=0.2 2024-09-20 00:20:39,071 INFO [train.py:1198] (0/2) Epoch 43, batch 3800, loss[loss=0.2408, ctc_loss=0.1098, cr_loss=0.3467, attn_decoder_loss=0.2477, over 29631.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1103, cr_loss=0.3498, attn_decoder_loss=0.2374, over 5797288.16 frames. ], batch size: 86, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:20:53,112 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.59 vs. limit=10.0 2024-09-20 00:20:57,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.91 vs. limit=15.0 2024-09-20 00:21:10,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.48 vs. limit=15.0 2024-09-20 00:21:24,543 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.65 vs. 
limit=15.0 2024-09-20 00:21:29,530 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.559e+01 9.199e+01 9.773e+01 2.259e+02, threshold=1.840e+02, percent-clipped=2.0 2024-09-20 00:21:49,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=775560.0, ans=0.0 2024-09-20 00:21:50,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=775560.0, ans=0.125 2024-09-20 00:21:53,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=775600.0, ans=0.025 2024-09-20 00:21:55,046 INFO [train.py:1198] (0/2) Epoch 43, batch 3850, loss[loss=0.2512, ctc_loss=0.1245, cr_loss=0.3763, attn_decoder_loss=0.2569, over 29245.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.11, cr_loss=0.3496, attn_decoder_loss=0.2373, over 5811011.49 frames. ], batch size: 100, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:22:00,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.05 vs. limit=12.0 2024-09-20 00:22:00,583 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.13 vs. limit=12.0 2024-09-20 00:22:07,207 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:22:20,618 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:23:09,181 INFO [train.py:1198] (0/2) Epoch 43, batch 3900, loss[loss=0.2438, ctc_loss=0.1124, cr_loss=0.362, attn_decoder_loss=0.2503, over 29627.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1104, cr_loss=0.3507, attn_decoder_loss=0.2377, over 5815995.11 frames. 
], batch size: 86, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:23:36,493 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.69 vs. limit=15.0 2024-09-20 00:23:43,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=775880.0, ans=0.125 2024-09-20 00:23:47,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=775880.0, ans=0.2 2024-09-20 00:23:50,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=775880.0, ans=0.1 2024-09-20 00:23:59,430 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.663e+01 9.061e+01 9.537e+01 1.215e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-20 00:24:23,507 INFO [train.py:1198] (0/2) Epoch 43, batch 3950, loss[loss=0.2519, ctc_loss=0.1304, cr_loss=0.3895, attn_decoder_loss=0.2567, over 29536.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1103, cr_loss=0.3502, attn_decoder_loss=0.2378, over 5835287.52 frames. ], batch size: 97, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:24:34,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=776000.0, ans=0.125 2024-09-20 00:24:52,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.92 vs. 
limit=15.0 2024-09-20 00:25:04,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=776080.0, ans=0.025 2024-09-20 00:25:10,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=776120.0, ans=0.025 2024-09-20 00:25:13,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=776120.0, ans=0.0 2024-09-20 00:25:15,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=776120.0, ans=0.125 2024-09-20 00:25:38,024 INFO [train.py:1198] (0/2) Epoch 43, batch 4000, loss[loss=0.2139, ctc_loss=0.09668, cr_loss=0.3236, attn_decoder_loss=0.2197, over 29527.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1104, cr_loss=0.3502, attn_decoder_loss=0.2378, over 5812257.35 frames. ], batch size: 74, lr: 2.55e-03, grad_scale: 32.0 2024-09-20 00:25:38,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=776200.0, ans=0.1 2024-09-20 00:26:07,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=776280.0, ans=0.125 2024-09-20 00:26:25,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=776320.0, ans=0.2 2024-09-20 00:26:29,829 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.878e+01 9.363e+01 9.780e+01 3.308e+02, threshold=1.873e+02, percent-clipped=2.0 2024-09-20 00:26:41,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=776360.0, ans=0.125 2024-09-20 00:26:53,251 INFO [train.py:1198] (0/2) Epoch 43, batch 4050, loss[loss=0.2562, ctc_loss=0.1427, cr_loss=0.3952, attn_decoder_loss=0.26, 
over 19875.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1106, cr_loss=0.3506, attn_decoder_loss=0.2378, over 5795305.11 frames. ], batch size: 210, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:27:02,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=776400.0, ans=0.125 2024-09-20 00:27:06,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=776440.0, ans=0.125 2024-09-20 00:27:15,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=776440.0, ans=0.0 2024-09-20 00:27:24,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=776480.0, ans=0.07 2024-09-20 00:27:48,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-09-20 00:28:06,605 INFO [train.py:1198] (0/2) Epoch 43, batch 4100, loss[loss=0.2463, ctc_loss=0.1227, cr_loss=0.3745, attn_decoder_loss=0.2517, over 29493.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.111, cr_loss=0.351, attn_decoder_loss=0.2382, over 5790697.73 frames. 
], batch size: 90, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:28:09,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=776600.0, ans=0.125 2024-09-20 00:28:25,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=776640.0, ans=0.125 2024-09-20 00:28:37,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=776680.0, ans=0.0 2024-09-20 00:28:57,815 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.624e+01 9.289e+01 9.929e+01 2.714e+02, threshold=1.858e+02, percent-clipped=2.0 2024-09-20 00:29:04,675 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5 2024-09-20 00:29:20,429 INFO [train.py:1198] (0/2) Epoch 43, batch 4150, loss[loss=0.2227, ctc_loss=0.101, cr_loss=0.3304, attn_decoder_loss=0.2289, over 29506.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1108, cr_loss=0.3506, attn_decoder_loss=0.2382, over 5795861.00 frames. ], batch size: 77, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:29:27,384 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=12.0 2024-09-20 00:29:31,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=776800.0, ans=0.125 2024-09-20 00:29:51,533 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=12.0 2024-09-20 00:30:03,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.53 vs. 
limit=15.0 2024-09-20 00:30:10,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=776920.0, ans=0.125 2024-09-20 00:30:36,192 INFO [train.py:1198] (0/2) Epoch 43, batch 4200, loss[loss=0.2419, ctc_loss=0.1154, cr_loss=0.3602, attn_decoder_loss=0.2479, over 29520.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1107, cr_loss=0.3508, attn_decoder_loss=0.2383, over 5797984.57 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:30:45,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=777000.0, ans=0.125 2024-09-20 00:30:46,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=777000.0, ans=0.125 2024-09-20 00:30:59,274 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.30 vs. 
limit=15.0 2024-09-20 00:31:05,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=777080.0, ans=0.2 2024-09-20 00:31:14,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=777080.0, ans=0.125 2024-09-20 00:31:29,059 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.571e+01 8.984e+01 9.502e+01 1.265e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-20 00:31:30,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=777120.0, ans=0.0 2024-09-20 00:31:40,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=777160.0, ans=0.125 2024-09-20 00:31:49,477 INFO [train.py:1198] (0/2) Epoch 43, batch 4250, loss[loss=0.2057, ctc_loss=0.08499, cr_loss=0.2871, attn_decoder_loss=0.2127, over 29515.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1103, cr_loss=0.3498, attn_decoder_loss=0.2382, over 5803793.73 frames. ], batch size: 74, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:32:05,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=777240.0, ans=0.125 2024-09-20 00:32:28,211 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-09-20 00:32:36,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=777320.0, ans=0.5 2024-09-20 00:32:49,266 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.55 vs. 
limit=15.0 2024-09-20 00:32:55,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=777360.0, ans=0.025 2024-09-20 00:32:58,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=777360.0, ans=0.1 2024-09-20 00:33:02,914 INFO [train.py:1198] (0/2) Epoch 43, batch 4300, loss[loss=0.2332, ctc_loss=0.1095, cr_loss=0.3553, attn_decoder_loss=0.2391, over 29557.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1099, cr_loss=0.3486, attn_decoder_loss=0.2383, over 5793710.60 frames. ], batch size: 87, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:33:05,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.94 vs. limit=10.0 2024-09-20 00:33:06,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=777400.0, ans=0.125 2024-09-20 00:33:26,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=777440.0, ans=0.07 2024-09-20 00:33:27,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=777440.0, ans=0.125 2024-09-20 00:33:32,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=777480.0, ans=0.035 2024-09-20 00:33:55,857 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. 
limit=15.0 2024-09-20 00:33:57,762 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.855e+01 9.292e+01 9.899e+01 2.383e+02, threshold=1.858e+02, percent-clipped=1.0 2024-09-20 00:33:58,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=777520.0, ans=0.125 2024-09-20 00:34:11,631 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.62 vs. limit=22.5 2024-09-20 00:34:17,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=777600.0, ans=0.125 2024-09-20 00:34:18,765 INFO [train.py:1198] (0/2) Epoch 43, batch 4350, loss[loss=0.2472, ctc_loss=0.1261, cr_loss=0.4094, attn_decoder_loss=0.2515, over 29489.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1126, cr_loss=0.355, attn_decoder_loss=0.2413, over 5797524.71 frames. ], batch size: 97, lr: 2.55e-03, grad_scale: 8.0 2024-09-20 00:34:23,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=777600.0, ans=0.125 2024-09-20 00:34:29,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=777600.0, ans=0.025 2024-09-20 00:34:52,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=777680.0, ans=0.0 2024-09-20 00:35:11,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=777720.0, ans=0.1 2024-09-20 00:35:29,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=777760.0, ans=0.0 2024-09-20 00:35:31,779 INFO [train.py:1198] (0/2) Epoch 43, batch 4400, loss[loss=0.2415, ctc_loss=0.1198, cr_loss=0.38, 
attn_decoder_loss=0.2465, over 27417.00 frames. ], tot_loss[loss=0.2377, ctc_loss=0.1141, cr_loss=0.3582, attn_decoder_loss=0.2435, over 5769488.55 frames. ], batch size: 124, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:35:49,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=777840.0, ans=0.125 2024-09-20 00:36:07,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777880.0, ans=0.1 2024-09-20 00:36:21,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=777920.0, ans=0.125 2024-09-20 00:36:23,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2024-09-20 00:36:25,884 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.138e+01 9.169e+01 9.548e+01 1.005e+02 2.703e+02, threshold=1.910e+02, percent-clipped=1.0 2024-09-20 00:36:32,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=777960.0, ans=0.2 2024-09-20 00:36:46,770 INFO [train.py:1198] (0/2) Epoch 43, batch 4450, loss[loss=0.2478, ctc_loss=0.1393, cr_loss=0.3912, attn_decoder_loss=0.2511, over 20309.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1175, cr_loss=0.3633, attn_decoder_loss=0.2456, over 5577179.81 frames. ], batch size: 209, lr: 2.55e-03, grad_scale: 16.0 2024-09-20 00:36:47,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.16 vs. 
limit=12.0
2024-09-20 00:36:48,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=778000.0, ans=0.125
2024-09-20 00:36:53,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=778000.0, ans=0.5
2024-09-20 00:37:06,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=778040.0, ans=0.2
2024-09-20 00:37:08,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=778040.0, ans=0.025
2024-09-20 00:37:18,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=778080.0, ans=0.0
2024-09-20 00:37:47,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=778160.0, ans=0.125
2024-09-20 00:37:47,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=778160.0, ans=0.125
2024-09-20 00:37:56,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=778160.0, ans=0.0
2024-09-20 00:38:01,077 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.52 vs. limit=15.0
2024-09-20 00:38:01,715 INFO [train.py:1198] (0/2) Epoch 43, batch 4500, loss[loss=0.2425, ctc_loss=0.1266, cr_loss=0.3587, attn_decoder_loss=0.2475, over 20298.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1201, cr_loss=0.3651, attn_decoder_loss=0.2472, over 5234479.56 frames.
], batch size: 209, lr: 2.55e-03, grad_scale: 8.0
2024-09-20 00:38:25,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=778240.0, ans=0.025
2024-09-20 00:38:38,570 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-43.pt
2024-09-20 00:39:29,384 INFO [train.py:1198] (0/2) Epoch 44, batch 0, loss[loss=0.2193, ctc_loss=0.09658, cr_loss=0.3306, attn_decoder_loss=0.2256, over 29625.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.09658, cr_loss=0.3306, attn_decoder_loss=0.2256, over 29625.00 frames. ], batch size: 73, lr: 2.52e-03, grad_scale: 16.0
2024-09-20 00:39:29,385 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-20 00:39:43,013 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1837, 5.2776, 5.0190, 3.0121], device='cuda:0')
2024-09-20 00:39:47,833 INFO [train.py:1230] (0/2) Epoch 44, validation: loss=0.2131, ctc_loss=0.03639, cr_loss=8.375e-15, attn_decoder_loss=0.2327, over 944034.00 frames.
2024-09-20 00:39:47,834 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-20 00:39:52,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=778300.0, ans=0.125
2024-09-20 00:40:03,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=778340.0, ans=0.125
2024-09-20 00:40:03,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=778340.0, ans=0.07
2024-09-20 00:40:05,916 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.560e+01 1.073e+02 1.152e+02 1.272e+02 3.214e+02, threshold=2.305e+02, percent-clipped=2.0
2024-09-20 00:40:32,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=778420.0, ans=0.025
2024-09-20 00:40:39,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=778420.0, ans=0.125
2024-09-20 00:40:46,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=778420.0, ans=0.2
2024-09-20 00:40:54,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=15.0
2024-09-20 00:40:56,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=778460.0, ans=0.0
2024-09-20 00:41:03,897 INFO [train.py:1198] (0/2) Epoch 44, batch 50, loss[loss=0.208, ctc_loss=0.09508, cr_loss=0.3109, attn_decoder_loss=0.2136, over 29419.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1129, cr_loss=0.3568, attn_decoder_loss=0.2392, over 1268656.83 frames.
], batch size: 70, lr: 2.52e-03, grad_scale: 16.0
2024-09-20 00:41:04,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=778500.0, ans=0.0
2024-09-20 00:41:07,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=778500.0, ans=0.04949747468305833
2024-09-20 00:41:11,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=778500.0, ans=0.125
2024-09-20 00:41:33,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=778540.0, ans=0.025
2024-09-20 00:41:39,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=778580.0, ans=0.125
2024-09-20 00:41:45,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=778580.0, ans=0.125
2024-09-20 00:41:56,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=778620.0, ans=0.125
2024-09-20 00:41:57,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=778620.0, ans=0.0
2024-09-20 00:42:10,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=778660.0, ans=0.125
2024-09-20 00:42:13,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=778660.0, ans=0.125
2024-09-20 00:42:23,326 INFO [train.py:1198] (0/2) Epoch 44, batch 100, loss[loss=0.2305, ctc_loss=0.1122, cr_loss=0.3583, attn_decoder_loss=0.2357, over 29543.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1143, cr_loss=0.3594, attn_decoder_loss=0.2413, over 2254962.01 frames.
], batch size: 76, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:42:38,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=778740.0, ans=0.125
2024-09-20 00:42:41,355 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.747e+01 9.046e+01 9.804e+01 1.542e+02, threshold=1.809e+02, percent-clipped=0.0
2024-09-20 00:42:47,100 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0
2024-09-20 00:42:52,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=778780.0, ans=0.125
2024-09-20 00:42:58,184 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-20 00:42:58,592 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.37 vs. limit=22.5
2024-09-20 00:43:04,931 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=22.5
2024-09-20 00:43:22,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=778860.0, ans=0.0
2024-09-20 00:43:33,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=778860.0, ans=0.0
2024-09-20 00:43:37,845 INFO [train.py:1198] (0/2) Epoch 44, batch 150, loss[loss=0.2108, ctc_loss=0.09747, cr_loss=0.3133, attn_decoder_loss=0.2164, over 29437.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.112, cr_loss=0.3547, attn_decoder_loss=0.239, over 3048979.00 frames.
], batch size: 70, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:43:45,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=778900.0, ans=0.125
2024-09-20 00:43:54,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=778940.0, ans=0.125
2024-09-20 00:43:54,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=778940.0, ans=0.125
2024-09-20 00:44:17,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0
2024-09-20 00:44:18,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=778980.0, ans=0.0
2024-09-20 00:44:52,621 INFO [train.py:1198] (0/2) Epoch 44, batch 200, loss[loss=0.248, ctc_loss=0.1178, cr_loss=0.3651, attn_decoder_loss=0.2543, over 27173.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1111, cr_loss=0.3524, attn_decoder_loss=0.2379, over 3660229.34 frames.
], batch size: 124, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:44:57,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=779100.0, ans=0.1
2024-09-20 00:45:10,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=779140.0, ans=0.125
2024-09-20 00:45:12,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=779140.0, ans=0.125
2024-09-20 00:45:13,160 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.429e+01 8.994e+01 9.673e+01 1.827e+02, threshold=1.799e+02, percent-clipped=1.0
2024-09-20 00:45:39,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=779220.0, ans=0.125
2024-09-20 00:45:53,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=779220.0, ans=0.125
2024-09-20 00:46:01,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=779260.0, ans=0.0
2024-09-20 00:46:02,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=779260.0, ans=0.5
2024-09-20 00:46:07,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.77 vs. limit=15.0
2024-09-20 00:46:10,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=779260.0, ans=0.125
2024-09-20 00:46:12,945 INFO [train.py:1198] (0/2) Epoch 44, batch 250, loss[loss=0.2524, ctc_loss=0.127, cr_loss=0.3788, attn_decoder_loss=0.2579, over 29237.00 frames.
], tot_loss[loss=0.2321, ctc_loss=0.1106, cr_loss=0.3515, attn_decoder_loss=0.2378, over 4142737.39 frames. ], batch size: 100, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:46:28,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=779340.0, ans=0.09899494936611666
2024-09-20 00:46:31,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=779340.0, ans=0.025
2024-09-20 00:47:01,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=779420.0, ans=0.125
2024-09-20 00:47:18,649 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.21 vs. limit=22.5
2024-09-20 00:47:20,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=779460.0, ans=0.125
2024-09-20 00:47:22,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=779460.0, ans=0.125
2024-09-20 00:47:28,089 INFO [train.py:1198] (0/2) Epoch 44, batch 300, loss[loss=0.2463, ctc_loss=0.1207, cr_loss=0.3704, attn_decoder_loss=0.252, over 29531.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1107, cr_loss=0.3515, attn_decoder_loss=0.2378, over 4511243.37 frames. ], batch size: 92, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:47:29,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=779500.0, ans=0.0
2024-09-20 00:47:44,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.90 vs.
limit=12.0
2024-09-20 00:47:47,519 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.498e+01 8.969e+01 9.392e+01 3.050e+02, threshold=1.794e+02, percent-clipped=1.0
2024-09-20 00:47:49,494 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 00:47:50,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=779540.0, ans=0.125
2024-09-20 00:47:58,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=779580.0, ans=0.125
2024-09-20 00:48:36,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_na.min_abs, batch_count=779660.0, ans=0.02
2024-09-20 00:48:43,473 INFO [train.py:1198] (0/2) Epoch 44, batch 350, loss[loss=0.207, ctc_loss=0.08909, cr_loss=0.2935, attn_decoder_loss=0.2135, over 29341.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1112, cr_loss=0.3524, attn_decoder_loss=0.2383, over 4796838.46 frames. ], batch size: 71, lr: 2.51e-03, grad_scale: 8.0
2024-09-20 00:49:23,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=779780.0, ans=0.1
2024-09-20 00:50:03,122 INFO [train.py:1198] (0/2) Epoch 44, batch 400, loss[loss=0.2291, ctc_loss=0.1098, cr_loss=0.3483, attn_decoder_loss=0.2346, over 29715.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1105, cr_loss=0.3504, attn_decoder_loss=0.2376, over 5026303.07 frames. ], batch size: 82, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:50:10,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.23 vs.
limit=15.0
2024-09-20 00:50:18,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=779940.0, ans=10.0
2024-09-20 00:50:22,886 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.550e+01 9.066e+01 9.796e+01 2.019e+02, threshold=1.813e+02, percent-clipped=1.0
2024-09-20 00:50:55,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780020.0, ans=0.1
2024-09-20 00:51:02,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=780020.0, ans=0.0
2024-09-20 00:51:06,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=780060.0, ans=0.125
2024-09-20 00:51:06,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=780060.0, ans=0.0
2024-09-20 00:51:14,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=780060.0, ans=0.2
2024-09-20 00:51:14,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=780060.0, ans=0.09899494936611666
2024-09-20 00:51:16,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=780060.0, ans=0.125
2024-09-20 00:51:19,806 INFO [train.py:1198] (0/2) Epoch 44, batch 450, loss[loss=0.2386, ctc_loss=0.1148, cr_loss=0.3672, attn_decoder_loss=0.2442, over 29698.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1104, cr_loss=0.3497, attn_decoder_loss=0.2377, over 5188611.34 frames.
], batch size: 83, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:51:30,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=780100.0, ans=0.0
2024-09-20 00:51:59,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=780180.0, ans=0.125
2024-09-20 00:52:35,600 INFO [train.py:1198] (0/2) Epoch 44, batch 500, loss[loss=0.2544, ctc_loss=0.1252, cr_loss=0.3933, attn_decoder_loss=0.26, over 29438.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1097, cr_loss=0.3482, attn_decoder_loss=0.2368, over 5331645.37 frames. ], batch size: 94, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:52:36,325 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0
2024-09-20 00:52:40,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=780300.0, ans=0.2
2024-09-20 00:52:57,329 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.573e+01 8.977e+01 9.726e+01 1.793e+02, threshold=1.795e+02, percent-clipped=0.0
2024-09-20 00:53:02,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=780340.0, ans=0.125
2024-09-20 00:53:04,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs.
limit=15.0
2024-09-20 00:53:29,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=780420.0, ans=0.0
2024-09-20 00:53:39,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=780460.0, ans=0.2
2024-09-20 00:53:49,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=780460.0, ans=0.07
2024-09-20 00:53:55,482 INFO [train.py:1198] (0/2) Epoch 44, batch 550, loss[loss=0.2379, ctc_loss=0.1043, cr_loss=0.3287, attn_decoder_loss=0.2454, over 28841.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1099, cr_loss=0.348, attn_decoder_loss=0.237, over 5425659.34 frames. ], batch size: 104, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:54:19,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=780540.0, ans=0.125
2024-09-20 00:54:22,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=780540.0, ans=0.2
2024-09-20 00:54:34,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=780580.0, ans=0.125
2024-09-20 00:54:48,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=780620.0, ans=0.125
2024-09-20 00:54:54,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=780660.0, ans=0.0
2024-09-20 00:55:10,810 INFO [train.py:1198] (0/2) Epoch 44, batch 600, loss[loss=0.2438, ctc_loss=0.1187, cr_loss=0.3734, attn_decoder_loss=0.2494, over 29229.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.11, cr_loss=0.3481, attn_decoder_loss=0.2372, over 5511875.62 frames.
], batch size: 100, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:55:27,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=780740.0, ans=0.125
2024-09-20 00:55:30,168 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.511e+01 9.110e+01 9.777e+01 1.650e+02, threshold=1.822e+02, percent-clipped=0.0
2024-09-20 00:55:50,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=780780.0, ans=0.125
2024-09-20 00:55:51,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=780780.0, ans=0.125
2024-09-20 00:56:07,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=780820.0, ans=0.0
2024-09-20 00:56:10,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=780860.0, ans=0.125
2024-09-20 00:56:16,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=780860.0, ans=0.0
2024-09-20 00:56:26,255 INFO [train.py:1198] (0/2) Epoch 44, batch 650, loss[loss=0.2385, ctc_loss=0.1169, cr_loss=0.3568, attn_decoder_loss=0.2441, over 29761.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1091, cr_loss=0.3464, attn_decoder_loss=0.2367, over 5588215.73 frames. ], batch size: 81, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:56:33,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=780900.0, ans=0.0
2024-09-20 00:56:52,066 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.00 vs.
limit=15.0
2024-09-20 00:56:55,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=780940.0, ans=0.0
2024-09-20 00:57:14,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=781020.0, ans=0.2
2024-09-20 00:57:14,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=781020.0, ans=22.5
2024-09-20 00:57:36,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=781060.0, ans=0.1
2024-09-20 00:57:46,189 INFO [train.py:1198] (0/2) Epoch 44, batch 700, loss[loss=0.2263, ctc_loss=0.1073, cr_loss=0.3309, attn_decoder_loss=0.2321, over 29523.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1093, cr_loss=0.3466, attn_decoder_loss=0.2369, over 5638581.80 frames. ], batch size: 76, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:58:01,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=781140.0, ans=0.1
2024-09-20 00:58:05,629 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.523e+01 8.995e+01 9.436e+01 1.726e+02, threshold=1.799e+02, percent-clipped=0.0
2024-09-20 00:58:09,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0
2024-09-20 00:58:45,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.41 vs.
limit=12.0
2024-09-20 00:58:46,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=781260.0, ans=0.125
2024-09-20 00:58:48,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=781260.0, ans=0.125
2024-09-20 00:59:01,550 INFO [train.py:1198] (0/2) Epoch 44, batch 750, loss[loss=0.2303, ctc_loss=0.1017, cr_loss=0.3201, attn_decoder_loss=0.2375, over 29700.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.109, cr_loss=0.3456, attn_decoder_loss=0.2365, over 5678228.47 frames. ], batch size: 82, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 00:59:01,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=781300.0, ans=0.0
2024-09-20 00:59:06,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=781300.0, ans=0.0
2024-09-20 00:59:06,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.48 vs. limit=22.5
2024-09-20 00:59:09,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=12.0
2024-09-20 00:59:23,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.92 vs.
limit=12.0
2024-09-20 00:59:37,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=781380.0, ans=0.0
2024-09-20 00:59:37,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=781380.0, ans=0.125
2024-09-20 00:59:50,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=781420.0, ans=0.0
2024-09-20 01:00:00,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=781460.0, ans=0.05
2024-09-20 01:00:01,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=781460.0, ans=0.125
2024-09-20 01:00:04,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=781460.0, ans=0.125
2024-09-20 01:00:09,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=781460.0, ans=0.125
2024-09-20 01:00:16,836 INFO [train.py:1198] (0/2) Epoch 44, batch 800, loss[loss=0.2094, ctc_loss=0.09722, cr_loss=0.327, attn_decoder_loss=0.2146, over 29579.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1092, cr_loss=0.3464, attn_decoder_loss=0.2365, over 5707967.39 frames. ], batch size: 73, lr: 2.51e-03, grad_scale: 32.0
2024-09-20 01:00:37,970 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.478e+01 8.977e+01 9.680e+01 1.726e+02, threshold=1.795e+02, percent-clipped=0.0
2024-09-20 01:00:53,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.81 vs.
limit=6.0
2024-09-20 01:00:53,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=781580.0, ans=0.125
2024-09-20 01:01:17,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.09 vs. limit=15.0
2024-09-20 01:01:34,654 INFO [train.py:1198] (0/2) Epoch 44, batch 850, loss[loss=0.2448, ctc_loss=0.1154, cr_loss=0.3595, attn_decoder_loss=0.2512, over 29705.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1091, cr_loss=0.3463, attn_decoder_loss=0.2366, over 5736907.03 frames. ], batch size: 89, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 01:01:34,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=781700.0, ans=0.0
2024-09-20 01:01:37,894 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 01:01:51,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.11 vs. limit=15.0
2024-09-20 01:02:13,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=781780.0, ans=0.125
2024-09-20 01:02:16,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=781780.0, ans=0.2
2024-09-20 01:02:19,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=781780.0, ans=0.0
2024-09-20 01:02:19,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=781780.0, ans=0.07
2024-09-20 01:02:42,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs.
limit=6.0
2024-09-20 01:02:52,366 INFO [train.py:1198] (0/2) Epoch 44, batch 900, loss[loss=0.2118, ctc_loss=0.09533, cr_loss=0.3258, attn_decoder_loss=0.2176, over 29602.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1095, cr_loss=0.3471, attn_decoder_loss=0.237, over 5742099.74 frames. ], batch size: 73, lr: 2.51e-03, grad_scale: 16.0
2024-09-20 01:03:03,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0
2024-09-20 01:03:15,003 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.619e+01 9.074e+01 9.618e+01 1.505e+02, threshold=1.815e+02, percent-clipped=0.0
2024-09-20 01:03:22,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=781980.0, ans=0.125
2024-09-20 01:03:27,480 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 01:03:42,997 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0
2024-09-20 01:03:46,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=782020.0, ans=0.025
2024-09-20 01:03:48,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=782020.0, ans=0.125
2024-09-20 01:04:07,365 INFO [train.py:1198] (0/2) Epoch 44, batch 950, loss[loss=0.206, ctc_loss=0.08885, cr_loss=0.2955, attn_decoder_loss=0.2124, over 29510.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1095, cr_loss=0.3474, attn_decoder_loss=0.2372, over 5743491.77 frames.
], batch size: 74, lr: 2.51e-03, grad_scale: 8.0
2024-09-20 01:04:16,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782100.0, ans=0.1
2024-09-20 01:04:33,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=782140.0, ans=0.1
2024-09-20 01:04:39,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=782180.0, ans=0.125
2024-09-20 01:04:50,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=782180.0, ans=0.125
2024-09-20 01:04:59,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782220.0, ans=0.1
2024-09-20 01:05:24,400 INFO [train.py:1198] (0/2) Epoch 44, batch 1000, loss[loss=0.2274, ctc_loss=0.1114, cr_loss=0.3492, attn_decoder_loss=0.2326, over 29494.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1104, cr_loss=0.349, attn_decoder_loss=0.238, over 5737837.87 frames.
], batch size: 77, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:05:30,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=782300.0, ans=0.1 2024-09-20 01:05:49,521 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.837e+01 9.355e+01 1.004e+02 2.810e+02, threshold=1.871e+02, percent-clipped=1.0 2024-09-20 01:06:06,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=782380.0, ans=0.0 2024-09-20 01:06:11,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=782420.0, ans=0.0 2024-09-20 01:06:38,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=782460.0, ans=0.125 2024-09-20 01:06:42,614 INFO [train.py:1198] (0/2) Epoch 44, batch 1050, loss[loss=0.2421, ctc_loss=0.114, cr_loss=0.3415, attn_decoder_loss=0.2487, over 29675.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1102, cr_loss=0.3486, attn_decoder_loss=0.2374, over 5745247.80 frames. ], batch size: 85, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:06:42,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=782500.0, ans=0.125 2024-09-20 01:06:44,453 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:06:54,173 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.43 vs. 
limit=12.0 2024-09-20 01:07:01,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=782540.0, ans=0.1 2024-09-20 01:07:02,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=782540.0, ans=0.0 2024-09-20 01:07:04,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=782540.0, ans=0.05 2024-09-20 01:07:07,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782540.0, ans=0.1 2024-09-20 01:07:44,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=782660.0, ans=0.125 2024-09-20 01:07:54,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=782660.0, ans=0.125 2024-09-20 01:07:58,406 INFO [train.py:1198] (0/2) Epoch 44, batch 1100, loss[loss=0.2281, ctc_loss=0.1067, cr_loss=0.3373, attn_decoder_loss=0.2341, over 29471.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1101, cr_loss=0.3486, attn_decoder_loss=0.2375, over 5757450.82 frames. 
], batch size: 78, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:08:23,099 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.487e+01 8.944e+01 9.556e+01 1.706e+02, threshold=1.789e+02, percent-clipped=0.0 2024-09-20 01:08:41,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=782780.0, ans=0.125 2024-09-20 01:08:44,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=782820.0, ans=0.125 2024-09-20 01:08:47,223 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0 2024-09-20 01:09:16,487 INFO [train.py:1198] (0/2) Epoch 44, batch 1150, loss[loss=0.2277, ctc_loss=0.1038, cr_loss=0.3335, attn_decoder_loss=0.234, over 29459.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1101, cr_loss=0.3488, attn_decoder_loss=0.2371, over 5754782.39 frames. ], batch size: 78, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:10:17,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=783060.0, ans=0.0 2024-09-20 01:10:25,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=783060.0, ans=0.0 2024-09-20 01:10:33,779 INFO [train.py:1198] (0/2) Epoch 44, batch 1200, loss[loss=0.2325, ctc_loss=0.1097, cr_loss=0.3464, attn_decoder_loss=0.2385, over 29674.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1106, cr_loss=0.3497, attn_decoder_loss=0.2379, over 5748006.98 frames. 
], batch size: 85, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:10:37,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=783100.0, ans=0.125 2024-09-20 01:10:56,606 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 8.611e+01 9.178e+01 9.686e+01 1.323e+02, threshold=1.836e+02, percent-clipped=0.0 2024-09-20 01:10:58,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=783140.0, ans=0.125 2024-09-20 01:11:36,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=783260.0, ans=0.015 2024-09-20 01:11:36,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=783260.0, ans=0.125 2024-09-20 01:11:44,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=783260.0, ans=0.0 2024-09-20 01:11:50,054 INFO [train.py:1198] (0/2) Epoch 44, batch 1250, loss[loss=0.2485, ctc_loss=0.1291, cr_loss=0.3995, attn_decoder_loss=0.2529, over 29517.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1109, cr_loss=0.3504, attn_decoder_loss=0.2384, over 5775629.13 frames. ], batch size: 92, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:11:59,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=783300.0, ans=0.0 2024-09-20 01:12:09,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.52 vs. 
limit=22.5 2024-09-20 01:12:28,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=783380.0, ans=0.125 2024-09-20 01:12:31,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783380.0, ans=0.1 2024-09-20 01:12:37,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=783420.0, ans=0.125 2024-09-20 01:12:37,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=783420.0, ans=0.125 2024-09-20 01:12:37,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=783420.0, ans=0.09899494936611666 2024-09-20 01:12:43,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=783420.0, ans=0.025 2024-09-20 01:12:43,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=783420.0, ans=0.125 2024-09-20 01:12:44,398 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0 2024-09-20 01:12:46,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.58 vs. 
limit=15.0 2024-09-20 01:13:02,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=783460.0, ans=0.025 2024-09-20 01:13:05,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=783460.0, ans=0.1 2024-09-20 01:13:07,790 INFO [train.py:1198] (0/2) Epoch 44, batch 1300, loss[loss=0.2393, ctc_loss=0.1031, cr_loss=0.3271, attn_decoder_loss=0.2472, over 28290.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1106, cr_loss=0.3494, attn_decoder_loss=0.2378, over 5779139.09 frames. ], batch size: 111, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:13:08,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=783500.0, ans=0.125 2024-09-20 01:13:12,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=783500.0, ans=0.125 2024-09-20 01:13:28,758 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.28 vs. limit=15.0 2024-09-20 01:13:30,709 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.490e+01 8.901e+01 9.557e+01 1.827e+02, threshold=1.780e+02, percent-clipped=0.0 2024-09-20 01:13:55,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=783620.0, ans=0.0 2024-09-20 01:13:58,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783620.0, ans=0.1 2024-09-20 01:14:13,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.04 vs. 
limit=15.0 2024-09-20 01:14:14,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=783660.0, ans=0.1 2024-09-20 01:14:23,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0 2024-09-20 01:14:25,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.01 vs. limit=12.0 2024-09-20 01:14:25,848 INFO [train.py:1198] (0/2) Epoch 44, batch 1350, loss[loss=0.2361, ctc_loss=0.114, cr_loss=0.3559, attn_decoder_loss=0.2417, over 29763.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1103, cr_loss=0.3492, attn_decoder_loss=0.2376, over 5796261.51 frames. ], batch size: 81, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:14:40,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=783740.0, ans=0.125 2024-09-20 01:14:40,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=783740.0, ans=0.125 2024-09-20 01:14:40,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=783740.0, ans=0.0 2024-09-20 01:14:42,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=783740.0, ans=0.125 2024-09-20 01:14:54,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=783780.0, ans=0.025 2024-09-20 01:15:01,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=783780.0, ans=0.1 2024-09-20 01:15:02,205 INFO [scaling.py:1024] (0/2) Whitening: 
name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.92 vs. limit=12.0 2024-09-20 01:15:10,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=783820.0, ans=10.0 2024-09-20 01:15:40,847 INFO [train.py:1198] (0/2) Epoch 44, batch 1400, loss[loss=0.2077, ctc_loss=0.09281, cr_loss=0.3045, attn_decoder_loss=0.2137, over 29600.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.11, cr_loss=0.3487, attn_decoder_loss=0.2374, over 5808125.16 frames. ], batch size: 69, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:15:43,429 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.60 vs. limit=10.0 2024-09-20 01:15:48,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=783900.0, ans=0.125 2024-09-20 01:16:03,128 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.567e+01 9.208e+01 9.655e+01 2.033e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-20 01:16:17,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=783980.0, ans=0.0 2024-09-20 01:16:19,228 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-196000.pt 2024-09-20 01:16:40,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=784020.0, ans=0.125 2024-09-20 01:16:41,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=784020.0, ans=0.125 2024-09-20 01:17:05,608 INFO [train.py:1198] (0/2) Epoch 44, batch 1450, loss[loss=0.2511, ctc_loss=0.125, cr_loss=0.3883, 
attn_decoder_loss=0.2565, over 29449.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1104, cr_loss=0.3499, attn_decoder_loss=0.2378, over 5805163.55 frames. ], batch size: 94, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:17:10,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=784100.0, ans=0.125 2024-09-20 01:18:14,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=784260.0, ans=0.125 2024-09-20 01:18:23,234 INFO [train.py:1198] (0/2) Epoch 44, batch 1500, loss[loss=0.24, ctc_loss=0.1147, cr_loss=0.3486, attn_decoder_loss=0.2461, over 29628.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1103, cr_loss=0.3497, attn_decoder_loss=0.2381, over 5805354.00 frames. ], batch size: 86, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:18:23,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=784300.0, ans=0.025 2024-09-20 01:18:34,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=784300.0, ans=0.035 2024-09-20 01:18:40,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=784340.0, ans=0.0 2024-09-20 01:18:47,501 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 8.610e+01 9.049e+01 9.653e+01 2.114e+02, threshold=1.810e+02, percent-clipped=1.0 2024-09-20 01:19:01,952 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.18 vs. 
limit=15.0 2024-09-20 01:19:06,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=784380.0, ans=0.0 2024-09-20 01:19:33,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=784460.0, ans=0.125 2024-09-20 01:19:39,004 INFO [train.py:1198] (0/2) Epoch 44, batch 1550, loss[loss=0.2477, ctc_loss=0.1233, cr_loss=0.3784, attn_decoder_loss=0.2531, over 29524.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1108, cr_loss=0.3499, attn_decoder_loss=0.2385, over 5780568.08 frames. ], batch size: 90, lr: 2.51e-03, grad_scale: 8.0 2024-09-20 01:19:40,030 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.17 vs. limit=10.0 2024-09-20 01:19:43,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=784500.0, ans=0.125 2024-09-20 01:20:24,208 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-09-20 01:20:47,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=784660.0, ans=0.2 2024-09-20 01:20:56,207 INFO [train.py:1198] (0/2) Epoch 44, batch 1600, loss[loss=0.2417, ctc_loss=0.1183, cr_loss=0.3677, attn_decoder_loss=0.2472, over 29660.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1108, cr_loss=0.35, attn_decoder_loss=0.2383, over 5762219.35 frames. 
], batch size: 85, lr: 2.51e-03, grad_scale: 16.0 2024-09-20 01:21:02,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=784700.0, ans=0.2 2024-09-20 01:21:07,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=784700.0, ans=0.125 2024-09-20 01:21:14,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=784740.0, ans=0.125 2024-09-20 01:21:18,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=784740.0, ans=0.2 2024-09-20 01:21:20,302 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.674e+01 9.197e+01 9.675e+01 9.690e+02, threshold=1.839e+02, percent-clipped=2.0 2024-09-20 01:21:20,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=784740.0, ans=0.0 2024-09-20 01:22:14,099 INFO [train.py:1198] (0/2) Epoch 44, batch 1650, loss[loss=0.2489, ctc_loss=0.1221, cr_loss=0.3825, attn_decoder_loss=0.2544, over 29691.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1104, cr_loss=0.3493, attn_decoder_loss=0.238, over 5756344.99 frames. 
], batch size: 89, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:23:07,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=785020.0, ans=0.2 2024-09-20 01:23:23,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=785060.0, ans=0.125 2024-09-20 01:23:26,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=785060.0, ans=0.0 2024-09-20 01:23:29,266 INFO [train.py:1198] (0/2) Epoch 44, batch 1700, loss[loss=0.2033, ctc_loss=0.08759, cr_loss=0.3065, attn_decoder_loss=0.2093, over 29566.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1102, cr_loss=0.3489, attn_decoder_loss=0.238, over 5778423.79 frames. ], batch size: 69, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:23:38,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=785100.0, ans=0.07 2024-09-20 01:23:43,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=785140.0, ans=0.0 2024-09-20 01:23:52,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=785140.0, ans=0.0 2024-09-20 01:23:55,597 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.621e+01 9.129e+01 9.684e+01 1.448e+02, threshold=1.826e+02, percent-clipped=0.0 2024-09-20 01:23:57,413 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:24:04,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=785180.0, ans=0.125 2024-09-20 01:24:12,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=785180.0, 
ans=0.09899494936611666 2024-09-20 01:24:13,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=785180.0, ans=0.125 2024-09-20 01:24:15,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=785220.0, ans=0.2 2024-09-20 01:24:20,693 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2024-09-20 01:24:46,811 INFO [train.py:1198] (0/2) Epoch 44, batch 1750, loss[loss=0.2085, ctc_loss=0.09494, cr_loss=0.318, attn_decoder_loss=0.214, over 29341.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1099, cr_loss=0.3479, attn_decoder_loss=0.2377, over 5788309.47 frames. ], batch size: 67, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:24:48,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=785300.0, ans=0.125 2024-09-20 01:25:17,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.30 vs. limit=15.0 2024-09-20 01:25:21,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=785380.0, ans=0.125 2024-09-20 01:26:01,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=785460.0, ans=0.5 2024-09-20 01:26:03,906 INFO [train.py:1198] (0/2) Epoch 44, batch 1800, loss[loss=0.2457, ctc_loss=0.1215, cr_loss=0.3741, attn_decoder_loss=0.2512, over 29690.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1097, cr_loss=0.3479, attn_decoder_loss=0.2377, over 5790349.91 frames. 
], batch size: 83, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:26:06,454 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.58 vs. limit=12.0 2024-09-20 01:26:16,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=785500.0, ans=0.07 2024-09-20 01:26:27,988 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.530e+01 8.993e+01 9.458e+01 1.310e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-20 01:26:29,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=785540.0, ans=0.125 2024-09-20 01:26:32,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=785580.0, ans=0.125 2024-09-20 01:26:56,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.95 vs. limit=22.5 2024-09-20 01:27:10,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=785660.0, ans=10.0 2024-09-20 01:27:19,805 INFO [train.py:1198] (0/2) Epoch 44, batch 1850, loss[loss=0.2332, ctc_loss=0.1036, cr_loss=0.3238, attn_decoder_loss=0.2403, over 29625.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1097, cr_loss=0.3485, attn_decoder_loss=0.2377, over 5798606.39 frames. 
], batch size: 86, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:27:20,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=785700.0, ans=0.0 2024-09-20 01:27:44,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=785740.0, ans=0.2 2024-09-20 01:27:46,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=785740.0, ans=0.0 2024-09-20 01:28:13,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=785820.0, ans=0.0 2024-09-20 01:28:15,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=785820.0, ans=0.125 2024-09-20 01:28:16,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=785820.0, ans=0.125 2024-09-20 01:28:37,305 INFO [train.py:1198] (0/2) Epoch 44, batch 1900, loss[loss=0.2333, ctc_loss=0.09809, cr_loss=0.3168, attn_decoder_loss=0.2413, over 29700.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1099, cr_loss=0.3489, attn_decoder_loss=0.2382, over 5806673.13 frames. ], batch size: 89, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:28:42,684 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-09-20 01:28:51,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=785940.0, ans=0.0 2024-09-20 01:28:54,697 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. 
limit=6.0 2024-09-20 01:28:58,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=785940.0, ans=0.0 2024-09-20 01:29:00,929 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.84 vs. limit=15.0 2024-09-20 01:29:01,488 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.601e+01 9.112e+01 9.762e+01 1.549e+02, threshold=1.822e+02, percent-clipped=0.0 2024-09-20 01:29:47,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=786060.0, ans=0.125 2024-09-20 01:29:50,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=786060.0, ans=0.07 2024-09-20 01:29:54,968 INFO [train.py:1198] (0/2) Epoch 44, batch 1950, loss[loss=0.2329, ctc_loss=0.1137, cr_loss=0.3517, attn_decoder_loss=0.2383, over 29451.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.111, cr_loss=0.3516, attn_decoder_loss=0.2393, over 5820909.90 frames. ], batch size: 78, lr: 2.50e-03, grad_scale: 16.0 2024-09-20 01:30:19,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.90 vs. limit=10.0 2024-09-20 01:30:24,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=786180.0, ans=0.125 2024-09-20 01:30:33,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=786180.0, ans=0.025 2024-09-20 01:31:07,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. 
limit=6.0 2024-09-20 01:31:09,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.60 vs. limit=15.0 2024-09-20 01:31:09,701 INFO [train.py:1198] (0/2) Epoch 44, batch 2000, loss[loss=0.2018, ctc_loss=0.08974, cr_loss=0.2936, attn_decoder_loss=0.2077, over 29358.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1112, cr_loss=0.3517, attn_decoder_loss=0.2395, over 5797361.89 frames. ], batch size: 67, lr: 2.50e-03, grad_scale: 32.0 2024-09-20 01:31:15,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-09-20 01:31:15,508 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2024-09-20 01:31:30,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=786340.0, ans=0.0 2024-09-20 01:31:37,704 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 8.592e+01 9.152e+01 9.700e+01 1.620e+02, threshold=1.830e+02, percent-clipped=0.0 2024-09-20 01:31:48,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=786380.0, ans=0.0 2024-09-20 01:31:59,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786420.0, ans=0.1 2024-09-20 01:32:27,931 INFO [train.py:1198] (0/2) Epoch 44, batch 2050, loss[loss=0.2122, ctc_loss=0.1005, cr_loss=0.3195, attn_decoder_loss=0.2175, over 29443.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1104, cr_loss=0.3503, attn_decoder_loss=0.2383, over 5788997.11 frames. 
], batch size: 70, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:32:46,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.41 vs. limit=10.0
2024-09-20 01:33:07,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=786580.0, ans=0.0
2024-09-20 01:33:07,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=786580.0, ans=0.0
2024-09-20 01:33:11,849 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 01:33:13,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=786620.0, ans=0.2
2024-09-20 01:33:18,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=786620.0, ans=0.125
2024-09-20 01:33:28,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=786660.0, ans=0.1
2024-09-20 01:33:44,954 INFO [train.py:1198] (0/2) Epoch 44, batch 2100, loss[loss=0.2273, ctc_loss=0.1086, cr_loss=0.3473, attn_decoder_loss=0.2328, over 29756.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1093, cr_loss=0.3484, attn_decoder_loss=0.2373, over 5801082.62 frames. ], batch size: 81, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:34:00,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=786740.0, ans=0.2
2024-09-20 01:34:00,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.13 vs. limit=15.0
2024-09-20 01:34:10,461 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.592e+01 8.953e+01 9.546e+01 1.075e+02, threshold=1.791e+02, percent-clipped=0.0
2024-09-20 01:34:30,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=786820.0, ans=0.125
2024-09-20 01:34:58,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=786900.0, ans=0.025
2024-09-20 01:34:59,829 INFO [train.py:1198] (0/2) Epoch 44, batch 2150, loss[loss=0.2238, ctc_loss=0.09986, cr_loss=0.3431, attn_decoder_loss=0.23, over 29484.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.109, cr_loss=0.3478, attn_decoder_loss=0.2368, over 5816411.87 frames. ], batch size: 78, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:35:11,612 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0
2024-09-20 01:35:37,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=786980.0, ans=0.0
2024-09-20 01:36:02,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0
2024-09-20 01:36:17,944 INFO [train.py:1198] (0/2) Epoch 44, batch 2200, loss[loss=0.2315, ctc_loss=0.1053, cr_loss=0.3467, attn_decoder_loss=0.2379, over 29614.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1093, cr_loss=0.348, attn_decoder_loss=0.237, over 5813032.49 frames. ], batch size: 86, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:36:24,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=787100.0, ans=0.0
2024-09-20 01:36:30,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=787100.0, ans=0.1
2024-09-20 01:36:35,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=787140.0, ans=0.125
2024-09-20 01:36:42,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=787140.0, ans=0.0
2024-09-20 01:36:43,208 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.551e+01 8.996e+01 9.508e+01 1.674e+02, threshold=1.799e+02, percent-clipped=0.0
2024-09-20 01:36:44,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.11 vs. limit=22.5
2024-09-20 01:36:54,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=787180.0, ans=0.0
2024-09-20 01:37:03,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=787220.0, ans=0.2
2024-09-20 01:37:12,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=787220.0, ans=0.0
2024-09-20 01:37:34,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=787300.0, ans=0.1
2024-09-20 01:37:35,608 INFO [train.py:1198] (0/2) Epoch 44, batch 2250, loss[loss=0.2398, ctc_loss=0.1133, cr_loss=0.3482, attn_decoder_loss=0.2461, over 29710.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1089, cr_loss=0.3467, attn_decoder_loss=0.2369, over 5812907.59 frames. ], batch size: 82, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:37:43,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=787300.0, ans=0.125
2024-09-20 01:38:04,581 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 01:38:10,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2024-09-20 01:38:13,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=787380.0, ans=0.0
2024-09-20 01:38:17,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=787380.0, ans=0.125
2024-09-20 01:38:35,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=787460.0, ans=0.0
2024-09-20 01:38:44,060 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5
2024-09-20 01:38:50,742 INFO [train.py:1198] (0/2) Epoch 44, batch 2300, loss[loss=0.2118, ctc_loss=0.103, cr_loss=0.3333, attn_decoder_loss=0.2165, over 29340.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.109, cr_loss=0.3473, attn_decoder_loss=0.2363, over 5800208.47 frames. ], batch size: 71, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:38:54,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=787500.0, ans=0.1
2024-09-20 01:39:03,575 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.75 vs. limit=12.0
2024-09-20 01:39:10,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=787540.0, ans=0.04949747468305833
2024-09-20 01:39:13,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=787540.0, ans=0.125
2024-09-20 01:39:18,228 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 8.668e+01 9.192e+01 9.767e+01 1.748e+02, threshold=1.838e+02, percent-clipped=0.0
2024-09-20 01:39:31,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0
2024-09-20 01:39:53,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=787660.0, ans=0.025
2024-09-20 01:40:08,548 INFO [train.py:1198] (0/2) Epoch 44, batch 2350, loss[loss=0.2425, ctc_loss=0.1154, cr_loss=0.3788, attn_decoder_loss=0.2482, over 29685.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1096, cr_loss=0.3487, attn_decoder_loss=0.2368, over 5805614.50 frames. ], batch size: 83, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:40:09,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0
2024-09-20 01:40:23,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=787740.0, ans=0.025
2024-09-20 01:40:28,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=787740.0, ans=0.0
2024-09-20 01:40:30,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=22.5
2024-09-20 01:40:59,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.57 vs. limit=15.0
2024-09-20 01:41:26,339 INFO [train.py:1198] (0/2) Epoch 44, batch 2400, loss[loss=0.2126, ctc_loss=0.09099, cr_loss=0.2975, attn_decoder_loss=0.2195, over 29543.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1098, cr_loss=0.349, attn_decoder_loss=0.2372, over 5808715.85 frames. ], batch size: 76, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:41:38,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=787900.0, ans=0.0
2024-09-20 01:41:50,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=787940.0, ans=0.2
2024-09-20 01:41:53,424 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.728e+01 9.218e+01 9.758e+01 1.607e+02, threshold=1.844e+02, percent-clipped=0.0
2024-09-20 01:42:42,339 INFO [train.py:1198] (0/2) Epoch 44, batch 2450, loss[loss=0.2329, ctc_loss=0.1075, cr_loss=0.3539, attn_decoder_loss=0.239, over 29718.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1101, cr_loss=0.3492, attn_decoder_loss=0.2381, over 5785394.76 frames. ], batch size: 82, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:42:48,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=788100.0, ans=0.125
2024-09-20 01:43:02,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=788140.0, ans=0.07
2024-09-20 01:43:32,829 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-20 01:43:32,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=788220.0, ans=0.0
2024-09-20 01:43:59,853 INFO [train.py:1198] (0/2) Epoch 44, batch 2500, loss[loss=0.2487, ctc_loss=0.1252, cr_loss=0.382, attn_decoder_loss=0.2539, over 29618.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1104, cr_loss=0.3493, attn_decoder_loss=0.2382, over 5794909.70 frames. ], batch size: 86, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:44:14,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=22.5
2024-09-20 01:44:21,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=788340.0, ans=0.125
2024-09-20 01:44:26,963 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.639e+01 9.215e+01 9.726e+01 1.262e+02, threshold=1.843e+02, percent-clipped=0.0
2024-09-20 01:44:27,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0
2024-09-20 01:44:35,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.74 vs. limit=10.0
2024-09-20 01:44:39,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=788380.0, ans=0.1
2024-09-20 01:44:59,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=788460.0, ans=0.125
2024-09-20 01:45:09,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=788460.0, ans=0.05
2024-09-20 01:45:17,642 INFO [train.py:1198] (0/2) Epoch 44, batch 2550, loss[loss=0.2073, ctc_loss=0.09275, cr_loss=0.3073, attn_decoder_loss=0.2132, over 29311.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1103, cr_loss=0.3491, attn_decoder_loss=0.2381, over 5798227.42 frames. ], batch size: 67, lr: 2.50e-03, grad_scale: 8.0
2024-09-20 01:45:17,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=788500.0, ans=0.125
2024-09-20 01:45:49,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=788580.0, ans=0.5
2024-09-20 01:45:49,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.67 vs. limit=15.0
2024-09-20 01:46:01,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=788620.0, ans=0.2
2024-09-20 01:46:10,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=788620.0, ans=0.125
2024-09-20 01:46:26,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=788660.0, ans=0.125
2024-09-20 01:46:32,703 INFO [train.py:1198] (0/2) Epoch 44, batch 2600, loss[loss=0.2308, ctc_loss=0.1121, cr_loss=0.3634, attn_decoder_loss=0.2359, over 29429.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1108, cr_loss=0.3504, attn_decoder_loss=0.2387, over 5794862.69 frames. ], batch size: 78, lr: 2.50e-03, grad_scale: 8.0
2024-09-20 01:46:43,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=788700.0, ans=0.0
2024-09-20 01:46:53,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=788740.0, ans=0.025
2024-09-20 01:47:03,304 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 8.497e+01 9.008e+01 9.570e+01 2.359e+02, threshold=1.802e+02, percent-clipped=1.0
2024-09-20 01:47:09,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=788780.0, ans=0.125
2024-09-20 01:47:14,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=788780.0, ans=0.125
2024-09-20 01:47:37,843 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0
2024-09-20 01:47:43,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=788860.0, ans=0.025
2024-09-20 01:47:50,464 INFO [train.py:1198] (0/2) Epoch 44, batch 2650, loss[loss=0.2477, ctc_loss=0.1148, cr_loss=0.3606, attn_decoder_loss=0.2544, over 29274.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1108, cr_loss=0.3502, attn_decoder_loss=0.2389, over 5801028.37 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 8.0
2024-09-20 01:48:52,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=789060.0, ans=0.125
2024-09-20 01:49:07,981 INFO [train.py:1198] (0/2) Epoch 44, batch 2700, loss[loss=0.2504, ctc_loss=0.1197, cr_loss=0.3745, attn_decoder_loss=0.2566, over 29543.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1108, cr_loss=0.3505, attn_decoder_loss=0.2391, over 5796874.56 frames. ], batch size: 87, lr: 2.50e-03, grad_scale: 8.0
2024-09-20 01:49:20,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=789100.0, ans=0.125
2024-09-20 01:49:21,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=789140.0, ans=0.0
2024-09-20 01:49:33,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=789140.0, ans=0.125
2024-09-20 01:49:36,539 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.638e+01 9.038e+01 9.626e+01 7.105e+02, threshold=1.808e+02, percent-clipped=1.0
2024-09-20 01:49:38,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=789180.0, ans=0.0
2024-09-20 01:50:11,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=789260.0, ans=0.125
2024-09-20 01:50:14,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=789260.0, ans=0.125
2024-09-20 01:50:23,488 INFO [train.py:1198] (0/2) Epoch 44, batch 2750, loss[loss=0.2217, ctc_loss=0.1096, cr_loss=0.3577, attn_decoder_loss=0.2262, over 29508.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1101, cr_loss=0.3491, attn_decoder_loss=0.2379, over 5794569.28 frames. ], batch size: 75, lr: 2.50e-03, grad_scale: 8.0
2024-09-20 01:50:37,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=789340.0, ans=0.1
2024-09-20 01:50:42,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=789340.0, ans=0.025
2024-09-20 01:50:55,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=789380.0, ans=0.125
2024-09-20 01:51:10,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=789420.0, ans=0.0
2024-09-20 01:51:14,393 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=12.0
2024-09-20 01:51:19,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=789420.0, ans=0.125
2024-09-20 01:51:21,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=789420.0, ans=0.125
2024-09-20 01:51:41,198 INFO [train.py:1198] (0/2) Epoch 44, batch 2800, loss[loss=0.2575, ctc_loss=0.1406, cr_loss=0.3884, attn_decoder_loss=0.2618, over 20265.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1105, cr_loss=0.3496, attn_decoder_loss=0.2381, over 5775863.06 frames. ], batch size: 210, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:51:41,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=789500.0, ans=0.125
2024-09-20 01:51:48,958 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 01:51:53,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=789500.0, ans=0.125
2024-09-20 01:52:03,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=789540.0, ans=0.125
2024-09-20 01:52:09,703 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 8.779e+01 9.114e+01 9.644e+01 1.703e+02, threshold=1.823e+02, percent-clipped=0.0
2024-09-20 01:52:19,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=789580.0, ans=0.125
2024-09-20 01:52:39,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=789620.0, ans=0.04949747468305833
2024-09-20 01:52:40,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=789660.0, ans=0.125
2024-09-20 01:52:58,696 INFO [train.py:1198] (0/2) Epoch 44, batch 2850, loss[loss=0.2265, ctc_loss=0.1063, cr_loss=0.3398, attn_decoder_loss=0.2323, over 29523.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1109, cr_loss=0.3505, attn_decoder_loss=0.2384, over 5762266.45 frames. ], batch size: 77, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 01:53:17,185 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 01:53:22,029 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.56 vs. limit=15.0
2024-09-20 01:53:30,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=789780.0, ans=0.125
2024-09-20 01:53:41,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=789780.0, ans=0.0
2024-09-20 01:53:42,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=789820.0, ans=0.2
2024-09-20 01:53:50,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=789820.0, ans=0.2
2024-09-20 01:54:08,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=789860.0, ans=0.2
2024-09-20 01:54:11,717 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.89 vs. limit=10.0
2024-09-20 01:54:13,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.12 vs. limit=10.0
2024-09-20 01:54:13,955 INFO [train.py:1198] (0/2) Epoch 44, batch 2900, loss[loss=0.2327, ctc_loss=0.1107, cr_loss=0.3448, attn_decoder_loss=0.2386, over 29401.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1108, cr_loss=0.3508, attn_decoder_loss=0.239, over 5787849.30 frames. ], batch size: 79, lr: 2.50e-03, grad_scale: 8.0
2024-09-20 01:54:23,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=789900.0, ans=0.0
2024-09-20 01:54:24,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=789900.0, ans=0.125
2024-09-20 01:54:37,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=789940.0, ans=0.1
2024-09-20 01:54:46,280 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.571e+01 8.849e+01 9.680e+01 1.963e+02, threshold=1.770e+02, percent-clipped=2.0
2024-09-20 01:54:47,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0
2024-09-20 01:55:03,163 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 01:55:03,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0
2024-09-20 01:55:13,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=790020.0, ans=0.125
2024-09-20 01:55:15,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=790060.0, ans=0.0
2024-09-20 01:55:20,256 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0
2024-09-20 01:55:31,618 INFO [train.py:1198] (0/2) Epoch 44, batch 2950, loss[loss=0.2228, ctc_loss=0.1079, cr_loss=0.3563, attn_decoder_loss=0.2277, over 29510.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1102, cr_loss=0.3495, attn_decoder_loss=0.2379, over 5782380.57 frames. ], batch size: 75, lr: 2.50e-03, grad_scale: 8.0
2024-09-20 01:55:41,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=790100.0, ans=0.125
2024-09-20 01:55:50,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=790140.0, ans=0.125
2024-09-20 01:55:56,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=790140.0, ans=0.125
2024-09-20 01:56:08,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=790180.0, ans=0.0
2024-09-20 01:56:14,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=790180.0, ans=0.1
2024-09-20 01:56:24,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.89 vs. limit=22.5
2024-09-20 01:56:25,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=790220.0, ans=0.125
2024-09-20 01:56:33,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.33 vs. limit=10.0
2024-09-20 01:56:49,773 INFO [train.py:1198] (0/2) Epoch 44, batch 3000, loss[loss=0.2281, ctc_loss=0.1076, cr_loss=0.3437, attn_decoder_loss=0.2338, over 29759.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1104, cr_loss=0.35, attn_decoder_loss=0.2378, over 5782390.45 frames. ], batch size: 81, lr: 2.50e-03, grad_scale: 8.0
2024-09-20 01:56:49,773 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-20 01:57:08,098 INFO [train.py:1230] (0/2) Epoch 44, validation: loss=0.2127, ctc_loss=0.03705, cr_loss=7.369e-15, attn_decoder_loss=0.2322, over 944034.00 frames.
2024-09-20 01:57:08,098 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-20 01:57:14,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=790300.0, ans=0.125
2024-09-20 01:57:18,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0
2024-09-20 01:57:26,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=790340.0, ans=0.125
2024-09-20 01:57:37,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.93 vs. limit=12.0
2024-09-20 01:57:38,255 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.680e+01 9.147e+01 9.757e+01 3.916e+02, threshold=1.829e+02, percent-clipped=1.0
2024-09-20 01:57:55,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=790420.0, ans=0.0
2024-09-20 01:58:01,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=790420.0, ans=0.125
2024-09-20 01:58:11,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=790460.0, ans=0.125
2024-09-20 01:58:12,360 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0
2024-09-20 01:58:26,517 INFO [train.py:1198] (0/2) Epoch 44, batch 3050, loss[loss=0.2276, ctc_loss=0.1086, cr_loss=0.3537, attn_decoder_loss=0.2329, over 29521.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1108, cr_loss=0.3508, attn_decoder_loss=0.2386, over 5776523.32 frames. ], batch size: 76, lr: 2.50e-03, grad_scale: 8.0
2024-09-20 01:58:28,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=790500.0, ans=0.0
2024-09-20 01:58:28,928 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.19 vs. limit=15.0
2024-09-20 01:58:36,355 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0
2024-09-20 01:59:10,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=790620.0, ans=0.125
2024-09-20 01:59:15,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=790620.0, ans=0.2
2024-09-20 01:59:21,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=790620.0, ans=0.2
2024-09-20 01:59:21,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=790620.0, ans=0.0
2024-09-20 01:59:36,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=790660.0, ans=0.125
2024-09-20 01:59:41,914 INFO [train.py:1198] (0/2) Epoch 44, batch 3100, loss[loss=0.2374, ctc_loss=0.1128, cr_loss=0.3476, attn_decoder_loss=0.2435, over 29244.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1105, cr_loss=0.3504, attn_decoder_loss=0.2381, over 5777408.42 frames. ], batch size: 100, lr: 2.50e-03, grad_scale: 8.0
2024-09-20 01:59:49,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=790700.0, ans=0.125
2024-09-20 02:00:11,908 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.501e+01 8.989e+01 9.639e+01 2.477e+02, threshold=1.798e+02, percent-clipped=1.0
2024-09-20 02:00:13,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=790780.0, ans=0.125
2024-09-20 02:00:16,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=790780.0, ans=0.125
2024-09-20 02:00:31,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=790820.0, ans=0.125
2024-09-20 02:00:33,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=790820.0, ans=0.0
2024-09-20 02:00:41,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=790820.0, ans=10.0
2024-09-20 02:00:46,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=790860.0, ans=0.2
2024-09-20 02:00:47,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=790860.0, ans=0.125
2024-09-20 02:00:49,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=790860.0, ans=0.09899494936611666
2024-09-20 02:00:59,880 INFO [train.py:1198] (0/2) Epoch 44, batch 3150, loss[loss=0.2501, ctc_loss=0.1259, cr_loss=0.3775, attn_decoder_loss=0.2555, over 28857.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.11, cr_loss=0.3493, attn_decoder_loss=0.2379, over 5783611.69 frames. ], batch size: 104, lr: 2.50e-03, grad_scale: 8.0
2024-09-20 02:01:18,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=790940.0, ans=0.125
2024-09-20 02:01:24,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=790940.0, ans=0.0
2024-09-20 02:01:31,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=790980.0, ans=0.125
2024-09-20 02:01:40,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=790980.0, ans=0.125
2024-09-20 02:01:51,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=791020.0, ans=0.1
2024-09-20 02:02:17,232 INFO [train.py:1198] (0/2) Epoch 44, batch 3200, loss[loss=0.2358, ctc_loss=0.1133, cr_loss=0.3553, attn_decoder_loss=0.2415, over 29417.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1098, cr_loss=0.3489, attn_decoder_loss=0.2376, over 5794245.09 frames. ], batch size: 79, lr: 2.50e-03, grad_scale: 16.0
2024-09-20 02:02:23,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=791100.0, ans=0.0
2024-09-20 02:02:28,131 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 02:02:37,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=791140.0, ans=0.125
2024-09-20 02:02:43,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=791140.0, ans=0.0
2024-09-20 02:02:47,457 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.647e+01 9.072e+01 9.601e+01 1.731e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-20 02:03:00,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=791180.0, ans=0.0
2024-09-20 02:03:33,119 INFO [train.py:1198] (0/2) Epoch 44, batch 3250, loss[loss=0.2364, ctc_loss=0.1132, cr_loss=0.3388, attn_decoder_loss=0.2426, over 29707.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1101, cr_loss=0.3493, attn_decoder_loss=0.2381, over 5800769.10 frames. ], batch size: 84, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:03:48,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=791340.0, ans=0.125
2024-09-20 02:03:58,487 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.90 vs. limit=15.0
2024-09-20 02:04:01,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0
2024-09-20 02:04:02,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0
2024-09-20 02:04:50,969 INFO [train.py:1198] (0/2) Epoch 44, batch 3300, loss[loss=0.2384, ctc_loss=0.1043, cr_loss=0.3224, attn_decoder_loss=0.2462, over 28464.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1094, cr_loss=0.3474, attn_decoder_loss=0.237, over 5797531.82 frames. ], batch size: 111, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:04:55,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=791500.0, ans=0.125
2024-09-20 02:05:22,614 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.134e+01 8.597e+01 9.177e+01 9.695e+01 2.585e+02, threshold=1.835e+02, percent-clipped=1.0
2024-09-20 02:05:24,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=791580.0, ans=0.025
2024-09-20 02:05:39,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=791620.0, ans=0.0
2024-09-20 02:06:07,995 INFO [train.py:1198] (0/2) Epoch 44, batch 3350, loss[loss=0.2408, ctc_loss=0.1106, cr_loss=0.3406, attn_decoder_loss=0.2477, over 28865.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.11, cr_loss=0.3488, attn_decoder_loss=0.2379, over 5774971.14 frames. ], batch size: 104, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:06:20,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791700.0, ans=0.1
2024-09-20 02:06:40,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.33 vs. limit=15.0
2024-09-20 02:06:47,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=791780.0, ans=0.2
2024-09-20 02:07:06,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=15.0
2024-09-20 02:07:18,802 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=15.0
2024-09-20 02:07:23,739 INFO [train.py:1198] (0/2) Epoch 44, batch 3400, loss[loss=0.2045, ctc_loss=0.08879, cr_loss=0.311, attn_decoder_loss=0.2104, over 29345.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1103, cr_loss=0.3492, attn_decoder_loss=0.2379, over 5768646.34 frames. ], batch size: 67, lr: 2.49e-03, grad_scale: 8.0
2024-09-20 02:07:45,707 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.08 vs. limit=6.0
2024-09-20 02:07:51,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=791940.0, ans=0.0
2024-09-20 02:07:55,387 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.682e+01 9.111e+01 9.724e+01 2.135e+02, threshold=1.822e+02, percent-clipped=1.0
2024-09-20 02:08:07,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.86 vs.
limit=6.0 2024-09-20 02:08:09,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=792020.0, ans=0.2 2024-09-20 02:08:10,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=792020.0, ans=0.05 2024-09-20 02:08:41,413 INFO [train.py:1198] (0/2) Epoch 44, batch 3450, loss[loss=0.2345, ctc_loss=0.101, cr_loss=0.3214, attn_decoder_loss=0.2422, over 28390.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1103, cr_loss=0.3496, attn_decoder_loss=0.2382, over 5777195.75 frames. ], batch size: 111, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:08:55,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=792140.0, ans=0.95 2024-09-20 02:09:05,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=792140.0, ans=0.125 2024-09-20 02:09:31,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=792220.0, ans=0.0 2024-09-20 02:09:51,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=792260.0, ans=0.125 2024-09-20 02:09:58,573 INFO [train.py:1198] (0/2) Epoch 44, batch 3500, loss[loss=0.2013, ctc_loss=0.08756, cr_loss=0.3059, attn_decoder_loss=0.2071, over 29314.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1098, cr_loss=0.3488, attn_decoder_loss=0.2377, over 5778131.00 frames. 
], batch size: 71, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:10:08,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=792300.0, ans=0.0 2024-09-20 02:10:10,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2024-09-20 02:10:15,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=792340.0, ans=0.0 2024-09-20 02:10:30,314 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.582e+01 8.980e+01 9.639e+01 1.678e+02, threshold=1.796e+02, percent-clipped=0.0 2024-09-20 02:10:32,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=792380.0, ans=0.125 2024-09-20 02:10:49,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=792420.0, ans=0.125 2024-09-20 02:10:56,431 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.20 vs. limit=12.0 2024-09-20 02:11:13,201 INFO [train.py:1198] (0/2) Epoch 44, batch 3550, loss[loss=0.2361, ctc_loss=0.1085, cr_loss=0.3522, attn_decoder_loss=0.2424, over 29716.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1097, cr_loss=0.3484, attn_decoder_loss=0.2377, over 5783843.73 frames. 
], batch size: 89, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:11:14,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=792500.0, ans=0.125 2024-09-20 02:11:33,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=792540.0, ans=0.025 2024-09-20 02:11:44,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=792580.0, ans=0.0 2024-09-20 02:12:05,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=792620.0, ans=0.125 2024-09-20 02:12:05,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=792620.0, ans=0.025 2024-09-20 02:12:07,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=792620.0, ans=0.1 2024-09-20 02:12:22,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=792660.0, ans=0.1 2024-09-20 02:12:26,921 INFO [train.py:1198] (0/2) Epoch 44, batch 3600, loss[loss=0.2247, ctc_loss=0.1041, cr_loss=0.3353, attn_decoder_loss=0.2307, over 29506.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.11, cr_loss=0.3494, attn_decoder_loss=0.2378, over 5792692.21 frames. 
], batch size: 77, lr: 2.49e-03, grad_scale: 16.0 2024-09-20 02:12:58,440 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.550e+01 9.094e+01 9.613e+01 3.759e+02, threshold=1.819e+02, percent-clipped=1.0 2024-09-20 02:12:58,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=792780.0, ans=0.125 2024-09-20 02:13:08,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=792780.0, ans=0.125 2024-09-20 02:13:30,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=792860.0, ans=0.5 2024-09-20 02:13:41,887 INFO [train.py:1198] (0/2) Epoch 44, batch 3650, loss[loss=0.2483, ctc_loss=0.1207, cr_loss=0.3688, attn_decoder_loss=0.2543, over 29510.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1096, cr_loss=0.3481, attn_decoder_loss=0.2373, over 5794979.23 frames. ], batch size: 90, lr: 2.49e-03, grad_scale: 16.0 2024-09-20 02:14:06,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=792940.0, ans=0.125 2024-09-20 02:14:58,107 INFO [train.py:1198] (0/2) Epoch 44, batch 3700, loss[loss=0.2417, ctc_loss=0.1152, cr_loss=0.3492, attn_decoder_loss=0.248, over 29719.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1095, cr_loss=0.3479, attn_decoder_loss=0.2374, over 5804335.29 frames. 
], batch size: 84, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:15:13,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793140.0, ans=0.1 2024-09-20 02:15:32,681 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.629e+01 9.056e+01 9.534e+01 1.565e+02, threshold=1.811e+02, percent-clipped=0.0 2024-09-20 02:16:09,061 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.70 vs. limit=15.0 2024-09-20 02:16:09,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.92 vs. limit=12.0 2024-09-20 02:16:14,384 INFO [train.py:1198] (0/2) Epoch 44, batch 3750, loss[loss=0.2149, ctc_loss=0.0995, cr_loss=0.3329, attn_decoder_loss=0.2203, over 29360.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1095, cr_loss=0.3477, attn_decoder_loss=0.2373, over 5807990.75 frames. ], batch size: 67, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:16:22,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=793300.0, ans=0.0 2024-09-20 02:16:25,545 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.43 vs. limit=10.0 2024-09-20 02:16:36,112 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.23 vs. 
limit=15.0 2024-09-20 02:16:44,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=793380.0, ans=0.0 2024-09-20 02:16:50,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=793380.0, ans=0.125 2024-09-20 02:17:24,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=793460.0, ans=0.5 2024-09-20 02:17:28,370 INFO [train.py:1198] (0/2) Epoch 44, batch 3800, loss[loss=0.2376, ctc_loss=0.1097, cr_loss=0.3594, attn_decoder_loss=0.2438, over 29623.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1092, cr_loss=0.3473, attn_decoder_loss=0.237, over 5798920.21 frames. ], batch size: 86, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:17:39,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.23 vs. limit=10.0 2024-09-20 02:17:44,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=793540.0, ans=0.125 2024-09-20 02:17:46,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.16 vs. limit=22.5 2024-09-20 02:17:50,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=793540.0, ans=0.1 2024-09-20 02:18:01,001 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.351e+01 9.103e+01 9.836e+01 3.154e+02, threshold=1.821e+02, percent-clipped=2.0 2024-09-20 02:18:18,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.05 vs. 
limit=15.0 2024-09-20 02:18:30,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.50 vs. limit=15.0 2024-09-20 02:18:42,695 INFO [train.py:1198] (0/2) Epoch 44, batch 3850, loss[loss=0.2305, ctc_loss=0.1042, cr_loss=0.345, attn_decoder_loss=0.2369, over 29246.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1089, cr_loss=0.347, attn_decoder_loss=0.2367, over 5814184.03 frames. ], batch size: 100, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:18:53,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=793700.0, ans=0.0 2024-09-20 02:18:56,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=793740.0, ans=10.0 2024-09-20 02:19:00,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=793740.0, ans=0.125 2024-09-20 02:19:06,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=793740.0, ans=0.2 2024-09-20 02:19:16,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=793780.0, ans=0.125 2024-09-20 02:19:31,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=793820.0, ans=0.125 2024-09-20 02:19:46,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=793860.0, ans=0.0 2024-09-20 02:19:53,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.18 vs. 
limit=15.0 2024-09-20 02:19:58,452 INFO [train.py:1198] (0/2) Epoch 44, batch 3900, loss[loss=0.2411, ctc_loss=0.1078, cr_loss=0.3461, attn_decoder_loss=0.2482, over 29628.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1093, cr_loss=0.3476, attn_decoder_loss=0.2372, over 5818259.17 frames. ], batch size: 86, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:20:04,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=793900.0, ans=0.125 2024-09-20 02:20:14,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=793940.0, ans=0.1 2024-09-20 02:20:15,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=793940.0, ans=0.1 2024-09-20 02:20:31,106 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 8.637e+01 9.253e+01 9.637e+01 1.224e+02, threshold=1.851e+02, percent-clipped=0.0 2024-09-20 02:20:31,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=793980.0, ans=0.09899494936611666 2024-09-20 02:20:33,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=793980.0, ans=0.125 2024-09-20 02:21:14,101 INFO [train.py:1198] (0/2) Epoch 44, batch 3950, loss[loss=0.2387, ctc_loss=0.1128, cr_loss=0.3647, attn_decoder_loss=0.2446, over 29470.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1093, cr_loss=0.348, attn_decoder_loss=0.2372, over 5837074.36 frames. 
], batch size: 97, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:21:26,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=794100.0, ans=0.025 2024-09-20 02:21:42,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=794180.0, ans=0.07 2024-09-20 02:21:45,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=794180.0, ans=0.125 2024-09-20 02:21:46,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=794180.0, ans=0.04949747468305833 2024-09-20 02:21:58,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=794220.0, ans=0.0 2024-09-20 02:22:16,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=794260.0, ans=0.125 2024-09-20 02:22:20,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=794260.0, ans=0.2 2024-09-20 02:22:27,455 INFO [train.py:1198] (0/2) Epoch 44, batch 4000, loss[loss=0.2216, ctc_loss=0.1034, cr_loss=0.3446, attn_decoder_loss=0.2271, over 29500.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1097, cr_loss=0.3484, attn_decoder_loss=0.2374, over 5812205.81 frames. 
], batch size: 74, lr: 2.49e-03, grad_scale: 16.0 2024-09-20 02:22:32,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=794300.0, ans=0.2 2024-09-20 02:22:33,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=794300.0, ans=0.0 2024-09-20 02:22:39,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=794300.0, ans=0.125 2024-09-20 02:22:54,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=794340.0, ans=0.125 2024-09-20 02:23:01,247 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 8.690e+01 9.242e+01 9.635e+01 1.653e+02, threshold=1.848e+02, percent-clipped=0.0 2024-09-20 02:23:25,290 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=22.5 2024-09-20 02:23:40,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=794500.0, ans=0.1 2024-09-20 02:23:41,488 INFO [train.py:1198] (0/2) Epoch 44, batch 4050, loss[loss=0.2437, ctc_loss=0.1208, cr_loss=0.3474, attn_decoder_loss=0.2496, over 20775.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1096, cr_loss=0.3481, attn_decoder_loss=0.2372, over 5796790.95 frames. 
], batch size: 209, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:24:03,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=794540.0, ans=0.125 2024-09-20 02:24:03,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794540.0, ans=0.1 2024-09-20 02:24:16,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=794580.0, ans=0.125 2024-09-20 02:24:45,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=794660.0, ans=0.2 2024-09-20 02:24:51,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=794660.0, ans=0.0 2024-09-20 02:24:53,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=794660.0, ans=0.125 2024-09-20 02:24:56,016 INFO [train.py:1198] (0/2) Epoch 44, batch 4100, loss[loss=0.2453, ctc_loss=0.1235, cr_loss=0.3894, attn_decoder_loss=0.2501, over 29487.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1103, cr_loss=0.3502, attn_decoder_loss=0.2376, over 5792236.86 frames. 
], batch size: 90, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:24:59,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=794700.0, ans=0.2 2024-09-20 02:25:04,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=794700.0, ans=0.025 2024-09-20 02:25:06,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=794700.0, ans=0.125 2024-09-20 02:25:29,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=794780.0, ans=0.125 2024-09-20 02:25:30,721 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.703e+01 9.227e+01 9.918e+01 1.839e+02, threshold=1.845e+02, percent-clipped=0.0 2024-09-20 02:25:31,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794780.0, ans=0.1 2024-09-20 02:25:53,425 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.81 vs. limit=22.5 2024-09-20 02:25:56,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2024-09-20 02:26:09,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.83 vs. limit=15.0 2024-09-20 02:26:10,188 INFO [train.py:1198] (0/2) Epoch 44, batch 4150, loss[loss=0.2256, ctc_loss=0.1056, cr_loss=0.3384, attn_decoder_loss=0.2314, over 29500.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1102, cr_loss=0.3502, attn_decoder_loss=0.2373, over 5798148.74 frames. 
], batch size: 77, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:26:11,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=794900.0, ans=0.2 2024-09-20 02:26:12,592 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0 2024-09-20 02:26:13,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=794900.0, ans=0.125 2024-09-20 02:26:27,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=794940.0, ans=0.125 2024-09-20 02:27:23,536 INFO [train.py:1198] (0/2) Epoch 44, batch 4200, loss[loss=0.2459, ctc_loss=0.1224, cr_loss=0.3779, attn_decoder_loss=0.2512, over 29477.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1103, cr_loss=0.3503, attn_decoder_loss=0.2376, over 5800167.11 frames. ], batch size: 90, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:27:25,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=795100.0, ans=0.5 2024-09-20 02:27:32,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=795100.0, ans=0.125 2024-09-20 02:27:47,842 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.92 vs. 
limit=22.5 2024-09-20 02:27:51,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=795180.0, ans=0.0 2024-09-20 02:27:57,349 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 8.655e+01 9.286e+01 9.774e+01 5.497e+02, threshold=1.857e+02, percent-clipped=1.0 2024-09-20 02:28:00,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=795180.0, ans=0.2 2024-09-20 02:28:27,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=795260.0, ans=0.125 2024-09-20 02:28:27,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.71 vs. limit=15.0 2024-09-20 02:28:28,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=795260.0, ans=0.09899494936611666 2024-09-20 02:28:38,191 INFO [train.py:1198] (0/2) Epoch 44, batch 4250, loss[loss=0.212, ctc_loss=0.0886, cr_loss=0.2889, attn_decoder_loss=0.2193, over 29533.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1096, cr_loss=0.3486, attn_decoder_loss=0.2376, over 5805971.82 frames. 
], batch size: 74, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:28:44,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=795300.0, ans=0.125 2024-09-20 02:29:01,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=795340.0, ans=0.0 2024-09-20 02:29:08,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=795380.0, ans=0.125 2024-09-20 02:29:10,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=795380.0, ans=15.0 2024-09-20 02:29:20,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=795380.0, ans=0.125 2024-09-20 02:29:21,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=795420.0, ans=0.125 2024-09-20 02:29:31,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795420.0, ans=0.1 2024-09-20 02:29:36,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=795460.0, ans=0.125 2024-09-20 02:29:36,826 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.04 vs. 
limit=15.0 2024-09-20 02:29:39,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=795460.0, ans=0.125 2024-09-20 02:29:46,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=795460.0, ans=0.0 2024-09-20 02:29:49,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=795460.0, ans=0.1 2024-09-20 02:29:52,314 INFO [train.py:1198] (0/2) Epoch 44, batch 4300, loss[loss=0.2331, ctc_loss=0.1096, cr_loss=0.3373, attn_decoder_loss=0.2394, over 29516.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1093, cr_loss=0.348, attn_decoder_loss=0.2376, over 5795712.26 frames. ], batch size: 87, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:30:15,280 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. 
limit=6.0 2024-09-20 02:30:20,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=795580.0, ans=0.125 2024-09-20 02:30:26,387 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.755e+01 9.251e+01 9.683e+01 2.005e+02, threshold=1.850e+02, percent-clipped=1.0 2024-09-20 02:30:32,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=795580.0, ans=0.0 2024-09-20 02:30:41,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=795620.0, ans=10.0 2024-09-20 02:30:54,898 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:31:06,388 INFO [train.py:1198] (0/2) Epoch 44, batch 4350, loss[loss=0.2477, ctc_loss=0.1167, cr_loss=0.3563, attn_decoder_loss=0.2543, over 29500.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1119, cr_loss=0.3541, attn_decoder_loss=0.2409, over 5798103.03 frames. 
], batch size: 97, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:31:08,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=795700.0, ans=0.09899494936611666 2024-09-20 02:31:27,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=795740.0, ans=0.0 2024-09-20 02:31:31,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=795740.0, ans=0.95 2024-09-20 02:31:40,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=795780.0, ans=0.125 2024-09-20 02:31:53,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=795820.0, ans=0.0 2024-09-20 02:31:59,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0 2024-09-20 02:32:00,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=795820.0, ans=0.1 2024-09-20 02:32:06,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=795860.0, ans=0.125 2024-09-20 02:32:11,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2024-09-20 02:32:17,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=795860.0, ans=0.1 2024-09-20 02:32:20,752 INFO [train.py:1198] (0/2) Epoch 44, batch 4400, loss[loss=0.2436, ctc_loss=0.1144, cr_loss=0.3569, attn_decoder_loss=0.2501, over 27447.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.1131, cr_loss=0.3568, attn_decoder_loss=0.2428, over 5768510.48 frames. 
], batch size: 125, lr: 2.49e-03, grad_scale: 16.0 2024-09-20 02:32:29,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=795900.0, ans=0.2 2024-09-20 02:32:34,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=795940.0, ans=0.0 2024-09-20 02:32:35,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=795940.0, ans=0.125 2024-09-20 02:32:37,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=15.0 2024-09-20 02:32:53,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=795980.0, ans=0.0 2024-09-20 02:32:54,477 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.213e+01 9.042e+01 9.394e+01 9.819e+01 2.193e+02, threshold=1.879e+02, percent-clipped=1.0 2024-09-20 02:33:09,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=796020.0, ans=0.0 2024-09-20 02:33:19,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=796060.0, ans=0.2 2024-09-20 02:33:29,419 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.24 vs. limit=10.0 2024-09-20 02:33:34,337 INFO [train.py:1198] (0/2) Epoch 44, batch 4450, loss[loss=0.2622, ctc_loss=0.15, cr_loss=0.4197, attn_decoder_loss=0.2654, over 19813.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1167, cr_loss=0.3621, attn_decoder_loss=0.2449, over 5574107.50 frames. 
], batch size: 210, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:33:55,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2024-09-20 02:34:21,948 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:34:27,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=796220.0, ans=0.025 2024-09-20 02:34:33,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=796260.0, ans=0.125 2024-09-20 02:34:41,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=796260.0, ans=0.125 2024-09-20 02:34:44,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-09-20 02:34:47,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=796260.0, ans=0.025 2024-09-20 02:34:49,812 INFO [train.py:1198] (0/2) Epoch 44, batch 4500, loss[loss=0.2487, ctc_loss=0.1302, cr_loss=0.3951, attn_decoder_loss=0.2531, over 20424.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1194, cr_loss=0.3641, attn_decoder_loss=0.2465, over 5236993.14 frames. 
], batch size: 210, lr: 2.49e-03, grad_scale: 8.0 2024-09-20 02:34:53,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=796300.0, ans=0.125 2024-09-20 02:34:53,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=796300.0, ans=15.0 2024-09-20 02:35:05,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=796340.0, ans=0.0 2024-09-20 02:35:11,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=796340.0, ans=0.0 2024-09-20 02:35:17,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=796340.0, ans=0.025 2024-09-20 02:35:26,140 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.915e+01 1.070e+02 1.147e+02 1.258e+02 2.122e+02, threshold=2.294e+02, percent-clipped=1.0 2024-09-20 02:35:27,522 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-44.pt 2024-09-20 02:36:17,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=15.0 2024-09-20 02:36:17,461 INFO [train.py:1198] (0/2) Epoch 45, batch 0, loss[loss=0.2139, ctc_loss=0.09865, cr_loss=0.3261, attn_decoder_loss=0.2194, over 29639.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.09865, cr_loss=0.3261, attn_decoder_loss=0.2194, over 29639.00 frames. 
], batch size: 73, lr: 2.46e-03, grad_scale: 16.0 2024-09-20 02:36:17,461 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-20 02:36:35,782 INFO [train.py:1230] (0/2) Epoch 45, validation: loss=0.2126, ctc_loss=0.03577, cr_loss=6.589e-15, attn_decoder_loss=0.2323, over 944034.00 frames. 2024-09-20 02:36:35,783 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-20 02:36:42,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=796400.0, ans=0.125 2024-09-20 02:36:42,802 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2024-09-20 02:37:02,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=796440.0, ans=0.0 2024-09-20 02:37:08,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=796480.0, ans=0.1 2024-09-20 02:37:11,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=796480.0, ans=0.0 2024-09-20 02:37:27,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. limit=10.0 2024-09-20 02:37:30,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=796520.0, ans=0.125 2024-09-20 02:37:35,863 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.73 vs. 
limit=15.0 2024-09-20 02:37:38,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=796560.0, ans=0.2 2024-09-20 02:37:46,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=796560.0, ans=0.125 2024-09-20 02:37:51,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.22 vs. limit=15.0 2024-09-20 02:37:53,214 INFO [train.py:1198] (0/2) Epoch 45, batch 50, loss[loss=0.2135, ctc_loss=0.1006, cr_loss=0.3196, attn_decoder_loss=0.2189, over 29426.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1106, cr_loss=0.3508, attn_decoder_loss=0.2384, over 1264621.16 frames. ], batch size: 70, lr: 2.46e-03, grad_scale: 8.0 2024-09-20 02:38:07,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=796640.0, ans=0.0 2024-09-20 02:38:10,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=796640.0, ans=0.0 2024-09-20 02:38:11,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=796640.0, ans=0.5 2024-09-20 02:38:26,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.00 vs. 
limit=12.0 2024-09-20 02:38:28,738 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:38:40,694 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:38:42,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=796720.0, ans=0.125 2024-09-20 02:38:52,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=796760.0, ans=0.0 2024-09-20 02:38:54,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=796760.0, ans=0.0 2024-09-20 02:39:00,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.31 vs. limit=12.0 2024-09-20 02:39:07,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=796800.0, ans=0.125 2024-09-20 02:39:09,107 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.517e+01 8.971e+01 9.670e+01 3.092e+02, threshold=1.794e+02, percent-clipped=1.0 2024-09-20 02:39:09,133 INFO [train.py:1198] (0/2) Epoch 45, batch 100, loss[loss=0.2234, ctc_loss=0.105, cr_loss=0.3473, attn_decoder_loss=0.2288, over 29534.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1112, cr_loss=0.3513, attn_decoder_loss=0.2397, over 2250092.81 frames. 
], batch size: 76, lr: 2.46e-03, grad_scale: 8.0 2024-09-20 02:39:37,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=796880.0, ans=0.125 2024-09-20 02:39:47,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=796880.0, ans=0.125 2024-09-20 02:39:56,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=796920.0, ans=0.0 2024-09-20 02:40:15,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=796960.0, ans=0.0 2024-09-20 02:40:15,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=796960.0, ans=0.125 2024-09-20 02:40:25,370 INFO [train.py:1198] (0/2) Epoch 45, batch 150, loss[loss=0.2097, ctc_loss=0.09235, cr_loss=0.3179, attn_decoder_loss=0.2156, over 29447.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1087, cr_loss=0.3462, attn_decoder_loss=0.2374, over 3045881.53 frames. ], batch size: 70, lr: 2.46e-03, grad_scale: 8.0 2024-09-20 02:40:37,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=797000.0, ans=0.125 2024-09-20 02:40:50,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=797040.0, ans=0.125 2024-09-20 02:40:51,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=797040.0, ans=0.125 2024-09-20 02:40:55,418 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.05 vs. 
limit=12.0 2024-09-20 02:40:59,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=797080.0, ans=0.125 2024-09-20 02:41:05,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=797080.0, ans=0.125 2024-09-20 02:41:26,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=797160.0, ans=0.125 2024-09-20 02:41:38,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=797160.0, ans=0.125 2024-09-20 02:41:42,434 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.414e+01 8.795e+01 9.375e+01 1.270e+02, threshold=1.759e+02, percent-clipped=0.0 2024-09-20 02:41:42,456 INFO [train.py:1198] (0/2) Epoch 45, batch 200, loss[loss=0.2403, ctc_loss=0.1147, cr_loss=0.3657, attn_decoder_loss=0.2461, over 27196.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1082, cr_loss=0.3456, attn_decoder_loss=0.2364, over 3658071.84 frames. ], batch size: 124, lr: 2.46e-03, grad_scale: 8.0 2024-09-20 02:41:57,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=797240.0, ans=0.125 2024-09-20 02:42:18,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.53 vs. 
limit=22.5 2024-09-20 02:42:26,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=797320.0, ans=0.2 2024-09-20 02:42:46,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=797360.0, ans=0.125 2024-09-20 02:42:47,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=797360.0, ans=0.2 2024-09-20 02:42:55,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=797360.0, ans=0.0 2024-09-20 02:42:58,112 INFO [train.py:1198] (0/2) Epoch 45, batch 250, loss[loss=0.2496, ctc_loss=0.1234, cr_loss=0.3807, attn_decoder_loss=0.2552, over 29250.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1088, cr_loss=0.3471, attn_decoder_loss=0.2371, over 4139122.97 frames. ], batch size: 100, lr: 2.46e-03, grad_scale: 8.0 2024-09-20 02:42:58,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=797400.0, ans=0.0 2024-09-20 02:43:29,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=797480.0, ans=0.125 2024-09-20 02:43:58,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=797520.0, ans=0.025 2024-09-20 02:44:16,174 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 8.464e+01 8.917e+01 9.593e+01 1.535e+02, threshold=1.783e+02, percent-clipped=0.0 2024-09-20 02:44:16,202 INFO [train.py:1198] (0/2) Epoch 45, batch 300, loss[loss=0.2458, ctc_loss=0.1184, cr_loss=0.3636, attn_decoder_loss=0.2519, over 29542.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1089, cr_loss=0.3474, attn_decoder_loss=0.2372, over 4506806.23 frames. 
], batch size: 92, lr: 2.46e-03, grad_scale: 8.0 2024-09-20 02:44:46,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=797640.0, ans=0.125 2024-09-20 02:45:33,893 INFO [train.py:1198] (0/2) Epoch 45, batch 350, loss[loss=0.217, ctc_loss=0.09506, cr_loss=0.3059, attn_decoder_loss=0.2238, over 29736.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1096, cr_loss=0.349, attn_decoder_loss=0.2379, over 4793293.33 frames. ], batch size: 72, lr: 2.46e-03, grad_scale: 8.0 2024-09-20 02:45:46,110 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:45:55,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=797840.0, ans=0.1 2024-09-20 02:45:58,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=797840.0, ans=0.0 2024-09-20 02:46:19,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=797920.0, ans=0.0 2024-09-20 02:46:32,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=797960.0, ans=0.125 2024-09-20 02:46:34,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=797960.0, ans=0.125 2024-09-20 02:46:37,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=797960.0, ans=0.2 2024-09-20 02:46:42,216 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.49 vs. 
limit=15.0 2024-09-20 02:46:48,813 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.541e+01 8.980e+01 9.725e+01 1.224e+02, threshold=1.796e+02, percent-clipped=0.0 2024-09-20 02:46:48,834 INFO [train.py:1198] (0/2) Epoch 45, batch 400, loss[loss=0.2373, ctc_loss=0.123, cr_loss=0.3721, attn_decoder_loss=0.2417, over 29713.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1096, cr_loss=0.3491, attn_decoder_loss=0.2378, over 5022831.86 frames. ], batch size: 82, lr: 2.46e-03, grad_scale: 16.0 2024-09-20 02:47:05,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=798040.0, ans=0.0 2024-09-20 02:47:10,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=798040.0, ans=0.025 2024-09-20 02:47:13,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=798040.0, ans=0.0 2024-09-20 02:47:16,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=798040.0, ans=0.125 2024-09-20 02:47:17,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.69 vs. 
limit=15.0 2024-09-20 02:47:17,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=798080.0, ans=0.1 2024-09-20 02:47:30,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=798080.0, ans=0.0 2024-09-20 02:47:35,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=798120.0, ans=0.125 2024-09-20 02:48:04,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=798160.0, ans=0.2 2024-09-20 02:48:06,801 INFO [train.py:1198] (0/2) Epoch 45, batch 450, loss[loss=0.2425, ctc_loss=0.116, cr_loss=0.3668, attn_decoder_loss=0.2484, over 29704.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1096, cr_loss=0.3491, attn_decoder_loss=0.2377, over 5185610.35 frames. ], batch size: 83, lr: 2.46e-03, grad_scale: 16.0 2024-09-20 02:48:23,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=798240.0, ans=0.07 2024-09-20 02:48:34,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=798240.0, ans=0.05 2024-09-20 02:48:45,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=798280.0, ans=0.125 2024-09-20 02:48:51,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=798280.0, ans=0.025 2024-09-20 02:48:52,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.11 vs. 
limit=22.5 2024-09-20 02:48:53,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=798320.0, ans=0.125 2024-09-20 02:49:08,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=798360.0, ans=0.0 2024-09-20 02:49:11,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=798360.0, ans=0.0 2024-09-20 02:49:15,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=798360.0, ans=0.125 2024-09-20 02:49:24,763 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.412e+01 8.859e+01 9.470e+01 4.425e+02, threshold=1.772e+02, percent-clipped=1.0 2024-09-20 02:49:24,789 INFO [train.py:1198] (0/2) Epoch 45, batch 500, loss[loss=0.2521, ctc_loss=0.131, cr_loss=0.3977, attn_decoder_loss=0.2567, over 29485.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1085, cr_loss=0.3473, attn_decoder_loss=0.2366, over 5329419.65 frames. ], batch size: 94, lr: 2.46e-03, grad_scale: 16.0 2024-09-20 02:49:30,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-20 02:49:43,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=798440.0, ans=0.125 2024-09-20 02:49:46,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.95 vs. 
limit=12.0 2024-09-20 02:49:53,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=798480.0, ans=0.0 2024-09-20 02:50:10,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=798520.0, ans=0.07 2024-09-20 02:50:33,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798560.0, ans=0.1 2024-09-20 02:50:38,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.61 vs. limit=15.0 2024-09-20 02:50:40,212 INFO [train.py:1198] (0/2) Epoch 45, batch 550, loss[loss=0.2434, ctc_loss=0.1137, cr_loss=0.3614, attn_decoder_loss=0.2498, over 28835.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1093, cr_loss=0.3484, attn_decoder_loss=0.237, over 5423505.42 frames. ], batch size: 104, lr: 2.46e-03, grad_scale: 8.0 2024-09-20 02:50:58,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=798640.0, ans=0.125 2024-09-20 02:51:09,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=798680.0, ans=0.0 2024-09-20 02:51:10,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=798680.0, ans=0.125 2024-09-20 02:51:20,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.54 vs. 
limit=12.0 2024-09-20 02:51:32,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=798720.0, ans=0.0 2024-09-20 02:51:43,021 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:51:57,990 INFO [train.py:1198] (0/2) Epoch 45, batch 600, loss[loss=0.2393, ctc_loss=0.1157, cr_loss=0.3576, attn_decoder_loss=0.2451, over 29264.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1092, cr_loss=0.3484, attn_decoder_loss=0.237, over 5511075.43 frames. ], batch size: 100, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:51:59,417 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.595e+01 9.062e+01 9.748e+01 3.862e+02, threshold=1.812e+02, percent-clipped=2.0 2024-09-20 02:52:35,764 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.20 vs. limit=22.5 2024-09-20 02:52:53,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=798920.0, ans=0.125 2024-09-20 02:53:01,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798960.0, ans=0.1 2024-09-20 02:53:02,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=798960.0, ans=0.125 2024-09-20 02:53:07,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=798960.0, ans=0.0 2024-09-20 02:53:13,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=799000.0, ans=0.125 2024-09-20 02:53:14,710 INFO [train.py:1198] (0/2) Epoch 45, batch 650, loss[loss=0.2317, ctc_loss=0.1018, cr_loss=0.3205, 
attn_decoder_loss=0.239, over 29739.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1085, cr_loss=0.3464, attn_decoder_loss=0.2364, over 5587558.72 frames. ], batch size: 81, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:53:55,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=799080.0, ans=0.125 2024-09-20 02:54:06,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=799120.0, ans=0.2 2024-09-20 02:54:30,733 INFO [train.py:1198] (0/2) Epoch 45, batch 700, loss[loss=0.2278, ctc_loss=0.1083, cr_loss=0.3462, attn_decoder_loss=0.2334, over 29534.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1089, cr_loss=0.3477, attn_decoder_loss=0.2369, over 5637988.76 frames. ], batch size: 76, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:54:32,187 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.994e+01 8.670e+01 9.106e+01 9.852e+01 1.537e+02, threshold=1.821e+02, percent-clipped=0.0 2024-09-20 02:54:46,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=799240.0, ans=0.0 2024-09-20 02:55:07,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=799280.0, ans=0.125 2024-09-20 02:55:16,379 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:55:24,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=799320.0, ans=0.2 2024-09-20 02:55:39,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=799360.0, ans=0.0 2024-09-20 02:55:48,586 INFO [train.py:1198] (0/2) Epoch 45, batch 750, loss[loss=0.2394, ctc_loss=0.112, cr_loss=0.3589, attn_decoder_loss=0.2456, 
over 29692.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1089, cr_loss=0.3473, attn_decoder_loss=0.2366, over 5677236.95 frames. ], batch size: 82, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:55:50,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=799400.0, ans=0.09899494936611666 2024-09-20 02:55:56,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=799400.0, ans=0.125 2024-09-20 02:55:57,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=799400.0, ans=0.1 2024-09-20 02:56:02,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2024-09-20 02:56:06,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=799440.0, ans=0.125 2024-09-20 02:56:17,525 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2024-09-20 02:56:28,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-09-20 02:56:29,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799480.0, ans=0.1 2024-09-20 02:56:46,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=799520.0, ans=0.125 2024-09-20 02:57:06,240 INFO [train.py:1198] (0/2) Epoch 45, batch 800, loss[loss=0.1988, ctc_loss=0.08456, cr_loss=0.2962, attn_decoder_loss=0.2049, over 29607.00 frames. 
], tot_loss[loss=0.2309, ctc_loss=0.109, cr_loss=0.3472, attn_decoder_loss=0.2367, over 5708316.68 frames. ], batch size: 73, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 02:57:07,693 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 8.615e+01 9.052e+01 9.760e+01 1.570e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-20 02:57:09,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=799600.0, ans=0.125 2024-09-20 02:57:11,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=799600.0, ans=0.0 2024-09-20 02:57:12,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=799600.0, ans=0.125 2024-09-20 02:57:19,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=799640.0, ans=0.035 2024-09-20 02:57:59,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.14 vs. limit=15.0 2024-09-20 02:58:01,488 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2024-09-20 02:58:05,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=799760.0, ans=0.125 2024-09-20 02:58:08,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=799760.0, ans=0.125 2024-09-20 02:58:21,370 INFO [train.py:1198] (0/2) Epoch 45, batch 850, loss[loss=0.2353, ctc_loss=0.1062, cr_loss=0.3503, attn_decoder_loss=0.2419, over 29686.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1086, cr_loss=0.3464, attn_decoder_loss=0.2364, over 5736135.39 frames. 
], batch size: 89, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:58:29,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=799800.0, ans=0.125 2024-09-20 02:58:39,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=799840.0, ans=0.2 2024-09-20 02:58:46,125 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=6.0 2024-09-20 02:58:47,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.74 vs. limit=22.5 2024-09-20 02:59:02,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=799880.0, ans=0.125 2024-09-20 02:59:06,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=799920.0, ans=0.125 2024-09-20 02:59:08,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.96 vs. limit=15.0 2024-09-20 02:59:33,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.42 vs. limit=15.0 2024-09-20 02:59:38,140 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-200000.pt 2024-09-20 02:59:46,327 INFO [train.py:1198] (0/2) Epoch 45, batch 900, loss[loss=0.2153, ctc_loss=0.09392, cr_loss=0.3106, attn_decoder_loss=0.2219, over 29615.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1089, cr_loss=0.347, attn_decoder_loss=0.2369, over 5740156.93 frames. 
], batch size: 73, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 02:59:49,261 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.442e+01 9.122e+01 9.676e+01 4.269e+02, threshold=1.824e+02, percent-clipped=2.0 2024-09-20 03:00:00,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=800040.0, ans=0.1 2024-09-20 03:00:30,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=800120.0, ans=0.125 2024-09-20 03:00:31,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=800120.0, ans=0.125 2024-09-20 03:00:50,181 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:00:53,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=800160.0, ans=0.0 2024-09-20 03:01:02,854 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.82 vs. limit=15.0 2024-09-20 03:01:03,278 INFO [train.py:1198] (0/2) Epoch 45, batch 950, loss[loss=0.2259, ctc_loss=0.1028, cr_loss=0.3377, attn_decoder_loss=0.2321, over 29537.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1091, cr_loss=0.3474, attn_decoder_loss=0.2372, over 5741568.58 frames. 
], batch size: 74, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:01:11,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=800200.0, ans=0.125 2024-09-20 03:01:18,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=800240.0, ans=0.025 2024-09-20 03:01:20,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=800240.0, ans=0.2 2024-09-20 03:01:21,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=800240.0, ans=0.025 2024-09-20 03:01:25,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=800240.0, ans=15.0 2024-09-20 03:01:32,561 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:01:38,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=800280.0, ans=0.125 2024-09-20 03:01:44,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2024-09-20 03:02:18,180 INFO [train.py:1198] (0/2) Epoch 45, batch 1000, loss[loss=0.2215, ctc_loss=0.1048, cr_loss=0.3348, attn_decoder_loss=0.227, over 29474.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1096, cr_loss=0.3485, attn_decoder_loss=0.2379, over 5735344.27 frames. 
], batch size: 77, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:02:21,229 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.681e+01 9.118e+01 9.953e+01 2.174e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-20 03:02:45,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=800440.0, ans=0.125 2024-09-20 03:03:35,521 INFO [train.py:1198] (0/2) Epoch 45, batch 1050, loss[loss=0.2551, ctc_loss=0.1233, cr_loss=0.3763, attn_decoder_loss=0.2613, over 29690.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1099, cr_loss=0.3492, attn_decoder_loss=0.2378, over 5744106.07 frames. ], batch size: 85, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:03:47,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=800600.0, ans=0.2 2024-09-20 03:03:54,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=800640.0, ans=0.125 2024-09-20 03:03:55,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=800640.0, ans=0.0 2024-09-20 03:04:13,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=800680.0, ans=0.0 2024-09-20 03:04:18,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=800680.0, ans=0.0 2024-09-20 03:04:53,634 INFO [train.py:1198] (0/2) Epoch 45, batch 1100, loss[loss=0.2249, ctc_loss=0.1003, cr_loss=0.327, attn_decoder_loss=0.2315, over 29460.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1095, cr_loss=0.3485, attn_decoder_loss=0.2373, over 5756249.81 frames. 
], batch size: 78, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:04:56,595 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.469e+01 8.955e+01 9.647e+01 1.370e+02, threshold=1.791e+02, percent-clipped=0.0 2024-09-20 03:05:03,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=800800.0, ans=0.2 2024-09-20 03:05:19,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=800840.0, ans=0.0 2024-09-20 03:05:24,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=800880.0, ans=0.125 2024-09-20 03:05:39,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=15.0 2024-09-20 03:05:57,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=800960.0, ans=0.2 2024-09-20 03:06:00,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=800960.0, ans=0.0 2024-09-20 03:06:09,170 INFO [train.py:1198] (0/2) Epoch 45, batch 1150, loss[loss=0.2224, ctc_loss=0.1046, cr_loss=0.3338, attn_decoder_loss=0.2281, over 29451.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1096, cr_loss=0.3486, attn_decoder_loss=0.2372, over 5753840.74 frames. ], batch size: 78, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:06:26,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=801040.0, ans=0.0 2024-09-20 03:06:33,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.89 vs. 
limit=10.0 2024-09-20 03:06:34,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=801040.0, ans=0.0 2024-09-20 03:06:34,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2024-09-20 03:06:41,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=801080.0, ans=0.0 2024-09-20 03:06:44,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=801080.0, ans=0.2 2024-09-20 03:06:44,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=801080.0, ans=0.125 2024-09-20 03:07:01,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=801120.0, ans=0.125 2024-09-20 03:07:02,043 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-09-20 03:07:27,000 INFO [train.py:1198] (0/2) Epoch 45, batch 1200, loss[loss=0.2324, ctc_loss=0.09843, cr_loss=0.327, attn_decoder_loss=0.24, over 29671.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1097, cr_loss=0.3489, attn_decoder_loss=0.2376, over 5747592.82 frames. 
], batch size: 85, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:07:29,986 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.448e+01 9.125e+01 9.558e+01 3.990e+02, threshold=1.825e+02, percent-clipped=1.0 2024-09-20 03:07:36,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=801200.0, ans=0.2 2024-09-20 03:08:12,538 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:08:13,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=801320.0, ans=0.0 2024-09-20 03:08:15,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=801320.0, ans=0.0 2024-09-20 03:08:41,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=801360.0, ans=0.025 2024-09-20 03:08:44,562 INFO [train.py:1198] (0/2) Epoch 45, batch 1250, loss[loss=0.2467, ctc_loss=0.1226, cr_loss=0.3757, attn_decoder_loss=0.2521, over 29520.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1105, cr_loss=0.3509, attn_decoder_loss=0.2383, over 5775525.57 frames. 
], batch size: 92, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:08:49,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=801400.0, ans=0.125 2024-09-20 03:09:19,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=801480.0, ans=0.125 2024-09-20 03:09:24,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=801480.0, ans=0.125 2024-09-20 03:09:34,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=801520.0, ans=0.125 2024-09-20 03:09:38,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0 2024-09-20 03:09:45,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0 2024-09-20 03:09:57,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=801560.0, ans=0.125 2024-09-20 03:10:00,506 INFO [train.py:1198] (0/2) Epoch 45, batch 1300, loss[loss=0.2282, ctc_loss=0.101, cr_loss=0.3177, attn_decoder_loss=0.2353, over 28392.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.11, cr_loss=0.3497, attn_decoder_loss=0.2378, over 5778880.80 frames. 
], batch size: 111, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:10:03,560 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.720e+01 9.060e+01 9.963e+01 1.314e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-20 03:10:23,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801640.0, ans=0.1 2024-09-20 03:11:09,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=801760.0, ans=0.09899494936611666 2024-09-20 03:11:14,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=801760.0, ans=0.1 2024-09-20 03:11:18,210 INFO [train.py:1198] (0/2) Epoch 45, batch 1350, loss[loss=0.2357, ctc_loss=0.1092, cr_loss=0.3458, attn_decoder_loss=0.2421, over 29760.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1094, cr_loss=0.3485, attn_decoder_loss=0.2373, over 5796812.75 frames. 
], batch size: 81, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:11:21,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=801800.0, ans=0.125 2024-09-20 03:11:48,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=801880.0, ans=0.125 2024-09-20 03:12:01,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=801920.0, ans=0.0 2024-09-20 03:12:06,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=801920.0, ans=0.125 2024-09-20 03:12:31,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=801960.0, ans=0.125 2024-09-20 03:12:35,462 INFO [train.py:1198] (0/2) Epoch 45, batch 1400, loss[loss=0.2055, ctc_loss=0.09484, cr_loss=0.3202, attn_decoder_loss=0.2107, over 29552.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1092, cr_loss=0.348, attn_decoder_loss=0.2371, over 5808167.82 frames. ], batch size: 69, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:12:37,588 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=12.0 2024-09-20 03:12:39,955 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.353e+01 8.806e+01 9.318e+01 1.165e+02, threshold=1.761e+02, percent-clipped=0.0 2024-09-20 03:12:43,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=802000.0, ans=0.125 2024-09-20 03:12:52,714 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.14 vs. 
limit=15.0 2024-09-20 03:12:55,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=802040.0, ans=0.125 2024-09-20 03:13:12,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=802080.0, ans=0.125 2024-09-20 03:13:34,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=802160.0, ans=0.0 2024-09-20 03:13:40,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=802160.0, ans=0.1 2024-09-20 03:13:41,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=802160.0, ans=0.1 2024-09-20 03:13:41,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=802160.0, ans=0.0 2024-09-20 03:13:43,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0 2024-09-20 03:13:46,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=802160.0, ans=0.2 2024-09-20 03:13:49,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=4.90 vs. limit=15.0 2024-09-20 03:13:50,619 INFO [train.py:1198] (0/2) Epoch 45, batch 1450, loss[loss=0.2464, ctc_loss=0.1247, cr_loss=0.377, attn_decoder_loss=0.2515, over 29445.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1091, cr_loss=0.3477, attn_decoder_loss=0.2373, over 5804556.60 frames. 
], batch size: 94, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:13:50,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=802200.0, ans=0.0 2024-09-20 03:13:55,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=802200.0, ans=0.04949747468305833 2024-09-20 03:14:01,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.07 vs. limit=22.5 2024-09-20 03:14:06,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=802240.0, ans=0.0 2024-09-20 03:14:31,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=802280.0, ans=0.125 2024-09-20 03:14:37,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=802320.0, ans=0.125 2024-09-20 03:14:40,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=802320.0, ans=0.2 2024-09-20 03:14:47,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=802320.0, ans=0.2 2024-09-20 03:14:55,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=802360.0, ans=0.2 2024-09-20 03:15:08,098 INFO [train.py:1198] (0/2) Epoch 45, batch 1500, loss[loss=0.2346, ctc_loss=0.105, cr_loss=0.3379, attn_decoder_loss=0.2415, over 29603.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1089, cr_loss=0.3473, attn_decoder_loss=0.2375, over 5803477.40 frames. 
], batch size: 86, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:15:12,537 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 8.707e+01 9.148e+01 9.626e+01 3.931e+02, threshold=1.830e+02, percent-clipped=1.0 2024-09-20 03:15:38,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=802480.0, ans=0.125 2024-09-20 03:16:04,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=802520.0, ans=0.0 2024-09-20 03:16:10,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=802560.0, ans=0.125 2024-09-20 03:16:14,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=15.0 2024-09-20 03:16:14,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=802560.0, ans=0.0 2024-09-20 03:16:25,826 INFO [train.py:1198] (0/2) Epoch 45, batch 1550, loss[loss=0.2463, ctc_loss=0.1243, cr_loss=0.4003, attn_decoder_loss=0.2509, over 29497.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1095, cr_loss=0.3485, attn_decoder_loss=0.2378, over 5779855.82 frames. 
], batch size: 90, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:16:42,746 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:16:47,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=802640.0, ans=0.1 2024-09-20 03:16:53,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=802640.0, ans=0.125 2024-09-20 03:17:09,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=802720.0, ans=0.0 2024-09-20 03:17:15,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=802720.0, ans=0.1 2024-09-20 03:17:20,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=802720.0, ans=0.2 2024-09-20 03:17:30,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=802760.0, ans=0.0 2024-09-20 03:17:40,716 INFO [train.py:1198] (0/2) Epoch 45, batch 1600, loss[loss=0.2351, ctc_loss=0.1009, cr_loss=0.3286, attn_decoder_loss=0.2427, over 29680.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1094, cr_loss=0.3476, attn_decoder_loss=0.2375, over 5761621.13 frames. 
], batch size: 85, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:17:45,053 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.517e+01 9.021e+01 9.788e+01 6.298e+02, threshold=1.804e+02, percent-clipped=2.0 2024-09-20 03:17:48,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=802800.0, ans=0.125 2024-09-20 03:18:06,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=802840.0, ans=0.07 2024-09-20 03:18:18,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=802880.0, ans=0.2 2024-09-20 03:18:20,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=802880.0, ans=0.1 2024-09-20 03:18:57,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.36 vs. limit=12.0 2024-09-20 03:18:58,018 INFO [train.py:1198] (0/2) Epoch 45, batch 1650, loss[loss=0.2283, ctc_loss=0.09796, cr_loss=0.3329, attn_decoder_loss=0.2354, over 29706.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1094, cr_loss=0.3479, attn_decoder_loss=0.2375, over 5757384.65 frames. 
], batch size: 89, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:18:58,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=803000.0, ans=0.125 2024-09-20 03:19:10,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=803000.0, ans=0.125 2024-09-20 03:19:13,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803040.0, ans=0.1 2024-09-20 03:19:31,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=803080.0, ans=0.0 2024-09-20 03:19:43,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803120.0, ans=0.1 2024-09-20 03:20:08,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=803160.0, ans=0.05 2024-09-20 03:20:15,553 INFO [train.py:1198] (0/2) Epoch 45, batch 1700, loss[loss=0.1988, ctc_loss=0.08167, cr_loss=0.2862, attn_decoder_loss=0.2054, over 29569.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1091, cr_loss=0.3475, attn_decoder_loss=0.2373, over 5779588.83 frames. ], batch size: 69, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:20:21,495 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 8.570e+01 9.061e+01 9.508e+01 1.721e+02, threshold=1.812e+02, percent-clipped=0.0 2024-09-20 03:20:43,419 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. 
limit=6.0 2024-09-20 03:21:26,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=803360.0, ans=0.07 2024-09-20 03:21:30,940 INFO [train.py:1198] (0/2) Epoch 45, batch 1750, loss[loss=0.2085, ctc_loss=0.09586, cr_loss=0.3188, attn_decoder_loss=0.2139, over 29321.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.109, cr_loss=0.3476, attn_decoder_loss=0.2369, over 5787416.11 frames. ], batch size: 67, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:22:08,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=803480.0, ans=0.2 2024-09-20 03:22:19,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=803520.0, ans=0.07 2024-09-20 03:22:47,902 INFO [train.py:1198] (0/2) Epoch 45, batch 1800, loss[loss=0.2469, ctc_loss=0.1194, cr_loss=0.382, attn_decoder_loss=0.2526, over 29683.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1089, cr_loss=0.3474, attn_decoder_loss=0.2372, over 5790952.00 frames. 
], batch size: 83, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:22:51,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=803600.0, ans=0.04949747468305833 2024-09-20 03:22:53,958 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.492e+01 8.891e+01 9.479e+01 1.445e+02, threshold=1.778e+02, percent-clipped=0.0 2024-09-20 03:22:54,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=803600.0, ans=0.125 2024-09-20 03:22:55,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=803600.0, ans=0.125 2024-09-20 03:22:57,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=803600.0, ans=0.2 2024-09-20 03:23:09,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=803640.0, ans=0.07 2024-09-20 03:23:30,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=803680.0, ans=0.2 2024-09-20 03:23:33,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803720.0, ans=0.1 2024-09-20 03:23:53,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=803760.0, ans=0.125 2024-09-20 03:23:53,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=803760.0, ans=0.07 2024-09-20 03:24:02,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803800.0, ans=0.1 2024-09-20 03:24:03,349 INFO [train.py:1198] (0/2) Epoch 45, batch 1850, loss[loss=0.2443, ctc_loss=0.1074, 
cr_loss=0.3512, attn_decoder_loss=0.2518, over 29620.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1088, cr_loss=0.3471, attn_decoder_loss=0.237, over 5795680.68 frames. ], batch size: 86, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:24:39,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=803880.0, ans=0.2 2024-09-20 03:24:40,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=803880.0, ans=0.5 2024-09-20 03:24:57,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=803920.0, ans=0.0 2024-09-20 03:24:57,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=803920.0, ans=0.05 2024-09-20 03:25:01,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=803920.0, ans=10.0 2024-09-20 03:25:07,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=12.0 2024-09-20 03:25:20,212 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.98 vs. limit=22.5 2024-09-20 03:25:21,002 INFO [train.py:1198] (0/2) Epoch 45, batch 1900, loss[loss=0.2347, ctc_loss=0.1055, cr_loss=0.329, attn_decoder_loss=0.2417, over 29716.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1093, cr_loss=0.3479, attn_decoder_loss=0.2378, over 5804301.59 frames. 
], batch size: 89, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:25:27,048 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.514e+01 9.088e+01 9.657e+01 1.546e+02, threshold=1.818e+02, percent-clipped=0.0 2024-09-20 03:25:28,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=804000.0, ans=0.125 2024-09-20 03:25:34,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=804040.0, ans=0.125 2024-09-20 03:25:37,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=804040.0, ans=0.125 2024-09-20 03:25:51,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=804080.0, ans=0.1 2024-09-20 03:25:59,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=804080.0, ans=0.025 2024-09-20 03:26:16,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=804120.0, ans=0.2 2024-09-20 03:26:31,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=804160.0, ans=0.125 2024-09-20 03:26:34,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.52 vs. limit=10.0 2024-09-20 03:26:38,944 INFO [train.py:1198] (0/2) Epoch 45, batch 1950, loss[loss=0.2209, ctc_loss=0.09681, cr_loss=0.3165, attn_decoder_loss=0.2276, over 29441.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.11, cr_loss=0.3495, attn_decoder_loss=0.2388, over 5819219.77 frames. 
], batch size: 78, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:27:25,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=804320.0, ans=0.0 2024-09-20 03:27:35,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=804320.0, ans=0.125 2024-09-20 03:27:42,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804360.0, ans=0.1 2024-09-20 03:27:44,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0 2024-09-20 03:27:54,313 INFO [train.py:1198] (0/2) Epoch 45, batch 2000, loss[loss=0.21, ctc_loss=0.09882, cr_loss=0.328, attn_decoder_loss=0.2151, over 29383.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.11, cr_loss=0.3489, attn_decoder_loss=0.2388, over 5798557.57 frames. ], batch size: 67, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:27:59,931 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-09-20 03:28:00,424 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.885e+01 8.761e+01 9.181e+01 9.636e+01 2.089e+02, threshold=1.836e+02, percent-clipped=2.0 2024-09-20 03:28:18,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.04 vs. 
limit=15.0 2024-09-20 03:28:51,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=804520.0, ans=0.025 2024-09-20 03:29:01,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=804560.0, ans=0.0 2024-09-20 03:29:11,956 INFO [train.py:1198] (0/2) Epoch 45, batch 2050, loss[loss=0.2129, ctc_loss=0.09694, cr_loss=0.3265, attn_decoder_loss=0.2185, over 29430.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1093, cr_loss=0.3477, attn_decoder_loss=0.2379, over 5789925.05 frames. ], batch size: 70, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:29:14,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.95 vs. limit=10.0 2024-09-20 03:29:31,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=804640.0, ans=0.0 2024-09-20 03:30:10,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=804720.0, ans=0.0 2024-09-20 03:30:29,648 INFO [train.py:1198] (0/2) Epoch 45, batch 2100, loss[loss=0.2306, ctc_loss=0.111, cr_loss=0.3513, attn_decoder_loss=0.2361, over 29729.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.109, cr_loss=0.3473, attn_decoder_loss=0.2375, over 5800700.69 frames. 
], batch size: 81, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:30:34,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=804800.0, ans=0.125 2024-09-20 03:30:35,559 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.482e+01 9.039e+01 9.529e+01 1.230e+02, threshold=1.808e+02, percent-clipped=0.0 2024-09-20 03:30:46,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=804840.0, ans=0.125 2024-09-20 03:30:49,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=804840.0, ans=0.125 2024-09-20 03:31:04,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=804880.0, ans=0.125 2024-09-20 03:31:04,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.06 vs. limit=15.0 2024-09-20 03:31:10,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=804880.0, ans=0.125 2024-09-20 03:31:44,355 INFO [train.py:1198] (0/2) Epoch 45, batch 2150, loss[loss=0.2297, ctc_loss=0.1159, cr_loss=0.3636, attn_decoder_loss=0.2342, over 29440.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.109, cr_loss=0.3473, attn_decoder_loss=0.2371, over 5813372.65 frames. 
], batch size: 78, lr: 2.45e-03, grad_scale: 16.0 2024-09-20 03:31:50,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=805000.0, ans=0.025 2024-09-20 03:31:54,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=805000.0, ans=15.0 2024-09-20 03:32:04,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=805040.0, ans=0.125 2024-09-20 03:32:24,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=805080.0, ans=0.2 2024-09-20 03:32:28,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=805080.0, ans=0.125 2024-09-20 03:32:30,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=12.0 2024-09-20 03:32:44,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=805120.0, ans=0.025 2024-09-20 03:33:01,958 INFO [train.py:1198] (0/2) Epoch 45, batch 2200, loss[loss=0.234, ctc_loss=0.1057, cr_loss=0.3542, attn_decoder_loss=0.2404, over 29638.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1093, cr_loss=0.348, attn_decoder_loss=0.2372, over 5809647.24 frames. 
], batch size: 86, lr: 2.45e-03, grad_scale: 8.0 2024-09-20 03:33:09,444 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.623e+01 8.976e+01 9.604e+01 3.634e+02, threshold=1.795e+02, percent-clipped=1.0 2024-09-20 03:33:11,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=805200.0, ans=0.2 2024-09-20 03:33:21,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=805240.0, ans=0.0 2024-09-20 03:33:50,959 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.67 vs. limit=15.0 2024-09-20 03:34:19,497 INFO [train.py:1198] (0/2) Epoch 45, batch 2250, loss[loss=0.2396, ctc_loss=0.1156, cr_loss=0.3636, attn_decoder_loss=0.2452, over 29699.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1089, cr_loss=0.3474, attn_decoder_loss=0.2369, over 5810151.83 frames. ], batch size: 82, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:34:36,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=805440.0, ans=0.1 2024-09-20 03:35:10,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=805520.0, ans=0.05 2024-09-20 03:35:24,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=805560.0, ans=0.125 2024-09-20 03:35:30,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=805560.0, ans=0.125 2024-09-20 03:35:34,915 INFO [train.py:1198] (0/2) Epoch 45, batch 2300, loss[loss=0.209, ctc_loss=0.09004, cr_loss=0.3007, attn_decoder_loss=0.2155, over 29338.00 frames. 
], tot_loss[loss=0.23, ctc_loss=0.1079, cr_loss=0.345, attn_decoder_loss=0.2358, over 5796926.88 frames. ], batch size: 71, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:35:37,167 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.53 vs. limit=15.0 2024-09-20 03:35:42,372 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.561e+01 9.011e+01 9.517e+01 1.725e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-20 03:35:42,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=805600.0, ans=0.1 2024-09-20 03:35:45,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=805600.0, ans=0.025 2024-09-20 03:35:51,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=805640.0, ans=0.125 2024-09-20 03:36:11,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=805680.0, ans=0.0 2024-09-20 03:36:34,757 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:36:40,322 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.97 vs. limit=15.0 2024-09-20 03:36:52,632 INFO [train.py:1198] (0/2) Epoch 45, batch 2350, loss[loss=0.2544, ctc_loss=0.1277, cr_loss=0.3829, attn_decoder_loss=0.26, over 29716.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1082, cr_loss=0.3453, attn_decoder_loss=0.2362, over 5803671.28 frames. 
], batch size: 83, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:36:56,623 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-09-20 03:37:31,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.53 vs. limit=15.0 2024-09-20 03:37:46,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=805920.0, ans=0.125 2024-09-20 03:37:56,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=805960.0, ans=0.05 2024-09-20 03:38:10,111 INFO [train.py:1198] (0/2) Epoch 45, batch 2400, loss[loss=0.2298, ctc_loss=0.112, cr_loss=0.3674, attn_decoder_loss=0.2347, over 29540.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1086, cr_loss=0.3462, attn_decoder_loss=0.2366, over 5808076.19 frames. 
], batch size: 76, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:38:16,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=806000.0, ans=0.95 2024-09-20 03:38:17,562 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.673e+01 8.961e+01 9.495e+01 1.491e+02, threshold=1.792e+02, percent-clipped=0.0 2024-09-20 03:38:33,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=806040.0, ans=0.125 2024-09-20 03:38:45,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=806080.0, ans=0.1 2024-09-20 03:38:52,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=806080.0, ans=0.0 2024-09-20 03:38:55,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=806120.0, ans=0.2 2024-09-20 03:39:15,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806160.0, ans=0.1 2024-09-20 03:39:23,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=806160.0, ans=0.04949747468305833 2024-09-20 03:39:24,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=806200.0, ans=0.125 2024-09-20 03:39:25,820 INFO [train.py:1198] (0/2) Epoch 45, batch 2450, loss[loss=0.2433, ctc_loss=0.1225, cr_loss=0.3638, attn_decoder_loss=0.2487, over 29716.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1092, cr_loss=0.3476, attn_decoder_loss=0.2376, over 5785502.25 frames. 
], batch size: 82, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:40:08,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=806280.0, ans=0.2 2024-09-20 03:40:19,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-20 03:40:39,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=806360.0, ans=0.0 2024-09-20 03:40:43,797 INFO [train.py:1198] (0/2) Epoch 45, batch 2500, loss[loss=0.2475, ctc_loss=0.1201, cr_loss=0.382, attn_decoder_loss=0.2531, over 29642.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1096, cr_loss=0.3484, attn_decoder_loss=0.2378, over 5795403.14 frames. ], batch size: 86, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:40:50,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=806400.0, ans=0.125 2024-09-20 03:40:51,304 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.641e+01 9.220e+01 9.804e+01 1.997e+02, threshold=1.844e+02, percent-clipped=2.0 2024-09-20 03:40:59,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=806440.0, ans=0.0 2024-09-20 03:41:01,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2024-09-20 03:41:01,332 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.94 vs. 
limit=15.0 2024-09-20 03:41:08,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=806440.0, ans=0.2 2024-09-20 03:41:46,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=806560.0, ans=0.125 2024-09-20 03:42:01,671 INFO [train.py:1198] (0/2) Epoch 45, batch 2550, loss[loss=0.2068, ctc_loss=0.09188, cr_loss=0.3068, attn_decoder_loss=0.2127, over 29353.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1097, cr_loss=0.349, attn_decoder_loss=0.2379, over 5798861.01 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:42:33,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=806680.0, ans=0.125 2024-09-20 03:42:53,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=806720.0, ans=0.0 2024-09-20 03:42:58,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0 2024-09-20 03:43:10,192 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:43:13,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=806760.0, ans=0.2 2024-09-20 03:43:14,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=806760.0, ans=0.125 2024-09-20 03:43:17,313 INFO [train.py:1198] (0/2) Epoch 45, batch 2600, loss[loss=0.2312, ctc_loss=0.1124, cr_loss=0.361, attn_decoder_loss=0.2364, over 29449.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1101, cr_loss=0.3498, attn_decoder_loss=0.2385, over 5796322.82 frames. 
], batch size: 78, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:43:20,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=806800.0, ans=0.0 2024-09-20 03:43:23,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806800.0, ans=0.1 2024-09-20 03:43:26,235 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.746e+01 8.807e+01 9.340e+01 9.891e+01 1.748e+02, threshold=1.868e+02, percent-clipped=0.0 2024-09-20 03:43:29,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=806800.0, ans=0.09899494936611666 2024-09-20 03:43:32,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806840.0, ans=0.1 2024-09-20 03:43:36,951 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:43:39,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=806840.0, ans=0.125 2024-09-20 03:43:47,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=806880.0, ans=0.025 2024-09-20 03:44:00,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=806880.0, ans=0.125 2024-09-20 03:44:03,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=806920.0, ans=0.025 2024-09-20 03:44:03,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=806920.0, ans=0.125 2024-09-20 03:44:03,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=806920.0, 
ans=0.04949747468305833 2024-09-20 03:44:09,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.12 vs. limit=10.0 2024-09-20 03:44:22,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=806960.0, ans=0.125 2024-09-20 03:44:32,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.44 vs. limit=22.5 2024-09-20 03:44:34,361 INFO [train.py:1198] (0/2) Epoch 45, batch 2650, loss[loss=0.2448, ctc_loss=0.1131, cr_loss=0.3674, attn_decoder_loss=0.2512, over 29222.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1101, cr_loss=0.3502, attn_decoder_loss=0.2387, over 5802170.98 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:44:34,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=807000.0, ans=0.025 2024-09-20 03:44:36,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=807000.0, ans=0.0 2024-09-20 03:44:39,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807000.0, ans=0.1 2024-09-20 03:44:48,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=807040.0, ans=0.125 2024-09-20 03:45:29,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=807120.0, ans=0.025 2024-09-20 03:45:31,383 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.97 vs. 
limit=15.0 2024-09-20 03:45:45,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0 2024-09-20 03:45:45,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=807160.0, ans=0.0 2024-09-20 03:45:47,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.52 vs. limit=15.0 2024-09-20 03:45:51,986 INFO [train.py:1198] (0/2) Epoch 45, batch 2700, loss[loss=0.2357, ctc_loss=0.1036, cr_loss=0.3395, attn_decoder_loss=0.2428, over 29525.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1101, cr_loss=0.3504, attn_decoder_loss=0.2389, over 5797901.42 frames. ], batch size: 87, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:46:01,055 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.586e+01 9.065e+01 9.630e+01 2.449e+02, threshold=1.813e+02, percent-clipped=1.0 2024-09-20 03:46:13,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=807240.0, ans=0.2 2024-09-20 03:46:23,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=807280.0, ans=0.0 2024-09-20 03:46:24,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.46 vs. limit=22.5 2024-09-20 03:46:35,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.60 vs. 
limit=6.0 2024-09-20 03:46:40,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=807320.0, ans=0.0 2024-09-20 03:46:57,721 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.00 vs. limit=10.0 2024-09-20 03:47:07,354 INFO [train.py:1198] (0/2) Epoch 45, batch 2750, loss[loss=0.216, ctc_loss=0.09659, cr_loss=0.3186, attn_decoder_loss=0.2222, over 29534.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1095, cr_loss=0.3492, attn_decoder_loss=0.2377, over 5796195.82 frames. ], batch size: 75, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:47:32,114 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.08 vs. limit=15.0 2024-09-20 03:47:33,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=807440.0, ans=0.0 2024-09-20 03:47:38,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-09-20 03:47:46,459 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:47:57,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=807520.0, ans=0.125 2024-09-20 03:47:59,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2024-09-20 03:48:25,194 INFO [train.py:1198] (0/2) Epoch 45, batch 2800, loss[loss=0.2564, ctc_loss=0.1384, cr_loss=0.3842, attn_decoder_loss=0.2609, over 19742.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1098, cr_loss=0.35, attn_decoder_loss=0.2379, over 5777025.72 frames. 
], batch size: 209, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:48:31,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=807600.0, ans=0.025 2024-09-20 03:48:34,099 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.581e+01 9.021e+01 9.905e+01 2.529e+02, threshold=1.804e+02, percent-clipped=2.0 2024-09-20 03:48:42,549 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.87 vs. limit=22.5 2024-09-20 03:48:43,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=807640.0, ans=0.0 2024-09-20 03:48:48,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807640.0, ans=0.1 2024-09-20 03:48:51,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=807640.0, ans=0.125 2024-09-20 03:48:52,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=807640.0, ans=0.0 2024-09-20 03:49:12,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=807720.0, ans=0.125 2024-09-20 03:49:38,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0 2024-09-20 03:49:42,384 INFO [train.py:1198] (0/2) Epoch 45, batch 2850, loss[loss=0.2191, ctc_loss=0.1001, cr_loss=0.3286, attn_decoder_loss=0.225, over 29500.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.11, cr_loss=0.35, attn_decoder_loss=0.2381, over 5762506.35 frames. 
], batch size: 77, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:49:48,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=807800.0, ans=0.0 2024-09-20 03:49:54,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=807800.0, ans=0.125 2024-09-20 03:50:41,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807960.0, ans=0.1 2024-09-20 03:50:57,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.49 vs. limit=22.5 2024-09-20 03:50:58,375 INFO [train.py:1198] (0/2) Epoch 45, batch 2900, loss[loss=0.2288, ctc_loss=0.1037, cr_loss=0.3381, attn_decoder_loss=0.2352, over 29794.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1107, cr_loss=0.3513, attn_decoder_loss=0.2391, over 5788882.80 frames. ], batch size: 80, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:50:59,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.13 vs. limit=10.0 2024-09-20 03:51:06,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.41 vs. 
limit=10.0 2024-09-20 03:51:07,242 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.821e+01 8.668e+01 9.103e+01 9.766e+01 1.431e+02, threshold=1.821e+02, percent-clipped=0.0 2024-09-20 03:51:19,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=808040.0, ans=0.0 2024-09-20 03:51:51,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=808120.0, ans=0.025 2024-09-20 03:52:15,547 INFO [train.py:1198] (0/2) Epoch 45, batch 2950, loss[loss=0.2198, ctc_loss=0.1048, cr_loss=0.3422, attn_decoder_loss=0.225, over 29513.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1099, cr_loss=0.3488, attn_decoder_loss=0.2379, over 5783407.18 frames. ], batch size: 75, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:52:26,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.80 vs. 
limit=15.0 2024-09-20 03:52:41,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=808240.0, ans=0.125 2024-09-20 03:52:45,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=808280.0, ans=0.035 2024-09-20 03:52:52,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=808280.0, ans=0.125 2024-09-20 03:52:52,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=808280.0, ans=0.125 2024-09-20 03:52:55,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=808280.0, ans=0.0 2024-09-20 03:53:33,024 INFO [train.py:1198] (0/2) Epoch 45, batch 3000, loss[loss=0.2294, ctc_loss=0.109, cr_loss=0.3489, attn_decoder_loss=0.235, over 29754.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1095, cr_loss=0.3474, attn_decoder_loss=0.2375, over 5783608.43 frames. ], batch size: 81, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:53:33,025 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-20 03:53:51,274 INFO [train.py:1230] (0/2) Epoch 45, validation: loss=0.213, ctc_loss=0.0366, cr_loss=6.956e-15, attn_decoder_loss=0.2326, over 944034.00 frames. 
2024-09-20 03:53:51,275 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-20 03:54:00,587 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.498e+01 9.089e+01 9.590e+01 3.857e+02, threshold=1.818e+02, percent-clipped=2.0 2024-09-20 03:54:05,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=808440.0, ans=0.5 2024-09-20 03:54:29,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=808480.0, ans=0.125 2024-09-20 03:55:05,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=808600.0, ans=0.125 2024-09-20 03:55:06,904 INFO [train.py:1198] (0/2) Epoch 45, batch 3050, loss[loss=0.2248, ctc_loss=0.1063, cr_loss=0.3468, attn_decoder_loss=0.2303, over 29525.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1101, cr_loss=0.3492, attn_decoder_loss=0.2383, over 5778038.17 frames. ], batch size: 76, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:55:32,697 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=22.5 2024-09-20 03:55:55,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=808720.0, ans=0.125 2024-09-20 03:56:04,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=808720.0, ans=0.125 2024-09-20 03:56:08,490 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. 
limit=6.0 2024-09-20 03:56:12,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=808760.0, ans=0.0 2024-09-20 03:56:24,622 INFO [train.py:1198] (0/2) Epoch 45, batch 3100, loss[loss=0.2458, ctc_loss=0.1244, cr_loss=0.3748, attn_decoder_loss=0.2509, over 29251.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1101, cr_loss=0.3495, attn_decoder_loss=0.2379, over 5777083.28 frames. ], batch size: 100, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:56:35,119 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.685e+01 9.291e+01 9.894e+01 1.991e+02, threshold=1.858e+02, percent-clipped=1.0 2024-09-20 03:56:57,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=808880.0, ans=0.125 2024-09-20 03:57:25,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=808960.0, ans=0.035 2024-09-20 03:57:36,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=808960.0, ans=0.025 2024-09-20 03:57:42,020 INFO [train.py:1198] (0/2) Epoch 45, batch 3150, loss[loss=0.2417, ctc_loss=0.1118, cr_loss=0.358, attn_decoder_loss=0.2482, over 28812.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1101, cr_loss=0.3496, attn_decoder_loss=0.2379, over 5783057.78 frames. ], batch size: 104, lr: 2.44e-03, grad_scale: 8.0 2024-09-20 03:57:57,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=809040.0, ans=0.0 2024-09-20 03:58:16,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=809080.0, ans=0.125 2024-09-20 03:58:47,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=9.71 vs. 
limit=15.0 2024-09-20 03:58:56,895 INFO [train.py:1198] (0/2) Epoch 45, batch 3200, loss[loss=0.2353, ctc_loss=0.1071, cr_loss=0.3366, attn_decoder_loss=0.2421, over 29431.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1099, cr_loss=0.3493, attn_decoder_loss=0.2376, over 5793410.18 frames. ], batch size: 79, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 03:59:01,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=809200.0, ans=0.0 2024-09-20 03:59:07,504 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.632e+01 9.218e+01 9.587e+01 1.920e+02, threshold=1.844e+02, percent-clipped=1.0 2024-09-20 03:59:16,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=809240.0, ans=0.0 2024-09-20 03:59:38,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=809280.0, ans=0.0 2024-09-20 03:59:45,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=809320.0, ans=0.125 2024-09-20 03:59:52,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=809320.0, ans=0.125 2024-09-20 04:00:07,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=809360.0, ans=0.125 2024-09-20 04:00:14,541 INFO [train.py:1198] (0/2) Epoch 45, batch 3250, loss[loss=0.2414, ctc_loss=0.1125, cr_loss=0.3642, attn_decoder_loss=0.2476, over 29716.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1101, cr_loss=0.3501, attn_decoder_loss=0.238, over 5800206.74 frames. 
], batch size: 84, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:00:14,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=809400.0, ans=0.0 2024-09-20 04:00:16,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=809400.0, ans=0.125 2024-09-20 04:00:32,854 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:00:39,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=809440.0, ans=0.0 2024-09-20 04:00:59,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2024-09-20 04:01:08,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.83 vs. limit=10.0 2024-09-20 04:01:11,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.54 vs. limit=10.0 2024-09-20 04:01:12,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=809520.0, ans=0.125 2024-09-20 04:01:20,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-09-20 04:01:32,415 INFO [train.py:1198] (0/2) Epoch 45, batch 3300, loss[loss=0.2406, ctc_loss=0.1133, cr_loss=0.3555, attn_decoder_loss=0.2469, over 28259.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1095, cr_loss=0.349, attn_decoder_loss=0.2369, over 5797535.38 frames. 
], batch size: 111, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:01:38,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=809600.0, ans=10.0 2024-09-20 04:01:40,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=809600.0, ans=0.0 2024-09-20 04:01:42,963 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.585e+01 9.187e+01 9.677e+01 1.727e+02, threshold=1.837e+02, percent-clipped=0.0 2024-09-20 04:01:49,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=809640.0, ans=0.125 2024-09-20 04:01:55,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=809640.0, ans=0.125 2024-09-20 04:02:04,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=809680.0, ans=0.1 2024-09-20 04:02:12,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=809680.0, ans=0.125 2024-09-20 04:02:28,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=809720.0, ans=0.125 2024-09-20 04:02:44,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.07 vs. limit=15.0 2024-09-20 04:02:47,607 INFO [train.py:1198] (0/2) Epoch 45, batch 3350, loss[loss=0.2429, ctc_loss=0.1117, cr_loss=0.3456, attn_decoder_loss=0.2498, over 28774.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1103, cr_loss=0.3506, attn_decoder_loss=0.2378, over 5774750.33 frames. 
], batch size: 104, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:02:52,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=809800.0, ans=0.1 2024-09-20 04:03:14,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=809840.0, ans=0.1 2024-09-20 04:03:18,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809880.0, ans=0.1 2024-09-20 04:03:26,568 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:04:05,666 INFO [train.py:1198] (0/2) Epoch 45, batch 3400, loss[loss=0.2091, ctc_loss=0.09998, cr_loss=0.3422, attn_decoder_loss=0.2136, over 29354.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1102, cr_loss=0.3501, attn_decoder_loss=0.2376, over 5768057.15 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:04:07,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=810000.0, ans=0.1 2024-09-20 04:04:18,538 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.782e+01 9.254e+01 9.954e+01 2.335e+02, threshold=1.851e+02, percent-clipped=1.0 2024-09-20 04:04:33,057 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.22 vs. limit=15.0 2024-09-20 04:05:11,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2024-09-20 04:05:23,107 INFO [train.py:1198] (0/2) Epoch 45, batch 3450, loss[loss=0.2506, ctc_loss=0.114, cr_loss=0.3571, attn_decoder_loss=0.2578, over 28326.00 frames. 
], tot_loss[loss=0.2322, ctc_loss=0.1103, cr_loss=0.3503, attn_decoder_loss=0.2379, over 5776330.98 frames. ], batch size: 111, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:05:39,263 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2024-09-20 04:05:41,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=810240.0, ans=0.125 2024-09-20 04:05:46,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=810240.0, ans=0.09899494936611666 2024-09-20 04:05:55,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=810280.0, ans=0.125 2024-09-20 04:05:58,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=810280.0, ans=0.1 2024-09-20 04:06:09,479 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-09-20 04:06:32,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=810360.0, ans=0.125 2024-09-20 04:06:38,593 INFO [train.py:1198] (0/2) Epoch 45, batch 3500, loss[loss=0.2118, ctc_loss=0.09721, cr_loss=0.3205, attn_decoder_loss=0.2174, over 29327.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1101, cr_loss=0.3493, attn_decoder_loss=0.2375, over 5776730.58 frames. 
], batch size: 71, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:06:49,187 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.777e+01 9.274e+01 9.867e+01 1.400e+02, threshold=1.855e+02, percent-clipped=0.0 2024-09-20 04:06:55,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=810440.0, ans=0.0 2024-09-20 04:06:58,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=810440.0, ans=0.0 2024-09-20 04:07:15,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=810480.0, ans=0.125 2024-09-20 04:07:36,706 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.62 vs. limit=15.0 2024-09-20 04:07:55,108 INFO [train.py:1198] (0/2) Epoch 45, batch 3550, loss[loss=0.2477, ctc_loss=0.1201, cr_loss=0.3678, attn_decoder_loss=0.2537, over 29684.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1099, cr_loss=0.3492, attn_decoder_loss=0.2376, over 5783317.18 frames. 
], batch size: 89, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:07:56,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=810600.0, ans=0.125 2024-09-20 04:08:01,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=810600.0, ans=0.0 2024-09-20 04:08:10,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=810640.0, ans=0.125 2024-09-20 04:08:18,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=810640.0, ans=0.0 2024-09-20 04:08:22,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.48 vs. limit=22.5 2024-09-20 04:08:29,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=810680.0, ans=0.0 2024-09-20 04:08:37,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-09-20 04:08:43,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.48 vs. 
limit=15.0 2024-09-20 04:09:00,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=810760.0, ans=0.2 2024-09-20 04:09:02,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=810760.0, ans=0.025 2024-09-20 04:09:02,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=810760.0, ans=0.1 2024-09-20 04:09:10,806 INFO [train.py:1198] (0/2) Epoch 45, batch 3600, loss[loss=0.2085, ctc_loss=0.09187, cr_loss=0.2998, attn_decoder_loss=0.2148, over 29507.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1096, cr_loss=0.3485, attn_decoder_loss=0.2373, over 5792992.56 frames. ], batch size: 77, lr: 2.44e-03, grad_scale: 32.0 2024-09-20 04:09:15,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=810800.0, ans=0.125 2024-09-20 04:09:22,711 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.599e+01 9.272e+01 9.719e+01 1.680e+02, threshold=1.854e+02, percent-clipped=0.0 2024-09-20 04:09:55,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=810920.0, ans=0.1 2024-09-20 04:10:10,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=810960.0, ans=0.125 2024-09-20 04:10:24,865 INFO [train.py:1198] (0/2) Epoch 45, batch 3650, loss[loss=0.2479, ctc_loss=0.1188, cr_loss=0.3688, attn_decoder_loss=0.2541, over 29515.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1092, cr_loss=0.3476, attn_decoder_loss=0.2367, over 5794436.81 frames. 
], batch size: 90, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:10:46,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=811040.0, ans=0.04949747468305833 2024-09-20 04:10:48,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=811040.0, ans=0.1 2024-09-20 04:10:56,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811080.0, ans=0.1 2024-09-20 04:10:59,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=811080.0, ans=0.2 2024-09-20 04:11:05,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=811080.0, ans=0.0 2024-09-20 04:11:06,251 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.35 vs. limit=22.5 2024-09-20 04:11:20,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=811120.0, ans=0.1 2024-09-20 04:11:27,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=811160.0, ans=0.2 2024-09-20 04:11:39,703 INFO [train.py:1198] (0/2) Epoch 45, batch 3700, loss[loss=0.238, ctc_loss=0.1068, cr_loss=0.3399, attn_decoder_loss=0.245, over 29696.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1093, cr_loss=0.3476, attn_decoder_loss=0.2371, over 5804812.40 frames. 
], batch size: 84, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:11:51,726 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.848e+01 9.366e+01 9.775e+01 1.224e+02, threshold=1.873e+02, percent-clipped=0.0 2024-09-20 04:11:53,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=811240.0, ans=0.0 2024-09-20 04:12:10,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=811280.0, ans=0.125 2024-09-20 04:12:27,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=811320.0, ans=0.125 2024-09-20 04:12:31,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=811320.0, ans=0.0 2024-09-20 04:12:39,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=811360.0, ans=0.125 2024-09-20 04:12:52,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.54 vs. limit=15.0 2024-09-20 04:12:54,016 INFO [train.py:1198] (0/2) Epoch 45, batch 3750, loss[loss=0.2073, ctc_loss=0.09693, cr_loss=0.3101, attn_decoder_loss=0.2127, over 29297.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1094, cr_loss=0.3479, attn_decoder_loss=0.237, over 5809097.88 frames. 
], batch size: 67, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:13:00,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=811400.0, ans=0.125 2024-09-20 04:13:06,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=811400.0, ans=0.125 2024-09-20 04:13:12,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=811440.0, ans=10.0 2024-09-20 04:13:29,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=811480.0, ans=0.125 2024-09-20 04:13:41,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5 2024-09-20 04:13:53,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=811560.0, ans=0.125 2024-09-20 04:14:04,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=811560.0, ans=0.125 2024-09-20 04:14:05,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811560.0, ans=0.1 2024-09-20 04:14:09,800 INFO [train.py:1198] (0/2) Epoch 45, batch 3800, loss[loss=0.246, ctc_loss=0.1116, cr_loss=0.345, attn_decoder_loss=0.2533, over 29624.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1088, cr_loss=0.3468, attn_decoder_loss=0.2366, over 5799778.91 frames. 
], batch size: 86, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:14:21,612 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.556e+01 8.957e+01 9.574e+01 2.203e+02, threshold=1.791e+02, percent-clipped=1.0 2024-09-20 04:14:30,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=811640.0, ans=0.07 2024-09-20 04:15:00,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=811720.0, ans=0.125 2024-09-20 04:15:08,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=811720.0, ans=0.125 2024-09-20 04:15:21,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=811760.0, ans=0.0 2024-09-20 04:15:25,792 INFO [train.py:1198] (0/2) Epoch 45, batch 3850, loss[loss=0.2442, ctc_loss=0.1232, cr_loss=0.3752, attn_decoder_loss=0.2493, over 29250.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1089, cr_loss=0.3469, attn_decoder_loss=0.2368, over 5812194.21 frames. 
], batch size: 100, lr: 2.44e-03, grad_scale: 16.0 2024-09-20 04:15:36,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=811800.0, ans=0.0 2024-09-20 04:15:40,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=811840.0, ans=0.0 2024-09-20 04:15:51,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=811840.0, ans=0.125 2024-09-20 04:15:55,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=811880.0, ans=0.0 2024-09-20 04:15:57,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.05 vs. limit=10.0 2024-09-20 04:16:00,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=12.0 2024-09-20 04:16:11,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=811920.0, ans=0.2 2024-09-20 04:16:32,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=811960.0, ans=0.125 2024-09-20 04:16:40,296 INFO [train.py:1198] (0/2) Epoch 45, batch 3900, loss[loss=0.2369, ctc_loss=0.1125, cr_loss=0.3573, attn_decoder_loss=0.2427, over 29627.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1093, cr_loss=0.348, attn_decoder_loss=0.2372, over 5815900.16 frames. 
], batch size: 86, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:16:52,122 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.943e+01 8.765e+01 9.119e+01 9.578e+01 1.365e+02, threshold=1.824e+02, percent-clipped=0.0 2024-09-20 04:17:13,080 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:17:23,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=812120.0, ans=0.0 2024-09-20 04:17:43,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=812160.0, ans=0.125 2024-09-20 04:17:46,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=812160.0, ans=0.05 2024-09-20 04:17:54,036 INFO [train.py:1198] (0/2) Epoch 45, batch 3950, loss[loss=0.2413, ctc_loss=0.1133, cr_loss=0.3634, attn_decoder_loss=0.2475, over 29470.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.109, cr_loss=0.3476, attn_decoder_loss=0.2372, over 5835179.26 frames. ], batch size: 97, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:18:17,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-09-20 04:19:08,738 INFO [train.py:1198] (0/2) Epoch 45, batch 4000, loss[loss=0.2232, ctc_loss=0.1062, cr_loss=0.3362, attn_decoder_loss=0.2287, over 29535.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1092, cr_loss=0.3483, attn_decoder_loss=0.2373, over 5813142.82 frames. 
], batch size: 74, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:19:11,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=812400.0, ans=0.125 2024-09-20 04:19:11,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=812400.0, ans=0.125 2024-09-20 04:19:13,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=812400.0, ans=0.2 2024-09-20 04:19:21,819 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.441e+01 9.012e+01 9.623e+01 3.417e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-20 04:19:26,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=812440.0, ans=0.125 2024-09-20 04:19:29,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.71 vs. limit=15.0 2024-09-20 04:20:09,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=812560.0, ans=0.0 2024-09-20 04:20:13,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.12 vs. limit=10.0 2024-09-20 04:20:23,544 INFO [train.py:1198] (0/2) Epoch 45, batch 4050, loss[loss=0.2551, ctc_loss=0.1364, cr_loss=0.3795, attn_decoder_loss=0.2599, over 20715.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1094, cr_loss=0.3486, attn_decoder_loss=0.2374, over 5797479.81 frames. 
], batch size: 209, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:20:39,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=812640.0, ans=0.125 2024-09-20 04:20:45,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=812640.0, ans=0.125 2024-09-20 04:21:05,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812720.0, ans=0.1 2024-09-20 04:21:11,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=812720.0, ans=0.125 2024-09-20 04:21:33,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=812760.0, ans=0.125 2024-09-20 04:21:35,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=812800.0, ans=0.0 2024-09-20 04:21:36,999 INFO [train.py:1198] (0/2) Epoch 45, batch 4100, loss[loss=0.2432, ctc_loss=0.1201, cr_loss=0.38, attn_decoder_loss=0.2485, over 29508.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1095, cr_loss=0.349, attn_decoder_loss=0.2377, over 5792330.78 frames. 
], batch size: 90, lr: 2.43e-03, grad_scale: 16.0 2024-09-20 04:21:51,600 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.673e+01 9.305e+01 9.853e+01 2.008e+02, threshold=1.861e+02, percent-clipped=1.0 2024-09-20 04:22:16,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=812880.0, ans=0.125 2024-09-20 04:22:31,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=812920.0, ans=0.125 2024-09-20 04:22:31,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=812920.0, ans=0.125 2024-09-20 04:22:41,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=812960.0, ans=0.025 2024-09-20 04:22:50,427 INFO [train.py:1198] (0/2) Epoch 45, batch 4150, loss[loss=0.2273, ctc_loss=0.1143, cr_loss=0.3697, attn_decoder_loss=0.2316, over 29528.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1094, cr_loss=0.3484, attn_decoder_loss=0.2374, over 5797807.04 frames. ], batch size: 77, lr: 2.43e-03, grad_scale: 8.0 2024-09-20 04:22:55,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=813000.0, ans=0.125 2024-09-20 04:23:07,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.94 vs. 
limit=15.0 2024-09-20 04:23:08,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=813040.0, ans=0.0 2024-09-20 04:23:09,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=813040.0, ans=0.1 2024-09-20 04:23:41,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=813120.0, ans=0.5 2024-09-20 04:23:54,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=813160.0, ans=0.0 2024-09-20 04:24:06,123 INFO [train.py:1198] (0/2) Epoch 45, batch 4200, loss[loss=0.2505, ctc_loss=0.1294, cr_loss=0.3986, attn_decoder_loss=0.2551, over 29495.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1093, cr_loss=0.3481, attn_decoder_loss=0.2375, over 5800542.81 frames. ], batch size: 90, lr: 2.43e-03, grad_scale: 8.0 2024-09-20 04:24:20,925 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.639e+01 8.983e+01 9.636e+01 1.465e+02, threshold=1.797e+02, percent-clipped=0.0 2024-09-20 04:24:40,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.17 vs. limit=10.0 2024-09-20 04:25:00,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=813320.0, ans=0.1 2024-09-20 04:25:04,123 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.94 vs. 
limit=15.0 2024-09-20 04:25:12,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=813360.0, ans=0.125 2024-09-20 04:25:13,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=813360.0, ans=0.1 2024-09-20 04:25:13,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=813360.0, ans=0.0 2024-09-20 04:25:14,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.23 vs. limit=10.0 2024-09-20 04:25:19,311 INFO [train.py:1198] (0/2) Epoch 45, batch 4250, loss[loss=0.2129, ctc_loss=0.08413, cr_loss=0.2994, attn_decoder_loss=0.2205, over 29533.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.109, cr_loss=0.3478, attn_decoder_loss=0.2376, over 5806220.48 frames. ], batch size: 74, lr: 2.43e-03, grad_scale: 8.0 2024-09-20 04:25:22,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=813400.0, ans=0.0 2024-09-20 04:25:31,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=813400.0, ans=0.2 2024-09-20 04:25:32,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=813440.0, ans=0.125 2024-09-20 04:25:33,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=813440.0, ans=0.0 2024-09-20 04:25:36,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=813440.0, ans=0.125 2024-09-20 04:25:45,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=813440.0, ans=0.0 2024-09-20 
04:26:03,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2024-09-20 04:26:06,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.69 vs. limit=10.0 2024-09-20 04:26:09,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=813520.0, ans=10.0 2024-09-20 04:26:14,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=813520.0, ans=0.025 2024-09-20 04:26:22,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=813560.0, ans=0.04949747468305833 2024-09-20 04:26:26,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813560.0, ans=0.1 2024-09-20 04:26:29,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=813560.0, ans=0.0 2024-09-20 04:26:32,857 INFO [train.py:1198] (0/2) Epoch 45, batch 4300, loss[loss=0.231, ctc_loss=0.1076, cr_loss=0.3264, attn_decoder_loss=0.2374, over 29519.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1092, cr_loss=0.3483, attn_decoder_loss=0.238, over 5795193.89 frames. ], batch size: 87, lr: 2.43e-03, grad_scale: 8.0 2024-09-20 04:26:42,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813600.0, ans=0.1 2024-09-20 04:26:43,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.53 vs. 
limit=15.0
2024-09-20 04:26:47,737 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.880e+01 9.464e+01 1.001e+02 2.468e+02, threshold=1.893e+02, percent-clipped=1.0
2024-09-20 04:26:49,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=813640.0, ans=0.125
2024-09-20 04:27:34,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=813760.0, ans=0.1
2024-09-20 04:27:36,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0
2024-09-20 04:27:39,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=813760.0, ans=0.0
2024-09-20 04:27:48,500 INFO [train.py:1198] (0/2) Epoch 45, batch 4350, loss[loss=0.2537, ctc_loss=0.1315, cr_loss=0.3866, attn_decoder_loss=0.2586, over 29436.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1117, cr_loss=0.3539, attn_decoder_loss=0.241, over 5797797.70 frames. ], batch size: 97, lr: 2.43e-03, grad_scale: 8.0
2024-09-20 04:28:13,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=813840.0, ans=0.0
2024-09-20 04:28:25,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=813880.0, ans=0.125
2024-09-20 04:28:47,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=813960.0, ans=0.125
2024-09-20 04:28:47,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.61 vs. limit=15.0
2024-09-20 04:29:01,436 INFO [train.py:1198] (0/2) Epoch 45, batch 4400, loss[loss=0.2394, ctc_loss=0.1166, cr_loss=0.369, attn_decoder_loss=0.2448, over 27438.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1129, cr_loss=0.356, attn_decoder_loss=0.2429, over 5769621.48 frames. ], batch size: 125, lr: 2.43e-03, grad_scale: 16.0
2024-09-20 04:29:04,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0
2024-09-20 04:29:11,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814000.0, ans=0.1
2024-09-20 04:29:15,735 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.358e+01 9.080e+01 9.421e+01 9.945e+01 1.972e+02, threshold=1.884e+02, percent-clipped=1.0
2024-09-20 04:29:26,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=814040.0, ans=0.125
2024-09-20 04:29:27,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=814040.0, ans=0.125
2024-09-20 04:30:15,985 INFO [train.py:1198] (0/2) Epoch 45, batch 4450, loss[loss=0.2485, ctc_loss=0.1401, cr_loss=0.404, attn_decoder_loss=0.2516, over 19866.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1161, cr_loss=0.3613, attn_decoder_loss=0.2449, over 5581693.95 frames. ], batch size: 210, lr: 2.43e-03, grad_scale: 16.0
2024-09-20 04:30:20,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=814200.0, ans=0.125
2024-09-20 04:30:31,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=814240.0, ans=0.025
2024-09-20 04:30:37,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=814240.0, ans=0.1
2024-09-20 04:30:40,542 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0
2024-09-20 04:30:49,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=814280.0, ans=0.125
2024-09-20 04:31:11,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=814320.0, ans=0.2
2024-09-20 04:31:15,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=814360.0, ans=0.125
2024-09-20 04:31:21,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=814360.0, ans=0.025
2024-09-20 04:31:24,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=814360.0, ans=0.05
2024-09-20 04:31:31,833 INFO [train.py:1198] (0/2) Epoch 45, batch 4500, loss[loss=0.2495, ctc_loss=0.1297, cr_loss=0.3818, attn_decoder_loss=0.2544, over 19585.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1189, cr_loss=0.3637, attn_decoder_loss=0.2466, over 5238162.55 frames. ], batch size: 209, lr: 2.43e-03, grad_scale: 8.0
2024-09-20 04:31:39,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=814400.0, ans=0.125
2024-09-20 04:31:46,992 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 04:31:48,022 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.758e+01 1.032e+02 1.137e+02 1.254e+02 4.078e+02, threshold=2.275e+02, percent-clipped=1.0
2024-09-20 04:32:08,830 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-45.pt
2024-09-20 04:32:47,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.01 vs. limit=15.0
2024-09-20 04:32:47,514 INFO [train.py:1198] (0/2) Epoch 46, batch 0, loss[loss=0.2135, ctc_loss=0.0987, cr_loss=0.3329, attn_decoder_loss=0.2188, over 29619.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.0987, cr_loss=0.3329, attn_decoder_loss=0.2188, over 29619.00 frames. ], batch size: 73, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:32:47,515 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-20 04:33:04,846 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([5.5135, 5.3068, 5.1238, 4.7433], device='cuda:0')
2024-09-20 04:33:07,329 INFO [train.py:1230] (0/2) Epoch 46, validation: loss=0.2132, ctc_loss=0.03625, cr_loss=6.411e-15, attn_decoder_loss=0.2328, over 944034.00 frames.
2024-09-20 04:33:07,329 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-20 04:33:09,149 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=814500.0, ans=0.0
2024-09-20 04:33:12,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=814500.0, ans=0.125
2024-09-20 04:33:16,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=814500.0, ans=0.125
2024-09-20 04:33:18,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=814500.0, ans=0.2
2024-09-20 04:33:36,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=814580.0, ans=0.125
2024-09-20 04:33:45,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=814580.0, ans=0.2
2024-09-20 04:33:50,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.01 vs. limit=15.0
2024-09-20 04:33:51,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=12.0
2024-09-20 04:34:24,686 INFO [train.py:1198] (0/2) Epoch 46, batch 50, loss[loss=0.2025, ctc_loss=0.09474, cr_loss=0.3192, attn_decoder_loss=0.2074, over 29409.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1114, cr_loss=0.3504, attn_decoder_loss=0.2382, over 1268269.42 frames. ], batch size: 70, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:35:19,989 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 8.768e+01 9.324e+01 1.041e+02 2.439e+02, threshold=1.865e+02, percent-clipped=1.0
2024-09-20 04:35:26,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=814860.0, ans=0.0
2024-09-20 04:35:41,095 INFO [train.py:1198] (0/2) Epoch 46, batch 100, loss[loss=0.221, ctc_loss=0.1046, cr_loss=0.3381, attn_decoder_loss=0.2264, over 29548.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1126, cr_loss=0.3537, attn_decoder_loss=0.24, over 2253032.30 frames. ], batch size: 76, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:35:42,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=814900.0, ans=0.125
2024-09-20 04:35:52,705 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.96 vs. limit=12.0
2024-09-20 04:35:59,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=814940.0, ans=0.125
2024-09-20 04:36:00,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=814940.0, ans=0.0
2024-09-20 04:36:21,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=814980.0, ans=0.125
2024-09-20 04:36:35,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.48 vs. limit=15.0
2024-09-20 04:36:36,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=815020.0, ans=0.2
2024-09-20 04:36:55,412 INFO [train.py:1198] (0/2) Epoch 46, batch 150, loss[loss=0.2052, ctc_loss=0.09145, cr_loss=0.3075, attn_decoder_loss=0.211, over 29418.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1099, cr_loss=0.3488, attn_decoder_loss=0.2376, over 3047719.32 frames. ], batch size: 70, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:37:09,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=815140.0, ans=0.0
2024-09-20 04:37:51,970 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.420e+01 9.019e+01 9.584e+01 1.300e+02, threshold=1.804e+02, percent-clipped=0.0
2024-09-20 04:37:55,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=815220.0, ans=0.125
2024-09-20 04:38:01,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=815260.0, ans=0.125
2024-09-20 04:38:12,905 INFO [train.py:1198] (0/2) Epoch 46, batch 200, loss[loss=0.2384, ctc_loss=0.116, cr_loss=0.3758, attn_decoder_loss=0.2436, over 27307.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1094, cr_loss=0.3481, attn_decoder_loss=0.2372, over 3660184.39 frames. ], batch size: 124, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:38:16,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.31 vs. limit=12.0
2024-09-20 04:38:20,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=815300.0, ans=0.125
2024-09-20 04:38:26,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=815340.0, ans=0.125
2024-09-20 04:38:32,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=815340.0, ans=0.0
2024-09-20 04:38:34,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0
2024-09-20 04:38:49,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=815380.0, ans=0.125
2024-09-20 04:38:52,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=815380.0, ans=0.125
2024-09-20 04:38:53,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=815380.0, ans=0.125
2024-09-20 04:39:16,316 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=15.0
2024-09-20 04:39:21,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=815460.0, ans=0.04949747468305833
2024-09-20 04:39:30,377 INFO [train.py:1198] (0/2) Epoch 46, batch 250, loss[loss=0.2511, ctc_loss=0.126, cr_loss=0.3759, attn_decoder_loss=0.2566, over 29262.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1086, cr_loss=0.3467, attn_decoder_loss=0.2368, over 4141981.70 frames. ], batch size: 100, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:39:33,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=815500.0, ans=0.025
2024-09-20 04:39:37,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0
2024-09-20 04:39:47,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=815540.0, ans=0.05
2024-09-20 04:40:02,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=815580.0, ans=15.0
2024-09-20 04:40:06,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=815580.0, ans=0.1
2024-09-20 04:40:11,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=815580.0, ans=0.125
2024-09-20 04:40:17,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=815620.0, ans=0.0
2024-09-20 04:40:24,347 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.615e+01 9.020e+01 9.569e+01 1.385e+02, threshold=1.804e+02, percent-clipped=0.0
2024-09-20 04:40:45,339 INFO [train.py:1198] (0/2) Epoch 46, batch 300, loss[loss=0.2549, ctc_loss=0.133, cr_loss=0.3967, attn_decoder_loss=0.2596, over 29520.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1088, cr_loss=0.3471, attn_decoder_loss=0.2369, over 4512202.25 frames. ], batch size: 92, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:40:45,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=815700.0, ans=0.125
2024-09-20 04:40:59,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815740.0, ans=0.1
2024-09-20 04:41:08,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.07 vs. limit=15.0
2024-09-20 04:41:09,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=815740.0, ans=0.1
2024-09-20 04:42:02,790 INFO [train.py:1198] (0/2) Epoch 46, batch 350, loss[loss=0.2118, ctc_loss=0.08988, cr_loss=0.3069, attn_decoder_loss=0.2186, over 29328.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1092, cr_loss=0.348, attn_decoder_loss=0.2378, over 4797694.01 frames. ], batch size: 71, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:42:08,551 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=22.5
2024-09-20 04:42:09,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=815900.0, ans=0.05
2024-09-20 04:42:09,562 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0
2024-09-20 04:42:25,892 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.54 vs. limit=15.0
2024-09-20 04:42:37,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815980.0, ans=0.1
2024-09-20 04:42:39,132 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-204000.pt
2024-09-20 04:42:52,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=815980.0, ans=0.0
2024-09-20 04:42:56,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.32 vs. limit=15.0
2024-09-20 04:43:04,444 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.751e+01 9.027e+01 9.740e+01 2.091e+02, threshold=1.805e+02, percent-clipped=1.0
2024-09-20 04:43:13,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=816060.0, ans=0.125
2024-09-20 04:43:17,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=816060.0, ans=0.125
2024-09-20 04:43:27,883 INFO [train.py:1198] (0/2) Epoch 46, batch 400, loss[loss=0.2409, ctc_loss=0.1145, cr_loss=0.3577, attn_decoder_loss=0.247, over 29702.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1085, cr_loss=0.3462, attn_decoder_loss=0.2372, over 5027050.34 frames. ], batch size: 82, lr: 2.40e-03, grad_scale: 32.0
2024-09-20 04:43:32,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=816100.0, ans=0.125
2024-09-20 04:43:47,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=816140.0, ans=0.5
2024-09-20 04:43:49,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=816140.0, ans=0.2
2024-09-20 04:44:12,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=816220.0, ans=0.125
2024-09-20 04:44:15,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=816220.0, ans=15.0
2024-09-20 04:44:32,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=816260.0, ans=0.0
2024-09-20 04:44:33,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=816260.0, ans=0.025
2024-09-20 04:44:36,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=816260.0, ans=0.125
2024-09-20 04:44:43,873 INFO [train.py:1198] (0/2) Epoch 46, batch 450, loss[loss=0.2432, ctc_loss=0.121, cr_loss=0.3785, attn_decoder_loss=0.2484, over 29707.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1088, cr_loss=0.3469, attn_decoder_loss=0.2376, over 5189556.49 frames. ], batch size: 83, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:44:59,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=816340.0, ans=0.0
2024-09-20 04:45:11,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=816340.0, ans=0.0
2024-09-20 04:45:19,680 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.08 vs. limit=22.5
2024-09-20 04:45:40,120 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.630e+01 9.037e+01 9.631e+01 6.120e+02, threshold=1.807e+02, percent-clipped=1.0
2024-09-20 04:45:43,688 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 04:45:58,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0
2024-09-20 04:46:02,236 INFO [train.py:1198] (0/2) Epoch 46, batch 500, loss[loss=0.2487, ctc_loss=0.1271, cr_loss=0.4, attn_decoder_loss=0.2533, over 29425.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1079, cr_loss=0.3452, attn_decoder_loss=0.2364, over 5331717.19 frames. ], batch size: 94, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:46:06,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=816500.0, ans=0.015
2024-09-20 04:46:32,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=816580.0, ans=0.2
2024-09-20 04:46:35,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=816580.0, ans=0.0
2024-09-20 04:46:37,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=816580.0, ans=0.0
2024-09-20 04:46:53,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=22.5
2024-09-20 04:47:01,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816660.0, ans=0.1
2024-09-20 04:47:20,144 INFO [train.py:1198] (0/2) Epoch 46, batch 550, loss[loss=0.2445, ctc_loss=0.1162, cr_loss=0.3583, attn_decoder_loss=0.2507, over 28832.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.108, cr_loss=0.3454, attn_decoder_loss=0.2363, over 5423114.66 frames. ], batch size: 104, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:47:28,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=816700.0, ans=0.1
2024-09-20 04:47:31,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=816700.0, ans=0.07
2024-09-20 04:47:32,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=816700.0, ans=0.0
2024-09-20 04:47:37,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=816740.0, ans=0.2
2024-09-20 04:47:38,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=816740.0, ans=0.125
2024-09-20 04:47:40,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=816740.0, ans=0.0
2024-09-20 04:47:47,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=816740.0, ans=0.2
2024-09-20 04:47:57,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0
2024-09-20 04:48:16,368 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.519e+01 9.115e+01 9.608e+01 2.263e+02, threshold=1.823e+02, percent-clipped=2.0
2024-09-20 04:48:33,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=816860.0, ans=0.1
2024-09-20 04:48:36,083 INFO [train.py:1198] (0/2) Epoch 46, batch 600, loss[loss=0.2451, ctc_loss=0.1222, cr_loss=0.3752, attn_decoder_loss=0.2504, over 29303.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.108, cr_loss=0.3455, attn_decoder_loss=0.2364, over 5509249.85 frames. ], batch size: 100, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:48:45,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=816900.0, ans=0.125
2024-09-20 04:48:47,574 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0
2024-09-20 04:49:09,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=816980.0, ans=0.025
2024-09-20 04:49:10,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=816980.0, ans=0.025
2024-09-20 04:49:13,130 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=22.5
2024-09-20 04:49:49,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0
2024-09-20 04:49:53,434 INFO [train.py:1198] (0/2) Epoch 46, batch 650, loss[loss=0.232, ctc_loss=0.1035, cr_loss=0.3241, attn_decoder_loss=0.2391, over 29737.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1072, cr_loss=0.3444, attn_decoder_loss=0.236, over 5586977.24 frames. ], batch size: 81, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:50:11,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=817140.0, ans=0.125
2024-09-20 04:50:15,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2024-09-20 04:50:16,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=817140.0, ans=0.2
2024-09-20 04:50:16,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=817140.0, ans=0.07
2024-09-20 04:50:23,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=817180.0, ans=0.125
2024-09-20 04:50:24,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=817180.0, ans=0.0
2024-09-20 04:50:36,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=817180.0, ans=0.125
2024-09-20 04:50:45,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.90 vs. limit=15.0
2024-09-20 04:50:49,216 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.319e+01 8.831e+01 9.492e+01 1.301e+02, threshold=1.766e+02, percent-clipped=0.0
2024-09-20 04:51:08,832 INFO [train.py:1198] (0/2) Epoch 46, batch 700, loss[loss=0.2263, ctc_loss=0.1081, cr_loss=0.349, attn_decoder_loss=0.2316, over 29540.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1076, cr_loss=0.3455, attn_decoder_loss=0.2366, over 5638544.29 frames. ], batch size: 76, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:51:13,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=817300.0, ans=0.0
2024-09-20 04:51:30,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0
2024-09-20 04:51:48,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.95 vs. limit=22.5
2024-09-20 04:51:49,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=817380.0, ans=0.125
2024-09-20 04:52:04,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=817420.0, ans=0.0
2024-09-20 04:52:10,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=817460.0, ans=0.125
2024-09-20 04:52:27,358 INFO [train.py:1198] (0/2) Epoch 46, batch 750, loss[loss=0.2313, ctc_loss=0.1005, cr_loss=0.3292, attn_decoder_loss=0.2385, over 29722.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1075, cr_loss=0.3448, attn_decoder_loss=0.2361, over 5676713.41 frames. ], batch size: 82, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:52:33,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=817500.0, ans=0.125
2024-09-20 04:52:34,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.18 vs. limit=15.0
2024-09-20 04:52:44,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=817540.0, ans=0.125
2024-09-20 04:53:10,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=817580.0, ans=0.0
2024-09-20 04:53:23,361 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.548e+01 9.089e+01 9.698e+01 1.282e+02, threshold=1.818e+02, percent-clipped=0.0
2024-09-20 04:53:24,336 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.33 vs. limit=22.5
2024-09-20 04:53:40,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=817660.0, ans=0.125
2024-09-20 04:53:43,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=817700.0, ans=0.125
2024-09-20 04:53:44,978 INFO [train.py:1198] (0/2) Epoch 46, batch 800, loss[loss=0.2123, ctc_loss=0.0932, cr_loss=0.3111, attn_decoder_loss=0.2186, over 29613.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1077, cr_loss=0.3454, attn_decoder_loss=0.2361, over 5706339.64 frames. ], batch size: 73, lr: 2.40e-03, grad_scale: 32.0
2024-09-20 04:53:52,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=12.0
2024-09-20 04:53:58,036 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.86 vs. limit=15.0
2024-09-20 04:53:58,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=817740.0, ans=0.5
2024-09-20 04:54:03,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.97 vs. limit=15.0
2024-09-20 04:54:19,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=817780.0, ans=0.125
2024-09-20 04:54:23,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2024-09-20 04:54:27,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0
2024-09-20 04:54:38,161 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 04:54:40,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=817820.0, ans=0.125
2024-09-20 04:55:00,147 INFO [train.py:1198] (0/2) Epoch 46, batch 850, loss[loss=0.2422, ctc_loss=0.1159, cr_loss=0.3637, attn_decoder_loss=0.2482, over 29723.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1074, cr_loss=0.3449, attn_decoder_loss=0.2358, over 5734901.00 frames. ], batch size: 89, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:55:11,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.25 vs. limit=22.5
2024-09-20 04:55:13,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=817940.0, ans=0.125
2024-09-20 04:55:19,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=817940.0, ans=0.0
2024-09-20 04:55:26,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817940.0, ans=0.1
2024-09-20 04:55:29,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.63 vs. limit=15.0
2024-09-20 04:55:36,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=817980.0, ans=0.125
2024-09-20 04:55:45,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=817980.0, ans=0.0
2024-09-20 04:55:52,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=818020.0, ans=0.0
2024-09-20 04:55:57,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=818020.0, ans=0.0
2024-09-20 04:55:59,849 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.524e+01 9.066e+01 9.505e+01 2.667e+02, threshold=1.813e+02, percent-clipped=1.0
2024-09-20 04:56:18,041 INFO [train.py:1198] (0/2) Epoch 46, batch 900, loss[loss=0.2131, ctc_loss=0.08737, cr_loss=0.2951, attn_decoder_loss=0.2205, over 29600.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1078, cr_loss=0.3455, attn_decoder_loss=0.2365, over 5739987.75 frames. ], batch size: 73, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:56:33,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=818140.0, ans=0.125
2024-09-20 04:56:57,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=818180.0, ans=0.035
2024-09-20 04:57:18,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=818260.0, ans=0.125
2024-09-20 04:57:19,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=818260.0, ans=0.0
2024-09-20 04:57:30,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=818260.0, ans=0.125
2024-09-20 04:57:34,894 INFO [train.py:1198] (0/2) Epoch 46, batch 950, loss[loss=0.2148, ctc_loss=0.08928, cr_loss=0.2967, attn_decoder_loss=0.2222, over 29524.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.108, cr_loss=0.3455, attn_decoder_loss=0.2368, over 5742962.42 frames. ], batch size: 74, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:57:42,599 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 04:57:44,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=818300.0, ans=0.0
2024-09-20 04:58:09,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=818380.0, ans=0.0
2024-09-20 04:58:32,435 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 8.692e+01 9.271e+01 9.926e+01 1.686e+02, threshold=1.854e+02, percent-clipped=0.0
2024-09-20 04:58:49,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=818500.0, ans=0.5
2024-09-20 04:58:50,240 INFO [train.py:1198] (0/2) Epoch 46, batch 1000, loss[loss=0.2208, ctc_loss=0.09965, cr_loss=0.3393, attn_decoder_loss=0.2267, over 29508.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1086, cr_loss=0.3464, attn_decoder_loss=0.2375, over 5736594.08 frames. ], batch size: 77, lr: 2.40e-03, grad_scale: 16.0
2024-09-20 04:59:54,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=818660.0, ans=0.05
2024-09-20 05:00:07,746 INFO [train.py:1198] (0/2) Epoch 46, batch 1050, loss[loss=0.2407, ctc_loss=0.1176, cr_loss=0.3665, attn_decoder_loss=0.2462, over 29658.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1084, cr_loss=0.3459, attn_decoder_loss=0.237, over 5743532.19 frames.
], batch size: 85, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 05:00:15,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=818700.0, ans=0.1 2024-09-20 05:00:17,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=12.0 2024-09-20 05:00:37,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.53 vs. limit=15.0 2024-09-20 05:01:05,428 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.722e+01 9.094e+01 9.715e+01 1.593e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-20 05:01:11,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=818860.0, ans=0.2 2024-09-20 05:01:25,796 INFO [train.py:1198] (0/2) Epoch 46, batch 1100, loss[loss=0.232, ctc_loss=0.1095, cr_loss=0.3682, attn_decoder_loss=0.2375, over 29469.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1085, cr_loss=0.3466, attn_decoder_loss=0.2371, over 5755735.45 frames. 
], batch size: 78, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 05:01:36,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=818900.0, ans=0.125 2024-09-20 05:01:47,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=818940.0, ans=0.2 2024-09-20 05:01:51,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=818940.0, ans=0.025 2024-09-20 05:01:56,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=818980.0, ans=0.125 2024-09-20 05:02:02,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=818980.0, ans=0.0 2024-09-20 05:02:05,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=818980.0, ans=0.125 2024-09-20 05:02:19,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=819020.0, ans=0.0 2024-09-20 05:02:41,364 INFO [train.py:1198] (0/2) Epoch 46, batch 1150, loss[loss=0.2247, ctc_loss=0.1119, cr_loss=0.3543, attn_decoder_loss=0.2294, over 29433.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1086, cr_loss=0.3462, attn_decoder_loss=0.2368, over 5754864.75 frames. 
], batch size: 78, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 05:02:51,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=819100.0, ans=0.0 2024-09-20 05:03:17,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=819180.0, ans=0.125 2024-09-20 05:03:17,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=819180.0, ans=0.125 2024-09-20 05:03:35,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.85 vs. limit=12.0 2024-09-20 05:03:41,726 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.675e+01 9.165e+01 9.732e+01 5.471e+02, threshold=1.833e+02, percent-clipped=2.0 2024-09-20 05:03:59,540 INFO [train.py:1198] (0/2) Epoch 46, batch 1200, loss[loss=0.2251, ctc_loss=0.09975, cr_loss=0.3244, attn_decoder_loss=0.2318, over 29703.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1087, cr_loss=0.3467, attn_decoder_loss=0.2373, over 5748693.47 frames. 
], batch size: 85, lr: 2.40e-03, grad_scale: 32.0 2024-09-20 05:04:17,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=819340.0, ans=0.0 2024-09-20 05:04:37,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819380.0, ans=0.1 2024-09-20 05:05:02,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=819460.0, ans=0.0 2024-09-20 05:05:08,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=819460.0, ans=0.1 2024-09-20 05:05:17,174 INFO [train.py:1198] (0/2) Epoch 46, batch 1250, loss[loss=0.2583, ctc_loss=0.1331, cr_loss=0.4009, attn_decoder_loss=0.2633, over 29487.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1096, cr_loss=0.3486, attn_decoder_loss=0.2382, over 5775356.27 frames. ], batch size: 92, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 05:05:53,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=819580.0, ans=0.125 2024-09-20 05:06:16,052 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.479e+01 9.052e+01 9.530e+01 1.493e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-20 05:06:32,583 INFO [train.py:1198] (0/2) Epoch 46, batch 1300, loss[loss=0.2508, ctc_loss=0.1151, cr_loss=0.3614, attn_decoder_loss=0.2578, over 28240.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1092, cr_loss=0.3477, attn_decoder_loss=0.2376, over 5778317.20 frames. 
], batch size: 111, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 05:06:40,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=819700.0, ans=0.0 2024-09-20 05:06:44,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=819700.0, ans=0.0 2024-09-20 05:07:01,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=819780.0, ans=10.0 2024-09-20 05:07:14,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=819780.0, ans=0.1 2024-09-20 05:07:15,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=819780.0, ans=0.125 2024-09-20 05:07:18,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.76 vs. limit=15.0 2024-09-20 05:07:20,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=819820.0, ans=0.0 2024-09-20 05:07:29,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=819820.0, ans=0.0 2024-09-20 05:07:40,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=819860.0, ans=0.125 2024-09-20 05:07:41,025 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-09-20 05:07:49,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. 
limit=6.0 2024-09-20 05:07:50,558 INFO [train.py:1198] (0/2) Epoch 46, batch 1350, loss[loss=0.2356, ctc_loss=0.1143, cr_loss=0.3571, attn_decoder_loss=0.2412, over 29752.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1086, cr_loss=0.3472, attn_decoder_loss=0.2372, over 5795096.60 frames. ], batch size: 81, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 05:07:58,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819900.0, ans=0.1 2024-09-20 05:08:19,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=819980.0, ans=0.125 2024-09-20 05:08:32,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=819980.0, ans=22.5 2024-09-20 05:08:49,306 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 8.548e+01 9.009e+01 9.440e+01 1.283e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-20 05:08:51,448 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2024-09-20 05:08:54,268 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:09:05,332 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:09:06,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=820100.0, ans=0.05 2024-09-20 05:09:08,086 INFO [train.py:1198] (0/2) Epoch 46, batch 1400, loss[loss=0.1997, ctc_loss=0.08563, cr_loss=0.2873, attn_decoder_loss=0.2059, over 29591.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1082, cr_loss=0.346, attn_decoder_loss=0.2369, over 5806080.97 frames. 
], batch size: 69, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 05:09:17,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=820100.0, ans=0.0 2024-09-20 05:09:18,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=820100.0, ans=0.125 2024-09-20 05:09:21,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.17 vs. limit=15.0 2024-09-20 05:09:35,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820140.0, ans=0.1 2024-09-20 05:09:36,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=820180.0, ans=0.0 2024-09-20 05:09:37,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=820180.0, ans=0.125 2024-09-20 05:09:38,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=820180.0, ans=0.025 2024-09-20 05:09:47,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=820180.0, ans=0.125 2024-09-20 05:09:51,253 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0 2024-09-20 05:09:51,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.59 vs. 
limit=15.0 2024-09-20 05:09:55,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=820220.0, ans=0.0 2024-09-20 05:10:20,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=820260.0, ans=0.2 2024-09-20 05:10:23,185 INFO [train.py:1198] (0/2) Epoch 46, batch 1450, loss[loss=0.2482, ctc_loss=0.1225, cr_loss=0.3836, attn_decoder_loss=0.2537, over 29431.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1082, cr_loss=0.3463, attn_decoder_loss=0.2371, over 5802956.93 frames. ], batch size: 94, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 05:10:27,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=820300.0, ans=0.0 2024-09-20 05:10:35,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=820300.0, ans=0.0 2024-09-20 05:10:41,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=820340.0, ans=0.1 2024-09-20 05:10:46,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=820340.0, ans=0.0 2024-09-20 05:11:03,244 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.13 vs. limit=22.5 2024-09-20 05:11:10,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=820420.0, ans=0.125 2024-09-20 05:11:20,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.86 vs. 
limit=22.5 2024-09-20 05:11:23,832 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.745e+01 8.716e+01 9.154e+01 9.658e+01 1.732e+02, threshold=1.831e+02, percent-clipped=0.0 2024-09-20 05:11:24,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=820460.0, ans=0.07 2024-09-20 05:11:26,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=9.12 vs. limit=15.0 2024-09-20 05:11:40,539 INFO [train.py:1198] (0/2) Epoch 46, batch 1500, loss[loss=0.2429, ctc_loss=0.1172, cr_loss=0.3587, attn_decoder_loss=0.2489, over 29619.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.109, cr_loss=0.3482, attn_decoder_loss=0.2378, over 5804043.11 frames. ], batch size: 86, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 05:11:56,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=820540.0, ans=0.5 2024-09-20 05:11:58,144 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.76 vs. 
limit=22.5 2024-09-20 05:11:59,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=820540.0, ans=0.0 2024-09-20 05:12:09,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=820580.0, ans=0.125 2024-09-20 05:12:11,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=820580.0, ans=0.125 2024-09-20 05:12:40,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=820660.0, ans=15.0 2024-09-20 05:12:41,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=820660.0, ans=0.0 2024-09-20 05:12:44,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=820660.0, ans=0.2 2024-09-20 05:12:57,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=820700.0, ans=0.0 2024-09-20 05:12:58,449 INFO [train.py:1198] (0/2) Epoch 46, batch 1550, loss[loss=0.2467, ctc_loss=0.1291, cr_loss=0.3944, attn_decoder_loss=0.251, over 29499.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1095, cr_loss=0.3492, attn_decoder_loss=0.2379, over 5780323.36 frames. ], batch size: 90, lr: 2.40e-03, grad_scale: 16.0 2024-09-20 05:13:17,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=820740.0, ans=22.5 2024-09-20 05:13:18,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.23 vs. limit=15.0 2024-09-20 05:13:20,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.03 vs. 
limit=12.0 2024-09-20 05:13:27,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=820780.0, ans=0.025 2024-09-20 05:13:31,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=820780.0, ans=0.125 2024-09-20 05:13:56,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=820820.0, ans=0.0 2024-09-20 05:13:57,690 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.852e+01 8.666e+01 9.165e+01 9.955e+01 1.733e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-20 05:14:05,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=820860.0, ans=0.125 2024-09-20 05:14:14,135 INFO [train.py:1198] (0/2) Epoch 46, batch 1600, loss[loss=0.2353, ctc_loss=0.1091, cr_loss=0.3541, attn_decoder_loss=0.2415, over 29670.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1093, cr_loss=0.3481, attn_decoder_loss=0.2374, over 5763542.32 frames. ], batch size: 85, lr: 2.39e-03, grad_scale: 32.0 2024-09-20 05:14:38,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=820940.0, ans=0.125 2024-09-20 05:15:10,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=821020.0, ans=0.1 2024-09-20 05:15:22,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=821060.0, ans=0.95 2024-09-20 05:15:31,302 INFO [train.py:1198] (0/2) Epoch 46, batch 1650, loss[loss=0.2287, ctc_loss=0.09893, cr_loss=0.338, attn_decoder_loss=0.2356, over 29693.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1092, cr_loss=0.348, attn_decoder_loss=0.2375, over 5756542.63 frames. 
], batch size: 89, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:15:33,900 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.14 vs. limit=10.0 2024-09-20 05:15:39,307 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:15:49,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=821140.0, ans=0.07 2024-09-20 05:15:49,935 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.93 vs. limit=10.0 2024-09-20 05:16:31,206 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.592e+01 9.131e+01 9.784e+01 1.419e+02, threshold=1.826e+02, percent-clipped=0.0 2024-09-20 05:16:48,221 INFO [train.py:1198] (0/2) Epoch 46, batch 1700, loss[loss=0.2074, ctc_loss=0.0965, cr_loss=0.3345, attn_decoder_loss=0.2123, over 29581.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1088, cr_loss=0.3475, attn_decoder_loss=0.237, over 5778876.77 frames. 
], batch size: 69, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:16:54,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=821300.0, ans=0.0 2024-09-20 05:17:00,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=821300.0, ans=0.2 2024-09-20 05:17:08,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821340.0, ans=0.1 2024-09-20 05:17:11,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=821340.0, ans=0.0 2024-09-20 05:17:18,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=11.91 vs. limit=15.0 2024-09-20 05:17:19,171 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.80 vs. limit=15.0 2024-09-20 05:17:30,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=821380.0, ans=0.0 2024-09-20 05:17:39,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=821420.0, ans=0.0 2024-09-20 05:17:40,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=821420.0, ans=0.125 2024-09-20 05:17:50,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.31 vs. 
limit=15.0 2024-09-20 05:18:00,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=821460.0, ans=0.04949747468305833 2024-09-20 05:18:03,233 INFO [train.py:1198] (0/2) Epoch 46, batch 1750, loss[loss=0.2053, ctc_loss=0.08617, cr_loss=0.2837, attn_decoder_loss=0.2123, over 29351.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1083, cr_loss=0.346, attn_decoder_loss=0.2367, over 5786803.38 frames. ], batch size: 67, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:18:11,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.02 vs. limit=15.0 2024-09-20 05:19:06,023 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 8.682e+01 9.175e+01 9.617e+01 1.208e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-20 05:19:13,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=821660.0, ans=0.0 2024-09-20 05:19:16,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=821660.0, ans=0.125 2024-09-20 05:19:20,743 INFO [train.py:1198] (0/2) Epoch 46, batch 1800, loss[loss=0.2334, ctc_loss=0.1119, cr_loss=0.3409, attn_decoder_loss=0.2393, over 29671.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1086, cr_loss=0.3467, attn_decoder_loss=0.2369, over 5790404.03 frames. ], batch size: 83, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:19:43,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.05 vs. 
limit=15.0 2024-09-20 05:20:06,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821820.0, ans=0.1 2024-09-20 05:20:18,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=821820.0, ans=0.0 2024-09-20 05:20:38,155 INFO [train.py:1198] (0/2) Epoch 46, batch 1850, loss[loss=0.2324, ctc_loss=0.1036, cr_loss=0.3256, attn_decoder_loss=0.2394, over 29650.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1088, cr_loss=0.3471, attn_decoder_loss=0.2372, over 5796699.03 frames. ], batch size: 86, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:20:39,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=821900.0, ans=0.125 2024-09-20 05:20:45,077 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0 2024-09-20 05:20:48,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=821900.0, ans=0.2 2024-09-20 05:21:14,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=821980.0, ans=0.0 2024-09-20 05:21:38,309 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.515e+01 9.175e+01 9.634e+01 2.306e+02, threshold=1.835e+02, percent-clipped=1.0 2024-09-20 05:21:49,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.09 vs. limit=22.5 2024-09-20 05:21:53,030 INFO [train.py:1198] (0/2) Epoch 46, batch 1900, loss[loss=0.2456, ctc_loss=0.1143, cr_loss=0.3666, attn_decoder_loss=0.252, over 29710.00 frames. 
], tot_loss[loss=0.2317, ctc_loss=0.109, cr_loss=0.3475, attn_decoder_loss=0.2376, over 5803691.66 frames. ], batch size: 89, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:21:59,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=822100.0, ans=0.125 2024-09-20 05:22:10,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=822140.0, ans=0.025 2024-09-20 05:22:19,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2024-09-20 05:22:22,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.73 vs. limit=15.0 2024-09-20 05:22:57,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=822260.0, ans=0.125 2024-09-20 05:23:05,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5 2024-09-20 05:23:10,796 INFO [train.py:1198] (0/2) Epoch 46, batch 1950, loss[loss=0.2269, ctc_loss=0.1056, cr_loss=0.3451, attn_decoder_loss=0.2327, over 29445.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1096, cr_loss=0.3491, attn_decoder_loss=0.2387, over 5818357.50 frames. ], batch size: 78, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:23:16,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.98 vs. 
limit=22.5 2024-09-20 05:23:23,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=822300.0, ans=15.0 2024-09-20 05:23:30,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822340.0, ans=0.1 2024-09-20 05:23:51,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=822380.0, ans=0.125 2024-09-20 05:23:54,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=822420.0, ans=0.125 2024-09-20 05:24:08,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=822420.0, ans=0.0 2024-09-20 05:24:08,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=822420.0, ans=0.0 2024-09-20 05:24:10,975 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.887e+01 8.689e+01 9.269e+01 9.722e+01 1.487e+02, threshold=1.854e+02, percent-clipped=0.0 2024-09-20 05:24:18,022 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=822460.0, ans=0.0 2024-09-20 05:24:23,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=822460.0, ans=0.1 2024-09-20 05:24:25,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=822460.0, ans=0.125 2024-09-20 05:24:28,127 INFO [train.py:1198] (0/2) Epoch 46, batch 2000, loss[loss=0.2075, ctc_loss=0.09306, cr_loss=0.3121, attn_decoder_loss=0.2132, over 29380.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1098, cr_loss=0.3495, attn_decoder_loss=0.239, over 5796736.44 frames. 
], batch size: 67, lr: 2.39e-03, grad_scale: 32.0 2024-09-20 05:24:45,153 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:24:48,361 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-09-20 05:24:49,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=822540.0, ans=0.125 2024-09-20 05:24:52,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=822540.0, ans=0.125 2024-09-20 05:25:03,892 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.27 vs. limit=10.0 2024-09-20 05:25:10,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=822580.0, ans=0.125 2024-09-20 05:25:17,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.82 vs. limit=15.0 2024-09-20 05:25:19,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=822620.0, ans=0.0 2024-09-20 05:25:19,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=822620.0, ans=0.0 2024-09-20 05:25:31,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=822660.0, ans=0.2 2024-09-20 05:25:43,655 INFO [train.py:1198] (0/2) Epoch 46, batch 2050, loss[loss=0.2039, ctc_loss=0.09426, cr_loss=0.3426, attn_decoder_loss=0.2085, over 29401.00 frames. 
], tot_loss[loss=0.2321, ctc_loss=0.1092, cr_loss=0.3482, attn_decoder_loss=0.2381, over 5788761.45 frames. ], batch size: 70, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:25:49,224 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2024-09-20 05:26:23,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=822780.0, ans=0.0 2024-09-20 05:26:32,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2024-09-20 05:26:34,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=822820.0, ans=0.125 2024-09-20 05:26:35,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=822820.0, ans=0.0 2024-09-20 05:26:41,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=822820.0, ans=0.025 2024-09-20 05:26:47,715 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.548e+01 9.000e+01 9.590e+01 1.636e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-20 05:26:55,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=822860.0, ans=0.025 2024-09-20 05:27:01,504 INFO [train.py:1198] (0/2) Epoch 46, batch 2100, loss[loss=0.2325, ctc_loss=0.1144, cr_loss=0.3636, attn_decoder_loss=0.2375, over 29762.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1088, cr_loss=0.3473, attn_decoder_loss=0.2376, over 5801100.88 frames. 
], batch size: 81, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:27:16,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=822940.0, ans=0.025 2024-09-20 05:27:16,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=822940.0, ans=0.0 2024-09-20 05:27:25,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=822940.0, ans=0.5 2024-09-20 05:27:29,376 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.10 vs. limit=15.0 2024-09-20 05:27:49,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=823020.0, ans=0.125 2024-09-20 05:27:52,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=823020.0, ans=0.125 2024-09-20 05:27:53,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=823020.0, ans=0.125 2024-09-20 05:28:04,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=823060.0, ans=0.0 2024-09-20 05:28:15,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=823060.0, ans=0.0 2024-09-20 05:28:18,248 INFO [train.py:1198] (0/2) Epoch 46, batch 2150, loss[loss=0.2278, ctc_loss=0.1088, cr_loss=0.3489, attn_decoder_loss=0.2333, over 29451.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1081, cr_loss=0.3463, attn_decoder_loss=0.2369, over 5816135.80 frames. 
], batch size: 78, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:28:47,217 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:28:56,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=823180.0, ans=0.0 2024-09-20 05:29:01,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=823180.0, ans=0.125 2024-09-20 05:29:06,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-09-20 05:29:10,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=823220.0, ans=0.125 2024-09-20 05:29:12,198 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-09-20 05:29:13,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=823220.0, ans=0.0 2024-09-20 05:29:20,447 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.498e+01 9.079e+01 9.733e+01 1.239e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-20 05:29:25,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=823260.0, ans=0.025 2024-09-20 05:29:34,168 INFO [train.py:1198] (0/2) Epoch 46, batch 2200, loss[loss=0.253, ctc_loss=0.1249, cr_loss=0.3699, attn_decoder_loss=0.2591, over 29603.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1081, cr_loss=0.3463, attn_decoder_loss=0.237, over 5813419.41 frames. 
], batch size: 86, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:29:40,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=823300.0, ans=0.2 2024-09-20 05:29:45,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.40 vs. limit=6.0 2024-09-20 05:29:47,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=4.77 vs. limit=15.0 2024-09-20 05:29:55,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=823340.0, ans=0.0 2024-09-20 05:29:55,541 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:29:57,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=823340.0, ans=0.025 2024-09-20 05:30:01,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=823340.0, ans=0.025 2024-09-20 05:30:03,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=823380.0, ans=0.0 2024-09-20 05:30:15,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=823380.0, ans=0.1 2024-09-20 05:30:21,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=823420.0, ans=0.125 2024-09-20 05:30:24,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=823420.0, ans=0.125 2024-09-20 05:30:30,579 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=823420.0, ans=0.025 2024-09-20 05:30:52,018 INFO [train.py:1198] (0/2) Epoch 46, batch 2250, loss[loss=0.2395, ctc_loss=0.1162, cr_loss=0.3738, attn_decoder_loss=0.2448, over 29725.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1082, cr_loss=0.3466, attn_decoder_loss=0.2371, over 5812323.06 frames. ], batch size: 82, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:31:11,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=823540.0, ans=0.1 2024-09-20 05:31:34,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.57 vs. limit=22.5 2024-09-20 05:31:35,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=823620.0, ans=0.0 2024-09-20 05:31:44,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=823620.0, ans=0.09899494936611666 2024-09-20 05:31:53,371 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.133e+01 8.446e+01 9.116e+01 9.634e+01 2.292e+02, threshold=1.823e+02, percent-clipped=1.0 2024-09-20 05:31:55,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=823660.0, ans=0.035 2024-09-20 05:32:08,792 INFO [train.py:1198] (0/2) Epoch 46, batch 2300, loss[loss=0.2137, ctc_loss=0.09189, cr_loss=0.3222, attn_decoder_loss=0.2201, over 29313.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1076, cr_loss=0.3452, attn_decoder_loss=0.2361, over 5799003.90 frames. ], batch size: 71, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:32:14,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.94 vs. 
limit=10.0 2024-09-20 05:32:16,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=823700.0, ans=0.125 2024-09-20 05:32:21,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=823700.0, ans=0.0 2024-09-20 05:32:22,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=823740.0, ans=0.125 2024-09-20 05:32:30,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=823740.0, ans=0.125 2024-09-20 05:32:36,656 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=16.41 vs. limit=15.0 2024-09-20 05:33:06,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=823820.0, ans=0.025 2024-09-20 05:33:23,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=823900.0, ans=0.125 2024-09-20 05:33:24,354 INFO [train.py:1198] (0/2) Epoch 46, batch 2350, loss[loss=0.252, ctc_loss=0.1233, cr_loss=0.3879, attn_decoder_loss=0.2577, over 29692.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1082, cr_loss=0.3468, attn_decoder_loss=0.2366, over 5803995.84 frames. 
], batch size: 83, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:33:28,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=823900.0, ans=0.2 2024-09-20 05:33:48,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=823940.0, ans=0.2 2024-09-20 05:34:26,077 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.622e+01 9.111e+01 9.786e+01 2.523e+02, threshold=1.822e+02, percent-clipped=1.0 2024-09-20 05:34:39,832 INFO [train.py:1198] (0/2) Epoch 46, batch 2400, loss[loss=0.2077, ctc_loss=0.08687, cr_loss=0.2951, attn_decoder_loss=0.2146, over 29547.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1087, cr_loss=0.3479, attn_decoder_loss=0.2371, over 5808766.06 frames. ], batch size: 76, lr: 2.39e-03, grad_scale: 32.0 2024-09-20 05:35:06,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=824140.0, ans=0.125 2024-09-20 05:35:14,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.68 vs. 
limit=10.0 2024-09-20 05:35:24,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=824180.0, ans=0.125 2024-09-20 05:35:32,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=824220.0, ans=0.125 2024-09-20 05:35:42,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=824260.0, ans=0.0 2024-09-20 05:35:45,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=824260.0, ans=0.125 2024-09-20 05:35:52,881 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.75 vs. limit=15.0 2024-09-20 05:35:59,548 INFO [train.py:1198] (0/2) Epoch 46, batch 2450, loss[loss=0.2407, ctc_loss=0.117, cr_loss=0.3709, attn_decoder_loss=0.2462, over 29703.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1099, cr_loss=0.3501, attn_decoder_loss=0.2384, over 5785591.41 frames. ], batch size: 82, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:36:06,189 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-09-20 05:36:16,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=824340.0, ans=0.025 2024-09-20 05:36:24,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=824340.0, ans=22.5 2024-09-20 05:36:40,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. 
limit=15.0 2024-09-20 05:36:47,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=824420.0, ans=0.0 2024-09-20 05:36:52,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=824420.0, ans=0.2 2024-09-20 05:36:55,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=824420.0, ans=0.0 2024-09-20 05:37:02,670 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.776e+01 9.478e+01 1.012e+02 4.785e+02, threshold=1.896e+02, percent-clipped=1.0 2024-09-20 05:37:07,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=824460.0, ans=0.125 2024-09-20 05:37:14,644 INFO [train.py:1198] (0/2) Epoch 46, batch 2500, loss[loss=0.2433, ctc_loss=0.1214, cr_loss=0.3839, attn_decoder_loss=0.2483, over 29648.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1099, cr_loss=0.35, attn_decoder_loss=0.2383, over 5795592.93 frames. ], batch size: 86, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:37:42,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-09-20 05:37:48,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=824580.0, ans=0.0 2024-09-20 05:38:30,353 INFO [train.py:1198] (0/2) Epoch 46, batch 2550, loss[loss=0.2131, ctc_loss=0.1083, cr_loss=0.3375, attn_decoder_loss=0.2173, over 29344.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1096, cr_loss=0.3495, attn_decoder_loss=0.2381, over 5798486.00 frames. 
], batch size: 67, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:38:32,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=824700.0, ans=0.125 2024-09-20 05:38:43,774 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0 2024-09-20 05:38:58,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=824740.0, ans=0.1 2024-09-20 05:39:02,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=824780.0, ans=0.125 2024-09-20 05:39:02,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=824780.0, ans=0.2 2024-09-20 05:39:19,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=824820.0, ans=0.0 2024-09-20 05:39:27,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=824820.0, ans=0.0 2024-09-20 05:39:36,274 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.555e+01 9.140e+01 9.726e+01 1.841e+02, threshold=1.828e+02, percent-clipped=0.0 2024-09-20 05:39:44,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=824860.0, ans=0.125 2024-09-20 05:39:50,515 INFO [train.py:1198] (0/2) Epoch 46, batch 2600, loss[loss=0.2221, ctc_loss=0.1019, cr_loss=0.3426, attn_decoder_loss=0.2278, over 29462.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1094, cr_loss=0.3492, attn_decoder_loss=0.2382, over 5794100.49 frames. 
], batch size: 78, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:39:52,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=824900.0, ans=0.015 2024-09-20 05:39:52,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824900.0, ans=0.1 2024-09-20 05:40:19,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.42 vs. limit=10.0 2024-09-20 05:40:28,746 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.63 vs. limit=22.5 2024-09-20 05:40:43,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2024-09-20 05:40:46,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=825020.0, ans=0.2 2024-09-20 05:41:05,513 INFO [train.py:1198] (0/2) Epoch 46, batch 2650, loss[loss=0.2372, ctc_loss=0.1094, cr_loss=0.3417, attn_decoder_loss=0.2438, over 29266.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1094, cr_loss=0.3493, attn_decoder_loss=0.2383, over 5800865.16 frames. ], batch size: 100, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:41:26,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=15.0 2024-09-20 05:41:27,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825140.0, ans=0.1 2024-09-20 05:41:31,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.59 vs. 
limit=15.0 2024-09-20 05:41:35,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=825180.0, ans=0.1 2024-09-20 05:41:46,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=825180.0, ans=0.125 2024-09-20 05:42:09,969 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 8.633e+01 9.169e+01 9.571e+01 1.241e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-20 05:42:20,700 INFO [train.py:1198] (0/2) Epoch 46, batch 2700, loss[loss=0.2392, ctc_loss=0.1101, cr_loss=0.3381, attn_decoder_loss=0.2461, over 29512.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1096, cr_loss=0.3498, attn_decoder_loss=0.2386, over 5795882.21 frames. ], batch size: 87, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:42:50,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=825340.0, ans=0.125 2024-09-20 05:43:05,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825380.0, ans=0.1 2024-09-20 05:43:40,492 INFO [train.py:1198] (0/2) Epoch 46, batch 2750, loss[loss=0.2246, ctc_loss=0.1022, cr_loss=0.3486, attn_decoder_loss=0.2305, over 29516.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1089, cr_loss=0.3481, attn_decoder_loss=0.2376, over 5792944.42 frames. 
], batch size: 75, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:43:44,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=825500.0, ans=15.0 2024-09-20 05:44:15,452 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:44:32,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=825620.0, ans=0.125 2024-09-20 05:44:38,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=825620.0, ans=0.2 2024-09-20 05:44:46,008 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.644e+01 9.121e+01 9.722e+01 2.212e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-20 05:44:55,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=825700.0, ans=0.0 2024-09-20 05:44:56,658 INFO [train.py:1198] (0/2) Epoch 46, batch 2800, loss[loss=0.2393, ctc_loss=0.116, cr_loss=0.3294, attn_decoder_loss=0.2456, over 20031.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1094, cr_loss=0.3488, attn_decoder_loss=0.2378, over 5775894.28 frames. ], batch size: 209, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:44:56,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=825700.0, ans=0.2 2024-09-20 05:44:58,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=825700.0, ans=0.125 2024-09-20 05:44:58,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=825700.0, ans=0.0 2024-09-20 05:45:20,091 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.40 vs. 
limit=12.0 2024-09-20 05:45:35,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=825780.0, ans=0.0 2024-09-20 05:45:39,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.32 vs. limit=22.5 2024-09-20 05:45:57,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.40 vs. limit=15.0 2024-09-20 05:46:11,555 INFO [train.py:1198] (0/2) Epoch 46, batch 2850, loss[loss=0.2213, ctc_loss=0.1018, cr_loss=0.3195, attn_decoder_loss=0.2275, over 29518.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1098, cr_loss=0.3494, attn_decoder_loss=0.2382, over 5762250.44 frames. ], batch size: 77, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:46:24,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=825900.0, ans=0.2 2024-09-20 05:46:36,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=825940.0, ans=0.125 2024-09-20 05:47:16,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826060.0, ans=0.1 2024-09-20 05:47:22,089 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.622e+01 8.726e+01 9.166e+01 9.745e+01 2.049e+02, threshold=1.833e+02, percent-clipped=1.0 2024-09-20 05:47:31,176 INFO [train.py:1198] (0/2) Epoch 46, batch 2900, loss[loss=0.2208, ctc_loss=0.09948, cr_loss=0.3225, attn_decoder_loss=0.2271, over 29417.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1104, cr_loss=0.3506, attn_decoder_loss=0.239, over 5788154.42 frames. 
], batch size: 79, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:47:32,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=22.5 2024-09-20 05:47:33,476 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.03 vs. limit=15.0 2024-09-20 05:47:43,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=826100.0, ans=0.0 2024-09-20 05:47:52,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=826140.0, ans=0.025 2024-09-20 05:48:01,387 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=826180.0, ans=0.125 2024-09-20 05:48:05,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=826180.0, ans=0.1 2024-09-20 05:48:10,665 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:48:28,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=826220.0, ans=0.0 2024-09-20 05:48:36,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=826260.0, ans=0.2 2024-09-20 05:48:39,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826260.0, ans=0.1 2024-09-20 05:48:46,536 INFO [train.py:1198] (0/2) Epoch 46, batch 2950, loss[loss=0.2216, ctc_loss=0.1033, cr_loss=0.3437, attn_decoder_loss=0.2271, over 29516.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.109, cr_loss=0.3474, attn_decoder_loss=0.2373, over 5783431.27 frames. 
], batch size: 75, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:48:47,370 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.92 vs. limit=15.0 2024-09-20 05:48:51,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=826300.0, ans=0.0 2024-09-20 05:48:57,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=826300.0, ans=10.0 2024-09-20 05:49:04,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=826340.0, ans=0.0 2024-09-20 05:49:27,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=826380.0, ans=0.1 2024-09-20 05:49:30,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=826420.0, ans=0.0 2024-09-20 05:49:50,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826460.0, ans=0.1 2024-09-20 05:49:53,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.789e+01 9.210e+01 9.770e+01 1.527e+02, threshold=1.842e+02, percent-clipped=0.0 2024-09-20 05:50:02,327 INFO [train.py:1198] (0/2) Epoch 46, batch 3000, loss[loss=0.2302, ctc_loss=0.1076, cr_loss=0.3604, attn_decoder_loss=0.2358, over 29756.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1086, cr_loss=0.3472, attn_decoder_loss=0.2372, over 5784250.87 frames. 
], batch size: 81, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:50:02,328 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-20 05:50:11,349 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.5652, 5.3135, 5.1572, 4.7790], device='cuda:0') 2024-09-20 05:50:20,767 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.8010, 4.8593, 4.5217, 2.5232], device='cuda:0') 2024-09-20 05:50:21,365 INFO [train.py:1230] (0/2) Epoch 46, validation: loss=0.2122, ctc_loss=0.03683, cr_loss=6.872e-15, attn_decoder_loss=0.2317, over 944034.00 frames. 2024-09-20 05:50:21,365 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-20 05:50:36,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=826540.0, ans=0.0 2024-09-20 05:50:39,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2024-09-20 05:50:44,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=826540.0, ans=0.0 2024-09-20 05:51:22,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=826660.0, ans=0.125 2024-09-20 05:51:39,051 INFO [train.py:1198] (0/2) Epoch 46, batch 3050, loss[loss=0.234, ctc_loss=0.1173, cr_loss=0.3882, attn_decoder_loss=0.2383, over 29528.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1092, cr_loss=0.3482, attn_decoder_loss=0.238, over 5777141.54 frames. ], batch size: 76, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:51:43,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.09 vs. 
limit=15.0 2024-09-20 05:52:02,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=826740.0, ans=0.0 2024-09-20 05:52:03,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=826740.0, ans=0.2 2024-09-20 05:52:22,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2024-09-20 05:52:42,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=826860.0, ans=0.2 2024-09-20 05:52:44,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=826860.0, ans=0.125 2024-09-20 05:52:45,558 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 8.604e+01 9.128e+01 9.681e+01 2.059e+02, threshold=1.826e+02, percent-clipped=1.0 2024-09-20 05:52:51,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=826860.0, ans=0.0 2024-09-20 05:52:51,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=826860.0, ans=0.125 2024-09-20 05:52:54,449 INFO [train.py:1198] (0/2) Epoch 46, batch 3100, loss[loss=0.2412, ctc_loss=0.1206, cr_loss=0.3838, attn_decoder_loss=0.2461, over 29249.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.109, cr_loss=0.3475, attn_decoder_loss=0.2374, over 5775946.51 frames. 
], batch size: 100, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:52:56,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=826900.0, ans=0.0 2024-09-20 05:53:02,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=826900.0, ans=0.125 2024-09-20 05:53:12,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=826940.0, ans=0.025 2024-09-20 05:53:29,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=826980.0, ans=0.1 2024-09-20 05:53:41,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827020.0, ans=0.1 2024-09-20 05:54:09,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=827060.0, ans=0.025 2024-09-20 05:54:12,267 INFO [train.py:1198] (0/2) Epoch 46, batch 3150, loss[loss=0.2449, ctc_loss=0.1108, cr_loss=0.3498, attn_decoder_loss=0.252, over 28864.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1089, cr_loss=0.3473, attn_decoder_loss=0.2371, over 5783291.68 frames. 
], batch size: 104, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:54:12,737 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:54:14,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=827100.0, ans=0.2 2024-09-20 05:54:15,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=827100.0, ans=0.0 2024-09-20 05:54:24,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=827100.0, ans=0.125 2024-09-20 05:54:36,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=827140.0, ans=0.1 2024-09-20 05:54:44,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=827180.0, ans=0.1 2024-09-20 05:54:44,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=827180.0, ans=0.1 2024-09-20 05:54:52,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=827180.0, ans=0.025 2024-09-20 05:55:16,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=827260.0, ans=0.125 2024-09-20 05:55:18,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=827260.0, ans=0.125 2024-09-20 05:55:20,992 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 8.661e+01 9.228e+01 9.834e+01 1.754e+02, threshold=1.846e+02, percent-clipped=0.0 2024-09-20 05:55:27,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, 
batch_count=827260.0, ans=0.125 2024-09-20 05:55:30,097 INFO [train.py:1198] (0/2) Epoch 46, batch 3200, loss[loss=0.2281, ctc_loss=0.1077, cr_loss=0.3519, attn_decoder_loss=0.2337, over 29422.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1088, cr_loss=0.3471, attn_decoder_loss=0.2368, over 5793912.58 frames. ], batch size: 79, lr: 2.39e-03, grad_scale: 16.0 2024-09-20 05:55:50,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=827340.0, ans=0.0 2024-09-20 05:56:03,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=827380.0, ans=0.2 2024-09-20 05:56:06,359 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=22.5 2024-09-20 05:56:06,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=827380.0, ans=0.125 2024-09-20 05:56:14,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=827420.0, ans=0.125 2024-09-20 05:56:24,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=827420.0, ans=0.5 2024-09-20 05:56:26,826 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2024-09-20 05:56:45,902 INFO [train.py:1198] (0/2) Epoch 46, batch 3250, loss[loss=0.2461, ctc_loss=0.1219, cr_loss=0.3791, attn_decoder_loss=0.2514, over 29703.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1087, cr_loss=0.3471, attn_decoder_loss=0.2375, over 5802215.58 frames. 
], batch size: 84, lr: 2.39e-03, grad_scale: 8.0 2024-09-20 05:56:52,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827500.0, ans=0.1 2024-09-20 05:57:01,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=827540.0, ans=0.025 2024-09-20 05:57:01,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=827540.0, ans=0.0 2024-09-20 05:57:05,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=827540.0, ans=0.125 2024-09-20 05:57:11,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=827540.0, ans=0.025 2024-09-20 05:57:36,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.36 vs. limit=22.5 2024-09-20 05:57:44,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=827660.0, ans=0.125 2024-09-20 05:57:52,296 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=827660.0, ans=0.125 2024-09-20 05:57:53,416 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.520e+01 8.954e+01 9.495e+01 2.408e+02, threshold=1.791e+02, percent-clipped=1.0 2024-09-20 05:57:56,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=827660.0, ans=0.125 2024-09-20 05:58:01,023 INFO [train.py:1198] (0/2) Epoch 46, batch 3300, loss[loss=0.2457, ctc_loss=0.1204, cr_loss=0.3673, attn_decoder_loss=0.2514, over 28262.00 frames. 
], tot_loss[loss=0.2305, ctc_loss=0.1081, cr_loss=0.3453, attn_decoder_loss=0.2364, over 5799046.48 frames. ], batch size: 111, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 05:58:01,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=827700.0, ans=0.0 2024-09-20 05:58:18,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827740.0, ans=0.1 2024-09-20 05:58:37,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=827780.0, ans=0.025 2024-09-20 05:58:44,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=827780.0, ans=0.125 2024-09-20 05:58:58,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2024-09-20 05:59:05,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=827860.0, ans=0.09899494936611666 2024-09-20 05:59:07,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=827860.0, ans=0.1 2024-09-20 05:59:20,247 INFO [train.py:1198] (0/2) Epoch 46, batch 3350, loss[loss=0.2351, ctc_loss=0.107, cr_loss=0.3237, attn_decoder_loss=0.2421, over 28907.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1089, cr_loss=0.347, attn_decoder_loss=0.2373, over 5776005.25 frames. ], batch size: 104, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 05:59:24,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. 
limit=6.0 2024-09-20 05:59:41,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=827940.0, ans=0.2 2024-09-20 05:59:43,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=827940.0, ans=0.125 2024-09-20 06:00:03,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=827980.0, ans=0.125 2024-09-20 06:00:06,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=828020.0, ans=0.125 2024-09-20 06:00:17,166 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2024-09-20 06:00:27,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=828060.0, ans=0.0 2024-09-20 06:00:28,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.801e+01 9.345e+01 9.836e+01 1.654e+02, threshold=1.869e+02, percent-clipped=0.0 2024-09-20 06:00:36,123 INFO [train.py:1198] (0/2) Epoch 46, batch 3400, loss[loss=0.2076, ctc_loss=0.09076, cr_loss=0.3016, attn_decoder_loss=0.2138, over 29350.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.109, cr_loss=0.3469, attn_decoder_loss=0.2371, over 5767412.53 frames. 
], batch size: 67, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 06:01:00,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=828140.0, ans=0.125 2024-09-20 06:01:06,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff3.min_abs, batch_count=828180.0, ans=0.2 2024-09-20 06:01:12,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=828180.0, ans=0.2 2024-09-20 06:01:15,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=828180.0, ans=0.125 2024-09-20 06:01:20,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=828220.0, ans=0.025 2024-09-20 06:01:24,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=828220.0, ans=0.0 2024-09-20 06:01:30,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=828220.0, ans=0.07 2024-09-20 06:01:32,819 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.31 vs. 
limit=15.0 2024-09-20 06:01:35,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=828260.0, ans=10.0 2024-09-20 06:01:44,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=828260.0, ans=0.125 2024-09-20 06:01:47,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=828260.0, ans=0.125 2024-09-20 06:01:51,529 INFO [train.py:1198] (0/2) Epoch 46, batch 3450, loss[loss=0.2396, ctc_loss=0.1092, cr_loss=0.3364, attn_decoder_loss=0.2466, over 28164.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1089, cr_loss=0.3473, attn_decoder_loss=0.2373, over 5776135.20 frames. ], batch size: 111, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 06:01:58,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=828300.0, ans=0.125 2024-09-20 06:02:04,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=828300.0, ans=0.0 2024-09-20 06:02:53,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. 
limit=15.0 2024-09-20 06:03:00,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=828460.0, ans=0.0 2024-09-20 06:03:03,198 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 8.466e+01 9.080e+01 9.638e+01 4.809e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-20 06:03:05,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=828460.0, ans=0.2 2024-09-20 06:03:10,700 INFO [train.py:1198] (0/2) Epoch 46, batch 3500, loss[loss=0.2061, ctc_loss=0.09622, cr_loss=0.3232, attn_decoder_loss=0.2111, over 29315.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1086, cr_loss=0.3463, attn_decoder_loss=0.2367, over 5777228.74 frames. ], batch size: 71, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 06:03:17,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=828500.0, ans=0.0 2024-09-20 06:03:26,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=828540.0, ans=0.125 2024-09-20 06:03:44,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.18 vs. limit=15.0 2024-09-20 06:03:44,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.80 vs. 
limit=15.0 2024-09-20 06:03:46,946 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:03:49,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=828580.0, ans=0.0 2024-09-20 06:04:06,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=828620.0, ans=0.0 2024-09-20 06:04:12,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=828660.0, ans=0.0 2024-09-20 06:04:20,103 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.02 vs. limit=15.0 2024-09-20 06:04:25,142 INFO [train.py:1198] (0/2) Epoch 46, batch 3550, loss[loss=0.2433, ctc_loss=0.12, cr_loss=0.3784, attn_decoder_loss=0.2486, over 29698.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1084, cr_loss=0.3461, attn_decoder_loss=0.2368, over 5783426.27 frames. ], batch size: 89, lr: 2.38e-03, grad_scale: 8.0 2024-09-20 06:04:25,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=828700.0, ans=0.0 2024-09-20 06:04:28,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=828700.0, ans=0.0 2024-09-20 06:04:30,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. 
limit=15.0 2024-09-20 06:04:32,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828700.0, ans=0.1 2024-09-20 06:04:35,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=828700.0, ans=0.0 2024-09-20 06:05:02,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0 2024-09-20 06:05:20,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=828820.0, ans=0.0 2024-09-20 06:05:21,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=828820.0, ans=15.0 2024-09-20 06:05:32,134 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.518e+01 9.140e+01 9.697e+01 1.857e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-20 06:05:39,453 INFO [train.py:1198] (0/2) Epoch 46, batch 3600, loss[loss=0.2252, ctc_loss=0.1023, cr_loss=0.313, attn_decoder_loss=0.2319, over 29493.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1088, cr_loss=0.3473, attn_decoder_loss=0.2372, over 5792252.79 frames. ], batch size: 77, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:05:42,761 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:05:43,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.85 vs. limit=22.5 2024-09-20 06:05:46,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.14 vs. 
limit=15.0 2024-09-20 06:06:09,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=828980.0, ans=0.125 2024-09-20 06:06:28,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=829020.0, ans=0.125 2024-09-20 06:06:50,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=829060.0, ans=0.0 2024-09-20 06:06:53,555 INFO [train.py:1198] (0/2) Epoch 46, batch 3650, loss[loss=0.2541, ctc_loss=0.1241, cr_loss=0.3707, attn_decoder_loss=0.2603, over 29485.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1085, cr_loss=0.3463, attn_decoder_loss=0.2366, over 5793970.15 frames. ], batch size: 90, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:07:05,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=829100.0, ans=0.125 2024-09-20 06:07:22,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=829180.0, ans=0.125 2024-09-20 06:07:25,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=829180.0, ans=0.2 2024-09-20 06:08:03,943 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.537e+01 9.087e+01 9.420e+01 1.458e+02, threshold=1.817e+02, percent-clipped=0.0 2024-09-20 06:08:04,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=829260.0, ans=0.5 2024-09-20 06:08:08,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=829260.0, ans=0.0 2024-09-20 06:08:11,548 INFO [train.py:1198] (0/2) Epoch 46, batch 3700, loss[loss=0.2473, ctc_loss=0.1218, cr_loss=0.368, attn_decoder_loss=0.2531, over 29710.00 frames. 
], tot_loss[loss=0.231, ctc_loss=0.1087, cr_loss=0.3468, attn_decoder_loss=0.2369, over 5804500.53 frames. ], batch size: 84, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:08:38,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=829340.0, ans=0.0 2024-09-20 06:09:25,410 INFO [train.py:1198] (0/2) Epoch 46, batch 3750, loss[loss=0.2027, ctc_loss=0.09294, cr_loss=0.2982, attn_decoder_loss=0.2082, over 29371.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1086, cr_loss=0.3468, attn_decoder_loss=0.2367, over 5807811.31 frames. ], batch size: 67, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:09:27,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=829500.0, ans=0.125 2024-09-20 06:09:47,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=829540.0, ans=0.125 2024-09-20 06:10:01,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=829580.0, ans=0.2 2024-09-20 06:10:13,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829620.0, ans=0.1 2024-09-20 06:10:30,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=829660.0, ans=0.0 2024-09-20 06:10:30,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=829660.0, ans=0.125 2024-09-20 06:10:32,723 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.638e+01 9.232e+01 9.625e+01 1.772e+02, threshold=1.846e+02, percent-clipped=0.0 2024-09-20 06:10:40,159 INFO [train.py:1198] (0/2) Epoch 46, batch 3800, loss[loss=0.2473, ctc_loss=0.11, cr_loss=0.3418, attn_decoder_loss=0.255, over 29649.00 frames. 
], tot_loss[loss=0.2304, ctc_loss=0.1083, cr_loss=0.3459, attn_decoder_loss=0.2362, over 5798819.64 frames. ], batch size: 86, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:10:40,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=829700.0, ans=0.2 2024-09-20 06:10:43,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=829700.0, ans=0.0 2024-09-20 06:10:49,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=829700.0, ans=0.125 2024-09-20 06:10:59,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=829740.0, ans=0.125 2024-09-20 06:11:20,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=829780.0, ans=0.0 2024-09-20 06:11:54,494 INFO [train.py:1198] (0/2) Epoch 46, batch 3850, loss[loss=0.2476, ctc_loss=0.1188, cr_loss=0.3728, attn_decoder_loss=0.2537, over 29244.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1083, cr_loss=0.3464, attn_decoder_loss=0.2365, over 5811355.29 frames. 
], batch size: 100, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:12:15,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=829940.0, ans=0.125 2024-09-20 06:12:40,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=830020.0, ans=0.2 2024-09-20 06:12:48,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=830020.0, ans=0.025 2024-09-20 06:12:52,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=830060.0, ans=0.0 2024-09-20 06:13:01,125 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.600e+01 9.111e+01 9.601e+01 1.529e+02, threshold=1.822e+02, percent-clipped=0.0 2024-09-20 06:13:07,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=830100.0, ans=0.05 2024-09-20 06:13:08,427 INFO [train.py:1198] (0/2) Epoch 46, batch 3900, loss[loss=0.2318, ctc_loss=0.09471, cr_loss=0.3145, attn_decoder_loss=0.24, over 29609.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1087, cr_loss=0.3473, attn_decoder_loss=0.2372, over 5816113.08 frames. 
], batch size: 86, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:13:11,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=830100.0, ans=0.0 2024-09-20 06:13:14,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=830100.0, ans=0.125 2024-09-20 06:13:33,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=830140.0, ans=0.125 2024-09-20 06:13:38,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.72 vs. limit=10.0 2024-09-20 06:14:08,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2024-09-20 06:14:09,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=830260.0, ans=0.07 2024-09-20 06:14:13,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=830260.0, ans=0.125 2024-09-20 06:14:25,257 INFO [train.py:1198] (0/2) Epoch 46, batch 3950, loss[loss=0.2421, ctc_loss=0.1137, cr_loss=0.3509, attn_decoder_loss=0.2486, over 29499.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1085, cr_loss=0.3469, attn_decoder_loss=0.2373, over 5835460.62 frames. 
], batch size: 97, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:14:31,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=830300.0, ans=0.0 2024-09-20 06:14:35,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=830300.0, ans=0.125 2024-09-20 06:14:40,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=830340.0, ans=0.0 2024-09-20 06:14:43,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.64 vs. limit=10.0 2024-09-20 06:15:11,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=830420.0, ans=0.125 2024-09-20 06:15:26,857 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=22.5 2024-09-20 06:15:27,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=830460.0, ans=0.125 2024-09-20 06:15:31,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.756e+01 9.097e+01 9.656e+01 1.303e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-20 06:15:33,883 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.65 vs. limit=10.0 2024-09-20 06:15:36,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830460.0, ans=0.1 2024-09-20 06:15:38,973 INFO [train.py:1198] (0/2) Epoch 46, batch 4000, loss[loss=0.2206, ctc_loss=0.09773, cr_loss=0.3227, attn_decoder_loss=0.2271, over 29499.00 frames. 
], tot_loss[loss=0.2311, ctc_loss=0.1085, cr_loss=0.3465, attn_decoder_loss=0.237, over 5812246.59 frames. ], batch size: 74, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 06:15:45,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=830500.0, ans=0.125 2024-09-20 06:15:53,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=830540.0, ans=0.1 2024-09-20 06:15:55,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2024-09-20 06:15:58,297 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:16:52,994 INFO [train.py:1198] (0/2) Epoch 46, batch 4050, loss[loss=0.2559, ctc_loss=0.1323, cr_loss=0.3898, attn_decoder_loss=0.2609, over 20613.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1089, cr_loss=0.3475, attn_decoder_loss=0.237, over 5796386.24 frames. 
], batch size: 210, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:17:03,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=830700.0, ans=0.2 2024-09-20 06:17:03,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=830700.0, ans=0.1 2024-09-20 06:17:07,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=830740.0, ans=0.05 2024-09-20 06:17:12,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=830740.0, ans=0.035 2024-09-20 06:17:16,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=830740.0, ans=0.125 2024-09-20 06:17:25,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=830780.0, ans=0.0 2024-09-20 06:17:25,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830780.0, ans=0.1 2024-09-20 06:17:34,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=830780.0, ans=0.0 2024-09-20 06:17:51,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=830860.0, ans=0.125 2024-09-20 06:17:58,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=830860.0, ans=0.125 2024-09-20 06:18:01,510 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.777e+01 9.283e+01 1.003e+02 3.559e+02, threshold=1.857e+02, percent-clipped=2.0 2024-09-20 06:18:08,723 INFO [train.py:1198] (0/2) Epoch 46, batch 4100, loss[loss=0.2486, ctc_loss=0.1245, cr_loss=0.3929, 
attn_decoder_loss=0.2536, over 29510.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1091, cr_loss=0.348, attn_decoder_loss=0.2371, over 5791806.65 frames. ], batch size: 90, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:18:33,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=830940.0, ans=0.125 2024-09-20 06:18:35,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=830940.0, ans=0.125 2024-09-20 06:18:45,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=830980.0, ans=0.125 2024-09-20 06:18:51,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=12.0 2024-09-20 06:18:54,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=831020.0, ans=0.125 2024-09-20 06:19:21,756 INFO [train.py:1198] (0/2) Epoch 46, batch 4150, loss[loss=0.2256, ctc_loss=0.1046, cr_loss=0.341, attn_decoder_loss=0.2314, over 29508.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1089, cr_loss=0.3476, attn_decoder_loss=0.2367, over 5797404.67 frames. ], batch size: 77, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:20:10,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. 
limit=6.0 2024-09-20 06:20:15,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=831220.0, ans=0.125 2024-09-20 06:20:29,883 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.815e+01 9.247e+01 9.794e+01 1.755e+02, threshold=1.849e+02, percent-clipped=0.0 2024-09-20 06:20:35,815 INFO [train.py:1198] (0/2) Epoch 46, batch 4200, loss[loss=0.2538, ctc_loss=0.1249, cr_loss=0.3963, attn_decoder_loss=0.2593, over 29500.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1092, cr_loss=0.3481, attn_decoder_loss=0.2371, over 5799555.81 frames. ], batch size: 90, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:20:53,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=831340.0, ans=0.2 2024-09-20 06:20:58,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=831340.0, ans=0.1 2024-09-20 06:21:01,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=831340.0, ans=0.125 2024-09-20 06:21:11,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=831380.0, ans=0.125 2024-09-20 06:21:34,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=831460.0, ans=0.0 2024-09-20 06:21:49,886 INFO [train.py:1198] (0/2) Epoch 46, batch 4250, loss[loss=0.2151, ctc_loss=0.09344, cr_loss=0.3136, attn_decoder_loss=0.2216, over 29492.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1089, cr_loss=0.347, attn_decoder_loss=0.2372, over 5805522.79 frames. 
], batch size: 74, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:21:50,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=831500.0, ans=0.125 2024-09-20 06:21:51,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=831500.0, ans=0.0 2024-09-20 06:21:53,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=831500.0, ans=0.0 2024-09-20 06:22:14,486 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:22:20,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=831580.0, ans=0.125 2024-09-20 06:22:22,558 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.65 vs. limit=15.0 2024-09-20 06:22:25,523 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.22 vs. limit=15.0 2024-09-20 06:22:26,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=831580.0, ans=0.125 2024-09-20 06:22:36,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=831620.0, ans=0.07 2024-09-20 06:22:58,491 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.665e+01 9.148e+01 1.004e+02 2.126e+02, threshold=1.830e+02, percent-clipped=1.0 2024-09-20 06:23:04,400 INFO [train.py:1198] (0/2) Epoch 46, batch 4300, loss[loss=0.2435, ctc_loss=0.1161, cr_loss=0.3618, attn_decoder_loss=0.2496, over 29503.00 frames. 
], tot_loss[loss=0.2313, ctc_loss=0.1086, cr_loss=0.3461, attn_decoder_loss=0.2373, over 5794092.29 frames. ], batch size: 87, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:23:28,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-09-20 06:23:31,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0 2024-09-20 06:23:34,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.69 vs. limit=15.0 2024-09-20 06:24:18,202 INFO [train.py:1198] (0/2) Epoch 46, batch 4350, loss[loss=0.242, ctc_loss=0.1131, cr_loss=0.3533, attn_decoder_loss=0.2484, over 29505.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1105, cr_loss=0.3507, attn_decoder_loss=0.2402, over 5796729.91 frames. ], batch size: 97, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:24:37,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=831940.0, ans=0.125 2024-09-20 06:24:54,534 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-208000.pt 2024-09-20 06:25:07,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=831980.0, ans=0.0 2024-09-20 06:25:20,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=832020.0, ans=0.025 2024-09-20 06:25:20,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=832020.0, ans=0.0 2024-09-20 06:25:36,408 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 
8.165e+01 9.019e+01 9.377e+01 9.847e+01 2.022e+02, threshold=1.875e+02, percent-clipped=1.0 2024-09-20 06:25:42,287 INFO [train.py:1198] (0/2) Epoch 46, batch 4400, loss[loss=0.2382, ctc_loss=0.1176, cr_loss=0.3685, attn_decoder_loss=0.2434, over 27242.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1117, cr_loss=0.3533, attn_decoder_loss=0.2421, over 5766651.95 frames. ], batch size: 124, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 06:25:44,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=832100.0, ans=0.1 2024-09-20 06:25:45,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=832100.0, ans=0.1 2024-09-20 06:25:48,226 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:26:17,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=832180.0, ans=0.125 2024-09-20 06:26:18,254 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=22.5 2024-09-20 06:26:18,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=832180.0, ans=0.125 2024-09-20 06:26:32,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=7.26 vs. limit=12.0 2024-09-20 06:26:40,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=832260.0, ans=0.125 2024-09-20 06:26:55,246 INFO [train.py:1198] (0/2) Epoch 46, batch 4450, loss[loss=0.2382, ctc_loss=0.1203, cr_loss=0.3354, attn_decoder_loss=0.2439, over 20283.00 frames. 
], tot_loss[loss=0.2386, ctc_loss=0.1152, cr_loss=0.3597, attn_decoder_loss=0.2443, over 5576992.35 frames. ], batch size: 209, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:27:08,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=832300.0, ans=0.035 2024-09-20 06:27:16,721 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.20 vs. limit=22.5 2024-09-20 06:27:25,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=11.99 vs. limit=15.0 2024-09-20 06:27:54,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832460.0, ans=0.1 2024-09-20 06:27:54,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=832460.0, ans=0.025 2024-09-20 06:28:06,273 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.388e+01 1.015e+02 1.122e+02 1.210e+02 5.487e+02, threshold=2.243e+02, percent-clipped=3.0 2024-09-20 06:28:10,697 INFO [train.py:1198] (0/2) Epoch 46, batch 4500, loss[loss=0.2462, ctc_loss=0.1248, cr_loss=0.36, attn_decoder_loss=0.2517, over 20721.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1176, cr_loss=0.3612, attn_decoder_loss=0.2458, over 5235304.38 frames. ], batch size: 210, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 06:28:17,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.49 vs. 
limit=22.5 2024-09-20 06:28:48,068 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-46.pt 2024-09-20 06:29:38,357 INFO [train.py:1198] (0/2) Epoch 47, batch 0, loss[loss=0.2164, ctc_loss=0.0921, cr_loss=0.3201, attn_decoder_loss=0.223, over 29633.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.0921, cr_loss=0.3201, attn_decoder_loss=0.223, over 29633.00 frames. ], batch size: 73, lr: 2.35e-03, grad_scale: 32.0 2024-09-20 06:29:38,358 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-20 06:29:56,719 INFO [train.py:1230] (0/2) Epoch 47, validation: loss=0.2131, ctc_loss=0.03582, cr_loss=6.765e-15, attn_decoder_loss=0.2328, over 944034.00 frames. 2024-09-20 06:29:56,720 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-20 06:29:58,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=832600.0, ans=0.2 2024-09-20 06:30:28,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=832680.0, ans=0.125 2024-09-20 06:30:28,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=832680.0, ans=0.125 2024-09-20 06:30:45,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=832720.0, ans=0.02 2024-09-20 06:30:57,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=832760.0, ans=0.09899494936611666 2024-09-20 06:31:02,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=832760.0, ans=0.125 2024-09-20 06:31:08,073 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=832760.0, ans=0.125 2024-09-20 06:31:13,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=832800.0, ans=22.5 2024-09-20 06:31:14,204 INFO [train.py:1198] (0/2) Epoch 47, batch 50, loss[loss=0.203, ctc_loss=0.09429, cr_loss=0.3112, attn_decoder_loss=0.2081, over 29423.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1106, cr_loss=0.352, attn_decoder_loss=0.2381, over 1266693.12 frames. ], batch size: 70, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:31:15,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=832800.0, ans=0.125 2024-09-20 06:31:48,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.892e+01 9.712e+01 1.150e+02 2.007e+02, threshold=1.942e+02, percent-clipped=0.0 2024-09-20 06:32:11,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=832920.0, ans=0.035 2024-09-20 06:32:14,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=832960.0, ans=0.0 2024-09-20 06:32:16,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=832960.0, ans=0.1 2024-09-20 06:32:24,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=832960.0, ans=0.07 2024-09-20 06:32:29,705 INFO [train.py:1198] (0/2) Epoch 47, batch 100, loss[loss=0.2237, ctc_loss=0.1055, cr_loss=0.3409, attn_decoder_loss=0.2293, over 29536.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1118, cr_loss=0.3542, attn_decoder_loss=0.2405, over 2252171.74 frames. 
], batch size: 76, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:32:54,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.96 vs. limit=12.0 2024-09-20 06:32:57,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=833040.0, ans=0.125 2024-09-20 06:33:18,032 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:33:25,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=833120.0, ans=0.125 2024-09-20 06:33:45,862 INFO [train.py:1198] (0/2) Epoch 47, batch 150, loss[loss=0.2093, ctc_loss=0.09806, cr_loss=0.3249, attn_decoder_loss=0.2145, over 29429.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1094, cr_loss=0.3483, attn_decoder_loss=0.2379, over 3046893.96 frames. ], batch size: 70, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:33:55,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2024-09-20 06:34:01,587 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:34:13,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=833240.0, ans=0.125 2024-09-20 06:34:22,974 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 8.579e+01 9.254e+01 9.598e+01 1.367e+02, threshold=1.851e+02, percent-clipped=0.0 2024-09-20 06:34:26,718 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.73 vs. 
limit=15.0 2024-09-20 06:34:29,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=833280.0, ans=0.2 2024-09-20 06:34:35,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=833320.0, ans=0.125 2024-09-20 06:34:44,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=833320.0, ans=0.1 2024-09-20 06:34:57,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=833360.0, ans=0.0 2024-09-20 06:35:02,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=833400.0, ans=0.125 2024-09-20 06:35:03,354 INFO [train.py:1198] (0/2) Epoch 47, batch 200, loss[loss=0.2518, ctc_loss=0.1319, cr_loss=0.398, attn_decoder_loss=0.2562, over 27352.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1083, cr_loss=0.3461, attn_decoder_loss=0.2366, over 3659180.33 frames. ], batch size: 124, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:35:03,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=833400.0, ans=0.0 2024-09-20 06:35:12,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=833400.0, ans=0.125 2024-09-20 06:35:31,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.05 vs. 
limit=10.0 2024-09-20 06:35:44,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=833480.0, ans=0.125 2024-09-20 06:35:50,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=833520.0, ans=0.125 2024-09-20 06:36:00,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=833520.0, ans=0.0 2024-09-20 06:36:12,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=833560.0, ans=0.0 2024-09-20 06:36:19,052 INFO [train.py:1198] (0/2) Epoch 47, batch 250, loss[loss=0.244, ctc_loss=0.1107, cr_loss=0.3519, attn_decoder_loss=0.251, over 29213.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1086, cr_loss=0.347, attn_decoder_loss=0.237, over 4140402.48 frames. ], batch size: 100, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:36:29,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=833600.0, ans=0.0 2024-09-20 06:36:38,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=833640.0, ans=0.125 2024-09-20 06:36:38,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=833640.0, ans=0.025 2024-09-20 06:36:47,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=833640.0, ans=0.1 2024-09-20 06:36:55,948 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 8.644e+01 9.308e+01 9.912e+01 1.990e+02, threshold=1.862e+02, percent-clipped=1.0 2024-09-20 06:37:06,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=833720.0, ans=0.0 
2024-09-20 06:37:21,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=833760.0, ans=0.125 2024-09-20 06:37:26,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=833760.0, ans=0.125 2024-09-20 06:37:35,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=833800.0, ans=0.0 2024-09-20 06:37:36,484 INFO [train.py:1198] (0/2) Epoch 47, batch 300, loss[loss=0.2447, ctc_loss=0.113, cr_loss=0.3606, attn_decoder_loss=0.2513, over 29523.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1083, cr_loss=0.3467, attn_decoder_loss=0.2369, over 4507978.83 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:37:37,205 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2024-09-20 06:37:44,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=833800.0, ans=0.125 2024-09-20 06:38:02,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=833840.0, ans=0.2 2024-09-20 06:38:06,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=833880.0, ans=0.125 2024-09-20 06:38:26,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.61 vs. 
limit=12.0 2024-09-20 06:38:28,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=833920.0, ans=0.125 2024-09-20 06:38:37,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=833960.0, ans=0.1 2024-09-20 06:38:46,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=833960.0, ans=0.125 2024-09-20 06:38:54,087 INFO [train.py:1198] (0/2) Epoch 47, batch 350, loss[loss=0.2095, ctc_loss=0.08746, cr_loss=0.2833, attn_decoder_loss=0.2167, over 29319.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1082, cr_loss=0.3465, attn_decoder_loss=0.2371, over 4794049.83 frames. ], batch size: 71, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:39:28,605 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.657e+01 9.081e+01 9.524e+01 1.810e+02, threshold=1.816e+02, percent-clipped=0.0 2024-09-20 06:39:39,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834120.0, ans=0.1 2024-09-20 06:39:58,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=834160.0, ans=0.125 2024-09-20 06:40:01,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=834160.0, ans=0.025 2024-09-20 06:40:04,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=834160.0, ans=0.025 2024-09-20 06:40:08,860 INFO [train.py:1198] (0/2) Epoch 47, batch 400, loss[loss=0.2416, ctc_loss=0.1124, cr_loss=0.3535, attn_decoder_loss=0.2481, over 29713.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1078, cr_loss=0.3453, attn_decoder_loss=0.2367, over 5024044.74 frames. 
], batch size: 82, lr: 2.35e-03, grad_scale: 32.0 2024-09-20 06:40:12,322 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:40:24,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=834240.0, ans=0.0 2024-09-20 06:40:31,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=834240.0, ans=0.0 2024-09-20 06:40:52,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=834280.0, ans=0.0 2024-09-20 06:41:10,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=834360.0, ans=0.95 2024-09-20 06:41:14,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=10.12 vs. limit=12.0 2024-09-20 06:41:27,383 INFO [train.py:1198] (0/2) Epoch 47, batch 450, loss[loss=0.2497, ctc_loss=0.1223, cr_loss=0.3837, attn_decoder_loss=0.2554, over 29684.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1079, cr_loss=0.3462, attn_decoder_loss=0.2369, over 5187896.99 frames. 
], batch size: 83, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:41:39,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=834400.0, ans=0.0 2024-09-20 06:41:45,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=834440.0, ans=0.0 2024-09-20 06:41:50,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=834440.0, ans=0.0 2024-09-20 06:41:54,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=834440.0, ans=0.0 2024-09-20 06:41:58,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=834480.0, ans=0.0 2024-09-20 06:42:03,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.609e+01 9.172e+01 9.678e+01 2.074e+02, threshold=1.834e+02, percent-clipped=1.0 2024-09-20 06:42:11,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=834520.0, ans=0.0 2024-09-20 06:42:16,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=834520.0, ans=0.1 2024-09-20 06:42:27,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=834520.0, ans=0.0 2024-09-20 06:42:32,672 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2024-09-20 06:42:38,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.17 vs. 
limit=15.0 2024-09-20 06:42:45,367 INFO [train.py:1198] (0/2) Epoch 47, batch 500, loss[loss=0.2451, ctc_loss=0.1172, cr_loss=0.3812, attn_decoder_loss=0.2508, over 29464.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1072, cr_loss=0.3444, attn_decoder_loss=0.2361, over 5330749.54 frames. ], batch size: 94, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:42:48,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=834600.0, ans=0.125 2024-09-20 06:42:53,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=834600.0, ans=0.125 2024-09-20 06:42:53,752 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.61 vs. limit=10.0 2024-09-20 06:42:54,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=834600.0, ans=0.125 2024-09-20 06:43:06,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=834640.0, ans=0.0 2024-09-20 06:43:14,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=834680.0, ans=15.0 2024-09-20 06:43:22,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0 2024-09-20 06:43:42,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.06 vs. 
limit=15.0 2024-09-20 06:43:51,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=834760.0, ans=0.125 2024-09-20 06:44:00,803 INFO [train.py:1198] (0/2) Epoch 47, batch 550, loss[loss=0.2427, ctc_loss=0.1171, cr_loss=0.3613, attn_decoder_loss=0.2486, over 28828.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1077, cr_loss=0.3454, attn_decoder_loss=0.2364, over 5422970.41 frames. ], batch size: 104, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:44:03,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-09-20 06:44:32,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2024-09-20 06:44:36,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=834880.0, ans=0.125 2024-09-20 06:44:39,387 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.709e+01 8.688e+01 9.011e+01 9.708e+01 1.487e+02, threshold=1.802e+02, percent-clipped=0.0 2024-09-20 06:45:04,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=834960.0, ans=0.0 2024-09-20 06:45:08,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=834960.0, ans=0.125 2024-09-20 06:45:18,980 INFO [train.py:1198] (0/2) Epoch 47, batch 600, loss[loss=0.2437, ctc_loss=0.116, cr_loss=0.3442, attn_decoder_loss=0.2502, over 29272.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1079, cr_loss=0.3458, attn_decoder_loss=0.2368, over 5508956.35 frames. 
], batch size: 100, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:45:32,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=835040.0, ans=0.2 2024-09-20 06:45:36,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=12.0 2024-09-20 06:45:39,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-09-20 06:45:53,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=835080.0, ans=0.2 2024-09-20 06:45:59,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=835080.0, ans=0.125 2024-09-20 06:46:36,764 INFO [train.py:1198] (0/2) Epoch 47, batch 650, loss[loss=0.2272, ctc_loss=0.09445, cr_loss=0.3155, attn_decoder_loss=0.2349, over 29778.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1073, cr_loss=0.3445, attn_decoder_loss=0.236, over 5585604.67 frames. 
], batch size: 81, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:46:43,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=835200.0, ans=0.0 2024-09-20 06:47:07,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=835280.0, ans=0.125 2024-09-20 06:47:13,106 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.697e+01 9.175e+01 9.724e+01 1.599e+02, threshold=1.835e+02, percent-clipped=0.0 2024-09-20 06:47:16,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=835280.0, ans=0.125 2024-09-20 06:47:16,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=835280.0, ans=0.0 2024-09-20 06:47:16,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=835280.0, ans=0.125 2024-09-20 06:47:25,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=835320.0, ans=0.0 2024-09-20 06:47:27,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2024-09-20 06:47:45,488 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.35 vs. limit=22.5 2024-09-20 06:47:47,211 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=15.0 2024-09-20 06:47:52,265 INFO [train.py:1198] (0/2) Epoch 47, batch 700, loss[loss=0.2245, ctc_loss=0.1087, cr_loss=0.3452, attn_decoder_loss=0.2297, over 29541.00 frames. 
], tot_loss[loss=0.2309, ctc_loss=0.1081, cr_loss=0.3462, attn_decoder_loss=0.2369, over 5636639.74 frames. ], batch size: 76, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:48:25,095 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0 2024-09-20 06:48:26,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=835480.0, ans=0.125 2024-09-20 06:48:33,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=22.5 2024-09-20 06:48:50,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-09-20 06:48:51,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=835520.0, ans=0.125 2024-09-20 06:49:03,085 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2024-09-20 06:49:06,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=835560.0, ans=0.125 2024-09-20 06:49:09,606 INFO [train.py:1198] (0/2) Epoch 47, batch 750, loss[loss=0.2341, ctc_loss=0.1099, cr_loss=0.3366, attn_decoder_loss=0.2404, over 29714.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1078, cr_loss=0.3454, attn_decoder_loss=0.2363, over 5674964.02 frames. 
], batch size: 82, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:49:21,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835600.0, ans=0.1 2024-09-20 06:49:45,342 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.705e+01 9.081e+01 9.642e+01 1.954e+02, threshold=1.816e+02, percent-clipped=1.0 2024-09-20 06:50:07,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835720.0, ans=0.1 2024-09-20 06:50:10,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=835760.0, ans=0.125 2024-09-20 06:50:10,421 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=12.0 2024-09-20 06:50:24,780 INFO [train.py:1198] (0/2) Epoch 47, batch 800, loss[loss=0.209, ctc_loss=0.09147, cr_loss=0.3117, attn_decoder_loss=0.2151, over 29636.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.108, cr_loss=0.3464, attn_decoder_loss=0.2365, over 5706406.34 frames. ], batch size: 73, lr: 2.35e-03, grad_scale: 32.0 2024-09-20 06:50:28,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=835800.0, ans=0.0 2024-09-20 06:50:35,376 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2024-09-20 06:50:41,324 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.74 vs. limit=15.0 2024-09-20 06:50:51,994 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.74 vs. 
limit=10.0 2024-09-20 06:51:18,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=835920.0, ans=0.025 2024-09-20 06:51:34,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=835960.0, ans=0.5 2024-09-20 06:51:41,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=836000.0, ans=0.125 2024-09-20 06:51:42,576 INFO [train.py:1198] (0/2) Epoch 47, batch 850, loss[loss=0.257, ctc_loss=0.1254, cr_loss=0.3694, attn_decoder_loss=0.2634, over 29676.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1077, cr_loss=0.3458, attn_decoder_loss=0.236, over 5736362.55 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 06:51:47,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=836000.0, ans=0.0 2024-09-20 06:52:06,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=836040.0, ans=0.125 2024-09-20 06:52:08,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=836040.0, ans=0.125 2024-09-20 06:52:21,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=836080.0, ans=0.0 2024-09-20 06:52:22,245 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.624e+01 9.106e+01 9.735e+01 2.135e+02, threshold=1.821e+02, percent-clipped=1.0 2024-09-20 06:52:49,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.60 vs. 
limit=22.5 2024-09-20 06:53:00,277 INFO [train.py:1198] (0/2) Epoch 47, batch 900, loss[loss=0.2056, ctc_loss=0.08506, cr_loss=0.2924, attn_decoder_loss=0.2125, over 29593.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1077, cr_loss=0.3457, attn_decoder_loss=0.2361, over 5741336.32 frames. ], batch size: 73, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 06:53:16,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=836240.0, ans=6.0 2024-09-20 06:53:25,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=836240.0, ans=0.125 2024-09-20 06:53:51,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=836320.0, ans=0.125 2024-09-20 06:54:01,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=836360.0, ans=0.2 2024-09-20 06:54:05,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=836360.0, ans=0.125 2024-09-20 06:54:15,111 INFO [train.py:1198] (0/2) Epoch 47, batch 950, loss[loss=0.2099, ctc_loss=0.08923, cr_loss=0.3013, attn_decoder_loss=0.2166, over 29517.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1077, cr_loss=0.3456, attn_decoder_loss=0.2362, over 5744613.86 frames. ], batch size: 74, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 06:54:33,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.68 vs. 
limit=12.0 2024-09-20 06:54:40,265 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:54:56,610 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.702e+01 9.182e+01 9.933e+01 3.090e+02, threshold=1.836e+02, percent-clipped=1.0 2024-09-20 06:55:12,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=836520.0, ans=0.1 2024-09-20 06:55:13,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=836520.0, ans=0.0 2024-09-20 06:55:32,629 INFO [train.py:1198] (0/2) Epoch 47, batch 1000, loss[loss=0.23, ctc_loss=0.1211, cr_loss=0.376, attn_decoder_loss=0.2338, over 29488.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1087, cr_loss=0.3474, attn_decoder_loss=0.237, over 5739433.85 frames. ], batch size: 77, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 06:55:44,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=836600.0, ans=0.0 2024-09-20 06:55:45,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=836600.0, ans=0.0 2024-09-20 06:55:47,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.70 vs. 
limit=15.0 2024-09-20 06:55:52,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=836640.0, ans=0.0 2024-09-20 06:56:18,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=836720.0, ans=0.125 2024-09-20 06:56:26,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=836720.0, ans=0.1 2024-09-20 06:56:46,455 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.89 vs. limit=10.0 2024-09-20 06:56:50,186 INFO [train.py:1198] (0/2) Epoch 47, batch 1050, loss[loss=0.2365, ctc_loss=0.1028, cr_loss=0.3264, attn_decoder_loss=0.2441, over 29687.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1084, cr_loss=0.347, attn_decoder_loss=0.2366, over 5746373.95 frames. ], batch size: 85, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 06:57:04,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=836840.0, ans=0.125 2024-09-20 06:57:11,056 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. 
limit=6.0 2024-09-20 06:57:14,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=836840.0, ans=0.125 2024-09-20 06:57:27,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=836880.0, ans=0.0 2024-09-20 06:57:29,788 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 8.620e+01 9.050e+01 9.577e+01 1.323e+02, threshold=1.810e+02, percent-clipped=0.0 2024-09-20 06:57:34,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=836920.0, ans=0.125 2024-09-20 06:57:37,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=836920.0, ans=0.125 2024-09-20 06:57:39,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=836920.0, ans=0.0 2024-09-20 06:57:43,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=836920.0, ans=0.125 2024-09-20 06:58:05,795 INFO [train.py:1198] (0/2) Epoch 47, batch 1100, loss[loss=0.2202, ctc_loss=0.09893, cr_loss=0.3257, attn_decoder_loss=0.2264, over 29464.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1083, cr_loss=0.3462, attn_decoder_loss=0.2365, over 5758359.19 frames. 
], batch size: 78, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 06:58:07,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=837000.0, ans=0.125 2024-09-20 06:58:18,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=837000.0, ans=0.1 2024-09-20 06:58:26,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=837040.0, ans=0.04949747468305833 2024-09-20 06:58:31,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=837040.0, ans=0.1 2024-09-20 06:58:35,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=837040.0, ans=0.1 2024-09-20 06:58:58,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=837120.0, ans=0.1 2024-09-20 06:58:59,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=837120.0, ans=0.125 2024-09-20 06:59:01,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=837120.0, ans=0.125 2024-09-20 06:59:23,430 INFO [train.py:1198] (0/2) Epoch 47, batch 1150, loss[loss=0.2206, ctc_loss=0.1006, cr_loss=0.335, attn_decoder_loss=0.2264, over 29434.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1081, cr_loss=0.3457, attn_decoder_loss=0.2361, over 5756846.31 frames. 
], batch size: 78, lr: 2.35e-03, grad_scale: 8.0 2024-09-20 06:59:46,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=837240.0, ans=0.125 2024-09-20 07:00:02,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=837280.0, ans=10.0 2024-09-20 07:00:04,978 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.402e+01 9.011e+01 9.514e+01 2.556e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-20 07:00:24,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=837360.0, ans=0.125 2024-09-20 07:00:31,959 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:00:40,655 INFO [train.py:1198] (0/2) Epoch 47, batch 1200, loss[loss=0.2433, ctc_loss=0.1158, cr_loss=0.349, attn_decoder_loss=0.2497, over 29679.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1084, cr_loss=0.346, attn_decoder_loss=0.237, over 5748968.84 frames. ], batch size: 85, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 07:00:44,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=837400.0, ans=0.125 2024-09-20 07:00:48,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=837400.0, ans=0.125 2024-09-20 07:01:09,774 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:01:14,736 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.78 vs. 
limit=15.0 2024-09-20 07:01:21,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=837480.0, ans=0.125 2024-09-20 07:01:56,185 INFO [train.py:1198] (0/2) Epoch 47, batch 1250, loss[loss=0.2433, ctc_loss=0.1209, cr_loss=0.3763, attn_decoder_loss=0.2485, over 29513.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1087, cr_loss=0.3468, attn_decoder_loss=0.2374, over 5776280.12 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 16.0 2024-09-20 07:02:06,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=837600.0, ans=0.09899494936611666 2024-09-20 07:02:24,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=837640.0, ans=0.0 2024-09-20 07:02:26,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2024-09-20 07:02:35,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0 2024-09-20 07:02:37,735 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.615e+01 9.145e+01 9.650e+01 1.333e+02, threshold=1.829e+02, percent-clipped=0.0 2024-09-20 07:03:13,888 INFO [train.py:1198] (0/2) Epoch 47, batch 1300, loss[loss=0.2437, ctc_loss=0.1125, cr_loss=0.3518, attn_decoder_loss=0.2505, over 28380.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1085, cr_loss=0.3467, attn_decoder_loss=0.2368, over 5781296.89 frames. ], batch size: 112, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:03:17,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.17 vs. 
limit=22.5 2024-09-20 07:03:35,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=837840.0, ans=0.0 2024-09-20 07:04:07,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=837920.0, ans=0.125 2024-09-20 07:04:10,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=837920.0, ans=0.0 2024-09-20 07:04:25,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=12.0 2024-09-20 07:04:25,594 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=22.5 2024-09-20 07:04:29,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=837960.0, ans=10.0 2024-09-20 07:04:31,984 INFO [train.py:1198] (0/2) Epoch 47, batch 1350, loss[loss=0.2309, ctc_loss=0.1089, cr_loss=0.3386, attn_decoder_loss=0.237, over 29769.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1083, cr_loss=0.3463, attn_decoder_loss=0.2368, over 5797616.44 frames. 
], batch size: 81, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:04:41,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=838000.0, ans=0.125 2024-09-20 07:04:45,688 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:05:03,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=838080.0, ans=0.125 2024-09-20 07:05:03,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=838080.0, ans=0.0 2024-09-20 07:05:10,468 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.374e+01 8.876e+01 9.629e+01 1.227e+02, threshold=1.775e+02, percent-clipped=0.0 2024-09-20 07:05:34,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=838160.0, ans=0.125 2024-09-20 07:05:39,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=838160.0, ans=10.0 2024-09-20 07:05:46,452 INFO [train.py:1198] (0/2) Epoch 47, batch 1400, loss[loss=0.1946, ctc_loss=0.0817, cr_loss=0.2936, attn_decoder_loss=0.2006, over 29583.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1082, cr_loss=0.3467, attn_decoder_loss=0.2365, over 5808545.29 frames. ], batch size: 69, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:05:57,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.14 vs. 
limit=15.0 2024-09-20 07:06:09,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=838240.0, ans=0.1 2024-09-20 07:06:15,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=838240.0, ans=0.1 2024-09-20 07:06:22,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2024-09-20 07:06:47,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=838360.0, ans=0.0 2024-09-20 07:07:03,992 INFO [train.py:1198] (0/2) Epoch 47, batch 1450, loss[loss=0.2473, ctc_loss=0.1259, cr_loss=0.3832, attn_decoder_loss=0.2522, over 29434.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1085, cr_loss=0.3478, attn_decoder_loss=0.2371, over 5805254.63 frames. ], batch size: 94, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:07:35,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=838480.0, ans=0.125 2024-09-20 07:07:45,072 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.627e+01 9.137e+01 9.746e+01 6.249e+02, threshold=1.827e+02, percent-clipped=1.0 2024-09-20 07:07:49,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=838520.0, ans=0.125 2024-09-20 07:07:49,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=838520.0, ans=0.125 2024-09-20 07:07:56,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.81 vs. 
limit=10.0 2024-09-20 07:07:57,905 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.92 vs. limit=22.5 2024-09-20 07:08:00,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=838520.0, ans=0.02 2024-09-20 07:08:06,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=838560.0, ans=0.0 2024-09-20 07:08:09,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=838560.0, ans=0.125 2024-09-20 07:08:09,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.36 vs. limit=12.0 2024-09-20 07:08:11,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=838560.0, ans=0.0 2024-09-20 07:08:16,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=838560.0, ans=0.125 2024-09-20 07:08:20,890 INFO [train.py:1198] (0/2) Epoch 47, batch 1500, loss[loss=0.2487, ctc_loss=0.1132, cr_loss=0.3669, attn_decoder_loss=0.2556, over 29636.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1086, cr_loss=0.3476, attn_decoder_loss=0.2373, over 5805791.52 frames. ], batch size: 86, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:08:31,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=838600.0, ans=0.125 2024-09-20 07:08:50,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.86 vs. 
limit=10.0 2024-09-20 07:08:56,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=838680.0, ans=0.0 2024-09-20 07:09:02,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=838680.0, ans=0.2 2024-09-20 07:09:02,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.44 vs. limit=15.0 2024-09-20 07:09:23,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=838760.0, ans=0.025 2024-09-20 07:09:27,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.92 vs. limit=15.0 2024-09-20 07:09:36,604 INFO [train.py:1198] (0/2) Epoch 47, batch 1550, loss[loss=0.2413, ctc_loss=0.1192, cr_loss=0.3704, attn_decoder_loss=0.2467, over 29513.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1089, cr_loss=0.3481, attn_decoder_loss=0.2375, over 5781840.48 frames. ], batch size: 90, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:09:38,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=838800.0, ans=0.125 2024-09-20 07:09:49,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.34 vs. 
limit=10.0 2024-09-20 07:09:59,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=838840.0, ans=0.0 2024-09-20 07:10:09,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=838880.0, ans=0.125 2024-09-20 07:10:17,554 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.567e+01 9.122e+01 9.785e+01 2.024e+02, threshold=1.824e+02, percent-clipped=1.0 2024-09-20 07:10:37,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=838960.0, ans=0.1 2024-09-20 07:10:53,714 INFO [train.py:1198] (0/2) Epoch 47, batch 1600, loss[loss=0.2385, ctc_loss=0.1111, cr_loss=0.3582, attn_decoder_loss=0.2447, over 29685.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.109, cr_loss=0.3479, attn_decoder_loss=0.2372, over 5763786.92 frames. ], batch size: 85, lr: 2.34e-03, grad_scale: 32.0 2024-09-20 07:10:54,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=839000.0, ans=0.125 2024-09-20 07:11:20,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=839040.0, ans=0.025 2024-09-20 07:11:39,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=839120.0, ans=0.0 2024-09-20 07:11:50,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.25 vs. 
limit=15.0 2024-09-20 07:11:54,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=839160.0, ans=0.125 2024-09-20 07:11:54,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=839160.0, ans=0.125 2024-09-20 07:11:56,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=839160.0, ans=0.2 2024-09-20 07:11:57,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=839160.0, ans=0.125 2024-09-20 07:12:11,193 INFO [train.py:1198] (0/2) Epoch 47, batch 1650, loss[loss=0.2428, ctc_loss=0.1126, cr_loss=0.3533, attn_decoder_loss=0.2495, over 29709.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1087, cr_loss=0.3472, attn_decoder_loss=0.2368, over 5758169.05 frames. ], batch size: 89, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:12:17,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=839200.0, ans=0.125 2024-09-20 07:12:20,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839200.0, ans=0.1 2024-09-20 07:12:25,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.07 vs. 
limit=15.0 2024-09-20 07:12:47,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=839280.0, ans=0.0 2024-09-20 07:12:52,013 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.634e+01 9.046e+01 9.641e+01 2.969e+02, threshold=1.809e+02, percent-clipped=2.0 2024-09-20 07:12:57,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=839320.0, ans=0.125 2024-09-20 07:13:19,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=839360.0, ans=0.125 2024-09-20 07:13:23,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.95 vs. limit=15.0 2024-09-20 07:13:26,655 INFO [train.py:1198] (0/2) Epoch 47, batch 1700, loss[loss=0.1974, ctc_loss=0.08603, cr_loss=0.3055, attn_decoder_loss=0.203, over 29602.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1081, cr_loss=0.346, attn_decoder_loss=0.2366, over 5779581.65 frames. ], batch size: 69, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:13:26,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=839400.0, ans=0.5 2024-09-20 07:13:36,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0 2024-09-20 07:14:02,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.28 vs. 
limit=10.0 2024-09-20 07:14:06,557 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:14:39,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-09-20 07:14:43,708 INFO [train.py:1198] (0/2) Epoch 47, batch 1750, loss[loss=0.2076, ctc_loss=0.09166, cr_loss=0.3124, attn_decoder_loss=0.2136, over 29309.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1078, cr_loss=0.3459, attn_decoder_loss=0.2364, over 5786334.70 frames. ], batch size: 67, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:14:45,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=839600.0, ans=0.07 2024-09-20 07:15:04,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=15.0 2024-09-20 07:15:07,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0 2024-09-20 07:15:26,384 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.597e+01 9.141e+01 9.828e+01 1.386e+02, threshold=1.828e+02, percent-clipped=0.0 2024-09-20 07:15:34,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=839720.0, ans=0.125 2024-09-20 07:15:37,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=839720.0, ans=0.125 2024-09-20 07:16:00,712 INFO [train.py:1198] (0/2) Epoch 47, batch 1800, loss[loss=0.2422, ctc_loss=0.118, cr_loss=0.3749, attn_decoder_loss=0.2477, over 29679.00 frames. 
], tot_loss[loss=0.2308, ctc_loss=0.1083, cr_loss=0.3471, attn_decoder_loss=0.2367, over 5790396.59 frames. ], batch size: 83, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:16:03,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-09-20 07:16:13,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=839800.0, ans=0.125 2024-09-20 07:16:22,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=839840.0, ans=0.0 2024-09-20 07:16:25,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=839840.0, ans=0.025 2024-09-20 07:16:55,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=839920.0, ans=0.125 2024-09-20 07:17:16,946 INFO [train.py:1198] (0/2) Epoch 47, batch 1850, loss[loss=0.2395, ctc_loss=0.1163, cr_loss=0.3517, attn_decoder_loss=0.2454, over 29628.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1082, cr_loss=0.3471, attn_decoder_loss=0.2367, over 5795429.22 frames. ], batch size: 86, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:17:24,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=840000.0, ans=0.0 2024-09-20 07:17:29,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=840000.0, ans=0.0 2024-09-20 07:17:39,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.10 vs. 
limit=22.5 2024-09-20 07:18:01,054 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.631e+01 9.188e+01 9.651e+01 1.430e+02, threshold=1.838e+02, percent-clipped=0.0 2024-09-20 07:18:34,074 INFO [train.py:1198] (0/2) Epoch 47, batch 1900, loss[loss=0.2367, ctc_loss=0.1061, cr_loss=0.3406, attn_decoder_loss=0.2437, over 29700.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1085, cr_loss=0.348, attn_decoder_loss=0.2375, over 5803245.97 frames. ], batch size: 89, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:18:36,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2024-09-20 07:18:39,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5 2024-09-20 07:18:41,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=840200.0, ans=0.125 2024-09-20 07:18:42,139 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.53 vs. 
limit=22.5 2024-09-20 07:18:52,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=840240.0, ans=0.125 2024-09-20 07:19:08,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=840280.0, ans=0.02 2024-09-20 07:19:12,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=840280.0, ans=0.125 2024-09-20 07:19:21,886 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:19:32,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=840320.0, ans=0.0 2024-09-20 07:19:35,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=840360.0, ans=0.0 2024-09-20 07:19:51,436 INFO [train.py:1198] (0/2) Epoch 47, batch 1950, loss[loss=0.2259, ctc_loss=0.09598, cr_loss=0.3169, attn_decoder_loss=0.2333, over 29474.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1091, cr_loss=0.3495, attn_decoder_loss=0.2384, over 5818546.31 frames. ], batch size: 78, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:19:54,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=840400.0, ans=0.04949747468305833 2024-09-20 07:20:05,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=840440.0, ans=0.125 2024-09-20 07:20:06,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=840440.0, ans=0.0 2024-09-20 07:20:07,396 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. 
limit=15.0 2024-09-20 07:20:33,255 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.893e+01 9.385e+01 9.948e+01 2.061e+02, threshold=1.877e+02, percent-clipped=1.0 2024-09-20 07:20:48,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=840520.0, ans=0.1 2024-09-20 07:20:54,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=840560.0, ans=0.0 2024-09-20 07:20:59,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=840560.0, ans=0.125 2024-09-20 07:21:06,476 INFO [train.py:1198] (0/2) Epoch 47, batch 2000, loss[loss=0.201, ctc_loss=0.0855, cr_loss=0.3055, attn_decoder_loss=0.2071, over 29322.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1092, cr_loss=0.3496, attn_decoder_loss=0.2384, over 5797390.64 frames. ], batch size: 67, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:21:16,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=840600.0, ans=0.05 2024-09-20 07:21:25,707 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=15.0 2024-09-20 07:21:28,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=840640.0, ans=0.025 2024-09-20 07:22:02,230 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. 
limit=6.0 2024-09-20 07:22:10,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=840760.0, ans=0.1 2024-09-20 07:22:24,173 INFO [train.py:1198] (0/2) Epoch 47, batch 2050, loss[loss=0.2023, ctc_loss=0.08672, cr_loss=0.3017, attn_decoder_loss=0.2085, over 29452.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1086, cr_loss=0.3479, attn_decoder_loss=0.2375, over 5789306.08 frames. ], batch size: 70, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:22:24,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=840800.0, ans=0.04949747468305833 2024-09-20 07:22:41,384 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.42 vs. limit=12.0 2024-09-20 07:22:55,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=840880.0, ans=0.0 2024-09-20 07:22:55,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=840880.0, ans=0.0 2024-09-20 07:22:59,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=840880.0, ans=0.0 2024-09-20 07:23:09,568 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.643e+01 9.120e+01 9.477e+01 1.642e+02, threshold=1.824e+02, percent-clipped=0.0 2024-09-20 07:23:19,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. 
limit=15.0 2024-09-20 07:23:23,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=840920.0, ans=0.2 2024-09-20 07:23:26,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=840960.0, ans=0.125 2024-09-20 07:23:41,213 INFO [train.py:1198] (0/2) Epoch 47, batch 2100, loss[loss=0.2324, ctc_loss=0.1097, cr_loss=0.3547, attn_decoder_loss=0.2382, over 29733.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1083, cr_loss=0.347, attn_decoder_loss=0.237, over 5800064.23 frames. ], batch size: 81, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:23:43,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2024-09-20 07:23:44,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=841000.0, ans=0.1 2024-09-20 07:24:03,170 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.80 vs. 
limit=12.0 2024-09-20 07:24:04,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=841040.0, ans=0.07 2024-09-20 07:24:08,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=841040.0, ans=0.0 2024-09-20 07:24:10,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=841080.0, ans=0.1 2024-09-20 07:24:14,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=841080.0, ans=0.2 2024-09-20 07:24:27,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.58 vs. limit=15.0 2024-09-20 07:24:29,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=841120.0, ans=0.125 2024-09-20 07:24:29,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2024-09-20 07:24:40,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.16 vs. limit=15.0 2024-09-20 07:24:49,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=841160.0, ans=0.0 2024-09-20 07:24:53,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=841160.0, ans=0.125 2024-09-20 07:24:56,334 INFO [train.py:1198] (0/2) Epoch 47, batch 2150, loss[loss=0.2276, ctc_loss=0.1109, cr_loss=0.3467, attn_decoder_loss=0.2328, over 29448.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1078, cr_loss=0.3463, attn_decoder_loss=0.2363, over 5814989.13 frames. 
], batch size: 78, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:25:11,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=841240.0, ans=0.2 2024-09-20 07:25:21,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.68 vs. limit=15.0 2024-09-20 07:25:39,992 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.572e+01 9.031e+01 9.738e+01 1.571e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-20 07:25:41,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=841320.0, ans=0.2 2024-09-20 07:25:44,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=841320.0, ans=0.025 2024-09-20 07:25:45,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=12.0 2024-09-20 07:25:47,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=841320.0, ans=0.0 2024-09-20 07:25:48,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.29 vs. 
limit=12.0 2024-09-20 07:25:51,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=841320.0, ans=0.0 2024-09-20 07:25:52,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=841320.0, ans=0.2 2024-09-20 07:26:08,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=841360.0, ans=0.2 2024-09-20 07:26:12,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=841400.0, ans=0.125 2024-09-20 07:26:13,833 INFO [train.py:1198] (0/2) Epoch 47, batch 2200, loss[loss=0.2406, ctc_loss=0.1162, cr_loss=0.3563, attn_decoder_loss=0.2466, over 29640.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1079, cr_loss=0.3463, attn_decoder_loss=0.2364, over 5810924.36 frames. ], batch size: 86, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:26:15,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=841400.0, ans=0.2 2024-09-20 07:26:18,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=841400.0, ans=0.125 2024-09-20 07:26:23,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=841400.0, ans=0.125 2024-09-20 07:26:35,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.38 vs. 
limit=15.0 2024-09-20 07:26:41,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=841440.0, ans=0.0 2024-09-20 07:26:57,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=841480.0, ans=0.125 2024-09-20 07:27:06,729 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.83 vs. limit=15.0 2024-09-20 07:27:13,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=841520.0, ans=0.0 2024-09-20 07:27:31,966 INFO [train.py:1198] (0/2) Epoch 47, batch 2250, loss[loss=0.2368, ctc_loss=0.1093, cr_loss=0.3317, attn_decoder_loss=0.2436, over 29732.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1077, cr_loss=0.3458, attn_decoder_loss=0.2363, over 5811443.11 frames. ], batch size: 82, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:27:47,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=841640.0, ans=0.125 2024-09-20 07:27:49,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=12.0 2024-09-20 07:28:02,670 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.66 vs. 
limit=22.5 2024-09-20 07:28:03,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=841680.0, ans=0.0 2024-09-20 07:28:09,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=841680.0, ans=0.125 2024-09-20 07:28:14,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=841680.0, ans=0.2 2024-09-20 07:28:15,524 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.630e+01 9.023e+01 9.514e+01 2.412e+02, threshold=1.805e+02, percent-clipped=2.0 2024-09-20 07:28:28,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2024-09-20 07:28:34,522 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.55 vs. limit=15.0 2024-09-20 07:28:38,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=841760.0, ans=0.2 2024-09-20 07:28:42,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=841760.0, ans=0.125 2024-09-20 07:28:47,112 INFO [train.py:1198] (0/2) Epoch 47, batch 2300, loss[loss=0.2141, ctc_loss=0.1049, cr_loss=0.3409, attn_decoder_loss=0.2187, over 29303.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1073, cr_loss=0.3445, attn_decoder_loss=0.2356, over 5797323.47 frames. 
], batch size: 71, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:29:00,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=841840.0, ans=0.1 2024-09-20 07:29:00,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=841840.0, ans=10.0 2024-09-20 07:29:28,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=841880.0, ans=0.07 2024-09-20 07:29:29,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=841880.0, ans=0.0 2024-09-20 07:29:34,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0 2024-09-20 07:29:58,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=841960.0, ans=0.125 2024-09-20 07:29:59,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=841960.0, ans=0.0 2024-09-20 07:30:04,682 INFO [train.py:1198] (0/2) Epoch 47, batch 2350, loss[loss=0.2444, ctc_loss=0.1239, cr_loss=0.3836, attn_decoder_loss=0.2493, over 29698.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1076, cr_loss=0.3454, attn_decoder_loss=0.2357, over 5802668.85 frames. 
], batch size: 83, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:30:10,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=842000.0, ans=0.1 2024-09-20 07:30:12,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842000.0, ans=0.1 2024-09-20 07:30:50,388 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 8.800e+01 9.287e+01 9.916e+01 3.475e+02, threshold=1.857e+02, percent-clipped=2.0 2024-09-20 07:30:51,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.21 vs. limit=15.0 2024-09-20 07:31:18,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=842160.0, ans=0.125 2024-09-20 07:31:22,263 INFO [train.py:1198] (0/2) Epoch 47, batch 2400, loss[loss=0.2253, ctc_loss=0.1059, cr_loss=0.3552, attn_decoder_loss=0.2307, over 29526.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1081, cr_loss=0.3465, attn_decoder_loss=0.2364, over 5807643.08 frames. ], batch size: 76, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:31:26,153 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.36 vs. limit=15.0 2024-09-20 07:31:31,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=842200.0, ans=0.95 2024-09-20 07:32:00,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=842280.0, ans=0.0 2024-09-20 07:32:07,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.93 vs. 
limit=15.0 2024-09-20 07:32:15,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=842320.0, ans=0.0 2024-09-20 07:32:17,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=842320.0, ans=0.125 2024-09-20 07:32:20,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=842320.0, ans=0.04949747468305833 2024-09-20 07:32:27,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=842360.0, ans=0.0 2024-09-20 07:32:33,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=842360.0, ans=0.125 2024-09-20 07:32:38,367 INFO [train.py:1198] (0/2) Epoch 47, batch 2450, loss[loss=0.2331, ctc_loss=0.1081, cr_loss=0.3441, attn_decoder_loss=0.2394, over 29718.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1086, cr_loss=0.3469, attn_decoder_loss=0.2373, over 5784330.63 frames. ], batch size: 82, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:33:03,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.78 vs. 
limit=6.0 2024-09-20 07:33:05,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=842440.0, ans=0.125 2024-09-20 07:33:05,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=842440.0, ans=0.04949747468305833 2024-09-20 07:33:07,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842480.0, ans=0.1 2024-09-20 07:33:21,566 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.574e+01 9.096e+01 9.798e+01 1.804e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-20 07:33:29,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=842520.0, ans=0.0 2024-09-20 07:33:48,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=842560.0, ans=0.125 2024-09-20 07:33:49,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=842560.0, ans=0.125 2024-09-20 07:33:55,453 INFO [train.py:1198] (0/2) Epoch 47, batch 2500, loss[loss=0.251, ctc_loss=0.1232, cr_loss=0.3979, attn_decoder_loss=0.2563, over 29649.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1089, cr_loss=0.3474, attn_decoder_loss=0.2376, over 5794327.35 frames. ], batch size: 86, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:33:59,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.35 vs. 
limit=15.0 2024-09-20 07:34:03,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=842600.0, ans=0.125 2024-09-20 07:34:09,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=842640.0, ans=0.0 2024-09-20 07:34:16,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842640.0, ans=0.1 2024-09-20 07:34:21,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=842640.0, ans=0.0 2024-09-20 07:34:25,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2024-09-20 07:34:35,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=842680.0, ans=0.125 2024-09-20 07:35:13,292 INFO [train.py:1198] (0/2) Epoch 47, batch 2550, loss[loss=0.2014, ctc_loss=0.08292, cr_loss=0.2991, attn_decoder_loss=0.208, over 29356.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1083, cr_loss=0.3464, attn_decoder_loss=0.2372, over 5797609.64 frames. ], batch size: 67, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:35:44,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. 
limit=6.0 2024-09-20 07:35:49,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=842880.0, ans=12.0 2024-09-20 07:35:51,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=842880.0, ans=0.2 2024-09-20 07:35:54,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=842880.0, ans=0.2 2024-09-20 07:35:56,714 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.685e+01 9.047e+01 9.681e+01 1.454e+02, threshold=1.809e+02, percent-clipped=0.0 2024-09-20 07:36:28,672 INFO [train.py:1198] (0/2) Epoch 47, batch 2600, loss[loss=0.2231, ctc_loss=0.09966, cr_loss=0.3286, attn_decoder_loss=0.2295, over 29439.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1084, cr_loss=0.3469, attn_decoder_loss=0.2375, over 5794394.49 frames. ], batch size: 78, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:36:54,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=843040.0, ans=0.125 2024-09-20 07:36:57,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=843080.0, ans=0.1 2024-09-20 07:36:59,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.91 vs. limit=10.0 2024-09-20 07:37:34,071 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.91 vs. limit=12.0 2024-09-20 07:37:46,284 INFO [train.py:1198] (0/2) Epoch 47, batch 2650, loss[loss=0.2397, ctc_loss=0.114, cr_loss=0.3714, attn_decoder_loss=0.2454, over 29252.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1082, cr_loss=0.3468, attn_decoder_loss=0.2376, over 5801649.14 frames. 
], batch size: 100, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:38:00,925 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.96 vs. limit=22.5 2024-09-20 07:38:04,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=843240.0, ans=0.0 2024-09-20 07:38:07,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=843240.0, ans=0.0 2024-09-20 07:38:31,922 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.003e+01 8.688e+01 9.037e+01 9.488e+01 1.743e+02, threshold=1.807e+02, percent-clipped=0.0 2024-09-20 07:38:35,652 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.40 vs. limit=22.5 2024-09-20 07:39:03,847 INFO [train.py:1198] (0/2) Epoch 47, batch 2700, loss[loss=0.2387, ctc_loss=0.1125, cr_loss=0.3602, attn_decoder_loss=0.2447, over 29513.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1084, cr_loss=0.3472, attn_decoder_loss=0.2376, over 5797523.32 frames. 
], batch size: 87, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:39:13,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=843400.0, ans=0.125 2024-09-20 07:39:19,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=843440.0, ans=0.0 2024-09-20 07:39:47,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=843520.0, ans=0.2 2024-09-20 07:40:04,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=843560.0, ans=0.125 2024-09-20 07:40:16,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=843560.0, ans=0.2 2024-09-20 07:40:19,196 INFO [train.py:1198] (0/2) Epoch 47, batch 2750, loss[loss=0.216, ctc_loss=0.1061, cr_loss=0.3476, attn_decoder_loss=0.2205, over 29523.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1078, cr_loss=0.3459, attn_decoder_loss=0.2366, over 5794815.33 frames. ], batch size: 75, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:40:33,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.11 vs. limit=12.0 2024-09-20 07:40:36,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=843640.0, ans=0.0 2024-09-20 07:41:02,935 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.503e+01 8.699e+01 9.178e+01 9.870e+01 7.766e+02, threshold=1.836e+02, percent-clipped=3.0 2024-09-20 07:41:10,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.09 vs. 
limit=15.0 2024-09-20 07:41:12,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=843720.0, ans=0.125 2024-09-20 07:41:21,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=843760.0, ans=0.0 2024-09-20 07:41:37,199 INFO [train.py:1198] (0/2) Epoch 47, batch 2800, loss[loss=0.2641, ctc_loss=0.1492, cr_loss=0.4008, attn_decoder_loss=0.2679, over 19412.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1081, cr_loss=0.3462, attn_decoder_loss=0.2369, over 5775943.44 frames. ], batch size: 210, lr: 2.34e-03, grad_scale: 32.0 2024-09-20 07:41:54,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=843840.0, ans=0.125 2024-09-20 07:42:06,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=843840.0, ans=0.0 2024-09-20 07:42:14,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=843880.0, ans=0.125 2024-09-20 07:42:17,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0 2024-09-20 07:42:23,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=843920.0, ans=0.125 2024-09-20 07:42:53,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=844000.0, ans=0.0 2024-09-20 07:42:54,619 INFO [train.py:1198] (0/2) Epoch 47, batch 2850, loss[loss=0.2241, ctc_loss=0.1029, cr_loss=0.3313, attn_decoder_loss=0.2302, over 29494.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1084, cr_loss=0.3469, attn_decoder_loss=0.2376, over 5762714.52 frames. 
], batch size: 77, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:43:04,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=844000.0, ans=0.0 2024-09-20 07:43:25,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=844080.0, ans=0.1 2024-09-20 07:43:32,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=844080.0, ans=0.025 2024-09-20 07:43:32,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=844080.0, ans=0.0 2024-09-20 07:43:37,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=844080.0, ans=0.125 2024-09-20 07:43:39,707 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.812e+01 9.340e+01 9.979e+01 3.635e+02, threshold=1.868e+02, percent-clipped=1.0 2024-09-20 07:43:49,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.66 vs. limit=22.5 2024-09-20 07:44:01,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=844160.0, ans=0.2 2024-09-20 07:44:04,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=844160.0, ans=0.0 2024-09-20 07:44:09,787 INFO [train.py:1198] (0/2) Epoch 47, batch 2900, loss[loss=0.233, ctc_loss=0.1153, cr_loss=0.368, attn_decoder_loss=0.2379, over 29417.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.109, cr_loss=0.3486, attn_decoder_loss=0.2388, over 5787896.81 frames. 
], batch size: 79, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:45:27,382 INFO [train.py:1198] (0/2) Epoch 47, batch 2950, loss[loss=0.226, ctc_loss=0.1101, cr_loss=0.3562, attn_decoder_loss=0.2309, over 29518.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1081, cr_loss=0.3467, attn_decoder_loss=0.2375, over 5781945.55 frames. ], batch size: 75, lr: 2.34e-03, grad_scale: 16.0 2024-09-20 07:45:32,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.59 vs. limit=15.0 2024-09-20 07:45:35,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=844400.0, ans=0.0 2024-09-20 07:45:59,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=844480.0, ans=0.1 2024-09-20 07:46:05,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.73 vs. limit=6.0 2024-09-20 07:46:14,754 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.670e+01 9.173e+01 9.775e+01 4.031e+02, threshold=1.835e+02, percent-clipped=2.0 2024-09-20 07:46:36,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.72 vs. limit=15.0 2024-09-20 07:46:45,068 INFO [train.py:1198] (0/2) Epoch 47, batch 3000, loss[loss=0.2306, ctc_loss=0.102, cr_loss=0.3381, attn_decoder_loss=0.2374, over 29772.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1084, cr_loss=0.3472, attn_decoder_loss=0.2373, over 5782197.46 frames. 
], batch size: 81, lr: 2.34e-03, grad_scale: 8.0 2024-09-20 07:46:45,069 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-20 07:47:03,444 INFO [train.py:1230] (0/2) Epoch 47, validation: loss=0.2127, ctc_loss=0.03692, cr_loss=6.538e-15, attn_decoder_loss=0.2323, over 944034.00 frames. 2024-09-20 07:47:03,445 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-20 07:47:15,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=844600.0, ans=0.2 2024-09-20 07:47:29,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=844640.0, ans=0.0 2024-09-20 07:47:35,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=844680.0, ans=0.0 2024-09-20 07:47:36,093 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.39 vs. limit=22.5 2024-09-20 07:47:38,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=844680.0, ans=0.125 2024-09-20 07:47:46,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=844680.0, ans=0.1 2024-09-20 07:48:16,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=844760.0, ans=0.0 2024-09-20 07:48:16,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=844760.0, ans=0.125 2024-09-20 07:48:19,546 INFO [train.py:1198] (0/2) Epoch 47, batch 3050, loss[loss=0.2301, ctc_loss=0.1077, cr_loss=0.3399, attn_decoder_loss=0.2361, over 29530.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1089, cr_loss=0.3479, attn_decoder_loss=0.2382, over 5776807.71 frames. 
], batch size: 76, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 07:48:22,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=844800.0, ans=0.0 2024-09-20 07:48:36,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=844840.0, ans=0.0 2024-09-20 07:48:53,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=844880.0, ans=0.0 2024-09-20 07:49:06,338 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 8.765e+01 9.371e+01 9.890e+01 2.296e+02, threshold=1.874e+02, percent-clipped=1.0 2024-09-20 07:49:38,882 INFO [train.py:1198] (0/2) Epoch 47, batch 3100, loss[loss=0.246, ctc_loss=0.117, cr_loss=0.3721, attn_decoder_loss=0.2521, over 29280.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1091, cr_loss=0.3484, attn_decoder_loss=0.238, over 5776110.11 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 07:50:04,498 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:50:07,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=845080.0, ans=0.0 2024-09-20 07:50:16,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=845080.0, ans=0.2 2024-09-20 07:50:16,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.15 vs. 
limit=15.0 2024-09-20 07:50:22,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=845120.0, ans=0.0 2024-09-20 07:50:27,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=845120.0, ans=0.1 2024-09-20 07:50:33,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=845120.0, ans=0.025 2024-09-20 07:50:33,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.92 vs. limit=10.0 2024-09-20 07:50:54,270 INFO [train.py:1198] (0/2) Epoch 47, batch 3150, loss[loss=0.2542, ctc_loss=0.1229, cr_loss=0.3719, attn_decoder_loss=0.2605, over 28861.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1088, cr_loss=0.3477, attn_decoder_loss=0.2379, over 5783552.70 frames. ], batch size: 104, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 07:50:57,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=845200.0, ans=0.07 2024-09-20 07:51:11,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=845240.0, ans=0.1 2024-09-20 07:51:23,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=845280.0, ans=0.0 2024-09-20 07:51:25,127 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:51:33,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=845280.0, ans=0.1 2024-09-20 07:51:33,943 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:51:36,890 INFO 
[scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=845280.0, ans=0.1 2024-09-20 07:51:40,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.583e+01 9.160e+01 9.723e+01 3.463e+02, threshold=1.832e+02, percent-clipped=1.0 2024-09-20 07:51:55,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=845360.0, ans=0.0 2024-09-20 07:52:09,911 INFO [train.py:1198] (0/2) Epoch 47, batch 3200, loss[loss=0.2173, ctc_loss=0.09778, cr_loss=0.3356, attn_decoder_loss=0.2231, over 29781.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1085, cr_loss=0.3471, attn_decoder_loss=0.2372, over 5795303.42 frames. ], batch size: 80, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:52:17,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=845400.0, ans=0.07 2024-09-20 07:52:19,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=845400.0, ans=0.125 2024-09-20 07:52:27,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.69 vs. 
limit=15.0 2024-09-20 07:52:37,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=845440.0, ans=0.125 2024-09-20 07:52:52,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=845480.0, ans=0.125 2024-09-20 07:52:55,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=845520.0, ans=0.125 2024-09-20 07:52:58,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=845520.0, ans=0.125 2024-09-20 07:53:16,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=22.5 2024-09-20 07:53:22,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=845560.0, ans=0.2 2024-09-20 07:53:27,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=845600.0, ans=0.125 2024-09-20 07:53:28,507 INFO [train.py:1198] (0/2) Epoch 47, batch 3250, loss[loss=0.2401, ctc_loss=0.1099, cr_loss=0.3596, attn_decoder_loss=0.2466, over 29701.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1086, cr_loss=0.3474, attn_decoder_loss=0.2377, over 5801559.56 frames. ], batch size: 84, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:53:59,934 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=4.79 vs. 
limit=15.0 2024-09-20 07:54:13,280 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:54:17,342 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.840e+01 9.453e+01 1.004e+02 3.254e+02, threshold=1.891e+02, percent-clipped=2.0 2024-09-20 07:54:30,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.71 vs. limit=22.5 2024-09-20 07:54:32,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.34 vs. limit=22.5 2024-09-20 07:54:37,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=845760.0, ans=0.125 2024-09-20 07:54:38,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=845760.0, ans=0.025 2024-09-20 07:54:45,561 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.93 vs. limit=12.0 2024-09-20 07:54:45,884 INFO [train.py:1198] (0/2) Epoch 47, batch 3300, loss[loss=0.2336, ctc_loss=0.101, cr_loss=0.323, attn_decoder_loss=0.2412, over 28386.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1078, cr_loss=0.3455, attn_decoder_loss=0.2364, over 5798708.90 frames. 
], batch size: 111, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:55:07,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=845840.0, ans=0.125 2024-09-20 07:55:30,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=845920.0, ans=0.125 2024-09-20 07:55:43,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=845920.0, ans=0.125 2024-09-20 07:55:46,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=845960.0, ans=0.05 2024-09-20 07:55:50,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=845960.0, ans=0.0 2024-09-20 07:55:55,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=845960.0, ans=0.0 2024-09-20 07:56:00,877 INFO [train.py:1198] (0/2) Epoch 47, batch 3350, loss[loss=0.2463, ctc_loss=0.1145, cr_loss=0.3628, attn_decoder_loss=0.2529, over 28787.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1087, cr_loss=0.3474, attn_decoder_loss=0.2373, over 5776303.51 frames. 
], batch size: 104, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:56:02,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=846000.0, ans=0.2 2024-09-20 07:56:05,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=846000.0, ans=0.125 2024-09-20 07:56:26,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=846040.0, ans=0.125 2024-09-20 07:56:46,664 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.13 vs. limit=22.5 2024-09-20 07:56:47,170 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.631e+01 9.228e+01 9.801e+01 1.993e+02, threshold=1.846e+02, percent-clipped=1.0 2024-09-20 07:56:50,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=846120.0, ans=0.04949747468305833 2024-09-20 07:57:20,296 INFO [train.py:1198] (0/2) Epoch 47, batch 3400, loss[loss=0.201, ctc_loss=0.08512, cr_loss=0.2924, attn_decoder_loss=0.2073, over 29351.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1088, cr_loss=0.3479, attn_decoder_loss=0.2372, over 5768704.61 frames. ], batch size: 67, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:57:26,754 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:57:30,761 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. 
limit=6.0 2024-09-20 07:57:35,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=846240.0, ans=10.0 2024-09-20 07:58:05,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-09-20 07:58:15,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=846320.0, ans=0.125 2024-09-20 07:58:16,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=846320.0, ans=0.1 2024-09-20 07:58:26,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.28 vs. limit=12.0 2024-09-20 07:58:36,095 INFO [train.py:1198] (0/2) Epoch 47, batch 3450, loss[loss=0.2365, ctc_loss=0.1066, cr_loss=0.3315, attn_decoder_loss=0.2436, over 28414.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1089, cr_loss=0.3478, attn_decoder_loss=0.2375, over 5776812.97 frames. ], batch size: 111, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 07:58:47,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.02 vs. 
limit=12.0 2024-09-20 07:58:54,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=846440.0, ans=0.025 2024-09-20 07:59:12,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=846480.0, ans=10.0 2024-09-20 07:59:15,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=846480.0, ans=0.125 2024-09-20 07:59:16,875 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:59:22,444 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.718e+01 9.224e+01 9.719e+01 1.765e+02, threshold=1.845e+02, percent-clipped=0.0 2024-09-20 07:59:51,158 INFO [train.py:1198] (0/2) Epoch 47, batch 3500, loss[loss=0.2091, ctc_loss=0.08889, cr_loss=0.3034, attn_decoder_loss=0.2157, over 29301.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1085, cr_loss=0.3469, attn_decoder_loss=0.2369, over 5776512.41 frames. ], batch size: 71, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 08:00:02,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=846600.0, ans=0.125 2024-09-20 08:00:16,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=15.0 2024-09-20 08:00:40,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=846720.0, ans=0.125 2024-09-20 08:01:05,576 INFO [train.py:1198] (0/2) Epoch 47, batch 3550, loss[loss=0.2457, ctc_loss=0.1146, cr_loss=0.3807, attn_decoder_loss=0.2518, over 29704.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1081, cr_loss=0.3464, attn_decoder_loss=0.2368, over 5782290.61 frames. 
], batch size: 89, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 08:01:19,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=846840.0, ans=0.125 2024-09-20 08:01:48,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=846880.0, ans=0.0 2024-09-20 08:01:55,423 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.386e+01 8.882e+01 9.420e+01 1.531e+02, threshold=1.776e+02, percent-clipped=0.0 2024-09-20 08:02:01,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=846920.0, ans=0.0 2024-09-20 08:02:06,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.42 vs. limit=15.0 2024-09-20 08:02:08,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=846960.0, ans=0.125 2024-09-20 08:02:08,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=846960.0, ans=0.125 2024-09-20 08:02:10,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.16 vs. limit=15.0 2024-09-20 08:02:23,268 INFO [train.py:1198] (0/2) Epoch 47, batch 3600, loss[loss=0.2229, ctc_loss=0.1046, cr_loss=0.3393, attn_decoder_loss=0.2285, over 29480.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1081, cr_loss=0.3466, attn_decoder_loss=0.237, over 5791866.72 frames. 
], batch size: 77, lr: 2.33e-03, grad_scale: 32.0 2024-09-20 08:02:36,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=847040.0, ans=0.125 2024-09-20 08:02:47,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=847040.0, ans=0.0 2024-09-20 08:03:08,722 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=22.5 2024-09-20 08:03:19,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=847120.0, ans=0.1 2024-09-20 08:03:21,954 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.18 vs. limit=22.5 2024-09-20 08:03:28,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=847160.0, ans=10.0 2024-09-20 08:03:34,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=847160.0, ans=0.2 2024-09-20 08:03:36,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=847200.0, ans=0.0 2024-09-20 08:03:37,741 INFO [train.py:1198] (0/2) Epoch 47, batch 3650, loss[loss=0.2513, ctc_loss=0.1294, cr_loss=0.3927, attn_decoder_loss=0.2561, over 29511.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1077, cr_loss=0.3456, attn_decoder_loss=0.2365, over 5794009.18 frames. 
], batch size: 90, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 08:03:38,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=847200.0, ans=0.125 2024-09-20 08:03:44,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=847200.0, ans=6.0 2024-09-20 08:03:57,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=847240.0, ans=0.125 2024-09-20 08:03:57,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=847240.0, ans=0.125 2024-09-20 08:04:22,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2024-09-20 08:04:26,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.685e+01 9.171e+01 9.762e+01 1.576e+02, threshold=1.834e+02, percent-clipped=0.0 2024-09-20 08:04:37,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=847360.0, ans=0.125 2024-09-20 08:04:38,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=847360.0, ans=0.125 2024-09-20 08:04:51,910 INFO [train.py:1198] (0/2) Epoch 47, batch 3700, loss[loss=0.2518, ctc_loss=0.1288, cr_loss=0.3975, attn_decoder_loss=0.2567, over 29712.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.108, cr_loss=0.3457, attn_decoder_loss=0.2369, over 5803527.79 frames. 
], batch size: 84, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:05:05,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=847440.0, ans=0.04949747468305833 2024-09-20 08:05:26,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=847480.0, ans=0.125 2024-09-20 08:05:31,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=847480.0, ans=0.0 2024-09-20 08:05:38,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=847520.0, ans=0.025 2024-09-20 08:05:42,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=847520.0, ans=0.0 2024-09-20 08:05:47,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=847520.0, ans=0.2 2024-09-20 08:05:50,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=847560.0, ans=0.125 2024-09-20 08:05:54,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=847560.0, ans=0.0 2024-09-20 08:06:06,315 INFO [train.py:1198] (0/2) Epoch 47, batch 3750, loss[loss=0.2099, ctc_loss=0.1056, cr_loss=0.3263, attn_decoder_loss=0.2143, over 29356.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1076, cr_loss=0.3452, attn_decoder_loss=0.2365, over 5807458.81 frames. 
], batch size: 67, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:06:11,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=847600.0, ans=0.2 2024-09-20 08:06:39,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=847680.0, ans=0.0 2024-09-20 08:06:55,606 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 8.560e+01 9.134e+01 9.769e+01 1.535e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-20 08:07:15,846 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.45 vs. limit=22.5 2024-09-20 08:07:20,709 INFO [train.py:1198] (0/2) Epoch 47, batch 3800, loss[loss=0.24, ctc_loss=0.1073, cr_loss=0.344, attn_decoder_loss=0.2471, over 29629.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1076, cr_loss=0.3452, attn_decoder_loss=0.2364, over 5798810.27 frames. ], batch size: 86, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:07:34,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=847840.0, ans=0.0 2024-09-20 08:07:40,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=847840.0, ans=0.125 2024-09-20 08:07:46,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=847840.0, ans=0.0 2024-09-20 08:07:46,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847840.0, ans=0.1 2024-09-20 08:07:55,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=847880.0, ans=0.1 2024-09-20 08:08:04,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten.whitening_limit, batch_count=847880.0, 
ans=22.5 2024-09-20 08:08:19,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=847920.0, ans=0.95 2024-09-20 08:08:32,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=847960.0, ans=0.125 2024-09-20 08:08:34,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=847960.0, ans=0.125 2024-09-20 08:08:35,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=847960.0, ans=0.125 2024-09-20 08:08:37,323 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-212000.pt 2024-09-20 08:08:45,574 INFO [train.py:1198] (0/2) Epoch 47, batch 3850, loss[loss=0.2396, ctc_loss=0.1165, cr_loss=0.3619, attn_decoder_loss=0.2452, over 29233.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1073, cr_loss=0.3444, attn_decoder_loss=0.2362, over 5811828.25 frames. ], batch size: 100, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:08:45,865 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:08:56,723 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.86 vs. 
limit=10.0 2024-09-20 08:09:16,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=848080.0, ans=0.025 2024-09-20 08:09:18,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=848080.0, ans=0.07 2024-09-20 08:09:22,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=848080.0, ans=0.1 2024-09-20 08:09:28,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=848120.0, ans=0.125 2024-09-20 08:09:34,450 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 8.837e+01 9.282e+01 9.780e+01 1.653e+02, threshold=1.856e+02, percent-clipped=0.0 2024-09-20 08:09:36,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=848120.0, ans=0.125 2024-09-20 08:09:48,233 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:09:59,714 INFO [train.py:1198] (0/2) Epoch 47, batch 3900, loss[loss=0.2527, ctc_loss=0.1144, cr_loss=0.3605, attn_decoder_loss=0.2601, over 29636.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1077, cr_loss=0.3455, attn_decoder_loss=0.2366, over 5816388.25 frames. 
], batch size: 86, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:10:10,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=848200.0, ans=0.125 2024-09-20 08:10:26,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=848240.0, ans=0.0 2024-09-20 08:10:30,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=848280.0, ans=0.0 2024-09-20 08:10:32,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.00 vs. limit=10.0 2024-09-20 08:10:33,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=848280.0, ans=0.025 2024-09-20 08:10:41,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=848280.0, ans=0.125 2024-09-20 08:10:51,515 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:11:03,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=848360.0, ans=10.0 2024-09-20 08:11:13,513 INFO [train.py:1198] (0/2) Epoch 47, batch 3950, loss[loss=0.2482, ctc_loss=0.1291, cr_loss=0.3987, attn_decoder_loss=0.2525, over 29479.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1074, cr_loss=0.3449, attn_decoder_loss=0.2365, over 5835978.83 frames. ], batch size: 97, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:11:35,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.43 vs. 
limit=15.0 2024-09-20 08:11:58,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=848520.0, ans=0.125 2024-09-20 08:12:02,179 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.442e+01 9.066e+01 9.725e+01 6.124e+02, threshold=1.813e+02, percent-clipped=2.0 2024-09-20 08:12:02,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=848520.0, ans=0.0 2024-09-20 08:12:27,102 INFO [train.py:1198] (0/2) Epoch 47, batch 4000, loss[loss=0.2161, ctc_loss=0.09348, cr_loss=0.3105, attn_decoder_loss=0.2229, over 29503.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1075, cr_loss=0.3453, attn_decoder_loss=0.2367, over 5812990.51 frames. ], batch size: 74, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 08:12:55,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=15.0 2024-09-20 08:12:57,416 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2024-09-20 08:13:43,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=22.5 2024-09-20 08:13:43,952 INFO [train.py:1198] (0/2) Epoch 47, batch 4050, loss[loss=0.2516, ctc_loss=0.1303, cr_loss=0.3478, attn_decoder_loss=0.2573, over 20475.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1074, cr_loss=0.3445, attn_decoder_loss=0.2364, over 5797036.11 frames. 
], batch size: 209, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:14:11,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=848880.0, ans=0.0 2024-09-20 08:14:17,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=848880.0, ans=0.0 2024-09-20 08:14:24,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=848880.0, ans=0.2 2024-09-20 08:14:30,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=848920.0, ans=0.125 2024-09-20 08:14:33,187 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.525e+01 9.075e+01 9.953e+01 1.624e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-20 08:14:38,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0 2024-09-20 08:14:56,870 INFO [train.py:1198] (0/2) Epoch 47, batch 4100, loss[loss=0.2475, ctc_loss=0.1262, cr_loss=0.3763, attn_decoder_loss=0.2526, over 29497.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.108, cr_loss=0.3455, attn_decoder_loss=0.2368, over 5792527.85 frames. 
], batch size: 90, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:15:46,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=849120.0, ans=0.125 2024-09-20 08:15:49,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=849120.0, ans=0.1 2024-09-20 08:15:53,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=849160.0, ans=0.0 2024-09-20 08:16:10,033 INFO [train.py:1198] (0/2) Epoch 47, batch 4150, loss[loss=0.2293, ctc_loss=0.108, cr_loss=0.3422, attn_decoder_loss=0.2352, over 29481.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1079, cr_loss=0.3449, attn_decoder_loss=0.2366, over 5797588.50 frames. ], batch size: 77, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:16:28,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=849240.0, ans=0.5 2024-09-20 08:16:51,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=849280.0, ans=0.2 2024-09-20 08:17:00,946 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.758e+01 9.113e+01 9.755e+01 1.948e+02, threshold=1.823e+02, percent-clipped=1.0 2024-09-20 08:17:05,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=849320.0, ans=0.125 2024-09-20 08:17:25,429 INFO [train.py:1198] (0/2) Epoch 47, batch 4200, loss[loss=0.2499, ctc_loss=0.1255, cr_loss=0.3934, attn_decoder_loss=0.255, over 29517.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1085, cr_loss=0.3467, attn_decoder_loss=0.237, over 5799526.02 frames. 
], batch size: 90, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:17:31,767 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:17:46,349 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=849440.0, ans=0.125 2024-09-20 08:17:46,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=849440.0, ans=0.04949747468305833 2024-09-20 08:17:50,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=849440.0, ans=0.1 2024-09-20 08:17:50,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=849440.0, ans=0.0 2024-09-20 08:17:53,486 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=849480.0, ans=0.015 2024-09-20 08:18:32,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=849560.0, ans=0.0 2024-09-20 08:18:39,530 INFO [train.py:1198] (0/2) Epoch 47, batch 4250, loss[loss=0.2095, ctc_loss=0.09108, cr_loss=0.315, attn_decoder_loss=0.2157, over 29520.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1082, cr_loss=0.3467, attn_decoder_loss=0.237, over 5805410.36 frames. 
], batch size: 74, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:18:41,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=849600.0, ans=0.1 2024-09-20 08:18:42,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=849600.0, ans=0.05 2024-09-20 08:19:00,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=849640.0, ans=0.0 2024-09-20 08:19:09,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=849680.0, ans=0.0 2024-09-20 08:19:29,566 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.716e+01 9.310e+01 9.869e+01 2.948e+02, threshold=1.862e+02, percent-clipped=1.0 2024-09-20 08:19:53,044 INFO [train.py:1198] (0/2) Epoch 47, batch 4300, loss[loss=0.2439, ctc_loss=0.1177, cr_loss=0.3753, attn_decoder_loss=0.2496, over 29525.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.108, cr_loss=0.3461, attn_decoder_loss=0.2372, over 5794156.62 frames. ], batch size: 87, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:20:05,321 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:20:05,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=849800.0, ans=0.125 2024-09-20 08:20:07,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.50 vs. 
limit=22.5 2024-09-20 08:20:12,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=849840.0, ans=0.125 2024-09-20 08:20:20,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=849840.0, ans=0.2 2024-09-20 08:20:38,150 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=8.0 2024-09-20 08:21:02,203 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0 2024-09-20 08:21:02,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=849960.0, ans=0.125 2024-09-20 08:21:05,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.24 vs. limit=22.5 2024-09-20 08:21:08,682 INFO [train.py:1198] (0/2) Epoch 47, batch 4350, loss[loss=0.2496, ctc_loss=0.1243, cr_loss=0.3883, attn_decoder_loss=0.2549, over 29497.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1107, cr_loss=0.3526, attn_decoder_loss=0.2404, over 5796099.97 frames. 
], batch size: 97, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:21:16,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=850000.0, ans=0.1 2024-09-20 08:21:23,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=850040.0, ans=0.125 2024-09-20 08:21:29,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=850040.0, ans=0.2 2024-09-20 08:21:42,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=850080.0, ans=0.125 2024-09-20 08:21:58,355 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 9.076e+01 9.411e+01 9.982e+01 1.475e+02, threshold=1.882e+02, percent-clipped=0.0 2024-09-20 08:22:11,854 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:22:21,577 INFO [train.py:1198] (0/2) Epoch 47, batch 4400, loss[loss=0.2348, ctc_loss=0.106, cr_loss=0.3411, attn_decoder_loss=0.2415, over 27561.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1117, cr_loss=0.3547, attn_decoder_loss=0.2421, over 5767884.64 frames. 
], batch size: 125, lr: 2.33e-03, grad_scale: 16.0 2024-09-20 08:22:21,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=850200.0, ans=0.04949747468305833 2024-09-20 08:22:42,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=850240.0, ans=0.125 2024-09-20 08:22:49,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=850280.0, ans=0.2 2024-09-20 08:23:03,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=850280.0, ans=0.125 2024-09-20 08:23:10,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=850320.0, ans=0.1 2024-09-20 08:23:20,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=850360.0, ans=0.0 2024-09-20 08:23:36,660 INFO [train.py:1198] (0/2) Epoch 47, batch 4450, loss[loss=0.2507, ctc_loss=0.126, cr_loss=0.3677, attn_decoder_loss=0.2564, over 20389.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1149, cr_loss=0.3599, attn_decoder_loss=0.2441, over 5584155.34 frames. 
], batch size: 209, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:24:17,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=850480.0, ans=0.125 2024-09-20 08:24:18,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=850480.0, ans=0.2 2024-09-20 08:24:22,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=850520.0, ans=0.0 2024-09-20 08:24:29,099 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.530e+01 9.504e+01 1.076e+02 1.200e+02 1.579e+02, threshold=2.152e+02, percent-clipped=0.0 2024-09-20 08:24:35,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.76 vs. limit=22.5 2024-09-20 08:24:50,883 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.48 vs. limit=22.5 2024-09-20 08:24:51,481 INFO [train.py:1198] (0/2) Epoch 47, batch 4500, loss[loss=0.2369, ctc_loss=0.1124, cr_loss=0.3235, attn_decoder_loss=0.2435, over 19704.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1176, cr_loss=0.3623, attn_decoder_loss=0.2458, over 5242171.24 frames. ], batch size: 209, lr: 2.33e-03, grad_scale: 8.0 2024-09-20 08:24:55,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.19 vs. 
limit=22.5 2024-09-20 08:25:03,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=850600.0, ans=0.125 2024-09-20 08:25:09,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=850640.0, ans=0.125 2024-09-20 08:25:13,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.77 vs. limit=10.0 2024-09-20 08:25:23,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5 2024-09-20 08:25:28,451 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-47.pt 2024-09-20 08:26:14,114 INFO [train.py:1198] (0/2) Epoch 48, batch 0, loss[loss=0.2138, ctc_loss=0.09467, cr_loss=0.3301, attn_decoder_loss=0.2197, over 29591.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.09467, cr_loss=0.3301, attn_decoder_loss=0.2197, over 29591.00 frames. ], batch size: 73, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:26:14,115 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-20 08:26:32,431 INFO [train.py:1230] (0/2) Epoch 48, validation: loss=0.2131, ctc_loss=0.03621, cr_loss=7.075e-15, attn_decoder_loss=0.2327, over 944034.00 frames. 2024-09-20 08:26:32,431 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-20 08:26:40,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=850700.0, ans=0.05 2024-09-20 08:27:13,934 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.07 vs. 
limit=15.0 2024-09-20 08:27:39,690 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:27:41,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=850860.0, ans=0.125 2024-09-20 08:27:49,851 INFO [train.py:1198] (0/2) Epoch 48, batch 50, loss[loss=0.201, ctc_loss=0.08715, cr_loss=0.3126, attn_decoder_loss=0.2067, over 29407.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1101, cr_loss=0.3518, attn_decoder_loss=0.2372, over 1267569.61 frames. ], batch size: 70, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:27:55,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.27 vs. limit=12.0 2024-09-20 08:27:56,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=850900.0, ans=0.125 2024-09-20 08:28:02,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=850900.0, ans=0.125 2024-09-20 08:28:04,961 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 9.048e+01 9.836e+01 1.173e+02 2.253e+02, threshold=1.967e+02, percent-clipped=1.0 2024-09-20 08:28:08,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.55 vs. 
limit=10.0 2024-09-20 08:28:20,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=850980.0, ans=0.0 2024-09-20 08:28:35,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=851020.0, ans=0.125 2024-09-20 08:28:40,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=851020.0, ans=0.0 2024-09-20 08:29:05,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=851060.0, ans=0.125 2024-09-20 08:29:07,767 INFO [train.py:1198] (0/2) Epoch 48, batch 100, loss[loss=0.2142, ctc_loss=0.08895, cr_loss=0.2942, attn_decoder_loss=0.2215, over 29525.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1112, cr_loss=0.3547, attn_decoder_loss=0.24, over 2250747.43 frames. ], batch size: 76, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:29:08,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=851100.0, ans=0.1 2024-09-20 08:29:22,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=851140.0, ans=0.2 2024-09-20 08:29:31,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=851140.0, ans=0.0 2024-09-20 08:29:33,934 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2024-09-20 08:29:44,751 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.92 vs. 
limit=15.0 2024-09-20 08:30:03,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=851220.0, ans=0.0 2024-09-20 08:30:22,101 INFO [train.py:1198] (0/2) Epoch 48, batch 150, loss[loss=0.2032, ctc_loss=0.09185, cr_loss=0.3061, attn_decoder_loss=0.2087, over 29466.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1093, cr_loss=0.3502, attn_decoder_loss=0.238, over 3046285.51 frames. ], batch size: 70, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:30:29,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.22 vs. limit=10.0 2024-09-20 08:30:38,632 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.481e+01 8.661e+01 9.113e+01 9.779e+01 1.487e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-20 08:30:46,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=851340.0, ans=0.125 2024-09-20 08:30:46,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=851340.0, ans=0.125 2024-09-20 08:30:54,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-20 08:30:55,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=851380.0, ans=0.0 2024-09-20 08:31:26,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=851460.0, ans=0.0 2024-09-20 08:31:26,652 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.52 vs. 
limit=15.0 2024-09-20 08:31:39,270 INFO [train.py:1198] (0/2) Epoch 48, batch 200, loss[loss=0.2461, ctc_loss=0.1194, cr_loss=0.3746, attn_decoder_loss=0.2519, over 27581.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.109, cr_loss=0.3494, attn_decoder_loss=0.2374, over 3658492.92 frames. ], batch size: 125, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:31:50,617 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.69 vs. limit=15.0 2024-09-20 08:32:35,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=851620.0, ans=0.125 2024-09-20 08:32:44,620 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=22.5 2024-09-20 08:32:54,544 INFO [train.py:1198] (0/2) Epoch 48, batch 250, loss[loss=0.2464, ctc_loss=0.1106, cr_loss=0.3586, attn_decoder_loss=0.2535, over 29264.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1082, cr_loss=0.348, attn_decoder_loss=0.2373, over 4141286.62 frames. ], batch size: 100, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:33:10,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=851740.0, ans=0.1 2024-09-20 08:33:13,492 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.550e+01 9.278e+01 9.687e+01 3.776e+02, threshold=1.856e+02, percent-clipped=1.0 2024-09-20 08:33:14,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.24 vs. 
limit=15.0 2024-09-20 08:33:30,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=851780.0, ans=0.025 2024-09-20 08:33:40,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=851820.0, ans=0.0 2024-09-20 08:33:40,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851820.0, ans=0.1 2024-09-20 08:33:40,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=851820.0, ans=0.0 2024-09-20 08:33:45,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=851820.0, ans=0.025 2024-09-20 08:33:54,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=851820.0, ans=0.0 2024-09-20 08:34:06,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.68 vs. limit=22.5 2024-09-20 08:34:09,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=851860.0, ans=0.07 2024-09-20 08:34:12,358 INFO [train.py:1198] (0/2) Epoch 48, batch 300, loss[loss=0.241, ctc_loss=0.1118, cr_loss=0.3694, attn_decoder_loss=0.2471, over 29541.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1079, cr_loss=0.3471, attn_decoder_loss=0.2369, over 4508742.30 frames. 
], batch size: 92, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:34:15,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=851900.0, ans=0.0 2024-09-20 08:34:20,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=851900.0, ans=0.0 2024-09-20 08:34:22,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0 2024-09-20 08:34:30,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=851940.0, ans=0.125 2024-09-20 08:35:00,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.28 vs. limit=10.0 2024-09-20 08:35:01,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=852020.0, ans=0.125 2024-09-20 08:35:08,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2024-09-20 08:35:15,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=852060.0, ans=0.125 2024-09-20 08:35:15,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852060.0, ans=0.1 2024-09-20 08:35:21,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852060.0, ans=0.1 2024-09-20 08:35:29,401 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.48 vs. 
limit=22.5 2024-09-20 08:35:29,937 INFO [train.py:1198] (0/2) Epoch 48, batch 350, loss[loss=0.2061, ctc_loss=0.08786, cr_loss=0.2724, attn_decoder_loss=0.2132, over 29331.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1077, cr_loss=0.3466, attn_decoder_loss=0.2368, over 4795090.57 frames. ], batch size: 71, lr: 2.30e-03, grad_scale: 8.0 2024-09-20 08:35:37,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=852100.0, ans=0.025 2024-09-20 08:35:39,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=852100.0, ans=0.1 2024-09-20 08:35:45,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=852140.0, ans=0.125 2024-09-20 08:35:46,308 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.632e+01 9.132e+01 9.604e+01 3.712e+02, threshold=1.826e+02, percent-clipped=1.0 2024-09-20 08:35:52,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=852140.0, ans=0.1 2024-09-20 08:36:36,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=852260.0, ans=0.125 2024-09-20 08:36:45,274 INFO [train.py:1198] (0/2) Epoch 48, batch 400, loss[loss=0.2323, ctc_loss=0.1106, cr_loss=0.3738, attn_decoder_loss=0.2375, over 29711.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1079, cr_loss=0.3466, attn_decoder_loss=0.2367, over 5025215.85 frames. ], batch size: 82, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:36:52,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.78 vs. 
limit=15.0 2024-09-20 08:36:54,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=852300.0, ans=0.125 2024-09-20 08:37:12,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.74 vs. limit=12.0 2024-09-20 08:37:14,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.56 vs. limit=15.0 2024-09-20 08:37:15,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=852340.0, ans=0.125 2024-09-20 08:37:26,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.58 vs. limit=22.5 2024-09-20 08:37:59,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=852460.0, ans=0.0 2024-09-20 08:38:03,186 INFO [train.py:1198] (0/2) Epoch 48, batch 450, loss[loss=0.2464, ctc_loss=0.1202, cr_loss=0.3695, attn_decoder_loss=0.2523, over 29699.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1083, cr_loss=0.3473, attn_decoder_loss=0.2368, over 5188972.73 frames. ], batch size: 83, lr: 2.30e-03, grad_scale: 16.0 2024-09-20 08:38:19,720 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.734e+01 9.234e+01 9.898e+01 1.385e+02, threshold=1.847e+02, percent-clipped=0.0 2024-09-20 08:38:22,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. 
limit=15.0
2024-09-20 08:38:24,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=852540.0, ans=0.0
2024-09-20 08:38:42,639 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=852580.0, ans=0.125
2024-09-20 08:38:55,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=852620.0, ans=0.2
2024-09-20 08:39:07,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=852660.0, ans=0.125
2024-09-20 08:39:10,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=852660.0, ans=0.1
2024-09-20 08:39:13,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=852660.0, ans=0.2
2024-09-20 08:39:21,087 INFO [train.py:1198] (0/2) Epoch 48, batch 500, loss[loss=0.2514, ctc_loss=0.1239, cr_loss=0.3655, attn_decoder_loss=0.2574, over 29429.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1074, cr_loss=0.3456, attn_decoder_loss=0.2361, over 5331199.11 frames. ], batch size: 94, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:39:26,509 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0
2024-09-20 08:39:35,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0
2024-09-20 08:40:02,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=852780.0, ans=0.2
2024-09-20 08:40:09,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0
2024-09-20 08:40:23,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=852860.0, ans=0.125
2024-09-20 08:40:36,940 INFO [train.py:1198] (0/2) Epoch 48, batch 550, loss[loss=0.2343, ctc_loss=0.107, cr_loss=0.3598, attn_decoder_loss=0.2404, over 28775.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1076, cr_loss=0.3466, attn_decoder_loss=0.2363, over 5425018.09 frames. ], batch size: 104, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:40:38,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=852900.0, ans=0.125
2024-09-20 08:40:53,454 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.626e+01 8.943e+01 9.744e+01 1.321e+02, threshold=1.789e+02, percent-clipped=0.0
2024-09-20 08:40:56,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=852940.0, ans=0.09899494936611666
2024-09-20 08:41:30,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=853020.0, ans=0.1
2024-09-20 08:41:36,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=853020.0, ans=0.0
2024-09-20 08:41:44,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=853060.0, ans=0.125
2024-09-20 08:41:44,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=853060.0, ans=0.1
2024-09-20 08:41:48,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=15.0
2024-09-20 08:41:50,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=853060.0, ans=0.0
2024-09-20 08:41:54,600 INFO [train.py:1198] (0/2) Epoch 48, batch 600, loss[loss=0.2459, ctc_loss=0.1153, cr_loss=0.3682, attn_decoder_loss=0.2522, over 29227.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1074, cr_loss=0.3465, attn_decoder_loss=0.2365, over 5510892.85 frames. ], batch size: 100, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:41:57,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=853100.0, ans=0.0
2024-09-20 08:42:02,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=853100.0, ans=0.125
2024-09-20 08:42:33,041 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.70 vs. limit=15.0
2024-09-20 08:42:34,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0
2024-09-20 08:42:35,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=853180.0, ans=0.025
2024-09-20 08:42:52,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=22.5
2024-09-20 08:43:04,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=853260.0, ans=0.2
2024-09-20 08:43:12,008 INFO [train.py:1198] (0/2) Epoch 48, batch 650, loss[loss=0.2295, ctc_loss=0.1056, cr_loss=0.3489, attn_decoder_loss=0.2355, over 29766.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1067, cr_loss=0.3443, attn_decoder_loss=0.2358, over 5587963.11 frames. ], batch size: 81, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:43:19,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0
2024-09-20 08:43:21,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=853300.0, ans=0.05
2024-09-20 08:43:21,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=853300.0, ans=0.125
2024-09-20 08:43:28,436 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 8.608e+01 8.952e+01 9.536e+01 4.634e+02, threshold=1.790e+02, percent-clipped=1.0
2024-09-20 08:43:31,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0
2024-09-20 08:43:40,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=853380.0, ans=0.125
2024-09-20 08:43:53,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853380.0, ans=0.1
2024-09-20 08:44:05,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=853420.0, ans=0.2
2024-09-20 08:44:23,836 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0
2024-09-20 08:44:27,289 INFO [train.py:1198] (0/2) Epoch 48, batch 700, loss[loss=0.2323, ctc_loss=0.1101, cr_loss=0.3508, attn_decoder_loss=0.238, over 29537.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1072, cr_loss=0.3459, attn_decoder_loss=0.2364, over 5637334.28 frames. ], batch size: 76, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:44:50,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=853540.0, ans=0.125
2024-09-20 08:45:23,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=853620.0, ans=0.1
2024-09-20 08:45:29,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=853660.0, ans=0.125
2024-09-20 08:45:41,452 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 08:45:45,558 INFO [train.py:1198] (0/2) Epoch 48, batch 750, loss[loss=0.2388, ctc_loss=0.1235, cr_loss=0.3988, attn_decoder_loss=0.2427, over 29700.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1075, cr_loss=0.3462, attn_decoder_loss=0.2363, over 5677371.04 frames. ], batch size: 82, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:45:54,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=853700.0, ans=0.125
2024-09-20 08:46:01,951 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.792e+01 9.309e+01 9.827e+01 1.298e+02, threshold=1.862e+02, percent-clipped=0.0
2024-09-20 08:46:08,788 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.19 vs. limit=15.0
2024-09-20 08:46:18,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=853780.0, ans=0.0
2024-09-20 08:46:39,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.91 vs. limit=22.5
2024-09-20 08:47:03,065 INFO [train.py:1198] (0/2) Epoch 48, batch 800, loss[loss=0.2089, ctc_loss=0.0955, cr_loss=0.3158, attn_decoder_loss=0.2145, over 29639.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1078, cr_loss=0.3473, attn_decoder_loss=0.2364, over 5708284.17 frames. ], batch size: 73, lr: 2.30e-03, grad_scale: 32.0
2024-09-20 08:47:12,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=853900.0, ans=0.125
2024-09-20 08:47:36,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=853980.0, ans=0.2
2024-09-20 08:47:47,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=854020.0, ans=0.0
2024-09-20 08:47:49,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=854020.0, ans=0.0
2024-09-20 08:47:51,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=854020.0, ans=0.125
2024-09-20 08:47:54,920 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.23 vs. limit=15.0
2024-09-20 08:48:11,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=854060.0, ans=0.1
2024-09-20 08:48:18,090 INFO [train.py:1198] (0/2) Epoch 48, batch 850, loss[loss=0.2338, ctc_loss=0.1078, cr_loss=0.3356, attn_decoder_loss=0.2404, over 29704.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.107, cr_loss=0.345, attn_decoder_loss=0.2361, over 5736657.06 frames. ], batch size: 89, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:48:21,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=854100.0, ans=0.0
2024-09-20 08:48:28,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.07 vs. limit=15.0
2024-09-20 08:48:35,990 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.678e+01 9.128e+01 9.659e+01 1.410e+02, threshold=1.826e+02, percent-clipped=0.0
2024-09-20 08:48:42,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=854140.0, ans=0.0
2024-09-20 08:49:12,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=854220.0, ans=0.1
2024-09-20 08:49:12,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0
2024-09-20 08:49:36,098 INFO [train.py:1198] (0/2) Epoch 48, batch 900, loss[loss=0.212, ctc_loss=0.08857, cr_loss=0.2962, attn_decoder_loss=0.2191, over 29616.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1071, cr_loss=0.3451, attn_decoder_loss=0.2363, over 5740605.75 frames. ], batch size: 73, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:49:38,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.35 vs. limit=10.0
2024-09-20 08:49:52,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=854340.0, ans=0.0
2024-09-20 08:50:23,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=854420.0, ans=0.0
2024-09-20 08:50:49,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=854460.0, ans=0.125
2024-09-20 08:50:53,779 INFO [train.py:1198] (0/2) Epoch 48, batch 950, loss[loss=0.217, ctc_loss=0.1019, cr_loss=0.3369, attn_decoder_loss=0.2223, over 29506.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1069, cr_loss=0.3443, attn_decoder_loss=0.2362, over 5742532.03 frames. ], batch size: 74, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:51:01,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=854500.0, ans=0.125
2024-09-20 08:51:04,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=854500.0, ans=0.125
2024-09-20 08:51:11,718 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.747e+01 9.386e+01 9.871e+01 2.198e+02, threshold=1.877e+02, percent-clipped=1.0
2024-09-20 08:51:12,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.74 vs. limit=10.0
2024-09-20 08:51:20,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.94 vs. limit=8.0
2024-09-20 08:51:22,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=854580.0, ans=0.2
2024-09-20 08:51:25,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=854580.0, ans=0.0
2024-09-20 08:51:27,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=854580.0, ans=0.125
2024-09-20 08:51:37,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=854620.0, ans=0.125
2024-09-20 08:51:57,894 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.55 vs. limit=12.0
2024-09-20 08:52:00,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=854660.0, ans=0.0
2024-09-20 08:52:08,494 INFO [train.py:1198] (0/2) Epoch 48, batch 1000, loss[loss=0.2245, ctc_loss=0.1093, cr_loss=0.343, attn_decoder_loss=0.2297, over 29479.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1076, cr_loss=0.346, attn_decoder_loss=0.2368, over 5736651.27 frames. ], batch size: 77, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:52:25,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=854740.0, ans=0.0
2024-09-20 08:52:32,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=854740.0, ans=0.2
2024-09-20 08:52:46,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=854780.0, ans=0.125
2024-09-20 08:53:06,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=854820.0, ans=0.1
2024-09-20 08:53:25,983 INFO [train.py:1198] (0/2) Epoch 48, batch 1050, loss[loss=0.2349, ctc_loss=0.1035, cr_loss=0.3305, attn_decoder_loss=0.2421, over 29681.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1071, cr_loss=0.3447, attn_decoder_loss=0.2361, over 5746171.94 frames. ], batch size: 85, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:53:30,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=854900.0, ans=0.1
2024-09-20 08:53:44,085 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.686e+01 9.233e+01 9.898e+01 2.337e+02, threshold=1.847e+02, percent-clipped=2.0
2024-09-20 08:53:53,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=854940.0, ans=0.125
2024-09-20 08:53:54,012 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0
2024-09-20 08:54:02,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854980.0, ans=0.1
2024-09-20 08:54:05,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=854980.0, ans=0.125
2024-09-20 08:54:11,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=855020.0, ans=0.1
2024-09-20 08:54:39,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=855060.0, ans=0.125
2024-09-20 08:54:39,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=855060.0, ans=0.125
2024-09-20 08:54:43,757 INFO [train.py:1198] (0/2) Epoch 48, batch 1100, loss[loss=0.2192, ctc_loss=0.0946, cr_loss=0.3168, attn_decoder_loss=0.226, over 29424.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1066, cr_loss=0.3437, attn_decoder_loss=0.2358, over 5757607.28 frames. ], batch size: 78, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:54:53,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=12.0
2024-09-20 08:55:42,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.29 vs. limit=15.0
2024-09-20 08:55:48,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0
2024-09-20 08:55:59,696 INFO [train.py:1198] (0/2) Epoch 48, batch 1150, loss[loss=0.2184, ctc_loss=0.1026, cr_loss=0.3302, attn_decoder_loss=0.2239, over 29447.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1071, cr_loss=0.3447, attn_decoder_loss=0.236, over 5755202.63 frames. ], batch size: 78, lr: 2.30e-03, grad_scale: 8.0
2024-09-20 08:55:59,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=855300.0, ans=0.0
2024-09-20 08:56:03,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.94 vs. limit=10.0
2024-09-20 08:56:19,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.603e+01 9.086e+01 9.808e+01 3.950e+02, threshold=1.817e+02, percent-clipped=2.0
2024-09-20 08:56:31,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=855380.0, ans=0.125
2024-09-20 08:56:41,460 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=15.0
2024-09-20 08:57:17,312 INFO [train.py:1198] (0/2) Epoch 48, batch 1200, loss[loss=0.2476, ctc_loss=0.1186, cr_loss=0.3756, attn_decoder_loss=0.2536, over 29667.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1078, cr_loss=0.3461, attn_decoder_loss=0.2369, over 5747638.05 frames. ], batch size: 85, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:57:19,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=855500.0, ans=0.125
2024-09-20 08:57:34,995 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.35 vs. limit=22.5
2024-09-20 08:57:52,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=855580.0, ans=0.09899494936611666
2024-09-20 08:57:54,649 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0
2024-09-20 08:57:58,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.69 vs. limit=12.0
2024-09-20 08:58:23,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=855660.0, ans=0.125
2024-09-20 08:58:30,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=855660.0, ans=0.125
2024-09-20 08:58:34,855 INFO [train.py:1198] (0/2) Epoch 48, batch 1250, loss[loss=0.2581, ctc_loss=0.1329, cr_loss=0.4148, attn_decoder_loss=0.2627, over 29545.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1083, cr_loss=0.347, attn_decoder_loss=0.2373, over 5775405.62 frames. ], batch size: 92, lr: 2.30e-03, grad_scale: 16.0
2024-09-20 08:58:36,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855700.0, ans=0.1
2024-09-20 08:58:54,648 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.788e+01 9.389e+01 9.946e+01 2.084e+02, threshold=1.878e+02, percent-clipped=1.0
2024-09-20 08:58:59,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=855740.0, ans=0.2
2024-09-20 08:59:21,247 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.81 vs. limit=22.5
2024-09-20 08:59:50,619 INFO [train.py:1198] (0/2) Epoch 48, batch 1300, loss[loss=0.2401, ctc_loss=0.1107, cr_loss=0.3368, attn_decoder_loss=0.247, over 28242.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1078, cr_loss=0.3456, attn_decoder_loss=0.2367, over 5779682.82 frames. ], batch size: 111, lr: 2.30e-03, grad_scale: 8.0
2024-09-20 08:59:51,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=855900.0, ans=0.0
2024-09-20 08:59:54,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.96 vs. limit=15.0
2024-09-20 09:01:01,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=856060.0, ans=0.125
2024-09-20 09:01:09,003 INFO [train.py:1198] (0/2) Epoch 48, batch 1350, loss[loss=0.2243, ctc_loss=0.09546, cr_loss=0.3213, attn_decoder_loss=0.2315, over 29753.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1074, cr_loss=0.3449, attn_decoder_loss=0.2365, over 5797127.22 frames. ], batch size: 81, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:01:20,070 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.39 vs. limit=22.5
2024-09-20 09:01:24,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=856140.0, ans=0.0
2024-09-20 09:01:29,672 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.601e+01 8.992e+01 9.491e+01 1.134e+02, threshold=1.798e+02, percent-clipped=0.0
2024-09-20 09:01:55,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=856220.0, ans=0.0
2024-09-20 09:01:59,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=856220.0, ans=0.1
2024-09-20 09:02:23,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0
2024-09-20 09:02:25,938 INFO [train.py:1198] (0/2) Epoch 48, batch 1400, loss[loss=0.2054, ctc_loss=0.09145, cr_loss=0.3032, attn_decoder_loss=0.2113, over 29560.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1076, cr_loss=0.3454, attn_decoder_loss=0.2364, over 5808074.38 frames. ], batch size: 69, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:02:26,344 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=856300.0, ans=0.125
2024-09-20 09:02:30,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=856300.0, ans=0.125
2024-09-20 09:02:33,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=856300.0, ans=0.95
2024-09-20 09:02:51,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=856340.0, ans=0.125
2024-09-20 09:03:08,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.37 vs. limit=22.5
2024-09-20 09:03:11,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=856420.0, ans=0.04949747468305833
2024-09-20 09:03:12,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=856420.0, ans=0.0
2024-09-20 09:03:38,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=856460.0, ans=0.0
2024-09-20 09:03:41,084 INFO [train.py:1198] (0/2) Epoch 48, batch 1450, loss[loss=0.2413, ctc_loss=0.1115, cr_loss=0.3268, attn_decoder_loss=0.2484, over 29440.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.108, cr_loss=0.3463, attn_decoder_loss=0.2369, over 5804200.22 frames. ], batch size: 94, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:03:41,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=856500.0, ans=0.0
2024-09-20 09:03:47,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=856500.0, ans=0.2
2024-09-20 09:03:48,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=856500.0, ans=0.07
2024-09-20 09:04:01,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=856540.0, ans=0.0
2024-09-20 09:04:02,263 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 8.708e+01 9.120e+01 9.678e+01 1.766e+02, threshold=1.824e+02, percent-clipped=0.0
2024-09-20 09:04:06,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856540.0, ans=0.1
2024-09-20 09:04:08,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=856540.0, ans=0.2
2024-09-20 09:04:11,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=856580.0, ans=0.0
2024-09-20 09:04:24,183 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=15.0
2024-09-20 09:04:29,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=856620.0, ans=0.125
2024-09-20 09:04:35,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856620.0, ans=0.1
2024-09-20 09:04:48,205 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=12.0
2024-09-20 09:04:58,580 INFO [train.py:1198] (0/2) Epoch 48, batch 1500, loss[loss=0.2349, ctc_loss=0.1041, cr_loss=0.3355, attn_decoder_loss=0.242, over 29635.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1082, cr_loss=0.3472, attn_decoder_loss=0.2374, over 5806445.63 frames. ], batch size: 86, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:05:18,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=856740.0, ans=0.125
2024-09-20 09:05:18,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=856740.0, ans=15.0
2024-09-20 09:05:24,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=856740.0, ans=0.1
2024-09-20 09:05:28,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.00 vs. limit=15.0
2024-09-20 09:05:29,472 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=22.5
2024-09-20 09:05:40,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0
2024-09-20 09:05:41,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=856780.0, ans=0.0
2024-09-20 09:05:45,068 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:05:49,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=856820.0, ans=0.0
2024-09-20 09:06:17,038 INFO [train.py:1198] (0/2) Epoch 48, batch 1550, loss[loss=0.2504, ctc_loss=0.1299, cr_loss=0.4068, attn_decoder_loss=0.2547, over 29490.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1087, cr_loss=0.3481, attn_decoder_loss=0.2376, over 5781985.88 frames. ], batch size: 90, lr: 2.29e-03, grad_scale: 8.0
2024-09-20 09:06:18,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=856900.0, ans=0.0
2024-09-20 09:06:26,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=856900.0, ans=0.125
2024-09-20 09:06:38,059 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.059e+01 8.748e+01 9.189e+01 9.595e+01 2.151e+02, threshold=1.838e+02, percent-clipped=1.0
2024-09-20 09:06:43,299 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.56 vs. limit=22.5
2024-09-20 09:06:49,317 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.95 vs. limit=12.0
2024-09-20 09:06:57,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=856980.0, ans=0.95
2024-09-20 09:07:08,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=857020.0, ans=0.125
2024-09-20 09:07:17,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.02 vs. limit=15.0
2024-09-20 09:07:22,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=4.79 vs. limit=15.0
2024-09-20 09:07:31,851 INFO [train.py:1198] (0/2) Epoch 48, batch 1600, loss[loss=0.2373, ctc_loss=0.1087, cr_loss=0.3493, attn_decoder_loss=0.2439, over 29669.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1085, cr_loss=0.347, attn_decoder_loss=0.2374, over 5765229.82 frames. ], batch size: 85, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:07:36,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=857100.0, ans=0.025
2024-09-20 09:08:49,427 INFO [train.py:1198] (0/2) Epoch 48, batch 1650, loss[loss=0.2423, ctc_loss=0.1021, cr_loss=0.3436, attn_decoder_loss=0.2502, over 29722.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1079, cr_loss=0.3458, attn_decoder_loss=0.2372, over 5758085.80 frames. ], batch size: 89, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:09:10,374 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.669e+01 9.204e+01 9.828e+01 1.752e+02, threshold=1.841e+02, percent-clipped=0.0
2024-09-20 09:09:22,887 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:09:35,256 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:09:39,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=857420.0, ans=0.125
2024-09-20 09:09:47,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=857420.0, ans=0.1
2024-09-20 09:09:52,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=857460.0, ans=0.1
2024-09-20 09:10:07,189 INFO [train.py:1198] (0/2) Epoch 48, batch 1700, loss[loss=0.2024, ctc_loss=0.08735, cr_loss=0.2999, attn_decoder_loss=0.2085, over 29593.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1076, cr_loss=0.3451, attn_decoder_loss=0.2369, over 5778543.55 frames. ], batch size: 69, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:10:12,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=857500.0, ans=0.0
2024-09-20 09:10:21,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=857540.0, ans=0.125
2024-09-20 09:10:48,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857580.0, ans=0.1
2024-09-20 09:11:16,479 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.71 vs. limit=15.0
2024-09-20 09:11:23,281 INFO [train.py:1198] (0/2) Epoch 48, batch 1750, loss[loss=0.218, ctc_loss=0.1032, cr_loss=0.3353, attn_decoder_loss=0.2233, over 29342.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1076, cr_loss=0.345, attn_decoder_loss=0.2366, over 5787129.57 frames. ], batch size: 67, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:11:40,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=857740.0, ans=0.1
2024-09-20 09:11:44,443 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.682e+01 9.026e+01 9.554e+01 1.464e+02, threshold=1.805e+02, percent-clipped=0.0
2024-09-20 09:11:52,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=857780.0, ans=0.125
2024-09-20 09:12:17,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=857820.0, ans=0.125
2024-09-20 09:12:28,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=857860.0, ans=0.2
2024-09-20 09:12:40,544 INFO [train.py:1198] (0/2) Epoch 48, batch 1800, loss[loss=0.2389, ctc_loss=0.1113, cr_loss=0.3637, attn_decoder_loss=0.245, over 29682.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1077, cr_loss=0.3457, attn_decoder_loss=0.2368, over 5790832.54 frames. ], batch size: 83, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:13:37,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=858020.0, ans=0.125
2024-09-20 09:13:57,387 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0
2024-09-20 09:13:58,142 INFO [train.py:1198] (0/2) Epoch 48, batch 1850, loss[loss=0.2382, ctc_loss=0.1141, cr_loss=0.3617, attn_decoder_loss=0.2439, over 29620.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1077, cr_loss=0.3459, attn_decoder_loss=0.2367, over 5795916.10 frames. ], batch size: 86, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:14:00,001 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:14:02,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=858100.0, ans=0.125
2024-09-20 09:14:12,940 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0
2024-09-20 09:14:19,244 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.577e+01 9.244e+01 9.733e+01 2.629e+02, threshold=1.849e+02, percent-clipped=1.0
2024-09-20 09:14:31,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=858180.0, ans=0.0
2024-09-20 09:15:00,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=858260.0, ans=0.125
2024-09-20 09:15:13,440 INFO [train.py:1198] (0/2) Epoch 48, batch 1900, loss[loss=0.242, ctc_loss=0.1191, cr_loss=0.3746, attn_decoder_loss=0.2473, over 29700.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1079, cr_loss=0.3466, attn_decoder_loss=0.2371, over 5803905.48 frames. ], batch size: 89, lr: 2.29e-03, grad_scale: 16.0
2024-09-20 09:15:39,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.51 vs.
limit=15.0 2024-09-20 09:16:13,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=858460.0, ans=0.125 2024-09-20 09:16:16,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=858460.0, ans=0.125 2024-09-20 09:16:30,242 INFO [train.py:1198] (0/2) Epoch 48, batch 1950, loss[loss=0.2259, ctc_loss=0.1044, cr_loss=0.336, attn_decoder_loss=0.2319, over 29446.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1084, cr_loss=0.3476, attn_decoder_loss=0.238, over 5818537.52 frames. ], batch size: 78, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:16:31,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=7.77 vs. limit=22.5 2024-09-20 09:16:50,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=858540.0, ans=0.1 2024-09-20 09:16:53,370 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.839e+01 9.358e+01 9.818e+01 1.771e+02, threshold=1.872e+02, percent-clipped=0.0 2024-09-20 09:17:01,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858580.0, ans=0.1 2024-09-20 09:17:15,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.27 vs. 
limit=15.0 2024-09-20 09:17:26,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=858620.0, ans=15.0 2024-09-20 09:17:31,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=858620.0, ans=0.04949747468305833 2024-09-20 09:17:36,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=858660.0, ans=0.125 2024-09-20 09:17:49,886 INFO [train.py:1198] (0/2) Epoch 48, batch 2000, loss[loss=0.2035, ctc_loss=0.0852, cr_loss=0.2953, attn_decoder_loss=0.2101, over 29297.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1086, cr_loss=0.3479, attn_decoder_loss=0.2384, over 5796742.12 frames. ], batch size: 67, lr: 2.29e-03, grad_scale: 32.0 2024-09-20 09:17:57,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=858700.0, ans=0.125 2024-09-20 09:18:19,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2024-09-20 09:18:59,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=858860.0, ans=0.0 2024-09-20 09:19:05,485 INFO [train.py:1198] (0/2) Epoch 48, batch 2050, loss[loss=0.2039, ctc_loss=0.08967, cr_loss=0.3109, attn_decoder_loss=0.2097, over 29421.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1075, cr_loss=0.3454, attn_decoder_loss=0.2369, over 5787963.79 frames. 
], batch size: 70, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:19:07,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=858900.0, ans=0.125 2024-09-20 09:19:21,633 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2024-09-20 09:19:22,310 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 09:19:28,038 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.559e+01 9.116e+01 9.582e+01 1.621e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-20 09:19:32,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=858940.0, ans=0.025 2024-09-20 09:19:55,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2024-09-20 09:20:02,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=859020.0, ans=0.0 2024-09-20 09:20:20,537 INFO [train.py:1198] (0/2) Epoch 48, batch 2100, loss[loss=0.2291, ctc_loss=0.107, cr_loss=0.3557, attn_decoder_loss=0.2347, over 29767.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1071, cr_loss=0.3447, attn_decoder_loss=0.2364, over 5800889.71 frames. ], batch size: 81, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:20:37,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=859140.0, ans=0.0 2024-09-20 09:20:48,836 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=16.91 vs. 
limit=15.0 2024-09-20 09:20:52,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=859180.0, ans=0.125 2024-09-20 09:21:22,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=859220.0, ans=0.125 2024-09-20 09:21:28,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=859260.0, ans=0.125 2024-09-20 09:21:32,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=859260.0, ans=0.2 2024-09-20 09:21:40,099 INFO [train.py:1198] (0/2) Epoch 48, batch 2150, loss[loss=0.2304, ctc_loss=0.1006, cr_loss=0.3418, attn_decoder_loss=0.2372, over 29429.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1067, cr_loss=0.344, attn_decoder_loss=0.2359, over 5816616.80 frames. ], batch size: 78, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:21:40,899 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=22.5 2024-09-20 09:21:46,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=859300.0, ans=0.125 2024-09-20 09:22:02,730 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.576e+01 8.993e+01 9.601e+01 1.335e+02, threshold=1.799e+02, percent-clipped=0.0 2024-09-20 09:22:03,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=859340.0, ans=0.1 2024-09-20 09:22:05,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.65 vs. 
limit=15.0 2024-09-20 09:22:25,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859420.0, ans=0.1 2024-09-20 09:22:25,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859420.0, ans=0.1 2024-09-20 09:22:39,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.30 vs. limit=10.0 2024-09-20 09:22:52,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=859460.0, ans=0.025 2024-09-20 09:22:55,666 INFO [train.py:1198] (0/2) Epoch 48, batch 2200, loss[loss=0.2306, ctc_loss=0.1059, cr_loss=0.3408, attn_decoder_loss=0.2369, over 29640.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.107, cr_loss=0.3443, attn_decoder_loss=0.236, over 5813781.14 frames. ], batch size: 86, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:23:06,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=859500.0, ans=0.125 2024-09-20 09:23:15,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=859540.0, ans=0.0 2024-09-20 09:23:31,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=859580.0, ans=0.125 2024-09-20 09:23:35,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=859580.0, ans=0.125 2024-09-20 09:23:38,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=859580.0, ans=0.125 2024-09-20 09:23:55,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, 
num_channels=512, metric=7.44 vs. limit=15.0 2024-09-20 09:23:56,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=859660.0, ans=0.125 2024-09-20 09:24:10,685 INFO [train.py:1198] (0/2) Epoch 48, batch 2250, loss[loss=0.2261, ctc_loss=0.103, cr_loss=0.3285, attn_decoder_loss=0.2325, over 29709.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1072, cr_loss=0.3442, attn_decoder_loss=0.2361, over 5814520.24 frames. ], batch size: 82, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:24:11,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=859700.0, ans=0.0 2024-09-20 09:24:27,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=859740.0, ans=0.125 2024-09-20 09:24:34,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.41 vs. limit=15.0 2024-09-20 09:24:35,429 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.683e+01 9.115e+01 9.671e+01 7.163e+02, threshold=1.823e+02, percent-clipped=1.0 2024-09-20 09:24:44,823 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=859780.0, ans=0.09899494936611666 2024-09-20 09:25:30,683 INFO [train.py:1198] (0/2) Epoch 48, batch 2300, loss[loss=0.2158, ctc_loss=0.1003, cr_loss=0.3348, attn_decoder_loss=0.2212, over 29734.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1066, cr_loss=0.343, attn_decoder_loss=0.2352, over 5800515.93 frames. 
], batch size: 72, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:25:47,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=859940.0, ans=0.2 2024-09-20 09:25:53,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=859940.0, ans=0.125 2024-09-20 09:25:57,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=859940.0, ans=0.125 2024-09-20 09:26:17,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.48 vs. limit=10.0 2024-09-20 09:26:18,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2024-09-20 09:26:46,313 INFO [train.py:1198] (0/2) Epoch 48, batch 2350, loss[loss=0.2357, ctc_loss=0.1146, cr_loss=0.3735, attn_decoder_loss=0.2409, over 29689.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1066, cr_loss=0.343, attn_decoder_loss=0.2353, over 5804675.82 frames. 
], batch size: 83, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:26:48,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=860100.0, ans=0.2 2024-09-20 09:27:08,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=860140.0, ans=0.5 2024-09-20 09:27:10,173 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.540e+01 9.100e+01 9.543e+01 1.555e+02, threshold=1.820e+02, percent-clipped=0.0 2024-09-20 09:27:24,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=860180.0, ans=0.2 2024-09-20 09:27:49,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=860260.0, ans=0.125 2024-09-20 09:28:01,993 INFO [train.py:1198] (0/2) Epoch 48, batch 2400, loss[loss=0.2202, ctc_loss=0.1067, cr_loss=0.3427, attn_decoder_loss=0.2251, over 29545.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1071, cr_loss=0.3446, attn_decoder_loss=0.236, over 5809073.40 frames. ], batch size: 76, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:28:02,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=860300.0, ans=0.0 2024-09-20 09:28:07,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. 
limit=6.0 2024-09-20 09:28:09,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=860300.0, ans=0.125 2024-09-20 09:28:49,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=860420.0, ans=0.5 2024-09-20 09:29:21,833 INFO [train.py:1198] (0/2) Epoch 48, batch 2450, loss[loss=0.2475, ctc_loss=0.121, cr_loss=0.388, attn_decoder_loss=0.2529, over 29728.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1079, cr_loss=0.3463, attn_decoder_loss=0.237, over 5785118.11 frames. ], batch size: 82, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:29:22,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=860500.0, ans=0.1 2024-09-20 09:29:25,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=860500.0, ans=0.125 2024-09-20 09:29:27,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5 2024-09-20 09:29:32,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.67 vs. 
limit=22.5 2024-09-20 09:29:44,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=860540.0, ans=0.2 2024-09-20 09:29:45,558 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 8.875e+01 9.472e+01 1.005e+02 1.888e+02, threshold=1.894e+02, percent-clipped=1.0 2024-09-20 09:29:45,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=860540.0, ans=0.0 2024-09-20 09:29:55,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=860580.0, ans=0.0 2024-09-20 09:30:36,888 INFO [train.py:1198] (0/2) Epoch 48, batch 2500, loss[loss=0.2361, ctc_loss=0.104, cr_loss=0.3284, attn_decoder_loss=0.2435, over 29647.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.108, cr_loss=0.3467, attn_decoder_loss=0.2372, over 5795019.78 frames. ], batch size: 86, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:30:37,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=860700.0, ans=0.025 2024-09-20 09:30:46,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=860700.0, ans=0.125 2024-09-20 09:30:55,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=860740.0, ans=0.09899494936611666 2024-09-20 09:31:03,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. 
limit=8.0 2024-09-20 09:31:09,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=860780.0, ans=0.1 2024-09-20 09:31:26,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=860820.0, ans=0.2 2024-09-20 09:31:30,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=860820.0, ans=0.125 2024-09-20 09:31:30,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860820.0, ans=0.1 2024-09-20 09:31:53,216 INFO [train.py:1198] (0/2) Epoch 48, batch 2550, loss[loss=0.2103, ctc_loss=0.09497, cr_loss=0.3296, attn_decoder_loss=0.2158, over 29376.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.108, cr_loss=0.3465, attn_decoder_loss=0.237, over 5798019.69 frames. ], batch size: 67, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:31:58,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=860900.0, ans=0.0 2024-09-20 09:32:00,982 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 09:32:02,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=860900.0, ans=0.0 2024-09-20 09:32:09,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=860940.0, ans=0.125 2024-09-20 09:32:11,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.83 vs. 
limit=12.0 2024-09-20 09:32:18,704 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.740e+01 9.125e+01 9.570e+01 1.327e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-20 09:32:25,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=860980.0, ans=0.125 2024-09-20 09:32:39,723 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.68 vs. limit=22.5 2024-09-20 09:32:59,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=861060.0, ans=0.125 2024-09-20 09:33:12,682 INFO [train.py:1198] (0/2) Epoch 48, batch 2600, loss[loss=0.2292, ctc_loss=0.1072, cr_loss=0.3502, attn_decoder_loss=0.2349, over 29428.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1079, cr_loss=0.3467, attn_decoder_loss=0.2372, over 5794512.94 frames. ], batch size: 78, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:33:45,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=861180.0, ans=0.125 2024-09-20 09:33:48,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=861180.0, ans=0.125 2024-09-20 09:33:58,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=861220.0, ans=0.125 2024-09-20 09:34:27,409 INFO [train.py:1198] (0/2) Epoch 48, batch 2650, loss[loss=0.2448, ctc_loss=0.1168, cr_loss=0.3723, attn_decoder_loss=0.2507, over 29287.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1081, cr_loss=0.347, attn_decoder_loss=0.2376, over 5800744.33 frames. 
], batch size: 100, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:34:35,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=861300.0, ans=0.125 2024-09-20 09:34:53,030 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.627e+01 9.156e+01 9.635e+01 1.174e+02, threshold=1.831e+02, percent-clipped=0.0 2024-09-20 09:35:14,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.07 vs. limit=15.0 2024-09-20 09:35:22,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2024-09-20 09:35:30,849 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 09:35:36,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-09-20 09:35:42,636 INFO [train.py:1198] (0/2) Epoch 48, batch 2700, loss[loss=0.2425, ctc_loss=0.1085, cr_loss=0.3392, attn_decoder_loss=0.2498, over 29529.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1086, cr_loss=0.3479, attn_decoder_loss=0.238, over 5795654.92 frames. 
], batch size: 87, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:35:45,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=861500.0, ans=0.125 2024-09-20 09:35:55,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=861500.0, ans=0.1 2024-09-20 09:36:07,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=861540.0, ans=0.5 2024-09-20 09:36:45,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=861620.0, ans=0.1 2024-09-20 09:36:57,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861660.0, ans=0.1 2024-09-20 09:37:03,341 INFO [train.py:1198] (0/2) Epoch 48, batch 2750, loss[loss=0.2165, ctc_loss=0.09687, cr_loss=0.3074, attn_decoder_loss=0.223, over 29512.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1079, cr_loss=0.3459, attn_decoder_loss=0.2369, over 5793238.57 frames. 
], batch size: 75, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:37:03,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=861700.0, ans=0.0 2024-09-20 09:37:08,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=861700.0, ans=0.95 2024-09-20 09:37:20,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861740.0, ans=0.1 2024-09-20 09:37:23,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=861740.0, ans=0.125 2024-09-20 09:37:23,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.14 vs. limit=22.5 2024-09-20 09:37:28,903 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.868e+01 9.360e+01 1.005e+02 2.892e+02, threshold=1.872e+02, percent-clipped=3.0 2024-09-20 09:37:35,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861780.0, ans=0.1 2024-09-20 09:37:36,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=861780.0, ans=0.125 2024-09-20 09:37:57,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=861820.0, ans=0.2 2024-09-20 09:37:58,300 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=4.81 vs. 
limit=15.0 2024-09-20 09:37:59,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=861820.0, ans=0.0 2024-09-20 09:38:08,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=861860.0, ans=0.125 2024-09-20 09:38:13,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=861860.0, ans=0.125 2024-09-20 09:38:18,880 INFO [train.py:1198] (0/2) Epoch 48, batch 2800, loss[loss=0.2498, ctc_loss=0.1281, cr_loss=0.3772, attn_decoder_loss=0.255, over 20563.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1084, cr_loss=0.3464, attn_decoder_loss=0.2368, over 5774896.62 frames. ], batch size: 211, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:38:22,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=861900.0, ans=0.1 2024-09-20 09:38:23,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=861900.0, ans=0.2 2024-09-20 09:38:41,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=861940.0, ans=0.09899494936611666 2024-09-20 09:38:41,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=861940.0, ans=0.125 2024-09-20 09:38:47,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=861980.0, ans=0.1 2024-09-20 09:38:48,267 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.66 vs. 
limit=15.0 2024-09-20 09:39:00,385 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0 2024-09-20 09:39:05,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=862020.0, ans=0.125 2024-09-20 09:39:23,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=862060.0, ans=0.125 2024-09-20 09:39:34,247 INFO [train.py:1198] (0/2) Epoch 48, batch 2850, loss[loss=0.2288, ctc_loss=0.1042, cr_loss=0.3487, attn_decoder_loss=0.2349, over 29491.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1083, cr_loss=0.3464, attn_decoder_loss=0.2371, over 5761102.55 frames. ], batch size: 77, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:39:52,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=862140.0, ans=0.04949747468305833 2024-09-20 09:39:59,997 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.779e+01 9.246e+01 9.697e+01 4.650e+02, threshold=1.849e+02, percent-clipped=1.0 2024-09-20 09:40:22,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=862220.0, ans=0.125 2024-09-20 09:40:33,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=862220.0, ans=0.125 2024-09-20 09:40:53,971 INFO [train.py:1198] (0/2) Epoch 48, batch 2900, loss[loss=0.222, ctc_loss=0.1002, cr_loss=0.3302, attn_decoder_loss=0.2282, over 29425.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1089, cr_loss=0.3483, attn_decoder_loss=0.2382, over 5787123.47 frames. 
], batch size: 79, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:41:15,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=862340.0, ans=0.0 2024-09-20 09:41:21,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=862340.0, ans=0.2 2024-09-20 09:41:38,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=862420.0, ans=0.025 2024-09-20 09:41:39,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=862420.0, ans=0.2 2024-09-20 09:41:42,124 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.67 vs. limit=6.0 2024-09-20 09:42:10,005 INFO [train.py:1198] (0/2) Epoch 48, batch 2950, loss[loss=0.2223, ctc_loss=0.1095, cr_loss=0.3632, attn_decoder_loss=0.2268, over 29525.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.108, cr_loss=0.3464, attn_decoder_loss=0.237, over 5782312.38 frames. ], batch size: 75, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:42:37,508 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.743e+01 9.257e+01 9.610e+01 1.643e+02, threshold=1.851e+02, percent-clipped=0.0 2024-09-20 09:42:52,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=862580.0, ans=0.2 2024-09-20 09:42:57,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=862620.0, ans=0.0 2024-09-20 09:43:00,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.14 vs. 
limit=15.0 2024-09-20 09:43:23,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862660.0, ans=0.1 2024-09-20 09:43:25,736 INFO [train.py:1198] (0/2) Epoch 48, batch 3000, loss[loss=0.2289, ctc_loss=0.1098, cr_loss=0.3496, attn_decoder_loss=0.2343, over 29755.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1081, cr_loss=0.3465, attn_decoder_loss=0.2369, over 5783999.53 frames. ], batch size: 81, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:43:25,737 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-20 09:43:31,636 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2051, 3.8085, 3.9051, 4.2579, 4.2995, 4.3676, 3.8244, 4.3401], device='cuda:0') 2024-09-20 09:43:44,037 INFO [train.py:1230] (0/2) Epoch 48, validation: loss=0.2127, ctc_loss=0.03675, cr_loss=6.55e-15, attn_decoder_loss=0.2323, over 944034.00 frames. 2024-09-20 09:43:44,038 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-20 09:43:47,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=862700.0, ans=0.125 2024-09-20 09:43:57,792 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.21 vs. 
limit=15.0 2024-09-20 09:44:19,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=862780.0, ans=0.1 2024-09-20 09:44:25,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=862780.0, ans=0.125 2024-09-20 09:44:44,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=862820.0, ans=0.125 2024-09-20 09:45:04,018 INFO [train.py:1198] (0/2) Epoch 48, batch 3050, loss[loss=0.2194, ctc_loss=0.1035, cr_loss=0.3446, attn_decoder_loss=0.2247, over 29506.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1086, cr_loss=0.3478, attn_decoder_loss=0.2376, over 5777530.52 frames. ], batch size: 76, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:45:09,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=862900.0, ans=0.125 2024-09-20 09:45:10,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=862900.0, ans=0.125 2024-09-20 09:45:23,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=862940.0, ans=0.125 2024-09-20 09:45:27,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.61 vs. 
limit=15.0 2024-09-20 09:45:31,015 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 8.880e+01 9.329e+01 1.001e+02 1.444e+02, threshold=1.866e+02, percent-clipped=0.0 2024-09-20 09:45:37,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=862980.0, ans=0.125 2024-09-20 09:45:44,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=862980.0, ans=0.0 2024-09-20 09:45:46,350 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=862980.0, ans=0.0 2024-09-20 09:45:59,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=15.0 2024-09-20 09:46:19,412 INFO [train.py:1198] (0/2) Epoch 48, batch 3100, loss[loss=0.2401, ctc_loss=0.1158, cr_loss=0.355, attn_decoder_loss=0.246, over 29187.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1083, cr_loss=0.3468, attn_decoder_loss=0.2373, over 5777309.88 frames. 
], batch size: 100, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:46:22,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=863100.0, ans=0.125 2024-09-20 09:46:24,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=863100.0, ans=0.95 2024-09-20 09:46:31,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=863100.0, ans=0.0 2024-09-20 09:46:33,252 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=863140.0, ans=0.09899494936611666 2024-09-20 09:46:37,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=863140.0, ans=0.09899494936611666 2024-09-20 09:46:51,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=863180.0, ans=0.0 2024-09-20 09:47:06,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=863220.0, ans=0.125 2024-09-20 09:47:11,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=15.0 2024-09-20 09:47:18,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=863260.0, ans=10.0 2024-09-20 09:47:35,296 INFO [train.py:1198] (0/2) Epoch 48, batch 3150, loss[loss=0.2484, ctc_loss=0.121, cr_loss=0.377, attn_decoder_loss=0.2541, over 28789.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1079, cr_loss=0.3463, attn_decoder_loss=0.2372, over 5784338.57 frames. 
], batch size: 104, lr: 2.29e-03, grad_scale: 8.0 2024-09-20 09:47:40,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=863300.0, ans=0.125 2024-09-20 09:48:06,679 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.651e+01 9.014e+01 9.549e+01 1.887e+02, threshold=1.803e+02, percent-clipped=1.0 2024-09-20 09:48:17,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=863380.0, ans=0.125 2024-09-20 09:48:25,041 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 09:48:32,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=863420.0, ans=0.125 2024-09-20 09:48:34,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=863420.0, ans=0.0 2024-09-20 09:48:38,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=863460.0, ans=0.0 2024-09-20 09:48:54,877 INFO [train.py:1198] (0/2) Epoch 48, batch 3200, loss[loss=0.2268, ctc_loss=0.1072, cr_loss=0.343, attn_decoder_loss=0.2325, over 29398.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1074, cr_loss=0.3454, attn_decoder_loss=0.2367, over 5795077.27 frames. ], batch size: 79, lr: 2.29e-03, grad_scale: 16.0 2024-09-20 09:49:01,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=863500.0, ans=0.125 2024-09-20 09:49:08,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.85 vs. 
limit=22.5 2024-09-20 09:49:20,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.43 vs. limit=10.0 2024-09-20 09:49:30,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=863580.0, ans=0.125 2024-09-20 09:49:45,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=863620.0, ans=0.125 2024-09-20 09:49:51,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-09-20 09:49:58,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=863660.0, ans=0.125 2024-09-20 09:50:10,573 INFO [train.py:1198] (0/2) Epoch 48, batch 3250, loss[loss=0.2391, ctc_loss=0.1126, cr_loss=0.3643, attn_decoder_loss=0.245, over 29703.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1076, cr_loss=0.3465, attn_decoder_loss=0.2371, over 5801742.08 frames. 
], batch size: 84, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 09:50:10,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=863700.0, ans=0.125 2024-09-20 09:50:23,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863700.0, ans=0.1 2024-09-20 09:50:36,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=863740.0, ans=0.2 2024-09-20 09:50:37,750 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.777e+01 9.225e+01 9.680e+01 2.463e+02, threshold=1.845e+02, percent-clipped=1.0 2024-09-20 09:50:38,777 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=15.0 2024-09-20 09:50:47,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=863780.0, ans=0.0 2024-09-20 09:50:56,639 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0 2024-09-20 09:51:09,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=863860.0, ans=0.07 2024-09-20 09:51:13,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.20 vs. limit=10.0 2024-09-20 09:51:26,310 INFO [train.py:1198] (0/2) Epoch 48, batch 3300, loss[loss=0.2432, ctc_loss=0.1083, cr_loss=0.3266, attn_decoder_loss=0.2509, over 28287.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.107, cr_loss=0.3452, attn_decoder_loss=0.2361, over 5798668.32 frames. 
], batch size: 111, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 09:52:05,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=863980.0, ans=0.0 2024-09-20 09:52:07,144 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-216000.pt 2024-09-20 09:52:26,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=864020.0, ans=0.125 2024-09-20 09:52:30,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=864020.0, ans=0.2 2024-09-20 09:52:38,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=864060.0, ans=0.125 2024-09-20 09:52:48,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=864060.0, ans=0.0 2024-09-20 09:52:52,953 INFO [train.py:1198] (0/2) Epoch 48, batch 3350, loss[loss=0.2495, ctc_loss=0.1267, cr_loss=0.3735, attn_decoder_loss=0.2548, over 28866.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1077, cr_loss=0.3469, attn_decoder_loss=0.2369, over 5775578.11 frames. 
], batch size: 104, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 09:53:02,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=864100.0, ans=0.025 2024-09-20 09:53:11,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=864140.0, ans=0.125 2024-09-20 09:53:20,154 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.868e+01 9.379e+01 9.923e+01 1.602e+02, threshold=1.876e+02, percent-clipped=0.0 2024-09-20 09:53:20,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=864140.0, ans=0.09899494936611666 2024-09-20 09:53:23,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=864180.0, ans=0.1 2024-09-20 09:53:29,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=864180.0, ans=0.125 2024-09-20 09:53:29,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=864180.0, ans=0.125 2024-09-20 09:53:29,648 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 09:53:38,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=864220.0, ans=0.125 2024-09-20 09:53:44,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=864220.0, ans=0.125 2024-09-20 09:53:58,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=864260.0, ans=0.0 2024-09-20 09:54:04,236 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=864260.0, ans=0.0 2024-09-20 09:54:05,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=864260.0, ans=0.025 2024-09-20 09:54:08,444 INFO [train.py:1198] (0/2) Epoch 48, batch 3400, loss[loss=0.2076, ctc_loss=0.09971, cr_loss=0.3432, attn_decoder_loss=0.2119, over 29350.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.108, cr_loss=0.3475, attn_decoder_loss=0.2369, over 5767638.10 frames. ], batch size: 67, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 09:54:43,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=864380.0, ans=0.0 2024-09-20 09:54:46,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=864380.0, ans=0.0 2024-09-20 09:54:49,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=864380.0, ans=0.0 2024-09-20 09:54:52,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=864420.0, ans=0.0 2024-09-20 09:54:55,652 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 09:54:58,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-09-20 09:55:19,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=864460.0, ans=0.125 2024-09-20 09:55:23,994 INFO [train.py:1198] (0/2) Epoch 48, batch 3450, loss[loss=0.2375, ctc_loss=0.1086, cr_loss=0.3485, attn_decoder_loss=0.244, over 28289.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.108, cr_loss=0.3477, attn_decoder_loss=0.2372, over 5775760.72 frames. 
], batch size: 111, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 09:55:30,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=864500.0, ans=0.125 2024-09-20 09:55:55,191 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.532e+01 9.118e+01 9.502e+01 1.543e+02, threshold=1.824e+02, percent-clipped=0.0 2024-09-20 09:56:32,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=864660.0, ans=0.125 2024-09-20 09:56:35,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=864660.0, ans=0.125 2024-09-20 09:56:43,109 INFO [train.py:1198] (0/2) Epoch 48, batch 3500, loss[loss=0.2164, ctc_loss=0.0936, cr_loss=0.3251, attn_decoder_loss=0.2228, over 29327.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1078, cr_loss=0.3472, attn_decoder_loss=0.2368, over 5777417.73 frames. ], batch size: 71, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 09:57:16,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-09-20 09:57:52,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=864860.0, ans=0.125 2024-09-20 09:57:58,097 INFO [train.py:1198] (0/2) Epoch 48, batch 3550, loss[loss=0.24, ctc_loss=0.1053, cr_loss=0.3438, attn_decoder_loss=0.2473, over 29723.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1075, cr_loss=0.3463, attn_decoder_loss=0.2366, over 5782701.66 frames. 
], batch size: 89, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 09:58:24,720 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.565e+01 9.018e+01 9.505e+01 1.694e+02, threshold=1.804e+02, percent-clipped=0.0 2024-09-20 09:58:41,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=865020.0, ans=0.0 2024-09-20 09:58:54,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=865020.0, ans=0.125 2024-09-20 09:58:59,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=865060.0, ans=0.125 2024-09-20 09:59:01,439 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.38 vs. limit=15.0 2024-09-20 09:59:11,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=865100.0, ans=0.125 2024-09-20 09:59:12,459 INFO [train.py:1198] (0/2) Epoch 48, batch 3600, loss[loss=0.2295, ctc_loss=0.1118, cr_loss=0.3546, attn_decoder_loss=0.2347, over 29504.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1074, cr_loss=0.3462, attn_decoder_loss=0.2366, over 5790946.56 frames. ], batch size: 77, lr: 2.28e-03, grad_scale: 32.0 2024-09-20 09:59:25,269 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-20 09:59:56,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. 
limit=6.0 2024-09-20 10:00:10,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=865260.0, ans=0.125 2024-09-20 10:00:26,296 INFO [train.py:1198] (0/2) Epoch 48, batch 3650, loss[loss=0.2479, ctc_loss=0.1182, cr_loss=0.3676, attn_decoder_loss=0.2542, over 29492.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1068, cr_loss=0.3451, attn_decoder_loss=0.236, over 5792926.02 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:00:54,244 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.368e+01 8.575e+01 9.071e+01 9.730e+01 1.168e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-20 10:01:22,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=865420.0, ans=0.125 2024-09-20 10:01:25,385 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.47 vs. limit=8.0 2024-09-20 10:01:44,201 INFO [train.py:1198] (0/2) Epoch 48, batch 3700, loss[loss=0.2411, ctc_loss=0.1073, cr_loss=0.3438, attn_decoder_loss=0.2483, over 29698.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1071, cr_loss=0.346, attn_decoder_loss=0.2364, over 5803291.84 frames. 
], batch size: 84, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:01:57,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=865540.0, ans=0.2 2024-09-20 10:02:20,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=865580.0, ans=0.0 2024-09-20 10:02:21,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=865580.0, ans=0.125 2024-09-20 10:02:42,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=865660.0, ans=0.025 2024-09-20 10:02:45,981 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.91 vs. limit=12.0 2024-09-20 10:02:54,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=865660.0, ans=0.125 2024-09-20 10:02:58,586 INFO [train.py:1198] (0/2) Epoch 48, batch 3750, loss[loss=0.2074, ctc_loss=0.09622, cr_loss=0.3412, attn_decoder_loss=0.2122, over 29344.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1074, cr_loss=0.3463, attn_decoder_loss=0.2364, over 5807035.26 frames. ], batch size: 67, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:03:01,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=865700.0, ans=0.125 2024-09-20 10:03:09,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=865700.0, ans=0.125 2024-09-20 10:03:14,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. 
limit=6.0 2024-09-20 10:03:19,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=865740.0, ans=0.125 2024-09-20 10:03:28,344 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.668e+01 9.150e+01 9.729e+01 2.139e+02, threshold=1.830e+02, percent-clipped=2.0 2024-09-20 10:03:31,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=865780.0, ans=0.125 2024-09-20 10:03:39,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=865780.0, ans=0.125 2024-09-20 10:04:00,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0 2024-09-20 10:04:12,899 INFO [train.py:1198] (0/2) Epoch 48, batch 3800, loss[loss=0.238, ctc_loss=0.1159, cr_loss=0.3646, attn_decoder_loss=0.2435, over 29631.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1074, cr_loss=0.3462, attn_decoder_loss=0.2364, over 5797711.73 frames. ], batch size: 86, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:04:26,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=865940.0, ans=0.0 2024-09-20 10:04:30,118 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.72 vs. 
limit=15.0 2024-09-20 10:04:47,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=865980.0, ans=0.0 2024-09-20 10:04:51,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=865980.0, ans=0.2 2024-09-20 10:05:23,385 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.86 vs. limit=10.0 2024-09-20 10:05:27,016 INFO [train.py:1198] (0/2) Epoch 48, batch 3850, loss[loss=0.2351, ctc_loss=0.1075, cr_loss=0.3566, attn_decoder_loss=0.2414, over 29208.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1073, cr_loss=0.3462, attn_decoder_loss=0.2362, over 5811952.56 frames. ], batch size: 100, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:05:38,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=866100.0, ans=0.1 2024-09-20 10:05:41,974 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:05:46,781 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.11 vs. 
limit=15.0 2024-09-20 10:05:56,512 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.668e+01 9.090e+01 9.614e+01 1.900e+02, threshold=1.818e+02, percent-clipped=1.0 2024-09-20 10:06:01,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=866180.0, ans=0.0 2024-09-20 10:06:07,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=866180.0, ans=0.0 2024-09-20 10:06:16,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=866220.0, ans=0.125 2024-09-20 10:06:17,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=866220.0, ans=0.125 2024-09-20 10:06:23,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2024-09-20 10:06:40,951 INFO [train.py:1198] (0/2) Epoch 48, batch 3900, loss[loss=0.2385, ctc_loss=0.1102, cr_loss=0.3461, attn_decoder_loss=0.2451, over 29628.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1077, cr_loss=0.3467, attn_decoder_loss=0.2365, over 5816274.87 frames. ], batch size: 86, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:06:41,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=866300.0, ans=0.125 2024-09-20 10:06:45,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=866300.0, ans=0.0 2024-09-20 10:07:22,118 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. 
limit=6.0 2024-09-20 10:07:26,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=866380.0, ans=0.2 2024-09-20 10:07:49,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=866460.0, ans=0.1 2024-09-20 10:07:50,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=866460.0, ans=0.0 2024-09-20 10:07:58,099 INFO [train.py:1198] (0/2) Epoch 48, batch 3950, loss[loss=0.2507, ctc_loss=0.123, cr_loss=0.3739, attn_decoder_loss=0.2566, over 29487.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1073, cr_loss=0.3459, attn_decoder_loss=0.2365, over 5835555.21 frames. ], batch size: 97, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:08:04,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=866500.0, ans=0.125 2024-09-20 10:08:17,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=866540.0, ans=0.125 2024-09-20 10:08:17,506 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:08:27,414 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.623e+01 9.056e+01 9.623e+01 1.586e+02, threshold=1.811e+02, percent-clipped=0.0 2024-09-20 10:08:37,221 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=15.0 2024-09-20 10:08:41,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.53 vs. 
limit=15.0 2024-09-20 10:08:49,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=866620.0, ans=0.0 2024-09-20 10:08:59,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=866660.0, ans=0.125 2024-09-20 10:09:11,165 INFO [train.py:1198] (0/2) Epoch 48, batch 4000, loss[loss=0.2073, ctc_loss=0.08399, cr_loss=0.2907, attn_decoder_loss=0.2146, over 29509.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1075, cr_loss=0.3461, attn_decoder_loss=0.2366, over 5814001.30 frames. ], batch size: 74, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:09:23,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=866700.0, ans=0.1 2024-09-20 10:09:27,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=866740.0, ans=0.125 2024-09-20 10:09:36,342 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=866740.0, ans=0.0 2024-09-20 10:09:45,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=866780.0, ans=0.125 2024-09-20 10:09:50,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=866780.0, ans=0.1 2024-09-20 10:09:53,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.91 vs. limit=6.0 2024-09-20 10:09:56,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=866820.0, ans=0.0 2024-09-20 10:10:02,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.72 vs. 
limit=15.0 2024-09-20 10:10:20,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=866860.0, ans=0.125 2024-09-20 10:10:24,271 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2024-09-20 10:10:24,655 INFO [train.py:1198] (0/2) Epoch 48, batch 4050, loss[loss=0.2433, ctc_loss=0.1206, cr_loss=0.3394, attn_decoder_loss=0.2494, over 19806.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1075, cr_loss=0.346, attn_decoder_loss=0.2363, over 5796888.82 frames. ], batch size: 209, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:10:44,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.89 vs. limit=15.0 2024-09-20 10:10:46,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=866940.0, ans=0.1 2024-09-20 10:10:49,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=866940.0, ans=0.125 2024-09-20 10:10:51,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=866940.0, ans=0.0 2024-09-20 10:10:53,680 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.713e+01 8.805e+01 9.236e+01 9.679e+01 1.942e+02, threshold=1.847e+02, percent-clipped=1.0 2024-09-20 10:11:10,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=867020.0, ans=0.125 2024-09-20 10:11:11,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.64 vs. 
limit=10.0 2024-09-20 10:11:27,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=867060.0, ans=0.025 2024-09-20 10:11:39,148 INFO [train.py:1198] (0/2) Epoch 48, batch 4100, loss[loss=0.2511, ctc_loss=0.1293, cr_loss=0.3928, attn_decoder_loss=0.2559, over 29519.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1079, cr_loss=0.3469, attn_decoder_loss=0.2366, over 5792418.00 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:12:02,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=867140.0, ans=0.1 2024-09-20 10:12:05,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=867140.0, ans=0.04949747468305833 2024-09-20 10:12:10,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=867180.0, ans=0.125 2024-09-20 10:12:26,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=867220.0, ans=0.125 2024-09-20 10:12:50,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=867260.0, ans=0.125 2024-09-20 10:12:54,717 INFO [train.py:1198] (0/2) Epoch 48, batch 4150, loss[loss=0.2215, ctc_loss=0.1054, cr_loss=0.352, attn_decoder_loss=0.2266, over 29500.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.108, cr_loss=0.3473, attn_decoder_loss=0.2362, over 5797162.02 frames. 
], batch size: 77, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:13:15,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=867340.0, ans=0.125 2024-09-20 10:13:23,981 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.808e+01 9.166e+01 9.915e+01 1.612e+02, threshold=1.833e+02, percent-clipped=0.0 2024-09-20 10:13:37,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=867420.0, ans=0.0 2024-09-20 10:13:49,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867420.0, ans=0.1 2024-09-20 10:13:54,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=867460.0, ans=0.0 2024-09-20 10:14:04,336 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.72 vs. limit=22.5 2024-09-20 10:14:08,054 INFO [train.py:1198] (0/2) Epoch 48, batch 4200, loss[loss=0.2338, ctc_loss=0.1161, cr_loss=0.3581, attn_decoder_loss=0.239, over 29511.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1082, cr_loss=0.348, attn_decoder_loss=0.2367, over 5799486.42 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:14:31,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.13 vs. limit=10.0 2024-09-20 10:15:17,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.50 vs. limit=22.5 2024-09-20 10:15:20,379 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.80 vs. 
limit=12.0 2024-09-20 10:15:21,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=867700.0, ans=0.0 2024-09-20 10:15:22,555 INFO [train.py:1198] (0/2) Epoch 48, batch 4250, loss[loss=0.2219, ctc_loss=0.09539, cr_loss=0.3149, attn_decoder_loss=0.229, over 29527.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.108, cr_loss=0.3472, attn_decoder_loss=0.2368, over 5804572.06 frames. ], batch size: 74, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:15:22,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=867700.0, ans=0.125 2024-09-20 10:15:24,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=867700.0, ans=0.1 2024-09-20 10:15:31,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=867700.0, ans=0.125 2024-09-20 10:15:54,014 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.733e+01 9.174e+01 9.868e+01 2.354e+02, threshold=1.835e+02, percent-clipped=1.0 2024-09-20 10:15:56,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.88 vs. 
limit=10.0 2024-09-20 10:15:57,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=867780.0, ans=0.125 2024-09-20 10:16:03,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=867780.0, ans=0.125 2024-09-20 10:16:03,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=867780.0, ans=0.125 2024-09-20 10:16:05,448 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.51 vs. limit=15.0 2024-09-20 10:16:13,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=867820.0, ans=0.5 2024-09-20 10:16:16,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=867820.0, ans=0.0 2024-09-20 10:16:22,825 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=6.14 vs. limit=12.0 2024-09-20 10:16:36,677 INFO [train.py:1198] (0/2) Epoch 48, batch 4300, loss[loss=0.241, ctc_loss=0.1145, cr_loss=0.3516, attn_decoder_loss=0.2473, over 29546.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1079, cr_loss=0.3464, attn_decoder_loss=0.2372, over 5793588.06 frames. 
], batch size: 87, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:16:38,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=867900.0, ans=0.125 2024-09-20 10:17:04,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=867980.0, ans=0.125 2024-09-20 10:17:24,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=868020.0, ans=0.125 2024-09-20 10:17:27,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0 2024-09-20 10:17:31,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=868020.0, ans=0.025 2024-09-20 10:17:45,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-09-20 10:17:50,562 INFO [train.py:1198] (0/2) Epoch 48, batch 4350, loss[loss=0.2489, ctc_loss=0.1276, cr_loss=0.379, attn_decoder_loss=0.2539, over 29529.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1104, cr_loss=0.3519, attn_decoder_loss=0.2403, over 5796940.15 frames. ], batch size: 97, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:18:00,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=868100.0, ans=0.125 2024-09-20 10:18:04,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.48 vs. 
limit=15.0 2024-09-20 10:18:14,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=868140.0, ans=0.95 2024-09-20 10:18:16,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=868140.0, ans=0.125 2024-09-20 10:18:21,800 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.099e+01 9.551e+01 1.026e+02 1.775e+02, threshold=1.910e+02, percent-clipped=0.0 2024-09-20 10:18:29,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=868180.0, ans=0.1 2024-09-20 10:18:29,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=868180.0, ans=0.125 2024-09-20 10:18:35,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=868220.0, ans=0.2 2024-09-20 10:18:42,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=22.5 2024-09-20 10:18:42,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.78 vs. limit=22.5 2024-09-20 10:18:59,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=868260.0, ans=0.125 2024-09-20 10:19:01,155 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.09 vs. limit=10.0 2024-09-20 10:19:04,477 INFO [train.py:1198] (0/2) Epoch 48, batch 4400, loss[loss=0.2428, ctc_loss=0.1199, cr_loss=0.3713, attn_decoder_loss=0.2482, over 27268.00 frames. 
], tot_loss[loss=0.236, ctc_loss=0.1114, cr_loss=0.354, attn_decoder_loss=0.242, over 5767692.62 frames. ], batch size: 124, lr: 2.28e-03, grad_scale: 16.0 2024-09-20 10:19:44,380 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:19:45,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=868380.0, ans=0.09899494936611666 2024-09-20 10:19:48,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=868420.0, ans=0.0 2024-09-20 10:19:53,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=868420.0, ans=0.1 2024-09-20 10:19:53,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=868420.0, ans=0.125 2024-09-20 10:20:18,889 INFO [train.py:1198] (0/2) Epoch 48, batch 4450, loss[loss=0.255, ctc_loss=0.1378, cr_loss=0.3876, attn_decoder_loss=0.2594, over 20291.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1148, cr_loss=0.3596, attn_decoder_loss=0.2442, over 5574642.69 frames. ], batch size: 210, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:20:19,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=868500.0, ans=0.125 2024-09-20 10:20:30,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.64 vs. 
limit=10.0 2024-09-20 10:20:49,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=868580.0, ans=0.0 2024-09-20 10:20:52,107 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.165e+01 9.214e+01 1.004e+02 1.130e+02 1.604e+02, threshold=2.007e+02, percent-clipped=0.0 2024-09-20 10:20:55,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=868580.0, ans=0.1 2024-09-20 10:21:03,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.71 vs. limit=15.0 2024-09-20 10:21:22,308 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2024-09-20 10:21:33,695 INFO [train.py:1198] (0/2) Epoch 48, batch 4500, loss[loss=0.2521, ctc_loss=0.1374, cr_loss=0.3722, attn_decoder_loss=0.2566, over 20367.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1174, cr_loss=0.3617, attn_decoder_loss=0.2457, over 5236472.66 frames. ], batch size: 210, lr: 2.28e-03, grad_scale: 8.0 2024-09-20 10:21:39,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=868700.0, ans=0.125 2024-09-20 10:21:44,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=868700.0, ans=0.95 2024-09-20 10:21:51,153 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.75 vs. 
limit=12.0 2024-09-20 10:22:11,204 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-48.pt 2024-09-20 10:23:01,527 INFO [train.py:1198] (0/2) Epoch 49, batch 0, loss[loss=0.207, ctc_loss=0.08852, cr_loss=0.3078, attn_decoder_loss=0.2133, over 29604.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.08852, cr_loss=0.3078, attn_decoder_loss=0.2133, over 29604.00 frames. ], batch size: 73, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:23:01,527 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-20 10:23:19,981 INFO [train.py:1230] (0/2) Epoch 49, validation: loss=0.2124, ctc_loss=0.03569, cr_loss=6.554e-15, attn_decoder_loss=0.2321, over 944034.00 frames. 2024-09-20 10:23:19,981 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-20 10:23:26,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=868800.0, ans=0.025 2024-09-20 10:23:27,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=868800.0, ans=0.125 2024-09-20 10:24:24,926 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:24:32,177 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 9.535e+01 1.078e+02 1.164e+02 4.744e+02, threshold=2.156e+02, percent-clipped=1.0 2024-09-20 10:24:36,547 INFO [train.py:1198] (0/2) Epoch 49, batch 50, loss[loss=0.2056, ctc_loss=0.09332, cr_loss=0.3106, attn_decoder_loss=0.2112, over 29457.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1086, cr_loss=0.3466, attn_decoder_loss=0.2375, over 1269174.46 frames. 
], batch size: 70, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:24:57,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=869040.0, ans=0.0 2024-09-20 10:24:57,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=869040.0, ans=0.125 2024-09-20 10:25:02,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=869040.0, ans=0.0 2024-09-20 10:25:10,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=869080.0, ans=0.125 2024-09-20 10:25:31,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=869120.0, ans=0.125 2024-09-20 10:25:53,938 INFO [train.py:1198] (0/2) Epoch 49, batch 100, loss[loss=0.2257, ctc_loss=0.1118, cr_loss=0.3366, attn_decoder_loss=0.2309, over 29545.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1107, cr_loss=0.352, attn_decoder_loss=0.2392, over 2252368.66 frames. ], batch size: 76, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:25:57,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=869200.0, ans=0.125 2024-09-20 10:26:08,121 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.56 vs. 
limit=15.0 2024-09-20 10:26:34,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=869280.0, ans=0.125 2024-09-20 10:26:56,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=869360.0, ans=0.07 2024-09-20 10:26:58,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=869360.0, ans=0.125 2024-09-20 10:27:05,481 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.667e+01 9.247e+01 9.821e+01 1.649e+02, threshold=1.849e+02, percent-clipped=0.0 2024-09-20 10:27:08,519 INFO [train.py:1198] (0/2) Epoch 49, batch 150, loss[loss=0.2089, ctc_loss=0.09265, cr_loss=0.3253, attn_decoder_loss=0.2146, over 29440.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1082, cr_loss=0.3472, attn_decoder_loss=0.2371, over 3047414.00 frames. ], batch size: 70, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:27:17,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=869400.0, ans=0.0 2024-09-20 10:27:33,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=869440.0, ans=0.125 2024-09-20 10:27:48,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=869480.0, ans=0.2 2024-09-20 10:27:53,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=869480.0, ans=0.0 2024-09-20 10:28:00,613 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:28:00,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=869520.0, ans=0.0 2024-09-20 10:28:06,612 
INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=869520.0, ans=0.125 2024-09-20 10:28:16,471 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.79 vs. limit=15.0 2024-09-20 10:28:26,279 INFO [train.py:1198] (0/2) Epoch 49, batch 200, loss[loss=0.2413, ctc_loss=0.1131, cr_loss=0.3641, attn_decoder_loss=0.2474, over 27089.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1075, cr_loss=0.3466, attn_decoder_loss=0.2362, over 3658689.03 frames. ], batch size: 124, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:28:58,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=869680.0, ans=0.1 2024-09-20 10:29:13,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=869720.0, ans=0.125 2024-09-20 10:29:14,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=869720.0, ans=0.125 2024-09-20 10:29:18,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5 2024-09-20 10:29:38,438 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.622e+01 9.248e+01 9.651e+01 1.394e+02, threshold=1.850e+02, percent-clipped=0.0 2024-09-20 10:29:43,771 INFO [train.py:1198] (0/2) Epoch 49, batch 250, loss[loss=0.2535, ctc_loss=0.1221, cr_loss=0.3761, attn_decoder_loss=0.2598, over 29222.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1076, cr_loss=0.3469, attn_decoder_loss=0.2365, over 4141638.23 frames. 
], batch size: 100, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:29:46,103 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.65 vs. limit=22.5 2024-09-20 10:29:48,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.81 vs. limit=5.0 2024-09-20 10:30:15,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=869880.0, ans=0.1 2024-09-20 10:30:59,203 INFO [train.py:1198] (0/2) Epoch 49, batch 300, loss[loss=0.2428, ctc_loss=0.115, cr_loss=0.3555, attn_decoder_loss=0.2491, over 29525.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1072, cr_loss=0.3459, attn_decoder_loss=0.2362, over 4508627.99 frames. ], batch size: 92, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:31:07,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=870000.0, ans=0.0 2024-09-20 10:31:21,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870040.0, ans=0.1 2024-09-20 10:31:52,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=870120.0, ans=0.125 2024-09-20 10:31:54,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=870120.0, ans=0.0 2024-09-20 10:32:13,610 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.601e+01 9.011e+01 9.321e+01 1.888e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-20 10:32:16,512 INFO [train.py:1198] (0/2) Epoch 49, batch 350, loss[loss=0.2031, ctc_loss=0.07674, cr_loss=0.2854, attn_decoder_loss=0.2108, over 29315.00 frames. 
], tot_loss[loss=0.2307, ctc_loss=0.1074, cr_loss=0.3465, attn_decoder_loss=0.2367, over 4794983.95 frames. ], batch size: 71, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:32:16,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=870200.0, ans=0.2 2024-09-20 10:32:31,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2024-09-20 10:32:44,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=870280.0, ans=0.125 2024-09-20 10:32:55,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=870280.0, ans=0.125 2024-09-20 10:33:22,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=870360.0, ans=0.2 2024-09-20 10:33:25,902 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.48 vs. limit=10.0 2024-09-20 10:33:31,555 INFO [train.py:1198] (0/2) Epoch 49, batch 400, loss[loss=0.2409, ctc_loss=0.1129, cr_loss=0.3412, attn_decoder_loss=0.2476, over 29727.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.107, cr_loss=0.3458, attn_decoder_loss=0.2363, over 5024658.34 frames. ], batch size: 82, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:33:49,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=870440.0, ans=0.125 2024-09-20 10:33:52,952 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.41 vs. 
limit=22.5 2024-09-20 10:34:31,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=870520.0, ans=0.125 2024-09-20 10:34:46,411 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.552e+01 9.241e+01 9.788e+01 2.728e+02, threshold=1.848e+02, percent-clipped=1.0 2024-09-20 10:34:49,363 INFO [train.py:1198] (0/2) Epoch 49, batch 450, loss[loss=0.2314, ctc_loss=0.1084, cr_loss=0.3601, attn_decoder_loss=0.237, over 29707.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1073, cr_loss=0.3466, attn_decoder_loss=0.2364, over 5188025.74 frames. ], batch size: 83, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:34:55,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=870600.0, ans=0.2 2024-09-20 10:35:09,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=870640.0, ans=0.2 2024-09-20 10:35:13,824 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:35:56,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=870760.0, ans=0.0 2024-09-20 10:36:02,344 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2024-09-20 10:36:07,290 INFO [train.py:1198] (0/2) Epoch 49, batch 500, loss[loss=0.2468, ctc_loss=0.1193, cr_loss=0.3703, attn_decoder_loss=0.2527, over 29460.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1068, cr_loss=0.3456, attn_decoder_loss=0.2357, over 5330455.38 frames. 
], batch size: 94, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:36:10,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=870800.0, ans=0.1 2024-09-20 10:36:14,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2024-09-20 10:36:16,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=870800.0, ans=0.0 2024-09-20 10:36:19,796 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:36:22,795 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:36:25,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=870840.0, ans=0.5 2024-09-20 10:37:18,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=870960.0, ans=0.015 2024-09-20 10:37:19,724 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.745e+01 9.074e+01 9.621e+01 1.472e+02, threshold=1.815e+02, percent-clipped=0.0 2024-09-20 10:37:22,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0 2024-09-20 10:37:25,058 INFO [train.py:1198] (0/2) Epoch 49, batch 550, loss[loss=0.2343, ctc_loss=0.09976, cr_loss=0.3043, attn_decoder_loss=0.2425, over 28842.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1068, cr_loss=0.3453, attn_decoder_loss=0.2356, over 5423933.06 frames. 
], batch size: 104, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:37:30,722 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.18 vs. limit=12.0 2024-09-20 10:37:55,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=871080.0, ans=0.125 2024-09-20 10:38:04,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=871080.0, ans=0.125 2024-09-20 10:38:10,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=871120.0, ans=0.0 2024-09-20 10:38:40,830 INFO [train.py:1198] (0/2) Epoch 49, batch 600, loss[loss=0.2396, ctc_loss=0.1058, cr_loss=0.3211, attn_decoder_loss=0.2473, over 29210.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1073, cr_loss=0.3459, attn_decoder_loss=0.236, over 5511122.91 frames. ], batch size: 100, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:39:54,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=871360.0, ans=0.125 2024-09-20 10:39:55,326 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.840e+01 8.578e+01 9.036e+01 9.635e+01 5.589e+02, threshold=1.807e+02, percent-clipped=2.0 2024-09-20 10:39:58,317 INFO [train.py:1198] (0/2) Epoch 49, batch 650, loss[loss=0.2277, ctc_loss=0.1024, cr_loss=0.3301, attn_decoder_loss=0.2343, over 29748.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1063, cr_loss=0.3435, attn_decoder_loss=0.2353, over 5588481.38 frames. 
], batch size: 81, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:40:07,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=871400.0, ans=0.025 2024-09-20 10:40:32,485 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2024-09-20 10:40:47,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=871520.0, ans=0.0 2024-09-20 10:41:12,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=871600.0, ans=0.125 2024-09-20 10:41:13,721 INFO [train.py:1198] (0/2) Epoch 49, batch 700, loss[loss=0.2208, ctc_loss=0.09761, cr_loss=0.3295, attn_decoder_loss=0.2271, over 29511.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1069, cr_loss=0.3448, attn_decoder_loss=0.2358, over 5639554.74 frames. 
], batch size: 76, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:41:20,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=871600.0, ans=0.2 2024-09-20 10:41:26,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=871600.0, ans=0.125 2024-09-20 10:41:26,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=871600.0, ans=0.125 2024-09-20 10:41:29,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=871640.0, ans=0.125 2024-09-20 10:41:32,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=871640.0, ans=0.1 2024-09-20 10:41:40,240 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=871640.0, ans=0.0 2024-09-20 10:42:04,352 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=871720.0, ans=0.125 2024-09-20 10:42:05,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=871720.0, ans=0.0 2024-09-20 10:42:26,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=871760.0, ans=0.125 2024-09-20 10:42:29,495 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.733e+01 8.695e+01 9.527e+01 1.020e+02 1.538e+02, threshold=1.905e+02, percent-clipped=0.0 2024-09-20 10:42:31,056 INFO [train.py:1198] (0/2) Epoch 49, batch 750, loss[loss=0.2386, ctc_loss=0.1102, cr_loss=0.3574, attn_decoder_loss=0.245, over 29712.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1069, cr_loss=0.345, attn_decoder_loss=0.2358, over 5677316.70 frames. 
], batch size: 82, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:43:06,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=12.0 2024-09-20 10:43:34,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=871960.0, ans=0.125 2024-09-20 10:43:37,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=871960.0, ans=0.125 2024-09-20 10:43:40,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=871960.0, ans=0.125 2024-09-20 10:43:48,839 INFO [train.py:1198] (0/2) Epoch 49, batch 800, loss[loss=0.2128, ctc_loss=0.09813, cr_loss=0.3239, attn_decoder_loss=0.2184, over 29584.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1066, cr_loss=0.3443, attn_decoder_loss=0.2357, over 5707598.61 frames. ], batch size: 73, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:44:02,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=872040.0, ans=0.025 2024-09-20 10:44:05,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=872040.0, ans=0.125 2024-09-20 10:44:05,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=872040.0, ans=0.07 2024-09-20 10:44:11,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2024-09-20 10:44:15,378 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.27 vs. 
limit=12.0 2024-09-20 10:44:17,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=872080.0, ans=0.1 2024-09-20 10:44:23,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=872080.0, ans=0.05 2024-09-20 10:44:35,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=872120.0, ans=0.0 2024-09-20 10:44:37,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=872120.0, ans=0.0 2024-09-20 10:45:03,403 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.964e+01 8.646e+01 9.267e+01 9.884e+01 3.056e+02, threshold=1.853e+02, percent-clipped=1.0 2024-09-20 10:45:03,424 INFO [train.py:1198] (0/2) Epoch 49, batch 850, loss[loss=0.2347, ctc_loss=0.1099, cr_loss=0.3597, attn_decoder_loss=0.2406, over 29706.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1063, cr_loss=0.3435, attn_decoder_loss=0.2353, over 5734397.38 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:45:10,112 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.83 vs. limit=10.0 2024-09-20 10:45:17,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=872200.0, ans=0.125 2024-09-20 10:45:18,456 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.06 vs. 
limit=10.0 2024-09-20 10:45:19,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=872240.0, ans=0.125 2024-09-20 10:45:22,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=872240.0, ans=0.125 2024-09-20 10:45:32,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=872240.0, ans=0.125 2024-09-20 10:45:34,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=872280.0, ans=0.0 2024-09-20 10:45:54,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=872320.0, ans=0.0 2024-09-20 10:46:09,638 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2024-09-20 10:46:12,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=872360.0, ans=0.0 2024-09-20 10:46:21,062 INFO [train.py:1198] (0/2) Epoch 49, batch 900, loss[loss=0.204, ctc_loss=0.0902, cr_loss=0.3127, attn_decoder_loss=0.2097, over 29628.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1064, cr_loss=0.3435, attn_decoder_loss=0.2355, over 5738161.68 frames. 
], batch size: 73, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:46:37,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=872440.0, ans=0.125 2024-09-20 10:46:37,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=872440.0, ans=0.2 2024-09-20 10:46:40,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=872440.0, ans=0.125 2024-09-20 10:46:45,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.21 vs. limit=15.0 2024-09-20 10:47:08,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=872520.0, ans=0.2 2024-09-20 10:47:29,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=872560.0, ans=0.2 2024-09-20 10:47:33,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=872560.0, ans=0.125 2024-09-20 10:47:38,400 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.666e+01 9.208e+01 1.007e+02 2.481e+02, threshold=1.842e+02, percent-clipped=1.0 2024-09-20 10:47:38,427 INFO [train.py:1198] (0/2) Epoch 49, batch 950, loss[loss=0.2154, ctc_loss=0.09565, cr_loss=0.3102, attn_decoder_loss=0.2218, over 29522.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1065, cr_loss=0.3436, attn_decoder_loss=0.2358, over 5738756.57 frames. 
], batch size: 74, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:47:41,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=872600.0, ans=0.125 2024-09-20 10:48:00,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.69 vs. limit=15.0 2024-09-20 10:48:20,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=872680.0, ans=0.0 2024-09-20 10:48:43,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=872760.0, ans=0.125 2024-09-20 10:48:44,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=872760.0, ans=0.125 2024-09-20 10:48:46,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.74 vs. limit=15.0 2024-09-20 10:48:53,489 INFO [train.py:1198] (0/2) Epoch 49, batch 1000, loss[loss=0.2139, ctc_loss=0.09212, cr_loss=0.2995, attn_decoder_loss=0.2208, over 29511.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1074, cr_loss=0.3455, attn_decoder_loss=0.2367, over 5733366.93 frames. ], batch size: 77, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:48:53,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=872800.0, ans=0.0 2024-09-20 10:49:35,657 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.16 vs. 
limit=12.0 2024-09-20 10:49:54,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=872960.0, ans=0.125 2024-09-20 10:50:02,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=872960.0, ans=0.0 2024-09-20 10:50:10,721 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.566e+01 9.140e+01 9.673e+01 2.370e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-20 10:50:10,746 INFO [train.py:1198] (0/2) Epoch 49, batch 1050, loss[loss=0.232, ctc_loss=0.1007, cr_loss=0.3298, attn_decoder_loss=0.2393, over 29695.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1074, cr_loss=0.3451, attn_decoder_loss=0.2364, over 5741299.63 frames. ], batch size: 85, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:50:43,539 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.97 vs. 
limit=15.0 2024-09-20 10:50:49,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=873080.0, ans=0.1 2024-09-20 10:50:56,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=873120.0, ans=0.2 2024-09-20 10:51:01,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=873120.0, ans=0.025 2024-09-20 10:51:03,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=873120.0, ans=0.0 2024-09-20 10:51:14,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873160.0, ans=0.1 2024-09-20 10:51:14,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=873160.0, ans=0.125 2024-09-20 10:51:26,491 INFO [train.py:1198] (0/2) Epoch 49, batch 1100, loss[loss=0.2237, ctc_loss=0.1059, cr_loss=0.3473, attn_decoder_loss=0.2291, over 29458.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1071, cr_loss=0.3448, attn_decoder_loss=0.236, over 5754248.20 frames. 
], batch size: 78, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:51:47,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=873240.0, ans=0.025 2024-09-20 10:52:06,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=873280.0, ans=0.09899494936611666 2024-09-20 10:52:20,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=873320.0, ans=0.125 2024-09-20 10:52:22,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=873320.0, ans=0.125 2024-09-20 10:52:28,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=873360.0, ans=0.125 2024-09-20 10:52:33,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2024-09-20 10:52:34,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=873360.0, ans=0.0 2024-09-20 10:52:40,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=873360.0, ans=0.0 2024-09-20 10:52:44,485 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 8.549e+01 9.114e+01 9.620e+01 1.410e+02, threshold=1.823e+02, percent-clipped=0.0 2024-09-20 10:52:44,507 INFO [train.py:1198] (0/2) Epoch 49, batch 1150, loss[loss=0.2314, ctc_loss=0.1065, cr_loss=0.3556, attn_decoder_loss=0.2374, over 29418.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1073, cr_loss=0.3448, attn_decoder_loss=0.236, over 5753016.70 frames. 
], batch size: 78, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 10:52:44,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=873400.0, ans=0.0 2024-09-20 10:53:01,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.92 vs. limit=15.0 2024-09-20 10:54:02,454 INFO [train.py:1198] (0/2) Epoch 49, batch 1200, loss[loss=0.2303, ctc_loss=0.107, cr_loss=0.3471, attn_decoder_loss=0.2363, over 29686.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1078, cr_loss=0.346, attn_decoder_loss=0.2369, over 5746268.57 frames. ], batch size: 85, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:54:23,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=873640.0, ans=0.125 2024-09-20 10:54:27,519 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.97 vs. limit=15.0 2024-09-20 10:54:30,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.28 vs. limit=10.0 2024-09-20 10:54:42,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=873680.0, ans=0.125 2024-09-20 10:55:07,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=873760.0, ans=0.125 2024-09-20 10:55:18,095 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 8.765e+01 9.274e+01 9.697e+01 1.334e+02, threshold=1.855e+02, percent-clipped=0.0 2024-09-20 10:55:18,117 INFO [train.py:1198] (0/2) Epoch 49, batch 1250, loss[loss=0.2442, ctc_loss=0.116, cr_loss=0.382, attn_decoder_loss=0.25, over 29519.00 frames. 
], tot_loss[loss=0.2312, ctc_loss=0.1079, cr_loss=0.3466, attn_decoder_loss=0.2372, over 5775531.99 frames. ], batch size: 92, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:55:21,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=873800.0, ans=0.2 2024-09-20 10:55:22,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=4.78 vs. limit=15.0 2024-09-20 10:55:58,407 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:56:01,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=873880.0, ans=0.125 2024-09-20 10:56:15,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=873920.0, ans=0.1 2024-09-20 10:56:17,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.40 vs. limit=22.5 2024-09-20 10:56:35,758 INFO [train.py:1198] (0/2) Epoch 49, batch 1300, loss[loss=0.227, ctc_loss=0.1033, cr_loss=0.34, attn_decoder_loss=0.2332, over 28399.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1077, cr_loss=0.3466, attn_decoder_loss=0.2367, over 5779635.32 frames. 
], batch size: 111, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:56:47,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=874000.0, ans=0.5 2024-09-20 10:56:47,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=874000.0, ans=0.125 2024-09-20 10:57:05,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=874040.0, ans=0.1 2024-09-20 10:57:07,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=874080.0, ans=0.2 2024-09-20 10:57:23,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=874120.0, ans=0.125 2024-09-20 10:57:53,497 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.411e+01 9.030e+01 9.662e+01 1.974e+02, threshold=1.806e+02, percent-clipped=1.0 2024-09-20 10:57:53,522 INFO [train.py:1198] (0/2) Epoch 49, batch 1350, loss[loss=0.2298, ctc_loss=0.1091, cr_loss=0.3488, attn_decoder_loss=0.2354, over 29769.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1073, cr_loss=0.3458, attn_decoder_loss=0.2363, over 5795461.62 frames. 
], batch size: 81, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:57:58,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=874200.0, ans=0.0 2024-09-20 10:58:17,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=874240.0, ans=0.125 2024-09-20 10:58:46,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=874320.0, ans=0.0 2024-09-20 10:58:59,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=874360.0, ans=0.1 2024-09-20 10:59:09,069 INFO [train.py:1198] (0/2) Epoch 49, batch 1400, loss[loss=0.2039, ctc_loss=0.09532, cr_loss=0.3197, attn_decoder_loss=0.2089, over 29597.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1073, cr_loss=0.3455, attn_decoder_loss=0.2363, over 5806949.99 frames. ], batch size: 69, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 10:59:19,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=874400.0, ans=0.2 2024-09-20 10:59:27,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=874440.0, ans=0.0 2024-09-20 10:59:47,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=874480.0, ans=0.0 2024-09-20 10:59:48,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.12 vs. 
limit=22.5 2024-09-20 10:59:53,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=874480.0, ans=0.09899494936611666 2024-09-20 11:00:08,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=874520.0, ans=0.125 2024-09-20 11:00:25,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=874600.0, ans=0.0 2024-09-20 11:00:26,311 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.532e+01 9.313e+01 9.693e+01 1.325e+02, threshold=1.863e+02, percent-clipped=0.0 2024-09-20 11:00:26,332 INFO [train.py:1198] (0/2) Epoch 49, batch 1450, loss[loss=0.2508, ctc_loss=0.1283, cr_loss=0.3969, attn_decoder_loss=0.2556, over 29459.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1078, cr_loss=0.3465, attn_decoder_loss=0.237, over 5805246.86 frames. ], batch size: 94, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 11:00:26,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=874600.0, ans=0.125 2024-09-20 11:00:31,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=874600.0, ans=0.125 2024-09-20 11:01:09,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=874680.0, ans=0.125 2024-09-20 11:01:18,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=874720.0, ans=0.025 2024-09-20 11:01:25,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=874720.0, ans=0.0 2024-09-20 11:01:27,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=874760.0, 
ans=0.125 2024-09-20 11:01:31,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=874760.0, ans=0.1 2024-09-20 11:01:43,697 INFO [train.py:1198] (0/2) Epoch 49, batch 1500, loss[loss=0.2399, ctc_loss=0.1129, cr_loss=0.3548, attn_decoder_loss=0.2461, over 29640.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1076, cr_loss=0.346, attn_decoder_loss=0.2372, over 5805899.35 frames. ], batch size: 86, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 11:02:00,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=874840.0, ans=0.1 2024-09-20 11:02:46,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=874960.0, ans=0.025 2024-09-20 11:02:59,723 INFO [train.py:1198] (0/2) Epoch 49, batch 1550, loss[loss=0.2427, ctc_loss=0.1143, cr_loss=0.3654, attn_decoder_loss=0.2488, over 29523.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.108, cr_loss=0.3466, attn_decoder_loss=0.2372, over 5781309.48 frames. 
], batch size: 90, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 11:03:01,251 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.780e+01 9.221e+01 9.714e+01 1.731e+02, threshold=1.844e+02, percent-clipped=0.0 2024-09-20 11:03:15,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=875040.0, ans=0.025 2024-09-20 11:03:32,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=875080.0, ans=0.0 2024-09-20 11:03:44,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=875080.0, ans=0.0 2024-09-20 11:04:17,871 INFO [train.py:1198] (0/2) Epoch 49, batch 1600, loss[loss=0.2441, ctc_loss=0.1084, cr_loss=0.3596, attn_decoder_loss=0.2512, over 29658.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1079, cr_loss=0.3465, attn_decoder_loss=0.237, over 5766687.71 frames. ], batch size: 85, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 11:04:29,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=875200.0, ans=0.09899494936611666 2024-09-20 11:04:55,839 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2024-09-20 11:05:02,551 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:05:28,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=875360.0, ans=0.05 2024-09-20 11:05:35,272 INFO [train.py:1198] (0/2) Epoch 49, batch 1650, loss[loss=0.2518, ctc_loss=0.1257, cr_loss=0.3894, attn_decoder_loss=0.2571, over 29715.00 frames. 
], tot_loss[loss=0.2308, ctc_loss=0.1075, cr_loss=0.3462, attn_decoder_loss=0.2368, over 5759737.54 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 11:05:36,820 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.731e+01 9.375e+01 1.033e+02 4.600e+02, threshold=1.875e+02, percent-clipped=3.0 2024-09-20 11:05:44,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=875400.0, ans=0.125 2024-09-20 11:05:59,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=875440.0, ans=0.125 2024-09-20 11:06:25,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=875520.0, ans=0.125 2024-09-20 11:06:31,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=875520.0, ans=0.05 2024-09-20 11:06:37,692 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=12.0 2024-09-20 11:06:50,358 INFO [train.py:1198] (0/2) Epoch 49, batch 1700, loss[loss=0.1987, ctc_loss=0.08375, cr_loss=0.2947, attn_decoder_loss=0.2049, over 29596.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1069, cr_loss=0.345, attn_decoder_loss=0.2366, over 5782370.64 frames. ], batch size: 69, lr: 2.25e-03, grad_scale: 16.0 2024-09-20 11:07:03,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.15 vs. 
limit=12.0 2024-09-20 11:07:15,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=875640.0, ans=0.125 2024-09-20 11:07:31,370 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=22.5 2024-09-20 11:07:54,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=875760.0, ans=0.125 2024-09-20 11:08:05,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=875760.0, ans=0.125 2024-09-20 11:08:07,659 INFO [train.py:1198] (0/2) Epoch 49, batch 1750, loss[loss=0.2073, ctc_loss=0.09342, cr_loss=0.3271, attn_decoder_loss=0.2127, over 29346.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1067, cr_loss=0.3448, attn_decoder_loss=0.2361, over 5791214.79 frames. ], batch size: 67, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 11:08:10,639 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.685e+01 8.566e+01 9.020e+01 9.576e+01 1.474e+02, threshold=1.804e+02, percent-clipped=0.0 2024-09-20 11:08:10,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=875800.0, ans=0.025 2024-09-20 11:08:10,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=875800.0, ans=0.0 2024-09-20 11:08:26,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=875840.0, ans=0.2 2024-09-20 11:08:44,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=875880.0, ans=10.0 2024-09-20 11:08:53,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, 
batch_count=875920.0, ans=0.2 2024-09-20 11:09:24,845 INFO [train.py:1198] (0/2) Epoch 49, batch 1800, loss[loss=0.2419, ctc_loss=0.113, cr_loss=0.3637, attn_decoder_loss=0.2482, over 29703.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1068, cr_loss=0.3448, attn_decoder_loss=0.2361, over 5792708.40 frames. ], batch size: 83, lr: 2.25e-03, grad_scale: 8.0 2024-09-20 11:09:34,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=876000.0, ans=0.125 2024-09-20 11:10:01,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=876080.0, ans=0.0 2024-09-20 11:10:03,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=876080.0, ans=15.0 2024-09-20 11:10:32,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=876160.0, ans=0.0 2024-09-20 11:10:40,175 INFO [train.py:1198] (0/2) Epoch 49, batch 1850, loss[loss=0.2417, ctc_loss=0.1085, cr_loss=0.3486, attn_decoder_loss=0.2487, over 29630.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.107, cr_loss=0.3455, attn_decoder_loss=0.2362, over 5797393.61 frames. ], batch size: 86, lr: 2.24e-03, grad_scale: 8.0 2024-09-20 11:10:43,129 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.600e+01 9.055e+01 9.654e+01 2.900e+02, threshold=1.811e+02, percent-clipped=2.0 2024-09-20 11:10:58,862 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.76 vs. limit=10.0 2024-09-20 11:10:58,971 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. 
limit=15.0
2024-09-20 11:11:04,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=876240.0, ans=0.2
2024-09-20 11:11:08,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=876280.0, ans=0.125
2024-09-20 11:11:27,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=876320.0, ans=0.025
2024-09-20 11:11:36,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=876320.0, ans=0.1
2024-09-20 11:11:56,967 INFO [train.py:1198] (0/2) Epoch 49, batch 1900, loss[loss=0.251, ctc_loss=0.1227, cr_loss=0.3772, attn_decoder_loss=0.2568, over 29722.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1074, cr_loss=0.3466, attn_decoder_loss=0.2369, over 5805151.49 frames. ], batch size: 89, lr: 2.24e-03, grad_scale: 8.0
2024-09-20 11:12:06,887 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0
2024-09-20 11:12:14,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=876440.0, ans=0.2
2024-09-20 11:12:49,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=876520.0, ans=0.04949747468305833
2024-09-20 11:12:54,652 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.17 vs. limit=12.0
2024-09-20 11:13:01,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=876560.0, ans=0.125
2024-09-20 11:13:14,825 INFO [train.py:1198] (0/2) Epoch 49, batch 1950, loss[loss=0.2283, ctc_loss=0.108, cr_loss=0.3493, attn_decoder_loss=0.2339, over 29458.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1081, cr_loss=0.3488, attn_decoder_loss=0.2381, over 5819336.21 frames. ], batch size: 78, lr: 2.24e-03, grad_scale: 8.0
2024-09-20 11:13:15,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.65 vs. limit=15.0
2024-09-20 11:13:17,858 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.737e+01 9.338e+01 9.931e+01 1.218e+02, threshold=1.868e+02, percent-clipped=0.0
2024-09-20 11:13:49,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=876680.0, ans=0.125
2024-09-20 11:14:23,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten.whitening_limit, batch_count=876760.0, ans=15.0
2024-09-20 11:14:27,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=876760.0, ans=0.0
2024-09-20 11:14:30,284 INFO [train.py:1198] (0/2) Epoch 49, batch 2000, loss[loss=0.1971, ctc_loss=0.08514, cr_loss=0.297, attn_decoder_loss=0.2029, over 29357.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1082, cr_loss=0.3486, attn_decoder_loss=0.2382, over 5797484.48 frames. ], batch size: 67, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:14:33,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=876800.0, ans=0.125
2024-09-20 11:14:33,749 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:14:39,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=876800.0, ans=0.125
2024-09-20 11:14:59,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=876880.0, ans=0.0
2024-09-20 11:15:23,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876920.0, ans=0.1
2024-09-20 11:15:26,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=876920.0, ans=15.0
2024-09-20 11:15:47,893 INFO [train.py:1198] (0/2) Epoch 49, batch 2050, loss[loss=0.2014, ctc_loss=0.08559, cr_loss=0.3001, attn_decoder_loss=0.2076, over 29459.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1076, cr_loss=0.3469, attn_decoder_loss=0.2368, over 5789206.55 frames. ], batch size: 70, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:15:49,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=877000.0, ans=0.0
2024-09-20 11:15:50,918 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.585e+01 9.358e+01 1.005e+02 5.300e+02, threshold=1.872e+02, percent-clipped=1.0
2024-09-20 11:15:57,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=877000.0, ans=0.0
2024-09-20 11:16:06,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=877040.0, ans=0.2
2024-09-20 11:17:05,470 INFO [train.py:1198] (0/2) Epoch 49, batch 2100, loss[loss=0.2326, ctc_loss=0.11, cr_loss=0.3685, attn_decoder_loss=0.238, over 29755.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1071, cr_loss=0.3458, attn_decoder_loss=0.2364, over 5800751.88 frames. ], batch size: 81, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:17:12,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.09 vs. limit=12.0
2024-09-20 11:17:28,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=877240.0, ans=0.125
2024-09-20 11:17:44,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=877280.0, ans=0.125
2024-09-20 11:17:52,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=877320.0, ans=0.0
2024-09-20 11:18:02,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=877320.0, ans=0.0
2024-09-20 11:18:20,702 INFO [train.py:1198] (0/2) Epoch 49, batch 2150, loss[loss=0.2258, ctc_loss=0.1101, cr_loss=0.3423, attn_decoder_loss=0.231, over 29444.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1066, cr_loss=0.3445, attn_decoder_loss=0.2359, over 5815431.29 frames. ], batch size: 78, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:18:23,750 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.478e+01 8.920e+01 9.429e+01 1.261e+02, threshold=1.784e+02, percent-clipped=0.0
2024-09-20 11:18:49,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=877480.0, ans=0.125
2024-09-20 11:18:51,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.28 vs. limit=22.5
2024-09-20 11:18:57,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=877480.0, ans=0.0
2024-09-20 11:19:15,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=877520.0, ans=0.07
2024-09-20 11:19:18,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=877520.0, ans=22.5
2024-09-20 11:19:21,112 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0
2024-09-20 11:19:38,774 INFO [train.py:1198] (0/2) Epoch 49, batch 2200, loss[loss=0.2311, ctc_loss=0.09464, cr_loss=0.306, attn_decoder_loss=0.2394, over 29611.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1065, cr_loss=0.3443, attn_decoder_loss=0.236, over 5812292.71 frames. ], batch size: 86, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:19:43,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=877600.0, ans=0.125
2024-09-20 11:20:05,859 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.96 vs. limit=22.5
2024-09-20 11:20:44,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=877760.0, ans=0.125
2024-09-20 11:20:56,320 INFO [train.py:1198] (0/2) Epoch 49, batch 2250, loss[loss=0.2346, ctc_loss=0.1046, cr_loss=0.3325, attn_decoder_loss=0.2417, over 29713.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1064, cr_loss=0.3439, attn_decoder_loss=0.2359, over 5812525.61 frames. ], batch size: 82, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:20:59,107 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.798e+01 9.192e+01 9.899e+01 1.510e+02, threshold=1.838e+02, percent-clipped=0.0
2024-09-20 11:21:02,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=877800.0, ans=0.1
2024-09-20 11:21:17,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=877840.0, ans=0.2
2024-09-20 11:21:29,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=877880.0, ans=0.125
2024-09-20 11:21:47,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=877920.0, ans=0.1
2024-09-20 11:21:52,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=877920.0, ans=0.0
2024-09-20 11:22:11,323 INFO [train.py:1198] (0/2) Epoch 49, batch 2300, loss[loss=0.2131, ctc_loss=0.09251, cr_loss=0.3124, attn_decoder_loss=0.2196, over 29321.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1055, cr_loss=0.3413, attn_decoder_loss=0.2348, over 5800015.27 frames. ], batch size: 71, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:22:27,268 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0
2024-09-20 11:23:06,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=878120.0, ans=0.0
2024-09-20 11:23:21,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=878160.0, ans=0.2
2024-09-20 11:23:29,168 INFO [train.py:1198] (0/2) Epoch 49, batch 2350, loss[loss=0.2421, ctc_loss=0.1153, cr_loss=0.3608, attn_decoder_loss=0.2481, over 29692.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1063, cr_loss=0.3433, attn_decoder_loss=0.2353, over 5805167.92 frames. ], batch size: 83, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:23:32,113 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.491e+01 9.028e+01 9.631e+01 3.047e+02, threshold=1.806e+02, percent-clipped=1.0
2024-09-20 11:23:33,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=878200.0, ans=0.125
2024-09-20 11:23:52,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=878240.0, ans=0.5
2024-09-20 11:24:17,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0
2024-09-20 11:24:19,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=878320.0, ans=0.125
2024-09-20 11:24:21,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=878320.0, ans=0.125
2024-09-20 11:24:35,654 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.93 vs. limit=12.0
2024-09-20 11:24:47,391 INFO [train.py:1198] (0/2) Epoch 49, batch 2400, loss[loss=0.2228, ctc_loss=0.1058, cr_loss=0.3549, attn_decoder_loss=0.2279, over 29526.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1063, cr_loss=0.3437, attn_decoder_loss=0.2358, over 5807592.66 frames. ], batch size: 76, lr: 2.24e-03, grad_scale: 32.0
2024-09-20 11:25:01,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=878440.0, ans=0.0
2024-09-20 11:25:04,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=878440.0, ans=0.1
2024-09-20 11:25:19,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=878480.0, ans=0.2
2024-09-20 11:25:19,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=878480.0, ans=0.2
2024-09-20 11:25:56,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5
2024-09-20 11:26:02,944 INFO [train.py:1198] (0/2) Epoch 49, batch 2450, loss[loss=0.247, ctc_loss=0.1244, cr_loss=0.3844, attn_decoder_loss=0.2521, over 29708.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1068, cr_loss=0.3453, attn_decoder_loss=0.2367, over 5784855.24 frames. ], batch size: 82, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:26:07,323 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.775e+01 9.341e+01 9.851e+01 1.765e+02, threshold=1.868e+02, percent-clipped=0.0
2024-09-20 11:26:16,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=878640.0, ans=0.125
2024-09-20 11:26:40,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=878680.0, ans=0.125
2024-09-20 11:26:43,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=878680.0, ans=0.0
2024-09-20 11:26:51,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=878720.0, ans=0.02
2024-09-20 11:26:55,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=878720.0, ans=0.2
2024-09-20 11:27:00,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=878720.0, ans=0.125
2024-09-20 11:27:19,937 INFO [train.py:1198] (0/2) Epoch 49, batch 2500, loss[loss=0.2371, ctc_loss=0.108, cr_loss=0.3397, attn_decoder_loss=0.2439, over 29616.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1068, cr_loss=0.3451, attn_decoder_loss=0.2367, over 5795340.54 frames. ], batch size: 86, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:27:32,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=878800.0, ans=0.1
2024-09-20 11:27:38,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=878840.0, ans=0.2
2024-09-20 11:27:46,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=878840.0, ans=0.125
2024-09-20 11:27:47,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0
2024-09-20 11:28:01,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=878880.0, ans=0.125
2024-09-20 11:28:15,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=878920.0, ans=0.05
2024-09-20 11:28:37,631 INFO [train.py:1198] (0/2) Epoch 49, batch 2550, loss[loss=0.2091, ctc_loss=0.09666, cr_loss=0.3156, attn_decoder_loss=0.2146, over 29304.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1075, cr_loss=0.3463, attn_decoder_loss=0.2369, over 5799431.54 frames. ], batch size: 67, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:28:42,020 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 8.758e+01 9.202e+01 9.559e+01 1.179e+02, threshold=1.840e+02, percent-clipped=0.0
2024-09-20 11:28:47,225 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.75 vs. limit=15.0
2024-09-20 11:28:49,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=879000.0, ans=0.025
2024-09-20 11:29:22,330 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.14 vs. limit=15.0
2024-09-20 11:29:43,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=879160.0, ans=0.125
2024-09-20 11:29:50,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=879160.0, ans=0.125
2024-09-20 11:29:50,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=879160.0, ans=0.0
2024-09-20 11:29:53,788 INFO [train.py:1198] (0/2) Epoch 49, batch 2600, loss[loss=0.2215, ctc_loss=0.1011, cr_loss=0.3403, attn_decoder_loss=0.2273, over 29440.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1073, cr_loss=0.346, attn_decoder_loss=0.2371, over 5794316.93 frames. ], batch size: 78, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:30:08,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=879240.0, ans=0.1
2024-09-20 11:30:11,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=879240.0, ans=0.125
2024-09-20 11:30:33,086 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.59 vs. limit=22.5
2024-09-20 11:31:03,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=879360.0, ans=0.1
2024-09-20 11:31:10,893 INFO [train.py:1198] (0/2) Epoch 49, batch 2650, loss[loss=0.2432, ctc_loss=0.12, cr_loss=0.3718, attn_decoder_loss=0.2486, over 29248.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1073, cr_loss=0.346, attn_decoder_loss=0.2373, over 5800680.22 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:31:15,433 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.630e+01 9.011e+01 9.615e+01 2.139e+02, threshold=1.802e+02, percent-clipped=1.0
2024-09-20 11:31:15,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=879400.0, ans=0.0
2024-09-20 11:31:34,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=879440.0, ans=0.0
2024-09-20 11:31:36,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.49 vs. limit=12.0
2024-09-20 11:31:37,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=879440.0, ans=0.125
2024-09-20 11:31:43,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.40 vs. limit=15.0
2024-09-20 11:31:46,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=879480.0, ans=0.2
2024-09-20 11:31:57,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=879520.0, ans=0.0
2024-09-20 11:32:03,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=879520.0, ans=0.125
2024-09-20 11:32:06,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=879520.0, ans=0.1
2024-09-20 11:32:24,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=879560.0, ans=0.125
2024-09-20 11:32:27,641 INFO [train.py:1198] (0/2) Epoch 49, batch 2700, loss[loss=0.2402, ctc_loss=0.1133, cr_loss=0.3585, attn_decoder_loss=0.2464, over 29530.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1077, cr_loss=0.3469, attn_decoder_loss=0.2377, over 5796569.28 frames. ], batch size: 87, lr: 2.24e-03, grad_scale: 8.0
2024-09-20 11:32:46,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=22.5
2024-09-20 11:32:47,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=879640.0, ans=0.125
2024-09-20 11:32:56,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=879680.0, ans=0.1
2024-09-20 11:32:57,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.85 vs. limit=15.0
2024-09-20 11:33:43,443 INFO [train.py:1198] (0/2) Epoch 49, batch 2750, loss[loss=0.2224, ctc_loss=0.1054, cr_loss=0.3294, attn_decoder_loss=0.2281, over 29534.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.107, cr_loss=0.3453, attn_decoder_loss=0.2365, over 5795588.88 frames. ], batch size: 75, lr: 2.24e-03, grad_scale: 8.0
2024-09-20 11:33:49,654 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.773e+01 9.217e+01 9.860e+01 5.240e+02, threshold=1.843e+02, percent-clipped=1.0
2024-09-20 11:33:51,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=879800.0, ans=10.0
2024-09-20 11:33:56,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=879800.0, ans=15.0
2024-09-20 11:34:01,306 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.48 vs. limit=15.0
2024-09-20 11:34:12,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=879880.0, ans=0.2
2024-09-20 11:34:20,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=879880.0, ans=0.125
2024-09-20 11:34:36,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=879920.0, ans=0.125
2024-09-20 11:34:52,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=879960.0, ans=0.1
2024-09-20 11:34:57,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.37 vs. limit=15.0
2024-09-20 11:35:00,239 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-220000.pt
2024-09-20 11:35:09,261 INFO [train.py:1198] (0/2) Epoch 49, batch 2800, loss[loss=0.2405, ctc_loss=0.1197, cr_loss=0.344, attn_decoder_loss=0.2463, over 20099.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1073, cr_loss=0.3457, attn_decoder_loss=0.2366, over 5775052.55 frames. ], batch size: 209, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:35:11,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0
2024-09-20 11:35:19,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=880000.0, ans=0.125
2024-09-20 11:35:19,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=880000.0, ans=0.1
2024-09-20 11:35:43,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=880080.0, ans=0.0
2024-09-20 11:35:44,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=880080.0, ans=0.035
2024-09-20 11:35:54,372 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2024-09-20 11:35:58,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=880120.0, ans=0.125
2024-09-20 11:36:26,662 INFO [train.py:1198] (0/2) Epoch 49, batch 2850, loss[loss=0.2212, ctc_loss=0.1009, cr_loss=0.3224, attn_decoder_loss=0.2273, over 29515.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1077, cr_loss=0.3463, attn_decoder_loss=0.2369, over 5761996.16 frames. ], batch size: 77, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:36:32,610 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.815e+01 9.180e+01 9.751e+01 2.075e+02, threshold=1.836e+02, percent-clipped=1.0
2024-09-20 11:36:57,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=880280.0, ans=0.125
2024-09-20 11:37:01,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=880280.0, ans=0.05
2024-09-20 11:37:21,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=880320.0, ans=0.0
2024-09-20 11:37:42,429 INFO [train.py:1198] (0/2) Epoch 49, batch 2900, loss[loss=0.2264, ctc_loss=0.1041, cr_loss=0.3442, attn_decoder_loss=0.2324, over 29411.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1082, cr_loss=0.3477, attn_decoder_loss=0.2378, over 5787764.78 frames. ], batch size: 79, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:37:44,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=880400.0, ans=0.2
2024-09-20 11:37:51,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=880400.0, ans=0.125
2024-09-20 11:38:01,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.23 vs. limit=22.5
2024-09-20 11:38:31,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=880520.0, ans=0.5
2024-09-20 11:38:35,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=880520.0, ans=0.125
2024-09-20 11:38:35,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=880520.0, ans=0.125
2024-09-20 11:38:51,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=880560.0, ans=0.025
2024-09-20 11:39:02,491 INFO [train.py:1198] (0/2) Epoch 49, batch 2950, loss[loss=0.2189, ctc_loss=0.09776, cr_loss=0.3214, attn_decoder_loss=0.2253, over 29513.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1075, cr_loss=0.3463, attn_decoder_loss=0.2368, over 5782502.80 frames. ], batch size: 75, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:39:02,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880600.0, ans=0.1
2024-09-20 11:39:07,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=880600.0, ans=0.125
2024-09-20 11:39:07,661 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0
2024-09-20 11:39:08,383 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.608e+01 9.182e+01 9.827e+01 1.689e+02, threshold=1.836e+02, percent-clipped=0.0
2024-09-20 11:39:19,915 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.95 vs. limit=10.0
2024-09-20 11:39:27,080 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:39:31,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=880680.0, ans=0.0
2024-09-20 11:39:35,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=880680.0, ans=0.125
2024-09-20 11:39:48,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.57 vs. limit=15.0
2024-09-20 11:39:56,428 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=12.0
2024-09-20 11:40:18,501 INFO [train.py:1198] (0/2) Epoch 49, batch 3000, loss[loss=0.2262, ctc_loss=0.1013, cr_loss=0.3401, attn_decoder_loss=0.2325, over 29743.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1073, cr_loss=0.346, attn_decoder_loss=0.2367, over 5782749.31 frames. ], batch size: 81, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:40:18,502 INFO [train.py:1221] (0/2) Computing validation loss
2024-09-20 11:40:29,309 INFO [zipformer.py:1858] (0/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8820, 3.2156, 3.1212, 3.2973, 3.3037, 3.3074, 2.6312, 3.4788], device='cuda:0')
2024-09-20 11:40:36,852 INFO [train.py:1230] (0/2) Epoch 49, validation: loss=0.2126, ctc_loss=0.03669, cr_loss=6.618e-15, attn_decoder_loss=0.2322, over 944034.00 frames.
2024-09-20 11:40:36,853 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB
2024-09-20 11:41:01,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=880840.0, ans=0.025
2024-09-20 11:41:01,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.52 vs. limit=15.0
2024-09-20 11:41:28,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=880920.0, ans=0.125
2024-09-20 11:41:41,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=880960.0, ans=0.125
2024-09-20 11:41:52,485 INFO [train.py:1198] (0/2) Epoch 49, batch 3050, loss[loss=0.218, ctc_loss=0.09656, cr_loss=0.3226, attn_decoder_loss=0.2244, over 29536.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1074, cr_loss=0.3461, attn_decoder_loss=0.2373, over 5777046.59 frames. ], batch size: 76, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:41:58,503 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.535e+01 9.070e+01 9.568e+01 1.381e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-20 11:42:03,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=881000.0, ans=0.125
2024-09-20 11:42:05,219 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:42:06,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=881040.0, ans=0.125
2024-09-20 11:42:13,447 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.78 vs. limit=5.0
2024-09-20 11:42:19,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=881040.0, ans=0.125
2024-09-20 11:42:22,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=881080.0, ans=0.125
2024-09-20 11:42:29,100 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 11:42:55,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=881160.0, ans=0.2
2024-09-20 11:43:11,822 INFO [train.py:1198] (0/2) Epoch 49, batch 3100, loss[loss=0.235, ctc_loss=0.109, cr_loss=0.3334, attn_decoder_loss=0.2416, over 29235.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1072, cr_loss=0.3458, attn_decoder_loss=0.2368, over 5777249.57 frames. ], batch size: 100, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:43:30,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0
2024-09-20 11:43:32,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5
2024-09-20 11:43:42,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=881280.0, ans=0.125
2024-09-20 11:43:45,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=881280.0, ans=0.125
2024-09-20 11:44:12,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=881360.0, ans=0.0
2024-09-20 11:44:16,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0
2024-09-20 11:44:20,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=881360.0, ans=0.125
2024-09-20 11:44:21,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=881360.0, ans=0.125
2024-09-20 11:44:22,590 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=22.5
2024-09-20 11:44:23,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=881360.0, ans=0.2
2024-09-20 11:44:27,457 INFO [train.py:1198] (0/2) Epoch 49, batch 3150, loss[loss=0.2402, ctc_loss=0.1037, cr_loss=0.3306, attn_decoder_loss=0.248, over 28808.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1072, cr_loss=0.3455, attn_decoder_loss=0.2368, over 5782422.22 frames. ], batch size: 104, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:44:33,449 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.538e+01 9.268e+01 9.767e+01 2.524e+02, threshold=1.854e+02, percent-clipped=1.0
2024-09-20 11:44:57,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=881480.0, ans=0.125
2024-09-20 11:45:06,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=881480.0, ans=0.0
2024-09-20 11:45:11,291 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=881520.0, ans=0.0
2024-09-20 11:45:15,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=881520.0, ans=0.125
2024-09-20 11:45:35,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=881560.0, ans=0.1
2024-09-20 11:45:42,745 INFO [train.py:1198] (0/2) Epoch 49, batch 3200, loss[loss=0.2322, ctc_loss=0.1077, cr_loss=0.352, attn_decoder_loss=0.2382, over 29409.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1071, cr_loss=0.3452, attn_decoder_loss=0.2364, over 5792307.86 frames. ], batch size: 79, lr: 2.24e-03, grad_scale: 32.0
2024-09-20 11:45:51,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0
2024-09-20 11:45:58,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=881640.0, ans=0.125
2024-09-20 11:46:04,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=881640.0, ans=0.1
2024-09-20 11:46:31,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=881720.0, ans=0.0
2024-09-20 11:46:41,743 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=881720.0, ans=0.0
2024-09-20 11:46:55,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=881760.0, ans=0.0
2024-09-20 11:47:02,660 INFO [train.py:1198] (0/2) Epoch 49, batch 3250, loss[loss=0.2434, ctc_loss=0.1222, cr_loss=0.3899, attn_decoder_loss=0.2483, over 29698.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1074, cr_loss=0.346, attn_decoder_loss=0.2371, over 5799130.34 frames. ], batch size: 84, lr: 2.24e-03, grad_scale: 16.0
2024-09-20 11:47:10,204 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.742e+01 9.266e+01 9.794e+01 1.259e+02, threshold=1.853e+02, percent-clipped=0.0
2024-09-20 11:47:34,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=881880.0, ans=0.2
2024-09-20 11:47:54,325 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.22 vs. limit=15.0
2024-09-20 11:48:17,515 INFO [train.py:1198] (0/2) Epoch 49, batch 3300, loss[loss=0.232, ctc_loss=0.09773, cr_loss=0.3106, attn_decoder_loss=0.24, over 28143.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1066, cr_loss=0.3439, attn_decoder_loss=0.2359, over 5798176.13 frames.
], batch size: 111, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:48:32,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2024-09-20 11:48:57,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=882080.0, ans=0.0 2024-09-20 11:49:06,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=882120.0, ans=0.1 2024-09-20 11:49:32,239 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=3.94 vs. limit=12.0 2024-09-20 11:49:32,907 INFO [train.py:1198] (0/2) Epoch 49, batch 3350, loss[loss=0.2473, ctc_loss=0.1158, cr_loss=0.3741, attn_decoder_loss=0.2536, over 28881.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1071, cr_loss=0.3448, attn_decoder_loss=0.2365, over 5775708.57 frames. ], batch size: 104, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:49:40,361 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.704e+01 9.230e+01 9.837e+01 1.570e+02, threshold=1.846e+02, percent-clipped=0.0 2024-09-20 11:50:05,513 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.00 vs. limit=10.0 2024-09-20 11:50:11,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. 
limit=12.0 2024-09-20 11:50:25,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=882320.0, ans=0.125 2024-09-20 11:50:40,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=882360.0, ans=0.1 2024-09-20 11:50:42,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=882360.0, ans=0.125 2024-09-20 11:50:52,872 INFO [train.py:1198] (0/2) Epoch 49, batch 3400, loss[loss=0.1955, ctc_loss=0.08902, cr_loss=0.3084, attn_decoder_loss=0.2005, over 29352.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1074, cr_loss=0.3453, attn_decoder_loss=0.2364, over 5767325.53 frames. ], batch size: 67, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:51:05,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2024-09-20 11:51:30,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=882480.0, ans=0.125 2024-09-20 11:51:36,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.59 vs. limit=22.5 2024-09-20 11:52:05,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=882560.0, ans=0.0 2024-09-20 11:52:08,161 INFO [train.py:1198] (0/2) Epoch 49, batch 3450, loss[loss=0.236, ctc_loss=0.1003, cr_loss=0.3321, attn_decoder_loss=0.2437, over 28316.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1074, cr_loss=0.3456, attn_decoder_loss=0.2369, over 5774800.56 frames. 
], batch size: 111, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:52:15,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.48 vs. limit=15.0 2024-09-20 11:52:15,709 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 8.667e+01 9.196e+01 9.628e+01 1.869e+02, threshold=1.839e+02, percent-clipped=1.0 2024-09-20 11:52:23,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=882640.0, ans=0.125 2024-09-20 11:52:23,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=882640.0, ans=0.125 2024-09-20 11:52:31,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.93 vs. limit=10.0 2024-09-20 11:52:54,852 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=882720.0, ans=0.125 2024-09-20 11:53:01,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=882720.0, ans=0.125 2024-09-20 11:53:11,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=882760.0, ans=0.025 2024-09-20 11:53:22,836 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2024-09-20 11:53:23,233 INFO [train.py:1198] (0/2) Epoch 49, batch 3500, loss[loss=0.2092, ctc_loss=0.0916, cr_loss=0.3212, attn_decoder_loss=0.2151, over 29348.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1075, cr_loss=0.3459, attn_decoder_loss=0.2364, over 5776004.65 frames. 
], batch size: 71, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:53:43,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=882840.0, ans=10.0 2024-09-20 11:53:57,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=882880.0, ans=0.125 2024-09-20 11:54:05,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=882880.0, ans=0.09899494936611666 2024-09-20 11:54:39,746 INFO [train.py:1198] (0/2) Epoch 49, batch 3550, loss[loss=0.2486, ctc_loss=0.1219, cr_loss=0.3952, attn_decoder_loss=0.2539, over 29702.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1072, cr_loss=0.3455, attn_decoder_loss=0.2364, over 5781917.16 frames. ], batch size: 89, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:54:41,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=883000.0, ans=0.0 2024-09-20 11:54:42,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=883000.0, ans=0.125 2024-09-20 11:54:47,103 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.606e+01 9.040e+01 9.689e+01 1.934e+02, threshold=1.808e+02, percent-clipped=1.0 2024-09-20 11:55:16,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=883080.0, ans=0.125 2024-09-20 11:55:18,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=883080.0, ans=0.025 2024-09-20 11:55:21,270 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. 
limit=15.0 2024-09-20 11:55:50,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=883160.0, ans=0.125 2024-09-20 11:55:51,500 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:55:53,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=883160.0, ans=0.125 2024-09-20 11:55:56,049 INFO [train.py:1198] (0/2) Epoch 49, batch 3600, loss[loss=0.2239, ctc_loss=0.09737, cr_loss=0.3262, attn_decoder_loss=0.2307, over 29494.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1072, cr_loss=0.3455, attn_decoder_loss=0.2366, over 5792115.95 frames. ], batch size: 77, lr: 2.24e-03, grad_scale: 32.0 2024-09-20 11:55:59,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=883200.0, ans=0.0 2024-09-20 11:55:59,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=883200.0, ans=0.125 2024-09-20 11:56:26,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=883280.0, ans=0.1 2024-09-20 11:56:32,541 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.81 vs. 
limit=22.5 2024-09-20 11:56:52,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=883320.0, ans=0.1 2024-09-20 11:56:52,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=883320.0, ans=0.125 2024-09-20 11:57:03,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=883360.0, ans=0.0 2024-09-20 11:57:03,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.24 vs. limit=22.5 2024-09-20 11:57:07,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=883360.0, ans=0.0 2024-09-20 11:57:10,213 INFO [train.py:1198] (0/2) Epoch 49, batch 3650, loss[loss=0.2399, ctc_loss=0.1196, cr_loss=0.3831, attn_decoder_loss=0.2448, over 29492.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1065, cr_loss=0.3443, attn_decoder_loss=0.2359, over 5794264.05 frames. 
], batch size: 90, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:57:13,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=883400.0, ans=0.1 2024-09-20 11:57:17,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=883400.0, ans=0.025 2024-09-20 11:57:19,160 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.791e+01 8.757e+01 9.183e+01 9.714e+01 2.760e+02, threshold=1.837e+02, percent-clipped=2.0 2024-09-20 11:57:23,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=883440.0, ans=0.125 2024-09-20 11:57:38,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=883480.0, ans=0.125 2024-09-20 11:57:49,168 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:57:56,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=883520.0, ans=0.0 2024-09-20 11:58:00,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.64 vs. limit=10.0 2024-09-20 11:58:02,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883520.0, ans=0.1 2024-09-20 11:58:24,147 INFO [train.py:1198] (0/2) Epoch 49, batch 3700, loss[loss=0.2412, ctc_loss=0.1088, cr_loss=0.358, attn_decoder_loss=0.2479, over 29691.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1067, cr_loss=0.3446, attn_decoder_loss=0.2361, over 5804307.68 frames. 
], batch size: 84, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:58:31,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=883600.0, ans=0.2 2024-09-20 11:58:44,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2024-09-20 11:58:55,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=883680.0, ans=0.125 2024-09-20 11:59:35,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=883760.0, ans=0.125 2024-09-20 11:59:38,359 INFO [train.py:1198] (0/2) Epoch 49, batch 3750, loss[loss=0.206, ctc_loss=0.09756, cr_loss=0.3345, attn_decoder_loss=0.2106, over 29311.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1065, cr_loss=0.3443, attn_decoder_loss=0.2359, over 5808094.56 frames. ], batch size: 67, lr: 2.24e-03, grad_scale: 16.0 2024-09-20 11:59:47,374 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.555e+01 9.159e+01 9.903e+01 1.587e+02, threshold=1.832e+02, percent-clipped=0.0 2024-09-20 12:00:21,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=883920.0, ans=0.125 2024-09-20 12:00:40,185 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.40 vs. limit=22.5 2024-09-20 12:00:42,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=883960.0, ans=0.125 2024-09-20 12:00:50,896 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.90 vs. 
limit=15.0 2024-09-20 12:00:54,870 INFO [train.py:1198] (0/2) Epoch 49, batch 3800, loss[loss=0.2405, ctc_loss=0.1141, cr_loss=0.3476, attn_decoder_loss=0.2469, over 29637.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1064, cr_loss=0.3436, attn_decoder_loss=0.2356, over 5797707.13 frames. ], batch size: 86, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:00:55,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=884000.0, ans=0.2 2024-09-20 12:01:09,189 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2024-09-20 12:01:36,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=884080.0, ans=0.125 2024-09-20 12:01:48,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=884120.0, ans=0.0 2024-09-20 12:01:53,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=884120.0, ans=0.0 2024-09-20 12:01:58,069 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.37 vs. limit=22.5 2024-09-20 12:02:06,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=884160.0, ans=0.125 2024-09-20 12:02:10,425 INFO [train.py:1198] (0/2) Epoch 49, batch 3850, loss[loss=0.2426, ctc_loss=0.1167, cr_loss=0.366, attn_decoder_loss=0.2485, over 29260.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1062, cr_loss=0.3432, attn_decoder_loss=0.2356, over 5811759.11 frames. 
], batch size: 100, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:02:18,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=884200.0, ans=0.125 2024-09-20 12:02:18,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=884200.0, ans=0.0 2024-09-20 12:02:19,394 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 8.680e+01 9.125e+01 9.699e+01 1.289e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-20 12:02:24,518 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.96 vs. limit=22.5 2024-09-20 12:02:26,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.37 vs. limit=15.0 2024-09-20 12:02:38,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=884280.0, ans=0.125 2024-09-20 12:02:47,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=884280.0, ans=0.0 2024-09-20 12:02:49,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.25 vs. limit=12.0 2024-09-20 12:03:24,867 INFO [train.py:1198] (0/2) Epoch 49, batch 3900, loss[loss=0.2422, ctc_loss=0.1111, cr_loss=0.3591, attn_decoder_loss=0.2488, over 29634.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1067, cr_loss=0.3443, attn_decoder_loss=0.2361, over 5815944.97 frames. 
], batch size: 86, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:03:26,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=884400.0, ans=0.2 2024-09-20 12:03:32,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=884400.0, ans=0.125 2024-09-20 12:03:32,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=884400.0, ans=0.07 2024-09-20 12:04:14,100 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:04:23,399 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.30 vs. limit=22.5 2024-09-20 12:04:30,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=884560.0, ans=10.0 2024-09-20 12:04:38,693 INFO [train.py:1198] (0/2) Epoch 49, batch 3950, loss[loss=0.2481, ctc_loss=0.1234, cr_loss=0.3867, attn_decoder_loss=0.2534, over 29456.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1071, cr_loss=0.3449, attn_decoder_loss=0.2363, over 5835747.44 frames. 
], batch size: 97, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:04:47,484 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.739e+01 9.137e+01 9.584e+01 1.763e+02, threshold=1.827e+02, percent-clipped=0.0 2024-09-20 12:04:47,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=884600.0, ans=0.1 2024-09-20 12:04:53,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=884640.0, ans=0.0 2024-09-20 12:04:53,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=884640.0, ans=0.125 2024-09-20 12:05:25,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=884720.0, ans=0.09899494936611666 2024-09-20 12:05:31,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=884720.0, ans=0.125 2024-09-20 12:05:33,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=884720.0, ans=0.0 2024-09-20 12:05:33,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.28 vs. limit=15.0 2024-09-20 12:05:36,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=884760.0, ans=0.125 2024-09-20 12:05:44,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=884760.0, ans=0.125 2024-09-20 12:05:53,639 INFO [train.py:1198] (0/2) Epoch 49, batch 4000, loss[loss=0.2117, ctc_loss=0.09292, cr_loss=0.3133, attn_decoder_loss=0.2179, over 29508.00 frames. 
], tot_loss[loss=0.2304, ctc_loss=0.1071, cr_loss=0.3447, attn_decoder_loss=0.2365, over 5813399.78 frames. ], batch size: 74, lr: 2.23e-03, grad_scale: 32.0 2024-09-20 12:05:58,788 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.86 vs. limit=15.0 2024-09-20 12:06:01,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=884800.0, ans=0.02 2024-09-20 12:06:06,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-20 12:06:17,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=884840.0, ans=0.2 2024-09-20 12:06:23,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=884880.0, ans=0.125 2024-09-20 12:06:48,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=884920.0, ans=0.2 2024-09-20 12:07:08,775 INFO [train.py:1198] (0/2) Epoch 49, batch 4050, loss[loss=0.2552, ctc_loss=0.1355, cr_loss=0.358, attn_decoder_loss=0.2605, over 20012.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1069, cr_loss=0.3437, attn_decoder_loss=0.236, over 5796404.90 frames. 
], batch size: 210, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:07:18,926 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 8.931e+01 9.287e+01 9.798e+01 1.744e+02, threshold=1.857e+02, percent-clipped=0.0 2024-09-20 12:07:20,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=885000.0, ans=0.125 2024-09-20 12:07:20,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-09-20 12:07:36,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=885080.0, ans=0.1 2024-09-20 12:07:52,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=885120.0, ans=0.2 2024-09-20 12:08:04,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885120.0, ans=0.1 2024-09-20 12:08:07,837 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:08:22,202 INFO [train.py:1198] (0/2) Epoch 49, batch 4100, loss[loss=0.2505, ctc_loss=0.1302, cr_loss=0.4, attn_decoder_loss=0.255, over 29497.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1071, cr_loss=0.3444, attn_decoder_loss=0.2364, over 5791806.99 frames. 
], batch size: 90, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:08:50,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=885280.0, ans=0.0 2024-09-20 12:09:12,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=885320.0, ans=0.2 2024-09-20 12:09:35,737 INFO [train.py:1198] (0/2) Epoch 49, batch 4150, loss[loss=0.2295, ctc_loss=0.1096, cr_loss=0.3377, attn_decoder_loss=0.2354, over 29495.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.107, cr_loss=0.3448, attn_decoder_loss=0.2361, over 5797329.50 frames. ], batch size: 77, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:09:46,203 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.769e+01 9.382e+01 9.981e+01 1.562e+02, threshold=1.876e+02, percent-clipped=0.0 2024-09-20 12:09:56,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=885440.0, ans=0.125 2024-09-20 12:10:49,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=885560.0, ans=0.0 2024-09-20 12:10:52,077 INFO [train.py:1198] (0/2) Epoch 49, batch 4200, loss[loss=0.2471, ctc_loss=0.1191, cr_loss=0.3607, attn_decoder_loss=0.2533, over 29501.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.107, cr_loss=0.3449, attn_decoder_loss=0.2363, over 5799002.89 frames. ], batch size: 90, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:11:16,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=885640.0, ans=0.125 2024-09-20 12:11:37,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.64 vs. 
limit=15.0 2024-09-20 12:12:01,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2024-09-20 12:12:05,523 INFO [train.py:1198] (0/2) Epoch 49, batch 4250, loss[loss=0.213, ctc_loss=0.09217, cr_loss=0.3179, attn_decoder_loss=0.2194, over 29517.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.107, cr_loss=0.3447, attn_decoder_loss=0.2364, over 5804671.14 frames. ], batch size: 74, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:12:15,831 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.756e+01 9.233e+01 9.751e+01 2.001e+02, threshold=1.847e+02, percent-clipped=1.0 2024-09-20 12:12:26,911 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.43 vs. limit=22.5 2024-09-20 12:12:40,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885880.0, ans=0.1 2024-09-20 12:12:56,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.26 vs. limit=15.0 2024-09-20 12:13:11,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=885960.0, ans=0.125 2024-09-20 12:13:19,048 INFO [train.py:1198] (0/2) Epoch 49, batch 4300, loss[loss=0.2377, ctc_loss=0.1083, cr_loss=0.3455, attn_decoder_loss=0.2444, over 29525.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1068, cr_loss=0.3441, attn_decoder_loss=0.2365, over 5794049.94 frames. 
], batch size: 87, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:13:36,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=886040.0, ans=0.125 2024-09-20 12:13:37,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.10 vs. limit=15.0 2024-09-20 12:13:47,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=886040.0, ans=0.125 2024-09-20 12:13:47,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.19 vs. limit=22.5 2024-09-20 12:14:31,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=22.5 2024-09-20 12:14:34,869 INFO [train.py:1198] (0/2) Epoch 49, batch 4350, loss[loss=0.2496, ctc_loss=0.1269, cr_loss=0.3883, attn_decoder_loss=0.2546, over 29538.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1091, cr_loss=0.3494, attn_decoder_loss=0.2395, over 5797077.62 frames. 
], batch size: 97, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:14:38,236 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:14:45,117 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 9.096e+01 9.498e+01 1.000e+02 1.959e+02, threshold=1.900e+02, percent-clipped=0.0 2024-09-20 12:14:45,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=886200.0, ans=0.125 2024-09-20 12:14:51,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=886240.0, ans=0.125 2024-09-20 12:15:05,324 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.50 vs. limit=22.5 2024-09-20 12:15:08,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=886280.0, ans=0.015 2024-09-20 12:15:11,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=886280.0, ans=0.0 2024-09-20 12:15:24,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=886320.0, ans=0.0 2024-09-20 12:15:24,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=886320.0, ans=0.125 2024-09-20 12:15:42,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=886360.0, ans=0.125 2024-09-20 12:15:48,110 INFO [train.py:1198] (0/2) Epoch 49, batch 4400, loss[loss=0.2426, ctc_loss=0.1117, cr_loss=0.3585, attn_decoder_loss=0.2492, over 27385.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1104, cr_loss=0.3521, attn_decoder_loss=0.2414, over 5768956.01 frames. 
], batch size: 124, lr: 2.23e-03, grad_scale: 32.0 2024-09-20 12:16:12,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=886440.0, ans=0.0 2024-09-20 12:16:18,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=886480.0, ans=0.1 2024-09-20 12:16:25,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=886480.0, ans=0.125 2024-09-20 12:16:39,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=886520.0, ans=0.035 2024-09-20 12:16:44,807 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.71 vs. limit=15.0 2024-09-20 12:16:51,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=886560.0, ans=0.125 2024-09-20 12:17:00,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=886560.0, ans=0.0 2024-09-20 12:17:03,039 INFO [train.py:1198] (0/2) Epoch 49, batch 4450, loss[loss=0.2504, ctc_loss=0.13, cr_loss=0.3695, attn_decoder_loss=0.2556, over 19549.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1134, cr_loss=0.3566, attn_decoder_loss=0.2433, over 5581764.67 frames. 
], batch size: 209, lr: 2.23e-03, grad_scale: 16.0 2024-09-20 12:17:04,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886600.0, ans=0.1 2024-09-20 12:17:16,287 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.166e+01 9.203e+01 9.654e+01 1.067e+02 3.742e+02, threshold=1.931e+02, percent-clipped=2.0 2024-09-20 12:17:26,278 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.37 vs. limit=22.5 2024-09-20 12:17:29,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2024-09-20 12:17:48,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=886720.0, ans=0.125 2024-09-20 12:18:08,343 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=17.02 vs. limit=15.0 2024-09-20 12:18:10,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=886760.0, ans=0.125 2024-09-20 12:18:18,101 INFO [train.py:1198] (0/2) Epoch 49, batch 4500, loss[loss=0.2459, ctc_loss=0.1381, cr_loss=0.3762, attn_decoder_loss=0.2496, over 20530.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1162, cr_loss=0.3597, attn_decoder_loss=0.245, over 5243866.58 frames. 
], batch size: 209, lr: 2.23e-03, grad_scale: 8.0 2024-09-20 12:18:36,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=886840.0, ans=0.125 2024-09-20 12:18:54,879 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-49.pt 2024-09-20 12:19:45,526 INFO [train.py:1198] (0/2) Epoch 50, batch 0, loss[loss=0.2136, ctc_loss=0.09389, cr_loss=0.3176, attn_decoder_loss=0.2198, over 29592.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.09389, cr_loss=0.3176, attn_decoder_loss=0.2198, over 29592.00 frames. ], batch size: 73, lr: 2.21e-03, grad_scale: 16.0 2024-09-20 12:19:45,526 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-20 12:20:03,817 INFO [train.py:1230] (0/2) Epoch 50, validation: loss=0.2133, ctc_loss=0.03558, cr_loss=6.519e-15, attn_decoder_loss=0.2331, over 944034.00 frames. 2024-09-20 12:20:03,818 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-20 12:20:15,075 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.72 vs. limit=12.0 2024-09-20 12:20:40,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=886980.0, ans=0.125 2024-09-20 12:20:50,785 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.79 vs. 
limit=15.0 2024-09-20 12:20:51,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=887020.0, ans=0.125 2024-09-20 12:20:57,069 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.187e+01 1.009e+02 1.098e+02 1.200e+02 1.487e+02, threshold=2.197e+02, percent-clipped=0.0 2024-09-20 12:21:21,355 INFO [train.py:1198] (0/2) Epoch 50, batch 50, loss[loss=0.1985, ctc_loss=0.08559, cr_loss=0.291, attn_decoder_loss=0.2045, over 29413.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1092, cr_loss=0.3487, attn_decoder_loss=0.2376, over 1267418.53 frames. ], batch size: 70, lr: 2.21e-03, grad_scale: 16.0 2024-09-20 12:21:29,943 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.96 vs. limit=15.0 2024-09-20 12:21:51,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=887180.0, ans=0.05 2024-09-20 12:21:57,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.70 vs. 
limit=22.5 2024-09-20 12:22:02,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=887180.0, ans=0.125 2024-09-20 12:22:02,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=887180.0, ans=0.125 2024-09-20 12:22:04,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=887180.0, ans=0.0 2024-09-20 12:22:13,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=887220.0, ans=0.125 2024-09-20 12:22:18,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=12.0 2024-09-20 12:22:20,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=887220.0, ans=15.0 2024-09-20 12:22:22,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=887260.0, ans=0.125 2024-09-20 12:22:23,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=887260.0, ans=0.125 2024-09-20 12:22:33,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=887260.0, ans=0.125 2024-09-20 12:22:37,261 INFO [train.py:1198] (0/2) Epoch 50, batch 100, loss[loss=0.2149, ctc_loss=0.0939, cr_loss=0.3158, attn_decoder_loss=0.2213, over 29543.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1099, cr_loss=0.3513, attn_decoder_loss=0.2392, over 2253526.90 frames. 
], batch size: 76, lr: 2.21e-03, grad_scale: 16.0 2024-09-20 12:22:37,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=887300.0, ans=0.125 2024-09-20 12:23:29,880 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.808e+01 9.273e+01 9.833e+01 1.804e+02, threshold=1.855e+02, percent-clipped=0.0 2024-09-20 12:23:33,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887420.0, ans=0.1 2024-09-20 12:23:36,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=887420.0, ans=0.0 2024-09-20 12:23:53,825 INFO [train.py:1198] (0/2) Epoch 50, batch 150, loss[loss=0.2066, ctc_loss=0.08658, cr_loss=0.2946, attn_decoder_loss=0.2134, over 29439.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1074, cr_loss=0.3465, attn_decoder_loss=0.2368, over 3048310.59 frames. 
], batch size: 70, lr: 2.21e-03, grad_scale: 16.0 2024-09-20 12:24:07,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=887540.0, ans=0.125 2024-09-20 12:24:29,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=887580.0, ans=0.025 2024-09-20 12:24:41,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=887620.0, ans=0.0 2024-09-20 12:24:42,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887620.0, ans=0.1 2024-09-20 12:24:44,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=887620.0, ans=0.125 2024-09-20 12:24:50,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=887620.0, ans=0.0 2024-09-20 12:25:02,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=887660.0, ans=0.125 2024-09-20 12:25:11,629 INFO [train.py:1198] (0/2) Epoch 50, batch 200, loss[loss=0.2393, ctc_loss=0.1096, cr_loss=0.3616, attn_decoder_loss=0.2457, over 27358.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1069, cr_loss=0.3454, attn_decoder_loss=0.2359, over 3660121.18 frames. 
], batch size: 124, lr: 2.21e-03, grad_scale: 16.0 2024-09-20 12:25:19,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=887700.0, ans=0.125 2024-09-20 12:25:19,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=887700.0, ans=10.0 2024-09-20 12:25:25,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=887740.0, ans=0.125 2024-09-20 12:25:27,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887740.0, ans=0.1 2024-09-20 12:25:27,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=887740.0, ans=0.125 2024-09-20 12:25:35,166 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.08 vs. limit=22.5 2024-09-20 12:26:04,372 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.975e+01 8.480e+01 9.009e+01 9.638e+01 2.120e+02, threshold=1.802e+02, percent-clipped=1.0 2024-09-20 12:26:26,848 INFO [train.py:1198] (0/2) Epoch 50, batch 250, loss[loss=0.2331, ctc_loss=0.1026, cr_loss=0.3333, attn_decoder_loss=0.2402, over 29195.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1066, cr_loss=0.3445, attn_decoder_loss=0.2357, over 4141757.61 frames. 
], batch size: 100, lr: 2.21e-03, grad_scale: 8.0 2024-09-20 12:26:34,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=887900.0, ans=0.125 2024-09-20 12:26:50,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=887940.0, ans=0.0 2024-09-20 12:26:50,676 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:27:32,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.20 vs. limit=10.0 2024-09-20 12:27:43,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=888100.0, ans=0.0 2024-09-20 12:27:45,106 INFO [train.py:1198] (0/2) Epoch 50, batch 300, loss[loss=0.2431, ctc_loss=0.1135, cr_loss=0.3725, attn_decoder_loss=0.2492, over 29549.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1063, cr_loss=0.3441, attn_decoder_loss=0.2357, over 4509386.68 frames. 
], batch size: 92, lr: 2.21e-03, grad_scale: 8.0 2024-09-20 12:28:25,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=888180.0, ans=0.125 2024-09-20 12:28:40,139 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 8.884e+01 9.251e+01 9.818e+01 2.212e+02, threshold=1.850e+02, percent-clipped=1.0 2024-09-20 12:28:40,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=888220.0, ans=0.125 2024-09-20 12:28:40,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=888220.0, ans=0.07 2024-09-20 12:29:02,961 INFO [train.py:1198] (0/2) Epoch 50, batch 350, loss[loss=0.2049, ctc_loss=0.08557, cr_loss=0.2965, attn_decoder_loss=0.2116, over 29328.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1067, cr_loss=0.3449, attn_decoder_loss=0.2362, over 4794483.32 frames. ], batch size: 71, lr: 2.21e-03, grad_scale: 8.0 2024-09-20 12:29:58,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=888420.0, ans=0.125 2024-09-20 12:30:17,910 INFO [train.py:1198] (0/2) Epoch 50, batch 400, loss[loss=0.2314, ctc_loss=0.1067, cr_loss=0.3577, attn_decoder_loss=0.2373, over 29712.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1061, cr_loss=0.3437, attn_decoder_loss=0.2358, over 5025051.33 frames. 
], batch size: 82, lr: 2.21e-03, grad_scale: 16.0 2024-09-20 12:30:21,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=888500.0, ans=0.1 2024-09-20 12:30:35,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=888540.0, ans=0.125 2024-09-20 12:30:36,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=888540.0, ans=0.0 2024-09-20 12:30:43,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=888540.0, ans=0.125 2024-09-20 12:31:04,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=888620.0, ans=0.1 2024-09-20 12:31:11,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=888620.0, ans=0.0 2024-09-20 12:31:14,780 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.622e+01 9.023e+01 9.604e+01 1.265e+02, threshold=1.805e+02, percent-clipped=0.0 2024-09-20 12:31:35,882 INFO [train.py:1198] (0/2) Epoch 50, batch 450, loss[loss=0.2442, ctc_loss=0.1196, cr_loss=0.3621, attn_decoder_loss=0.25, over 29684.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1067, cr_loss=0.3444, attn_decoder_loss=0.2361, over 5188054.26 frames. 
], batch size: 83, lr: 2.21e-03, grad_scale: 8.0 2024-09-20 12:31:36,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=888700.0, ans=0.1 2024-09-20 12:32:15,177 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=888780.0, ans=0.0 2024-09-20 12:32:31,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=888820.0, ans=0.125 2024-09-20 12:32:33,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=888820.0, ans=0.125 2024-09-20 12:32:45,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=888860.0, ans=0.2 2024-09-20 12:32:51,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=888860.0, ans=0.1 2024-09-20 12:32:54,119 INFO [train.py:1198] (0/2) Epoch 50, batch 500, loss[loss=0.2391, ctc_loss=0.1108, cr_loss=0.3576, attn_decoder_loss=0.2454, over 29454.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1064, cr_loss=0.3434, attn_decoder_loss=0.2354, over 5330500.17 frames. ], batch size: 94, lr: 2.21e-03, grad_scale: 8.0 2024-09-20 12:32:55,917 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=888900.0, ans=0.0 2024-09-20 12:33:11,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. 
limit=15.0 2024-09-20 12:33:21,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=888940.0, ans=0.0 2024-09-20 12:33:36,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=888980.0, ans=0.125 2024-09-20 12:33:48,196 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.714e+01 9.199e+01 9.608e+01 6.151e+02, threshold=1.840e+02, percent-clipped=1.0 2024-09-20 12:34:09,225 INFO [train.py:1198] (0/2) Epoch 50, batch 550, loss[loss=0.2401, ctc_loss=0.1088, cr_loss=0.3283, attn_decoder_loss=0.2474, over 28801.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1064, cr_loss=0.3437, attn_decoder_loss=0.2357, over 5423083.85 frames. ], batch size: 104, lr: 2.21e-03, grad_scale: 8.0 2024-09-20 12:34:17,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=889100.0, ans=0.125 2024-09-20 12:34:20,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=889100.0, ans=0.0 2024-09-20 12:34:42,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=889180.0, ans=0.0 2024-09-20 12:34:43,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=889180.0, ans=0.125 2024-09-20 12:34:52,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=889180.0, ans=0.2 2024-09-20 12:35:19,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=889260.0, ans=0.125 2024-09-20 12:35:27,190 INFO [train.py:1198] (0/2) Epoch 50, batch 600, loss[loss=0.2503, ctc_loss=0.1228, cr_loss=0.3779, attn_decoder_loss=0.2561, over 29242.00 
frames. ], tot_loss[loss=0.23, ctc_loss=0.1066, cr_loss=0.3447, attn_decoder_loss=0.236, over 5508142.50 frames. ], batch size: 100, lr: 2.21e-03, grad_scale: 8.0 2024-09-20 12:35:39,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=889300.0, ans=0.07 2024-09-20 12:35:48,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=889340.0, ans=0.025 2024-09-20 12:36:10,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=889380.0, ans=0.2 2024-09-20 12:36:23,308 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.988e+01 8.691e+01 9.098e+01 9.563e+01 1.951e+02, threshold=1.820e+02, percent-clipped=1.0 2024-09-20 12:36:23,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=889420.0, ans=0.2 2024-09-20 12:36:25,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=889420.0, ans=0.125 2024-09-20 12:36:44,428 INFO [train.py:1198] (0/2) Epoch 50, batch 650, loss[loss=0.2289, ctc_loss=0.1022, cr_loss=0.335, attn_decoder_loss=0.2356, over 29762.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.106, cr_loss=0.3433, attn_decoder_loss=0.2353, over 5586809.37 frames. ], batch size: 81, lr: 2.21e-03, grad_scale: 8.0 2024-09-20 12:37:02,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=889540.0, ans=0.2 2024-09-20 12:37:59,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=889700.0, ans=0.1 2024-09-20 12:38:00,257 INFO [train.py:1198] (0/2) Epoch 50, batch 700, loss[loss=0.2291, ctc_loss=0.1129, cr_loss=0.3417, attn_decoder_loss=0.2345, over 29542.00 frames. 
], tot_loss[loss=0.2298, ctc_loss=0.1063, cr_loss=0.3439, attn_decoder_loss=0.2359, over 5636366.11 frames. ], batch size: 76, lr: 2.21e-03, grad_scale: 8.0 2024-09-20 12:38:09,562 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:38:28,123 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.06 vs. limit=15.0 2024-09-20 12:38:43,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0 2024-09-20 12:38:54,486 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.77 vs. limit=15.0 2024-09-20 12:38:56,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=889820.0, ans=22.5 2024-09-20 12:38:56,770 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.634e+01 9.067e+01 9.623e+01 1.303e+02, threshold=1.813e+02, percent-clipped=0.0 2024-09-20 12:38:58,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=889820.0, ans=0.125 2024-09-20 12:39:17,925 INFO [train.py:1198] (0/2) Epoch 50, batch 750, loss[loss=0.2302, ctc_loss=0.1107, cr_loss=0.362, attn_decoder_loss=0.2354, over 29722.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1061, cr_loss=0.3437, attn_decoder_loss=0.2355, over 5674027.81 frames. 
], batch size: 82, lr: 2.20e-03, grad_scale: 8.0 2024-09-20 12:39:19,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=889900.0, ans=0.0 2024-09-20 12:39:30,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=889900.0, ans=0.025 2024-09-20 12:39:31,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=889940.0, ans=0.125 2024-09-20 12:39:33,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=889940.0, ans=0.125 2024-09-20 12:40:07,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=890020.0, ans=0.015 2024-09-20 12:40:16,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=890020.0, ans=0.0 2024-09-20 12:40:23,702 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:40:29,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=890060.0, ans=0.1 2024-09-20 12:40:35,475 INFO [train.py:1198] (0/2) Epoch 50, batch 800, loss[loss=0.2116, ctc_loss=0.09415, cr_loss=0.3078, attn_decoder_loss=0.2178, over 29640.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1065, cr_loss=0.3444, attn_decoder_loss=0.2359, over 5706395.14 frames. 
], batch size: 73, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 12:40:43,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=890100.0, ans=0.0 2024-09-20 12:41:13,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=890180.0, ans=0.125 2024-09-20 12:41:14,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=890180.0, ans=0.0 2024-09-20 12:41:20,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=890220.0, ans=0.2 2024-09-20 12:41:23,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-09-20 12:41:29,705 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.785e+01 9.269e+01 9.766e+01 2.898e+02, threshold=1.854e+02, percent-clipped=1.0 2024-09-20 12:41:46,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=890260.0, ans=0.125 2024-09-20 12:41:50,577 INFO [train.py:1198] (0/2) Epoch 50, batch 850, loss[loss=0.2388, ctc_loss=0.108, cr_loss=0.3564, attn_decoder_loss=0.2454, over 29749.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1061, cr_loss=0.3433, attn_decoder_loss=0.2354, over 5734323.32 frames. 
], batch size: 89, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 12:41:58,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=890300.0, ans=0.1 2024-09-20 12:42:14,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=890340.0, ans=0.5 2024-09-20 12:42:17,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=890340.0, ans=0.125 2024-09-20 12:42:27,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890380.0, ans=0.1 2024-09-20 12:42:34,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=890420.0, ans=0.1 2024-09-20 12:42:46,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=890420.0, ans=0.07 2024-09-20 12:43:05,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=890460.0, ans=0.125 2024-09-20 12:43:08,696 INFO [train.py:1198] (0/2) Epoch 50, batch 900, loss[loss=0.2073, ctc_loss=0.08363, cr_loss=0.2979, attn_decoder_loss=0.2144, over 29621.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1064, cr_loss=0.3439, attn_decoder_loss=0.2358, over 5740005.92 frames. 
], batch size: 73, lr: 2.20e-03, grad_scale: 8.0 2024-09-20 12:43:25,582 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:43:45,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=890580.0, ans=0.95 2024-09-20 12:44:05,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=890620.0, ans=0.125 2024-09-20 12:44:06,530 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 8.704e+01 9.222e+01 9.610e+01 2.090e+02, threshold=1.844e+02, percent-clipped=2.0 2024-09-20 12:44:11,385 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:44:25,910 INFO [train.py:1198] (0/2) Epoch 50, batch 950, loss[loss=0.2146, ctc_loss=0.0931, cr_loss=0.3242, attn_decoder_loss=0.2209, over 29516.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1063, cr_loss=0.3437, attn_decoder_loss=0.236, over 5743418.77 frames. 
], batch size: 74, lr: 2.20e-03, grad_scale: 8.0 2024-09-20 12:44:29,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=890700.0, ans=0.1 2024-09-20 12:44:32,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=890700.0, ans=0.125 2024-09-20 12:45:04,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=890780.0, ans=0.0 2024-09-20 12:45:08,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=890780.0, ans=0.125 2024-09-20 12:45:30,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=890860.0, ans=0.05 2024-09-20 12:45:39,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=890900.0, ans=0.125 2024-09-20 12:45:41,064 INFO [train.py:1198] (0/2) Epoch 50, batch 1000, loss[loss=0.2195, ctc_loss=0.09449, cr_loss=0.3107, attn_decoder_loss=0.2264, over 29517.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1069, cr_loss=0.3446, attn_decoder_loss=0.2364, over 5737991.09 frames. 
], batch size: 77, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 12:46:11,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=890980.0, ans=0.125
2024-09-20 12:46:13,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=890980.0, ans=0.025
2024-09-20 12:46:21,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=890980.0, ans=0.125
2024-09-20 12:46:32,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.08 vs. limit=15.0
2024-09-20 12:46:38,847 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.665e+01 9.186e+01 9.812e+01 1.609e+02, threshold=1.837e+02, percent-clipped=0.0
2024-09-20 12:46:48,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891060.0, ans=0.1
2024-09-20 12:46:56,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.61 vs. limit=12.0
2024-09-20 12:46:58,380 INFO [train.py:1198] (0/2) Epoch 50, batch 1050, loss[loss=0.2325, ctc_loss=0.1053, cr_loss=0.3318, attn_decoder_loss=0.2393, over 29662.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1063, cr_loss=0.3431, attn_decoder_loss=0.2357, over 5745560.16 frames. ], batch size: 85, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 12:47:06,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_na.min_abs, batch_count=891100.0, ans=0.02
2024-09-20 12:47:31,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=891180.0, ans=0.125
2024-09-20 12:47:52,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=891220.0, ans=0.2
2024-09-20 12:48:09,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=891260.0, ans=0.125
2024-09-20 12:48:16,840 INFO [train.py:1198] (0/2) Epoch 50, batch 1100, loss[loss=0.2313, ctc_loss=0.1063, cr_loss=0.3544, attn_decoder_loss=0.2373, over 29411.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1061, cr_loss=0.3427, attn_decoder_loss=0.2354, over 5756472.67 frames. ], batch size: 78, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 12:48:52,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=891380.0, ans=0.125
2024-09-20 12:48:53,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=891380.0, ans=0.1
2024-09-20 12:49:02,600 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=891420.0, ans=0.0
2024-09-20 12:49:12,783 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.673e+01 9.174e+01 9.700e+01 1.224e+02, threshold=1.835e+02, percent-clipped=0.0
2024-09-20 12:49:32,504 INFO [train.py:1198] (0/2) Epoch 50, batch 1150, loss[loss=0.2271, ctc_loss=0.1079, cr_loss=0.3499, attn_decoder_loss=0.2326, over 29459.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1064, cr_loss=0.3432, attn_decoder_loss=0.2356, over 5753371.39 frames. ], batch size: 78, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 12:49:38,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=891500.0, ans=0.1
2024-09-20 12:49:50,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=891540.0, ans=0.125
2024-09-20 12:49:51,722 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0
2024-09-20 12:50:37,124 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 12:50:42,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=891660.0, ans=0.2
2024-09-20 12:50:49,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.84 vs. limit=15.0
2024-09-20 12:50:50,190 INFO [train.py:1198] (0/2) Epoch 50, batch 1200, loss[loss=0.2226, ctc_loss=0.09905, cr_loss=0.3275, attn_decoder_loss=0.2291, over 29702.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1069, cr_loss=0.3444, attn_decoder_loss=0.2363, over 5745543.74 frames. ], batch size: 85, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:51:10,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=891740.0, ans=0.04949747468305833
2024-09-20 12:51:48,362 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.827e+01 9.334e+01 1.012e+02 2.490e+02, threshold=1.867e+02, percent-clipped=1.0
2024-09-20 12:52:07,778 INFO [train.py:1198] (0/2) Epoch 50, batch 1250, loss[loss=0.2376, ctc_loss=0.1067, cr_loss=0.3415, attn_decoder_loss=0.2446, over 29515.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1073, cr_loss=0.3455, attn_decoder_loss=0.2368, over 5773771.06 frames. ], batch size: 92, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:52:34,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=891940.0, ans=15.0
2024-09-20 12:53:06,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=892020.0, ans=0.0
2024-09-20 12:53:21,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=892060.0, ans=0.0
2024-09-20 12:53:24,061 INFO [train.py:1198] (0/2) Epoch 50, batch 1300, loss[loss=0.2424, ctc_loss=0.1032, cr_loss=0.3353, attn_decoder_loss=0.2504, over 28389.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1071, cr_loss=0.3452, attn_decoder_loss=0.2365, over 5779069.98 frames. ], batch size: 111, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:53:40,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=892140.0, ans=0.025
2024-09-20 12:54:02,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=892180.0, ans=0.125
2024-09-20 12:54:02,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=892180.0, ans=0.0
2024-09-20 12:54:18,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=892220.0, ans=0.07
2024-09-20 12:54:19,788 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.576e+01 8.998e+01 9.559e+01 1.394e+02, threshold=1.800e+02, percent-clipped=0.0
2024-09-20 12:54:28,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=892260.0, ans=0.125
2024-09-20 12:54:41,684 INFO [train.py:1198] (0/2) Epoch 50, batch 1350, loss[loss=0.2352, ctc_loss=0.1143, cr_loss=0.367, attn_decoder_loss=0.2405, over 29762.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1067, cr_loss=0.3444, attn_decoder_loss=0.2363, over 5795597.22 frames. ], batch size: 81, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:54:49,368 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 12:54:52,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=892300.0, ans=0.0
2024-09-20 12:55:00,611 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.46 vs. limit=5.0
2024-09-20 12:55:07,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=892340.0, ans=0.125
2024-09-20 12:55:12,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=892380.0, ans=0.0
2024-09-20 12:55:35,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=892420.0, ans=0.125
2024-09-20 12:55:47,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=892460.0, ans=0.125
2024-09-20 12:55:58,268 INFO [train.py:1198] (0/2) Epoch 50, batch 1400, loss[loss=0.212, ctc_loss=0.09596, cr_loss=0.314, attn_decoder_loss=0.2179, over 29574.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1064, cr_loss=0.344, attn_decoder_loss=0.2361, over 5806865.35 frames. ], batch size: 69, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:56:24,647 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0
2024-09-20 12:56:40,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=892580.0, ans=0.025
2024-09-20 12:56:48,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=892620.0, ans=0.0
2024-09-20 12:56:53,531 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 8.988e+01 9.426e+01 9.888e+01 1.632e+02, threshold=1.885e+02, percent-clipped=0.0
2024-09-20 12:56:55,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=892620.0, ans=0.125
2024-09-20 12:56:56,026 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.60 vs. limit=15.0
2024-09-20 12:56:58,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=892660.0, ans=0.125
2024-09-20 12:57:13,308 INFO [train.py:1198] (0/2) Epoch 50, batch 1450, loss[loss=0.2485, ctc_loss=0.1268, cr_loss=0.3835, attn_decoder_loss=0.2535, over 29420.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1065, cr_loss=0.3441, attn_decoder_loss=0.2366, over 5803597.61 frames. ], batch size: 94, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:57:34,451 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 12:57:34,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5
2024-09-20 12:57:38,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=892740.0, ans=0.0
2024-09-20 12:57:46,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=892780.0, ans=0.125
2024-09-20 12:58:30,655 INFO [train.py:1198] (0/2) Epoch 50, batch 1500, loss[loss=0.231, ctc_loss=0.09964, cr_loss=0.3205, attn_decoder_loss=0.2385, over 29617.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1065, cr_loss=0.3438, attn_decoder_loss=0.2367, over 5804988.70 frames. ], batch size: 86, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 12:58:56,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=892940.0, ans=0.0
2024-09-20 12:58:59,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=892980.0, ans=0.125
2024-09-20 12:59:28,764 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.732e+01 9.209e+01 9.748e+01 2.356e+02, threshold=1.842e+02, percent-clipped=2.0
2024-09-20 12:59:29,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=893020.0, ans=0.125
2024-09-20 12:59:48,133 INFO [train.py:1198] (0/2) Epoch 50, batch 1550, loss[loss=0.2533, ctc_loss=0.1272, cr_loss=0.3946, attn_decoder_loss=0.2586, over 29542.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1069, cr_loss=0.3446, attn_decoder_loss=0.2368, over 5780405.86 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:00:18,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=893180.0, ans=0.95
2024-09-20 13:00:30,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=893180.0, ans=0.125
2024-09-20 13:00:58,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=893260.0, ans=0.2
2024-09-20 13:01:02,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=22.5
2024-09-20 13:01:02,885 INFO [train.py:1198] (0/2) Epoch 50, batch 1600, loss[loss=0.2453, ctc_loss=0.122, cr_loss=0.3814, attn_decoder_loss=0.2505, over 29666.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.107, cr_loss=0.345, attn_decoder_loss=0.2364, over 5762558.53 frames. ], batch size: 85, lr: 2.20e-03, grad_scale: 32.0
2024-09-20 13:01:20,588 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=22.5
2024-09-20 13:01:35,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.09 vs. limit=22.5
2024-09-20 13:01:36,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=893380.0, ans=0.125
2024-09-20 13:01:36,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=893380.0, ans=0.0
2024-09-20 13:01:41,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=893380.0, ans=0.0
2024-09-20 13:01:48,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=893420.0, ans=0.125
2024-09-20 13:02:00,397 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.886e+01 8.710e+01 9.146e+01 9.958e+01 1.437e+02, threshold=1.829e+02, percent-clipped=0.0
2024-09-20 13:02:06,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=893460.0, ans=0.2
2024-09-20 13:02:16,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=893460.0, ans=0.125
2024-09-20 13:02:17,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=893460.0, ans=0.0
2024-09-20 13:02:18,443 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.44 vs. limit=15.0
2024-09-20 13:02:20,546 INFO [train.py:1198] (0/2) Epoch 50, batch 1650, loss[loss=0.2368, ctc_loss=0.1105, cr_loss=0.3603, attn_decoder_loss=0.2428, over 29705.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1068, cr_loss=0.3445, attn_decoder_loss=0.2363, over 5757343.48 frames. ], batch size: 89, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:02:28,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=893500.0, ans=0.125
2024-09-20 13:02:29,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=893500.0, ans=0.125
2024-09-20 13:02:34,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=893540.0, ans=0.1
2024-09-20 13:02:37,937 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.68 vs. limit=12.0
2024-09-20 13:02:43,773 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.55 vs. limit=15.0
2024-09-20 13:02:59,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=893580.0, ans=0.125
2024-09-20 13:03:03,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=893580.0, ans=0.2
2024-09-20 13:03:32,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=893660.0, ans=0.0
2024-09-20 13:03:37,898 INFO [train.py:1198] (0/2) Epoch 50, batch 1700, loss[loss=0.198, ctc_loss=0.08267, cr_loss=0.2941, attn_decoder_loss=0.2042, over 29567.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1064, cr_loss=0.3439, attn_decoder_loss=0.2361, over 5779734.91 frames. ], batch size: 69, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:03:38,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=893700.0, ans=0.025
2024-09-20 13:03:57,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_positive, batch_count=893740.0, ans=0.05
2024-09-20 13:04:09,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=893780.0, ans=0.025
2024-09-20 13:04:17,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=893780.0, ans=0.125
2024-09-20 13:04:21,975 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=893820.0, ans=0.2
2024-09-20 13:04:29,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=893820.0, ans=0.1
2024-09-20 13:04:35,201 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.624e+01 8.662e+01 9.260e+01 9.838e+01 1.206e+02, threshold=1.852e+02, percent-clipped=0.0
2024-09-20 13:04:38,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=893860.0, ans=0.2
2024-09-20 13:04:53,291 INFO [train.py:1198] (0/2) Epoch 50, batch 1750, loss[loss=0.212, ctc_loss=0.1023, cr_loss=0.3357, attn_decoder_loss=0.2167, over 29384.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1061, cr_loss=0.3438, attn_decoder_loss=0.2357, over 5787766.13 frames. ], batch size: 67, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:05:02,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=893900.0, ans=0.1
2024-09-20 13:05:10,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=893940.0, ans=0.125
2024-09-20 13:05:20,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=893940.0, ans=0.2
2024-09-20 13:05:55,343 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 13:05:58,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=894060.0, ans=0.1
2024-09-20 13:06:04,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=894060.0, ans=0.125
2024-09-20 13:06:08,369 INFO [train.py:1198] (0/2) Epoch 50, batch 1800, loss[loss=0.2331, ctc_loss=0.1071, cr_loss=0.3364, attn_decoder_loss=0.2397, over 29693.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1067, cr_loss=0.3442, attn_decoder_loss=0.2363, over 5790559.24 frames. ], batch size: 83, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:06:18,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=894100.0, ans=0.125
2024-09-20 13:06:23,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=894100.0, ans=0.125
2024-09-20 13:06:23,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0
2024-09-20 13:06:28,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=894140.0, ans=0.0
2024-09-20 13:06:36,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0
2024-09-20 13:06:43,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=894180.0, ans=0.0
2024-09-20 13:07:10,022 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 8.701e+01 9.171e+01 9.771e+01 2.069e+02, threshold=1.834e+02, percent-clipped=1.0
2024-09-20 13:07:16,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=894260.0, ans=0.125
2024-09-20 13:07:20,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=894260.0, ans=0.025
2024-09-20 13:07:23,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=894260.0, ans=0.2
2024-09-20 13:07:28,093 INFO [train.py:1198] (0/2) Epoch 50, batch 1850, loss[loss=0.2441, ctc_loss=0.1092, cr_loss=0.3476, attn_decoder_loss=0.2513, over 29614.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1062, cr_loss=0.3433, attn_decoder_loss=0.2359, over 5796216.29 frames. ], batch size: 86, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:07:49,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=894340.0, ans=0.1
2024-09-20 13:07:52,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=894340.0, ans=10.0
2024-09-20 13:07:54,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=894340.0, ans=0.125
2024-09-20 13:08:27,734 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.25 vs. limit=22.5
2024-09-20 13:08:34,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=894460.0, ans=0.125
2024-09-20 13:08:43,298 INFO [train.py:1198] (0/2) Epoch 50, batch 1900, loss[loss=0.2478, ctc_loss=0.119, cr_loss=0.3899, attn_decoder_loss=0.2535, over 29694.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1066, cr_loss=0.3448, attn_decoder_loss=0.2365, over 5805273.08 frames. ], batch size: 89, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:08:51,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=894500.0, ans=0.125
2024-09-20 13:09:06,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.43 vs. limit=12.0
2024-09-20 13:09:13,040 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.59 vs. limit=6.0
2024-09-20 13:09:15,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=894580.0, ans=0.0
2024-09-20 13:09:32,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=894620.0, ans=0.07
2024-09-20 13:09:37,552 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0
2024-09-20 13:09:42,597 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.927e+01 9.438e+01 9.973e+01 1.317e+02, threshold=1.888e+02, percent-clipped=0.0
2024-09-20 13:09:53,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=894660.0, ans=0.125
2024-09-20 13:09:59,249 INFO [train.py:1198] (0/2) Epoch 50, batch 1950, loss[loss=0.2243, ctc_loss=0.1037, cr_loss=0.3408, attn_decoder_loss=0.2301, over 29445.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1073, cr_loss=0.3467, attn_decoder_loss=0.2374, over 5819907.47 frames. ], batch size: 78, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 13:10:10,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=894700.0, ans=0.125
2024-09-20 13:10:14,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.62 vs. limit=15.0
2024-09-20 13:10:28,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=894740.0, ans=0.1
2024-09-20 13:10:28,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0
2024-09-20 13:10:29,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0
2024-09-20 13:10:47,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=894820.0, ans=0.125
2024-09-20 13:10:51,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=894820.0, ans=0.1
2024-09-20 13:11:13,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=894860.0, ans=0.125
2024-09-20 13:11:15,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=894860.0, ans=0.025
2024-09-20 13:11:18,240 INFO [train.py:1198] (0/2) Epoch 50, batch 2000, loss[loss=0.2082, ctc_loss=0.09163, cr_loss=0.3084, attn_decoder_loss=0.2143, over 29349.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1078, cr_loss=0.3472, attn_decoder_loss=0.238, over 5797761.53 frames. ], batch size: 67, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:11:26,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=894900.0, ans=0.125
2024-09-20 13:11:45,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=894940.0, ans=0.2
2024-09-20 13:11:59,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=894980.0, ans=0.125
2024-09-20 13:12:17,047 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.718e+01 9.150e+01 9.752e+01 2.823e+02, threshold=1.830e+02, percent-clipped=2.0
2024-09-20 13:12:17,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=895060.0, ans=0.125
2024-09-20 13:12:26,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=895060.0, ans=0.1
2024-09-20 13:12:34,065 INFO [train.py:1198] (0/2) Epoch 50, batch 2050, loss[loss=0.206, ctc_loss=0.08871, cr_loss=0.2974, attn_decoder_loss=0.2124, over 29435.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1068, cr_loss=0.345, attn_decoder_loss=0.2368, over 5789649.11 frames. ], batch size: 70, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:12:37,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=895100.0, ans=0.0
2024-09-20 13:12:37,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=895100.0, ans=0.125
2024-09-20 13:12:38,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=895100.0, ans=0.125
2024-09-20 13:12:39,380 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0
2024-09-20 13:12:47,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=895140.0, ans=0.125
2024-09-20 13:12:52,959 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.79 vs. limit=15.0
2024-09-20 13:13:01,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=895140.0, ans=0.125
2024-09-20 13:13:18,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=895220.0, ans=0.125
2024-09-20 13:13:34,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=895260.0, ans=0.0
2024-09-20 13:13:46,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=895260.0, ans=0.2
2024-09-20 13:13:49,482 INFO [train.py:1198] (0/2) Epoch 50, batch 2100, loss[loss=0.2295, ctc_loss=0.1136, cr_loss=0.361, attn_decoder_loss=0.2344, over 29789.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1063, cr_loss=0.344, attn_decoder_loss=0.2363, over 5800109.06 frames. ], batch size: 81, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:13:55,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=895300.0, ans=0.1
2024-09-20 13:13:55,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=895300.0, ans=0.0
2024-09-20 13:14:08,383 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-20 13:14:17,968 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=22.5
2024-09-20 13:14:51,864 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.617e+01 9.072e+01 9.604e+01 1.170e+02, threshold=1.814e+02, percent-clipped=0.0
2024-09-20 13:15:02,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=895460.0, ans=0.2
2024-09-20 13:15:05,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=895460.0, ans=0.0
2024-09-20 13:15:07,329 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 13:15:08,554 INFO [train.py:1198] (0/2) Epoch 50, batch 2150, loss[loss=0.2301, ctc_loss=0.1066, cr_loss=0.3479, attn_decoder_loss=0.2361, over 29422.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1059, cr_loss=0.3435, attn_decoder_loss=0.2357, over 5815808.45 frames. ], batch size: 78, lr: 2.20e-03, grad_scale: 16.0
2024-09-20 13:15:16,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=895500.0, ans=0.0
2024-09-20 13:15:34,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0
2024-09-20 13:15:46,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=895580.0, ans=0.2
2024-09-20 13:16:03,926 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0
2024-09-20 13:16:15,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=895660.0, ans=0.125
2024-09-20 13:16:23,847 INFO [train.py:1198] (0/2) Epoch 50, batch 2200, loss[loss=0.2342, ctc_loss=0.1123, cr_loss=0.3471, attn_decoder_loss=0.24, over 29659.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1059, cr_loss=0.3431, attn_decoder_loss=0.2356, over 5813062.70 frames. ], batch size: 86, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 13:16:24,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.66 vs. limit=6.0
2024-09-20 13:16:44,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=895740.0, ans=0.125
2024-09-20 13:16:46,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=895740.0, ans=0.5
2024-09-20 13:16:52,479 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 13:17:03,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=895780.0, ans=0.125
2024-09-20 13:17:12,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0
2024-09-20 13:17:18,342 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 13:17:23,882 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.920e+01 8.609e+01 9.131e+01 9.597e+01 2.793e+02, threshold=1.826e+02, percent-clipped=2.0
2024-09-20 13:17:31,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=22.5
2024-09-20 13:17:31,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=22.5
2024-09-20 13:17:38,980 INFO [train.py:1198] (0/2) Epoch 50, batch 2250, loss[loss=0.2414, ctc_loss=0.1102, cr_loss=0.3521, attn_decoder_loss=0.2481, over 29712.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.106, cr_loss=0.3435, attn_decoder_loss=0.2357, over 5812161.97 frames. ], batch size: 82, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 13:17:39,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=895900.0, ans=0.0
2024-09-20 13:17:47,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.13 vs. limit=15.0
2024-09-20 13:18:17,387 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/checkpoint-224000.pt
2024-09-20 13:18:34,605 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=15.0
2024-09-20 13:18:55,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896060.0, ans=0.1
2024-09-20 13:19:05,562 INFO [train.py:1198] (0/2) Epoch 50, batch 2300, loss[loss=0.2016, ctc_loss=0.08571, cr_loss=0.3127, attn_decoder_loss=0.2075, over 29340.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1055, cr_loss=0.3424, attn_decoder_loss=0.2347, over 5800645.12 frames. ], batch size: 71, lr: 2.20e-03, grad_scale: 8.0
2024-09-20 13:19:10,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=896100.0, ans=0.125
2024-09-20 13:19:13,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=896100.0, ans=0.1
2024-09-20 13:19:18,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=896140.0, ans=0.0
2024-09-20 13:19:50,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=896220.0, ans=0.0
2024-09-20 13:19:58,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=896220.0, ans=0.125
2024-09-20 13:20:01,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=896220.0, ans=0.125
2024-09-20 13:20:05,620 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.477e+01 9.129e+01 9.785e+01 2.320e+02, threshold=1.826e+02, percent-clipped=1.0
2024-09-20 13:20:06,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.58 vs. limit=15.0
2024-09-20 13:20:20,684 INFO [train.py:1198] (0/2) Epoch 50, batch 2350, loss[loss=0.2391, ctc_loss=0.111, cr_loss=0.3657, attn_decoder_loss=0.2452, over 29683.00 frames. ], tot_loss[loss=0.2287, ctc_loss=0.1056, cr_loss=0.3426, attn_decoder_loss=0.2348, over 5805896.02 frames.
], batch size: 83, lr: 2.20e-03, grad_scale: 8.0 2024-09-20 13:20:26,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=896300.0, ans=0.125 2024-09-20 13:20:29,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=896300.0, ans=0.2 2024-09-20 13:20:44,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=896340.0, ans=0.125 2024-09-20 13:21:11,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=896420.0, ans=0.0 2024-09-20 13:21:33,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=896460.0, ans=0.125 2024-09-20 13:21:36,032 INFO [train.py:1198] (0/2) Epoch 50, batch 2400, loss[loss=0.2247, ctc_loss=0.1089, cr_loss=0.3538, attn_decoder_loss=0.2297, over 29538.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1064, cr_loss=0.3448, attn_decoder_loss=0.2355, over 5809543.85 frames. 
], batch size: 76, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:21:37,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896500.0, ans=0.1 2024-09-20 13:22:22,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=896580.0, ans=0.125 2024-09-20 13:22:25,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=896620.0, ans=0.1 2024-09-20 13:22:26,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=896620.0, ans=0.0 2024-09-20 13:22:40,129 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 8.885e+01 9.266e+01 9.766e+01 1.218e+02, threshold=1.853e+02, percent-clipped=0.0 2024-09-20 13:22:43,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=896660.0, ans=0.125 2024-09-20 13:22:55,251 INFO [train.py:1198] (0/2) Epoch 50, batch 2450, loss[loss=0.23, ctc_loss=0.1059, cr_loss=0.3485, attn_decoder_loss=0.2361, over 29716.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1071, cr_loss=0.3456, attn_decoder_loss=0.2365, over 5785629.85 frames. 
], batch size: 82, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:22:59,873 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=896700.0, ans=0.2 2024-09-20 13:23:04,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=896700.0, ans=0.035 2024-09-20 13:23:04,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=896700.0, ans=0.1 2024-09-20 13:23:05,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=896700.0, ans=0.1 2024-09-20 13:23:13,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=896740.0, ans=0.0 2024-09-20 13:23:26,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=896780.0, ans=0.125 2024-09-20 13:23:58,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896860.0, ans=0.1 2024-09-20 13:24:10,523 INFO [train.py:1198] (0/2) Epoch 50, batch 2500, loss[loss=0.2375, ctc_loss=0.1108, cr_loss=0.3726, attn_decoder_loss=0.2433, over 29623.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1073, cr_loss=0.3462, attn_decoder_loss=0.2364, over 5795594.10 frames. 
], batch size: 86, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:24:19,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=896900.0, ans=0.025 2024-09-20 13:24:48,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=896980.0, ans=0.125 2024-09-20 13:24:48,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-09-20 13:25:00,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=897020.0, ans=0.035 2024-09-20 13:25:10,796 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.696e+01 9.094e+01 9.539e+01 5.829e+02, threshold=1.819e+02, percent-clipped=1.0 2024-09-20 13:25:25,882 INFO [train.py:1198] (0/2) Epoch 50, batch 2550, loss[loss=0.2094, ctc_loss=0.09646, cr_loss=0.3209, attn_decoder_loss=0.2148, over 29338.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1072, cr_loss=0.3461, attn_decoder_loss=0.2363, over 5799225.30 frames. ], batch size: 67, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:25:33,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=897100.0, ans=0.0 2024-09-20 13:25:47,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=897140.0, ans=0.2 2024-09-20 13:25:55,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=897140.0, ans=0.125 2024-09-20 13:26:31,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.88 vs. 
limit=22.5 2024-09-20 13:26:45,538 INFO [train.py:1198] (0/2) Epoch 50, batch 2600, loss[loss=0.228, ctc_loss=0.1066, cr_loss=0.3502, attn_decoder_loss=0.2337, over 29463.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.107, cr_loss=0.3455, attn_decoder_loss=0.2366, over 5794751.67 frames. ], batch size: 78, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:27:35,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=897420.0, ans=0.0 2024-09-20 13:27:45,538 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.589e+01 9.158e+01 9.861e+01 1.661e+02, threshold=1.832e+02, percent-clipped=0.0 2024-09-20 13:27:54,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=897460.0, ans=0.0 2024-09-20 13:28:00,332 INFO [train.py:1198] (0/2) Epoch 50, batch 2650, loss[loss=0.245, ctc_loss=0.1208, cr_loss=0.3738, attn_decoder_loss=0.2505, over 29232.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1075, cr_loss=0.3466, attn_decoder_loss=0.2372, over 5802545.07 frames. ], batch size: 100, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:28:00,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=897500.0, ans=0.0 2024-09-20 13:28:00,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=897500.0, ans=0.125 2024-09-20 13:28:13,250 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.19 vs. 
limit=22.5 2024-09-20 13:28:14,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=897540.0, ans=0.125 2024-09-20 13:28:16,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.16 vs. limit=15.0 2024-09-20 13:28:32,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=897580.0, ans=0.2 2024-09-20 13:28:33,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=897580.0, ans=0.025 2024-09-20 13:28:36,062 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.74 vs. limit=10.0 2024-09-20 13:28:44,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=897620.0, ans=0.125 2024-09-20 13:29:05,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=897660.0, ans=0.125 2024-09-20 13:29:15,401 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.14 vs. limit=12.0 2024-09-20 13:29:15,764 INFO [train.py:1198] (0/2) Epoch 50, batch 2700, loss[loss=0.2411, ctc_loss=0.1139, cr_loss=0.3663, attn_decoder_loss=0.247, over 29532.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1076, cr_loss=0.347, attn_decoder_loss=0.2373, over 5797877.90 frames. 
], batch size: 87, lr: 2.20e-03, grad_scale: 16.0 2024-09-20 13:29:28,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=897700.0, ans=0.125 2024-09-20 13:29:38,450 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:29:50,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=897780.0, ans=0.025 2024-09-20 13:29:52,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=897780.0, ans=0.0 2024-09-20 13:30:00,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=897780.0, ans=0.125 2024-09-20 13:30:06,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=897820.0, ans=0.0 2024-09-20 13:30:12,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=897820.0, ans=0.025 2024-09-20 13:30:19,834 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.670e+01 9.155e+01 9.600e+01 1.586e+02, threshold=1.831e+02, percent-clipped=0.0 2024-09-20 13:30:26,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.72 vs. limit=12.0 2024-09-20 13:30:32,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=897860.0, ans=0.2 2024-09-20 13:30:35,022 INFO [train.py:1198] (0/2) Epoch 50, batch 2750, loss[loss=0.2184, ctc_loss=0.09926, cr_loss=0.3309, attn_decoder_loss=0.2243, over 29526.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1067, cr_loss=0.3451, attn_decoder_loss=0.236, over 5796275.49 frames. 
], batch size: 75, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:30:46,400 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.54 vs. limit=15.0 2024-09-20 13:30:50,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=897940.0, ans=0.125 2024-09-20 13:31:02,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=897940.0, ans=0.125 2024-09-20 13:31:21,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=898020.0, ans=0.0 2024-09-20 13:31:40,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=898060.0, ans=0.125 2024-09-20 13:31:41,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=898060.0, ans=0.2 2024-09-20 13:31:47,867 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898060.0, ans=0.1 2024-09-20 13:31:50,612 INFO [train.py:1198] (0/2) Epoch 50, batch 2800, loss[loss=0.2567, ctc_loss=0.1332, cr_loss=0.3895, attn_decoder_loss=0.2618, over 19419.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1071, cr_loss=0.3458, attn_decoder_loss=0.2365, over 5776445.49 frames. ], batch size: 210, lr: 2.19e-03, grad_scale: 32.0 2024-09-20 13:31:50,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=898100.0, ans=0.05 2024-09-20 13:31:54,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.72 vs. 
limit=12.0 2024-09-20 13:32:17,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=898140.0, ans=0.025 2024-09-20 13:32:47,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=898220.0, ans=0.125 2024-09-20 13:32:53,359 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.751e+01 9.267e+01 9.840e+01 2.500e+02, threshold=1.853e+02, percent-clipped=1.0 2024-09-20 13:33:05,423 INFO [train.py:1198] (0/2) Epoch 50, batch 2850, loss[loss=0.2241, ctc_loss=0.101, cr_loss=0.3221, attn_decoder_loss=0.2306, over 29519.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1072, cr_loss=0.3459, attn_decoder_loss=0.2368, over 5761785.29 frames. ], batch size: 77, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:33:38,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=898380.0, ans=0.0 2024-09-20 13:33:49,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=898380.0, ans=0.125 2024-09-20 13:34:00,632 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=15.0 2024-09-20 13:34:14,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2024-09-20 13:34:22,363 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=898500.0, ans=0.0 2024-09-20 13:34:23,631 INFO [train.py:1198] (0/2) Epoch 50, batch 2900, loss[loss=0.2327, ctc_loss=0.1116, cr_loss=0.3792, attn_decoder_loss=0.2378, over 29455.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1076, cr_loss=0.3471, attn_decoder_loss=0.2377, over 5786813.83 frames. 
], batch size: 79, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:34:46,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=898540.0, ans=0.125 2024-09-20 13:35:03,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=898580.0, ans=0.125 2024-09-20 13:35:18,921 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2024-09-20 13:35:26,851 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 8.699e+01 9.096e+01 9.563e+01 1.472e+02, threshold=1.819e+02, percent-clipped=0.0 2024-09-20 13:35:38,791 INFO [train.py:1198] (0/2) Epoch 50, batch 2950, loss[loss=0.21, ctc_loss=0.09216, cr_loss=0.2988, attn_decoder_loss=0.2164, over 29502.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1068, cr_loss=0.3453, attn_decoder_loss=0.2367, over 5780721.70 frames. ], batch size: 75, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:35:47,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.70 vs. limit=15.0 2024-09-20 13:35:56,349 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.01 vs. 
limit=12.0 2024-09-20 13:36:01,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=898740.0, ans=0.05 2024-09-20 13:36:15,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=898780.0, ans=0.125 2024-09-20 13:36:31,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=898820.0, ans=0.015 2024-09-20 13:36:48,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=898860.0, ans=0.025 2024-09-20 13:36:51,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.98 vs. limit=22.5 2024-09-20 13:36:53,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=898900.0, ans=0.0 2024-09-20 13:36:53,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=898900.0, ans=0.125 2024-09-20 13:36:54,244 INFO [train.py:1198] (0/2) Epoch 50, batch 3000, loss[loss=0.2398, ctc_loss=0.119, cr_loss=0.3718, attn_decoder_loss=0.2449, over 29748.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1064, cr_loss=0.344, attn_decoder_loss=0.2364, over 5780910.52 frames. ], batch size: 81, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:36:54,245 INFO [train.py:1221] (0/2) Computing validation loss 2024-09-20 13:37:12,393 INFO [train.py:1230] (0/2) Epoch 50, validation: loss=0.213, ctc_loss=0.03629, cr_loss=7.081e-15, attn_decoder_loss=0.2326, over 944034.00 frames. 
2024-09-20 13:37:12,394 INFO [train.py:1231] (0/2) Maximum memory allocated so far is 52576MB 2024-09-20 13:37:14,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-09-20 13:37:20,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898900.0, ans=0.1 2024-09-20 13:37:28,207 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.72 vs. limit=15.0 2024-09-20 13:37:39,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=898940.0, ans=0.125 2024-09-20 13:37:47,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-09-20 13:37:50,177 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.65 vs. 
limit=12.0 2024-09-20 13:38:00,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=899020.0, ans=0.025 2024-09-20 13:38:04,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=899020.0, ans=0.1 2024-09-20 13:38:15,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=899060.0, ans=0.0 2024-09-20 13:38:19,545 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 8.906e+01 9.324e+01 9.722e+01 1.754e+02, threshold=1.865e+02, percent-clipped=0.0 2024-09-20 13:38:20,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=899060.0, ans=0.125 2024-09-20 13:38:31,730 INFO [train.py:1198] (0/2) Epoch 50, batch 3050, loss[loss=0.2284, ctc_loss=0.1065, cr_loss=0.3333, attn_decoder_loss=0.2346, over 29547.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1072, cr_loss=0.3456, attn_decoder_loss=0.2373, over 5776508.44 frames. ], batch size: 76, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:38:47,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=899140.0, ans=0.2 2024-09-20 13:39:06,983 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.29 vs. limit=15.0 2024-09-20 13:39:25,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.18 vs. 
limit=15.0 2024-09-20 13:39:40,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=899260.0, ans=0.0 2024-09-20 13:39:47,283 INFO [train.py:1198] (0/2) Epoch 50, batch 3100, loss[loss=0.2453, ctc_loss=0.1109, cr_loss=0.3531, attn_decoder_loss=0.2524, over 29258.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1073, cr_loss=0.3457, attn_decoder_loss=0.2371, over 5776397.07 frames. ], batch size: 100, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:40:01,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=899340.0, ans=0.025 2024-09-20 13:40:12,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.31 vs. limit=15.0 2024-09-20 13:40:50,445 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 8.844e+01 9.233e+01 9.806e+01 2.846e+02, threshold=1.847e+02, percent-clipped=1.0 2024-09-20 13:40:58,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=899460.0, ans=0.125 2024-09-20 13:41:02,636 INFO [train.py:1198] (0/2) Epoch 50, batch 3150, loss[loss=0.2411, ctc_loss=0.114, cr_loss=0.3658, attn_decoder_loss=0.2471, over 28864.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1073, cr_loss=0.3459, attn_decoder_loss=0.2372, over 5782010.22 frames. ], batch size: 104, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:41:14,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=899500.0, ans=0.125 2024-09-20 13:41:40,868 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.89 vs. 
limit=10.0 2024-09-20 13:41:49,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=899580.0, ans=0.025 2024-09-20 13:42:21,892 INFO [train.py:1198] (0/2) Epoch 50, batch 3200, loss[loss=0.2241, ctc_loss=0.1037, cr_loss=0.3342, attn_decoder_loss=0.2301, over 29391.00 frames. ], tot_loss[loss=0.2305, ctc_loss=0.1068, cr_loss=0.345, attn_decoder_loss=0.2365, over 5792765.16 frames. ], batch size: 79, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:42:46,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.46 vs. limit=15.0 2024-09-20 13:43:04,551 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:43:12,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=899820.0, ans=0.125 2024-09-20 13:43:25,248 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.416e+01 9.001e+01 9.640e+01 1.386e+02, threshold=1.800e+02, percent-clipped=0.0 2024-09-20 13:43:31,769 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:43:37,445 INFO [train.py:1198] (0/2) Epoch 50, batch 3250, loss[loss=0.2383, ctc_loss=0.112, cr_loss=0.355, attn_decoder_loss=0.2445, over 29725.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1071, cr_loss=0.3457, attn_decoder_loss=0.2372, over 5799733.26 frames. 
], batch size: 84, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:43:46,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=899900.0, ans=0.1 2024-09-20 13:44:05,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=899980.0, ans=0.125 2024-09-20 13:44:41,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=900060.0, ans=0.2 2024-09-20 13:44:53,053 INFO [train.py:1198] (0/2) Epoch 50, batch 3300, loss[loss=0.2394, ctc_loss=0.109, cr_loss=0.3474, attn_decoder_loss=0.2461, over 28378.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1063, cr_loss=0.3439, attn_decoder_loss=0.2359, over 5797901.01 frames. ], batch size: 111, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:45:04,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=900100.0, ans=0.95 2024-09-20 13:45:06,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-09-20 13:45:19,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=900140.0, ans=0.0 2024-09-20 13:45:38,976 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.98 vs. 
limit=15.0 2024-09-20 13:45:47,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=900220.0, ans=0.5 2024-09-20 13:46:01,978 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 8.683e+01 9.254e+01 9.837e+01 3.581e+02, threshold=1.851e+02, percent-clipped=1.0 2024-09-20 13:46:12,356 INFO [train.py:1198] (0/2) Epoch 50, batch 3350, loss[loss=0.2454, ctc_loss=0.1113, cr_loss=0.3472, attn_decoder_loss=0.2526, over 28855.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1069, cr_loss=0.3447, attn_decoder_loss=0.2367, over 5774119.94 frames. ], batch size: 104, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:46:18,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=900300.0, ans=0.09899494936611666 2024-09-20 13:46:27,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=900340.0, ans=0.07 2024-09-20 13:46:29,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2024-09-20 13:46:54,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=900380.0, ans=0.125 2024-09-20 13:47:27,587 INFO [train.py:1198] (0/2) Epoch 50, batch 3400, loss[loss=0.2084, ctc_loss=0.09511, cr_loss=0.3192, attn_decoder_loss=0.2139, over 29341.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1076, cr_loss=0.3462, attn_decoder_loss=0.2368, over 5766118.44 frames. 
], batch size: 67, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:47:30,856 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=900500.0, ans=0.125 2024-09-20 13:47:41,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=900540.0, ans=0.125 2024-09-20 13:47:52,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=900540.0, ans=0.125 2024-09-20 13:48:08,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=900580.0, ans=0.2 2024-09-20 13:48:16,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=900620.0, ans=0.125 2024-09-20 13:48:32,608 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.696e+01 9.262e+01 1.002e+02 2.353e+02, threshold=1.852e+02, percent-clipped=1.0 2024-09-20 13:48:44,943 INFO [train.py:1198] (0/2) Epoch 50, batch 3450, loss[loss=0.2363, ctc_loss=0.1044, cr_loss=0.3378, attn_decoder_loss=0.2435, over 28209.00 frames. ], tot_loss[loss=0.2312, ctc_loss=0.1075, cr_loss=0.3463, attn_decoder_loss=0.2373, over 5774468.91 frames. 
], batch size: 111, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:48:51,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=900700.0, ans=0.125 2024-09-20 13:49:01,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=900740.0, ans=0.2 2024-09-20 13:49:06,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=900740.0, ans=0.2 2024-09-20 13:49:09,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=900740.0, ans=0.1 2024-09-20 13:49:17,469 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:49:22,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=4.68 vs. limit=15.0 2024-09-20 13:49:25,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=900780.0, ans=0.025 2024-09-20 13:49:37,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=900820.0, ans=0.125 2024-09-20 13:49:49,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=900860.0, ans=0.125 2024-09-20 13:50:02,673 INFO [train.py:1198] (0/2) Epoch 50, batch 3500, loss[loss=0.2104, ctc_loss=0.1009, cr_loss=0.3248, attn_decoder_loss=0.2153, over 29330.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1069, cr_loss=0.3446, attn_decoder_loss=0.2364, over 5775771.68 frames. 
], batch size: 71, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:50:06,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. limit=6.0 2024-09-20 13:50:20,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=900940.0, ans=0.1 2024-09-20 13:50:49,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=901020.0, ans=0.2 2024-09-20 13:51:06,274 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.687e+01 9.132e+01 9.623e+01 1.623e+02, threshold=1.826e+02, percent-clipped=0.0 2024-09-20 13:51:14,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0 2024-09-20 13:51:16,620 INFO [train.py:1198] (0/2) Epoch 50, batch 3550, loss[loss=0.2437, ctc_loss=0.1134, cr_loss=0.3711, attn_decoder_loss=0.25, over 29704.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1067, cr_loss=0.3442, attn_decoder_loss=0.236, over 5782839.12 frames. 
], batch size: 89, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 13:51:19,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901100.0, ans=0.1 2024-09-20 13:51:28,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=901100.0, ans=0.125 2024-09-20 13:51:43,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=901140.0, ans=0.125 2024-09-20 13:51:45,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=901180.0, ans=0.125 2024-09-20 13:51:52,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=901180.0, ans=0.035 2024-09-20 13:52:06,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=901220.0, ans=0.0 2024-09-20 13:52:07,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.57 vs. limit=10.0 2024-09-20 13:52:22,112 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2024-09-20 13:52:28,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0 2024-09-20 13:52:30,415 INFO [train.py:1198] (0/2) Epoch 50, batch 3600, loss[loss=0.2275, ctc_loss=0.1038, cr_loss=0.3333, attn_decoder_loss=0.2338, over 29495.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1068, cr_loss=0.3446, attn_decoder_loss=0.2363, over 5792820.40 frames. 
], batch size: 77, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:52:35,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=901300.0, ans=0.125 2024-09-20 13:52:46,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=901340.0, ans=0.1 2024-09-20 13:52:55,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=901340.0, ans=0.1 2024-09-20 13:52:59,708 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.79 vs. limit=15.0 2024-09-20 13:53:33,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=901460.0, ans=0.125 2024-09-20 13:53:34,116 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.620e+01 9.031e+01 9.703e+01 1.754e+02, threshold=1.806e+02, percent-clipped=0.0 2024-09-20 13:53:44,527 INFO [train.py:1198] (0/2) Epoch 50, batch 3650, loss[loss=0.2465, ctc_loss=0.1159, cr_loss=0.3807, attn_decoder_loss=0.2525, over 29492.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1067, cr_loss=0.3447, attn_decoder_loss=0.2359, over 5794745.94 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:53:54,229 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.55 vs. 
limit=15.0 2024-09-20 13:54:02,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=901540.0, ans=0.125 2024-09-20 13:54:08,598 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 13:54:13,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=901540.0, ans=0.0 2024-09-20 13:54:19,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=4.49 vs. limit=15.0 2024-09-20 13:54:25,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=901580.0, ans=0.2 2024-09-20 13:54:36,514 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.10 vs. limit=15.0 2024-09-20 13:54:38,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=901620.0, ans=0.125 2024-09-20 13:55:02,719 INFO [train.py:1198] (0/2) Epoch 50, batch 3700, loss[loss=0.2356, ctc_loss=0.1026, cr_loss=0.3417, attn_decoder_loss=0.2428, over 29708.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1062, cr_loss=0.3437, attn_decoder_loss=0.2359, over 5803830.28 frames. ], batch size: 84, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:55:08,037 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=4.91 vs. 
limit=15.0 2024-09-20 13:55:09,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=901700.0, ans=0.125 2024-09-20 13:55:19,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=901740.0, ans=0.2 2024-09-20 13:55:31,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.55 vs. limit=15.0 2024-09-20 13:55:33,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=901780.0, ans=0.125 2024-09-20 13:55:36,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=901780.0, ans=0.0 2024-09-20 13:55:44,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.82 vs. limit=15.0 2024-09-20 13:55:45,044 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2024-09-20 13:55:46,218 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=12.0 2024-09-20 13:55:54,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=901820.0, ans=0.0 2024-09-20 13:56:05,876 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.521e+01 9.140e+01 9.589e+01 3.115e+02, threshold=1.828e+02, percent-clipped=1.0 2024-09-20 13:56:16,331 INFO [train.py:1198] (0/2) Epoch 50, batch 3750, loss[loss=0.2042, ctc_loss=0.08649, cr_loss=0.3014, attn_decoder_loss=0.2106, over 29328.00 frames. 
], tot_loss[loss=0.2298, ctc_loss=0.1064, cr_loss=0.3443, attn_decoder_loss=0.2359, over 5808121.57 frames. ], batch size: 67, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:56:30,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=901940.0, ans=0.2 2024-09-20 13:56:52,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=901980.0, ans=0.0 2024-09-20 13:56:55,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=901980.0, ans=0.0 2024-09-20 13:57:30,712 INFO [train.py:1198] (0/2) Epoch 50, batch 3800, loss[loss=0.2449, ctc_loss=0.1172, cr_loss=0.3568, attn_decoder_loss=0.2512, over 29635.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1061, cr_loss=0.3436, attn_decoder_loss=0.2354, over 5798036.49 frames. ], batch size: 86, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:57:30,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=902100.0, ans=0.125 2024-09-20 13:57:36,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=902100.0, ans=0.0 2024-09-20 13:57:38,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=902100.0, ans=0.2 2024-09-20 13:57:39,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=902100.0, ans=0.2 2024-09-20 13:57:48,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=902140.0, ans=0.0 2024-09-20 13:58:09,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=902180.0, ans=0.95 2024-09-20 13:58:12,536 INFO [scaling.py:214] (0/2) ScheduledFloat: 
name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=902180.0, ans=0.125 2024-09-20 13:58:14,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.76 vs. limit=22.5 2024-09-20 13:58:22,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=902220.0, ans=0.05 2024-09-20 13:58:22,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=902220.0, ans=0.1 2024-09-20 13:58:34,453 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 8.593e+01 9.127e+01 9.556e+01 1.815e+02, threshold=1.825e+02, percent-clipped=0.0 2024-09-20 13:58:43,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=902300.0, ans=0.0 2024-09-20 13:58:44,682 INFO [train.py:1198] (0/2) Epoch 50, batch 3850, loss[loss=0.2521, ctc_loss=0.1202, cr_loss=0.3833, attn_decoder_loss=0.2582, over 29308.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1061, cr_loss=0.3439, attn_decoder_loss=0.2355, over 5811545.07 frames. 
], batch size: 100, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 13:58:56,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=902300.0, ans=0.125 2024-09-20 13:59:06,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=902340.0, ans=0.125 2024-09-20 13:59:23,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=902380.0, ans=0.1 2024-09-20 13:59:39,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=902420.0, ans=0.0 2024-09-20 13:59:50,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=902460.0, ans=0.1 2024-09-20 14:00:00,224 INFO [train.py:1198] (0/2) Epoch 50, batch 3900, loss[loss=0.2479, ctc_loss=0.115, cr_loss=0.3285, attn_decoder_loss=0.2554, over 29636.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1065, cr_loss=0.3444, attn_decoder_loss=0.2359, over 5815582.47 frames. ], batch size: 86, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 14:00:05,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.91 vs. limit=15.0 2024-09-20 14:00:12,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=902500.0, ans=0.0 2024-09-20 14:00:40,555 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.92 vs. 
limit=22.5 2024-09-20 14:00:46,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=902620.0, ans=0.0 2024-09-20 14:00:59,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=902660.0, ans=0.0 2024-09-20 14:01:05,029 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.795e+01 8.747e+01 9.246e+01 9.668e+01 1.412e+02, threshold=1.849e+02, percent-clipped=0.0 2024-09-20 14:01:06,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=902660.0, ans=0.125 2024-09-20 14:01:15,505 INFO [train.py:1198] (0/2) Epoch 50, batch 3950, loss[loss=0.2515, ctc_loss=0.1265, cr_loss=0.3954, attn_decoder_loss=0.2566, over 29475.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1067, cr_loss=0.3451, attn_decoder_loss=0.2363, over 5835275.30 frames. ], batch size: 97, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 14:01:29,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=902740.0, ans=0.125 2024-09-20 14:01:42,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=902740.0, ans=0.0 2024-09-20 14:01:46,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=902780.0, ans=0.1 2024-09-20 14:01:54,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=902780.0, ans=0.0 2024-09-20 14:02:03,337 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=10.74 vs. 
limit=15.0 2024-09-20 14:02:16,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=902860.0, ans=0.1 2024-09-20 14:02:17,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=902860.0, ans=0.125 2024-09-20 14:02:20,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=902860.0, ans=0.0 2024-09-20 14:02:28,914 INFO [train.py:1198] (0/2) Epoch 50, batch 4000, loss[loss=0.2067, ctc_loss=0.09226, cr_loss=0.3133, attn_decoder_loss=0.2124, over 29519.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1066, cr_loss=0.3444, attn_decoder_loss=0.2361, over 5813223.71 frames. ], batch size: 74, lr: 2.19e-03, grad_scale: 32.0 2024-09-20 14:02:33,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=902900.0, ans=0.0 2024-09-20 14:02:36,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=902900.0, ans=0.2 2024-09-20 14:02:39,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=902900.0, ans=0.0 2024-09-20 14:02:39,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=902900.0, ans=0.0 2024-09-20 14:02:46,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=902940.0, ans=0.2 2024-09-20 14:02:57,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=902980.0, ans=0.125 2024-09-20 14:03:01,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=902980.0, ans=0.025 2024-09-20 14:03:17,859 INFO 
[scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=903020.0, ans=0.125 2024-09-20 14:03:22,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=903020.0, ans=0.2 2024-09-20 14:03:22,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.39 vs. limit=15.0 2024-09-20 14:03:32,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=903060.0, ans=0.125 2024-09-20 14:03:33,843 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.945e+01 8.787e+01 9.327e+01 9.838e+01 2.486e+02, threshold=1.865e+02, percent-clipped=3.0 2024-09-20 14:03:37,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=903060.0, ans=0.125 2024-09-20 14:03:42,695 INFO [train.py:1198] (0/2) Epoch 50, batch 4050, loss[loss=0.2593, ctc_loss=0.1382, cr_loss=0.3817, attn_decoder_loss=0.2642, over 19583.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1067, cr_loss=0.3446, attn_decoder_loss=0.2361, over 5794791.88 frames. ], batch size: 209, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 14:03:53,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=903100.0, ans=0.0 2024-09-20 14:04:13,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=903180.0, ans=0.2 2024-09-20 14:04:44,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=903260.0, ans=0.0 2024-09-20 14:04:57,484 INFO [train.py:1198] (0/2) Epoch 50, batch 4100, loss[loss=0.25, ctc_loss=0.1222, cr_loss=0.3898, attn_decoder_loss=0.2555, over 29515.00 frames. 
], tot_loss[loss=0.2301, ctc_loss=0.1066, cr_loss=0.344, attn_decoder_loss=0.2361, over 5790534.14 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 14:05:28,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=903380.0, ans=0.125 2024-09-20 14:05:45,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=903420.0, ans=0.1 2024-09-20 14:05:57,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=903460.0, ans=0.1 2024-09-20 14:06:04,477 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.737e+01 9.246e+01 9.591e+01 2.033e+02, threshold=1.849e+02, percent-clipped=1.0 2024-09-20 14:06:11,996 INFO [train.py:1198] (0/2) Epoch 50, batch 4150, loss[loss=0.2292, ctc_loss=0.1104, cr_loss=0.3436, attn_decoder_loss=0.2347, over 29504.00 frames. ], tot_loss[loss=0.2299, ctc_loss=0.1064, cr_loss=0.3444, attn_decoder_loss=0.2359, over 5796037.13 frames. ], batch size: 77, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 14:06:29,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=903540.0, ans=0.0 2024-09-20 14:06:40,944 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.58 vs. 
limit=15.0 2024-09-20 14:06:41,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=903580.0, ans=0.125 2024-09-20 14:06:45,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903580.0, ans=0.1 2024-09-20 14:06:47,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=903580.0, ans=0.2 2024-09-20 14:07:03,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=903620.0, ans=0.125 2024-09-20 14:07:06,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=903620.0, ans=0.2 2024-09-20 14:07:25,114 INFO [train.py:1198] (0/2) Epoch 50, batch 4200, loss[loss=0.2462, ctc_loss=0.1175, cr_loss=0.3781, attn_decoder_loss=0.2521, over 29507.00 frames. ], tot_loss[loss=0.2303, ctc_loss=0.1066, cr_loss=0.345, attn_decoder_loss=0.2363, over 5799445.03 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 14:07:29,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.99 vs. 
limit=22.5 2024-09-20 14:07:29,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=903700.0, ans=0.0 2024-09-20 14:07:38,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=903740.0, ans=0.1 2024-09-20 14:07:40,198 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 14:07:47,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903740.0, ans=0.1 2024-09-20 14:07:50,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=903740.0, ans=0.125 2024-09-20 14:08:05,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2024-09-20 14:08:11,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=903820.0, ans=0.125 2024-09-20 14:08:29,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=903860.0, ans=0.125 2024-09-20 14:08:32,189 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.954e+01 8.657e+01 9.068e+01 9.554e+01 1.385e+02, threshold=1.814e+02, percent-clipped=0.0 2024-09-20 14:08:32,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=903860.0, ans=0.125 2024-09-20 14:08:39,515 INFO [train.py:1198] (0/2) Epoch 50, batch 4250, loss[loss=0.2156, ctc_loss=0.09524, cr_loss=0.3291, attn_decoder_loss=0.2216, over 29528.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1062, cr_loss=0.344, attn_decoder_loss=0.2363, over 5806101.16 frames. 
], batch size: 74, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 14:08:47,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=903900.0, ans=0.125 2024-09-20 14:09:08,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=903980.0, ans=0.125 2024-09-20 14:09:30,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=904020.0, ans=0.125 2024-09-20 14:09:53,370 INFO [train.py:1198] (0/2) Epoch 50, batch 4300, loss[loss=0.238, ctc_loss=0.1084, cr_loss=0.3295, attn_decoder_loss=0.2451, over 29531.00 frames. ], tot_loss[loss=0.2301, ctc_loss=0.1061, cr_loss=0.3438, attn_decoder_loss=0.2363, over 5794814.95 frames. ], batch size: 87, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 14:10:17,334 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 14:10:17,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=904140.0, ans=0.0 2024-09-20 14:10:59,382 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 8.735e+01 9.239e+01 9.870e+01 1.478e+02, threshold=1.848e+02, percent-clipped=0.0 2024-09-20 14:10:59,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=904260.0, ans=0.125 2024-09-20 14:11:04,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=904260.0, ans=0.125 2024-09-20 14:11:06,791 INFO [train.py:1198] (0/2) Epoch 50, batch 4350, loss[loss=0.2534, ctc_loss=0.1225, cr_loss=0.3832, attn_decoder_loss=0.2594, over 29470.00 frames. ], tot_loss[loss=0.2333, ctc_loss=0.1084, cr_loss=0.3486, attn_decoder_loss=0.2394, over 5797554.90 frames. 
], batch size: 97, lr: 2.19e-03, grad_scale: 8.0 2024-09-20 14:11:34,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=904340.0, ans=0.2 2024-09-20 14:12:06,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=904460.0, ans=0.125 2024-09-20 14:12:12,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=904460.0, ans=0.125 2024-09-20 14:12:18,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=904460.0, ans=0.025 2024-09-20 14:12:21,024 INFO [train.py:1198] (0/2) Epoch 50, batch 4400, loss[loss=0.2408, ctc_loss=0.1191, cr_loss=0.3746, attn_decoder_loss=0.246, over 27240.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1096, cr_loss=0.3512, attn_decoder_loss=0.2411, over 5769723.67 frames. ], batch size: 124, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 14:12:35,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=904540.0, ans=0.95 2024-09-20 14:13:25,695 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=12.0 2024-09-20 14:13:27,637 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 8.488e+01 9.232e+01 9.764e+01 1.027e+02 1.631e+02, threshold=1.953e+02, percent-clipped=0.0 2024-09-20 14:13:35,035 INFO [train.py:1198] (0/2) Epoch 50, batch 4450, loss[loss=0.2438, ctc_loss=0.1251, cr_loss=0.3629, attn_decoder_loss=0.2489, over 20354.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.113, cr_loss=0.3571, attn_decoder_loss=0.2432, over 5584724.76 frames. 
], batch size: 210, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 14:14:11,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=904780.0, ans=0.125 2024-09-20 14:14:22,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=904820.0, ans=0.2 2024-09-20 14:14:28,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=904820.0, ans=0.2 2024-09-20 14:14:29,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=904820.0, ans=0.0 2024-09-20 14:14:31,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=904820.0, ans=0.125 2024-09-20 14:14:40,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=904860.0, ans=0.0 2024-09-20 14:14:43,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=904860.0, ans=0.1 2024-09-20 14:14:50,474 INFO [train.py:1198] (0/2) Epoch 50, batch 4500, loss[loss=0.2529, ctc_loss=0.1331, cr_loss=0.3812, attn_decoder_loss=0.2577, over 19615.00 frames. ], tot_loss[loss=0.2392, ctc_loss=0.1156, cr_loss=0.3592, attn_decoder_loss=0.2449, over 5240240.45 frames. ], batch size: 210, lr: 2.19e-03, grad_scale: 16.0 2024-09-20 14:14:52,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=904900.0, ans=0.95 2024-09-20 14:15:27,054 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp-large-ctc-aed-ctc-loss-scale-0.1-aed-loss-scale-0.9-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1/epoch-50.pt 2024-09-20 14:15:39,984 INFO [train.py:1496] (0/2) Done!
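The per-batch records above all share one shape: `Epoch E, batch B, loss[...] , tot_loss[loss=X, ...]`. A minimal sketch of pulling the running `tot_loss` out of such lines with a regular expression follows; `parse_tot_loss` and `LINE_RE` are hypothetical helper names, not part of icefall, and the pattern only assumes the field layout visible in this log.

```python
import re

# Matches icefall-style summary lines such as:
#   "... Epoch 50, batch 4000, loss[...], tot_loss[loss=0.23, ...] ..."
# Non-greedy ".*?" skips the per-utterance loss[...] block so the
# capture group lands on the smoothed total loss.
LINE_RE = re.compile(r"Epoch (\d+), batch (\d+), .*?tot_loss\[loss=([\d.]+)")

def parse_tot_loss(lines):
    """Return (epoch, batch, tot_loss) triples for every matching line."""
    out = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            out.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return out

# One summary line copied from the log above, wrapped for readability.
sample = ("2024-09-20 14:02:28,914 INFO [train.py:1198] (0/2) Epoch 50, "
          "batch 4000, loss[loss=0.2067, ctc_loss=0.09226, cr_loss=0.3133, "
          "attn_decoder_loss=0.2124, over 29519.00 frames. ], "
          "tot_loss[loss=0.23, ctc_loss=0.1066, cr_loss=0.3444, "
          "attn_decoder_loss=0.2361, over 5813223.71 frames. ], "
          "batch size: 74, lr: 2.19e-03, grad_scale: 32.0")

print(parse_tot_loss([sample]))  # [(50, 4000, 0.23)]
```

Feeding the whole log file through `parse_tot_loss` gives a loss-vs-batch curve without needing the TensorBoard event files the run also writes.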