2023-10-09 10:20:55,466 INFO [train.py:1099] (1/4) Training started
2023-10-09 10:20:55,466 INFO [train.py:1109] (1/4) Device: cuda:1
2023-10-09 10:20:55,472 INFO [train.py:1121] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '821ebc378e7fb99b8adc81950227963332821e01', 'k2-git-date': 'Wed Jul 19 15:38:25 2023', 'lhotse-version': '1.16.0.dev+git.1db4d97a.clean', 'torch-version': '1.11.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'dev_multi_zh-hans', 'icefall-git-sha1': '919793d-dirty', 'icefall-git-date': 'Thu Sep 7 21:06:37 2023', 'icefall-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/icefall-1.0-py3.9.egg', 'k2-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/k2-1.24.3.dev20230721+cuda10.2.torch1.11.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/lhotse-1.16.0.dev0+git.1db4d97a.clean-py3.9.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-1-1220091118-57c4d55446-mvd6x', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 30, 'start_epoch': 14, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp-w-ctc'), 'bpe_model': 'data/lang_bpe_2000/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 700, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'vocab_size': 2000}
2023-10-09 10:20:55,472 INFO [train.py:1123] (1/4) About to create model
2023-10-09 10:20:56,038 INFO [train.py:1127] (1/4) Number of model parameters: 69651511
2023-10-09 10:20:56,039 INFO [checkpoint.py:112] (1/4) Loading checkpoint from zipformer/exp-w-ctc/epoch-13.pt
2023-10-09 10:21:02,243 INFO [train.py:1142] (1/4) Using DDP
2023-10-09 10:21:05,303 INFO [train.py:1154] (1/4) Loading optimizer state dict
2023-10-09 10:21:05,953 INFO [train.py:1162] (1/4) Loading scheduler state dict
2023-10-09 10:21:05,953 INFO [multi_dataset.py:52] (1/4) About to get multidataset train cuts
2023-10-09 10:21:05,954 INFO [multi_dataset.py:55] (1/4) Loading THCHS-30 in lazy mode
2023-10-09 10:21:05,988 INFO [multi_dataset.py:61] (1/4) Loading Aishell-1 in lazy mode
2023-10-09 10:21:05,990 INFO [multi_dataset.py:67] (1/4) Loading Aishell-2 in lazy mode
2023-10-09 10:21:05,991 INFO [multi_dataset.py:73] (1/4) Loading Aishell-4 in lazy mode
2023-10-09 10:21:05,994 INFO [multi_dataset.py:85] (1/4) Loading ST-CMDS in lazy mode
2023-10-09 10:21:05,995 INFO [multi_dataset.py:89] (1/4) Loading Primewords in lazy mode
2023-10-09 10:21:05,996 INFO [multi_dataset.py:95] (1/4) Loading MagicData in lazy mode
2023-10-09 10:21:05,997 INFO [multi_dataset.py:101] (1/4) Loading Aidatatang_200zh in lazy mode
2023-10-09 10:21:05,998 INFO [multi_dataset.py:107] (1/4) Loading Ali-Meeting in lazy mode
2023-10-09 10:21:05,999 INFO [multi_dataset.py:113] (1/4) Loading WeNetSpeech in lazy mode
2023-10-09 10:21:06,000 INFO [multi_dataset.py:119] (1/4) Loading KeSpeech in lazy mode
2023-10-09 10:22:53,486 INFO [asr_datamodule.py:218] (1/4) Enable MUSAN
2023-10-09 10:22:53,487 INFO [asr_datamodule.py:219] (1/4) About to get Musan cuts
2023-10-09 10:22:55,734 INFO [asr_datamodule.py:243] (1/4) Enable SpecAugment
2023-10-09 10:22:55,734 INFO [asr_datamodule.py:244] (1/4) Time warp factor: 80
2023-10-09 10:22:55,735 INFO [asr_datamodule.py:254] (1/4) Num frame mask: 10
2023-10-09 10:22:55,735 INFO [asr_datamodule.py:267] (1/4) About to create train dataset
2023-10-09 10:22:55,735 INFO [asr_datamodule.py:294] (1/4) Using DynamicBucketingSampler.
2023-10-09 10:22:59,089 INFO [asr_datamodule.py:309] (1/4) About to create train dataloader
2023-10-09 10:22:59,090 INFO [multi_dataset.py:161] (1/4) About to get multidataset dev cuts
2023-10-09 10:22:59,090 INFO [multi_dataset.py:164] (1/4) Loading Aidatatang_200zh DEV set in lazy mode
2023-10-09 10:22:59,092 INFO [multi_dataset.py:170] (1/4) Loading Aishell DEV set in lazy mode
2023-10-09 10:22:59,093 INFO [multi_dataset.py:176] (1/4) Loading Aishell-2 DEV set in lazy mode
2023-10-09 10:22:59,094 INFO [multi_dataset.py:182] (1/4) Loading Ali-Meeting DEV set in lazy mode
2023-10-09 10:22:59,095 INFO [multi_dataset.py:188] (1/4) Loading MagicData DEV set in lazy mode
2023-10-09 10:22:59,096 INFO [multi_dataset.py:194] (1/4) Loading KeSpeech DEV set in lazy mode
2023-10-09 10:22:59,098 INFO [multi_dataset.py:203] (1/4) Loading WeNetSpeech DEV set in lazy mode
2023-10-09 10:22:59,099 INFO [asr_datamodule.py:340] (1/4) About to create dev dataset
2023-10-09 10:22:59,578 INFO [asr_datamodule.py:357] (1/4) About to create dev dataloader
2023-10-09 10:22:59,578 INFO [train.py:1243] (1/4) Loading grad scaler state dict
2023-10-09 10:23:19,460 INFO [train.py:1031] (1/4) Epoch 14, batch 0, loss[loss=0.2001, simple_loss=0.2558, pruned_loss=0.05294, ctc_loss=0.09639, over 16789.00 frames. ], tot_loss[loss=0.2001, simple_loss=0.2558, pruned_loss=0.05294, ctc_loss=0.09639, over 16789.00 frames. ], batch size: 243, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:23:19,460 INFO [train.py:1054] (1/4) Computing validation loss
2023-10-09 10:23:33,186 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2325, simple_loss=0.3081, pruned_loss=0.06029, ctc_loss=0.09091, over 1796401.00 frames.
2023-10-09 10:23:33,186 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 12738MB
2023-10-09 10:23:45,081 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2728796.0, ans=0.125
2023-10-09 10:23:48,867 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.610e+02 3.348e+02 4.018e+02 4.917e+02 9.056e+02, threshold=8.035e+02, percent-clipped=7.0
2023-10-09 10:24:06,947 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2728842.6666666665, ans=0.0
2023-10-09 10:24:13,129 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=15.0
2023-10-09 10:24:31,861 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2728936.0, ans=0.125
2023-10-09 10:24:33,526 INFO [train.py:1031] (1/4) Epoch 14, batch 50, loss[loss=0.2554, simple_loss=0.3223, pruned_loss=0.06933, ctc_loss=0.1245, over 16825.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2796, pruned_loss=0.06258, ctc_loss=0.109, over 744070.57 frames. ], batch size: 292, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:24:41,275 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2728982.6666666665, ans=0.1
2023-10-09 10:24:57,715 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2729076.0, ans=22.5
2023-10-09 10:25:09,079 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2729122.6666666665, ans=0.125
2023-10-09 10:25:25,410 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2729169.3333333335, ans=0.2
2023-10-09 10:25:34,390 INFO [train.py:1031] (1/4) Epoch 14, batch 100, loss[loss=0.1748, simple_loss=0.2116, pruned_loss=0.05256, ctc_loss=0.08191, over 10610.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2977, pruned_loss=0.0662, ctc_loss=0.1162, over 1304831.32 frames. ], batch size: 38, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:25:48,688 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2729262.6666666665, ans=0.125
2023-10-09 10:25:49,380 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.490e+02 3.274e+02 3.784e+02 4.390e+02 8.009e+02, threshold=7.568e+02, percent-clipped=0.0
2023-10-09 10:25:57,287 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2729309.3333333335, ans=0.05
2023-10-09 10:26:32,658 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2729402.6666666665, ans=0.0
2023-10-09 10:26:34,425 INFO [train.py:1031] (1/4) Epoch 14, batch 150, loss[loss=0.2338, simple_loss=0.2866, pruned_loss=0.06833, ctc_loss=0.1107, over 16720.00 frames. ], tot_loss[loss=0.2452, simple_loss=0.3099, pruned_loss=0.06665, ctc_loss=0.1182, over 1739147.85 frames. ], batch size: 111, lr: 2.60e-03, grad_scale: 1.0
2023-10-09 10:26:36,935 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2729449.3333333335, ans=0.125
2023-10-09 10:26:42,399 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2729449.3333333335, ans=0.5
2023-10-09 10:26:56,982 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2729496.0, ans=0.0
2023-10-09 10:27:00,748 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2729542.6666666665, ans=0.125
2023-10-09 10:27:26,775 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0
2023-10-09 10:27:29,983 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2729636.0, ans=0.125
2023-10-09 10:27:32,033 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2729636.0, ans=0.125
2023-10-09 10:27:33,421 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2729636.0, ans=22.5
2023-10-09 10:27:36,039 INFO [train.py:1031] (1/4) Epoch 14, batch 200, loss[loss=0.226, simple_loss=0.2843, pruned_loss=0.06234, ctc_loss=0.1077, over 16741.00 frames. ], tot_loss[loss=0.2485, simple_loss=0.3114, pruned_loss=0.06846, ctc_loss=0.1215, over 2084943.46 frames. ], batch size: 140, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:27:54,281 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.044e+02 3.577e+02 4.251e+02 7.739e+02, threshold=7.154e+02, percent-clipped=1.0
2023-10-09 10:27:57,896 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2729729.3333333335, ans=0.2
2023-10-09 10:28:29,672 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2729869.3333333335, ans=0.0
2023-10-09 10:28:34,101 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2729869.3333333335, ans=0.0
2023-10-09 10:28:35,944 INFO [train.py:1031] (1/4) Epoch 14, batch 250, loss[loss=0.2128, simple_loss=0.281, pruned_loss=0.05463, ctc_loss=0.08841, over 16792.00 frames. ], tot_loss[loss=0.2444, simple_loss=0.31, pruned_loss=0.06598, ctc_loss=0.1172, over 2356327.93 frames. ], batch size: 188, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:28:37,939 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2729916.0, ans=0.2
2023-10-09 10:28:48,870 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.99 vs. limit=10.0
2023-10-09 10:29:15,012 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.61 vs. limit=10.0
2023-10-09 10:29:28,837 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0
2023-10-09 10:29:31,142 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0
2023-10-09 10:29:37,211 INFO [train.py:1031] (1/4) Epoch 14, batch 300, loss[loss=0.2249, simple_loss=0.2807, pruned_loss=0.06321, ctc_loss=0.1069, over 16693.00 frames. ], tot_loss[loss=0.2384, simple_loss=0.3044, pruned_loss=0.06363, ctc_loss=0.1129, over 2562906.59 frames. ], batch size: 151, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:29:56,420 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.367e+02 3.126e+02 3.650e+02 4.282e+02 7.513e+02, threshold=7.299e+02, percent-clipped=1.0
2023-10-09 10:30:29,352 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2730336.0, ans=0.125
2023-10-09 10:30:38,033 INFO [train.py:1031] (1/4) Epoch 14, batch 350, loss[loss=0.2226, simple_loss=0.2784, pruned_loss=0.06197, ctc_loss=0.107, over 16900.00 frames. ], tot_loss[loss=0.2421, simple_loss=0.3038, pruned_loss=0.06664, ctc_loss=0.1175, over 2721127.70 frames. ], batch size: 242, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:30:57,110 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2730429.3333333335, ans=0.125
2023-10-09 10:31:00,194 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:31:14,656 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2730522.6666666665, ans=0.125
2023-10-09 10:31:19,190 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0
2023-10-09 10:31:25,685 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2730569.3333333335, ans=0.125
2023-10-09 10:31:34,089 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0
2023-10-09 10:31:35,754 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2730569.3333333335, ans=0.125
2023-10-09 10:31:38,085 INFO [train.py:1031] (1/4) Epoch 14, batch 400, loss[loss=0.2051, simple_loss=0.2359, pruned_loss=0.0632, ctc_loss=0.12, over 15483.00 frames. ], tot_loss[loss=0.2393, simple_loss=0.2977, pruned_loss=0.06691, ctc_loss=0.1176, over 2853989.61 frames. ], batch size: 526, lr: 2.60e-03, grad_scale: 8.0
2023-10-09 10:31:53,132 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:31:57,468 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.680e+02 3.285e+02 3.968e+02 4.685e+02 8.332e+02, threshold=7.936e+02, percent-clipped=1.0
2023-10-09 10:32:07,507 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2730709.3333333335, ans=0.125
2023-10-09 10:32:18,193 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0
2023-10-09 10:32:18,777 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2730756.0, ans=0.015
2023-10-09 10:32:23,249 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2730756.0, ans=0.0
2023-10-09 10:32:39,458 INFO [train.py:1031] (1/4) Epoch 14, batch 450, loss[loss=0.2152, simple_loss=0.2925, pruned_loss=0.05042, ctc_loss=0.09245, over 16780.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2966, pruned_loss=0.06676, ctc_loss=0.1173, over 2955813.88 frames. ], batch size: 188, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:33:40,962 INFO [train.py:1031] (1/4) Epoch 14, batch 500, loss[loss=0.2023, simple_loss=0.2551, pruned_loss=0.05654, ctc_loss=0.09107, over 16755.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2902, pruned_loss=0.06473, ctc_loss=0.1135, over 3036519.78 frames. ], batch size: 140, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:33:43,641 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=22.5
2023-10-09 10:33:54,853 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2731129.3333333335, ans=0.0
2023-10-09 10:34:00,456 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+02 3.135e+02 3.674e+02 4.514e+02 8.848e+02, threshold=7.348e+02, percent-clipped=4.0
2023-10-09 10:34:07,985 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2731176.0, ans=0.1
2023-10-09 10:34:32,370 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2731269.3333333335, ans=0.125
2023-10-09 10:34:34,590 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2731269.3333333335, ans=0.125
2023-10-09 10:34:41,171 INFO [train.py:1031] (1/4) Epoch 14, batch 550, loss[loss=0.1834, simple_loss=0.2367, pruned_loss=0.04822, ctc_loss=0.08418, over 16716.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2831, pruned_loss=0.064, ctc_loss=0.1119, over 3092212.83 frames. ], batch size: 151, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:34:49,606 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2731316.0, ans=0.125
2023-10-09 10:35:14,441 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2731409.3333333335, ans=0.1
2023-10-09 10:35:14,460 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2731409.3333333335, ans=0.125
2023-10-09 10:35:14,494 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2731409.3333333335, ans=0.125
2023-10-09 10:35:32,685 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0
2023-10-09 10:35:42,187 INFO [train.py:1031] (1/4) Epoch 14, batch 600, loss[loss=0.2229, simple_loss=0.262, pruned_loss=0.0675, ctc_loss=0.1222, over 16763.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2781, pruned_loss=0.06354, ctc_loss=0.111, over 3133716.60 frames. ], batch size: 329, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:35:43,587 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2731549.3333333335, ans=0.125
2023-10-09 10:36:02,848 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+02 3.049e+02 3.429e+02 4.091e+02 7.448e+02, threshold=6.859e+02, percent-clipped=1.0
2023-10-09 10:36:11,850 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2731642.6666666665, ans=10.0
2023-10-09 10:36:43,672 INFO [train.py:1031] (1/4) Epoch 14, batch 650, loss[loss=0.2082, simple_loss=0.2474, pruned_loss=0.06317, ctc_loss=0.1069, over 16840.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.273, pruned_loss=0.06317, ctc_loss=0.11, over 3167445.17 frames. ], batch size: 121, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:36:46,425 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=22.5
2023-10-09 10:36:48,429 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0
2023-10-09 10:36:54,219 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2731829.3333333335, ans=0.125
2023-10-09 10:36:57,800 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.42 vs. limit=12.0
2023-10-09 10:37:06,389 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=12.0
2023-10-09 10:37:23,062 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2731922.6666666665, ans=0.125
2023-10-09 10:37:43,523 INFO [train.py:1031] (1/4) Epoch 14, batch 700, loss[loss=0.1955, simple_loss=0.2666, pruned_loss=0.0457, ctc_loss=0.08238, over 16798.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2702, pruned_loss=0.05972, ctc_loss=0.1046, over 3196783.32 frames. ], batch size: 176, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:37:49,936 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2732016.0, ans=0.95
2023-10-09 10:37:54,154 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2732062.6666666665, ans=0.125
2023-10-09 10:37:58,507 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2732062.6666666665, ans=0.1
2023-10-09 10:38:00,150 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2732062.6666666665, ans=0.09899494936611666
2023-10-09 10:38:04,579 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2732062.6666666665, ans=0.0
2023-10-09 10:38:06,357 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.869e+02 3.199e+02 3.835e+02 8.884e+02, threshold=6.398e+02, percent-clipped=1.0
2023-10-09 10:38:33,288 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2732202.6666666665, ans=0.0
2023-10-09 10:38:39,801 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2732202.6666666665, ans=0.5
2023-10-09 10:38:44,811 INFO [train.py:1031] (1/4) Epoch 14, batch 750, loss[loss=0.2368, simple_loss=0.3099, pruned_loss=0.06104, ctc_loss=0.1043, over 16794.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2788, pruned_loss=0.05798, ctc_loss=0.1028, over 3221273.63 frames. ], batch size: 164, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:38:56,141 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2732249.3333333335, ans=0.125
2023-10-09 10:38:59,939 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0
2023-10-09 10:39:10,521 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2732342.6666666665, ans=0.0
2023-10-09 10:39:11,655 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:39:23,397 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2732389.3333333335, ans=0.125
2023-10-09 10:39:48,675 INFO [train.py:1031] (1/4) Epoch 14, batch 800, loss[loss=0.246, simple_loss=0.3579, pruned_loss=0.04874, ctc_loss=0.09157, over 15177.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2934, pruned_loss=0.06011, ctc_loss=0.1073, over 3233392.65 frames. ], batch size: 526, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:40:02,952 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.30 vs. limit=10.0
2023-10-09 10:40:12,733 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.375e+02 4.289e+02 5.326e+02 8.856e+02, threshold=8.578e+02, percent-clipped=11.0
2023-10-09 10:40:21,453 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2732576.0, ans=0.125
2023-10-09 10:40:40,944 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.45 vs. limit=15.0
2023-10-09 10:40:49,314 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2732716.0, ans=0.125
2023-10-09 10:40:50,031 INFO [train.py:1031] (1/4) Epoch 14, batch 850, loss[loss=0.2083, simple_loss=0.2584, pruned_loss=0.05979, ctc_loss=0.09675, over 16547.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2991, pruned_loss=0.06123, ctc_loss=0.1094, over 3249724.72 frames. ], batch size: 110, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:41:30,291 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2732856.0, ans=0.1
2023-10-09 10:41:44,398 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2732902.6666666665, ans=0.125
2023-10-09 10:41:49,387 INFO [train.py:1031] (1/4) Epoch 14, batch 900, loss[loss=0.2236, simple_loss=0.2801, pruned_loss=0.06197, ctc_loss=0.1079, over 16806.00 frames. ], tot_loss[loss=0.2324, simple_loss=0.2979, pruned_loss=0.06154, ctc_loss=0.1094, over 3255456.60 frames. ], batch size: 121, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:42:01,518 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2732996.0, ans=0.0
2023-10-09 10:42:17,044 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.380e+02 3.289e+02 4.047e+02 4.926e+02 9.646e+02, threshold=8.093e+02, percent-clipped=3.0
2023-10-09 10:42:18,587 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2733042.6666666665, ans=0.0
2023-10-09 10:42:51,361 INFO [train.py:1031] (1/4) Epoch 14, batch 950, loss[loss=0.2223, simple_loss=0.2779, pruned_loss=0.06293, ctc_loss=0.1023, over 17013.00 frames. ], tot_loss[loss=0.2338, simple_loss=0.2964, pruned_loss=0.06325, ctc_loss=0.1119, over 3263418.38 frames. ], batch size: 86, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:43:07,965 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0
2023-10-09 10:43:26,997 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2733322.6666666665, ans=0.0
2023-10-09 10:43:38,513 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2733369.3333333335, ans=0.125
2023-10-09 10:43:51,557 INFO [train.py:1031] (1/4) Epoch 14, batch 1000, loss[loss=0.2073, simple_loss=0.2646, pruned_loss=0.0563, ctc_loss=0.09348, over 16791.00 frames. ], tot_loss[loss=0.2403, simple_loss=0.3017, pruned_loss=0.0661, ctc_loss=0.1166, over 3269678.74 frames. ], batch size: 121, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:43:55,828 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=22.5
2023-10-09 10:44:05,204 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2733462.6666666665, ans=0.125
2023-10-09 10:44:05,214 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2733462.6666666665, ans=0.125
2023-10-09 10:44:07,305 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2733462.6666666665, ans=0.07
2023-10-09 10:44:14,677 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.70 vs. limit=10.0
2023-10-09 10:44:18,372 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.732e+02 3.383e+02 4.196e+02 5.191e+02 1.287e+03, threshold=8.392e+02, percent-clipped=5.0
2023-10-09 10:44:30,720 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2733556.0, ans=0.0
2023-10-09 10:44:40,665 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2733602.6666666665, ans=0.0
2023-10-09 10:44:48,956 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2733602.6666666665, ans=0.2
2023-10-09 10:44:49,932 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2733602.6666666665, ans=0.1
2023-10-09 10:44:52,742 INFO [train.py:1031] (1/4) Epoch 14, batch 1050, loss[loss=0.2336, simple_loss=0.2738, pruned_loss=0.0725, ctc_loss=0.1211, over 16811.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.296, pruned_loss=0.06468, ctc_loss=0.1139, over 3277809.78 frames. ], batch size: 164, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:45:06,127 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2733696.0, ans=0.0
2023-10-09 10:45:06,253 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2733696.0, ans=0.0
2023-10-09 10:45:09,375 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2733696.0, ans=0.125
2023-10-09 10:45:19,631 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2733742.6666666665, ans=0.0
2023-10-09 10:45:21,788 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2733742.6666666665, ans=0.1
2023-10-09 10:45:52,761 INFO [train.py:1031] (1/4) Epoch 14, batch 1100, loss[loss=0.226, simple_loss=0.2789, pruned_loss=0.0631, ctc_loss=0.1174, over 16939.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2917, pruned_loss=0.06545, ctc_loss=0.1152, over 3289648.96 frames. ], batch size: 243, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:46:07,670 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2733929.3333333335, ans=0.2
2023-10-09 10:46:12,772 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0
2023-10-09 10:46:17,195 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2733976.0, ans=0.0
2023-10-09 10:46:21,063 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.487e+02 3.252e+02 3.598e+02 4.155e+02 7.430e+02, threshold=7.195e+02, percent-clipped=0.0
2023-10-09 10:46:34,014 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.50 vs. limit=22.5
2023-10-09 10:46:51,903 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2734116.0, ans=0.1
2023-10-09 10:46:52,284 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0
2023-10-09 10:46:52,568 INFO [train.py:1031] (1/4) Epoch 14, batch 1150, loss[loss=0.2159, simple_loss=0.2653, pruned_loss=0.06102, ctc_loss=0.1112, over 16742.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.2862, pruned_loss=0.06467, ctc_loss=0.1135, over 3297921.67 frames. ], batch size: 291, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:47:04,035 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2734162.6666666665, ans=0.0
2023-10-09 10:47:27,180 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2734256.0, ans=0.125
2023-10-09 10:47:51,158 INFO [train.py:1031] (1/4) Epoch 14, batch 1200, loss[loss=0.2111, simple_loss=0.2612, pruned_loss=0.05875, ctc_loss=0.1089, over 16782.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2795, pruned_loss=0.06415, ctc_loss=0.1126, over 3305166.61 frames. ], batch size: 242, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:47:55,746 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2734349.3333333335, ans=0.1
2023-10-09 10:47:59,396 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2734349.3333333335, ans=0.1
2023-10-09 10:48:05,213 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2734396.0, ans=0.125
2023-10-09 10:48:13,221 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2734396.0, ans=0.0
2023-10-09 10:48:20,091 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=8.0
2023-10-09 10:48:20,269 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+02 2.978e+02 3.440e+02 3.913e+02 6.490e+02, threshold=6.880e+02, percent-clipped=0.0
2023-10-09 10:48:30,483 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=22.5
2023-10-09 10:48:51,818 INFO [train.py:1031] (1/4) Epoch 14, batch 1250, loss[loss=0.2201, simple_loss=0.2774, pruned_loss=0.06231, ctc_loss=0.09559, over 16939.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2774, pruned_loss=0.06402, ctc_loss=0.1122, over 3314582.84 frames. ], batch size: 86, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:48:56,676 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2734582.6666666665, ans=0.125
2023-10-09 10:49:03,020 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2734629.3333333335, ans=0.0
2023-10-09 10:49:25,219 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2734676.0, ans=0.125
2023-10-09 10:49:53,653 INFO [train.py:1031] (1/4) Epoch 14, batch 1300, loss[loss=0.2985, simple_loss=0.3135, pruned_loss=0.1046, ctc_loss=0.1858, over 16833.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2782, pruned_loss=0.06551, ctc_loss=0.1147, over 3312269.45 frames. ], batch size: 384, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:50:06,560 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2734862.6666666665, ans=0.125
2023-10-09 10:50:25,185 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.632e+02 3.518e+02 3.904e+02 4.606e+02 8.060e+02, threshold=7.809e+02, percent-clipped=2.0
2023-10-09 10:50:30,841 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2734956.0, ans=0.125
2023-10-09 10:50:39,365 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2734956.0, ans=0.0
2023-10-09 10:50:41,995 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2735002.6666666665, ans=0.125
2023-10-09 10:50:54,223 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2735049.3333333335, ans=0.125
2023-10-09 10:50:54,948 INFO [train.py:1031] (1/4) Epoch 14, batch 1350, loss[loss=0.2091, simple_loss=0.2524, pruned_loss=0.06177, ctc_loss=0.1057, over 16746.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2768, pruned_loss=0.06594, ctc_loss=0.1153, over 3316205.51 frames. ], batch size: 130, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:50:57,948 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2735049.3333333335, ans=0.125
2023-10-09 10:51:16,091 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2735096.0, ans=0.125
2023-10-09 10:51:22,928 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2735142.6666666665, ans=0.0
2023-10-09 10:51:28,427 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2735142.6666666665, ans=0.125
2023-10-09 10:51:39,383 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2735189.3333333335, ans=0.125
2023-10-09 10:51:46,929 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2735236.0, ans=0.1
2023-10-09 10:51:55,994 INFO [train.py:1031] (1/4) Epoch 14, batch 1400, loss[loss=0.2252, simple_loss=0.2795, pruned_loss=0.06438, ctc_loss=0.1052, over 16937.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.2733, pruned_loss=0.06569, ctc_loss=0.115, over 3321207.32 frames. ], batch size: 78, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:51:59,614 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2735282.6666666665, ans=0.2
2023-10-09 10:52:07,191 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2735329.3333333335, ans=0.1
2023-10-09 10:52:18,925 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2735376.0, ans=0.125
2023-10-09 10:52:25,174 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0
2023-10-09 10:52:27,996 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+02 3.259e+02 3.795e+02 4.545e+02 1.175e+03, threshold=7.590e+02, percent-clipped=1.0
2023-10-09 10:52:34,918 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2735422.6666666665, ans=0.125
2023-10-09 10:52:55,146 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2735516.0, ans=0.2
2023-10-09 10:52:55,876 INFO [train.py:1031] (1/4) Epoch 14, batch 1450, loss[loss=0.2033, simple_loss=0.2735, pruned_loss=0.04933, ctc_loss=0.08625, over 16770.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2761, pruned_loss=0.06403, ctc_loss=0.1122, over 3317974.26 frames. ], batch size: 188, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:53:02,433 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2735516.0, ans=0.2
2023-10-09 10:53:08,556 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0
2023-10-09 10:53:23,035 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2735609.3333333335, ans=0.125
2023-10-09 10:53:30,712 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=22.5
2023-10-09 10:53:32,147 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2735656.0, ans=0.125
2023-10-09 10:53:48,954 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2735702.6666666665, ans=0.0
2023-10-09 10:53:53,794 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2735702.6666666665, ans=0.0
2023-10-09 10:53:57,020 INFO [train.py:1031] (1/4) Epoch 14, batch 1500, loss[loss=0.2725, simple_loss=0.2988, pruned_loss=0.09161, ctc_loss=0.1575, over 16628.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2776, pruned_loss=0.06397, ctc_loss=0.112, over 3316524.70 frames. ], batch size: 416, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:54:01,269 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.73 vs. limit=6.0
2023-10-09 10:54:26,559 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2735842.6666666665, ans=0.2
2023-10-09 10:54:32,300 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+02 3.308e+02 3.843e+02 4.778e+02 1.080e+03, threshold=7.686e+02, percent-clipped=1.0
2023-10-09 10:54:34,024 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=22.5
2023-10-09 10:54:34,074 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0
2023-10-09 10:54:47,077 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=12.0
2023-10-09 10:55:00,099 INFO [train.py:1031] (1/4) Epoch 14, batch 1550, loss[loss=0.1151, simple_loss=0.1609, pruned_loss=0.0254, ctc_loss=0.04607, over 10873.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2773, pruned_loss=0.06323, ctc_loss=0.1111, over 3304751.02 frames. ], batch size: 39, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:55:03,261 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2735982.6666666665, ans=0.0
2023-10-09 10:55:10,033 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2735982.6666666665, ans=0.125
2023-10-09 10:55:12,691 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2736029.3333333335, ans=0.125
2023-10-09 10:55:13,780 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2736029.3333333335, ans=0.0
2023-10-09 10:55:15,902 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=22.5
2023-10-09 10:55:34,145 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2736076.0, ans=0.0
2023-10-09 10:55:48,397 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2736169.3333333335, ans=0.0
2023-10-09 10:56:01,565 INFO [train.py:1031] (1/4) Epoch 14, batch 1600, loss[loss=0.2101, simple_loss=0.2634, pruned_loss=0.05861, ctc_loss=0.0989, over 16846.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2745, pruned_loss=0.05933, ctc_loss=0.1046, over 3308902.03 frames. ], batch size: 121, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 10:56:24,728 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0
2023-10-09 10:56:36,435 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+02 2.648e+02 3.123e+02 3.834e+02 1.151e+03, threshold=6.247e+02, percent-clipped=2.0
2023-10-09 10:56:54,250 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2736402.6666666665, ans=0.05
2023-10-09 10:57:00,788 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2736449.3333333335, ans=0.0
2023-10-09 10:57:01,586 INFO [train.py:1031] (1/4) Epoch 14, batch 1650, loss[loss=0.3063, simple_loss=0.3321, pruned_loss=0.1027, ctc_loss=0.1877, over 16793.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.2761, pruned_loss=0.06109, ctc_loss=0.1072, over 3315091.01 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:57:02,279 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=15.0
2023-10-09 10:57:31,030 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2736542.6666666665, ans=0.2
2023-10-09 10:57:35,654 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2736542.6666666665, ans=0.2
2023-10-09 10:58:03,245 INFO [train.py:1031] (1/4) Epoch 14, batch 1700, loss[loss=0.2524, simple_loss=0.308, pruned_loss=0.07216, ctc_loss=0.1313, over 16918.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2828, pruned_loss=0.06468, ctc_loss=0.1134, over 3309318.51 frames. ], batch size: 258, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 10:58:09,826 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.11 vs. limit=15.0
2023-10-09 10:58:11,139 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2736682.6666666665, ans=0.2
2023-10-09 10:58:26,713 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2736776.0, ans=0.125
2023-10-09 10:58:38,776 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+02 3.292e+02 3.848e+02 4.651e+02 1.016e+03, threshold=7.697e+02, percent-clipped=4.0
2023-10-09 10:58:54,098 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2736869.3333333335, ans=0.0
2023-10-09 10:59:04,568 INFO [train.py:1031] (1/4) Epoch 14, batch 1750, loss[loss=0.244, simple_loss=0.2986, pruned_loss=0.07178, ctc_loss=0.1148, over 17029.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.2868, pruned_loss=0.06658, ctc_loss=0.1168, over 3313289.18 frames. ], batch size: 86, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:59:18,546 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2736962.6666666665, ans=0.125
2023-10-09 10:59:29,049 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=22.5
2023-10-09 10:59:50,509 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2737056.0, ans=0.125
2023-10-09 11:00:02,755 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2737102.6666666665, ans=0.1
2023-10-09 11:00:05,527 INFO [train.py:1031] (1/4) Epoch 14, batch 1800, loss[loss=0.2476, simple_loss=0.313, pruned_loss=0.06648, ctc_loss=0.1231, over 16787.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2861, pruned_loss=0.06513, ctc_loss=0.1149, over 3310180.49 frames. ], batch size: 328, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:00:16,194 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2737149.3333333335, ans=0.2
2023-10-09 11:00:19,439 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2737196.0, ans=0.125
2023-10-09 11:00:43,557 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 2.917e+02 3.383e+02 3.800e+02 1.043e+03, threshold=6.767e+02, percent-clipped=1.0
2023-10-09 11:00:48,515 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2737289.3333333335, ans=0.1
2023-10-09 11:01:06,587 INFO [train.py:1031] (1/4) Epoch 14, batch 1850, loss[loss=0.202, simple_loss=0.2799, pruned_loss=0.04549, ctc_loss=0.08259, over 16789.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2874, pruned_loss=0.0627, ctc_loss=0.111, over 3309332.20 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:01:46,991 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2737522.6666666665, ans=0.125
2023-10-09 11:01:57,749 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2737569.3333333335, ans=0.125
2023-10-09 11:02:06,433 INFO [train.py:1031] (1/4) Epoch 14, batch 1900, loss[loss=0.2214, simple_loss=0.2885, pruned_loss=0.0569, ctc_loss=0.1014, over 16887.00 frames. ], tot_loss[loss=0.228, simple_loss=0.287, pruned_loss=0.06245, ctc_loss=0.1102, over 3311564.42 frames. ], batch size: 215, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:02:21,668 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2737662.6666666665, ans=0.125
2023-10-09 11:02:33,237 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2737709.3333333335, ans=0.125
2023-10-09 11:02:43,659 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.102e+02 3.624e+02 4.440e+02 7.780e+02, threshold=7.248e+02, percent-clipped=1.0
2023-10-09 11:02:44,524 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.72 vs. limit=15.0
2023-10-09 11:03:06,284 INFO [train.py:1031] (1/4) Epoch 14, batch 1950, loss[loss=0.209, simple_loss=0.2782, pruned_loss=0.05027, ctc_loss=0.09817, over 16847.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.288, pruned_loss=0.06221, ctc_loss=0.1098, over 3312997.02 frames. ], batch size: 202, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:03:46,177 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2737989.3333333335, ans=0.0
2023-10-09 11:03:47,228 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2737989.3333333335, ans=0.0
2023-10-09 11:03:59,832 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=12.0
2023-10-09 11:04:08,694 INFO [train.py:1031] (1/4) Epoch 14, batch 2000, loss[loss=0.2368, simple_loss=0.2739, pruned_loss=0.07256, ctc_loss=0.1362, over 15346.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2903, pruned_loss=0.0634, ctc_loss=0.1119, over 3295264.56 frames. ], batch size: 529, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:04:15,453 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2738082.6666666665, ans=0.2
2023-10-09 11:04:24,720 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2738129.3333333335, ans=0.125
2023-10-09 11:04:36,367 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2738176.0, ans=0.125
2023-10-09 11:04:40,758 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0
2023-10-09 11:04:46,687 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2738222.6666666665, ans=0.125
2023-10-09 11:04:48,440 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.567e+02 3.379e+02 3.817e+02 4.663e+02 9.562e+02, threshold=7.635e+02, percent-clipped=5.0
2023-10-09 11:05:02,982 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2738269.3333333335, ans=15.0
2023-10-09 11:05:09,398 INFO [train.py:1031] (1/4) Epoch 14, batch 2050, loss[loss=0.2279, simple_loss=0.2741, pruned_loss=0.0683, ctc_loss=0.1129, over 16506.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.2948, pruned_loss=0.06585, ctc_loss=0.1161, over 3295670.10 frames. ], batch size: 70, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:05:14,449 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2738316.0, ans=0.125
2023-10-09 11:05:53,411 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2738456.0, ans=0.0
2023-10-09 11:06:10,928 INFO [train.py:1031] (1/4) Epoch 14, batch 2100, loss[loss=0.218, simple_loss=0.2618, pruned_loss=0.06458, ctc_loss=0.1123, over 16725.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2935, pruned_loss=0.06701, ctc_loss=0.1181, over 3298498.84 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:06:18,841 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2738549.3333333335, ans=0.125
2023-10-09 11:06:38,528 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2738642.6666666665, ans=0.2
2023-10-09 11:06:53,253 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+02 3.150e+02 3.651e+02 4.552e+02 6.884e+02, threshold=7.301e+02, percent-clipped=0.0
2023-10-09 11:07:13,876 INFO [train.py:1031] (1/4) Epoch 14, batch 2150, loss[loss=0.2901, simple_loss=0.3554, pruned_loss=0.08352, ctc_loss=0.1443, over 16473.00 frames. ], tot_loss[loss=0.2369, simple_loss=0.2952, pruned_loss=0.0659, ctc_loss=0.1169, over 3305375.05 frames. ], batch size: 416, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:07:16,941 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2738782.6666666665, ans=0.1
2023-10-09 11:07:17,159 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=12.0
2023-10-09 11:07:19,611 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2738782.6666666665, ans=0.035
2023-10-09 11:07:48,762 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0
2023-10-09 11:08:04,362 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2738969.3333333335, ans=0.0
2023-10-09 11:08:11,778 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2738969.3333333335, ans=0.125
2023-10-09 11:08:14,605 INFO [train.py:1031] (1/4) Epoch 14, batch 2200, loss[loss=0.2143, simple_loss=0.2693, pruned_loss=0.05838, ctc_loss=0.1063, over 16951.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.2941, pruned_loss=0.0661, ctc_loss=0.1171, over 3305248.42 frames. ], batch size: 202, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:08:58,372 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 3.214e+02 3.681e+02 4.721e+02 1.015e+03, threshold=7.363e+02, percent-clipped=4.0
2023-10-09 11:09:16,599 INFO [train.py:1031] (1/4) Epoch 14, batch 2250, loss[loss=0.1863, simple_loss=0.2413, pruned_loss=0.04905, ctc_loss=0.08281, over 16787.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.288, pruned_loss=0.06546, ctc_loss=0.1158, over 3306510.16 frames. ], batch size: 176, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:09:35,168 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.24 vs. limit=15.0
2023-10-09 11:09:39,885 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:09:45,448 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2739342.6666666665, ans=0.2
2023-10-09 11:10:05,237 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2739436.0, ans=0.0
2023-10-09 11:10:09,017 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2739436.0, ans=0.125
2023-10-09 11:10:18,410 INFO [train.py:1031] (1/4) Epoch 14, batch 2300, loss[loss=0.3036, simple_loss=0.3183, pruned_loss=0.1068, ctc_loss=0.1882, over 16603.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2807, pruned_loss=0.06433, ctc_loss=0.1139, over 3310865.12 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:10:34,745 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2739529.3333333335, ans=0.125
2023-10-09 11:11:04,760 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.458e+02 3.279e+02 3.727e+02 4.728e+02 7.971e+02, threshold=7.454e+02, percent-clipped=1.0
2023-10-09 11:11:08,934 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2739669.3333333335, ans=0.125
2023-10-09 11:11:11,703 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2739669.3333333335, ans=0.1
2023-10-09 11:11:21,214 INFO [train.py:1031] (1/4) Epoch 14, batch 2350, loss[loss=0.251, simple_loss=0.3116, pruned_loss=0.07243, ctc_loss=0.1141, over 16932.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.2837, pruned_loss=0.06627, ctc_loss=0.1173, over 3310119.29 frames. ], batch size: 91, lr: 2.59e-03, grad_scale: 1.0
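Each train.py:1031 line decomposes the objective into a simple (linear-joiner) transducer loss, a pruned transducer loss, and a CTC loss. The printed totals are consistent with a weighted sum of the form below; weights of 0.5 and 0.2 reproduce the logged `loss` values (for batch 2250 above: 0.5 * 0.2413 + 0.04905 + 0.2 * 0.08281 ≈ 0.1863). Treat the code itself as an illustrative sketch rather than the recipe's train.py.

```python
def combine_losses(simple_loss, pruned_loss, ctc_loss,
                   simple_scale=0.5, ctc_scale=0.2):
    """Weighted sum matching the pattern of the logged loss components."""
    return simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss

# Batch 2250 above: loss=0.1863, simple_loss=0.2413,
# pruned_loss=0.04905, ctc_loss=0.08281
print(combine_losses(0.2413, 0.04905, 0.08281))  # ~0.18626
```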
2023-10-09 11:11:23,102 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0
2023-10-09 11:11:29,281 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=22.5
2023-10-09 11:11:34,972 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2739762.6666666665, ans=0.125
2023-10-09 11:11:45,898 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2739809.3333333335, ans=0.125
2023-10-09 11:11:48,176 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0
2023-10-09 11:11:48,361 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0
2023-10-09 11:12:13,666 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2023-10-09 11:12:21,919 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2739949.3333333335, ans=0.125
2023-10-09 11:12:22,659 INFO [train.py:1031] (1/4) Epoch 14, batch 2400, loss[loss=0.2637, simple_loss=0.2936, pruned_loss=0.08638, ctc_loss=0.1525, over 16575.00 frames. ], tot_loss[loss=0.235, simple_loss=0.2862, pruned_loss=0.06789, ctc_loss=0.1198, over 3311618.72 frames. ], batch size: 416, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:12:25,025 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.21 vs. limit=22.5
2023-10-09 11:12:36,915 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2739996.0, ans=0.125
2023-10-09 11:12:59,465 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2740089.3333333335, ans=0.2
2023-10-09 11:13:07,193 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2740089.3333333335, ans=0.2
2023-10-09 11:13:08,802 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2740089.3333333335, ans=0.125
2023-10-09 11:13:09,496 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+02 3.337e+02 3.917e+02 4.663e+02 1.051e+03, threshold=7.833e+02, percent-clipped=2.0
2023-10-09 11:13:18,847 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.95 vs. limit=15.0
2023-10-09 11:13:25,550 INFO [train.py:1031] (1/4) Epoch 14, batch 2450, loss[loss=0.2102, simple_loss=0.2626, pruned_loss=0.05845, ctc_loss=0.1023, over 16897.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2844, pruned_loss=0.06754, ctc_loss=0.1185, over 3304000.78 frames. ], batch size: 82, lr: 2.59e-03, grad_scale: 2.0
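The Whitening lines compare a per-module "whiteness" metric of activations against a limit (the whitening_limit values that also appear as ScheduledFloats); a corrective gradient is only applied when the metric exceeds its limit. One plausible form of such a metric, shown as a sketch rather than scaling.py's exact formula: for the feature covariance C over d channels, d * tr(C^2) / tr(C)^2 equals 1.0 when all eigenvalues are equal (perfectly "white") and grows as the spectrum becomes lopsided.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels); larger return value = less 'white'
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

x = torch.randn(1000, 192) * torch.linspace(0.5, 2.0, 192)  # uneven spectrum
print(f"metric={whitening_metric(x):.2f} vs. limit=15.0")
```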
2023-10-09 11:13:44,511 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0
2023-10-09 11:14:14,889 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.53 vs. limit=10.0
2023-10-09 11:14:28,352 INFO [train.py:1031] (1/4) Epoch 14, batch 2500, loss[loss=0.1844, simple_loss=0.2531, pruned_loss=0.04224, ctc_loss=0.0782, over 16678.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2794, pruned_loss=0.06277, ctc_loss=0.1109, over 3293308.96 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:14:38,187 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2740416.0, ans=0.125
2023-10-09 11:14:48,850 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2740462.6666666665, ans=0.2
2023-10-09 11:14:55,837 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0
2023-10-09 11:15:10,253 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0
2023-10-09 11:15:17,464 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.820e+02 3.218e+02 3.802e+02 1.081e+03, threshold=6.436e+02, percent-clipped=2.0
2023-10-09 11:15:31,782 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0
2023-10-09 11:15:33,266 INFO [train.py:1031] (1/4) Epoch 14, batch 2550, loss[loss=0.2035, simple_loss=0.2397, pruned_loss=0.06337, ctc_loss=0.1015, over 16485.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2825, pruned_loss=0.06235, ctc_loss=0.1092, over 3289747.78 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:15:37,969 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2740649.3333333335, ans=0.1
2023-10-09 11:16:10,123 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2740789.3333333335, ans=0.125
2023-10-09 11:16:35,628 INFO [train.py:1031] (1/4) Epoch 14, batch 2600, loss[loss=0.2415, simple_loss=0.2787, pruned_loss=0.07596, ctc_loss=0.1312, over 16413.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.281, pruned_loss=0.06209, ctc_loss=0.1083, over 3289191.29 frames. ], batch size: 416, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:16:39,208 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0
2023-10-09 11:17:07,844 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2740976.0, ans=0.05
2023-10-09 11:17:09,888 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2740976.0, ans=0.0
2023-10-09 11:17:23,703 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.364e+02 2.979e+02 3.641e+02 4.453e+02 7.344e+02, threshold=7.282e+02, percent-clipped=4.0
2023-10-09 11:17:26,940 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741069.3333333335, ans=0.1
2023-10-09 11:17:32,287 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2741069.3333333335, ans=0.125
2023-10-09 11:17:37,765 INFO [train.py:1031] (1/4) Epoch 14, batch 2650, loss[loss=0.2298, simple_loss=0.2914, pruned_loss=0.06293, ctc_loss=0.1057, over 16773.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2834, pruned_loss=0.06125, ctc_loss=0.1074, over 3288107.52 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 1.0
2023-10-09 11:18:29,236 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2741302.6666666665, ans=0.2
2023-10-09 11:18:34,063 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2741302.6666666665, ans=0.0
2023-10-09 11:18:39,291 INFO [train.py:1031] (1/4) Epoch 14, batch 2700, loss[loss=0.2381, simple_loss=0.2883, pruned_loss=0.06986, ctc_loss=0.1207, over 16735.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2874, pruned_loss=0.06345, ctc_loss=0.1111, over 3284746.50 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:18:46,961 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2741349.3333333335, ans=0.0
2023-10-09 11:18:47,982 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2741349.3333333335, ans=0.125
2023-10-09 11:18:50,157 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2741349.3333333335, ans=0.125
2023-10-09 11:18:55,271 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2741396.0, ans=0.025
2023-10-09 11:19:00,808 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2741396.0, ans=0.2
2023-10-09 11:19:01,778 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2741396.0, ans=0.125
2023-10-09 11:19:05,966 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2741442.6666666665, ans=0.125
2023-10-09 11:19:09,869 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2741442.6666666665, ans=0.125
2023-10-09 11:19:13,159 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=22.5
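The grad_scale value in the loss lines moves in powers of two (1.0, 2.0, 4.0, 8.0 across this stretch), which is the signature of dynamic loss scaling in mixed-precision training: the scale is halved when a step overflows and grown again after a run of clean steps. A generic torch.cuda.amp step illustrating the mechanism (not this recipe's train.py; `model`, `batch` keys, and `loss_fn` are placeholders):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=2000)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()  # scale the loss to protect fp16 gradients
    scaler.step(optimizer)         # skips the update if inf/nan grads found
    scaler.update()                # halve on overflow, grow after clean runs
    return loss.detach(), scaler.get_scale()
```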
2023-10-09 11:19:17,922 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0
2023-10-09 11:19:25,259 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2741489.3333333335, ans=0.025
2023-10-09 11:19:31,011 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+02 3.579e+02 4.156e+02 4.960e+02 1.400e+03, threshold=8.312e+02, percent-clipped=4.0
2023-10-09 11:19:42,378 INFO [train.py:1031] (1/4) Epoch 14, batch 2750, loss[loss=0.2298, simple_loss=0.3048, pruned_loss=0.05633, ctc_loss=0.1054, over 16865.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2905, pruned_loss=0.06306, ctc_loss=0.1105, over 3274122.64 frames. ], batch size: 215, lr: 2.59e-03, grad_scale: 1.0
2023-10-09 11:19:42,685 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2741582.6666666665, ans=0.2
2023-10-09 11:20:01,406 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2741629.3333333335, ans=0.1
2023-10-09 11:20:09,005 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2741676.0, ans=0.125
2023-10-09 11:20:44,739 INFO [train.py:1031] (1/4) Epoch 14, batch 2800, loss[loss=0.2047, simple_loss=0.251, pruned_loss=0.05825, ctc_loss=0.1047, over 16518.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2856, pruned_loss=0.05924, ctc_loss=0.1045, over 3267740.07 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:21:02,811 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2741862.6666666665, ans=0.0
2023-10-09 11:21:02,875 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:21:17,261 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2741909.3333333335, ans=0.125
2023-10-09 11:21:17,289 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2741909.3333333335, ans=0.1
2023-10-09 11:21:26,118 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741956.0, ans=0.1
2023-10-09 11:21:32,024 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2741956.0, ans=0.0
2023-10-09 11:21:35,708 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+02 3.033e+02 3.735e+02 4.727e+02 1.179e+03, threshold=7.471e+02, percent-clipped=1.0
2023-10-09 11:21:38,693 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2742002.6666666665, ans=0.1
2023-10-09 11:21:47,213 INFO [train.py:1031] (1/4) Epoch 14, batch 2850, loss[loss=0.1981, simple_loss=0.2772, pruned_loss=0.04413, ctc_loss=0.07686, over 16839.00 frames. ], tot_loss[loss=0.2197, simple_loss=0.2839, pruned_loss=0.05739, ctc_loss=0.1017, over 3263899.43 frames. ], batch size: 242, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:22:12,918 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2742142.6666666665, ans=0.1
2023-10-09 11:22:24,067 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2742142.6666666665, ans=0.2
2023-10-09 11:22:51,979 INFO [train.py:1031] (1/4) Epoch 14, batch 2900, loss[loss=0.2055, simple_loss=0.284, pruned_loss=0.04605, ctc_loss=0.08721, over 16877.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.285, pruned_loss=0.05529, ctc_loss=0.09857, over 3265492.21 frames. ], batch size: 202, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:22:53,735 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.24 vs. limit=10.0
2023-10-09 11:22:57,187 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2742282.6666666665, ans=0.125
2023-10-09 11:23:03,602 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0
2023-10-09 11:23:19,741 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2742376.0, ans=0.2
2023-10-09 11:23:28,070 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2742422.6666666665, ans=0.0
2023-10-09 11:23:31,285 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2742422.6666666665, ans=0.125
2023-10-09 11:23:43,139 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+02 3.117e+02 3.737e+02 4.874e+02 8.025e+02, threshold=7.473e+02, percent-clipped=2.0
2023-10-09 11:23:51,592 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=22.5
2023-10-09 11:23:52,691 INFO [train.py:1031] (1/4) Epoch 14, batch 2950, loss[loss=0.2708, simple_loss=0.3153, pruned_loss=0.08269, ctc_loss=0.1523, over 16874.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2861, pruned_loss=0.05588, ctc_loss=0.09952, over 3271990.31 frames. ], batch size: 291, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:23:55,733 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2742516.0, ans=0.0
2023-10-09 11:24:02,674 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2742516.0, ans=0.2
2023-10-09 11:24:09,152 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2742562.6666666665, ans=0.0
2023-10-09 11:24:15,725 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2742562.6666666665, ans=0.125
2023-10-09 11:24:16,731 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2742609.3333333335, ans=0.0
2023-10-09 11:24:19,695 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=12.0
2023-10-09 11:24:41,825 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2742702.6666666665, ans=0.125
2023-10-09 11:24:54,856 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.00 vs. limit=10.0
2023-10-09 11:24:55,800 INFO [train.py:1031] (1/4) Epoch 14, batch 3000, loss[loss=0.2079, simple_loss=0.2611, pruned_loss=0.0578, ctc_loss=0.09801, over 16958.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.2845, pruned_loss=0.05818, ctc_loss=0.1034, over 3272596.58 frames. ], batch size: 90, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:24:55,800 INFO [train.py:1054] (1/4) Computing validation loss
2023-10-09 11:25:13,599 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2392, simple_loss=0.3062, pruned_loss=0.06637, ctc_loss=0.09863, over 1796401.00 frames.
2023-10-09 11:25:13,600 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14349MB
2023-10-09 11:25:18,758 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2742749.3333333335, ans=0.0
2023-10-09 11:25:21,600 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=12.0
2023-10-09 11:25:24,616 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2742796.0, ans=0.0
2023-10-09 11:25:26,779 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2742796.0, ans=0.0
2023-10-09 11:25:26,873 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2742796.0, ans=0.125
2023-10-09 11:25:31,202 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2742796.0, ans=0.1
2023-10-09 11:25:48,442 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0
2023-10-09 11:25:49,130 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2742889.3333333335, ans=0.125
2023-10-09 11:25:55,038 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2742889.3333333335, ans=0.125
2023-10-09 11:26:05,135 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+02 3.037e+02 3.527e+02 4.152e+02 6.631e+02, threshold=7.054e+02, percent-clipped=0.0
2023-10-09 11:26:11,357 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2742936.0, ans=0.1
2023-10-09 11:26:14,948 INFO [train.py:1031] (1/4) Epoch 14, batch 3050, loss[loss=0.1941, simple_loss=0.2437, pruned_loss=0.05305, ctc_loss=0.09594, over 16623.00 frames. ], tot_loss[loss=0.219, simple_loss=0.28, pruned_loss=0.05831, ctc_loss=0.1035, over 3277135.39 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0
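The three train.py lines in the middle of this block show the periodic validation pass: training pauses at batch 3000, the loss is averaged over the held-out set, and peak CUDA memory is reported. A hedged sketch of that pattern; the `model.loss` helper and the exact field names are invented for illustration:

```python
import torch

def compute_validation_loss(model, valid_loader, device):
    model.eval()
    tot, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model.loss(batch)  # hypothetical helper
            tot += loss.item() * num_frames
            frames += num_frames
    model.train()
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot / frames:.4f}, over {frames:.2f} frames; "
          f"Maximum memory allocated so far is {mb}MB")
```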
2023-10-09 11:26:25,679 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2743029.3333333335, ans=0.0
2023-10-09 11:26:33,366 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2743029.3333333335, ans=0.04949747468305833
2023-10-09 11:26:43,944 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0
2023-10-09 11:26:48,493 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=22.5
2023-10-09 11:27:15,157 INFO [train.py:1031] (1/4) Epoch 14, batch 3100, loss[loss=0.2255, simple_loss=0.2789, pruned_loss=0.06377, ctc_loss=0.1113, over 16812.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2744, pruned_loss=0.05852, ctc_loss=0.1033, over 3279906.07 frames. ], batch size: 309, lr: 2.59e-03, grad_scale: 8.0
2023-10-09 11:27:52,888 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0
2023-10-09 11:28:07,998 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+02 2.923e+02 3.339e+02 4.092e+02 6.355e+02, threshold=6.678e+02, percent-clipped=0.0
2023-10-09 11:28:12,206 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2743402.6666666665, ans=0.125
2023-10-09 11:28:12,214 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2743402.6666666665, ans=0.125
2023-10-09 11:28:15,027 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2743449.3333333335, ans=0.0
2023-10-09 11:28:15,797 INFO [train.py:1031] (1/4) Epoch 14, batch 3150, loss[loss=0.1983, simple_loss=0.2454, pruned_loss=0.05626, ctc_loss=0.09688, over 16927.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2702, pruned_loss=0.05675, ctc_loss=0.1002, over 3289972.16 frames. ], batch size: 78, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:28:27,783 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.36 vs. limit=15.0
2023-10-09 11:28:37,320 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2743496.0, ans=0.2
2023-10-09 11:28:40,970 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=22.5
2023-10-09 11:29:08,266 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2743636.0, ans=0.2
2023-10-09 11:29:17,392 INFO [train.py:1031] (1/4) Epoch 14, batch 3200, loss[loss=0.2255, simple_loss=0.2886, pruned_loss=0.06079, ctc_loss=0.102, over 16649.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2795, pruned_loss=0.0575, ctc_loss=0.1027, over 3291217.39 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:29:54,500 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2743822.6666666665, ans=0.125
2023-10-09 11:29:56,589 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2743822.6666666665, ans=0.1
2023-10-09 11:30:11,549 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2743869.3333333335, ans=0.125
2023-10-09 11:30:12,208 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.450e+02 3.362e+02 3.915e+02 4.671e+02 1.064e+03, threshold=7.829e+02, percent-clipped=5.0
2023-10-09 11:30:18,111 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.30 vs. limit=10.0
2023-10-09 11:30:18,611 INFO [train.py:1031] (1/4) Epoch 14, batch 3250, loss[loss=0.2235, simple_loss=0.2814, pruned_loss=0.06217, ctc_loss=0.1032, over 16739.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2821, pruned_loss=0.06041, ctc_loss=0.1074, over 3293879.34 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:30:44,595 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.96 vs. limit=10.0
2023-10-09 11:30:49,594 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2744009.3333333335, ans=0.125
2023-10-09 11:31:09,703 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=15.0
2023-10-09 11:31:13,407 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2744102.6666666665, ans=0.125
2023-10-09 11:31:23,520 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=22.5
2023-10-09 11:31:23,914 INFO [train.py:1031] (1/4) Epoch 14, batch 3300, loss[loss=0.2457, simple_loss=0.2983, pruned_loss=0.07209, ctc_loss=0.1223, over 16770.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2874, pruned_loss=0.06298, ctc_loss=0.1115, over 3298877.95 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:31:31,441 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2744149.3333333335, ans=0.1
2023-10-09 11:31:34,078 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2744149.3333333335, ans=0.0
2023-10-09 11:31:34,103 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:31:51,738 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:32:03,696 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2744289.3333333335, ans=0.0
2023-10-09 11:32:20,913 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+02 3.186e+02 3.803e+02 4.439e+02 1.060e+03, threshold=7.606e+02, percent-clipped=1.0
2023-10-09 11:32:26,283 INFO [train.py:1031] (1/4) Epoch 14, batch 3350, loss[loss=0.2062, simple_loss=0.2452, pruned_loss=0.06244, ctc_loss=0.1056, over 16665.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2853, pruned_loss=0.06386, ctc_loss=0.1129, over 3292889.08 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:32:37,351 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2744382.6666666665, ans=0.1
2023-10-09 11:32:37,741 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2023-10-09 11:32:56,838 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:33:06,404 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0
2023-10-09 11:33:27,752 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2744569.3333333335, ans=0.125
2023-10-09 11:33:29,353 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=15.0
2023-10-09 11:33:29,578 INFO [train.py:1031] (1/4) Epoch 14, batch 3400, loss[loss=0.2406, simple_loss=0.3352, pruned_loss=0.05375, ctc_loss=0.09631, over 16244.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2841, pruned_loss=0.0641, ctc_loss=0.1133, over 3300348.71 frames. ], batch size: 463, lr: 2.59e-03, grad_scale: 8.0
2023-10-09 11:33:33,497 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2744616.0, ans=0.0
2023-10-09 11:33:51,397 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=2744662.6666666665, ans=0.2
2023-10-09 11:33:58,427 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:33:59,539 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2744709.3333333335, ans=0.0
2023-10-09 11:34:03,779 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2744709.3333333335, ans=0.125
2023-10-09 11:34:19,779 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2744802.6666666665, ans=0.0
2023-10-09 11:34:26,697 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.487e+02 3.089e+02 3.600e+02 4.217e+02 8.048e+02, threshold=7.200e+02, percent-clipped=1.0
2023-10-09 11:34:28,271 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2744802.6666666665, ans=0.0
2023-10-09 11:34:30,973 INFO [train.py:1031] (1/4) Epoch 14, batch 3450, loss[loss=0.2985, simple_loss=0.3413, pruned_loss=0.09256, ctc_loss=0.1766, over 16691.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2872, pruned_loss=0.06408, ctc_loss=0.1133, over 3294692.21 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:34:45,551 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2744896.0, ans=0.2
2023-10-09 11:35:18,684 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2745036.0, ans=0.025
2023-10-09 11:35:21,456 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2745036.0, ans=0.125
2023-10-09 11:35:26,235 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0
2023-10-09 11:35:32,418 INFO [train.py:1031] (1/4) Epoch 14, batch 3500, loss[loss=0.2226, simple_loss=0.2924, pruned_loss=0.05566, ctc_loss=0.1035, over 16758.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2854, pruned_loss=0.06231, ctc_loss=0.1103, over 3294946.60 frames. ], batch size: 272, lr: 2.59e-03, grad_scale: 8.0
2023-10-09 11:35:56,217 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2745176.0, ans=0.0
2023-10-09 11:36:01,091 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2745176.0, ans=0.2
2023-10-09 11:36:28,666 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 2.946e+02 3.396e+02 4.316e+02 6.919e+02, threshold=6.792e+02, percent-clipped=0.0
2023-10-09 11:36:31,830 INFO [train.py:1031] (1/4) Epoch 14, batch 3550, loss[loss=0.2002, simple_loss=0.2536, pruned_loss=0.05546, ctc_loss=0.08964, over 16934.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2792, pruned_loss=0.06182, ctc_loss=0.1092, over 3299950.01 frames. ], batch size: 86, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:36:32,435 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0
2023-10-09 11:36:40,576 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2745316.0, ans=0.1
2023-10-09 11:36:44,347 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.59 vs. limit=12.0
2023-10-09 11:36:54,037 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2745362.6666666665, ans=0.0
2023-10-09 11:37:14,032 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2745456.0, ans=0.0
2023-10-09 11:37:29,518 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2745502.6666666665, ans=0.1
2023-10-09 11:37:31,136 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2745502.6666666665, ans=0.125
2023-10-09 11:37:32,883 INFO [train.py:1031] (1/4) Epoch 14, batch 3600, loss[loss=0.2333, simple_loss=0.2609, pruned_loss=0.07629, ctc_loss=0.1329, over 16423.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.2728, pruned_loss=0.06151, ctc_loss=0.1086, over 3299675.08 frames. ], batch size: 351, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:37:45,884 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2745596.0, ans=0.04949747468305833
2023-10-09 11:38:07,871 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2745689.3333333335, ans=0.1
2023-10-09 11:38:30,262 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2745736.0, ans=0.1
2023-10-09 11:38:31,202 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2745736.0, ans=0.05
2023-10-09 11:38:31,934 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+02 3.169e+02 3.614e+02 4.285e+02 9.204e+02, threshold=7.228e+02, percent-clipped=2.0
2023-10-09 11:38:33,631 INFO [train.py:1031] (1/4) Epoch 14, batch 3650, loss[loss=0.1951, simple_loss=0.2458, pruned_loss=0.05354, ctc_loss=0.09348, over 16783.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2694, pruned_loss=0.06181, ctc_loss=0.1089, over 3308450.64 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:38:38,098 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2745782.6666666665, ans=0.125
2023-10-09 11:38:39,703 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2745782.6666666665, ans=0.125
2023-10-09 11:38:55,008 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.69 vs. limit=15.0
2023-10-09 11:38:56,658 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2745829.3333333335, ans=0.125
2023-10-09 11:38:57,130 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.93 vs. limit=22.5
2023-10-09 11:38:58,723 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2745876.0, ans=0.125
2023-10-09 11:39:18,511 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2745922.6666666665, ans=0.125
2023-10-09 11:39:36,572 INFO [train.py:1031] (1/4) Epoch 14, batch 3700, loss[loss=0.2345, simple_loss=0.3048, pruned_loss=0.06171, ctc_loss=0.102, over 16756.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2731, pruned_loss=0.06348, ctc_loss=0.1114, over 3314652.90 frames. ], batch size: 95, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:39:40,124 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2746016.0, ans=0.125
2023-10-09 11:39:42,836 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2746016.0, ans=0.015
2023-10-09 11:39:45,077 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2746016.0, ans=0.0
2023-10-09 11:39:55,522 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2746062.6666666665, ans=0.125
2023-10-09 11:39:59,370 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2746062.6666666665, ans=0.1
2023-10-09 11:40:40,094 INFO [train.py:1031] (1/4) Epoch 14, batch 3750, loss[loss=0.2313, simple_loss=0.2823, pruned_loss=0.06764, ctc_loss=0.1128, over 16717.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2799, pruned_loss=0.06631, ctc_loss=0.1163, over 3298846.62 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 1.0
2023-10-09 11:40:41,103 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+02 3.323e+02 3.693e+02 4.050e+02 7.078e+02, threshold=7.386e+02, percent-clipped=0.0
2023-10-09 11:40:45,199 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2746249.3333333335, ans=0.0
2023-10-09 11:40:45,452 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2023-10-09 11:40:52,471 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0
2023-10-09 11:41:25,629 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2746389.3333333335, ans=0.125
2023-10-09 11:41:42,466 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2746482.6666666665, ans=0.0
2023-10-09 11:41:43,228 INFO [train.py:1031] (1/4) Epoch 14, batch 3800, loss[loss=0.237, simple_loss=0.2821, pruned_loss=0.0696, ctc_loss=0.1315, over 16506.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2859, pruned_loss=0.06813, ctc_loss=0.1194, over 3292290.26 frames. ], batch size: 351, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:41:56,025 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2746529.3333333335, ans=0.125
2023-10-09 11:42:01,088 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0
2023-10-09 11:42:11,107 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.41 vs. limit=12.0
2023-10-09 11:42:13,402 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2746576.0, ans=0.125
2023-10-09 11:42:38,126 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2746669.3333333335, ans=0.125
2023-10-09 11:42:42,120 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2746669.3333333335, ans=0.2
2023-10-09 11:42:44,577 INFO [train.py:1031] (1/4) Epoch 14, batch 3850, loss[loss=0.2061, simple_loss=0.2604, pruned_loss=0.05677, ctc_loss=0.09572, over 16794.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2802, pruned_loss=0.06613, ctc_loss=0.1159, over 3287268.30 frames. ], batch size: 95, lr: 2.59e-03, grad_scale: 1.0
2023-10-09 11:42:47,251 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.380e+02 3.112e+02 3.545e+02 3.960e+02 7.617e+02, threshold=7.089e+02, percent-clipped=1.0
2023-10-09 11:43:28,571 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2746856.0, ans=0.2
2023-10-09 11:43:46,339 INFO [train.py:1031] (1/4) Epoch 14, batch 3900, loss[loss=0.2134, simple_loss=0.2811, pruned_loss=0.05371, ctc_loss=0.09544, over 16766.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2785, pruned_loss=0.06358, ctc_loss=0.1116, over 3293662.65 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:43:50,335 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2746949.3333333335, ans=0.125
2023-10-09 11:43:52,452 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2746949.3333333335, ans=0.07
2023-10-09 11:43:59,850 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2746996.0, ans=0.125
2023-10-09 11:44:24,629 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2747089.3333333335, ans=0.0
2023-10-09 11:44:34,769 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2747136.0, ans=0.125
2023-10-09 11:44:44,546 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2747136.0, ans=0.0
2023-10-09 11:44:45,536 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2747136.0, ans=0.125
2023-10-09 11:44:47,987 INFO [train.py:1031] (1/4) Epoch 14, batch 3950, loss[loss=0.2023, simple_loss=0.2618, pruned_loss=0.05312, ctc_loss=0.09126, over 16711.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.276, pruned_loss=0.06264, ctc_loss=0.11, over 3284418.78 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:44:50,713 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+02 3.084e+02 3.430e+02 4.061e+02 1.180e+03, threshold=6.860e+02, percent-clipped=1.0
2023-10-09 11:44:54,804 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:44:57,165 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.81 vs. limit=15.0
2023-10-09 11:45:00,683 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2747229.3333333335, ans=0.125
2023-10-09 11:45:10,486 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2747229.3333333335, ans=0.1
2023-10-09 11:45:29,724 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:45:32,702 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2747322.6666666665, ans=0.0
2023-10-09 11:45:45,360 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2747369.3333333335, ans=0.125
2023-10-09 11:45:50,016 INFO [train.py:1031] (1/4) Epoch 14, batch 4000, loss[loss=0.3023, simple_loss=0.3362, pruned_loss=0.09841, ctc_loss=0.1792, over 16713.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2773, pruned_loss=0.06361, ctc_loss=0.1114, over 3292211.41 frames. ], batch size: 353, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:45:54,743 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2747416.0, ans=0.125
2023-10-09 11:46:08,557 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0
2023-10-09 11:46:15,659 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2747509.3333333335, ans=0.2
2023-10-09 11:46:15,692 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2747509.3333333335, ans=0.2
2023-10-09 11:46:21,605 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2747509.3333333335, ans=0.125
2023-10-09 11:46:40,090 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.62 vs. limit=12.0
2023-10-09 11:46:42,367 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2747602.6666666665, ans=0.0
2023-10-09 11:46:46,769 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2747602.6666666665, ans=0.2
2023-10-09 11:46:51,637 INFO [train.py:1031] (1/4) Epoch 14, batch 4050, loss[loss=0.2154, simple_loss=0.2747, pruned_loss=0.05742, ctc_loss=0.103, over 16851.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2821, pruned_loss=0.06641, ctc_loss=0.1162, over 3294176.19 frames. ], batch size: 164, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:46:54,422 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.332e+02 3.986e+02 4.544e+02 6.934e+02, threshold=7.972e+02, percent-clipped=1.0
2023-10-09 11:47:26,934 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2747789.3333333335, ans=0.2
2023-10-09 11:47:28,425 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2747789.3333333335, ans=0.2
2023-10-09 11:47:39,637 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2747836.0, ans=0.0
2023-10-09 11:47:52,884 INFO [train.py:1031] (1/4) Epoch 14, batch 4100, loss[loss=0.2409, simple_loss=0.2847, pruned_loss=0.07269, ctc_loss=0.1293, over 17120.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2812, pruned_loss=0.06683, ctc_loss=0.117, over 3302422.32 frames. ], batch size: 83, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:47:56,449 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2747882.6666666665, ans=0.0
2023-10-09 11:48:00,791 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2747882.6666666665, ans=0.2
2023-10-09 11:48:08,039 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0
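Throughout this section the learning rate is pinned at 2.59e-03: this deep into training, a schedule that decays smoothly in both batch count and epoch changes too slowly to register at three significant figures between log lines. For intuition, one common form of such a dual decay is sketched below; the exact formula of this run's scheduler is not shown in the log, so treat the whole function as an assumption.

```python
def dual_decay_lr(base_lr, batch, epoch, lr_batches, lr_epochs):
    # Roughly flat while batch << lr_batches and epoch << lr_epochs,
    # then decays ~batch^-0.5 and ~epoch^-0.5 asymptotically.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```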
2023-10-09 11:48:08,654 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2747929.3333333335, ans=0.125
2023-10-09 11:48:14,635 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2747929.3333333335, ans=0.125
2023-10-09 11:48:25,532 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2747976.0, ans=0.1
2023-10-09 11:48:54,161 INFO [train.py:1031] (1/4) Epoch 14, batch 4150, loss[loss=0.2063, simple_loss=0.2752, pruned_loss=0.05118, ctc_loss=0.08781, over 16961.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.28, pruned_loss=0.06506, ctc_loss=0.1135, over 3309631.02 frames. ], batch size: 258, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:48:57,963 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+02 3.212e+02 3.640e+02 4.119e+02 7.384e+02, threshold=7.280e+02, percent-clipped=0.0
2023-10-09 11:48:58,415 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2748116.0, ans=0.125
2023-10-09 11:49:40,257 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=15.0
2023-10-09 11:49:51,886 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2748302.6666666665, ans=0.04949747468305833
2023-10-09 11:49:56,389 INFO [train.py:1031] (1/4) Epoch 14, batch 4200, loss[loss=0.1995, simple_loss=0.2512, pruned_loss=0.05451, ctc_loss=0.09688, over 16700.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2776, pruned_loss=0.06331, ctc_loss=0.1097, over 3308725.35 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:50:04,088 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:50:08,961 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2748396.0, ans=0.1
2023-10-09 11:50:20,293 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2748442.6666666665, ans=0.1
2023-10-09 11:50:21,432 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2748442.6666666665, ans=0.09899494936611666
2023-10-09 11:50:21,456 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2748442.6666666665, ans=0.0
2023-10-09 11:50:23,623 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2748442.6666666665, ans=0.2
2023-10-09 11:50:29,346 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2748442.6666666665, ans=0.1
2023-10-09 11:50:43,674 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2748536.0, ans=0.125
2023-10-09 11:50:56,838 INFO [train.py:1031] (1/4) Epoch 14, batch 4250, loss[loss=0.2187, simple_loss=0.2865, pruned_loss=0.05603, ctc_loss=0.09686, over 16947.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2788, pruned_loss=0.06468, ctc_loss=0.1118, over 3303700.17 frames. ], batch size: 90, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:51:03,482 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+02 3.251e+02 3.808e+02 4.615e+02 8.624e+02, threshold=7.616e+02, percent-clipped=2.0
2023-10-09 11:51:11,595 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2748629.3333333335, ans=0.0
2023-10-09 11:51:11,621 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2748629.3333333335, ans=0.125
2023-10-09 11:51:20,815 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2748629.3333333335, ans=0.0
2023-10-09 11:51:58,269 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2748816.0, ans=0.1
2023-10-09 11:51:58,940 INFO [train.py:1031] (1/4) Epoch 14, batch 4300, loss[loss=0.2977, simple_loss=0.3791, pruned_loss=0.08152, ctc_loss=0.133, over 16275.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2857, pruned_loss=0.06568, ctc_loss=0.1142, over 3290397.34 frames. ], batch size: 463, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:52:09,257 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.62 vs. limit=8.0
2023-10-09 11:52:11,950 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2748862.6666666665, ans=0.05
2023-10-09 11:52:21,312 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=22.5
2023-10-09 11:52:23,301 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2748862.6666666665, ans=0.1
2023-10-09 11:52:50,981 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=12.0
2023-10-09 11:53:02,299 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2749002.6666666665, ans=0.2
2023-10-09 11:53:04,754 INFO [train.py:1031] (1/4) Epoch 14, batch 4350, loss[loss=0.2201, simple_loss=0.2652, pruned_loss=0.06575, ctc_loss=0.1087, over 16731.00 frames. ], tot_loss[loss=0.2397, simple_loss=0.2934, pruned_loss=0.06895, ctc_loss=0.12, over 3294068.32 frames. ], batch size: 140, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:53:08,400 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2749049.3333333335, ans=0.2
2023-10-09 11:53:11,365 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+02 3.477e+02 4.126e+02 5.175e+02 8.890e+02, threshold=8.251e+02, percent-clipped=2.0
2023-10-09 11:54:06,537 INFO [train.py:1031] (1/4) Epoch 14, batch 4400, loss[loss=0.2405, simple_loss=0.2988, pruned_loss=0.06687, ctc_loss=0.121, over 16845.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.2909, pruned_loss=0.06815, ctc_loss=0.1178, over 3289455.43 frames. ], batch size: 329, lr: 2.59e-03, grad_scale: 4.0
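Each loss line pairs a per-batch `loss[...]` with a `tot_loss[...]` over roughly 3.3M frames: the totals are frame-weighted running aggregates of the same statistics, which is why they move much more smoothly than the per-batch numbers. A minimal tracker in that spirit; this is a sketch, not necessarily how train.py aggregates them:

```python
class LossTracker:
    """Frame-weighted running averages of named loss statistics."""
    def __init__(self):
        self.sums: dict = {}
        self.frames = 0.0

    def update(self, stats: dict, num_frames: float) -> None:
        for name, value in stats.items():
            self.sums[name] = self.sums.get(name, 0.0) + value * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        return {name: s / self.frames for name, s in self.sums.items()}

tracker = LossTracker()
tracker.update({"loss": 0.2405, "ctc_loss": 0.121}, num_frames=16845.0)
print(tracker.averages())  # tot_loss-style frame-weighted means
```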
], batch size: 329, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:54:22,259 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2749329.3333333335, ans=0.2 2023-10-09 11:54:34,632 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2749376.0, ans=0.125 2023-10-09 11:54:37,736 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.81 vs. limit=10.0 2023-10-09 11:55:03,006 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.66 vs. limit=10.0 2023-10-09 11:55:03,045 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=12.0 2023-10-09 11:55:08,713 INFO [train.py:1031] (1/4) Epoch 14, batch 4450, loss[loss=0.2295, simple_loss=0.2826, pruned_loss=0.06549, ctc_loss=0.1136, over 16960.00 frames. ], tot_loss[loss=0.232, simple_loss=0.286, pruned_loss=0.06619, ctc_loss=0.1138, over 3288978.21 frames. ], batch size: 309, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:55:11,406 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.01 vs. limit=15.0 2023-10-09 11:55:16,088 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.485e+02 3.136e+02 3.606e+02 4.303e+02 6.153e+02, threshold=7.211e+02, percent-clipped=0.0 2023-10-09 11:55:22,396 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2749562.6666666665, ans=0.125 2023-10-09 11:55:31,593 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2749562.6666666665, ans=0.125 2023-10-09 11:55:31,673 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2749562.6666666665, ans=0.1 2023-10-09 11:56:02,703 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:56:10,349 INFO [train.py:1031] (1/4) Epoch 14, batch 4500, loss[loss=0.2529, simple_loss=0.3013, pruned_loss=0.07754, ctc_loss=0.1236, over 16965.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2834, pruned_loss=0.06634, ctc_loss=0.114, over 3303152.07 frames. ], batch size: 86, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:56:24,670 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2749796.0, ans=0.1 2023-10-09 11:56:59,456 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2749936.0, ans=0.0 2023-10-09 11:57:12,367 INFO [train.py:1031] (1/4) Epoch 14, batch 4550, loss[loss=0.1941, simple_loss=0.2719, pruned_loss=0.04301, ctc_loss=0.07581, over 16728.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2855, pruned_loss=0.06509, ctc_loss=0.1123, over 3303693.10 frames. 
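The bulk of the log consists of `scaling.py:199` `ScheduledFloat` records: regularization knobs such as `dropout_p`, the various `skip_rate`s, balancer probabilities, and `scale_min`s are not fixed constants but functions of the global `batch_count`; by batch_count ≈ 2.75M they have all settled at their final values (`ans=0.1`, `ans=0.125`, and so on). A sketch of such a schedule, assuming piecewise-linear interpolation between breakpoints (only the batch-count dependence is taken from the log; the interpolation shape is an assumption):

```python
import bisect

class ScheduledFloat:
    """A hyperparameter that is a function of the global batch count, in the
    spirit of the 'ScheduledFloat: name=..., batch_count=..., ans=...'
    records. Piecewise-linear interpolation between breakpoints is assumed."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs in increasing batch_count order.
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count):
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout probability that decays early in training, then holds:
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(2748442.6666666665))  # -> 0.1, as in the dropout_p records
```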
], batch size: 151, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:57:14,393 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2749982.6666666665, ans=0.0 2023-10-09 11:57:20,846 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+02 3.133e+02 3.571e+02 4.090e+02 7.081e+02, threshold=7.142e+02, percent-clipped=0.0 2023-10-09 11:57:28,644 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=22.5 2023-10-09 11:58:00,259 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2750122.6666666665, ans=0.125 2023-10-09 11:58:09,546 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2750169.3333333335, ans=0.035 2023-10-09 11:58:15,116 INFO [train.py:1031] (1/4) Epoch 14, batch 4600, loss[loss=0.2478, simple_loss=0.3006, pruned_loss=0.07193, ctc_loss=0.1277, over 16826.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.2899, pruned_loss=0.06571, ctc_loss=0.1138, over 3306519.25 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 11:58:15,445 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2750216.0, ans=0.0 2023-10-09 11:59:09,145 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2750402.6666666665, ans=0.0 2023-10-09 11:59:12,047 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2750402.6666666665, ans=0.125 2023-10-09 11:59:18,198 INFO [train.py:1031] (1/4) Epoch 14, batch 4650, loss[loss=0.2312, simple_loss=0.2888, pruned_loss=0.06519, ctc_loss=0.1081, over 16793.00 frames. ], tot_loss[loss=0.2368, simple_loss=0.2933, pruned_loss=0.0669, ctc_loss=0.1162, over 3310419.45 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:59:18,876 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5 2023-10-09 11:59:28,967 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.652e+02 3.249e+02 3.763e+02 4.381e+02 6.611e+02, threshold=7.526e+02, percent-clipped=0.0 2023-10-09 12:00:00,290 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2750589.3333333335, ans=0.1 2023-10-09 12:00:07,905 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2750636.0, ans=0.1 2023-10-09 12:00:18,085 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2750636.0, ans=0.1 2023-10-09 12:00:19,912 INFO [train.py:1031] (1/4) Epoch 14, batch 4700, loss[loss=0.2115, simple_loss=0.2765, pruned_loss=0.05371, ctc_loss=0.09753, over 16995.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2897, pruned_loss=0.06276, ctc_loss=0.1101, over 3303531.14 frames. 
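The `optim.py:471` records list five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms. In every record here the `threshold` equals `Clipping_scale` times the median (e.g. 2.0 × 3.571e+02 = 7.142e+02 just above), and `percent-clipped` is the share of recent batches whose norm exceeded it. A sketch of that bookkeeping, assuming plain rescaling when the threshold is exceeded (the real optimizer may integrate clipping differently):

```python
import collections

import numpy as np
import torch

class GradNormMonitor:
    """Track recent gradient norms and clip at Clipping_scale times their
    median, reporting the same statistics as the optim.py records.
    Illustrative sketch only."""

    def __init__(self, window=128, clipping_scale=2.0):
        self.norms = collections.deque(maxlen=window)
        self.clipping_scale = clipping_scale
        self.seen = 0
        self.clipped = 0

    def clip_(self, params):
        params = [p for p in params if p.grad is not None]
        # max_norm=inf measures the total norm without touching gradients.
        norm = torch.nn.utils.clip_grad_norm_(params, float("inf")).item()
        self.norms.append(norm)
        q = np.quantile(np.array(self.norms), [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = self.clipping_scale * q[2]  # Clipping_scale x median
        self.seen += 1
        if norm > threshold:
            self.clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)  # rescale, as clipping would
        print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q)
              + f", threshold={threshold:.3e}, "
              f"percent-clipped={100.0 * self.clipped / self.seen:.1f}")

net = torch.nn.Linear(10, 10)
net(torch.randn(4, 10)).sum().backward()
GradNormMonitor().clip_(list(net.parameters()))
```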
], batch size: 258, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:00:20,286 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2750682.6666666665, ans=0.2 2023-10-09 12:00:51,494 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=22.5 2023-10-09 12:01:11,695 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2750869.3333333335, ans=0.125 2023-10-09 12:01:14,531 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2750869.3333333335, ans=0.04949747468305833 2023-10-09 12:01:16,138 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2750869.3333333335, ans=0.125 2023-10-09 12:01:22,367 INFO [train.py:1031] (1/4) Epoch 14, batch 4750, loss[loss=0.2178, simple_loss=0.288, pruned_loss=0.05474, ctc_loss=0.09523, over 16904.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2903, pruned_loss=0.06292, ctc_loss=0.1106, over 3305310.20 frames. ], batch size: 202, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 12:01:32,361 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2750916.0, ans=0.125 2023-10-09 12:01:34,527 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2750962.6666666665, ans=0.125 2023-10-09 12:01:35,189 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.335e+02 3.139e+02 3.749e+02 4.382e+02 2.421e+03, threshold=7.497e+02, percent-clipped=2.0 2023-10-09 12:01:39,622 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=22.5 2023-10-09 12:01:52,450 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2751009.3333333335, ans=0.0 2023-10-09 12:01:58,679 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.47 vs. limit=22.5 2023-10-09 12:02:15,712 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2751102.6666666665, ans=0.09899494936611666 2023-10-09 12:02:24,698 INFO [train.py:1031] (1/4) Epoch 14, batch 4800, loss[loss=0.2061, simple_loss=0.2757, pruned_loss=0.05084, ctc_loss=0.08711, over 16843.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2911, pruned_loss=0.0625, ctc_loss=0.1098, over 3294169.96 frames. 
], batch size: 176, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:02:34,531 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2751149.3333333335, ans=0.0 2023-10-09 12:02:43,126 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2751196.0, ans=0.125 2023-10-09 12:03:00,971 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2751242.6666666665, ans=0.2 2023-10-09 12:03:06,500 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2751289.3333333335, ans=0.125 2023-10-09 12:03:10,816 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2751289.3333333335, ans=0.125 2023-10-09 12:03:14,093 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2751289.3333333335, ans=0.125 2023-10-09 12:03:23,802 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2751336.0, ans=0.015 2023-10-09 12:03:28,561 INFO [train.py:1031] (1/4) Epoch 14, batch 4850, loss[loss=0.2418, simple_loss=0.2885, pruned_loss=0.07017, ctc_loss=0.1372, over 15165.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2957, pruned_loss=0.06633, ctc_loss=0.1161, over 3299776.08 frames. ], batch size: 527, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:03:33,989 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2751382.6666666665, ans=0.2 2023-10-09 12:03:34,937 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2751382.6666666665, ans=0.125 2023-10-09 12:03:42,639 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.280e+02 3.688e+02 4.479e+02 9.310e+02, threshold=7.375e+02, percent-clipped=2.0 2023-10-09 12:03:54,919 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2751476.0, ans=0.125 2023-10-09 12:04:07,679 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2751522.6666666665, ans=0.0 2023-10-09 12:04:11,421 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2751522.6666666665, ans=0.1 2023-10-09 12:04:13,165 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2751522.6666666665, ans=0.0 2023-10-09 12:04:18,491 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-10-09 12:04:31,387 INFO [train.py:1031] (1/4) Epoch 14, batch 4900, loss[loss=0.2123, simple_loss=0.2755, pruned_loss=0.05549, ctc_loss=0.09516, over 16805.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.2958, pruned_loss=0.06546, ctc_loss=0.1148, over 3302714.49 frames. ], batch size: 176, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:04:39,511 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.11 vs. 
limit=10.0 2023-10-09 12:05:00,087 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2751709.3333333335, ans=0.1 2023-10-09 12:05:20,500 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2751756.0, ans=0.1 2023-10-09 12:05:21,600 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2751756.0, ans=0.2 2023-10-09 12:05:36,550 INFO [train.py:1031] (1/4) Epoch 14, batch 4950, loss[loss=0.278, simple_loss=0.3267, pruned_loss=0.0859, ctc_loss=0.1434, over 16737.00 frames. ], tot_loss[loss=0.239, simple_loss=0.2975, pruned_loss=0.06684, ctc_loss=0.1169, over 3305075.28 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:05:38,272 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.86 vs. limit=22.5 2023-10-09 12:05:44,830 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2751849.3333333335, ans=0.125 2023-10-09 12:05:49,176 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2751896.0, ans=0.015 2023-10-09 12:05:51,607 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.251e+02 3.637e+02 4.222e+02 8.685e+02, threshold=7.275e+02, percent-clipped=2.0 2023-10-09 12:05:52,621 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2751896.0, ans=0.05 2023-10-09 12:06:29,510 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2752036.0, ans=0.0 2023-10-09 12:06:34,569 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2752036.0, ans=0.2 2023-10-09 12:06:39,356 INFO [train.py:1031] (1/4) Epoch 14, batch 5000, loss[loss=0.2197, simple_loss=0.238, pruned_loss=0.07344, ctc_loss=0.1363, over 15373.00 frames. ], tot_loss[loss=0.2379, simple_loss=0.2926, pruned_loss=0.06785, ctc_loss=0.1187, over 3303770.02 frames. ], batch size: 527, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:06:46,004 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2752082.6666666665, ans=0.0 2023-10-09 12:06:49,319 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.22 vs. limit=15.0 2023-10-09 12:07:14,743 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2752222.6666666665, ans=0.0 2023-10-09 12:07:15,916 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2752222.6666666665, ans=0.125 2023-10-09 12:07:24,743 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2752222.6666666665, ans=0.04949747468305833 2023-10-09 12:07:40,190 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2752316.0, ans=0.1 2023-10-09 12:07:41,576 INFO [train.py:1031] (1/4) Epoch 14, batch 5050, loss[loss=0.2425, simple_loss=0.3223, pruned_loss=0.05868, ctc_loss=0.1136, over 16841.00 frames. 
], tot_loss[loss=0.2331, simple_loss=0.2871, pruned_loss=0.06636, ctc_loss=0.1162, over 3306278.17 frames. ], batch size: 329, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:07:55,442 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2023-10-09 12:07:56,489 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.463e+02 3.308e+02 3.761e+02 4.513e+02 1.207e+03, threshold=7.522e+02, percent-clipped=1.0 2023-10-09 12:07:59,818 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2752362.6666666665, ans=0.95 2023-10-09 12:07:59,891 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2752362.6666666665, ans=0.1 2023-10-09 12:08:08,917 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2752409.3333333335, ans=0.2 2023-10-09 12:08:12,718 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2752409.3333333335, ans=0.1 2023-10-09 12:08:26,152 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2752456.0, ans=0.125 2023-10-09 12:08:36,162 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2752502.6666666665, ans=0.2 2023-10-09 12:08:42,501 INFO [train.py:1031] (1/4) Epoch 14, batch 5100, loss[loss=0.2532, simple_loss=0.3106, pruned_loss=0.07397, ctc_loss=0.1198, over 16544.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2922, pruned_loss=0.0657, ctc_loss=0.1154, over 3307212.92 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:08:55,698 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2752596.0, ans=0.0 2023-10-09 12:09:12,840 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2752642.6666666665, ans=0.2 2023-10-09 12:09:23,881 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:09:24,239 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2023-10-09 12:09:32,807 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2752736.0, ans=0.015 2023-10-09 12:09:38,400 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:09:43,378 INFO [train.py:1031] (1/4) Epoch 14, batch 5150, loss[loss=0.2374, simple_loss=0.2836, pruned_loss=0.07047, ctc_loss=0.1256, over 16807.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2936, pruned_loss=0.06805, ctc_loss=0.1191, over 3300269.55 frames. 
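The `scaling.py:1069` `WithLoss` records attach a named auxiliary penalty to a tensor flowing through the network, here the per-layer `self_attn_weights`; the `loss-sum=0.000e+00` values show the penalty is currently contributing nothing. A sketch of the wrapper pattern; the particular penalty below (mass outside a fixed band) is invented for illustration and is not what the training code uses:

```python
import torch

class WithLoss(torch.nn.Module):
    """Pass a tensor through unchanged while accumulating a named auxiliary
    penalty on it, as in the 'WithLoss: name=..., loss-sum=...' records.
    The penalty (mass outside [-limit, limit]) is a stand-in; it stays at
    0.0 while values remain inside the band, matching the records above."""

    def __init__(self, name, limit=50.0):
        super().__init__()
        self.name = name
        self.limit = limit
        self.loss_sum = 0.0

    def forward(self, x):
        if self.training:
            excess = (x.abs() - self.limit).clamp(min=0.0)
            # In a real setup the penalty would be added to the training
            # objective; here we only accumulate it for logging.
            self.loss_sum += excess.mean().item()
        return x

aux = WithLoss("encoder.encoders.2.encoder.layers.0.self_attn_weights")
aux.train()
aux(torch.randn(4, 8, 16))
print(f"WithLoss: name={aux.name}, loss-sum={aux.loss_sum:.3e}")
```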
], batch size: 176, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:09:44,744 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2752782.6666666665, ans=0.125 2023-10-09 12:09:52,040 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2752782.6666666665, ans=0.5 2023-10-09 12:09:59,722 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2752829.3333333335, ans=0.1 2023-10-09 12:10:00,363 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.450e+02 3.267e+02 3.761e+02 4.649e+02 7.424e+02, threshold=7.522e+02, percent-clipped=0.0 2023-10-09 12:10:45,574 INFO [train.py:1031] (1/4) Epoch 14, batch 5200, loss[loss=0.2108, simple_loss=0.272, pruned_loss=0.0564, ctc_loss=0.09192, over 16894.00 frames. ], tot_loss[loss=0.2371, simple_loss=0.2925, pruned_loss=0.06735, ctc_loss=0.1177, over 3307715.35 frames. ], batch size: 82, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:10:56,229 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2753016.0, ans=0.0 2023-10-09 12:11:08,092 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2753062.6666666665, ans=0.1 2023-10-09 12:11:08,162 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2753062.6666666665, ans=0.125 2023-10-09 12:11:12,306 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2753109.3333333335, ans=0.0 2023-10-09 12:11:13,157 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2753109.3333333335, ans=0.125 2023-10-09 12:11:22,563 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2753156.0, ans=0.125 2023-10-09 12:11:23,613 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2753156.0, ans=0.0 2023-10-09 12:11:26,183 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-10-09 12:11:28,905 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2753156.0, ans=0.125 2023-10-09 12:11:33,032 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.82 vs. limit=15.0 2023-10-09 12:11:47,487 INFO [train.py:1031] (1/4) Epoch 14, batch 5250, loss[loss=0.2083, simple_loss=0.2353, pruned_loss=0.06667, ctc_loss=0.1197, over 16088.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2848, pruned_loss=0.06584, ctc_loss=0.1153, over 3306748.67 frames. 
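The `scaling.py:979` `Whitening` records compare a per-module statistic against a limit; it usually sits under the limit but occasionally overshoots, as in `metric=15.82 vs. limit=15.0` above. The metric measures how far the channel covariance of a layer's output is from being white, i.e. how lopsided its eigenvalue spectrum is. A hedged sketch of one such statistic, the second moment of the spectrum over its squared mean (the exact definition the module uses is an assumption):

```python
import torch

def whitening_metric(x, num_groups=1):
    """How far the channel covariance of an activation (frames, channels)
    is from white: 1.0 when all covariance eigenvalues are equal, larger
    as the spectrum gets more lopsided. An assumed definition matching the
    spirit of the 'Whitening: ... metric=... vs. limit=...' records."""
    worst = 0.0
    for g in x.chunk(num_groups, dim=-1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / g.shape[0]
        eigs = torch.linalg.eigvalsh(cov).clamp(min=1e-20)
        # second moment of the spectrum over its squared mean (>= 1.0)
        worst = max(worst, (eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return worst

x = torch.randn(1000, 256)                          # near-white activations
print(whitening_metric(x))                          # slightly above 1.0
print(whitening_metric(x @ torch.randn(256, 256)))  # markedly larger
```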
], batch size: 466, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:11:50,058 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2753249.3333333335, ans=0.2 2023-10-09 12:11:50,129 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2753249.3333333335, ans=0.125 2023-10-09 12:12:05,608 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+02 2.914e+02 3.261e+02 3.772e+02 6.960e+02, threshold=6.522e+02, percent-clipped=0.0 2023-10-09 12:12:07,835 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2753296.0, ans=0.125 2023-10-09 12:12:18,855 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2753342.6666666665, ans=0.2 2023-10-09 12:12:34,705 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2753389.3333333335, ans=0.125 2023-10-09 12:12:49,309 INFO [train.py:1031] (1/4) Epoch 14, batch 5300, loss[loss=0.4381, simple_loss=0.4558, pruned_loss=0.1552, ctc_loss=0.2754, over 16654.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2882, pruned_loss=0.06774, ctc_loss=0.118, over 3313120.15 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:12:51,424 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2753482.6666666665, ans=0.125 2023-10-09 12:12:56,363 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2753482.6666666665, ans=0.125 2023-10-09 12:12:56,403 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2753482.6666666665, ans=0.1 2023-10-09 12:13:03,431 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2753529.3333333335, ans=0.0 2023-10-09 12:13:04,527 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2753529.3333333335, ans=0.125 2023-10-09 12:13:16,176 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2753576.0, ans=0.125 2023-10-09 12:13:24,415 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2753576.0, ans=0.1 2023-10-09 12:13:48,944 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2753669.3333333335, ans=0.07 2023-10-09 12:13:51,898 INFO [train.py:1031] (1/4) Epoch 14, batch 5350, loss[loss=0.2812, simple_loss=0.3446, pruned_loss=0.08099, ctc_loss=0.1398, over 15128.00 frames. ], tot_loss[loss=0.2433, simple_loss=0.297, pruned_loss=0.07035, ctc_loss=0.1222, over 3301266.36 frames. 
], batch size: 527, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:14:12,278 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.795e+02 3.646e+02 4.307e+02 5.553e+02 1.031e+03, threshold=8.614e+02, percent-clipped=13.0 2023-10-09 12:14:12,558 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2753762.6666666665, ans=0.125 2023-10-09 12:14:24,683 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2753809.3333333335, ans=0.2 2023-10-09 12:14:25,666 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2753809.3333333335, ans=0.125 2023-10-09 12:14:54,881 INFO [train.py:1031] (1/4) Epoch 14, batch 5400, loss[loss=0.2049, simple_loss=0.2656, pruned_loss=0.05291, ctc_loss=0.09579, over 16810.00 frames. ], tot_loss[loss=0.2452, simple_loss=0.2986, pruned_loss=0.0711, ctc_loss=0.1238, over 3300122.56 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:14:57,422 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2753949.3333333335, ans=0.125 2023-10-09 12:14:58,424 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2753949.3333333335, ans=0.125 2023-10-09 12:15:17,899 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2754042.6666666665, ans=0.125 2023-10-09 12:15:19,482 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:15:25,405 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2754042.6666666665, ans=0.05 2023-10-09 12:15:41,841 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2754089.3333333335, ans=0.0 2023-10-09 12:15:46,543 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2023-10-09 12:15:55,501 INFO [train.py:1031] (1/4) Epoch 14, batch 5450, loss[loss=0.2229, simple_loss=0.2649, pruned_loss=0.06653, ctc_loss=0.1192, over 16806.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.2904, pruned_loss=0.06936, ctc_loss=0.1211, over 3303165.50 frames. 
], batch size: 292, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:15:55,880 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2754182.6666666665, ans=0.04949747468305833 2023-10-09 12:15:56,859 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2754182.6666666665, ans=0.0 2023-10-09 12:16:01,734 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2754182.6666666665, ans=0.125 2023-10-09 12:16:07,992 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2754229.3333333335, ans=0.0 2023-10-09 12:16:16,183 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+02 3.013e+02 3.420e+02 3.920e+02 8.304e+02, threshold=6.840e+02, percent-clipped=0.0 2023-10-09 12:16:32,907 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2754322.6666666665, ans=0.0 2023-10-09 12:16:39,615 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2754322.6666666665, ans=0.125 2023-10-09 12:16:49,377 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:16:57,654 INFO [train.py:1031] (1/4) Epoch 14, batch 5500, loss[loss=0.2284, simple_loss=0.2772, pruned_loss=0.06696, ctc_loss=0.1142, over 16207.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2871, pruned_loss=0.06963, ctc_loss=0.1211, over 3298216.10 frames. ], batch size: 463, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:17:07,265 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2754416.0, ans=0.125 2023-10-09 12:17:08,403 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2754416.0, ans=0.125 2023-10-09 12:17:34,514 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2754556.0, ans=0.125 2023-10-09 12:17:51,801 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2754602.6666666665, ans=0.2 2023-10-09 12:17:52,198 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2023-10-09 12:17:54,077 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2754602.6666666665, ans=0.0 2023-10-09 12:17:58,541 INFO [train.py:1031] (1/4) Epoch 14, batch 5550, loss[loss=0.246, simple_loss=0.3222, pruned_loss=0.06082, ctc_loss=0.1203, over 16808.00 frames. ], tot_loss[loss=0.2394, simple_loss=0.2895, pruned_loss=0.07021, ctc_loss=0.1225, over 3286877.14 frames. 
], batch size: 328, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:17:58,860 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2754649.3333333335, ans=0.125 2023-10-09 12:17:58,919 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2754649.3333333335, ans=0.0 2023-10-09 12:18:05,320 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2754649.3333333335, ans=0.125 2023-10-09 12:18:12,256 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0 2023-10-09 12:18:14,314 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2754696.0, ans=0.125 2023-10-09 12:18:18,625 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.568e+02 3.038e+02 3.521e+02 4.365e+02 6.662e+02, threshold=7.043e+02, percent-clipped=0.0 2023-10-09 12:18:21,224 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2754696.0, ans=0.2 2023-10-09 12:18:27,972 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2754742.6666666665, ans=0.125 2023-10-09 12:18:31,172 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2754742.6666666665, ans=0.0 2023-10-09 12:18:54,227 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:18:55,590 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.95 vs. limit=15.0 2023-10-09 12:18:59,757 INFO [train.py:1031] (1/4) Epoch 14, batch 5600, loss[loss=0.232, simple_loss=0.3013, pruned_loss=0.06176, ctc_loss=0.09782, over 12697.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.2869, pruned_loss=0.06746, ctc_loss=0.118, over 3289497.39 frames. ], batch size: 35, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 12:19:05,746 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2754882.6666666665, ans=0.125 2023-10-09 12:19:08,437 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2754882.6666666665, ans=0.1 2023-10-09 12:19:27,628 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2754976.0, ans=0.1 2023-10-09 12:19:32,636 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2754976.0, ans=0.1 2023-10-09 12:19:47,242 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2755069.3333333335, ans=0.125 2023-10-09 12:19:58,558 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2755069.3333333335, ans=0.0 2023-10-09 12:20:00,858 INFO [train.py:1031] (1/4) Epoch 14, batch 5650, loss[loss=0.2585, simple_loss=0.2914, pruned_loss=0.08247, ctc_loss=0.1517, over 16529.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2861, pruned_loss=0.06714, ctc_loss=0.1176, over 3303470.05 frames. 
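The `grad_scale` field in the batch records drifts among 1.0, 2.0, 4.0, and 8.0 (it doubles to 8.0 at batch 5600 above and is back to 4.0 by batch 5650 below): the signature of dynamic loss scaling for mixed-precision training, where the scale is cut when a step produces inf/nan gradients and grown back after a run of clean steps. A sketch using PyTorch's stock scaler, with a stand-in model and data; the training script's actual integration will differ:

```python
import torch

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = torch.nn.Linear(80, 2000).to(device)
opt = torch.optim.Adam(model.parameters(), lr=2.59e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

for step in range(50):
    x = torch.randn(16, 80, device=device)
    with torch.cuda.amp.autocast(enabled=use_cuda):
        loss = model(x).pow(2).mean()
    opt.zero_grad()
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(opt)               # skips the update if grads overflowed
    scaler.update()                # halve on overflow, grow after clean runs
    if step % 10 == 0:
        print(f"batch {step}, grad_scale: {scaler.get_scale()}")
```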
], batch size: 415, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:20:11,197 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2755116.0, ans=0.1 2023-10-09 12:20:22,587 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+02 3.073e+02 3.464e+02 4.035e+02 6.010e+02, threshold=6.928e+02, percent-clipped=0.0 2023-10-09 12:20:25,613 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2755209.3333333335, ans=0.0 2023-10-09 12:20:31,274 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2755209.3333333335, ans=0.125 2023-10-09 12:20:33,346 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2755209.3333333335, ans=0.125 2023-10-09 12:20:37,140 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2755256.0, ans=0.125 2023-10-09 12:20:46,209 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.57 vs. limit=8.0 2023-10-09 12:20:52,244 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.42 vs. limit=10.0 2023-10-09 12:21:01,264 INFO [train.py:1031] (1/4) Epoch 14, batch 5700, loss[loss=0.1815, simple_loss=0.2477, pruned_loss=0.04207, ctc_loss=0.07807, over 16833.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.2837, pruned_loss=0.06636, ctc_loss=0.1163, over 3316227.16 frames. ], batch size: 176, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:21:01,690 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2755349.3333333335, ans=0.125 2023-10-09 12:21:06,069 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2755349.3333333335, ans=0.0 2023-10-09 12:21:46,690 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:21:57,583 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2755536.0, ans=0.125 2023-10-09 12:22:04,413 INFO [train.py:1031] (1/4) Epoch 14, batch 5750, loss[loss=0.2145, simple_loss=0.2866, pruned_loss=0.05185, ctc_loss=0.09691, over 15282.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2795, pruned_loss=0.06293, ctc_loss=0.1108, over 3307154.24 frames. ], batch size: 529, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:22:23,912 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2755629.3333333335, ans=0.125 2023-10-09 12:22:28,327 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+02 3.061e+02 3.546e+02 4.294e+02 7.342e+02, threshold=7.092e+02, percent-clipped=2.0 2023-10-09 12:22:29,936 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=22.5 2023-10-09 12:22:55,868 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. 
limit=15.0 2023-10-09 12:23:05,576 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:23:07,365 INFO [train.py:1031] (1/4) Epoch 14, batch 5800, loss[loss=0.2359, simple_loss=0.2895, pruned_loss=0.06641, ctc_loss=0.1237, over 16929.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2826, pruned_loss=0.06446, ctc_loss=0.1133, over 3293371.24 frames. ], batch size: 229, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:23:08,702 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2755816.0, ans=0.0 2023-10-09 12:23:12,955 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2755816.0, ans=0.0 2023-10-09 12:23:24,867 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2755862.6666666665, ans=0.125 2023-10-09 12:23:33,091 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2755909.3333333335, ans=0.1 2023-10-09 12:23:39,731 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=22.5 2023-10-09 12:24:06,441 INFO [train.py:1031] (1/4) Epoch 14, batch 5850, loss[loss=0.2134, simple_loss=0.2491, pruned_loss=0.0667, ctc_loss=0.1109, over 16718.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2786, pruned_loss=0.06444, ctc_loss=0.1128, over 3295352.11 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:24:09,461 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2756049.3333333335, ans=0.035 2023-10-09 12:24:31,816 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 3.138e+02 3.574e+02 4.171e+02 9.183e+02, threshold=7.147e+02, percent-clipped=2.0 2023-10-09 12:24:37,108 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2756142.6666666665, ans=0.125 2023-10-09 12:24:55,066 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2756236.0, ans=0.0 2023-10-09 12:25:03,013 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=2756236.0, ans=0.1 2023-10-09 12:25:05,811 INFO [train.py:1031] (1/4) Epoch 14, batch 5900, loss[loss=0.2331, simple_loss=0.2773, pruned_loss=0.0695, ctc_loss=0.1247, over 17021.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2741, pruned_loss=0.06412, ctc_loss=0.1126, over 3301268.51 frames. ], batch size: 259, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:25:14,863 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2756282.6666666665, ans=0.04949747468305833 2023-10-09 12:25:38,426 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.90 vs. 
limit=15.0 2023-10-09 12:25:38,967 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2756376.0, ans=0.2 2023-10-09 12:25:52,446 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2756422.6666666665, ans=0.125 2023-10-09 12:25:58,496 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2756469.3333333335, ans=0.0 2023-10-09 12:26:06,819 INFO [train.py:1031] (1/4) Epoch 14, batch 5950, loss[loss=0.3164, simple_loss=0.3559, pruned_loss=0.1021, ctc_loss=0.1816, over 16753.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2749, pruned_loss=0.06534, ctc_loss=0.1143, over 3308107.81 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:26:08,533 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2023-10-09 12:26:09,396 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2023-10-09 12:26:26,552 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2756562.6666666665, ans=0.0 2023-10-09 12:26:30,005 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=2756609.3333333335, ans=0.02 2023-10-09 12:26:31,792 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+02 3.192e+02 3.462e+02 4.093e+02 6.652e+02, threshold=6.925e+02, percent-clipped=0.0 2023-10-09 12:26:46,098 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2756656.0, ans=0.125 2023-10-09 12:27:03,893 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:27:06,817 INFO [train.py:1031] (1/4) Epoch 14, batch 6000, loss[loss=0.1998, simple_loss=0.2716, pruned_loss=0.0475, ctc_loss=0.08253, over 16765.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2794, pruned_loss=0.06406, ctc_loss=0.1124, over 3303362.05 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 12:27:06,818 INFO [train.py:1054] (1/4) Computing validation loss 2023-10-09 12:27:22,144 INFO [zipformer.py:1853] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2203, 3.2393, 2.2582, 2.3286], device='cuda:1') 2023-10-09 12:27:23,522 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2297, simple_loss=0.3012, pruned_loss=0.0607, ctc_loss=0.09172, over 1796401.00 frames. 2023-10-09 12:27:23,523 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14349MB 2023-10-09 12:27:58,617 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2756889.3333333335, ans=0.0 2023-10-09 12:28:00,272 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2756889.3333333335, ans=0.125 2023-10-09 12:28:23,965 INFO [train.py:1031] (1/4) Epoch 14, batch 6050, loss[loss=0.2293, simple_loss=0.2744, pruned_loss=0.06953, ctc_loss=0.1128, over 15953.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2748, pruned_loss=0.06096, ctc_loss=0.1073, over 3298744.67 frames. 
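At batch 6000 the script pauses for a validation pass (`train.py:1054`–`1064` above): attention-entropy diagnostics are dumped, the loss is averaged over the full validation set (about 1.8M frames), and peak GPU memory is reported. A sketch of that pattern; `model.compute_loss` is a hypothetical stand-in for the real criterion:

```python
import torch

def validate(model, valid_loader, device="cuda:1"):
    """Periodic validation in the style of the batch-6000 records: switch
    to eval mode, compute a frame-weighted average loss over the whole
    validation set with gradients disabled, then report peak device memory.
    Sketch only; compute_loss is a hypothetical API."""
    model.eval()
    tot, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model.compute_loss(batch)  # hypothetical API
            tot += loss.item() * num_frames
            frames += num_frames
    model.train()
    print(f"validation: loss={tot / frames:.4f}, over {frames:.2f} frames.")
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")
```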
], batch size: 70, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:28:25,352 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2756982.6666666665, ans=0.125 2023-10-09 12:28:30,734 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2756982.6666666665, ans=0.125 2023-10-09 12:28:30,775 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2756982.6666666665, ans=0.1 2023-10-09 12:28:39,480 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2757029.3333333335, ans=0.125 2023-10-09 12:28:48,915 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2757076.0, ans=0.125 2023-10-09 12:28:52,470 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.832e+02 3.391e+02 4.153e+02 6.756e+02, threshold=6.782e+02, percent-clipped=0.0 2023-10-09 12:28:57,469 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2757076.0, ans=0.025 2023-10-09 12:28:57,474 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2757076.0, ans=0.0 2023-10-09 12:29:19,669 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2757169.3333333335, ans=0.125 2023-10-09 12:29:24,169 INFO [train.py:1031] (1/4) Epoch 14, batch 6100, loss[loss=0.2266, simple_loss=0.2934, pruned_loss=0.05925, ctc_loss=0.1034, over 16867.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2693, pruned_loss=0.05994, ctc_loss=0.1054, over 3297038.67 frames. ], batch size: 258, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:29:36,154 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=22.5 2023-10-09 12:29:44,517 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2757262.6666666665, ans=0.125 2023-10-09 12:29:46,137 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2757262.6666666665, ans=0.125 2023-10-09 12:29:47,642 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2757262.6666666665, ans=0.2 2023-10-09 12:30:06,373 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2757356.0, ans=0.125 2023-10-09 12:30:13,732 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2757402.6666666665, ans=0.0 2023-10-09 12:30:25,877 INFO [train.py:1031] (1/4) Epoch 14, batch 6150, loss[loss=0.1761, simple_loss=0.2437, pruned_loss=0.04013, ctc_loss=0.07084, over 16811.00 frames. ], tot_loss[loss=0.2161, simple_loss=0.2729, pruned_loss=0.05889, ctc_loss=0.1039, over 3301112.47 frames. 
], batch size: 121, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:30:28,337 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2757449.3333333335, ans=0.1 2023-10-09 12:30:53,021 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2757542.6666666665, ans=0.1 2023-10-09 12:30:56,530 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+02 2.946e+02 3.372e+02 3.986e+02 9.785e+02, threshold=6.744e+02, percent-clipped=2.0 2023-10-09 12:30:58,064 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.20 vs. limit=15.0 2023-10-09 12:31:02,137 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2757589.3333333335, ans=0.125 2023-10-09 12:31:07,891 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2757589.3333333335, ans=0.125 2023-10-09 12:31:20,246 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2757636.0, ans=0.125 2023-10-09 12:31:26,124 INFO [train.py:1031] (1/4) Epoch 14, batch 6200, loss[loss=0.2175, simple_loss=0.278, pruned_loss=0.05764, ctc_loss=0.1042, over 16903.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2761, pruned_loss=0.06053, ctc_loss=0.1064, over 3302610.98 frames. ], batch size: 242, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:31:44,967 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2757729.3333333335, ans=0.95 2023-10-09 12:31:46,383 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2757729.3333333335, ans=0.125 2023-10-09 12:31:49,172 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2757729.3333333335, ans=0.125 2023-10-09 12:31:59,250 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2757776.0, ans=0.0 2023-10-09 12:32:11,200 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2757822.6666666665, ans=0.1 2023-10-09 12:32:12,273 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2757822.6666666665, ans=0.125 2023-10-09 12:32:26,031 INFO [train.py:1031] (1/4) Epoch 14, batch 6250, loss[loss=0.2218, simple_loss=0.2921, pruned_loss=0.05619, ctc_loss=0.09754, over 16893.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2761, pruned_loss=0.06017, ctc_loss=0.1057, over 3303793.84 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:32:27,521 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2757916.0, ans=0.125 2023-10-09 12:32:27,782 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.80 vs. 
limit=12.0 2023-10-09 12:32:37,590 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2757962.6666666665, ans=0.2 2023-10-09 12:32:55,891 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+02 2.973e+02 3.416e+02 4.008e+02 8.801e+02, threshold=6.831e+02, percent-clipped=1.0 2023-10-09 12:33:04,284 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2758056.0, ans=0.0 2023-10-09 12:33:15,110 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2758102.6666666665, ans=0.125 2023-10-09 12:33:26,295 INFO [train.py:1031] (1/4) Epoch 14, batch 6300, loss[loss=0.2018, simple_loss=0.2756, pruned_loss=0.0465, ctc_loss=0.08724, over 16838.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2767, pruned_loss=0.0572, ctc_loss=0.1011, over 3300606.75 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:33:27,678 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:33:58,254 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2758242.6666666665, ans=0.0 2023-10-09 12:34:06,286 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2758289.3333333335, ans=0.0 2023-10-09 12:34:12,551 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2758289.3333333335, ans=0.0 2023-10-09 12:34:28,480 INFO [train.py:1031] (1/4) Epoch 14, batch 6350, loss[loss=0.2595, simple_loss=0.2996, pruned_loss=0.07993, ctc_loss=0.1488, over 16404.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2772, pruned_loss=0.05824, ctc_loss=0.1029, over 3299562.44 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:34:29,286 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0 2023-10-09 12:34:38,604 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:35:00,732 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 3.028e+02 3.570e+02 4.973e+02 1.101e+03, threshold=7.141e+02, percent-clipped=8.0 2023-10-09 12:35:01,009 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2758476.0, ans=0.1 2023-10-09 12:35:01,256 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.79 vs. limit=10.0 2023-10-09 12:35:15,596 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2758522.6666666665, ans=0.0 2023-10-09 12:35:17,710 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0 2023-10-09 12:35:32,010 INFO [train.py:1031] (1/4) Epoch 14, batch 6400, loss[loss=0.2141, simple_loss=0.2585, pruned_loss=0.06287, ctc_loss=0.1097, over 16751.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.2796, pruned_loss=0.05893, ctc_loss=0.1043, over 3291837.18 frames. 
], batch size: 130, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:35:35,351 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2758616.0, ans=0.0 2023-10-09 12:35:52,948 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2758662.6666666665, ans=0.125 2023-10-09 12:36:11,127 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2758756.0, ans=0.07 2023-10-09 12:36:34,414 INFO [train.py:1031] (1/4) Epoch 14, batch 6450, loss[loss=0.2489, simple_loss=0.3054, pruned_loss=0.07219, ctc_loss=0.12, over 16873.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2879, pruned_loss=0.06086, ctc_loss=0.1077, over 3271851.81 frames. ], batch size: 215, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:36:34,747 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2758849.3333333335, ans=0.125 2023-10-09 12:36:35,754 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2758849.3333333335, ans=0.125 2023-10-09 12:36:52,684 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2023-10-09 12:37:07,768 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2758942.6666666665, ans=0.0 2023-10-09 12:37:09,122 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.527e+02 4.142e+02 5.294e+02 1.315e+03, threshold=8.284e+02, percent-clipped=10.0 2023-10-09 12:37:25,841 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2759036.0, ans=0.1 2023-10-09 12:37:30,664 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2759036.0, ans=0.0 2023-10-09 12:37:37,500 INFO [train.py:1031] (1/4) Epoch 14, batch 6500, loss[loss=0.2797, simple_loss=0.3598, pruned_loss=0.07385, ctc_loss=0.1299, over 16679.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2928, pruned_loss=0.06257, ctc_loss=0.1105, over 3282052.00 frames. ], batch size: 351, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 12:37:42,261 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2759082.6666666665, ans=0.07 2023-10-09 12:37:46,457 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2759082.6666666665, ans=0.04949747468305833 2023-10-09 12:37:46,556 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2759082.6666666665, ans=0.0 2023-10-09 12:37:46,843 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.54 vs. limit=6.0 2023-10-09 12:37:56,935 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2759129.3333333335, ans=0.125 2023-10-09 12:38:03,602 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. 
2023-10-09 12:38:06,438 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=12.0
2023-10-09 12:38:23,639 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2759222.6666666665, ans=0.125
2023-10-09 12:38:34,896 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=22.5
2023-10-09 12:38:39,331 INFO [train.py:1031] (1/4) Epoch 14, batch 6550, loss[loss=0.212, simple_loss=0.2997, pruned_loss=0.04519, ctc_loss=0.08455, over 16918.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2968, pruned_loss=0.06166, ctc_loss=0.109, over 3283251.24 frames. ], batch size: 258, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:38:48,375 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2759316.0, ans=0.035
2023-10-09 12:38:51,099 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2759362.6666666665, ans=0.125
2023-10-09 12:39:05,822 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2759409.3333333335, ans=0.0
2023-10-09 12:39:13,987 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.392e+02 3.110e+02 3.501e+02 4.783e+02 9.305e+02, threshold=7.003e+02, percent-clipped=1.0
2023-10-09 12:39:20,887 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2759456.0, ans=0.125
2023-10-09 12:39:22,425 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2759456.0, ans=0.1
2023-10-09 12:39:41,348 INFO [train.py:1031] (1/4) Epoch 14, batch 6600, loss[loss=0.235, simple_loss=0.2846, pruned_loss=0.06744, ctc_loss=0.1266, over 15142.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2936, pruned_loss=0.06136, ctc_loss=0.1084, over 3273092.85 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:40:29,314 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2759736.0, ans=0.0
2023-10-09 12:40:30,732 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.05 vs. limit=15.0
2023-10-09 12:40:33,191 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2759736.0, ans=0.1
2023-10-09 12:40:43,320 INFO [train.py:1031] (1/4) Epoch 14, batch 6650, loss[loss=0.205, simple_loss=0.2591, pruned_loss=0.05527, ctc_loss=0.1009, over 16653.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2863, pruned_loss=0.06081, ctc_loss=0.1071, over 3274892.01 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:40:46,071 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=22.5
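The Whitening lines compare a per-module "metric" against a "limit"; values well under the limit (the usual case here) indicate healthy activation statistics, while outliers such as metric=30.09 vs. limit=15.0 later in this log flag activations whose covariance is far from white. A sketch of one plausible such metric; the formula is assumed, not read from this codebase:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # Assumed diagnostic: per channel group, metric = d * tr(C @ C) / tr(C)**2
        # for the covariance C over d channels. It equals 1.0 when C is a
        # multiple of the identity ("white" activations) and grows as the
        # covariance becomes ill-conditioned.
        x = x.reshape(-1, x.shape[-1])                     # (frames, channels)
        d = x.shape[1] // num_groups
        x = x.reshape(-1, num_groups, d).transpose(0, 1)   # (groups, frames, d)
        cov = x.transpose(1, 2) @ x / x.shape[1]           # (groups, d, d)
        trace = cov.diagonal(dim1=1, dim2=2).sum(dim=1)    # tr(C)
        trace_sq = (cov * cov).sum(dim=(1, 2))             # tr(C @ C), C symmetric
        return (d * trace_sq / trace.clamp(min=1e-20) ** 2).mean()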
2023-10-09 12:40:49,496 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2759782.6666666665, ans=0.125
2023-10-09 12:40:52,527 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0
2023-10-09 12:41:00,868 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2759829.3333333335, ans=0.125
2023-10-09 12:41:04,571 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2759829.3333333335, ans=0.1
2023-10-09 12:41:18,756 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.057e+02 3.351e+02 3.889e+02 6.888e+02, threshold=6.703e+02, percent-clipped=0.0
2023-10-09 12:41:25,373 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2759922.6666666665, ans=0.1
2023-10-09 12:41:45,279 INFO [train.py:1031] (1/4) Epoch 14, batch 6700, loss[loss=0.2617, simple_loss=0.3378, pruned_loss=0.06767, ctc_loss=0.1258, over 16891.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2879, pruned_loss=0.06004, ctc_loss=0.1061, over 3283035.97 frames. ], batch size: 292, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:41:58,865 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2760062.6666666665, ans=0.125
2023-10-09 12:42:28,980 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2760156.0, ans=0.0
2023-10-09 12:42:36,838 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.70 vs. limit=15.0
2023-10-09 12:42:48,657 INFO [train.py:1031] (1/4) Epoch 14, batch 6750, loss[loss=0.28, simple_loss=0.3374, pruned_loss=0.08111, ctc_loss=0.151, over 16827.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2998, pruned_loss=0.06338, ctc_loss=0.1128, over 3291234.25 frames. ], batch size: 272, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:42:53,378 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2760249.3333333335, ans=0.125
2023-10-09 12:43:07,332 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0
2023-10-09 12:43:11,138 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2760296.0, ans=0.125
2023-10-09 12:43:25,067 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=12.0
2023-10-09 12:43:25,357 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+02 3.277e+02 3.937e+02 4.777e+02 6.969e+02, threshold=7.873e+02, percent-clipped=1.0
2023-10-09 12:43:39,775 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2760436.0, ans=0.1
2023-10-09 12:43:42,975 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2760436.0, ans=0.2
2023-10-09 12:43:48,001 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2760436.0, ans=0.0
2023-10-09 12:43:49,732 INFO [train.py:1031] (1/4) Epoch 14, batch 6800, loss[loss=0.2116, simple_loss=0.266, pruned_loss=0.05777, ctc_loss=0.1044, over 16767.00 frames. ], tot_loss[loss=0.238, simple_loss=0.2989, pruned_loss=0.06533, ctc_loss=0.116, over 3291285.35 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:44:00,046 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5
2023-10-09 12:44:06,750 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2760529.3333333335, ans=0.125
2023-10-09 12:44:09,483 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2760529.3333333335, ans=0.125
2023-10-09 12:44:09,524 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2760529.3333333335, ans=0.07
2023-10-09 12:44:12,800 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2760529.3333333335, ans=0.125
2023-10-09 12:44:19,843 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2760576.0, ans=0.125
2023-10-09 12:44:20,010 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0
2023-10-09 12:44:43,113 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2760669.3333333335, ans=0.125
2023-10-09 12:44:51,403 INFO [train.py:1031] (1/4) Epoch 14, batch 6850, loss[loss=0.2618, simple_loss=0.3185, pruned_loss=0.07512, ctc_loss=0.1371, over 16511.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.2984, pruned_loss=0.06424, ctc_loss=0.1145, over 3299659.27 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 4.0
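The optim.py lines summarize the distribution of recent gradient norms as a five-number summary (min, 25%, median, 75%, max). In every entry in this window the threshold equals Clipping_scale times the logged median (for example 2.0 x 3.416e+02 ~= 6.831e+02), and percent-clipped reports how often the threshold was exceeded. A numpy sketch of that bookkeeping; the function name is hypothetical:

    import numpy as np

    def clipping_stats(grad_norms, clipping_scale=2.0):
        # Five-number summary matching the layout of the optim.py lines:
        # min, 25%, median, 75%, max of recent per-step gradient norms.
        quartiles = np.percentile(grad_norms, [0, 25, 50, 75, 100])
        threshold = clipping_scale * quartiles[2]   # scale times the median
        percent_clipped = 100.0 * np.mean(grad_norms > threshold)
        return quartiles, threshold, percent_clipped

    norms = np.array([226.7, 297.3, 341.6, 400.8, 880.1])
    q, thr, pc = clipping_stats(norms)  # thr == 683.2, close to the logged 6.831e+02

In the real run the statistics are presumably accumulated over a much larger window of steps than this five-element example.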
2023-10-09 12:44:51,691 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2760716.0, ans=0.125
2023-10-09 12:44:54,616 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2760716.0, ans=0.5
2023-10-09 12:45:11,183 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2760762.6666666665, ans=0.0
2023-10-09 12:45:28,869 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 3.178e+02 3.827e+02 4.505e+02 1.079e+03, threshold=7.655e+02, percent-clipped=2.0
2023-10-09 12:45:54,910 INFO [train.py:1031] (1/4) Epoch 14, batch 6900, loss[loss=0.2529, simple_loss=0.3017, pruned_loss=0.07612, ctc_loss=0.1296, over 16755.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.3009, pruned_loss=0.06569, ctc_loss=0.1168, over 3294059.39 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 8.0
2023-10-09 12:46:00,706 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2760949.3333333335, ans=0.125
2023-10-09 12:46:33,123 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2761089.3333333335, ans=0.125
2023-10-09 12:46:41,744 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2761089.3333333335, ans=0.125
2023-10-09 12:46:42,038 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=15.0
2023-10-09 12:46:44,920 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2761136.0, ans=0.125
2023-10-09 12:46:52,018 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2761136.0, ans=0.1
2023-10-09 12:46:55,999 INFO [train.py:1031] (1/4) Epoch 14, batch 6950, loss[loss=0.1843, simple_loss=0.2662, pruned_loss=0.03812, ctc_loss=0.06552, over 16783.00 frames. ], tot_loss[loss=0.2417, simple_loss=0.3015, pruned_loss=0.06715, ctc_loss=0.1192, over 3288944.76 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:47:31,191 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=22.5
2023-10-09 12:47:34,796 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+02 3.216e+02 3.562e+02 4.288e+02 5.901e+02, threshold=7.125e+02, percent-clipped=0.0
2023-10-09 12:47:43,014 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.10 vs. limit=10.0
2023-10-09 12:47:46,197 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2761369.3333333335, ans=0.0
2023-10-09 12:47:55,787 INFO [train.py:1031] (1/4) Epoch 14, batch 7000, loss[loss=0.1992, simple_loss=0.256, pruned_loss=0.05231, ctc_loss=0.09468, over 16931.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.2967, pruned_loss=0.06597, ctc_loss=0.1171, over 3290918.77 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:48:03,134 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2761416.0, ans=0.125
2023-10-09 12:48:37,913 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2761556.0, ans=0.09899494936611666
2023-10-09 12:48:45,935 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:48:46,210 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0
2023-10-09 12:48:50,759 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2761602.6666666665, ans=0.125
2023-10-09 12:48:56,192 INFO [train.py:1031] (1/4) Epoch 14, batch 7050, loss[loss=0.2009, simple_loss=0.2638, pruned_loss=0.04919, ctc_loss=0.0988, over 16882.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2886, pruned_loss=0.06396, ctc_loss=0.1139, over 3301369.36 frames. ], batch size: 310, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:48:57,948 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0
2023-10-09 12:49:09,619 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=22.5
2023-10-09 12:49:16,485 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.01 vs. limit=6.0
2023-10-09 12:49:27,082 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2761742.6666666665, ans=0.125
2023-10-09 12:49:38,087 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 2.795e+02 3.132e+02 3.641e+02 6.976e+02, threshold=6.264e+02, percent-clipped=0.0
2023-10-09 12:49:41,049 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2761789.3333333335, ans=10.0
2023-10-09 12:49:44,243 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2761836.0, ans=0.2
2023-10-09 12:49:45,383 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2761836.0, ans=0.07
2023-10-09 12:49:45,834 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.21 vs. limit=15.0
2023-10-09 12:49:58,341 INFO [train.py:1031] (1/4) Epoch 14, batch 7100, loss[loss=0.1695, simple_loss=0.1989, pruned_loss=0.05387, ctc_loss=0.08095, over 9720.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2797, pruned_loss=0.06208, ctc_loss=0.1098, over 3288202.91 frames. ], batch size: 35, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:50:11,864 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2761929.3333333335, ans=0.05
2023-10-09 12:50:18,155 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.90 vs. limit=10.0
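The WithLoss lines attach an auxiliary "loss-sum" to the self_attn_weights modules; every value in this window is 0.000e+00, i.e. the auxiliary term is contributing nothing here. A hypothetical wrapper (the class name, penalty and accumulation pattern are illustrative, not read from this codebase) showing how such a per-module penalty could be collected and reported:

    import torch

    class WithAuxLoss(torch.nn.Module):
        # Illustrative wrapper: lets a submodule expose an auxiliary penalty
        # that the training loop can add to the main loss; "loss-sum" would be
        # the penalty accumulated since the last report, so 0.000e+00 means
        # the penalty was inactive over that window.
        def __init__(self, module: torch.nn.Module, name: str):
            super().__init__()
            self.module = module
            self.name = name
            self.loss_sum = 0.0

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            y = self.module(x)
            # Illustrative penalty: mean activation magnitude above a cap of 1.0.
            self.penalty = (y.abs().mean() - 1.0).clamp(min=0.0)
            self.loss_sum += float(self.penalty.detach())
            return y

A training loop using this wrapper would sum the modules' `penalty` attributes into the loss before the backward pass.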
2023-10-09 12:50:23,648 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2761976.0, ans=0.0
2023-10-09 12:50:26,589 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0
2023-10-09 12:50:37,661 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2762022.6666666665, ans=0.035
2023-10-09 12:50:44,333 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2762022.6666666665, ans=0.0
2023-10-09 12:50:45,428 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2762022.6666666665, ans=0.0
2023-10-09 12:50:46,450 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2762022.6666666665, ans=0.04949747468305833
2023-10-09 12:50:46,659 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2762022.6666666665, ans=15.0
2023-10-09 12:51:00,103 INFO [train.py:1031] (1/4) Epoch 14, batch 7150, loss[loss=0.2051, simple_loss=0.2538, pruned_loss=0.05806, ctc_loss=0.1005, over 16795.00 frames. ], tot_loss[loss=0.2205, simple_loss=0.2736, pruned_loss=0.06188, ctc_loss=0.1093, over 3296896.91 frames. ], batch size: 141, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:51:00,465 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2762116.0, ans=0.95
2023-10-09 12:51:04,401 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2762116.0, ans=0.0
2023-10-09 12:51:11,334 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2762116.0, ans=0.125
2023-10-09 12:51:12,553 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:51:25,825 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2762209.3333333335, ans=0.0
2023-10-09 12:51:30,956 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.42 vs. limit=10.0
2023-10-09 12:51:41,324 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2762256.0, ans=0.0
2023-10-09 12:51:43,562 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+02 3.162e+02 3.626e+02 4.175e+02 1.632e+03, threshold=7.251e+02, percent-clipped=2.0
2023-10-09 12:52:00,870 INFO [train.py:1031] (1/4) Epoch 14, batch 7200, loss[loss=0.2204, simple_loss=0.2709, pruned_loss=0.0631, ctc_loss=0.1093, over 16939.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2743, pruned_loss=0.06378, ctc_loss=0.1121, over 3296331.45 frames. ], batch size: 243, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:52:01,207 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2762349.3333333335, ans=0.0
2023-10-09 12:52:05,371 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2762349.3333333335, ans=0.2
2023-10-09 12:52:07,582 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2762349.3333333335, ans=0.125
2023-10-09 12:52:18,236 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2762396.0, ans=0.2
2023-10-09 12:52:21,174 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2762396.0, ans=0.0
2023-10-09 12:52:28,945 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:52:29,257 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0
2023-10-09 12:52:51,421 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2762536.0, ans=0.125
2023-10-09 12:53:01,774 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2762582.6666666665, ans=0.125
2023-10-09 12:53:03,088 INFO [train.py:1031] (1/4) Epoch 14, batch 7250, loss[loss=0.2814, simple_loss=0.3303, pruned_loss=0.08671, ctc_loss=0.1475, over 16745.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2753, pruned_loss=0.06444, ctc_loss=0.1129, over 3293080.83 frames. ], batch size: 384, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:53:07,960 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0
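The grad_scale field in the train.py lines moves between 2.0, 4.0 and 8.0 across this window. With fp16 training this is typically a dynamic loss scale: it doubles after a stretch of overflow-free steps and halves when an overflow (inf/nan gradient) is found. A minimal sketch; the growth and backoff constants are hypothetical:

    class DynamicGradScaler:
        # Sketch of dynamic fp16 loss scaling consistent with the grad_scale
        # values logged above; constants are illustrative placeholders.
        def __init__(self, scale=2.0, growth_interval=1000):
            self.scale = scale
            self.growth_interval = growth_interval
            self.good_steps = 0

        def update(self, found_inf: bool) -> None:
            if found_inf:
                self.scale = max(self.scale / 2.0, 1.0)  # back off on overflow
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps % self.growth_interval == 0:
                    self.scale *= 2.0  # grow again after a stable stretch

The repeated drops back to 2.0 after reaching 8.0 suggest occasional overflows are still being hit at this point in training, which is normal for fp16 runs.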
2023-10-09 12:53:29,697 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2762676.0, ans=0.125
2023-10-09 12:53:36,160 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2762676.0, ans=0.05
2023-10-09 12:53:37,103 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2762676.0, ans=0.125
2023-10-09 12:53:43,673 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2762722.6666666665, ans=0.125
2023-10-09 12:53:44,508 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2762722.6666666665, ans=0.0
2023-10-09 12:53:49,738 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.375e+02 3.091e+02 3.554e+02 4.025e+02 7.139e+02, threshold=7.107e+02, percent-clipped=0.0
2023-10-09 12:53:58,749 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2762769.3333333335, ans=0.1
2023-10-09 12:53:59,843 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2762769.3333333335, ans=0.125
2023-10-09 12:54:07,990 INFO [train.py:1031] (1/4) Epoch 14, batch 7300, loss[loss=0.2185, simple_loss=0.2728, pruned_loss=0.06068, ctc_loss=0.107, over 16702.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2773, pruned_loss=0.06255, ctc_loss=0.1099, over 3299164.44 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:54:10,356 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2762816.0, ans=0.025
2023-10-09 12:54:35,523 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.77 vs. limit=15.0
2023-10-09 12:54:42,021 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2762956.0, ans=0.125
2023-10-09 12:55:07,496 INFO [train.py:1031] (1/4) Epoch 14, batch 7350, loss[loss=0.2089, simple_loss=0.2507, pruned_loss=0.0607, ctc_loss=0.1142, over 15374.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2765, pruned_loss=0.06244, ctc_loss=0.1098, over 3305079.35 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:55:17,252 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.48 vs. limit=15.0
2023-10-09 12:55:20,492 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2763096.0, ans=0.125
2023-10-09 12:55:47,398 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=22.5
2023-10-09 12:55:50,541 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+02 3.022e+02 3.585e+02 4.093e+02 1.089e+03, threshold=7.169e+02, percent-clipped=4.0
2023-10-09 12:56:07,708 INFO [train.py:1031] (1/4) Epoch 14, batch 7400, loss[loss=0.221, simple_loss=0.2922, pruned_loss=0.05583, ctc_loss=0.09538, over 16798.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2788, pruned_loss=0.06409, ctc_loss=0.1125, over 3308176.61 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:56:12,873 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2763282.6666666665, ans=0.125
2023-10-09 12:56:14,036 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2763282.6666666665, ans=0.0
2023-10-09 12:56:40,166 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2763376.0, ans=0.0
2023-10-09 12:56:51,565 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2763422.6666666665, ans=0.125
2023-10-09 12:57:00,873 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2763469.3333333335, ans=0.025
2023-10-09 12:57:09,480 INFO [train.py:1031] (1/4) Epoch 14, batch 7450, loss[loss=0.2417, simple_loss=0.3228, pruned_loss=0.05792, ctc_loss=0.1118, over 16786.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2866, pruned_loss=0.06471, ctc_loss=0.1139, over 3302493.35 frames. ], batch size: 328, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:57:16,957 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2763516.0, ans=0.125
2023-10-09 12:57:17,008 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2763516.0, ans=0.1
2023-10-09 12:57:28,142 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2763562.6666666665, ans=0.0
2023-10-09 12:57:38,214 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2763609.3333333335, ans=0.0
2023-10-09 12:57:38,274 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2763609.3333333335, ans=0.125
2023-10-09 12:57:58,524 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 3.056e+02 3.585e+02 4.525e+02 9.951e+02, threshold=7.170e+02, percent-clipped=3.0
2023-10-09 12:58:02,791 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2763702.6666666665, ans=0.1
2023-10-09 12:58:11,428 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.32 vs. limit=10.0
2023-10-09 12:58:13,556 INFO [train.py:1031] (1/4) Epoch 14, batch 7500, loss[loss=0.1773, simple_loss=0.2314, pruned_loss=0.04606, ctc_loss=0.07788, over 16896.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2858, pruned_loss=0.06105, ctc_loss=0.1081, over 3310312.60 frames. ], batch size: 90, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:58:20,819 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2763749.3333333335, ans=0.0
2023-10-09 12:58:31,669 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:58:32,802 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2763796.0, ans=0.125
2023-10-09 12:58:53,120 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=12.0
2023-10-09 12:58:59,060 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2763889.3333333335, ans=0.0
2023-10-09 12:59:13,831 INFO [train.py:1031] (1/4) Epoch 14, batch 7550, loss[loss=0.277, simple_loss=0.3128, pruned_loss=0.08947, ctc_loss=0.1556, over 16601.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2857, pruned_loss=0.05926, ctc_loss=0.105, over 3317385.78 frames. ], batch size: 350, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:59:20,801 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2763982.6666666665, ans=0.125
2023-10-09 12:59:25,115 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2764029.3333333335, ans=0.125
2023-10-09 12:59:55,458 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:59:56,483 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2764122.6666666665, ans=0.125
2023-10-09 12:59:59,786 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.575e+02 3.477e+02 4.054e+02 5.168e+02 9.952e+02, threshold=8.108e+02, percent-clipped=5.0
2023-10-09 13:00:01,772 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2764169.3333333335, ans=0.1
2023-10-09 13:00:14,751 INFO [train.py:1031] (1/4) Epoch 14, batch 7600, loss[loss=0.2153, simple_loss=0.2734, pruned_loss=0.05768, ctc_loss=0.1048, over 16769.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2857, pruned_loss=0.06082, ctc_loss=0.1073, over 3323179.30 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:00:22,499 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2764216.0, ans=0.125
2023-10-09 13:00:32,000 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2764262.6666666665, ans=0.0
2023-10-09 13:00:46,938 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. limit=10.0
2023-10-09 13:00:50,835 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2764356.0, ans=0.125
2023-10-09 13:00:57,494 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0
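Each train.py line pairs the current batch's loss (over roughly 16k frames) with a tot_loss aggregated "over" about 3.3 million frames, which drifts only slowly from batch to batch. One plausible way to produce such a statistic is a frame-weighted running average with exponential forgetting; a sketch, with the decay constant as a hypothetical placeholder:

    def update_running_stats(loss_sum, frame_sum, batch_loss, batch_frames,
                             decay=0.999):
        # Exponentially-forgotten, frame-weighted sums; the printed tot_loss
        # would be loss_sum / frame_sum and the printed frame count frame_sum.
        # The decay constant is illustrative, chosen so the average tracks a
        # few thousand recent batches, matching the slow drift in the log.
        loss_sum = decay * loss_sum + batch_loss * batch_frames
        frame_sum = decay * frame_sum + batch_frames
        return loss_sum, frame_sum

Under this scheme the effective window is about batch_frames / (1 - decay) frames, which for ~330-frame... rather, for ~3300-frame batches and decay=0.999 comes out in the millions of frames, consistent with the logged totals.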
2023-10-09 13:01:04,179 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.46 vs. limit=15.0
2023-10-09 13:01:16,423 INFO [train.py:1031] (1/4) Epoch 14, batch 7650, loss[loss=0.2234, simple_loss=0.2712, pruned_loss=0.06427, ctc_loss=0.1176, over 16482.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2852, pruned_loss=0.0628, ctc_loss=0.1102, over 3329124.59 frames. ], batch size: 466, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:01:57,952 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2764589.3333333335, ans=0.0
2023-10-09 13:02:03,203 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2764589.3333333335, ans=0.125
2023-10-09 13:02:03,962 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+02 3.219e+02 3.717e+02 4.421e+02 1.818e+03, threshold=7.434e+02, percent-clipped=3.0
2023-10-09 13:02:16,471 INFO [train.py:1031] (1/4) Epoch 14, batch 7700, loss[loss=0.2071, simple_loss=0.2726, pruned_loss=0.05231, ctc_loss=0.09272, over 16947.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2833, pruned_loss=0.06273, ctc_loss=0.1103, over 3332565.63 frames. ], batch size: 258, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:02:35,612 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2764729.3333333335, ans=0.125
2023-10-09 13:02:53,867 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2764822.6666666665, ans=0.1
2023-10-09 13:02:59,128 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2764822.6666666665, ans=0.0
2023-10-09 13:03:01,364 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2764822.6666666665, ans=0.1
2023-10-09 13:03:09,276 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2764869.3333333335, ans=0.125
2023-10-09 13:03:17,589 INFO [train.py:1031] (1/4) Epoch 14, batch 7750, loss[loss=0.2528, simple_loss=0.2761, pruned_loss=0.08413, ctc_loss=0.1531, over 16636.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2821, pruned_loss=0.06403, ctc_loss=0.1123, over 3324722.67 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:03:17,888 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2764916.0, ans=0.125
2023-10-09 13:03:35,408 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2764962.6666666665, ans=0.125
2023-10-09 13:04:08,220 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.438e+02 3.190e+02 3.453e+02 4.126e+02 8.582e+02, threshold=6.905e+02, percent-clipped=1.0
2023-10-09 13:04:11,877 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=30.09 vs. limit=15.0
2023-10-09 13:04:19,223 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2765149.3333333335, ans=0.125
2023-10-09 13:04:20,486 INFO [train.py:1031] (1/4) Epoch 14, batch 7800, loss[loss=0.2077, simple_loss=0.2671, pruned_loss=0.05612, ctc_loss=0.09019, over 16900.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2819, pruned_loss=0.06546, ctc_loss=0.1133, over 3309512.85 frames. ], batch size: 215, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:04:42,616 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2765196.0, ans=0.0
2023-10-09 13:04:54,023 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2765242.6666666665, ans=0.0
2023-10-09 13:05:04,275 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2765289.3333333335, ans=0.125
2023-10-09 13:05:14,773 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2765336.0, ans=0.125
2023-10-09 13:05:23,386 INFO [train.py:1031] (1/4) Epoch 14, batch 7850, loss[loss=0.1887, simple_loss=0.2411, pruned_loss=0.05145, ctc_loss=0.08327, over 16741.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2857, pruned_loss=0.06538, ctc_loss=0.1122, over 3310282.95 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:05:34,959 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2765429.3333333335, ans=0.125
2023-10-09 13:05:37,113 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2765429.3333333335, ans=0.125
2023-10-09 13:05:41,373 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2765429.3333333335, ans=0.1
2023-10-09 13:06:06,622 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2765522.6666666665, ans=0.125
2023-10-09 13:06:11,562 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2765522.6666666665, ans=0.04949747468305833
2023-10-09 13:06:14,914 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+02 3.051e+02 3.784e+02 4.491e+02 1.708e+03, threshold=7.568e+02, percent-clipped=4.0
2023-10-09 13:06:26,268 INFO [train.py:1031] (1/4) Epoch 14, batch 7900, loss[loss=0.2187, simple_loss=0.2803, pruned_loss=0.05893, ctc_loss=0.0981, over 16777.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2898, pruned_loss=0.06333, ctc_loss=0.1094, over 3309976.09 frames. ], batch size: 164, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:06:26,626 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2765616.0, ans=0.1
2023-10-09 13:06:32,698 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0
2023-10-09 13:06:36,148 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2765616.0, ans=0.1
2023-10-09 13:06:51,833 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2765709.3333333335, ans=0.125
2023-10-09 13:07:27,537 INFO [train.py:1031] (1/4) Epoch 14, batch 7950, loss[loss=0.2358, simple_loss=0.2771, pruned_loss=0.07082, ctc_loss=0.1321, over 15407.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2904, pruned_loss=0.06304, ctc_loss=0.1093, over 3311220.38 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:08:06,305 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.76 vs. limit=15.0
2023-10-09 13:08:19,020 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 2.992e+02 3.305e+02 3.981e+02 8.015e+02, threshold=6.609e+02, percent-clipped=1.0
2023-10-09 13:08:28,686 INFO [train.py:1031] (1/4) Epoch 14, batch 8000, loss[loss=0.2043, simple_loss=0.2581, pruned_loss=0.05607, ctc_loss=0.09572, over 16998.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2869, pruned_loss=0.06376, ctc_loss=0.1103, over 3311823.55 frames. ], batch size: 203, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:08:34,222 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2766082.6666666665, ans=0.125
2023-10-09 13:09:08,753 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2766222.6666666665, ans=0.0
2023-10-09 13:09:29,361 INFO [train.py:1031] (1/4) Epoch 14, batch 8050, loss[loss=0.2307, simple_loss=0.2836, pruned_loss=0.065, ctc_loss=0.1196, over 16893.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.285, pruned_loss=0.06456, ctc_loss=0.1117, over 3313673.56 frames. ], batch size: 242, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:09:49,025 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2766362.6666666665, ans=0.0
2023-10-09 13:09:56,955 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2766409.3333333335, ans=0.125
2023-10-09 13:10:11,508 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2766456.0, ans=0.125
2023-10-09 13:10:13,179 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2766456.0, ans=0.125
2023-10-09 13:10:21,441 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:10:22,705 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+02 3.190e+02 3.616e+02 4.250e+02 6.056e+02, threshold=7.233e+02, percent-clipped=0.0
2023-10-09 13:10:26,264 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:10:30,818 INFO [train.py:1031] (1/4) Epoch 14, batch 8100, loss[loss=0.2442, simple_loss=0.2988, pruned_loss=0.07192, ctc_loss=0.1145, over 17003.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2825, pruned_loss=0.06473, ctc_loss=0.112, over 3310281.69 frames. ], batch size: 86, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:10:44,528 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2766596.0, ans=0.0
2023-10-09 13:10:58,809 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2766642.6666666665, ans=0.125
2023-10-09 13:11:21,817 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2766736.0, ans=0.125
2023-10-09 13:11:31,971 INFO [train.py:1031] (1/4) Epoch 14, batch 8150, loss[loss=0.2078, simple_loss=0.2646, pruned_loss=0.05658, ctc_loss=0.09445, over 16848.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2798, pruned_loss=0.06473, ctc_loss=0.1121, over 3305913.57 frames. ], batch size: 176, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:12:10,684 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2766922.6666666665, ans=0.04949747468305833
2023-10-09 13:12:13,822 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2766922.6666666665, ans=0.0
2023-10-09 13:12:26,044 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 3.114e+02 3.645e+02 4.244e+02 8.238e+02, threshold=7.291e+02, percent-clipped=3.0
2023-10-09 13:12:28,886 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=22.5
2023-10-09 13:12:33,449 INFO [train.py:1031] (1/4) Epoch 14, batch 8200, loss[loss=0.2319, simple_loss=0.2851, pruned_loss=0.06796, ctc_loss=0.1069, over 16559.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2813, pruned_loss=0.06302, ctc_loss=0.1099, over 3296185.81 frames. ], batch size: 110, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:12:36,983 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2767016.0, ans=0.0
2023-10-09 13:12:46,170 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2767062.6666666665, ans=0.2
2023-10-09 13:12:57,992 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2767109.3333333335, ans=0.125
2023-10-09 13:13:01,806 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=22.5
2023-10-09 13:13:01,822 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.99 vs. limit=15.0
2023-10-09 13:13:11,296 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.17 vs. limit=6.0
2023-10-09 13:13:19,420 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2767156.0, ans=0.125
2023-10-09 13:13:24,660 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.22 vs. limit=15.0
2023-10-09 13:13:37,129 INFO [train.py:1031] (1/4) Epoch 14, batch 8250, loss[loss=0.201, simple_loss=0.2962, pruned_loss=0.03803, ctc_loss=0.07415, over 16860.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2899, pruned_loss=0.06162, ctc_loss=0.1088, over 3295953.78 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 4.0
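The learning rate is pinned at 2.58e-03 throughout this stretch, which is expected mid-epoch: Eden-style schedules used in icefall-era recipes change the rate only gradually with the batch index and epoch. A sketch of such a schedule; the formula is assumed from similar recipes and is not verified against this run, and the default constants are illustrative placeholders:

    def eden_lr(base_lr, batch, epoch, lr_batches=5000.0, lr_epochs=4.0):
        # Assumed Eden-style schedule: a smooth inverse-quartic decay in both
        # the global batch index and the (possibly fractional) epoch.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

Because both factors change slowly once batch >> lr_batches and epoch >> lr_epochs, the printed rate naturally stays constant to three significant figures over a few thousand batches, as seen here.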
2023-10-09 13:13:43,488 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2767249.3333333335, ans=0.0
2023-10-09 13:14:08,716 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0
2023-10-09 13:14:23,317 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2767389.3333333335, ans=0.125
2023-10-09 13:14:31,046 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2767436.0, ans=0.1
2023-10-09 13:14:32,778 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.653e+02 3.032e+02 3.705e+02 6.938e+02, threshold=6.064e+02, percent-clipped=0.0
2023-10-09 13:14:33,180 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2767436.0, ans=0.05
2023-10-09 13:14:34,882 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2767436.0, ans=0.1
2023-10-09 13:14:40,098 INFO [train.py:1031] (1/4) Epoch 14, batch 8300, loss[loss=0.2252, simple_loss=0.3031, pruned_loss=0.05459, ctc_loss=0.09519, over 16833.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2892, pruned_loss=0.0581, ctc_loss=0.1034, over 3306960.97 frames. ], batch size: 242, lr: 2.58e-03, grad_scale: 8.0
2023-10-09 13:14:48,755 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2767482.6666666665, ans=0.05
2023-10-09 13:14:57,996 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2767529.3333333335, ans=0.0
2023-10-09 13:15:02,486 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2767529.3333333335, ans=0.125
2023-10-09 13:15:18,186 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=15.0
2023-10-09 13:15:19,125 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=2767622.6666666665, ans=12.0
2023-10-09 13:15:25,669 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2767622.6666666665, ans=0.125
2023-10-09 13:15:42,537 INFO [train.py:1031] (1/4) Epoch 14, batch 8350, loss[loss=0.2647, simple_loss=0.3325, pruned_loss=0.07221, ctc_loss=0.1314, over 16678.00 frames. ], tot_loss[loss=0.225, simple_loss=0.292, pruned_loss=0.05826, ctc_loss=0.1038, over 3310367.23 frames. ], batch size: 384, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:15:56,457 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2767762.6666666665, ans=0.0
2023-10-09 13:16:38,122 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.846e+02 3.361e+02 4.142e+02 6.688e+02, threshold=6.722e+02, percent-clipped=2.0
2023-10-09 13:16:41,459 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.63 vs. limit=10.0
2023-10-09 13:16:44,683 INFO [train.py:1031] (1/4) Epoch 14, batch 8400, loss[loss=0.174, simple_loss=0.2254, pruned_loss=0.0467, ctc_loss=0.07326, over 16546.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2864, pruned_loss=0.05409, ctc_loss=0.0973, over 3303472.23 frames. ], batch size: 110, lr: 2.58e-03, grad_scale: 8.0
2023-10-09 13:16:53,796 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2767949.3333333335, ans=10.0
2023-10-09 13:17:05,741 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.04 vs. limit=15.0
2023-10-09 13:17:14,001 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2768042.6666666665, ans=0.125
2023-10-09 13:17:17,345 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2768042.6666666665, ans=0.125
2023-10-09 13:17:17,657 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0
2023-10-09 13:17:17,683 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=12.0
2023-10-09 13:17:31,016 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=12.0
2023-10-09 13:17:34,452 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2768136.0, ans=0.1
2023-10-09 13:17:41,512 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0
2023-10-09 13:17:43,892 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2768136.0, ans=0.125
2023-10-09 13:17:48,487 INFO [train.py:1031] (1/4) Epoch 14, batch 8450, loss[loss=0.3047, simple_loss=0.37, pruned_loss=0.087, ctc_loss=0.1636, over 16861.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2927, pruned_loss=0.05602, ctc_loss=0.1011, over 3304218.68 frames. ], batch size: 309, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:17:51,765 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0
2023-10-09 13:18:00,021 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2768229.3333333335, ans=0.04949747468305833
2023-10-09 13:18:06,635 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.21 vs. limit=15.0
2023-10-09 13:18:09,758 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0
2023-10-09 13:18:15,217 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2768276.0, ans=0.125
2023-10-09 13:18:34,845 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2768322.6666666665, ans=0.125
2023-10-09 13:18:45,225 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 3.438e+02 4.157e+02 5.474e+02 9.633e+02, threshold=8.314e+02, percent-clipped=10.0
2023-10-09 13:18:48,295 INFO [train.py:1031] (1/4) Epoch 14, batch 8500, loss[loss=0.2416, simple_loss=0.2941, pruned_loss=0.06957, ctc_loss=0.1247, over 16905.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.297, pruned_loss=0.05886, ctc_loss=0.1063, over 3305407.32 frames. ], batch size: 242, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:18:49,255 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2768416.0, ans=0.125
2023-10-09 13:18:51,236 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2768416.0, ans=0.0
2023-10-09 13:18:52,622 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=15.0
2023-10-09 13:18:58,098 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:18:59,109 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2768462.6666666665, ans=0.0
2023-10-09 13:19:04,516 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2768462.6666666665, ans=0.04949747468305833
2023-10-09 13:19:06,682 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2768462.6666666665, ans=0.125
2023-10-09 13:19:26,936 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2768556.0, ans=0.0
2023-10-09 13:19:29,462 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2768556.0, ans=0.0
2023-10-09 13:19:35,926 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2768602.6666666665, ans=0.125
2023-10-09 13:19:48,999 INFO [train.py:1031] (1/4) Epoch 14, batch 8550, loss[loss=0.2417, simple_loss=0.2761, pruned_loss=0.07704, ctc_loss=0.1329, over 16631.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2929, pruned_loss=0.06042, ctc_loss=0.108, over 3311000.97 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:20:10,331 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2768696.0, ans=0.125
2023-10-09 13:20:20,099 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2768742.6666666665, ans=0.1
2023-10-09 13:20:29,312 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2768789.3333333335, ans=0.0
2023-10-09 13:20:30,423 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2768789.3333333335, ans=0.125
2023-10-09 13:20:34,577 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.80 vs. limit=15.0
2023-10-09 13:20:45,841 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2768836.0, ans=0.125
2023-10-09 13:20:51,140 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.120e+02 3.650e+02 4.330e+02 6.615e+02, threshold=7.300e+02, percent-clipped=0.0
2023-10-09 13:20:51,382 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2768836.0, ans=0.1
2023-10-09 13:20:53,247 INFO [train.py:1031] (1/4) Epoch 14, batch 8600, loss[loss=0.2767, simple_loss=0.3373, pruned_loss=0.07758, ctc_loss=0.1526, over 16701.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.292, pruned_loss=0.05965, ctc_loss=0.1064, over 3309750.33 frames. ], batch size: 384, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:21:12,330 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=15.0
2023-10-09 13:21:56,024 INFO [train.py:1031] (1/4) Epoch 14, batch 8650, loss[loss=0.2251, simple_loss=0.2989, pruned_loss=0.05396, ctc_loss=0.1084, over 16599.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2888, pruned_loss=0.05689, ctc_loss=0.1016, over 3299529.58 frames. ], batch size: 351, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:22:11,676 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2769162.6666666665, ans=0.125
2023-10-09 13:22:23,449 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2769209.3333333335, ans=0.0
2023-10-09 13:22:59,313 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+02 2.781e+02 3.323e+02 4.118e+02 1.274e+03, threshold=6.646e+02, percent-clipped=1.0
2023-10-09 13:23:00,362 INFO [train.py:1031] (1/4) Epoch 14, batch 8700, loss[loss=0.1916, simple_loss=0.2663, pruned_loss=0.04326, ctc_loss=0.076, over 16768.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2857, pruned_loss=0.05487, ctc_loss=0.09804, over 3289556.47 frames. ], batch size: 176, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:23:32,463 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=22.5
limit=22.5 2023-10-09 13:23:36,291 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2769489.3333333335, ans=0.125 2023-10-09 13:23:43,364 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2769489.3333333335, ans=0.0 2023-10-09 13:23:45,467 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2769489.3333333335, ans=0.125 2023-10-09 13:23:49,167 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2769536.0, ans=0.125 2023-10-09 13:24:00,640 INFO [train.py:1031] (1/4) Epoch 14, batch 8750, loss[loss=0.2406, simple_loss=0.3214, pruned_loss=0.05758, ctc_loss=0.1115, over 16830.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2879, pruned_loss=0.05473, ctc_loss=0.09868, over 3283585.66 frames. ], batch size: 272, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:24:04,827 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2769582.6666666665, ans=0.1 2023-10-09 13:24:06,941 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2769582.6666666665, ans=0.2 2023-10-09 13:24:08,593 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2769582.6666666665, ans=0.125 2023-10-09 13:24:18,965 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2769629.3333333335, ans=0.1 2023-10-09 13:24:26,583 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2769676.0, ans=0.1 2023-10-09 13:24:29,199 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0 2023-10-09 13:24:29,715 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2769676.0, ans=0.125 2023-10-09 13:24:36,171 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2769676.0, ans=0.125 2023-10-09 13:24:40,637 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2769722.6666666665, ans=0.0 2023-10-09 13:24:41,677 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2769722.6666666665, ans=0.0 2023-10-09 13:24:57,498 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-10-09 13:25:02,688 INFO [train.py:1031] (1/4) Epoch 14, batch 8800, loss[loss=0.1704, simple_loss=0.2515, pruned_loss=0.03342, ctc_loss=0.05608, over 16605.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2832, pruned_loss=0.0505, ctc_loss=0.09143, over 3288779.35 frames. 
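Most of the scaling.py:199 entries trace ScheduledFloat values: scalar hyperparameters (skip rates, balancer probabilities, dropout p, bypass scales) annealed as a function of batch_count. A minimal sketch of a piecewise-linear schedule in that spirit; the breakpoints below are hypothetical, and the real ScheduledFloat in icefall's scaling.py may behave differently:

    # Piecewise-linear schedule keyed on batch count (breakpoints hypothetical).
    from bisect import bisect_right

    def scheduled_float(batch_count: float, points: list) -> float:
        """points: sorted (batch_count, value) breakpoints; linear in
        between, clamped at both ends."""
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        if batch_count <= xs[0]:
            return ys[0]
        if batch_count >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, batch_count)
        t = (batch_count - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    # e.g. a skip rate decaying 0.1 -> 0.0 over the first 20k batches:
    print(scheduled_float(2769489.0, [(0.0, 0.1), (20000.0, 0.0)]))  # 0.0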
], batch size: 151, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:25:03,031 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2769816.0, ans=0.07 2023-10-09 13:25:03,717 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.673e+02 3.234e+02 4.604e+02 9.306e+02, threshold=6.469e+02, percent-clipped=8.0 2023-10-09 13:25:08,778 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0 2023-10-09 13:25:21,326 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2769862.6666666665, ans=0.125 2023-10-09 13:25:26,374 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2769862.6666666665, ans=0.125 2023-10-09 13:25:28,527 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2769909.3333333335, ans=0.125 2023-10-09 13:25:33,875 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2769909.3333333335, ans=0.0 2023-10-09 13:25:36,584 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2769909.3333333335, ans=0.125 2023-10-09 13:26:05,139 INFO [train.py:1031] (1/4) Epoch 14, batch 8850, loss[loss=0.2145, simple_loss=0.2882, pruned_loss=0.05112, ctc_loss=0.09635, over 16683.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2787, pruned_loss=0.04615, ctc_loss=0.08419, over 3292243.73 frames. ], batch size: 384, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:26:14,820 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0 2023-10-09 13:26:21,240 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2770096.0, ans=0.125 2023-10-09 13:27:05,691 INFO [train.py:1031] (1/4) Epoch 14, batch 8900, loss[loss=0.2586, simple_loss=0.3186, pruned_loss=0.07599, ctc_loss=0.1167, over 14044.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2747, pruned_loss=0.046, ctc_loss=0.08309, over 3300821.11 frames. ], batch size: 35, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:27:07,137 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2770282.6666666665, ans=0.0 2023-10-09 13:27:08,454 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.336e+02 2.693e+02 3.509e+02 6.659e+02, threshold=5.387e+02, percent-clipped=1.0 2023-10-09 13:27:16,243 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2770282.6666666665, ans=0.125 2023-10-09 13:27:29,498 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2770376.0, ans=0.0 2023-10-09 13:27:32,362 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2770376.0, ans=0.125 2023-10-09 13:27:51,519 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.72 vs. 
limit=15.0 2023-10-09 13:27:53,411 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2770422.6666666665, ans=0.0 2023-10-09 13:27:55,663 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2770469.3333333335, ans=0.125 2023-10-09 13:28:05,475 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2770469.3333333335, ans=0.0 2023-10-09 13:28:08,426 INFO [train.py:1031] (1/4) Epoch 14, batch 8950, loss[loss=0.1892, simple_loss=0.2437, pruned_loss=0.04978, ctc_loss=0.08767, over 16838.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.2742, pruned_loss=0.04986, ctc_loss=0.08887, over 3302663.75 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:28:15,597 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2770516.0, ans=0.0 2023-10-09 13:28:20,349 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2770562.6666666665, ans=0.1 2023-10-09 13:28:25,508 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2770562.6666666665, ans=0.035 2023-10-09 13:28:29,915 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2770562.6666666665, ans=0.1 2023-10-09 13:28:35,753 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2770609.3333333335, ans=0.2 2023-10-09 13:28:37,528 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2770609.3333333335, ans=0.125 2023-10-09 13:28:44,735 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.75 vs. limit=10.0 2023-10-09 13:28:57,134 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.29 vs. limit=15.0 2023-10-09 13:29:10,855 INFO [train.py:1031] (1/4) Epoch 14, batch 9000, loss[loss=0.2163, simple_loss=0.2661, pruned_loss=0.06236, ctc_loss=0.1043, over 16890.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2708, pruned_loss=0.0529, ctc_loss=0.09356, over 3306251.74 frames. ], batch size: 78, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:29:10,855 INFO [train.py:1054] (1/4) Computing validation loss 2023-10-09 13:29:26,445 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2412, simple_loss=0.3097, pruned_loss=0.06635, ctc_loss=0.1001, over 1796401.00 frames. 2023-10-09 13:29:26,446 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14370MB 2023-10-09 13:29:26,722 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2770749.3333333335, ans=0.125 2023-10-09 13:29:29,130 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.435e+02 3.467e+02 3.875e+02 4.625e+02 8.873e+02, threshold=7.750e+02, percent-clipped=12.0 2023-10-09 13:29:41,195 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.69 vs. 
limit=10.0 2023-10-09 13:29:46,718 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2770796.0, ans=0.0 2023-10-09 13:29:49,112 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2023-10-09 13:30:08,362 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2770889.3333333335, ans=0.125 2023-10-09 13:30:15,027 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2770936.0, ans=0.125 2023-10-09 13:30:18,098 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:30:28,024 INFO [train.py:1031] (1/4) Epoch 14, batch 9050, loss[loss=0.189, simple_loss=0.2432, pruned_loss=0.04992, ctc_loss=0.08762, over 16790.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2677, pruned_loss=0.0556, ctc_loss=0.0979, over 3311282.73 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:30:42,912 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2771029.3333333335, ans=0.125 2023-10-09 13:30:47,605 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2771029.3333333335, ans=0.125 2023-10-09 13:31:04,290 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0 2023-10-09 13:31:05,996 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2771122.6666666665, ans=0.1 2023-10-09 13:31:13,550 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:31:29,169 INFO [train.py:1031] (1/4) Epoch 14, batch 9100, loss[loss=0.1906, simple_loss=0.2445, pruned_loss=0.05113, ctc_loss=0.08611, over 16798.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.2641, pruned_loss=0.05626, ctc_loss=0.09887, over 3317270.94 frames. ], batch size: 189, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:31:34,214 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0 2023-10-09 13:31:34,400 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+02 2.919e+02 3.286e+02 3.918e+02 6.845e+02, threshold=6.573e+02, percent-clipped=0.0 2023-10-09 13:31:34,756 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2771216.0, ans=0.125 2023-10-09 13:31:54,055 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2771309.3333333335, ans=0.125 2023-10-09 13:31:54,584 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.72 vs. 
limit=22.5 2023-10-09 13:32:09,174 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2771356.0, ans=0.0 2023-10-09 13:32:19,137 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2771402.6666666665, ans=0.125 2023-10-09 13:32:22,566 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2771402.6666666665, ans=0.2 2023-10-09 13:32:22,585 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2771402.6666666665, ans=0.125 2023-10-09 13:32:30,944 INFO [train.py:1031] (1/4) Epoch 14, batch 9150, loss[loss=0.2303, simple_loss=0.2842, pruned_loss=0.0648, ctc_loss=0.1168, over 16806.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.2645, pruned_loss=0.05363, ctc_loss=0.09506, over 3317137.40 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:33:02,622 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2771542.6666666665, ans=0.1 2023-10-09 13:33:09,464 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.93 vs. limit=10.0 2023-10-09 13:33:23,227 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2771636.0, ans=0.125 2023-10-09 13:33:30,897 INFO [train.py:1031] (1/4) Epoch 14, batch 9200, loss[loss=0.2209, simple_loss=0.2801, pruned_loss=0.06043, ctc_loss=0.1023, over 16840.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.268, pruned_loss=0.05559, ctc_loss=0.09808, over 3317711.97 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:33:37,286 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.803e+02 3.406e+02 4.271e+02 8.869e+02, threshold=6.811e+02, percent-clipped=4.0 2023-10-09 13:33:56,911 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=22.5 2023-10-09 13:33:57,656 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2771776.0, ans=0.125 2023-10-09 13:34:15,248 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2771822.6666666665, ans=0.0 2023-10-09 13:34:31,272 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2771916.0, ans=0.125 2023-10-09 13:34:31,879 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.56 vs. limit=15.0 2023-10-09 13:34:32,046 INFO [train.py:1031] (1/4) Epoch 14, batch 9250, loss[loss=0.1784, simple_loss=0.2447, pruned_loss=0.04152, ctc_loss=0.07276, over 16831.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2702, pruned_loss=0.05794, ctc_loss=0.1017, over 3324519.83 frames. 
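The optim.py:471 entries summarize the distribution of recent gradient norms (five quantiles from min to max), the clipping threshold derived from them, and the percentage of batches clipped; in every entry above the threshold is Clipping_scale times the middle quantile (e.g. 2.0 * 3.406e+02 ~= 6.811e+02). A sketch of median-based adaptive clipping in that spirit; the exact rule in icefall's optim.py may differ:

    import torch

    def adaptive_clip(params, recent_norms, clipping_scale=2.0):
        """Clip gradients to clipping_scale * median of recently observed
        grad norms (sketch; not necessarily icefall's exact rule)."""
        norms = torch.tensor(recent_norms)
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # scale times the median
        total = torch.nn.utils.clip_grad_norm_(params,
                                               max_norm=float(threshold))
        was_clipped = float(total) > float(threshold)
        return float(threshold), was_clipped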
], batch size: 141, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:34:59,817 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:35:04,043 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.37 vs. limit=10.0 2023-10-09 13:35:15,109 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2772056.0, ans=0.2 2023-10-09 13:35:18,432 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2772056.0, ans=0.2 2023-10-09 13:35:33,973 INFO [train.py:1031] (1/4) Epoch 14, batch 9300, loss[loss=0.2629, simple_loss=0.3318, pruned_loss=0.07129, ctc_loss=0.1288, over 16741.00 frames. ], tot_loss[loss=0.2141, simple_loss=0.273, pruned_loss=0.05737, ctc_loss=0.1013, over 3323771.38 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:35:41,376 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 2.952e+02 3.315e+02 3.911e+02 8.519e+02, threshold=6.629e+02, percent-clipped=4.0 2023-10-09 13:35:49,098 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2772196.0, ans=0.0 2023-10-09 13:35:51,303 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:35:57,698 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2772242.6666666665, ans=0.125 2023-10-09 13:36:32,918 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2772336.0, ans=0.0 2023-10-09 13:36:35,775 INFO [train.py:1031] (1/4) Epoch 14, batch 9350, loss[loss=0.1818, simple_loss=0.2556, pruned_loss=0.03946, ctc_loss=0.07263, over 16778.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2769, pruned_loss=0.05838, ctc_loss=0.1035, over 3325518.87 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:36:37,708 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2772382.6666666665, ans=0.05 2023-10-09 13:36:44,006 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2772382.6666666665, ans=15.0 2023-10-09 13:37:09,360 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2772476.0, ans=0.125 2023-10-09 13:37:17,908 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2772522.6666666665, ans=0.125 2023-10-09 13:37:21,670 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2772522.6666666665, ans=0.0 2023-10-09 13:37:34,627 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2772569.3333333335, ans=0.025 2023-10-09 13:37:39,103 INFO [train.py:1031] (1/4) Epoch 14, batch 9400, loss[loss=0.2028, simple_loss=0.2801, pruned_loss=0.04594, ctc_loss=0.08398, over 16819.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2824, pruned_loss=0.05805, ctc_loss=0.1036, over 3320485.46 frames. 
], batch size: 176, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:37:43,310 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2772616.0, ans=0.2 2023-10-09 13:37:44,483 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2023-10-09 13:37:46,567 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 3.369e+02 4.289e+02 5.603e+02 1.054e+03, threshold=8.577e+02, percent-clipped=14.0 2023-10-09 13:38:00,502 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2772662.6666666665, ans=0.125 2023-10-09 13:38:40,999 INFO [train.py:1031] (1/4) Epoch 14, batch 9450, loss[loss=0.2398, simple_loss=0.3066, pruned_loss=0.06182, ctc_loss=0.1232, over 16665.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2837, pruned_loss=0.05614, ctc_loss=0.1007, over 3302958.92 frames. ], batch size: 351, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:39:18,719 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2772989.3333333335, ans=0.125 2023-10-09 13:39:29,581 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2773036.0, ans=0.125 2023-10-09 13:39:39,671 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2773036.0, ans=0.125 2023-10-09 13:39:43,188 INFO [train.py:1031] (1/4) Epoch 14, batch 9500, loss[loss=0.2954, simple_loss=0.3178, pruned_loss=0.1018, ctc_loss=0.1734, over 16863.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2835, pruned_loss=0.05823, ctc_loss=0.1036, over 3309204.56 frames. ], batch size: 384, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:39:51,849 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 3.046e+02 3.571e+02 4.127e+02 8.787e+02, threshold=7.141e+02, percent-clipped=1.0 2023-10-09 13:39:58,761 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2773129.3333333335, ans=0.07 2023-10-09 13:39:59,786 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2773129.3333333335, ans=0.07 2023-10-09 13:40:07,655 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0 2023-10-09 13:40:46,271 INFO [train.py:1031] (1/4) Epoch 14, batch 9550, loss[loss=0.2584, simple_loss=0.3091, pruned_loss=0.07608, ctc_loss=0.1387, over 16841.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2866, pruned_loss=0.06231, ctc_loss=0.11, over 3314780.90 frames. ], batch size: 176, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:40:53,116 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2773316.0, ans=0.0 2023-10-09 13:41:08,586 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.39 vs. 
limit=6.0 2023-10-09 13:41:15,198 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:41:17,316 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2773409.3333333335, ans=0.0 2023-10-09 13:41:33,729 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2773456.0, ans=0.2 2023-10-09 13:41:48,549 INFO [train.py:1031] (1/4) Epoch 14, batch 9600, loss[loss=0.2514, simple_loss=0.3071, pruned_loss=0.07305, ctc_loss=0.124, over 16779.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2918, pruned_loss=0.06625, ctc_loss=0.1173, over 3310336.04 frames. ], batch size: 121, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:42:00,632 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.675e+02 3.309e+02 3.670e+02 4.199e+02 1.268e+03, threshold=7.340e+02, percent-clipped=3.0 2023-10-09 13:42:05,855 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2773596.0, ans=0.0 2023-10-09 13:42:24,101 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.32 vs. limit=6.0 2023-10-09 13:42:25,736 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=22.5 2023-10-09 13:42:33,886 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=15.0 2023-10-09 13:42:35,249 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:42:35,662 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2023-10-09 13:42:47,742 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2773736.0, ans=0.1 2023-10-09 13:42:52,927 INFO [train.py:1031] (1/4) Epoch 14, batch 9650, loss[loss=0.1929, simple_loss=0.2618, pruned_loss=0.04533, ctc_loss=0.08315, over 16724.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2949, pruned_loss=0.06742, ctc_loss=0.1189, over 3308894.73 frames. ], batch size: 140, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:43:30,318 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2773922.6666666665, ans=0.125 2023-10-09 13:43:46,630 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2773969.3333333335, ans=0.0 2023-10-09 13:43:47,776 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2773969.3333333335, ans=0.125 2023-10-09 13:43:55,661 INFO [train.py:1031] (1/4) Epoch 14, batch 9700, loss[loss=0.2139, simple_loss=0.2926, pruned_loss=0.05019, ctc_loss=0.08718, over 16777.00 frames. ], tot_loss[loss=0.235, simple_loss=0.2949, pruned_loss=0.06471, ctc_loss=0.1146, over 3309496.52 frames. 
], batch size: 188, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:43:56,118 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2774016.0, ans=0.2 2023-10-09 13:43:56,122 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2774016.0, ans=0.04949747468305833 2023-10-09 13:44:05,110 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2774016.0, ans=0.1 2023-10-09 13:44:06,800 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.889e+02 3.444e+02 4.302e+02 1.235e+03, threshold=6.889e+02, percent-clipped=2.0 2023-10-09 13:44:17,781 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2774062.6666666665, ans=0.125 2023-10-09 13:44:19,799 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2774109.3333333335, ans=0.0 2023-10-09 13:44:56,792 INFO [train.py:1031] (1/4) Epoch 14, batch 9750, loss[loss=0.2016, simple_loss=0.2551, pruned_loss=0.05589, ctc_loss=0.09056, over 16883.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2869, pruned_loss=0.06353, ctc_loss=0.1124, over 3313509.68 frames. ], batch size: 78, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:44:59,900 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2774249.3333333335, ans=0.125 2023-10-09 13:44:59,986 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2774249.3333333335, ans=0.1 2023-10-09 13:45:18,652 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2774296.0, ans=0.2 2023-10-09 13:45:23,355 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=22.5 2023-10-09 13:45:29,266 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2774342.6666666665, ans=0.125 2023-10-09 13:45:50,433 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2774436.0, ans=0.125 2023-10-09 13:45:54,746 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2774436.0, ans=0.0 2023-10-09 13:45:56,882 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2774436.0, ans=0.1 2023-10-09 13:45:59,210 INFO [train.py:1031] (1/4) Epoch 14, batch 9800, loss[loss=0.1918, simple_loss=0.2514, pruned_loss=0.04941, ctc_loss=0.08376, over 16722.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2843, pruned_loss=0.06175, ctc_loss=0.1094, over 3302416.17 frames. 
], batch size: 130, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:46:05,584 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2774482.6666666665, ans=0.125 2023-10-09 13:46:06,569 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2774482.6666666665, ans=0.0 2023-10-09 13:46:11,661 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+02 3.062e+02 3.512e+02 4.119e+02 7.038e+02, threshold=7.024e+02, percent-clipped=1.0 2023-10-09 13:46:58,656 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2774669.3333333335, ans=0.125 2023-10-09 13:47:01,110 INFO [train.py:1031] (1/4) Epoch 14, batch 9850, loss[loss=0.2366, simple_loss=0.2902, pruned_loss=0.06843, ctc_loss=0.1152, over 11992.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2854, pruned_loss=0.06216, ctc_loss=0.1098, over 3296568.65 frames. ], batch size: 35, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:47:59,140 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2774902.6666666665, ans=0.0 2023-10-09 13:48:02,255 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=15.0 2023-10-09 13:48:02,737 INFO [train.py:1031] (1/4) Epoch 14, batch 9900, loss[loss=0.2329, simple_loss=0.2709, pruned_loss=0.07271, ctc_loss=0.1236, over 16720.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2803, pruned_loss=0.06243, ctc_loss=0.1098, over 3298977.88 frames. ], batch size: 328, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:48:13,209 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2774949.3333333335, ans=0.125 2023-10-09 13:48:16,795 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+02 2.878e+02 3.183e+02 3.713e+02 1.156e+03, threshold=6.367e+02, percent-clipped=1.0 2023-10-09 13:48:18,221 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2774996.0, ans=0.2 2023-10-09 13:48:24,631 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2774996.0, ans=0.125 2023-10-09 13:48:30,359 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2775042.6666666665, ans=0.0 2023-10-09 13:48:36,467 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2775042.6666666665, ans=0.1 2023-10-09 13:48:39,727 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2775089.3333333335, ans=0.125 2023-10-09 13:49:03,027 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2775136.0, ans=0.125 2023-10-09 13:49:05,472 INFO [train.py:1031] (1/4) Epoch 14, batch 9950, loss[loss=0.1922, simple_loss=0.2479, pruned_loss=0.05114, ctc_loss=0.08545, over 16630.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2737, pruned_loss=0.06072, ctc_loss=0.1067, over 3294801.34 frames. 
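The grad_scale values in the per-batch entries (cycling between 1.0 and 4.0 here) come from dynamic fp16 loss scaling. A minimal sketch using PyTorch's standard GradScaler; the training loop's actual scaler handling may differ:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0)

    def fp16_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()  # scale up to avoid fp16 underflow
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # grows/shrinks the scale over time
        return float(loss), scaler.get_scale()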
], batch size: 151, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:49:27,138 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.33 vs. limit=22.5 2023-10-09 13:49:28,672 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2775229.3333333335, ans=0.04949747468305833 2023-10-09 13:49:41,419 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2775276.0, ans=0.125 2023-10-09 13:49:46,150 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2775322.6666666665, ans=0.125 2023-10-09 13:49:49,466 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.90 vs. limit=10.0 2023-10-09 13:50:01,766 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2775369.3333333335, ans=15.0 2023-10-09 13:50:08,678 INFO [train.py:1031] (1/4) Epoch 14, batch 10000, loss[loss=0.2042, simple_loss=0.2505, pruned_loss=0.05935, ctc_loss=0.09785, over 16583.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2689, pruned_loss=0.05823, ctc_loss=0.1024, over 3298203.46 frames. ], batch size: 110, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:50:14,747 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2023-10-09 13:50:24,893 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+02 2.835e+02 3.193e+02 3.667e+02 1.150e+03, threshold=6.386e+02, percent-clipped=3.0 2023-10-09 13:50:27,792 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2775462.6666666665, ans=0.1 2023-10-09 13:50:32,771 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2775509.3333333335, ans=0.125 2023-10-09 13:50:33,789 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:50:45,895 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2775556.0, ans=0.0 2023-10-09 13:50:47,876 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2775556.0, ans=0.07 2023-10-09 13:50:58,369 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2775602.6666666665, ans=0.0 2023-10-09 13:51:10,625 INFO [train.py:1031] (1/4) Epoch 14, batch 10050, loss[loss=0.1891, simple_loss=0.2529, pruned_loss=0.04667, ctc_loss=0.07989, over 16875.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.265, pruned_loss=0.05856, ctc_loss=0.1026, over 3300001.94 frames. 
], batch size: 215, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:51:37,741 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2775742.6666666665, ans=0.2 2023-10-09 13:51:41,555 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2775742.6666666665, ans=0.125 2023-10-09 13:51:57,790 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2775789.3333333335, ans=0.2 2023-10-09 13:52:13,729 INFO [train.py:1031] (1/4) Epoch 14, batch 10100, loss[loss=0.2087, simple_loss=0.2664, pruned_loss=0.05492, ctc_loss=0.1032, over 16785.00 frames. ], tot_loss[loss=0.2101, simple_loss=0.2633, pruned_loss=0.05806, ctc_loss=0.102, over 3304136.25 frames. ], batch size: 292, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:52:14,482 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=22.5 2023-10-09 13:52:30,295 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.211e+02 2.838e+02 3.162e+02 3.584e+02 6.355e+02, threshold=6.323e+02, percent-clipped=0.0 2023-10-09 13:52:42,396 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=12.0 2023-10-09 13:52:45,724 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:52:57,881 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.21 vs. limit=6.0 2023-10-09 13:53:06,364 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2776069.3333333335, ans=0.125 2023-10-09 13:53:12,915 INFO [train.py:1031] (1/4) Epoch 14, batch 10150, loss[loss=0.2523, simple_loss=0.3074, pruned_loss=0.07399, ctc_loss=0.123, over 16886.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2648, pruned_loss=0.06, ctc_loss=0.105, over 3298204.81 frames. ], batch size: 82, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:53:22,929 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.56 vs. limit=22.5 2023-10-09 13:53:25,135 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.26 vs. limit=10.0 2023-10-09 13:53:27,521 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2776162.6666666665, ans=0.2 2023-10-09 13:53:30,412 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=22.5 2023-10-09 13:53:55,292 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2776256.0, ans=0.125 2023-10-09 13:54:00,725 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2776302.6666666665, ans=0.125 2023-10-09 13:54:12,026 INFO [train.py:1031] (1/4) Epoch 14, batch 10200, loss[loss=0.2087, simple_loss=0.2633, pruned_loss=0.05669, ctc_loss=0.1018, over 16903.00 frames. 
], tot_loss[loss=0.2165, simple_loss=0.267, pruned_loss=0.06154, ctc_loss=0.1076, over 3299477.91 frames. ], batch size: 90, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:54:28,915 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.256e+02 3.649e+02 4.243e+02 9.669e+02, threshold=7.298e+02, percent-clipped=6.0 2023-10-09 13:54:31,836 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2776396.0, ans=0.125 2023-10-09 13:54:37,086 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2776442.6666666665, ans=0.0 2023-10-09 13:54:39,765 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2776442.6666666665, ans=0.125 2023-10-09 13:54:50,290 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2776489.3333333335, ans=0.0 2023-10-09 13:54:59,875 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2776536.0, ans=0.0 2023-10-09 13:55:01,534 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2776536.0, ans=0.0 2023-10-09 13:55:12,494 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2023-10-09 13:55:12,753 INFO [train.py:1031] (1/4) Epoch 14, batch 10250, loss[loss=0.2094, simple_loss=0.256, pruned_loss=0.06062, ctc_loss=0.104, over 16702.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2657, pruned_loss=0.06201, ctc_loss=0.1081, over 3303684.59 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:55:35,295 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2776629.3333333335, ans=0.125 2023-10-09 13:55:36,328 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2776676.0, ans=0.0 2023-10-09 13:55:43,850 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2776676.0, ans=0.2 2023-10-09 13:55:51,465 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0 2023-10-09 13:56:12,739 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2776816.0, ans=0.0 2023-10-09 13:56:14,047 INFO [train.py:1031] (1/4) Epoch 14, batch 10300, loss[loss=0.2306, simple_loss=0.2777, pruned_loss=0.06805, ctc_loss=0.1188, over 16864.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2656, pruned_loss=0.06283, ctc_loss=0.1091, over 3301840.79 frames. 
], batch size: 242, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:56:14,329 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2776816.0, ans=0.1 2023-10-09 13:56:22,776 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:56:33,610 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.662e+02 3.333e+02 3.833e+02 4.530e+02 9.139e+02, threshold=7.666e+02, percent-clipped=3.0 2023-10-09 13:56:36,726 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2776862.6666666665, ans=0.125 2023-10-09 13:56:50,171 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2776956.0, ans=0.0 2023-10-09 13:56:55,040 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2776956.0, ans=0.125 2023-10-09 13:57:08,966 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2777002.6666666665, ans=0.07 2023-10-09 13:57:16,378 INFO [train.py:1031] (1/4) Epoch 14, batch 10350, loss[loss=0.1821, simple_loss=0.2385, pruned_loss=0.04667, ctc_loss=0.08107, over 16911.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2682, pruned_loss=0.06234, ctc_loss=0.1087, over 3311208.03 frames. ], batch size: 82, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:57:17,862 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2777049.3333333335, ans=0.125 2023-10-09 13:57:30,106 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2777096.0, ans=0.1 2023-10-09 13:57:47,140 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2777142.6666666665, ans=0.125 2023-10-09 13:57:53,807 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=22.5 2023-10-09 13:58:00,594 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2777189.3333333335, ans=0.125 2023-10-09 13:58:17,994 INFO [train.py:1031] (1/4) Epoch 14, batch 10400, loss[loss=0.2284, simple_loss=0.3032, pruned_loss=0.05606, ctc_loss=0.1039, over 16864.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2711, pruned_loss=0.05769, ctc_loss=0.1018, over 3311127.48 frames. ], batch size: 291, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:58:19,485 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2777282.6666666665, ans=0.125 2023-10-09 13:58:29,797 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.74 vs. 
limit=15.0 2023-10-09 13:58:37,078 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.958e+02 3.554e+02 4.330e+02 8.227e+02, threshold=7.107e+02, percent-clipped=1.0 2023-10-09 13:58:37,418 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2777329.3333333335, ans=0.0 2023-10-09 13:59:20,076 INFO [train.py:1031] (1/4) Epoch 14, batch 10450, loss[loss=0.2607, simple_loss=0.3053, pruned_loss=0.08254, ctc_loss=0.1276, over 16836.00 frames. ], tot_loss[loss=0.219, simple_loss=0.276, pruned_loss=0.05988, ctc_loss=0.1056, over 3296883.77 frames. ], batch size: 121, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:59:30,385 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2777516.0, ans=0.125 2023-10-09 13:59:35,424 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2777562.6666666665, ans=0.125 2023-10-09 13:59:40,582 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2023-10-09 14:00:06,678 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2777656.0, ans=0.2 2023-10-09 14:00:10,416 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2777702.6666666665, ans=0.1 2023-10-09 14:00:21,481 INFO [train.py:1031] (1/4) Epoch 14, batch 10500, loss[loss=0.2733, simple_loss=0.2871, pruned_loss=0.09623, ctc_loss=0.1674, over 16437.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2764, pruned_loss=0.06267, ctc_loss=0.1102, over 3302497.23 frames. ], batch size: 350, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 14:00:26,561 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2777749.3333333335, ans=0.125 2023-10-09 14:00:32,250 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-10-09 14:00:43,436 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.453e+02 3.496e+02 3.857e+02 4.755e+02 1.181e+03, threshold=7.715e+02, percent-clipped=1.0 2023-10-09 14:00:58,286 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2777889.3333333335, ans=0.0 2023-10-09 14:00:59,930 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2777889.3333333335, ans=0.0 2023-10-09 14:01:12,152 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2777936.0, ans=0.0 2023-10-09 14:01:15,899 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2777936.0, ans=0.125 2023-10-09 14:01:21,273 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2777982.6666666665, ans=0.0 2023-10-09 14:01:22,068 INFO [train.py:1031] (1/4) Epoch 14, batch 10550, loss[loss=0.2469, simple_loss=0.3124, pruned_loss=0.06598, ctc_loss=0.1234, over 16624.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2731, pruned_loss=0.06202, ctc_loss=0.109, over 3304480.72 frames. 
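The scaling.py:979 Whitening entries compare a per-module whiteness metric against a limit, flagging activations whose covariance drifts far from isotropic. One plausible way to quantify that, assuming an eigenvalue-dispersion measure (hypothetical; the metric actually computed in scaling.py may differ):

    import torch

    def whiteness_metric(feats: torch.Tensor) -> float:
        """Eigenvalue dispersion of the feature covariance: ~1.0 when the
        covariance is proportional to identity (white), larger otherwise.
        feats: (num_frames, num_channels)."""
        x = feats - feats.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float(eigs.pow(2).mean() / (eigs.mean().pow(2) + 1e-20))

    print(whiteness_metric(torch.randn(4000, 256)))  # close to 1 for noise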
], batch size: 384, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 14:01:25,656 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2777982.6666666665, ans=0.0 2023-10-09 14:01:51,473 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2778076.0, ans=0.0 2023-10-09 14:01:52,504 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2778076.0, ans=0.125 2023-10-09 14:01:54,895 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=22.5 2023-10-09 14:01:56,809 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2778076.0, ans=0.0 2023-10-09 14:02:17,318 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2778169.3333333335, ans=0.125 2023-10-09 14:02:24,165 INFO [train.py:1031] (1/4) Epoch 14, batch 10600, loss[loss=0.2052, simple_loss=0.2765, pruned_loss=0.04754, ctc_loss=0.097, over 15130.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2756, pruned_loss=0.061, ctc_loss=0.1078, over 3299023.26 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 14:02:25,593 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2778216.0, ans=0.05 2023-10-09 14:02:36,014 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2778262.6666666665, ans=0.1 2023-10-09 14:02:43,613 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2778262.6666666665, ans=0.035 2023-10-09 14:02:47,791 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+02 3.164e+02 3.650e+02 4.243e+02 8.211e+02, threshold=7.299e+02, percent-clipped=2.0 2023-10-09 14:03:17,375 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2778402.6666666665, ans=0.125 2023-10-09 14:03:21,565 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=22.5 2023-10-09 14:03:23,362 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2778402.6666666665, ans=0.0 2023-10-09 14:03:26,249 INFO [train.py:1031] (1/4) Epoch 14, batch 10650, loss[loss=0.1819, simple_loss=0.2537, pruned_loss=0.04003, ctc_loss=0.07523, over 16840.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2801, pruned_loss=0.06254, ctc_loss=0.1106, over 3294788.30 frames. ], batch size: 164, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 14:03:50,884 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=2778542.6666666665, ans=12.0 2023-10-09 14:03:59,682 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2023-10-09 14:04:28,566 INFO [train.py:1031] (1/4) Epoch 14, batch 10700, loss[loss=0.1944, simple_loss=0.255, pruned_loss=0.0494, ctc_loss=0.08734, over 16807.00 frames. 
], tot_loss[loss=0.2167, simple_loss=0.2738, pruned_loss=0.05897, ctc_loss=0.104, over 3295804.03 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:04:36,497 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2778682.6666666665, ans=0.0 2023-10-09 14:04:44,844 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=15.0 2023-10-09 14:04:45,403 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2778729.3333333335, ans=0.0 2023-10-09 14:04:52,615 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+02 3.058e+02 3.576e+02 4.175e+02 9.953e+02, threshold=7.153e+02, percent-clipped=1.0 2023-10-09 14:05:32,626 INFO [train.py:1031] (1/4) Epoch 14, batch 10750, loss[loss=0.3025, simple_loss=0.3403, pruned_loss=0.09709, ctc_loss=0.1762, over 16769.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2778, pruned_loss=0.06145, ctc_loss=0.1078, over 3278204.99 frames. ], batch size: 328, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:05:34,060 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2778916.0, ans=0.2 2023-10-09 14:05:34,943 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2778916.0, ans=0.0 2023-10-09 14:05:42,783 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2778916.0, ans=0.125 2023-10-09 14:05:46,548 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2778962.6666666665, ans=0.0 2023-10-09 14:05:51,978 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2778962.6666666665, ans=0.125 2023-10-09 14:06:04,590 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2779009.3333333335, ans=0.2 2023-10-09 14:06:16,049 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2779056.0, ans=0.2 2023-10-09 14:06:28,535 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2779102.6666666665, ans=0.0 2023-10-09 14:06:30,660 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2779102.6666666665, ans=0.125 2023-10-09 14:06:35,718 INFO [train.py:1031] (1/4) Epoch 14, batch 10800, loss[loss=0.2434, simple_loss=0.277, pruned_loss=0.07724, ctc_loss=0.1382, over 16795.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2775, pruned_loss=0.0631, ctc_loss=0.1109, over 3282696.79 frames. ], batch size: 329, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:06:39,046 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2779149.3333333335, ans=0.125 2023-10-09 14:06:42,757 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.93 vs. 
limit=15.0 2023-10-09 14:07:01,260 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.600e+02 3.349e+02 3.657e+02 4.515e+02 8.469e+02, threshold=7.313e+02, percent-clipped=4.0 2023-10-09 14:07:09,191 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2779242.6666666665, ans=0.2 2023-10-09 14:07:17,255 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2779289.3333333335, ans=0.09899494936611666 2023-10-09 14:07:23,849 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2779336.0, ans=0.1 2023-10-09 14:07:32,763 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2779336.0, ans=0.2 2023-10-09 14:07:36,230 INFO [train.py:1031] (1/4) Epoch 14, batch 10850, loss[loss=0.2421, simple_loss=0.2743, pruned_loss=0.07668, ctc_loss=0.1414, over 16778.00 frames. ], tot_loss[loss=0.2214, simple_loss=0.2727, pruned_loss=0.06295, ctc_loss=0.1106, over 3292330.48 frames. ], batch size: 329, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:07:48,775 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2779429.3333333335, ans=0.125 2023-10-09 14:07:51,051 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2023-10-09 14:08:13,363 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2779522.6666666665, ans=0.1 2023-10-09 14:08:36,167 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2779569.3333333335, ans=0.125 2023-10-09 14:08:38,620 INFO [train.py:1031] (1/4) Epoch 14, batch 10900, loss[loss=0.1937, simple_loss=0.2292, pruned_loss=0.05736, ctc_loss=0.1084, over 16429.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.2694, pruned_loss=0.06247, ctc_loss=0.1095, over 3304793.56 frames. ], batch size: 419, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:08:49,547 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2779662.6666666665, ans=0.125 2023-10-09 14:08:59,365 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=12.0 2023-10-09 14:09:05,441 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.210e+02 3.855e+02 4.821e+02 1.226e+03, threshold=7.710e+02, percent-clipped=2.0 2023-10-09 14:09:24,807 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2779756.0, ans=0.0 2023-10-09 14:09:31,462 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-10-09 14:09:36,893 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0 2023-10-09 14:09:39,568 INFO [train.py:1031] (1/4) Epoch 14, batch 10950, loss[loss=0.1777, simple_loss=0.2278, pruned_loss=0.04752, ctc_loss=0.08136, over 16789.00 frames. 
], tot_loss[loss=0.2161, simple_loss=0.2647, pruned_loss=0.06194, ctc_loss=0.1088, over 3295580.29 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:09:51,225 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2779896.0, ans=0.2 2023-10-09 14:09:54,968 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2779896.0, ans=0.0 2023-10-09 14:09:59,695 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2779896.0, ans=0.125 2023-10-09 14:10:01,938 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2779896.0, ans=0.2 2023-10-09 14:10:08,255 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2779942.6666666665, ans=0.125 2023-10-09 14:10:18,700 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2779989.3333333335, ans=0.125 2023-10-09 14:10:42,355 INFO [train.py:1031] (1/4) Epoch 14, batch 11000, loss[loss=0.2939, simple_loss=0.3187, pruned_loss=0.09817, ctc_loss=0.1819, over 16616.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2644, pruned_loss=0.0629, ctc_loss=0.1106, over 3289848.57 frames. ], batch size: 351, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:11:04,380 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2780129.3333333335, ans=0.125 2023-10-09 14:11:11,553 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.328e+02 3.883e+02 5.018e+02 9.874e+02, threshold=7.766e+02, percent-clipped=3.0 2023-10-09 14:11:14,999 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2780176.0, ans=0.0 2023-10-09 14:11:46,296 INFO [train.py:1031] (1/4) Epoch 14, batch 11050, loss[loss=0.2391, simple_loss=0.307, pruned_loss=0.06337, ctc_loss=0.111, over 16760.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2723, pruned_loss=0.06558, ctc_loss=0.1149, over 3297828.79 frames. ], batch size: 272, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:12:03,279 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:12:17,965 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2780409.3333333335, ans=0.2 2023-10-09 14:12:30,454 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2780456.0, ans=0.07 2023-10-09 14:12:40,733 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2780502.6666666665, ans=0.125 2023-10-09 14:12:41,929 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2780502.6666666665, ans=0.125 2023-10-09 14:12:48,941 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2780549.3333333335, ans=0.125 2023-10-09 14:12:49,758 INFO [train.py:1031] (1/4) Epoch 14, batch 11100, loss[loss=0.2083, simple_loss=0.309, pruned_loss=0.03844, ctc_loss=0.07677, over 15159.00 frames. 
], tot_loss[loss=0.2236, simple_loss=0.2766, pruned_loss=0.06316, ctc_loss=0.1109, over 3291243.49 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:12:52,415 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2780549.3333333335, ans=0.125 2023-10-09 14:13:02,712 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2780596.0, ans=0.125 2023-10-09 14:13:14,699 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2780642.6666666665, ans=0.0 2023-10-09 14:13:18,549 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.716e+02 3.611e+02 4.307e+02 5.885e+02 1.880e+03, threshold=8.614e+02, percent-clipped=7.0 2023-10-09 14:13:22,051 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2780642.6666666665, ans=0.0 2023-10-09 14:13:35,327 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2780689.3333333335, ans=0.1 2023-10-09 14:13:51,664 INFO [train.py:1031] (1/4) Epoch 14, batch 11150, loss[loss=0.2073, simple_loss=0.2465, pruned_loss=0.06067, ctc_loss=0.1165, over 15357.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2743, pruned_loss=0.06207, ctc_loss=0.1089, over 3293669.74 frames. ], batch size: 529, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:14:03,862 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2780829.3333333335, ans=0.0 2023-10-09 14:14:22,931 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2780876.0, ans=0.125 2023-10-09 14:14:44,114 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.34 vs. limit=5.0 2023-10-09 14:14:53,090 INFO [train.py:1031] (1/4) Epoch 14, batch 11200, loss[loss=0.3344, simple_loss=0.367, pruned_loss=0.1099, ctc_loss=0.205, over 16665.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2749, pruned_loss=0.06395, ctc_loss=0.1115, over 3288076.53 frames. ], batch size: 384, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:15:25,073 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+02 3.168e+02 3.497e+02 4.095e+02 1.585e+03, threshold=6.993e+02, percent-clipped=3.0 2023-10-09 14:15:35,379 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.82 vs. limit=10.0 2023-10-09 14:15:55,178 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2781249.3333333335, ans=0.1 2023-10-09 14:15:55,296 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2781249.3333333335, ans=0.1 2023-10-09 14:15:55,969 INFO [train.py:1031] (1/4) Epoch 14, batch 11250, loss[loss=0.2372, simple_loss=0.2898, pruned_loss=0.06973, ctc_loss=0.113, over 16700.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2849, pruned_loss=0.06483, ctc_loss=0.1129, over 3286442.35 frames. 
], batch size: 111, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:16:01,916 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2781249.3333333335, ans=0.2 2023-10-09 14:16:03,906 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2781249.3333333335, ans=0.0 2023-10-09 14:16:03,979 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2781249.3333333335, ans=0.1 2023-10-09 14:16:03,998 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2781249.3333333335, ans=0.0 2023-10-09 14:16:13,025 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2781296.0, ans=10.0 2023-10-09 14:16:29,136 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2781342.6666666665, ans=15.0 2023-10-09 14:17:03,085 INFO [train.py:1031] (1/4) Epoch 14, batch 11300, loss[loss=0.2171, simple_loss=0.2759, pruned_loss=0.05811, ctc_loss=0.1052, over 16716.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2899, pruned_loss=0.06297, ctc_loss=0.1102, over 3289665.32 frames. ], batch size: 111, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:17:22,594 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2781529.3333333335, ans=0.125 2023-10-09 14:17:33,914 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+02 3.091e+02 3.848e+02 4.979e+02 9.254e+02, threshold=7.696e+02, percent-clipped=6.0 2023-10-09 14:17:38,634 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2781576.0, ans=0.125 2023-10-09 14:17:39,671 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2781622.6666666665, ans=0.0 2023-10-09 14:17:40,699 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2781622.6666666665, ans=0.1 2023-10-09 14:17:46,011 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2781622.6666666665, ans=0.1 2023-10-09 14:18:03,389 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2781716.0, ans=0.125 2023-10-09 14:18:04,195 INFO [train.py:1031] (1/4) Epoch 14, batch 11350, loss[loss=0.2574, simple_loss=0.3045, pruned_loss=0.07868, ctc_loss=0.1325, over 16988.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2884, pruned_loss=0.06118, ctc_loss=0.1076, over 3302050.81 frames. 
], batch size: 86, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:18:06,038 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2781716.0, ans=0.125 2023-10-09 14:18:21,049 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2781762.6666666665, ans=0.125 2023-10-09 14:18:31,717 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2781809.3333333335, ans=0.0 2023-10-09 14:19:01,343 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2781902.6666666665, ans=0.125 2023-10-09 14:19:05,822 INFO [train.py:1031] (1/4) Epoch 14, batch 11400, loss[loss=0.2574, simple_loss=0.2895, pruned_loss=0.08458, ctc_loss=0.1403, over 11458.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2867, pruned_loss=0.06265, ctc_loss=0.1092, over 3290835.68 frames. ], batch size: 35, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:19:06,184 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2781949.3333333335, ans=0.0 2023-10-09 14:19:08,709 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.68 vs. limit=22.5 2023-10-09 14:19:11,281 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2023-10-09 14:19:23,281 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2781996.0, ans=0.0 2023-10-09 14:19:34,076 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2782042.6666666665, ans=10.0 2023-10-09 14:19:37,884 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.191e+02 3.489e+02 4.256e+02 5.952e+02, threshold=6.979e+02, percent-clipped=0.0 2023-10-09 14:19:38,260 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2782042.6666666665, ans=0.1 2023-10-09 14:20:07,767 INFO [train.py:1031] (1/4) Epoch 14, batch 11450, loss[loss=0.2235, simple_loss=0.2656, pruned_loss=0.06622, ctc_loss=0.1227, over 16966.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2841, pruned_loss=0.06365, ctc_loss=0.1109, over 3305716.43 frames. ], batch size: 215, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:20:18,763 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2782182.6666666665, ans=15.0 2023-10-09 14:21:00,441 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2023-10-09 14:21:08,872 INFO [train.py:1031] (1/4) Epoch 14, batch 11500, loss[loss=0.2649, simple_loss=0.3121, pruned_loss=0.08163, ctc_loss=0.1364, over 16729.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2852, pruned_loss=0.06547, ctc_loss=0.1138, over 3309085.18 frames. 
], batch size: 272, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:21:19,766 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2782416.0, ans=0.07 2023-10-09 14:21:28,371 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2782462.6666666665, ans=0.125 2023-10-09 14:21:34,866 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2782509.3333333335, ans=0.125 2023-10-09 14:21:44,090 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.359e+02 3.820e+02 4.365e+02 7.019e+02, threshold=7.640e+02, percent-clipped=1.0 2023-10-09 14:21:44,554 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2782509.3333333335, ans=0.125 2023-10-09 14:21:52,209 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2782556.0, ans=0.1 2023-10-09 14:21:56,161 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2782556.0, ans=0.1 2023-10-09 14:22:02,168 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2782602.6666666665, ans=0.0 2023-10-09 14:22:11,638 INFO [train.py:1031] (1/4) Epoch 14, batch 11550, loss[loss=0.2646, simple_loss=0.3148, pruned_loss=0.07805, ctc_loss=0.146, over 16607.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.289, pruned_loss=0.06767, ctc_loss=0.1179, over 3305810.98 frames. ], batch size: 351, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:22:47,882 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2782742.6666666665, ans=0.0 2023-10-09 14:22:57,786 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2782789.3333333335, ans=0.0 2023-10-09 14:22:58,980 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2782789.3333333335, ans=0.07 2023-10-09 14:23:11,177 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2782836.0, ans=0.2 2023-10-09 14:23:15,896 INFO [train.py:1031] (1/4) Epoch 14, batch 11600, loss[loss=0.2542, simple_loss=0.3328, pruned_loss=0.06369, ctc_loss=0.1205, over 16841.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2966, pruned_loss=0.0669, ctc_loss=0.1174, over 3303587.50 frames. ], batch size: 242, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:23:22,049 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0 2023-10-09 14:23:22,125 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.85 vs. 
limit=22.5 2023-10-09 14:23:26,092 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2782882.6666666665, ans=0.125 2023-10-09 14:23:33,095 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2782929.3333333335, ans=0.0 2023-10-09 14:23:45,108 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2782976.0, ans=0.125 2023-10-09 14:23:52,925 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+02 3.363e+02 4.073e+02 4.861e+02 8.872e+02, threshold=8.146e+02, percent-clipped=3.0 2023-10-09 14:24:11,390 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2023-10-09 14:24:19,981 INFO [train.py:1031] (1/4) Epoch 14, batch 11650, loss[loss=0.2042, simple_loss=0.2602, pruned_loss=0.0545, ctc_loss=0.09794, over 16769.00 frames. ], tot_loss[loss=0.2404, simple_loss=0.2991, pruned_loss=0.06723, ctc_loss=0.1182, over 3308618.66 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:24:47,015 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.90 vs. limit=15.0 2023-10-09 14:25:04,399 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2783256.0, ans=0.125 2023-10-09 14:25:12,003 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2783302.6666666665, ans=0.125 2023-10-09 14:25:14,057 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2783302.6666666665, ans=0.125 2023-10-09 14:25:20,371 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.54 vs. limit=15.0 2023-10-09 14:25:23,311 INFO [train.py:1031] (1/4) Epoch 14, batch 11700, loss[loss=0.2373, simple_loss=0.2873, pruned_loss=0.0696, ctc_loss=0.1204, over 16400.00 frames. ], tot_loss[loss=0.237, simple_loss=0.2937, pruned_loss=0.06674, ctc_loss=0.117, over 3285583.27 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:25:44,624 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2783396.0, ans=0.125 2023-10-09 14:25:55,867 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2783442.6666666665, ans=0.0 2023-10-09 14:25:58,213 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.451e+02 4.279e+02 5.142e+02 9.107e+02, threshold=8.558e+02, percent-clipped=4.0 2023-10-09 14:26:05,018 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2783489.3333333335, ans=0.125 2023-10-09 14:26:23,053 INFO [train.py:1031] (1/4) Epoch 14, batch 11750, loss[loss=0.2306, simple_loss=0.2631, pruned_loss=0.07396, ctc_loss=0.1252, over 16455.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.2869, pruned_loss=0.0664, ctc_loss=0.1162, over 3275545.50 frames. 
], batch size: 351, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:26:43,325 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2783629.3333333335, ans=0.125 2023-10-09 14:26:50,360 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2783676.0, ans=0.125 2023-10-09 14:26:50,386 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2783676.0, ans=0.1 2023-10-09 14:26:59,333 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2783722.6666666665, ans=0.05 2023-10-09 14:27:02,555 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2783722.6666666665, ans=0.1 2023-10-09 14:27:24,358 INFO [train.py:1031] (1/4) Epoch 14, batch 11800, loss[loss=0.2253, simple_loss=0.2791, pruned_loss=0.0619, ctc_loss=0.1192, over 15289.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2801, pruned_loss=0.06499, ctc_loss=0.1136, over 3276617.20 frames. ], batch size: 526, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:27:28,121 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2783816.0, ans=0.0 2023-10-09 14:27:37,287 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2783862.6666666665, ans=0.0 2023-10-09 14:27:39,035 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2783862.6666666665, ans=0.2 2023-10-09 14:28:03,167 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2023-10-09 14:28:03,481 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.479e+02 3.032e+02 3.578e+02 4.296e+02 8.317e+02, threshold=7.156e+02, percent-clipped=0.0 2023-10-09 14:28:29,815 INFO [train.py:1031] (1/4) Epoch 14, batch 11850, loss[loss=0.2655, simple_loss=0.3401, pruned_loss=0.06965, ctc_loss=0.1289, over 16865.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2856, pruned_loss=0.06446, ctc_loss=0.113, over 3269864.53 frames. ], batch size: 309, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:28:30,148 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2784049.3333333335, ans=0.1 2023-10-09 14:28:35,160 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2784049.3333333335, ans=0.0 2023-10-09 14:29:03,629 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2784142.6666666665, ans=0.125 2023-10-09 14:29:13,094 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2784189.3333333335, ans=0.125 2023-10-09 14:29:31,267 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2784236.0, ans=0.1 2023-10-09 14:29:33,114 INFO [train.py:1031] (1/4) Epoch 14, batch 11900, loss[loss=0.2125, simple_loss=0.2826, pruned_loss=0.05193, ctc_loss=0.09621, over 16870.00 frames. 
], tot_loss[loss=0.2317, simple_loss=0.2911, pruned_loss=0.06363, ctc_loss=0.1124, over 3269187.86 frames. ], batch size: 228, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:29:36,092 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2784282.6666666665, ans=0.125 2023-10-09 14:29:43,519 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2784282.6666666665, ans=0.0 2023-10-09 14:29:55,366 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2784329.3333333335, ans=0.125 2023-10-09 14:30:11,782 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2784422.6666666665, ans=15.0 2023-10-09 14:30:14,076 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.208e+02 3.772e+02 4.590e+02 1.035e+03, threshold=7.543e+02, percent-clipped=4.0 2023-10-09 14:30:31,462 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2784469.3333333335, ans=0.125 2023-10-09 14:30:36,487 INFO [train.py:1031] (1/4) Epoch 14, batch 11950, loss[loss=0.2873, simple_loss=0.3343, pruned_loss=0.08834, ctc_loss=0.1588, over 16916.00 frames. ], tot_loss[loss=0.2373, simple_loss=0.2949, pruned_loss=0.06637, ctc_loss=0.1172, over 3276904.71 frames. ], batch size: 292, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:30:38,748 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2784516.0, ans=0.5 2023-10-09 14:30:40,226 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0 2023-10-09 14:30:54,751 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2784562.6666666665, ans=0.2 2023-10-09 14:30:56,305 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2784562.6666666665, ans=0.05 2023-10-09 14:31:03,658 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2784609.3333333335, ans=0.1 2023-10-09 14:31:40,164 INFO [train.py:1031] (1/4) Epoch 14, batch 12000, loss[loss=0.2431, simple_loss=0.3052, pruned_loss=0.06848, ctc_loss=0.1103, over 16828.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.2967, pruned_loss=0.06609, ctc_loss=0.1169, over 3273335.79 frames. ], batch size: 242, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:31:40,164 INFO [train.py:1054] (1/4) Computing validation loss 2023-10-09 14:31:46,474 INFO [zipformer.py:1853] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3414, 2.3903, 2.9083, 3.1134, 3.3594, 3.0107, 2.8045, 2.3403], device='cuda:1') 2023-10-09 14:31:54,599 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2358, simple_loss=0.3055, pruned_loss=0.064, ctc_loss=0.09509, over 1796401.00 frames. 
2023-10-09 14:31:54,600 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14563MB 2023-10-09 14:32:05,279 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2784749.3333333335, ans=0.1 2023-10-09 14:32:06,505 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2784796.0, ans=0.0 2023-10-09 14:32:23,647 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2784842.6666666665, ans=0.125 2023-10-09 14:32:24,175 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.23 vs. limit=15.0 2023-10-09 14:32:28,448 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2784842.6666666665, ans=0.125 2023-10-09 14:32:28,824 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=2784842.6666666665, ans=22.5 2023-10-09 14:32:36,798 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+02 3.474e+02 4.193e+02 5.077e+02 1.283e+03, threshold=8.386e+02, percent-clipped=9.0 2023-10-09 14:32:39,460 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=2784889.3333333335, ans=0.1 2023-10-09 14:32:54,613 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2784936.0, ans=0.09899494936611666 2023-10-09 14:33:00,837 INFO [train.py:1031] (1/4) Epoch 14, batch 12050, loss[loss=0.3063, simple_loss=0.3617, pruned_loss=0.09353, ctc_loss=0.1595, over 16597.00 frames. ], tot_loss[loss=0.2406, simple_loss=0.3017, pruned_loss=0.06647, ctc_loss=0.1163, over 3261024.75 frames. ], batch size: 351, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:33:29,262 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2785076.0, ans=0.125 2023-10-09 14:34:03,701 INFO [train.py:1031] (1/4) Epoch 14, batch 12100, loss[loss=0.2047, simple_loss=0.255, pruned_loss=0.05722, ctc_loss=0.1, over 16731.00 frames. ], tot_loss[loss=0.2396, simple_loss=0.2986, pruned_loss=0.06692, ctc_loss=0.1169, over 3271579.28 frames. ], batch size: 111, lr: 2.57e-03, grad_scale: 8.0 2023-10-09 14:34:04,020 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2785216.0, ans=0.125 2023-10-09 14:34:09,753 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2785216.0, ans=0.125 2023-10-09 14:34:21,084 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2785262.6666666665, ans=0.95 2023-10-09 14:34:24,903 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2785262.6666666665, ans=0.125 2023-10-09 14:34:32,946 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.66 vs. 
limit=22.5 2023-10-09 14:34:37,504 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2785309.3333333335, ans=0.125 2023-10-09 14:34:45,439 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.634e+02 3.559e+02 4.280e+02 5.187e+02 9.097e+02, threshold=8.560e+02, percent-clipped=2.0 2023-10-09 14:34:56,483 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=15.0 2023-10-09 14:35:02,727 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.55 vs. limit=10.0 2023-10-09 14:35:06,630 INFO [train.py:1031] (1/4) Epoch 14, batch 12150, loss[loss=0.3099, simple_loss=0.3792, pruned_loss=0.08677, ctc_loss=0.1673, over 16750.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2968, pruned_loss=0.0669, ctc_loss=0.117, over 3290908.03 frames. ], batch size: 271, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:35:21,743 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2785496.0, ans=0.125 2023-10-09 14:35:32,853 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2785542.6666666665, ans=0.125 2023-10-09 14:35:35,111 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:35:37,345 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2785542.6666666665, ans=0.125 2023-10-09 14:35:54,655 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2785589.3333333335, ans=0.0 2023-10-09 14:35:54,695 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2785589.3333333335, ans=0.1 2023-10-09 14:36:09,748 INFO [train.py:1031] (1/4) Epoch 14, batch 12200, loss[loss=0.2138, simple_loss=0.2817, pruned_loss=0.05389, ctc_loss=0.09525, over 16829.00 frames. ], tot_loss[loss=0.2447, simple_loss=0.3064, pruned_loss=0.06746, ctc_loss=0.1202, over 3281321.20 frames. 
], batch size: 176, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:36:29,448 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:36:44,639 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=2785776.0, ans=0.02 2023-10-09 14:36:45,661 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:36:47,732 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2785822.6666666665, ans=0.125 2023-10-09 14:36:52,409 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.614e+02 4.622e+02 6.130e+02 1.347e+03, threshold=9.244e+02, percent-clipped=11.0 2023-10-09 14:36:52,796 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2785822.6666666665, ans=0.125 2023-10-09 14:36:59,703 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2785869.3333333335, ans=0.125 2023-10-09 14:37:08,466 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2785869.3333333335, ans=0.125 2023-10-09 14:37:12,094 INFO [train.py:1031] (1/4) Epoch 14, batch 12250, loss[loss=0.2555, simple_loss=0.2928, pruned_loss=0.08013, ctc_loss=0.1446, over 16327.00 frames. ], tot_loss[loss=0.2398, simple_loss=0.3016, pruned_loss=0.06551, ctc_loss=0.1172, over 3284500.51 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:37:27,981 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2785962.6666666665, ans=0.1 2023-10-09 14:37:30,083 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2785962.6666666665, ans=0.125 2023-10-09 14:37:42,547 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.97 vs. limit=10.0 2023-10-09 14:37:58,825 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2786102.6666666665, ans=0.0 2023-10-09 14:38:03,791 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2786102.6666666665, ans=0.125 2023-10-09 14:38:12,143 INFO [train.py:1031] (1/4) Epoch 14, batch 12300, loss[loss=0.1874, simple_loss=0.2415, pruned_loss=0.04971, ctc_loss=0.08484, over 16750.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2916, pruned_loss=0.06476, ctc_loss=0.1156, over 3289741.29 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:38:50,556 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.77 vs. limit=15.0 2023-10-09 14:38:55,170 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+02 3.070e+02 3.744e+02 4.897e+02 1.313e+03, threshold=7.488e+02, percent-clipped=1.0 2023-10-09 14:39:13,379 INFO [train.py:1031] (1/4) Epoch 14, batch 12350, loss[loss=0.2372, simple_loss=0.3108, pruned_loss=0.05978, ctc_loss=0.1101, over 16884.00 frames. ], tot_loss[loss=0.2347, simple_loss=0.2935, pruned_loss=0.06491, ctc_loss=0.1156, over 3290748.96 frames. 
], batch size: 243, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:39:17,424 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2786382.6666666665, ans=0.125 2023-10-09 14:39:36,948 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2786476.0, ans=0.125 2023-10-09 14:39:55,138 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2023-10-09 14:40:06,092 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2786569.3333333335, ans=0.125 2023-10-09 14:40:14,934 INFO [train.py:1031] (1/4) Epoch 14, batch 12400, loss[loss=0.2014, simple_loss=0.266, pruned_loss=0.04991, ctc_loss=0.09237, over 16656.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2904, pruned_loss=0.0626, ctc_loss=0.1121, over 3289801.80 frames. ], batch size: 140, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:40:21,077 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2786616.0, ans=10.0 2023-10-09 14:40:51,431 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2786756.0, ans=0.0 2023-10-09 14:41:00,375 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.233e+02 3.612e+02 4.098e+02 6.929e+02, threshold=7.223e+02, percent-clipped=0.0 2023-10-09 14:41:02,092 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.27 vs. limit=15.0 2023-10-09 14:41:07,619 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2786802.6666666665, ans=0.0 2023-10-09 14:41:09,559 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2786802.6666666665, ans=0.1 2023-10-09 14:41:17,637 INFO [train.py:1031] (1/4) Epoch 14, batch 12450, loss[loss=0.2892, simple_loss=0.336, pruned_loss=0.09, ctc_loss=0.1559, over 16577.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2887, pruned_loss=0.06161, ctc_loss=0.1104, over 3297372.07 frames. ], batch size: 351, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:41:25,206 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0 2023-10-09 14:41:25,241 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.66 vs. limit=6.0 2023-10-09 14:42:00,459 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:42:10,708 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2787036.0, ans=0.125 2023-10-09 14:42:16,514 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2787036.0, ans=0.0 2023-10-09 14:42:19,504 INFO [train.py:1031] (1/4) Epoch 14, batch 12500, loss[loss=0.2002, simple_loss=0.256, pruned_loss=0.05419, ctc_loss=0.08997, over 16767.00 frames. 
], tot_loss[loss=0.226, simple_loss=0.2877, pruned_loss=0.06052, ctc_loss=0.1081, over 3293565.93 frames. ], batch size: 95, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:42:27,887 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2787082.6666666665, ans=0.0 2023-10-09 14:42:37,497 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2787129.3333333335, ans=0.1 2023-10-09 14:42:49,332 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2787176.0, ans=0.0 2023-10-09 14:43:06,660 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2787222.6666666665, ans=0.1 2023-10-09 14:43:07,999 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+02 2.957e+02 3.304e+02 4.556e+02 8.176e+02, threshold=6.608e+02, percent-clipped=1.0 2023-10-09 14:43:09,313 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:43:09,358 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2787269.3333333335, ans=0.125 2023-10-09 14:43:10,478 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2787269.3333333335, ans=0.1 2023-10-09 14:43:11,396 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2787269.3333333335, ans=0.125 2023-10-09 14:43:13,193 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2787269.3333333335, ans=0.0 2023-10-09 14:43:22,095 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2787316.0, ans=0.2 2023-10-09 14:43:23,456 INFO [train.py:1031] (1/4) Epoch 14, batch 12550, loss[loss=0.1725, simple_loss=0.2573, pruned_loss=0.03173, ctc_loss=0.06079, over 16858.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.2861, pruned_loss=0.05847, ctc_loss=0.1049, over 3301285.96 frames. ], batch size: 215, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:43:23,773 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2787316.0, ans=0.0 2023-10-09 14:43:53,844 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2787409.3333333335, ans=0.125 2023-10-09 14:44:06,714 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2787456.0, ans=0.125 2023-10-09 14:44:09,301 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-10-09 14:44:16,396 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2787502.6666666665, ans=0.2 2023-10-09 14:44:23,450 INFO [train.py:1031] (1/4) Epoch 14, batch 12600, loss[loss=0.2009, simple_loss=0.2609, pruned_loss=0.05207, ctc_loss=0.09213, over 16902.00 frames. ], tot_loss[loss=0.2161, simple_loss=0.282, pruned_loss=0.05521, ctc_loss=0.0996, over 3299267.67 frames. 
], batch size: 215, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:44:26,131 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2787549.3333333335, ans=0.025 2023-10-09 14:44:35,083 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2787596.0, ans=0.125 2023-10-09 14:44:35,496 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=22.5 2023-10-09 14:44:39,823 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2787596.0, ans=0.125 2023-10-09 14:44:41,442 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2787596.0, ans=0.1 2023-10-09 14:45:01,813 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2787689.3333333335, ans=0.125 2023-10-09 14:45:11,285 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 3.144e+02 3.491e+02 4.128e+02 9.398e+02, threshold=6.982e+02, percent-clipped=1.0 2023-10-09 14:45:24,788 INFO [train.py:1031] (1/4) Epoch 14, batch 12650, loss[loss=0.237, simple_loss=0.2758, pruned_loss=0.07409, ctc_loss=0.125, over 16599.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2819, pruned_loss=0.05835, ctc_loss=0.1044, over 3295151.60 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:45:33,726 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:45:45,663 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2787829.3333333335, ans=0.1 2023-10-09 14:45:57,307 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2787876.0, ans=0.125 2023-10-09 14:45:59,231 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2787876.0, ans=0.125 2023-10-09 14:46:08,850 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.84 vs. limit=15.0 2023-10-09 14:46:17,590 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2787969.3333333335, ans=0.0 2023-10-09 14:46:26,368 INFO [train.py:1031] (1/4) Epoch 14, batch 12700, loss[loss=0.1948, simple_loss=0.2435, pruned_loss=0.05499, ctc_loss=0.09019, over 16785.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2781, pruned_loss=0.06, ctc_loss=0.1066, over 3298168.49 frames. 
], batch size: 176, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:46:32,416 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2788016.0, ans=0.125 2023-10-09 14:46:40,810 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2788062.6666666665, ans=0.035 2023-10-09 14:46:46,263 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2788062.6666666665, ans=0.125 2023-10-09 14:46:46,290 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2788062.6666666665, ans=0.0 2023-10-09 14:46:50,649 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.89 vs. limit=12.0 2023-10-09 14:46:58,471 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2788109.3333333335, ans=0.0 2023-10-09 14:46:58,608 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2788109.3333333335, ans=0.125 2023-10-09 14:47:08,943 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2788156.0, ans=0.125 2023-10-09 14:47:14,463 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2788202.6666666665, ans=0.95 2023-10-09 14:47:15,779 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.438e+02 3.456e+02 3.994e+02 4.833e+02 1.526e+03, threshold=7.989e+02, percent-clipped=4.0 2023-10-09 14:47:27,069 INFO [train.py:1031] (1/4) Epoch 14, batch 12750, loss[loss=0.3456, simple_loss=0.355, pruned_loss=0.1225, ctc_loss=0.2277, over 16678.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2782, pruned_loss=0.06247, ctc_loss=0.1106, over 3298678.66 frames. ], batch size: 384, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:48:03,538 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2788389.3333333335, ans=0.125 2023-10-09 14:48:08,062 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0 2023-10-09 14:48:15,860 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2788436.0, ans=0.035 2023-10-09 14:48:20,073 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=22.5 2023-10-09 14:48:24,015 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2788436.0, ans=0.125 2023-10-09 14:48:29,398 INFO [train.py:1031] (1/4) Epoch 14, batch 12800, loss[loss=0.2034, simple_loss=0.2609, pruned_loss=0.05439, ctc_loss=0.09252, over 16781.00 frames. ], tot_loss[loss=0.231, simple_loss=0.287, pruned_loss=0.06448, ctc_loss=0.1148, over 3297272.29 frames. 
], batch size: 130, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:48:29,700 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2788482.6666666665, ans=0.0 2023-10-09 14:48:40,831 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2023-10-09 14:48:43,124 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2788529.3333333335, ans=0.125 2023-10-09 14:48:43,161 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2788529.3333333335, ans=0.1 2023-10-09 14:48:44,759 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2788529.3333333335, ans=0.0 2023-10-09 14:49:16,329 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2788622.6666666665, ans=0.1 2023-10-09 14:49:18,039 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.556e+02 3.551e+02 3.934e+02 4.932e+02 8.018e+02, threshold=7.868e+02, percent-clipped=1.0 2023-10-09 14:49:30,705 INFO [train.py:1031] (1/4) Epoch 14, batch 12850, loss[loss=0.2568, simple_loss=0.3115, pruned_loss=0.07521, ctc_loss=0.1292, over 15209.00 frames. ], tot_loss[loss=0.2357, simple_loss=0.2925, pruned_loss=0.06611, ctc_loss=0.1167, over 3297509.30 frames. ], batch size: 526, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:49:42,970 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2788762.6666666665, ans=0.125 2023-10-09 14:49:46,586 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=22.5 2023-10-09 14:49:48,624 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.76 vs. limit=15.0 2023-10-09 14:50:05,680 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2788809.3333333335, ans=0.1 2023-10-09 14:50:11,861 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2788856.0, ans=0.125 2023-10-09 14:50:16,512 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0 2023-10-09 14:50:17,658 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.66 vs. limit=22.5 2023-10-09 14:50:32,944 INFO [train.py:1031] (1/4) Epoch 14, batch 12900, loss[loss=0.2715, simple_loss=0.3583, pruned_loss=0.06656, ctc_loss=0.1289, over 16175.00 frames. ], tot_loss[loss=0.2423, simple_loss=0.298, pruned_loss=0.06896, ctc_loss=0.1216, over 3299361.02 frames. 
], batch size: 463, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:50:46,342 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2788996.0, ans=0.125 2023-10-09 14:51:02,841 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.37 vs. limit=22.5 2023-10-09 14:51:06,325 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2023-10-09 14:51:07,732 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2789042.6666666665, ans=0.125 2023-10-09 14:51:13,058 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2789089.3333333335, ans=0.0 2023-10-09 14:51:20,293 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2789089.3333333335, ans=0.2 2023-10-09 14:51:26,858 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+02 3.447e+02 3.800e+02 4.409e+02 9.438e+02, threshold=7.600e+02, percent-clipped=3.0 2023-10-09 14:51:29,343 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2789136.0, ans=0.1 2023-10-09 14:51:35,867 INFO [train.py:1031] (1/4) Epoch 14, batch 12950, loss[loss=0.1771, simple_loss=0.2458, pruned_loss=0.03985, ctc_loss=0.0719, over 16789.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2973, pruned_loss=0.06544, ctc_loss=0.1163, over 3302725.30 frames. ], batch size: 95, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:51:51,479 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2789229.3333333335, ans=0.04949747468305833 2023-10-09 14:51:55,098 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2023-10-09 14:51:58,086 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.21 vs. limit=5.0 2023-10-09 14:52:00,260 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2789276.0, ans=0.1 2023-10-09 14:52:03,321 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2789276.0, ans=0.125 2023-10-09 14:52:12,968 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2789322.6666666665, ans=0.0 2023-10-09 14:52:14,654 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2789322.6666666665, ans=0.1 2023-10-09 14:52:36,320 INFO [train.py:1031] (1/4) Epoch 14, batch 13000, loss[loss=0.2045, simple_loss=0.2577, pruned_loss=0.05631, ctc_loss=0.09663, over 16862.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2888, pruned_loss=0.06206, ctc_loss=0.1104, over 3297013.58 frames. ], batch size: 90, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:52:44,539 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.46 vs. 
limit=12.0 2023-10-09 14:52:57,129 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2789462.6666666665, ans=0.125 2023-10-09 14:53:03,852 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2789509.3333333335, ans=0.125 2023-10-09 14:53:17,715 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2789556.0, ans=0.2 2023-10-09 14:53:28,013 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.825e+02 3.282e+02 3.972e+02 1.143e+03, threshold=6.563e+02, percent-clipped=1.0 2023-10-09 14:53:36,444 INFO [train.py:1031] (1/4) Epoch 14, batch 13050, loss[loss=0.2142, simple_loss=0.2814, pruned_loss=0.05563, ctc_loss=0.08923, over 16919.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2815, pruned_loss=0.06149, ctc_loss=0.1089, over 3297100.63 frames. ], batch size: 90, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:53:36,760 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2789649.3333333335, ans=0.125 2023-10-09 14:53:47,985 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2789696.0, ans=0.1 2023-10-09 14:54:20,962 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2789789.3333333335, ans=0.125 2023-10-09 14:54:22,506 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.93 vs. limit=5.0 2023-10-09 14:54:27,899 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2789836.0, ans=0.125 2023-10-09 14:54:29,052 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2789836.0, ans=0.125 2023-10-09 14:54:37,341 INFO [train.py:1031] (1/4) Epoch 14, batch 13100, loss[loss=0.214, simple_loss=0.2745, pruned_loss=0.05908, ctc_loss=0.08855, over 16761.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2809, pruned_loss=0.06318, ctc_loss=0.1112, over 3295207.40 frames. 
2023-10-09 14:55:04,298 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2789976.0, ans=0.2
2023-10-09 14:55:24,342 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2790022.6666666665, ans=0.2
2023-10-09 14:55:31,979 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2790069.3333333335, ans=0.2
2023-10-09 14:55:32,623 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+02 3.251e+02 4.048e+02 5.157e+02 1.010e+03, threshold=8.097e+02, percent-clipped=11.0
2023-10-09 14:55:40,680 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2790116.0, ans=0.1
2023-10-09 14:55:40,707 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2790116.0, ans=0.2
2023-10-09 14:55:42,103 INFO [train.py:1031] (1/4) Epoch 14, batch 13150, loss[loss=0.2669, simple_loss=0.315, pruned_loss=0.08238, ctc_loss=0.1352, over 16620.00 frames. ], tot_loss[loss=0.2368, simple_loss=0.2948, pruned_loss=0.0659, ctc_loss=0.1173, over 3302007.67 frames. ], batch size: 111, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:55:57,961 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=22.5
2023-10-09 14:56:21,495 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2790256.0, ans=0.1
2023-10-09 14:56:28,893 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2790256.0, ans=0.1
2023-10-09 14:56:30,154 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.71 vs. limit=6.0
2023-10-09 14:56:36,167 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=22.5
2023-10-09 14:56:43,170 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=22.5
2023-10-09 14:56:45,813 INFO [train.py:1031] (1/4) Epoch 14, batch 13200, loss[loss=0.2498, simple_loss=0.3206, pruned_loss=0.06604, ctc_loss=0.1171, over 16907.00 frames. ], tot_loss[loss=0.2441, simple_loss=0.3011, pruned_loss=0.06902, ctc_loss=0.1228, over 3302050.96 frames. ], batch size: 258, lr: 2.57e-03, grad_scale: 8.0
2023-10-09 14:57:00,196 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2790396.0, ans=0.0
2023-10-09 14:57:00,524 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2790396.0, ans=15.0
2023-10-09 14:57:01,197 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2790396.0, ans=0.125
2023-10-09 14:57:07,478 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=22.5
2023-10-09 14:57:20,921 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2790442.6666666665, ans=0.125
2023-10-09 14:57:29,543 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2790489.3333333335, ans=0.125
2023-10-09 14:57:29,820 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0
2023-10-09 14:57:41,600 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+02 3.287e+02 3.760e+02 4.559e+02 7.411e+02, threshold=7.519e+02, percent-clipped=0.0
2023-10-09 14:57:48,113 INFO [train.py:1031] (1/4) Epoch 14, batch 13250, loss[loss=0.2072, simple_loss=0.2804, pruned_loss=0.04943, ctc_loss=0.08769, over 16678.00 frames. ], tot_loss[loss=0.2416, simple_loss=0.301, pruned_loss=0.06722, ctc_loss=0.1194, over 3294373.58 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:58:00,991 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2790629.3333333335, ans=0.0
2023-10-09 14:58:12,362 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0
2023-10-09 14:58:33,596 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2790722.6666666665, ans=0.125
2023-10-09 14:58:34,009 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=22.5
2023-10-09 14:58:49,203 INFO [train.py:1031] (1/4) Epoch 14, batch 13300, loss[loss=0.2183, simple_loss=0.307, pruned_loss=0.04593, ctc_loss=0.09425, over 15194.00 frames. ], tot_loss[loss=0.2359, simple_loss=0.2928, pruned_loss=0.06604, ctc_loss=0.1171, over 3290123.09 frames. ], batch size: 526, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:59:02,606 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2790862.6666666665, ans=0.125
2023-10-09 14:59:10,672 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2790862.6666666665, ans=0.125
2023-10-09 14:59:17,720 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2790909.3333333335, ans=0.0
2023-10-09 14:59:39,958 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2791002.6666666665, ans=0.0
2023-10-09 14:59:43,204 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2791002.6666666665, ans=0.125
2023-10-09 14:59:47,826 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.520e+02 3.361e+02 3.791e+02 4.833e+02 1.183e+03, threshold=7.583e+02, percent-clipped=5.0
2023-10-09 14:59:52,779 INFO [train.py:1031] (1/4) Epoch 14, batch 13350, loss[loss=0.2001, simple_loss=0.2476, pruned_loss=0.05684, ctc_loss=0.09726, over 16885.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.2921, pruned_loss=0.06438, ctc_loss=0.1145, over 3295884.02 frames. ], batch size: 90, lr: 2.57e-03, grad_scale: 1.0
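Every [scaling.py:199] ScheduledFloat record prints a regularization hyperparameter (a dropout rate, balancer probability, skip rate, or scale floor) as a function of batch_count: these values follow piecewise-linear schedules so regularization can be strong early in training and relax later. Below is a minimal sketch of the idea; the breakpoints are illustrative, and by batch_count around 2.79e6 most schedules in this log have long since settled at their final value, which is why the printed ans barely changes from record to record.

class ScheduledFloat:
    """A float-valued hyperparameter that interpolates linearly between
    (batch_count, value) breakpoints and is constant outside their range.
    A simplified sketch of the mechanism behind the ScheduledFloat records."""

    def __init__(self, *points):
        self.points = sorted(points)
        self.batch_count = 0.0

    def __float__(self):
        pts = self.points
        if self.batch_count <= pts[0][0]:
            return float(pts[0][1])
        if self.batch_count >= pts[-1][0]:
            return float(pts[-1][1])
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return float(y0 + t * (y1 - y0))

# e.g. a dropout rate annealing from 0.3 to 0.1 over the first 20k batches:
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
dropout_p.batch_count = 2790116.0
print(float(dropout_p))   # 0.1 -- far past the last breakpoint, hence flat ans=0.1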
2023-10-09 15:00:08,815 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2791096.0, ans=0.125
2023-10-09 15:00:15,044 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2791096.0, ans=0.125
2023-10-09 15:00:17,257 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2791142.6666666665, ans=0.125
2023-10-09 15:00:21,852 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=22.5
2023-10-09 15:00:22,656 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2791142.6666666665, ans=0.2
2023-10-09 15:00:41,310 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0
2023-10-09 15:00:47,456 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2791236.0, ans=0.125
2023-10-09 15:00:54,416 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2791282.6666666665, ans=0.125
2023-10-09 15:00:55,813 INFO [train.py:1031] (1/4) Epoch 14, batch 13400, loss[loss=0.2591, simple_loss=0.3054, pruned_loss=0.07941, ctc_loss=0.1352, over 16802.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2924, pruned_loss=0.06457, ctc_loss=0.1133, over 3301301.98 frames. ], batch size: 328, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:00:57,769 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2791282.6666666665, ans=0.2
2023-10-09 15:01:55,324 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+02 3.467e+02 4.135e+02 5.221e+02 9.023e+02, threshold=8.270e+02, percent-clipped=2.0
2023-10-09 15:01:55,635 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2791469.3333333335, ans=0.125
2023-10-09 15:01:57,438 INFO [train.py:1031] (1/4) Epoch 14, batch 13450, loss[loss=0.2065, simple_loss=0.2529, pruned_loss=0.06027, ctc_loss=0.09873, over 16752.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2873, pruned_loss=0.06475, ctc_loss=0.113, over 3287813.54 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 1.0
2023-10-09 15:02:01,627 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0
2023-10-09 15:02:02,374 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2791516.0, ans=0.2
2023-10-09 15:02:03,414 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2791516.0, ans=0.95
2023-10-09 15:02:13,500 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2791562.6666666665, ans=0.0
2023-10-09 15:02:16,146 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2791562.6666666665, ans=0.125
2023-10-09 15:02:27,713 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2791609.3333333335, ans=0.05
2023-10-09 15:02:41,040 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2791656.0, ans=0.125
2023-10-09 15:02:46,422 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0
2023-10-09 15:02:56,579 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2791702.6666666665, ans=0.0
2023-10-09 15:02:59,462 INFO [train.py:1031] (1/4) Epoch 14, batch 13500, loss[loss=0.1812, simple_loss=0.2733, pruned_loss=0.03152, ctc_loss=0.06525, over 15213.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2816, pruned_loss=0.0623, ctc_loss=0.1089, over 3293006.99 frames. ], batch size: 529, lr: 2.57e-03, grad_scale: 1.0
2023-10-09 15:03:08,952 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2791749.3333333335, ans=0.2
2023-10-09 15:03:10,333 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.77 vs. limit=15.0
2023-10-09 15:03:11,082 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2791796.0, ans=0.0
2023-10-09 15:03:25,613 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2791842.6666666665, ans=0.1
2023-10-09 15:03:36,221 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2791889.3333333335, ans=0.1
2023-10-09 15:03:49,556 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=22.5
2023-10-09 15:03:52,515 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2791936.0, ans=0.125
2023-10-09 15:03:57,256 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2791936.0, ans=0.09899494936611666
2023-10-09 15:04:01,768 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.985e+02 3.501e+02 4.657e+02 8.617e+02, threshold=7.002e+02, percent-clipped=1.0
2023-10-09 15:04:01,795 INFO [train.py:1031] (1/4) Epoch 14, batch 13550, loss[loss=0.2343, simple_loss=0.2957, pruned_loss=0.06382, ctc_loss=0.1132, over 16843.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2825, pruned_loss=0.06206, ctc_loss=0.1087, over 3291784.87 frames. ], batch size: 228, lr: 2.57e-03, grad_scale: 0.5
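The [scaling.py:979] Whitening records each compare a metric against a limit. The metric measures how anisotropic a module's output covariance is: it is 1.0 when all covariance eigenvalues are equal ("white" features) and grows as energy concentrates in a few directions; a whitening penalty only activates once the metric exceeds the limit. The sketch below computes one such eigenvalue-spread metric; it is a plausible reconstruction of the printed quantity, not necessarily the exact formula used here.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Eigenvalue-spread metric of the feature covariance: 1.0 for perfectly
    white features, larger when a few directions dominate. A hedged
    reconstruction of the value printed as 'metric=... vs. limit=...'."""
    x = x.reshape(-1, x.shape[-1])                     # (frames, channels)
    num_channels = x.shape[1]
    x = x.reshape(-1, num_groups, num_channels // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / x.shape[1]           # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)                  # symmetric -> real eigenvalues
    metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2
    return metric.mean().item()

x = torch.randn(1000, 512)     # near-white input
print(whitening_metric(x))     # ~1.0; a logged metric=11.08 vs. limit=15.0 is within budget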
2023-10-09 15:04:11,226 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0
2023-10-09 15:04:36,315 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2792076.0, ans=0.0
2023-10-09 15:04:45,677 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2792122.6666666665, ans=0.1
2023-10-09 15:05:04,603 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2792216.0, ans=0.125
2023-10-09 15:05:05,325 INFO [train.py:1031] (1/4) Epoch 14, batch 13600, loss[loss=0.2363, simple_loss=0.3128, pruned_loss=0.05877, ctc_loss=0.1055, over 16846.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2876, pruned_loss=0.06472, ctc_loss=0.1136, over 3286976.52 frames. ], batch size: 215, lr: 2.57e-03, grad_scale: 1.0
2023-10-09 15:05:07,808 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=15.0
2023-10-09 15:06:08,501 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 3.317e+02 4.234e+02 5.653e+02 1.556e+03, threshold=8.468e+02, percent-clipped=11.0
2023-10-09 15:06:08,528 INFO [train.py:1031] (1/4) Epoch 14, batch 13650, loss[loss=0.2407, simple_loss=0.2997, pruned_loss=0.06693, ctc_loss=0.1196, over 16733.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2896, pruned_loss=0.06194, ctc_loss=0.1096, over 3289559.11 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 1.0
2023-10-09 15:06:35,107 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0
2023-10-09 15:06:56,722 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2792589.3333333335, ans=0.0
2023-10-09 15:07:07,876 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2792636.0, ans=0.125
2023-10-09 15:07:11,262 INFO [train.py:1031] (1/4) Epoch 14, batch 13700, loss[loss=0.2522, simple_loss=0.3281, pruned_loss=0.06594, ctc_loss=0.1108, over 16869.00 frames. ], tot_loss[loss=0.2298, simple_loss=0.2931, pruned_loss=0.06148, ctc_loss=0.1091, over 3281335.48 frames. ], batch size: 309, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:07:21,197 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2792682.6666666665, ans=0.2
2023-10-09 15:07:41,442 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.21 vs. limit=22.5
2023-10-09 15:07:52,252 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2792822.6666666665, ans=0.05
2023-10-09 15:08:00,109 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2792822.6666666665, ans=0.09899494936611666
2023-10-09 15:08:15,041 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 3.067e+02 3.765e+02 4.506e+02 1.005e+03, threshold=7.530e+02, percent-clipped=2.0
2023-10-09 15:08:15,068 INFO [train.py:1031] (1/4) Epoch 14, batch 13750, loss[loss=0.214, simple_loss=0.3027, pruned_loss=0.04507, ctc_loss=0.08814, over 16858.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2922, pruned_loss=0.05946, ctc_loss=0.1062, over 3285720.43 frames. ], batch size: 228, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:08:16,505 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2792916.0, ans=0.125
2023-10-09 15:08:16,802 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=22.5
2023-10-09 15:08:27,549 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2792962.6666666665, ans=0.0
2023-10-09 15:08:34,518 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2792962.6666666665, ans=0.0
2023-10-09 15:08:48,597 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2793009.3333333335, ans=0.1
2023-10-09 15:09:06,973 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2793102.6666666665, ans=0.0
2023-10-09 15:09:17,790 INFO [train.py:1031] (1/4) Epoch 14, batch 13800, loss[loss=0.2234, simple_loss=0.2763, pruned_loss=0.06321, ctc_loss=0.1103, over 16709.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.2956, pruned_loss=0.06282, ctc_loss=0.1114, over 3291022.13 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:09:20,826 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2793149.3333333335, ans=0.0
2023-10-09 15:09:23,134 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2793149.3333333335, ans=0.125
2023-10-09 15:09:27,921 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0
2023-10-09 15:09:36,468 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2793196.0, ans=0.125
2023-10-09 15:09:40,778 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2793196.0, ans=0.125
2023-10-09 15:09:51,210 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2793242.6666666665, ans=0.125
2023-10-09 15:10:09,997 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.05 vs. limit=15.0
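The grad_scale value at the end of each batch record drifts between 0.5 and 8.0 over this stretch of the log. That is the signature of mixed-precision training with dynamic loss scaling: gradients are computed on a scaled loss to keep fp16 values away from underflow, the scale is cut back (for example halved) whenever inf/nan gradients force a skipped step, and it is grown again while steps stay finite. A sketch using PyTorch's stock GradScaler follows; the init/growth/backoff settings below are illustrative defaults rather than the values behind this particular log.

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, criterion, inputs, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # fp16 forward pass
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()             # backward on the scaled loss
    scaler.step(optimizer)                    # unscales grads; skips step on inf/nan
    scaler.update()                           # backoff on overflow, growth otherwise
    return loss.detach(), scaler.get_scale()  # get_scale() is what 'grad_scale' reports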
2023-10-09 15:10:21,759 INFO [train.py:1031] (1/4) Epoch 14, batch 13850, loss[loss=0.2167, simple_loss=0.2335, pruned_loss=0.07117, ctc_loss=0.1441, over 15432.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2892, pruned_loss=0.06324, ctc_loss=0.1119, over 3295124.74 frames. ], batch size: 526, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:10:22,840 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.613e+02 3.299e+02 3.702e+02 4.236e+02 7.153e+02, threshold=7.404e+02, percent-clipped=0.0
2023-10-09 15:10:39,891 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0
2023-10-09 15:10:47,470 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2793476.0, ans=0.1
2023-10-09 15:10:54,032 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2793476.0, ans=10.0
2023-10-09 15:11:00,384 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0
2023-10-09 15:11:16,793 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2793569.3333333335, ans=0.05
2023-10-09 15:11:18,307 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.48 vs. limit=22.5
2023-10-09 15:11:19,955 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.37 vs. limit=22.5
2023-10-09 15:11:25,340 INFO [train.py:1031] (1/4) Epoch 14, batch 13900, loss[loss=0.2329, simple_loss=0.3055, pruned_loss=0.05838, ctc_loss=0.1086, over 16830.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2853, pruned_loss=0.06302, ctc_loss=0.1114, over 3297154.02 frames. ], batch size: 202, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:11:35,836 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2793616.0, ans=0.5
2023-10-09 15:11:41,196 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.36 vs. limit=12.0
2023-10-09 15:11:51,426 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2793709.3333333335, ans=0.0
2023-10-09 15:12:10,122 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2793756.0, ans=0.1
2023-10-09 15:12:28,096 INFO [train.py:1031] (1/4) Epoch 14, batch 13950, loss[loss=0.2441, simple_loss=0.3175, pruned_loss=0.06289, ctc_loss=0.1124, over 16238.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2938, pruned_loss=0.06433, ctc_loss=0.1138, over 3291008.65 frames. ], batch size: 463, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:12:30,204 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.564e+02 3.293e+02 3.736e+02 4.752e+02 8.901e+02, threshold=7.472e+02, percent-clipped=3.0
2023-10-09 15:12:40,395 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2793896.0, ans=0.0
2023-10-09 15:12:48,950 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5
2023-10-09 15:13:19,319 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2794036.0, ans=0.125
2023-10-09 15:13:24,174 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0
2023-10-09 15:13:27,795 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2794036.0, ans=0.0
2023-10-09 15:13:27,807 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2794036.0, ans=0.1
2023-10-09 15:13:31,758 INFO [train.py:1031] (1/4) Epoch 14, batch 14000, loss[loss=0.2652, simple_loss=0.326, pruned_loss=0.07552, ctc_loss=0.1337, over 16847.00 frames. ], tot_loss[loss=0.239, simple_loss=0.2981, pruned_loss=0.06646, ctc_loss=0.1173, over 3290690.23 frames. ], batch size: 329, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:13:34,129 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0
2023-10-09 15:13:46,368 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:14:11,338 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:14:17,326 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2794222.6666666665, ans=0.1
2023-10-09 15:14:28,238 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5
2023-10-09 15:14:31,573 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2794269.3333333335, ans=0.0
2023-10-09 15:14:32,717 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2794269.3333333335, ans=0.04949747468305833
2023-10-09 15:14:34,481 INFO [train.py:1031] (1/4) Epoch 14, batch 14050, loss[loss=0.1988, simple_loss=0.25, pruned_loss=0.05519, ctc_loss=0.09317, over 16316.00 frames. ], tot_loss[loss=0.2375, simple_loss=0.2964, pruned_loss=0.06601, ctc_loss=0.1164, over 3291563.38 frames. ], batch size: 71, lr: 2.57e-03, grad_scale: 2.0
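Each batch record factors the objective into parts: simple_loss and pruned_loss are the two terms of the pruned-transducer loss (a cheap full-sum pass and the exact pass restricted to the pruned lattice region), ctc_loss comes from the auxiliary CTC head, and loss is their weighted sum; the loss[...] block is the current batch and tot_loss[...] a running average over recent batches. The weights in the sketch below are inferred from the logged numbers themselves (for batch 14000 above: 0.5 * 0.326 + 0.07552 + 0.2 * 0.1337 = 0.2653, matching the logged loss=0.2652 to rounding), so read them as a reconstruction rather than the authoritative config.

def combine_losses(simple_loss: float, pruned_loss: float, ctc_loss: float,
                   simple_loss_scale: float = 0.5,
                   ctc_loss_scale: float = 0.2) -> float:
    """Weighted total consistent with the records in this log:
    loss = simple_loss_scale * simple_loss + pruned_loss + ctc_loss_scale * ctc_loss."""
    return simple_loss_scale * simple_loss + pruned_loss + ctc_loss_scale * ctc_loss

# batch 13000: 0.5 * 0.2577 + 0.05631 + 0.2 * 0.09663 = 0.20449 ~ logged loss=0.2045
print(combine_losses(0.2577, 0.05631, 0.09663))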
2023-10-09 15:14:38,949 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+02 3.126e+02 3.568e+02 4.195e+02 6.339e+02, threshold=7.137e+02, percent-clipped=0.0
2023-10-09 15:15:03,083 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2794409.3333333335, ans=0.0
2023-10-09 15:15:12,867 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2794456.0, ans=0.035
2023-10-09 15:15:31,091 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=15.0
2023-10-09 15:15:37,072 INFO [train.py:1031] (1/4) Epoch 14, batch 14100, loss[loss=0.2005, simple_loss=0.2377, pruned_loss=0.05988, ctc_loss=0.1088, over 16378.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2877, pruned_loss=0.06478, ctc_loss=0.1138, over 3297848.93 frames. ], batch size: 417, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:15:55,965 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2794596.0, ans=0.125
2023-10-09 15:15:57,427 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0
2023-10-09 15:15:57,951 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2794596.0, ans=0.125
2023-10-09 15:16:00,034 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2794642.6666666665, ans=0.0
2023-10-09 15:16:15,109 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2794689.3333333335, ans=0.125
2023-10-09 15:16:16,835 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2794689.3333333335, ans=0.0
2023-10-09 15:16:17,801 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2794689.3333333335, ans=0.0
2023-10-09 15:16:18,301 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0
2023-10-09 15:16:26,164 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2794736.0, ans=0.125
2023-10-09 15:16:37,980 INFO [train.py:1031] (1/4) Epoch 14, batch 14150, loss[loss=0.2032, simple_loss=0.2512, pruned_loss=0.05761, ctc_loss=0.1001, over 16879.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2797, pruned_loss=0.06401, ctc_loss=0.1124, over 3294136.26 frames. ], batch size: 165, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:16:44,072 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.061e+02 3.515e+02 4.416e+02 9.283e+02, threshold=7.030e+02, percent-clipped=2.0
2023-10-09 15:16:50,393 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2794829.3333333335, ans=0.125
2023-10-09 15:17:03,913 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2794876.0, ans=0.125
2023-10-09 15:17:16,171 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2794922.6666666665, ans=0.09899494936611666
2023-10-09 15:17:39,324 INFO [train.py:1031] (1/4) Epoch 14, batch 14200, loss[loss=0.2042, simple_loss=0.2699, pruned_loss=0.05002, ctc_loss=0.0964, over 16858.00 frames. ], tot_loss[loss=0.222, simple_loss=0.2757, pruned_loss=0.06227, ctc_loss=0.1093, over 3296772.86 frames. ], batch size: 242, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:17:55,355 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2795062.6666666665, ans=0.0
2023-10-09 15:18:04,682 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:18:22,104 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2795156.0, ans=0.125
2023-10-09 15:18:29,733 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2795202.6666666665, ans=0.125
2023-10-09 15:18:41,350 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2795202.6666666665, ans=0.125
2023-10-09 15:18:43,117 INFO [train.py:1031] (1/4) Epoch 14, batch 14250, loss[loss=0.268, simple_loss=0.3093, pruned_loss=0.08418, ctc_loss=0.1458, over 16653.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2803, pruned_loss=0.06396, ctc_loss=0.1125, over 3300516.81 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:18:48,647 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2795249.3333333335, ans=0.125
2023-10-09 15:18:49,204 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 2.880e+02 3.480e+02 3.927e+02 7.059e+02, threshold=6.960e+02, percent-clipped=1.0
2023-10-09 15:18:51,832 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2795249.3333333335, ans=0.125
2023-10-09 15:19:12,469 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0
2023-10-09 15:19:20,428 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2795389.3333333335, ans=0.125
2023-10-09 15:19:44,954 INFO [train.py:1031] (1/4) Epoch 14, batch 14300, loss[loss=0.2294, simple_loss=0.283, pruned_loss=0.06495, ctc_loss=0.1148, over 16858.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2845, pruned_loss=0.0649, ctc_loss=0.1139, over 3297662.86 frames. ], batch size: 189, lr: 2.57e-03, grad_scale: 8.0
2023-10-09 15:19:53,559 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2795482.6666666665, ans=0.0
2023-10-09 15:20:21,149 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=12.0
2023-10-09 15:20:28,462 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2795622.6666666665, ans=0.125
2023-10-09 15:20:28,587 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2795622.6666666665, ans=0.125
2023-10-09 15:20:33,604 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2795669.3333333335, ans=0.1
2023-10-09 15:20:47,322 INFO [train.py:1031] (1/4) Epoch 14, batch 14350, loss[loss=0.2047, simple_loss=0.2581, pruned_loss=0.05622, ctc_loss=0.09693, over 16765.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2837, pruned_loss=0.0658, ctc_loss=0.1154, over 3303465.15 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:20:53,861 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.121e+02 3.540e+02 4.017e+02 5.602e+02, threshold=7.080e+02, percent-clipped=0.0
2023-10-09 15:21:27,938 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2795856.0, ans=0.125
2023-10-09 15:21:43,653 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2795902.6666666665, ans=0.0
2023-10-09 15:21:50,347 INFO [train.py:1031] (1/4) Epoch 14, batch 14400, loss[loss=0.2386, simple_loss=0.2869, pruned_loss=0.06888, ctc_loss=0.1311, over 15261.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2827, pruned_loss=0.06495, ctc_loss=0.1142, over 3304293.50 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:21:55,643 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2795949.3333333335, ans=0.125
2023-10-09 15:22:03,771 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:22:24,832 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2796042.6666666665, ans=0.125
2023-10-09 15:22:53,856 INFO [train.py:1031] (1/4) Epoch 14, batch 14450, loss[loss=0.27, simple_loss=0.3384, pruned_loss=0.0739, ctc_loss=0.1345, over 16794.00 frames. ], tot_loss[loss=0.233, simple_loss=0.2872, pruned_loss=0.06606, ctc_loss=0.1165, over 3302742.70 frames. ], batch size: 328, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:23:00,764 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+02 3.304e+02 3.703e+02 4.462e+02 6.927e+02, threshold=7.405e+02, percent-clipped=0.0
2023-10-09 15:23:13,457 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2796229.3333333335, ans=0.125
2023-10-09 15:23:16,599 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2796229.3333333335, ans=0.125
2023-10-09 15:23:19,787 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2796276.0, ans=0.125
2023-10-09 15:23:24,913 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.59 vs. limit=10.0
2023-10-09 15:23:29,278 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.50 vs. limit=6.0
2023-10-09 15:23:54,676 INFO [train.py:1031] (1/4) Epoch 14, batch 14500, loss[loss=0.2388, simple_loss=0.2765, pruned_loss=0.07464, ctc_loss=0.1294, over 16472.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2865, pruned_loss=0.06473, ctc_loss=0.1143, over 3296501.38 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:24:08,534 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.55 vs. limit=10.0
2023-10-09 15:24:11,112 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0
2023-10-09 15:24:24,327 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2796509.3333333335, ans=0.09899494936611666
2023-10-09 15:24:25,689 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0
2023-10-09 15:24:33,030 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2796556.0, ans=0.125
2023-10-09 15:24:46,124 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2796602.6666666665, ans=0.125
2023-10-09 15:24:47,146 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2796602.6666666665, ans=0.1
2023-10-09 15:24:47,153 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2796602.6666666665, ans=0.125
2023-10-09 15:24:56,642 INFO [train.py:1031] (1/4) Epoch 14, batch 14550, loss[loss=0.2353, simple_loss=0.2636, pruned_loss=0.07492, ctc_loss=0.1427, over 16437.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2795, pruned_loss=0.06342, ctc_loss=0.1115, over 3306041.45 frames. ], batch size: 351, lr: 2.57e-03, grad_scale: 2.0
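Many of the ScheduledFloat names above belong to balancers (balancer1.prob, balancer2.min_positive, max_positive, min_abs, max_abs). A balancer is an identity in the forward pass; in the backward pass, with probability prob, it perturbs the gradient of channels whose statistics (fraction of positive activations, mean absolute value) have drifted outside the configured range, steering them back without changing the forward computation. The toy autograd function below illustrates only the positive-fraction constraint; the penalty form and magnitude are assumptions, and the real module is considerably more careful.

import torch

class ToyBalancer(torch.autograd.Function):
    """Forward identity; backward adds a small per-channel gradient term that
    pushes the fraction of positive activations into [min_positive, max_positive].
    A deliberately simplified illustration, not the real balancer."""

    @staticmethod
    def forward(ctx, x, min_positive, max_positive, penalty):
        ctx.save_for_backward(x)
        ctx.cfg = (min_positive, max_positive, penalty)
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        min_positive, max_positive, penalty = ctx.cfg
        pos_frac = (x > 0).float().mean(dim=0)   # per-channel positive fraction
        # +1 where too many positives (push activations down),
        # -1 where too few (push them up), 0 inside the allowed range:
        push = ((pos_frac > max_positive).float()
                - (pos_frac < min_positive).float())
        extra = penalty * push * grad_out.abs().mean()
        return grad_out + extra, None, None, None

x = torch.randn(100, 256, requires_grad=True)
y = ToyBalancer.apply(x, 0.05, 0.95, 1e-4)
y.sum().backward()   # x.grad now includes the balancing nudge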
2023-10-09 15:25:05,847 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.462e+02 3.180e+02 3.818e+02 4.474e+02 1.185e+03, threshold=7.637e+02, percent-clipped=2.0
2023-10-09 15:25:16,385 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2796696.0, ans=0.1
2023-10-09 15:25:23,484 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2796742.6666666665, ans=0.1
2023-10-09 15:25:41,414 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2796789.3333333335, ans=0.1
2023-10-09 15:25:45,290 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2796836.0, ans=0.1
2023-10-09 15:25:56,523 INFO [train.py:1031] (1/4) Epoch 14, batch 14600, loss[loss=0.3042, simple_loss=0.3351, pruned_loss=0.1025, ctc_loss=0.1706, over 16558.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2809, pruned_loss=0.06391, ctc_loss=0.112, over 3307329.52 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:26:05,403 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2796882.6666666665, ans=0.5
2023-10-09 15:26:47,364 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0
2023-10-09 15:26:56,318 INFO [train.py:1031] (1/4) Epoch 14, batch 14650, loss[loss=0.1998, simple_loss=0.2601, pruned_loss=0.05253, ctc_loss=0.08594, over 16693.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.282, pruned_loss=0.06468, ctc_loss=0.1133, over 3295856.83 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:26:59,233 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2797116.0, ans=0.125
2023-10-09 15:27:05,750 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+02 3.040e+02 3.470e+02 3.930e+02 6.552e+02, threshold=6.941e+02, percent-clipped=0.0
2023-10-09 15:27:09,558 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=12.0
2023-10-09 15:27:16,938 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2797162.6666666665, ans=0.125
2023-10-09 15:27:38,560 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2797256.0, ans=0.125
2023-10-09 15:27:57,842 INFO [train.py:1031] (1/4) Epoch 14, batch 14700, loss[loss=0.2052, simple_loss=0.2513, pruned_loss=0.05915, ctc_loss=0.1022, over 16699.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2788, pruned_loss=0.06461, ctc_loss=0.1135, over 3297296.56 frames. ], batch size: 140, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:28:11,878 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2797396.0, ans=0.0
2023-10-09 15:28:12,798 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2797396.0, ans=0.125
2023-10-09 15:28:22,489 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=22.5
2023-10-09 15:28:47,962 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0
2023-10-09 15:28:59,990 INFO [train.py:1031] (1/4) Epoch 14, batch 14750, loss[loss=0.2202, simple_loss=0.273, pruned_loss=0.06083, ctc_loss=0.1144, over 15232.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.273, pruned_loss=0.06338, ctc_loss=0.1115, over 3303665.69 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:29:02,989 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0
2023-10-09 15:29:11,864 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+02 3.078e+02 3.394e+02 3.991e+02 6.777e+02, threshold=6.787e+02, percent-clipped=0.0
2023-10-09 15:29:24,828 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2797676.0, ans=0.0
2023-10-09 15:29:38,577 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2797722.6666666665, ans=0.125
2023-10-09 15:29:44,109 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2797722.6666666665, ans=0.125
2023-10-09 15:29:46,154 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2797722.6666666665, ans=0.125
2023-10-09 15:30:00,990 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0
2023-10-09 15:30:01,444 INFO [train.py:1031] (1/4) Epoch 14, batch 14800, loss[loss=0.2887, simple_loss=0.3238, pruned_loss=0.09436, ctc_loss=0.1622, over 16518.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.277, pruned_loss=0.06464, ctc_loss=0.1134, over 3304914.45 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:30:07,249 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2797816.0, ans=0.125
2023-10-09 15:30:13,099 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0
2023-10-09 15:30:26,972 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0
2023-10-09 15:30:29,023 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2797909.3333333335, ans=0.015
2023-10-09 15:30:39,900 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0
2023-10-09 15:31:00,649 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2798002.6666666665, ans=0.2
2023-10-09 15:31:05,101 INFO [train.py:1031] (1/4) Epoch 14, batch 14850, loss[loss=0.2281, simple_loss=0.2854, pruned_loss=0.06476, ctc_loss=0.1032, over 16746.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2785, pruned_loss=0.06613, ctc_loss=0.1157, over 3305308.00 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:31:05,332 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2798049.3333333335, ans=0.125
2023-10-09 15:31:11,641 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0
2023-10-09 15:31:16,851 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.615e+02 3.104e+02 3.584e+02 4.093e+02 5.889e+02, threshold=7.167e+02, percent-clipped=0.0
2023-10-09 15:31:28,279 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.31 vs. limit=10.0
2023-10-09 15:31:40,873 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2798142.6666666665, ans=0.0
2023-10-09 15:31:41,296 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0
2023-10-09 15:31:54,587 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2798236.0, ans=0.125
2023-10-09 15:31:54,958 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0
2023-10-09 15:32:08,275 INFO [train.py:1031] (1/4) Epoch 14, batch 14900, loss[loss=0.1701, simple_loss=0.2428, pruned_loss=0.03482, ctc_loss=0.06931, over 16800.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.274, pruned_loss=0.06423, ctc_loss=0.1124, over 3311419.44 frames. ], batch size: 272, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:32:22,036 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:32:46,113 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2798422.6666666665, ans=0.125
2023-10-09 15:32:56,643 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2798422.6666666665, ans=0.035
2023-10-09 15:33:05,384 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2798469.3333333335, ans=0.07
2023-10-09 15:33:11,270 INFO [train.py:1031] (1/4) Epoch 14, batch 14950, loss[loss=0.2076, simple_loss=0.2693, pruned_loss=0.05347, ctc_loss=0.09726, over 16812.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2743, pruned_loss=0.06361, ctc_loss=0.1118, over 3314847.48 frames. ], batch size: 176, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:33:21,987 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2798516.0, ans=0.09899494936611666
2023-10-09 15:33:25,941 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+02 3.074e+02 3.344e+02 3.882e+02 5.335e+02, threshold=6.688e+02, percent-clipped=0.0
2023-10-09 15:33:39,192 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=22.5
2023-10-09 15:33:48,226 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2798656.0, ans=0.125
2023-10-09 15:33:54,944 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2798656.0, ans=0.09899494936611666
2023-10-09 15:33:59,782 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2798702.6666666665, ans=0.125
2023-10-09 15:34:01,638 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2798702.6666666665, ans=0.125
2023-10-09 15:34:01,682 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2798702.6666666665, ans=0.125
2023-10-09 15:34:13,092 INFO [train.py:1031] (1/4) Epoch 14, batch 15000, loss[loss=0.2286, simple_loss=0.2735, pruned_loss=0.06868, ctc_loss=0.1159, over 16405.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2761, pruned_loss=0.06355, ctc_loss=0.1118, over 3311328.46 frames. ], batch size: 417, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:34:13,092 INFO [train.py:1054] (1/4) Computing validation loss
2023-10-09 15:34:29,424 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2384, simple_loss=0.3088, pruned_loss=0.06452, ctc_loss=0.09761, over 1796401.00 frames.
2023-10-09 15:34:29,424 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14563MB
2023-10-09 15:34:47,043 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2798796.0, ans=0.125
2023-10-09 15:34:54,099 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2798842.6666666665, ans=0.0
2023-10-09 15:35:09,090 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=15.0
2023-10-09 15:35:12,517 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2798889.3333333335, ans=0.0
2023-10-09 15:35:32,337 INFO [train.py:1031] (1/4) Epoch 14, batch 15050, loss[loss=0.2432, simple_loss=0.3088, pruned_loss=0.06628, ctc_loss=0.1127, over 16790.00 frames. ], tot_loss[loss=0.2205, simple_loss=0.2746, pruned_loss=0.06156, ctc_loss=0.1083, over 3305649.75 frames. ], batch size: 328, lr: 2.57e-03, grad_scale: 1.0
2023-10-09 15:35:44,253 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.41 vs. limit=15.0
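The block at batch 15000 above shows the periodic validation pass: training pauses ("Computing validation loss"), the full validation set is scored with the same loss breakdown over its ~1.8M frames, and peak GPU memory is printed. A minimal sketch of that interleaving follows; the model.score helper and the batch field names are hypothetical placeholders, the 3000-batch interval is an assumption (consistent with validation landing on batch 15000), and torch.cuda.max_memory_allocated is the standard call behind a "Maximum memory allocated" style message.

import logging
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, device):
    """Frame-weighted average loss over the whole validation set."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in valid_loader:
        feats = batch["inputs"].to(device)                    # hypothetical field name
        loss, num_frames = model.score(feats, batch["supervisions"])  # hypothetical helper
        tot_loss += loss.item() * num_frames
        tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames, tot_frames

def maybe_validate(model, valid_loader, device, batch_idx, valid_interval=3000):
    if batch_idx % valid_interval == 0:
        logging.info("Computing validation loss")
        loss, frames = compute_validation_loss(model, valid_loader, device)
        logging.info("validation: loss=%.4f, over %.2f frames.", loss, frames)
        logging.info("Maximum memory allocated so far is %dMB",
                     torch.cuda.max_memory_allocated() // (1024 * 1024))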
limit=15.0
2023-10-09 15:35:47,530 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2799029.3333333335, ans=0.125
2023-10-09 15:35:49,234 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.412e+02 3.126e+02 3.487e+02 4.278e+02 6.504e+02, threshold=6.973e+02, percent-clipped=0.0
2023-10-09 15:35:51,242 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2799029.3333333335, ans=0.1
2023-10-09 15:36:02,115 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2799076.0, ans=0.125
2023-10-09 15:36:22,954 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2799169.3333333335, ans=0.125
2023-10-09 15:36:32,659 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2799169.3333333335, ans=0.2
2023-10-09 15:36:35,030 INFO [train.py:1031] (1/4) Epoch 14, batch 15100, loss[loss=0.2419, simple_loss=0.2904, pruned_loss=0.07091, ctc_loss=0.1292, over 16863.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2782, pruned_loss=0.06257, ctc_loss=0.1092, over 3303692.97 frames. ], batch size: 202, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:36:50,148 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.93 vs. limit=22.5
2023-10-09 15:36:52,433 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2799262.6666666665, ans=0.125
2023-10-09 15:36:56,068 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0
2023-10-09 15:36:59,067 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2799309.3333333335, ans=0.0
2023-10-09 15:37:00,108 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:37:07,922 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2799309.3333333335, ans=0.125
2023-10-09 15:37:07,939 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2799309.3333333335, ans=0.1
2023-10-09 15:37:08,304 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2799309.3333333335, ans=15.0
2023-10-09 15:37:14,534 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2799356.0, ans=0.07
2023-10-09 15:37:20,583 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2799356.0, ans=0.0
2023-10-09 15:37:21,550 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2799356.0, ans=0.0
2023-10-09 15:37:37,655 INFO [train.py:1031] (1/4) Epoch 14, batch 15150, loss[loss=0.2207, simple_loss=0.2716, pruned_loss=0.06328, ctc_loss=0.1081, over 16927.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2829, pruned_loss=0.06455, ctc_loss=0.1125, over 3303683.04 frames. ], batch size: 82, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:37:55,142 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.319e+02 4.410e+02 5.242e+02 1.151e+03, threshold=8.819e+02, percent-clipped=3.0
2023-10-09 15:37:58,909 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=22.5
2023-10-09 15:38:05,245 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2799542.6666666665, ans=0.0
2023-10-09 15:38:06,183 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2799542.6666666665, ans=0.125
2023-10-09 15:38:24,543 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=12.0
2023-10-09 15:38:31,085 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2799636.0, ans=0.125
2023-10-09 15:38:38,456 INFO [train.py:1031] (1/4) Epoch 14, batch 15200, loss[loss=0.1798, simple_loss=0.2421, pruned_loss=0.04516, ctc_loss=0.06794, over 16566.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.281, pruned_loss=0.06242, ctc_loss=0.1088, over 3299679.29 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:39:01,275 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2799729.3333333335, ans=0.125
2023-10-09 15:39:04,448 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2799776.0, ans=0.0
2023-10-09 15:39:21,867 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2799822.6666666665, ans=0.125
2023-10-09 15:39:40,020 INFO [train.py:1031] (1/4) Epoch 14, batch 15250, loss[loss=0.1947, simple_loss=0.2635, pruned_loss=0.04724, ctc_loss=0.07866, over 16846.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.2802, pruned_loss=0.06001, ctc_loss=0.105, over 3303665.50 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:39:48,589 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:39:58,266 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.655e+02 2.982e+02 3.898e+02 5.868e+02, threshold=5.964e+02, percent-clipped=0.0
2023-10-09 15:40:40,783 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2800102.6666666665, ans=0.125
2023-10-09 15:40:44,709 INFO [train.py:1031] (1/4) Epoch 14, batch 15300, loss[loss=0.1946, simple_loss=0.2676, pruned_loss=0.0452, ctc_loss=0.07828, over 16899.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2749, pruned_loss=0.05522, ctc_loss=0.09704, over 3305820.64 frames. ], batch size: 215, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:40:49,467 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2800149.3333333335, ans=0.1
2023-10-09 15:40:59,333 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.61 vs. limit=15.0
2023-10-09 15:41:09,932 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2800242.6666666665, ans=0.1
2023-10-09 15:41:12,069 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2800242.6666666665, ans=0.1
2023-10-09 15:41:17,744 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2800242.6666666665, ans=0.0
2023-10-09 15:41:32,588 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2800289.3333333335, ans=0.125
2023-10-09 15:41:32,632 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2800289.3333333335, ans=0.125
2023-10-09 15:41:35,414 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2800336.0, ans=0.0
2023-10-09 15:41:38,449 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2800336.0, ans=0.1
2023-10-09 15:41:42,697 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:41:48,938 INFO [train.py:1031] (1/4) Epoch 14, batch 15350, loss[loss=0.2981, simple_loss=0.3369, pruned_loss=0.09657, ctc_loss=0.1653, over 16767.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2803, pruned_loss=0.05877, ctc_loss=0.103, over 3300040.77 frames. ], batch size: 353, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:41:54,958 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2800382.6666666665, ans=0.0
2023-10-09 15:42:09,093 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.939e+02 3.401e+02 4.199e+02 7.970e+02, threshold=6.801e+02, percent-clipped=2.0
2023-10-09 15:42:53,682 INFO [train.py:1031] (1/4) Epoch 14, batch 15400, loss[loss=0.1989, simple_loss=0.2357, pruned_loss=0.06126, ctc_loss=0.0989, over 12381.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.287, pruned_loss=0.05985, ctc_loss=0.1051, over 3285378.71 frames. ], batch size: 41, lr: 2.56e-03, grad_scale: 8.0
2023-10-09 15:43:04,170 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2800616.0, ans=15.0
2023-10-09 15:43:40,834 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.27 vs. limit=22.5
2023-10-09 15:43:50,697 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2800802.6666666665, ans=0.125
2023-10-09 15:43:56,848 INFO [train.py:1031] (1/4) Epoch 14, batch 15450, loss[loss=0.258, simple_loss=0.313, pruned_loss=0.07658, ctc_loss=0.1248, over 16542.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2853, pruned_loss=0.06005, ctc_loss=0.1046, over 3283587.31 frames. ], batch size: 350, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:44:15,573 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2800896.0, ans=0.125
2023-10-09 15:44:16,743 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2800896.0, ans=0.1
2023-10-09 15:44:17,374 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+02 3.271e+02 3.987e+02 5.026e+02 8.046e+02, threshold=7.973e+02, percent-clipped=4.0
2023-10-09 15:44:33,759 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2800989.3333333335, ans=0.2
2023-10-09 15:44:36,493 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.25 vs. limit=10.0
2023-10-09 15:44:39,456 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2800989.3333333335, ans=0.1
2023-10-09 15:45:00,357 INFO [train.py:1031] (1/4) Epoch 14, batch 15500, loss[loss=0.1662, simple_loss=0.216, pruned_loss=0.04456, ctc_loss=0.06825, over 16733.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2787, pruned_loss=0.05899, ctc_loss=0.1013, over 3279277.81 frames. ], batch size: 130, lr: 2.56e-03, grad_scale: 8.0
2023-10-09 15:45:05,189 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=22.5
2023-10-09 15:45:10,294 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2801082.6666666665, ans=0.125
2023-10-09 15:45:15,692 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2801129.3333333335, ans=0.2
2023-10-09 15:45:16,995 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=12.0
2023-10-09 15:45:31,881 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2801176.0, ans=0.2
2023-10-09 15:45:36,964 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2801222.6666666665, ans=0.125
2023-10-09 15:45:48,739 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0
2023-10-09 15:45:59,213 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2801316.0, ans=0.125
2023-10-09 15:45:59,989 INFO [train.py:1031] (1/4) Epoch 14, batch 15550, loss[loss=0.2312, simple_loss=0.2939, pruned_loss=0.06454, ctc_loss=0.09834, over 16754.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2764, pruned_loss=0.05936, ctc_loss=0.1011, over 3282355.00 frames. ], batch size: 102, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:46:11,286 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2801362.6666666665, ans=0.0
2023-10-09 15:46:17,355 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2801362.6666666665, ans=0.125
2023-10-09 15:46:20,298 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2801362.6666666665, ans=0.0
2023-10-09 15:46:22,076 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+02 3.223e+02 3.587e+02 4.203e+02 7.757e+02, threshold=7.174e+02, percent-clipped=0.0
2023-10-09 15:46:42,732 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.27 vs. limit=10.0
2023-10-09 15:46:44,109 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2801456.0, ans=0.125
2023-10-09 15:46:59,412 INFO [train.py:1031] (1/4) Epoch 14, batch 15600, loss[loss=0.2219, simple_loss=0.2983, pruned_loss=0.0539, ctc_loss=0.09443, over 16811.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.283, pruned_loss=0.06322, ctc_loss=0.1077, over 3280197.61 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:47:05,303 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2801549.3333333335, ans=0.125
2023-10-09 15:47:18,746 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2801596.0, ans=0.125
2023-10-09 15:47:23,161 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2801642.6666666665, ans=0.2
2023-10-09 15:47:29,698 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2801642.6666666665, ans=0.125
2023-10-09 15:47:30,623 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2801642.6666666665, ans=0.125
2023-10-09 15:47:49,341 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2801736.0, ans=0.0
2023-10-09 15:47:59,738 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2801782.6666666665, ans=0.2
2023-10-09 15:48:00,448 INFO [train.py:1031] (1/4) Epoch 14, batch 15650, loss[loss=0.235, simple_loss=0.2798, pruned_loss=0.07146, ctc_loss=0.1184, over 16292.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2813, pruned_loss=0.06119, ctc_loss=0.1052, over 3284854.97 frames. ], batch size: 71, lr: 2.56e-03, grad_scale: 1.0
2023-10-09 15:48:06,298 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2801782.6666666665, ans=0.125
2023-10-09 15:48:10,076 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2801782.6666666665, ans=0.125
2023-10-09 15:48:12,776 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2801829.3333333335, ans=0.125
2023-10-09 15:48:17,965 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2801829.3333333335, ans=0.125
2023-10-09 15:48:23,321 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+02 3.056e+02 3.461e+02 4.046e+02 6.916e+02, threshold=6.921e+02, percent-clipped=0.0
2023-10-09 15:48:30,196 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2801876.0, ans=0.1
2023-10-09 15:48:37,874 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0
2023-10-09 15:48:50,479 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:49:00,037 INFO [train.py:1031] (1/4) Epoch 14, batch 15700, loss[loss=0.1984, simple_loss=0.251, pruned_loss=0.05357, ctc_loss=0.09684, over 16702.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2768, pruned_loss=0.06135, ctc_loss=0.1059, over 3291457.12 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:49:00,348 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2802016.0, ans=0.125
2023-10-09 15:49:03,204 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2802016.0, ans=0.1
2023-10-09 15:49:16,353 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2802062.6666666665, ans=0.0
2023-10-09 15:49:20,123 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2802062.6666666665, ans=0.07
2023-10-09 15:49:27,583 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2802109.3333333335, ans=0.2
2023-10-09 15:49:30,083 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0
2023-10-09 15:49:41,151 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2802156.0, ans=0.125
2023-10-09 15:49:42,845 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2802156.0, ans=0.125
2023-10-09 15:50:01,939 INFO [train.py:1031] (1/4) Epoch 14, batch 15750, loss[loss=0.1738, simple_loss=0.2313, pruned_loss=0.04306, ctc_loss=0.07569, over 16778.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2722, pruned_loss=0.06132, ctc_loss=0.106, over 3286304.95 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:50:06,559 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0
2023-10-09 15:50:08,383 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0
2023-10-09 15:50:24,312 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2802296.0, ans=0.2
2023-10-09 15:50:26,227 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 3.013e+02 3.498e+02 4.173e+02 6.687e+02, threshold=6.996e+02, percent-clipped=0.0
2023-10-09 15:50:31,968 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2802342.6666666665, ans=0.125
2023-10-09 15:50:44,140 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2802389.3333333335, ans=0.0
2023-10-09 15:50:52,778 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2802436.0, ans=0.125
2023-10-09 15:50:52,805 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2802436.0, ans=0.05
2023-10-09 15:50:53,796 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2802436.0, ans=0.125
2023-10-09 15:50:56,762 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2802436.0, ans=0.125
2023-10-09 15:50:57,828 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2802436.0, ans=0.125
2023-10-09 15:51:03,854 INFO [train.py:1031] (1/4) Epoch 14, batch 15800, loss[loss=0.2054, simple_loss=0.2542, pruned_loss=0.05904, ctc_loss=0.09604, over 16782.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2693, pruned_loss=0.06017, ctc_loss=0.1044, over 3281632.49 frames. ], batch size: 121, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:51:11,885 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2802482.6666666665, ans=0.0
2023-10-09 15:51:32,880 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2802576.0, ans=0.1
2023-10-09 15:51:35,682 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2802576.0, ans=0.125
2023-10-09 15:51:55,229 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2802669.3333333335, ans=0.125
2023-10-09 15:52:09,230 INFO [train.py:1031] (1/4) Epoch 14, batch 15850, loss[loss=0.1835, simple_loss=0.2262, pruned_loss=0.05268, ctc_loss=0.08845, over 16637.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.2745, pruned_loss=0.05912, ctc_loss=0.1026, over 3289392.18 frames. ], batch size: 110, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:52:31,346 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2802762.6666666665, ans=0.125
2023-10-09 15:52:36,120 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.410e+02 3.156e+02 3.985e+02 5.059e+02 1.038e+03, threshold=7.970e+02, percent-clipped=10.0
2023-10-09 15:52:37,502 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2802809.3333333335, ans=0.0
2023-10-09 15:52:54,307 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2802856.0, ans=0.125
2023-10-09 15:53:12,701 INFO [train.py:1031] (1/4) Epoch 14, batch 15900, loss[loss=0.1847, simple_loss=0.2359, pruned_loss=0.04976, ctc_loss=0.08487, over 16652.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.2751, pruned_loss=0.05782, ctc_loss=0.1, over 3282865.79 frames. ], batch size: 102, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:53:33,067 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:53:33,999 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2802996.0, ans=0.125
2023-10-09 15:53:50,044 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2803089.3333333335, ans=0.04949747468305833
2023-10-09 15:53:53,159 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.83 vs. limit=15.0
2023-10-09 15:54:01,205 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2803136.0, ans=0.07
2023-10-09 15:54:05,939 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2803136.0, ans=0.015
2023-10-09 15:54:07,182 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:54:07,204 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2803136.0, ans=0.125
2023-10-09 15:54:14,353 INFO [train.py:1031] (1/4) Epoch 14, batch 15950, loss[loss=0.2084, simple_loss=0.2735, pruned_loss=0.05221, ctc_loss=0.09711, over 16889.00 frames. ], tot_loss[loss=0.2142, simple_loss=0.2728, pruned_loss=0.05778, ctc_loss=0.09998, over 3282417.75 frames. ], batch size: 243, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:54:34,980 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2803229.3333333335, ans=0.0
2023-10-09 15:54:41,758 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 3.015e+02 3.467e+02 4.153e+02 6.024e+02, threshold=6.935e+02, percent-clipped=0.0
2023-10-09 15:54:42,143 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2803276.0, ans=0.0
2023-10-09 15:54:50,968 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2803322.6666666665, ans=0.125
2023-10-09 15:54:51,061 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2803322.6666666665, ans=0.125
2023-10-09 15:54:56,392 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.01 vs. limit=15.0
2023-10-09 15:55:02,488 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2803322.6666666665, ans=0.1
2023-10-09 15:55:16,781 INFO [train.py:1031] (1/4) Epoch 14, batch 16000, loss[loss=0.2437, simple_loss=0.3117, pruned_loss=0.06715, ctc_loss=0.1037, over 12698.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2786, pruned_loss=0.06137, ctc_loss=0.1063, over 3272847.42 frames. ], batch size: 38, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:55:17,130 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2803416.0, ans=0.2
2023-10-09 15:55:20,976 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2803416.0, ans=0.125
2023-10-09 15:55:26,488 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2803416.0, ans=0.125
2023-10-09 15:55:56,374 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2803556.0, ans=0.07
2023-10-09 15:56:00,474 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=2803556.0, ans=0.1
2023-10-09 15:56:19,092 INFO [train.py:1031] (1/4) Epoch 14, batch 16050, loss[loss=0.2553, simple_loss=0.3342, pruned_loss=0.06388, ctc_loss=0.1217, over 16774.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2874, pruned_loss=0.06306, ctc_loss=0.1107, over 3282760.77 frames. ], batch size: 328, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:56:22,233 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2803649.3333333335, ans=0.0
2023-10-09 15:56:24,581 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.77 vs. limit=15.0
2023-10-09 15:56:48,632 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+02 3.378e+02 4.238e+02 4.995e+02 7.928e+02, threshold=8.476e+02, percent-clipped=3.0
2023-10-09 15:57:06,027 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2803789.3333333335, ans=0.0
2023-10-09 15:57:21,627 INFO [train.py:1031] (1/4) Epoch 14, batch 16100, loss[loss=0.3286, simple_loss=0.359, pruned_loss=0.1112, ctc_loss=0.1896, over 16608.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2892, pruned_loss=0.0623, ctc_loss=0.1098, over 3288233.44 frames. ], batch size: 351, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:57:31,315 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2803882.6666666665, ans=0.125
2023-10-09 15:57:32,457 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2803882.6666666665, ans=0.125
2023-10-09 15:57:41,164 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2803929.3333333335, ans=0.0
2023-10-09 15:57:56,392 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=22.5
2023-10-09 15:58:02,962 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2804022.6666666665, ans=0.125
2023-10-09 15:58:06,276 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2804022.6666666665, ans=0.2
2023-10-09 15:58:23,846 INFO [train.py:1031] (1/4) Epoch 14, batch 16150, loss[loss=0.2102, simple_loss=0.2882, pruned_loss=0.04815, ctc_loss=0.0899, over 16767.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2931, pruned_loss=0.06491, ctc_loss=0.1143, over 3287628.89 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:58:40,588 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2804162.6666666665, ans=0.0
2023-10-09 15:58:53,759 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+02 3.176e+02 3.660e+02 4.435e+02 1.361e+03, threshold=7.321e+02, percent-clipped=1.0
2023-10-09 15:59:10,529 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2804256.0, ans=0.125
2023-10-09 15:59:23,623 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=22.5
2023-10-09 15:59:24,939 INFO [train.py:1031] (1/4) Epoch 14, batch 16200, loss[loss=0.1927, simple_loss=0.2465, pruned_loss=0.05113, ctc_loss=0.09167, over 16704.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2868, pruned_loss=0.06276, ctc_loss=0.1106, over 3287381.04 frames. ], batch size: 102, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:59:25,253 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2804349.3333333335, ans=0.1
2023-10-09 16:00:03,806 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:00:20,537 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=22.5
2023-10-09 16:00:27,732 INFO [train.py:1031] (1/4) Epoch 14, batch 16250, loss[loss=0.2147, simple_loss=0.2558, pruned_loss=0.06629, ctc_loss=0.1029, over 16594.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.281, pruned_loss=0.0614, ctc_loss=0.1084, over 3289798.53 frames. ], batch size: 110, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:00:58,616 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+02 3.037e+02 3.428e+02 4.095e+02 1.009e+03, threshold=6.855e+02, percent-clipped=2.0
2023-10-09 16:01:07,518 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2804722.6666666665, ans=0.0
2023-10-09 16:01:30,633 INFO [train.py:1031] (1/4) Epoch 14, batch 16300, loss[loss=0.2066, simple_loss=0.2615, pruned_loss=0.05656, ctc_loss=0.09624, over 16926.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.279, pruned_loss=0.05928, ctc_loss=0.1051, over 3291413.00 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:01:42,762 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2804862.6666666665, ans=0.0
2023-10-09 16:01:49,740 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2804862.6666666665, ans=0.1
2023-10-09 16:02:14,901 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2804956.0, ans=0.125
2023-10-09 16:02:17,653 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2804956.0, ans=0.2
2023-10-09 16:02:24,536 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.03 vs. limit=22.5
2023-10-09 16:02:30,780 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2805049.3333333335, ans=0.0
2023-10-09 16:02:31,522 INFO [train.py:1031] (1/4) Epoch 14, batch 16350, loss[loss=0.2276, simple_loss=0.2755, pruned_loss=0.06708, ctc_loss=0.114, over 16229.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2742, pruned_loss=0.05929, ctc_loss=0.1049, over 3283927.98 frames. ], batch size: 70, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:02:39,077 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2805049.3333333335, ans=0.125
2023-10-09 16:02:48,652 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2805096.0, ans=0.5
2023-10-09 16:03:00,166 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=15.0
2023-10-09 16:03:01,698 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+02 3.121e+02 3.548e+02 4.178e+02 8.324e+02, threshold=7.096e+02, percent-clipped=2.0
2023-10-09 16:03:08,469 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2805189.3333333335, ans=0.125
2023-10-09 16:03:22,170 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=15.0
2023-10-09 16:03:24,660 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2805236.0, ans=0.125
2023-10-09 16:03:32,988 INFO [train.py:1031] (1/4) Epoch 14, batch 16400, loss[loss=0.2076, simple_loss=0.2362, pruned_loss=0.06535, ctc_loss=0.1205, over 15606.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2737, pruned_loss=0.06021, ctc_loss=0.1064, over 3268433.39 frames. ], batch size: 530, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:03:33,325 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2805282.6666666665, ans=0.0
2023-10-09 16:03:42,734 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2805282.6666666665, ans=0.1
2023-10-09 16:03:58,039 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0
2023-10-09 16:04:04,262 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2805376.0, ans=0.09899494936611666
2023-10-09 16:04:07,002 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2805376.0, ans=0.0
2023-10-09 16:04:22,687 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0
2023-10-09 16:04:27,757 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2805469.3333333335, ans=0.0
2023-10-09 16:04:34,781 INFO [train.py:1031] (1/4) Epoch 14, batch 16450, loss[loss=0.2574, simple_loss=0.3002, pruned_loss=0.08079, ctc_loss=0.1324, over 11692.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2728, pruned_loss=0.06165, ctc_loss=0.1086, over 3271137.47 frames. ], batch size: 36, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:04:42,769 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0
2023-10-09 16:05:06,539 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.659e+02 3.324e+02 3.650e+02 4.238e+02 1.011e+03, threshold=7.299e+02, percent-clipped=2.0
2023-10-09 16:05:07,835 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2805609.3333333335, ans=0.0
2023-10-09 16:05:29,760 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2805702.6666666665, ans=0.1
2023-10-09 16:05:30,798 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:05:35,698 INFO [train.py:1031] (1/4) Epoch 14, batch 16500, loss[loss=0.2462, simple_loss=0.2961, pruned_loss=0.07199, ctc_loss=0.131, over 16797.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2689, pruned_loss=0.06202, ctc_loss=0.1092, over 3278949.64 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 8.0
2023-10-09 16:05:42,305 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2805749.3333333335, ans=0.09899494936611666
2023-10-09 16:05:49,821 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:05:51,765 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2805796.0, ans=0.125
2023-10-09 16:05:56,595 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=22.5
2023-10-09 16:05:57,308 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2805796.0, ans=0.2
2023-10-09 16:05:58,356 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2805796.0, ans=0.125
2023-10-09 16:06:08,523 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2805842.6666666665, ans=0.0
2023-10-09 16:06:34,176 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2805936.0, ans=0.2
2023-10-09 16:06:37,059 INFO [train.py:1031] (1/4) Epoch 14, batch 16550, loss[loss=0.23, simple_loss=0.2838, pruned_loss=0.06543, ctc_loss=0.1134, over 16777.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2704, pruned_loss=0.06146, ctc_loss=0.1079, over 3283137.80 frames. ], batch size: 292, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:06:40,693 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0
2023-10-09 16:06:40,773 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=12.0
2023-10-09 16:07:09,904 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.507e+02 3.011e+02 3.365e+02 4.120e+02 6.132e+02, threshold=6.730e+02, percent-clipped=0.0
2023-10-09 16:07:13,835 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2806122.6666666665, ans=0.125
2023-10-09 16:07:15,649 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2806122.6666666665, ans=0.09899494936611666
2023-10-09 16:07:25,356 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.02 vs. limit=10.0
2023-10-09 16:07:25,852 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2806169.3333333335, ans=0.2
2023-10-09 16:07:37,226 INFO [train.py:1031] (1/4) Epoch 14, batch 16600, loss[loss=0.2294, simple_loss=0.2819, pruned_loss=0.06396, ctc_loss=0.1225, over 16817.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2682, pruned_loss=0.06165, ctc_loss=0.1082, over 3291550.04 frames. ], batch size: 258, lr: 2.56e-03, grad_scale: 8.0
2023-10-09 16:07:41,285 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2806216.0, ans=0.125
2023-10-09 16:08:12,844 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2806356.0, ans=0.1
2023-10-09 16:08:26,932 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2806402.6666666665, ans=0.0
2023-10-09 16:08:35,431 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2806402.6666666665, ans=0.1
2023-10-09 16:08:39,073 INFO [train.py:1031] (1/4) Epoch 14, batch 16650, loss[loss=0.2091, simple_loss=0.2621, pruned_loss=0.05807, ctc_loss=0.1001, over 16727.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2698, pruned_loss=0.0609, ctc_loss=0.107, over 3297018.19 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:08:41,566 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2806449.3333333335, ans=0.125
2023-10-09 16:08:50,218 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:09:15,076 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.506e+02 2.878e+02 3.292e+02 3.921e+02 8.519e+02, threshold=6.584e+02, percent-clipped=3.0
2023-10-09 16:09:28,580 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2806636.0, ans=0.125
2023-10-09 16:09:29,629 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2806636.0, ans=0.2
2023-10-09 16:09:40,526 INFO [train.py:1031] (1/4) Epoch 14, batch 16700, loss[loss=0.2634, simple_loss=0.277, pruned_loss=0.09301, ctc_loss=0.1593, over 16649.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2672, pruned_loss=0.06135, ctc_loss=0.1073, over 3298670.84 frames. ], batch size: 386, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:09:49,719 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2806682.6666666665, ans=0.0
2023-10-09 16:09:57,336 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2806729.3333333335, ans=0.125
2023-10-09 16:10:20,875 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:10:42,263 INFO [train.py:1031] (1/4) Epoch 14, batch 16750, loss[loss=0.2105, simple_loss=0.2788, pruned_loss=0.05238, ctc_loss=0.09356, over 16943.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2671, pruned_loss=0.06165, ctc_loss=0.1074, over 3284123.25 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:11:08,203 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2807009.3333333335, ans=0.125
2023-10-09 16:11:18,573 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.049e+02 3.546e+02 4.303e+02 6.611e+02, threshold=7.093e+02, percent-clipped=1.0
2023-10-09 16:11:21,363 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:11:37,679 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0
2023-10-09 16:11:42,925 INFO [train.py:1031] (1/4) Epoch 14, batch 16800, loss[loss=0.1951, simple_loss=0.2382, pruned_loss=0.0543, ctc_loss=0.1087, over 15393.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2696, pruned_loss=0.06106, ctc_loss=0.107, over 3277727.09 frames. ], batch size: 526, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:11:44,235 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2807149.3333333335, ans=0.125
2023-10-09 16:12:27,496 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2807289.3333333335, ans=0.035
2023-10-09 16:12:32,943 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2807336.0, ans=0.025
2023-10-09 16:12:45,124 INFO [train.py:1031] (1/4) Epoch 14, batch 16850, loss[loss=0.199, simple_loss=0.2745, pruned_loss=0.04492, ctc_loss=0.08416, over 15424.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2705, pruned_loss=0.06141, ctc_loss=0.1079, over 3281420.07 frames. ], batch size: 526, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:12:48,309 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2807382.6666666665, ans=0.125
2023-10-09 16:12:50,236 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2807382.6666666665, ans=0.05
2023-10-09 16:12:53,962 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2807382.6666666665, ans=0.2
2023-10-09 16:12:56,862 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2807429.3333333335, ans=0.125
2023-10-09 16:12:56,864 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2807429.3333333335, ans=0.125
2023-10-09 16:12:57,970 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2807429.3333333335, ans=0.0
2023-10-09 16:13:00,659 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2807429.3333333335, ans=0.0
2023-10-09 16:13:08,474 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2807429.3333333335, ans=0.05
2023-10-09 16:13:14,806 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0
2023-10-09 16:13:19,529 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:13:22,077 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2807522.6666666665, ans=0.125
2023-10-09 16:13:24,925 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+02 3.198e+02 3.748e+02 4.342e+02 8.032e+02, threshold=7.496e+02, percent-clipped=3.0
2023-10-09 16:13:29,119 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2807522.6666666665, ans=0.125
2023-10-09 16:13:43,510 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2807569.3333333335, ans=0.0
2023-10-09 16:13:47,278 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0
2023-10-09 16:13:48,642 INFO [train.py:1031] (1/4) Epoch 14, batch 16900, loss[loss=0.2158, simple_loss=0.2832, pruned_loss=0.05438, ctc_loss=0.09933, over 16822.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2739, pruned_loss=0.061, ctc_loss=0.1077, over 3293781.60 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:13:54,960 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2807616.0, ans=0.0
2023-10-09 16:14:09,341 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2807662.6666666665, ans=0.2
2023-10-09 16:14:15,707 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=22.5
2023-10-09 16:14:16,502 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2807709.3333333335, ans=0.125
2023-10-09 16:14:30,017 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:14:39,211 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2807802.6666666665, ans=0.0
2023-10-09 16:14:42,621 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2807802.6666666665, ans=0.09899494936611666
2023-10-09 16:14:51,572 INFO [train.py:1031] (1/4) Epoch 14, batch 16950, loss[loss=0.2519, simple_loss=0.2966, pruned_loss=0.07825, ctc_loss=0.1267, over 16793.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2803, pruned_loss=0.06351, ctc_loss=0.112, over 3299298.51 frames. ], batch size: 201, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:14:51,970 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2807849.3333333335, ans=0.1
2023-10-09 16:14:56,983 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2807849.3333333335, ans=0.125
2023-10-09 16:15:06,162 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2807896.0, ans=0.0
2023-10-09 16:15:22,577 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2807942.6666666665, ans=0.09899494936611666
2023-10-09 16:15:29,802 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2807989.3333333335, ans=0.07
2023-10-09 16:15:33,196 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.515e+02 3.296e+02 3.627e+02 4.465e+02 8.431e+02, threshold=7.254e+02, percent-clipped=3.0
2023-10-09 16:15:40,521 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2807989.3333333335, ans=0.0
2023-10-09 16:15:55,745 INFO [train.py:1031] (1/4) Epoch 14, batch 17000, loss[loss=0.2055, simple_loss=0.2491, pruned_loss=0.06062, ctc_loss=0.1016, over 16753.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2836, pruned_loss=0.06438, ctc_loss=0.113, over 3296459.69 frames. ], batch size: 95, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:16:05,270 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2808082.6666666665, ans=0.125
2023-10-09 16:16:22,758 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2808176.0, ans=0.125
2023-10-09 16:16:29,818 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2808176.0, ans=0.125
2023-10-09 16:16:33,207 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2808222.6666666665, ans=0.0
2023-10-09 16:16:34,208 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2808222.6666666665, ans=0.025
2023-10-09 16:16:41,032 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2808222.6666666665, ans=0.1
2023-10-09 16:16:55,094 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2808269.3333333335, ans=0.015
2023-10-09 16:16:59,295 INFO [train.py:1031] (1/4) Epoch 14, batch 17050, loss[loss=0.2689, simple_loss=0.3204, pruned_loss=0.07932, ctc_loss=0.1466, over 16840.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2851, pruned_loss=0.06246, ctc_loss=0.1103, over 3293320.29 frames. ], batch size: 309, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:17:41,585 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=12.0
2023-10-09 16:17:41,819 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.562e+02 3.300e+02 3.832e+02 4.647e+02 9.893e+02, threshold=7.664e+02, percent-clipped=3.0
2023-10-09 16:17:55,812 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2808502.6666666665, ans=0.0
2023-10-09 16:17:57,896 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2808502.6666666665, ans=0.125
2023-10-09 16:18:02,370 INFO [train.py:1031] (1/4) Epoch 14, batch 17100, loss[loss=0.2374, simple_loss=0.2899, pruned_loss=0.06881, ctc_loss=0.118, over 16982.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.289, pruned_loss=0.06522, ctc_loss=0.1146, over 3297136.14 frames. ], batch size: 216, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:18:14,397 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2808596.0, ans=0.125
2023-10-09 16:18:19,032 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=22.5
2023-10-09 16:18:26,698 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2808642.6666666665, ans=0.125
2023-10-09 16:18:35,829 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2808642.6666666665, ans=0.125
2023-10-09 16:18:55,898 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2808736.0, ans=0.2
2023-10-09 16:19:01,601 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0
2023-10-09 16:19:03,710 INFO [train.py:1031] (1/4) Epoch 14, batch 17150, loss[loss=0.2497, simple_loss=0.3146, pruned_loss=0.06777, ctc_loss=0.1231, over 16807.00 frames. ], tot_loss[loss=0.2324, simple_loss=0.291, pruned_loss=0.06427, ctc_loss=0.1133, over 3299706.40 frames. ], batch size: 328, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:19:34,187 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2808876.0, ans=0.05
2023-10-09 16:19:46,138 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+02 3.091e+02 3.589e+02 4.240e+02 6.885e+02, threshold=7.178e+02, percent-clipped=0.0
2023-10-09 16:19:46,524 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2808922.6666666665, ans=0.1
2023-10-09 16:20:05,578 INFO [train.py:1031] (1/4) Epoch 14, batch 17200, loss[loss=0.2894, simple_loss=0.3557, pruned_loss=0.07981, ctc_loss=0.1585, over 16812.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2986, pruned_loss=0.06497, ctc_loss=0.1154, over 3304546.85 frames. ], batch size: 291, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:20:07,683 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2809016.0, ans=0.1
2023-10-09 16:20:23,213 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2809062.6666666665, ans=0.125
2023-10-09 16:20:48,970 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2809156.0, ans=0.1
2023-10-09 16:20:49,044 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2809156.0, ans=0.0
2023-10-09 16:20:49,342 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.93 vs. limit=10.0
2023-10-09 16:21:00,284 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2809202.6666666665, ans=0.125
2023-10-09 16:21:12,738 INFO [train.py:1031] (1/4) Epoch 14, batch 17250, loss[loss=0.2722, simple_loss=0.356, pruned_loss=0.06964, ctc_loss=0.1226, over 16864.00 frames. ], tot_loss[loss=0.2501, simple_loss=0.3168, pruned_loss=0.06731, ctc_loss=0.1218, over 3293477.24 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:21:16,834 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2809249.3333333335, ans=0.125
2023-10-09 16:21:16,849 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2809249.3333333335, ans=0.1
2023-10-09 16:21:28,209 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2809296.0, ans=0.07
2023-10-09 16:21:31,204 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.94 vs. limit=15.0
2023-10-09 16:21:57,826 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.919e+02 4.626e+02 5.820e+02 9.725e+02, threshold=9.252e+02, percent-clipped=7.0
2023-10-09 16:22:02,535 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2809389.3333333335, ans=0.015
2023-10-09 16:22:10,566 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2809436.0, ans=0.125
2023-10-09 16:22:16,151 INFO [train.py:1031] (1/4) Epoch 14, batch 17300, loss[loss=0.2323, simple_loss=0.3167, pruned_loss=0.0544, ctc_loss=0.0977, over 16864.00 frames. ], tot_loss[loss=0.2514, simple_loss=0.3206, pruned_loss=0.06689, ctc_loss=0.1209, over 3295212.43 frames. ], batch size: 215, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:22:24,090 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2809482.6666666665, ans=0.0
2023-10-09 16:22:35,669 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=22.5
2023-10-09 16:22:58,856 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2809622.6666666665, ans=0.035
2023-10-09 16:23:10,379 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2809669.3333333335, ans=0.125
2023-10-09 16:23:15,259 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2809669.3333333335, ans=0.0
2023-10-09 16:23:17,641 INFO [train.py:1031] (1/4) Epoch 14, batch 17350, loss[loss=0.2175, simple_loss=0.2884, pruned_loss=0.05459, ctc_loss=0.09365, over 16635.00 frames. ], tot_loss[loss=0.2519, simple_loss=0.3229, pruned_loss=0.06649, ctc_loss=0.1197, over 3289147.64 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:23:25,525 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2809716.0, ans=0.95
2023-10-09 16:24:01,202 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+02 3.229e+02 3.810e+02 5.005e+02 1.294e+03, threshold=7.619e+02, percent-clipped=1.0
2023-10-09 16:24:04,163 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2809856.0, ans=0.2
2023-10-09 16:24:09,054 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2809902.6666666665, ans=0.125
2023-10-09 16:24:18,402 INFO [train.py:1031] (1/4) Epoch 14, batch 17400, loss[loss=0.1693, simple_loss=0.2085, pruned_loss=0.04753, ctc_loss=0.08727, over 16012.00 frames. ], tot_loss[loss=0.2447, simple_loss=0.3112, pruned_loss=0.06567, ctc_loss=0.1172, over 3290033.01 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:24:31,490 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2809996.0, ans=0.1
2023-10-09 16:24:54,087 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=12.0
2023-10-09 16:25:13,345 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2810136.0, ans=0.125
2023-10-09 16:25:18,381 INFO [train.py:1031] (1/4) Epoch 14, batch 17450, loss[loss=0.2071, simple_loss=0.2524, pruned_loss=0.06009, ctc_loss=0.1042, over 16763.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.2985, pruned_loss=0.06456, ctc_loss=0.1145, over 3286757.67 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:25:42,360 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2810229.3333333335, ans=0.125
2023-10-09 16:25:49,333 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2810276.0, ans=0.0
2023-10-09 16:25:56,893 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2810322.6666666665, ans=0.125
2023-10-09 16:26:01,437 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:26:03,703 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2023-10-09 16:26:05,314 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+02 3.049e+02 3.427e+02 3.970e+02 9.337e+02, threshold=6.853e+02, percent-clipped=1.0
2023-10-09 16:26:15,962 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2810369.3333333335, ans=0.0
2023-10-09 16:26:17,039 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2810369.3333333335, ans=0.125
2023-10-09 16:26:20,866 INFO [train.py:1031] (1/4) Epoch 14, batch 17500, loss[loss=0.2529, simple_loss=0.2823, pruned_loss=0.08344, ctc_loss=0.1415, over 12152.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2896, pruned_loss=0.06452, ctc_loss=0.1142, over 3293911.01 frames. ], batch size: 35, lr: 2.56e-03, grad_scale: 4.0
], tot_loss[loss=0.2322, simple_loss=0.2896, pruned_loss=0.06452, ctc_loss=0.1142, over 3293911.01 frames. ], batch size: 35, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:26:37,907 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2810462.6666666665, ans=0.1 2023-10-09 16:26:42,184 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.63 vs. limit=8.0 2023-10-09 16:26:59,527 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2023-10-09 16:27:00,979 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2810556.0, ans=0.1 2023-10-09 16:27:14,024 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2810602.6666666665, ans=0.0 2023-10-09 16:27:22,359 INFO [train.py:1031] (1/4) Epoch 14, batch 17550, loss[loss=0.2302, simple_loss=0.2913, pruned_loss=0.06222, ctc_loss=0.1114, over 16922.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.2897, pruned_loss=0.06625, ctc_loss=0.1169, over 3289915.85 frames. ], batch size: 215, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:28:12,372 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.113e+02 3.532e+02 4.348e+02 7.721e+02, threshold=7.063e+02, percent-clipped=2.0 2023-10-09 16:28:12,798 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2810836.0, ans=0.125 2023-10-09 16:28:14,645 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2810836.0, ans=0.125 2023-10-09 16:28:25,656 INFO [train.py:1031] (1/4) Epoch 14, batch 17600, loss[loss=0.2076, simple_loss=0.2678, pruned_loss=0.05529, ctc_loss=0.09179, over 16839.00 frames. ], tot_loss[loss=0.2338, simple_loss=0.2916, pruned_loss=0.06503, ctc_loss=0.1148, over 3291065.65 frames. ], batch size: 121, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:28:41,419 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.50 vs. 
limit=15.0 2023-10-09 16:28:50,234 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2810976.0, ans=0.1 2023-10-09 16:28:55,932 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2810976.0, ans=0.125 2023-10-09 16:29:08,742 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2811022.6666666665, ans=0.125 2023-10-09 16:29:08,781 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2811022.6666666665, ans=0.1 2023-10-09 16:29:09,826 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2811022.6666666665, ans=0.09899494936611666 2023-10-09 16:29:10,850 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2811022.6666666665, ans=0.1 2023-10-09 16:29:17,472 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2811069.3333333335, ans=0.0 2023-10-09 16:29:27,535 INFO [train.py:1031] (1/4) Epoch 14, batch 17650, loss[loss=0.1989, simple_loss=0.2681, pruned_loss=0.04829, ctc_loss=0.08295, over 16727.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2904, pruned_loss=0.06318, ctc_loss=0.1117, over 3291322.27 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:29:39,207 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2811162.6666666665, ans=0.1 2023-10-09 16:29:59,955 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2811209.3333333335, ans=0.125 2023-10-09 16:30:06,602 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2811256.0, ans=0.125 2023-10-09 16:30:09,426 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2811256.0, ans=0.0 2023-10-09 16:30:17,961 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.137e+02 2.983e+02 3.277e+02 4.147e+02 6.506e+02, threshold=6.554e+02, percent-clipped=0.0 2023-10-09 16:30:31,455 INFO [train.py:1031] (1/4) Epoch 14, batch 17700, loss[loss=0.258, simple_loss=0.3149, pruned_loss=0.0732, ctc_loss=0.1367, over 16548.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2917, pruned_loss=0.06061, ctc_loss=0.1081, over 3287598.29 frames. ], batch size: 351, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:30:33,961 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2023-10-09 16:30:36,581 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.20 vs. 
limit=10.0 2023-10-09 16:30:56,776 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2811442.6666666665, ans=0.125 2023-10-09 16:30:59,437 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2811442.6666666665, ans=0.125 2023-10-09 16:31:03,299 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2811442.6666666665, ans=0.2 2023-10-09 16:31:35,982 INFO [train.py:1031] (1/4) Epoch 14, batch 17750, loss[loss=0.2734, simple_loss=0.3549, pruned_loss=0.06859, ctc_loss=0.1366, over 16447.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2895, pruned_loss=0.05916, ctc_loss=0.106, over 3295263.95 frames. ], batch size: 416, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:31:42,648 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.33 vs. limit=15.0 2023-10-09 16:31:53,922 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.30 vs. limit=15.0 2023-10-09 16:31:55,440 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2811629.3333333335, ans=0.125 2023-10-09 16:31:55,498 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2811629.3333333335, ans=0.125 2023-10-09 16:32:04,975 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0 2023-10-09 16:32:06,895 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2811676.0, ans=0.125 2023-10-09 16:32:07,361 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2023-10-09 16:32:11,776 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2811676.0, ans=0.125 2023-10-09 16:32:26,659 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.329e+02 3.107e+02 3.479e+02 4.054e+02 7.691e+02, threshold=6.958e+02, percent-clipped=4.0 2023-10-09 16:32:39,790 INFO [train.py:1031] (1/4) Epoch 14, batch 17800, loss[loss=0.1809, simple_loss=0.2455, pruned_loss=0.04288, ctc_loss=0.0763, over 16832.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2917, pruned_loss=0.05726, ctc_loss=0.1034, over 3293111.34 frames. ], batch size: 141, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:32:54,391 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2023-10-09 16:32:59,043 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2811862.6666666665, ans=0.0 2023-10-09 16:33:24,658 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2811956.0, ans=0.2 2023-10-09 16:33:34,114 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.69 vs. 
limit=15.0 2023-10-09 16:33:35,167 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2812002.6666666665, ans=0.0 2023-10-09 16:33:41,485 INFO [train.py:1031] (1/4) Epoch 14, batch 17850, loss[loss=0.23, simple_loss=0.2775, pruned_loss=0.06746, ctc_loss=0.119, over 16826.00 frames. ], tot_loss[loss=0.2197, simple_loss=0.2865, pruned_loss=0.05619, ctc_loss=0.1016, over 3289938.16 frames. ], batch size: 273, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:34:19,488 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2812189.3333333335, ans=0.09899494936611666 2023-10-09 16:34:32,668 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.983e+02 3.516e+02 4.147e+02 7.275e+02, threshold=7.033e+02, percent-clipped=1.0 2023-10-09 16:34:43,848 INFO [train.py:1031] (1/4) Epoch 14, batch 17900, loss[loss=0.2338, simple_loss=0.2729, pruned_loss=0.07326, ctc_loss=0.1204, over 16701.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.28, pruned_loss=0.05762, ctc_loss=0.1033, over 3289142.98 frames. ], batch size: 130, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:34:48,702 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2023-10-09 16:34:51,620 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2812282.6666666665, ans=0.125 2023-10-09 16:35:03,785 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2812329.3333333335, ans=0.0 2023-10-09 16:35:03,835 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2812329.3333333335, ans=0.2 2023-10-09 16:35:37,728 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:35:43,154 INFO [train.py:1031] (1/4) Epoch 14, batch 17950, loss[loss=0.239, simple_loss=0.295, pruned_loss=0.06783, ctc_loss=0.118, over 16999.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2794, pruned_loss=0.05984, ctc_loss=0.1063, over 3293764.51 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:36:13,097 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:36:21,572 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2023-10-09 16:36:37,271 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.051e+02 3.955e+02 4.561e+02 5.519e+02 1.023e+03, threshold=9.123e+02, percent-clipped=10.0 2023-10-09 16:36:47,028 INFO [train.py:1031] (1/4) Epoch 14, batch 18000, loss[loss=0.2407, simple_loss=0.2902, pruned_loss=0.07122, ctc_loss=0.1218, over 16733.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2845, pruned_loss=0.0635, ctc_loss=0.112, over 3301086.75 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:36:47,028 INFO [train.py:1054] (1/4) Computing validation loss 2023-10-09 16:37:05,079 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2359, simple_loss=0.3042, pruned_loss=0.06468, ctc_loss=0.09589, over 1796401.00 frames. 
2023-10-09 16:37:05,080 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14563MB 2023-10-09 16:37:12,234 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:37:14,502 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2812749.3333333335, ans=0.125 2023-10-09 16:37:15,402 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=15.0 2023-10-09 16:37:22,566 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2023-10-09 16:37:32,747 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2812842.6666666665, ans=0.125 2023-10-09 16:37:52,459 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2812889.3333333335, ans=0.125 2023-10-09 16:38:10,374 INFO [train.py:1031] (1/4) Epoch 14, batch 18050, loss[loss=0.2394, simple_loss=0.3178, pruned_loss=0.05929, ctc_loss=0.1058, over 16780.00 frames. ], tot_loss[loss=0.2321, simple_loss=0.2885, pruned_loss=0.06496, ctc_loss=0.1144, over 3289245.91 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:38:16,611 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0 2023-10-09 16:38:17,526 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=22.5 2023-10-09 16:38:29,915 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2813029.3333333335, ans=0.1 2023-10-09 16:38:35,024 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2813076.0, ans=0.1 2023-10-09 16:38:59,369 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2813122.6666666665, ans=0.0 2023-10-09 16:39:06,174 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+02 3.447e+02 3.987e+02 5.015e+02 1.069e+03, threshold=7.973e+02, percent-clipped=1.0 2023-10-09 16:39:12,696 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2813169.3333333335, ans=0.0 2023-10-09 16:39:14,533 INFO [train.py:1031] (1/4) Epoch 14, batch 18100, loss[loss=0.232, simple_loss=0.339, pruned_loss=0.04475, ctc_loss=0.08859, over 15144.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2913, pruned_loss=0.06255, ctc_loss=0.1107, over 3281094.53 frames. ], batch size: 526, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:39:50,679 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2813309.3333333335, ans=0.125 2023-10-09 16:39:55,435 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2813356.0, ans=0.0 2023-10-09 16:40:06,286 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.38 vs. 
limit=15.0 2023-10-09 16:40:16,840 INFO [train.py:1031] (1/4) Epoch 14, batch 18150, loss[loss=0.2489, simple_loss=0.287, pruned_loss=0.07759, ctc_loss=0.1391, over 16461.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2889, pruned_loss=0.06175, ctc_loss=0.109, over 3281311.36 frames. ], batch size: 351, lr: 2.56e-03, grad_scale: 1.0 2023-10-09 16:40:23,722 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2813449.3333333335, ans=0.125 2023-10-09 16:40:47,365 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=15.0 2023-10-09 16:41:12,481 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+02 3.202e+02 3.701e+02 4.396e+02 8.361e+02, threshold=7.403e+02, percent-clipped=2.0 2023-10-09 16:41:18,334 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2813682.6666666665, ans=0.125 2023-10-09 16:41:19,063 INFO [train.py:1031] (1/4) Epoch 14, batch 18200, loss[loss=0.2139, simple_loss=0.269, pruned_loss=0.05973, ctc_loss=0.09835, over 16869.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2826, pruned_loss=0.06205, ctc_loss=0.1091, over 3277915.15 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:41:21,665 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:41:34,741 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2813729.3333333335, ans=0.05 2023-10-09 16:41:36,845 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2813729.3333333335, ans=0.2 2023-10-09 16:42:16,479 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:42:21,165 INFO [train.py:1031] (1/4) Epoch 14, batch 18250, loss[loss=0.1548, simple_loss=0.2241, pruned_loss=0.03193, ctc_loss=0.05416, over 16731.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.2742, pruned_loss=0.05799, ctc_loss=0.102, over 3281832.60 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:42:32,800 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2813962.6666666665, ans=0.125 2023-10-09 16:42:32,866 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2813962.6666666665, ans=0.1 2023-10-09 16:42:52,264 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2814009.3333333335, ans=0.0 2023-10-09 16:42:56,168 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2814009.3333333335, ans=0.1 2023-10-09 16:43:00,216 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.63 vs. 
limit=15.0 2023-10-09 16:43:04,465 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2814056.0, ans=0.04949747468305833 2023-10-09 16:43:07,190 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2023-10-09 16:43:07,793 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2814056.0, ans=0.2 2023-10-09 16:43:14,349 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2814102.6666666665, ans=0.2 2023-10-09 16:43:16,734 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.801e+02 3.276e+02 4.033e+02 6.396e+02, threshold=6.552e+02, percent-clipped=0.0 2023-10-09 16:43:17,118 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2814102.6666666665, ans=0.125 2023-10-09 16:43:22,461 INFO [train.py:1031] (1/4) Epoch 14, batch 18300, loss[loss=0.2387, simple_loss=0.3133, pruned_loss=0.05963, ctc_loss=0.1122, over 16804.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2688, pruned_loss=0.05393, ctc_loss=0.0951, over 3291633.05 frames. ], batch size: 328, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:43:30,511 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2814149.3333333335, ans=0.1 2023-10-09 16:43:30,685 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=22.5 2023-10-09 16:43:38,488 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2023-10-09 16:43:52,768 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-10-09 16:43:55,755 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2814242.6666666665, ans=0.2 2023-10-09 16:43:58,470 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2814242.6666666665, ans=0.125 2023-10-09 16:44:11,177 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=22.5 2023-10-09 16:44:17,560 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2814336.0, ans=0.1 2023-10-09 16:44:25,852 INFO [train.py:1031] (1/4) Epoch 14, batch 18350, loss[loss=0.1959, simple_loss=0.267, pruned_loss=0.04607, ctc_loss=0.08174, over 16760.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.2735, pruned_loss=0.05424, ctc_loss=0.09582, over 3300802.29 frames. ], batch size: 130, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:44:38,077 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.48 vs. 
limit=15.0 2023-10-09 16:44:38,719 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2814429.3333333335, ans=0.0 2023-10-09 16:44:56,276 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2814476.0, ans=0.0 2023-10-09 16:44:58,778 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=12.0 2023-10-09 16:45:00,609 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2814476.0, ans=0.0 2023-10-09 16:45:18,693 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2814569.3333333335, ans=0.015 2023-10-09 16:45:22,233 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+02 3.059e+02 3.585e+02 4.224e+02 7.359e+02, threshold=7.170e+02, percent-clipped=2.0 2023-10-09 16:45:26,919 INFO [train.py:1031] (1/4) Epoch 14, batch 18400, loss[loss=0.2633, simple_loss=0.3165, pruned_loss=0.07801, ctc_loss=0.1353, over 16857.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2823, pruned_loss=0.05791, ctc_loss=0.1024, over 3296674.23 frames. ], batch size: 328, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:45:29,313 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2814616.0, ans=0.0 2023-10-09 16:45:29,781 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2023-10-09 16:45:49,668 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2814709.3333333335, ans=0.125 2023-10-09 16:46:17,142 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.60 vs. limit=10.0 2023-10-09 16:46:27,839 INFO [train.py:1031] (1/4) Epoch 14, batch 18450, loss[loss=0.2065, simple_loss=0.2518, pruned_loss=0.06119, ctc_loss=0.09724, over 12400.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2841, pruned_loss=0.06127, ctc_loss=0.1078, over 3303363.39 frames. ], batch size: 35, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:46:39,418 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2814896.0, ans=0.125 2023-10-09 16:46:44,090 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=2814896.0, ans=0.02 2023-10-09 16:46:50,634 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.87 vs. 
limit=10.0 2023-10-09 16:47:01,423 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2814942.6666666665, ans=0.0 2023-10-09 16:47:06,024 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2814989.3333333335, ans=0.2 2023-10-09 16:47:13,387 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2814989.3333333335, ans=0.125 2023-10-09 16:47:24,334 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=12.0 2023-10-09 16:47:26,503 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.678e+02 3.308e+02 3.613e+02 4.264e+02 6.985e+02, threshold=7.226e+02, percent-clipped=0.0 2023-10-09 16:47:30,870 INFO [train.py:1031] (1/4) Epoch 14, batch 18500, loss[loss=0.2203, simple_loss=0.2807, pruned_loss=0.05749, ctc_loss=0.1121, over 16684.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2851, pruned_loss=0.06309, ctc_loss=0.1108, over 3306466.73 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:47:53,652 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2815176.0, ans=0.125 2023-10-09 16:47:54,722 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2815176.0, ans=0.125 2023-10-09 16:48:15,503 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.44 vs. limit=10.0 2023-10-09 16:48:20,084 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2815269.3333333335, ans=0.1 2023-10-09 16:48:31,666 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=15.0 2023-10-09 16:48:32,684 INFO [train.py:1031] (1/4) Epoch 14, batch 18550, loss[loss=0.2609, simple_loss=0.3209, pruned_loss=0.07547, ctc_loss=0.1247, over 16765.00 frames. ], tot_loss[loss=0.234, simple_loss=0.29, pruned_loss=0.06599, ctc_loss=0.1153, over 3309465.31 frames. ], batch size: 95, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:48:40,208 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2815316.0, ans=0.125 2023-10-09 16:48:44,602 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2815362.6666666665, ans=0.035 2023-10-09 16:49:14,603 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2023-10-09 16:49:21,307 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2815456.0, ans=0.125 2023-10-09 16:49:26,013 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2815502.6666666665, ans=0.07 2023-10-09 16:49:27,300 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.98 vs. 
limit=15.0 2023-10-09 16:49:34,435 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.785e+02 3.368e+02 3.936e+02 4.731e+02 1.128e+03, threshold=7.872e+02, percent-clipped=2.0 2023-10-09 16:49:36,579 INFO [train.py:1031] (1/4) Epoch 14, batch 18600, loss[loss=0.2646, simple_loss=0.3495, pruned_loss=0.06473, ctc_loss=0.1256, over 16911.00 frames. ], tot_loss[loss=0.241, simple_loss=0.2989, pruned_loss=0.06783, ctc_loss=0.1189, over 3305189.46 frames. ], batch size: 258, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:49:47,142 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2815549.3333333335, ans=0.0 2023-10-09 16:49:54,978 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2815596.0, ans=0.2 2023-10-09 16:50:02,853 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0 2023-10-09 16:50:03,637 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2815642.6666666665, ans=0.0 2023-10-09 16:50:13,461 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2815642.6666666665, ans=0.025 2023-10-09 16:50:24,004 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=12.0 2023-10-09 16:50:41,211 INFO [train.py:1031] (1/4) Epoch 14, batch 18650, loss[loss=0.2702, simple_loss=0.3204, pruned_loss=0.08065, ctc_loss=0.147, over 16824.00 frames. ], tot_loss[loss=0.2462, simple_loss=0.305, pruned_loss=0.06933, ctc_loss=0.1219, over 3310247.94 frames. ], batch size: 309, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:50:41,561 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2815782.6666666665, ans=0.0 2023-10-09 16:50:47,705 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2815782.6666666665, ans=0.125 2023-10-09 16:50:47,763 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2815782.6666666665, ans=0.2 2023-10-09 16:50:48,972 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0 2023-10-09 16:50:49,642 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2815782.6666666665, ans=0.0 2023-10-09 16:51:25,195 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2815922.6666666665, ans=0.125 2023-10-09 16:51:26,939 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2815922.6666666665, ans=10.0 2023-10-09 16:51:41,576 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.792e+02 3.349e+02 3.828e+02 4.485e+02 8.259e+02, threshold=7.655e+02, percent-clipped=2.0 2023-10-09 16:51:43,744 INFO [train.py:1031] (1/4) Epoch 14, batch 18700, loss[loss=0.2342, simple_loss=0.31, pruned_loss=0.05673, ctc_loss=0.1122, over 16774.00 frames. 
], tot_loss[loss=0.2458, simple_loss=0.3035, pruned_loss=0.06955, ctc_loss=0.1223, over 3306396.68 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 8.0 2023-10-09 16:52:14,930 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2816109.3333333335, ans=0.125 2023-10-09 16:52:19,526 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:52:30,555 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:52:46,841 INFO [train.py:1031] (1/4) Epoch 14, batch 18750, loss[loss=0.1923, simple_loss=0.2506, pruned_loss=0.05024, ctc_loss=0.08379, over 16716.00 frames. ], tot_loss[loss=0.2421, simple_loss=0.3032, pruned_loss=0.0669, ctc_loss=0.1181, over 3308009.99 frames. ], batch size: 130, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:53:06,907 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.04 vs. limit=15.0 2023-10-09 16:53:12,634 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2816342.6666666665, ans=0.07 2023-10-09 16:53:19,533 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5 2023-10-09 16:53:39,889 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2816436.0, ans=0.125 2023-10-09 16:53:48,759 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 2.936e+02 3.595e+02 4.298e+02 1.016e+03, threshold=7.191e+02, percent-clipped=2.0 2023-10-09 16:53:48,786 INFO [train.py:1031] (1/4) Epoch 14, batch 18800, loss[loss=0.1975, simple_loss=0.2526, pruned_loss=0.05326, ctc_loss=0.08948, over 16797.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2957, pruned_loss=0.06269, ctc_loss=0.1109, over 3301591.93 frames. ], batch size: 141, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:54:03,519 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2816529.3333333335, ans=0.125 2023-10-09 16:54:25,517 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2816622.6666666665, ans=0.0 2023-10-09 16:54:35,174 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2023-10-09 16:54:47,964 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2023-10-09 16:54:48,903 INFO [train.py:1031] (1/4) Epoch 14, batch 18850, loss[loss=0.2047, simple_loss=0.2649, pruned_loss=0.05283, ctc_loss=0.0972, over 16841.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2916, pruned_loss=0.06242, ctc_loss=0.1105, over 3303692.16 frames. 
], batch size: 176, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:54:58,172 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2816716.0, ans=0.125 2023-10-09 16:55:17,020 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2023-10-09 16:55:20,099 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2816809.3333333335, ans=0.0 2023-10-09 16:55:22,832 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2816809.3333333335, ans=0.2 2023-10-09 16:55:49,881 INFO [train.py:1031] (1/4) Epoch 14, batch 18900, loss[loss=0.2453, simple_loss=0.2961, pruned_loss=0.07205, ctc_loss=0.1259, over 16818.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2908, pruned_loss=0.0645, ctc_loss=0.1137, over 3311668.09 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:55:53,222 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+02 3.158e+02 3.575e+02 4.091e+02 5.831e+02, threshold=7.150e+02, percent-clipped=0.0 2023-10-09 16:56:03,054 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2023-10-09 16:56:08,828 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2816996.0, ans=0.125 2023-10-09 16:56:14,365 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2817042.6666666665, ans=0.125 2023-10-09 16:56:16,185 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2817042.6666666665, ans=0.125 2023-10-09 16:56:41,909 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:56:53,517 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2817182.6666666665, ans=0.125 2023-10-09 16:56:54,182 INFO [train.py:1031] (1/4) Epoch 14, batch 18950, loss[loss=0.3154, simple_loss=0.3466, pruned_loss=0.1047, ctc_loss=0.1868, over 16699.00 frames. ], tot_loss[loss=0.237, simple_loss=0.2934, pruned_loss=0.06667, ctc_loss=0.1178, over 3314164.23 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:57:22,841 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2817276.0, ans=0.1 2023-10-09 16:57:39,004 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2023-10-09 16:57:40,883 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2817322.6666666665, ans=0.2 2023-10-09 16:57:49,685 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.30 vs. 
limit=22.5 2023-10-09 16:57:50,375 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2817369.3333333335, ans=0.05 2023-10-09 16:57:55,565 INFO [train.py:1031] (1/4) Epoch 14, batch 19000, loss[loss=0.2207, simple_loss=0.2804, pruned_loss=0.06062, ctc_loss=0.0997, over 16904.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2906, pruned_loss=0.06447, ctc_loss=0.1139, over 3309997.79 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:57:57,267 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.06 vs. limit=6.0 2023-10-09 16:57:58,302 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+02 3.278e+02 3.626e+02 4.352e+02 8.941e+02, threshold=7.252e+02, percent-clipped=2.0 2023-10-09 16:58:07,899 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0 2023-10-09 16:58:27,360 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2817509.3333333335, ans=0.125 2023-10-09 16:58:35,564 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2817556.0, ans=0.1 2023-10-09 16:58:44,162 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2817602.6666666665, ans=0.125 2023-10-09 16:58:46,860 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2817602.6666666665, ans=0.1 2023-10-09 16:58:57,912 INFO [train.py:1031] (1/4) Epoch 14, batch 19050, loss[loss=0.2318, simple_loss=0.2887, pruned_loss=0.06556, ctc_loss=0.1095, over 12469.00 frames. ], tot_loss[loss=0.2321, simple_loss=0.2885, pruned_loss=0.06494, ctc_loss=0.1147, over 3303045.08 frames. ], batch size: 35, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:59:05,840 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2817649.3333333335, ans=0.2 2023-10-09 16:59:22,978 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2817742.6666666665, ans=0.2 2023-10-09 16:59:26,897 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2817742.6666666665, ans=0.125 2023-10-09 16:59:49,843 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2023-10-09 17:00:00,798 INFO [train.py:1031] (1/4) Epoch 14, batch 19100, loss[loss=0.2444, simple_loss=0.291, pruned_loss=0.07262, ctc_loss=0.1316, over 16960.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2903, pruned_loss=0.06689, ctc_loss=0.1177, over 3306263.08 frames. ], batch size: 259, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:00:04,649 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.758e+02 3.450e+02 4.008e+02 4.699e+02 1.096e+03, threshold=8.015e+02, percent-clipped=2.0 2023-10-09 17:00:17,292 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.38 vs. 
limit=15.0 2023-10-09 17:00:45,332 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2023-10-09 17:00:52,227 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2818069.3333333335, ans=0.125 2023-10-09 17:01:02,243 INFO [train.py:1031] (1/4) Epoch 14, batch 19150, loss[loss=0.2064, simple_loss=0.2839, pruned_loss=0.04807, ctc_loss=0.08176, over 16812.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.289, pruned_loss=0.06432, ctc_loss=0.1135, over 3295417.85 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:01:02,876 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2023-10-09 17:01:03,568 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2818116.0, ans=0.125 2023-10-09 17:01:14,793 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2818162.6666666665, ans=0.2 2023-10-09 17:01:18,167 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2818162.6666666665, ans=0.0 2023-10-09 17:01:23,014 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2818162.6666666665, ans=0.125 2023-10-09 17:01:52,527 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:02:06,624 INFO [train.py:1031] (1/4) Epoch 14, batch 19200, loss[loss=0.1877, simple_loss=0.2531, pruned_loss=0.04534, ctc_loss=0.07926, over 16786.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2877, pruned_loss=0.06176, ctc_loss=0.1096, over 3293654.22 frames. ], batch size: 141, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:02:12,383 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+02 3.095e+02 3.707e+02 4.645e+02 1.379e+03, threshold=7.414e+02, percent-clipped=4.0 2023-10-09 17:02:12,735 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2818349.3333333335, ans=0.5 2023-10-09 17:02:36,700 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2818442.6666666665, ans=0.125 2023-10-09 17:02:37,791 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2818442.6666666665, ans=0.125 2023-10-09 17:02:53,427 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0 2023-10-09 17:02:54,671 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.96 vs. limit=22.5 2023-10-09 17:02:56,319 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=22.5 2023-10-09 17:03:09,813 INFO [train.py:1031] (1/4) Epoch 14, batch 19250, loss[loss=0.273, simple_loss=0.3506, pruned_loss=0.07039, ctc_loss=0.1365, over 15178.00 frames. 
], tot_loss[loss=0.2261, simple_loss=0.2874, pruned_loss=0.0607, ctc_loss=0.1084, over 3294627.47 frames. ], batch size: 527, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:03:25,259 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:03:48,235 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=22.5 2023-10-09 17:04:05,149 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2818769.3333333335, ans=0.1 2023-10-09 17:04:15,645 INFO [train.py:1031] (1/4) Epoch 14, batch 19300, loss[loss=0.2578, simple_loss=0.3163, pruned_loss=0.07279, ctc_loss=0.1342, over 16817.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2874, pruned_loss=0.06116, ctc_loss=0.1093, over 3283902.73 frames. ], batch size: 309, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:04:15,969 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2818816.0, ans=0.0 2023-10-09 17:04:23,495 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2818816.0, ans=0.0 2023-10-09 17:04:24,136 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.286e+02 3.972e+02 4.950e+02 6.905e+02, threshold=7.944e+02, percent-clipped=0.0 2023-10-09 17:04:25,721 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2818816.0, ans=0.2 2023-10-09 17:04:25,748 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2818816.0, ans=0.0 2023-10-09 17:04:35,259 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2023-10-09 17:05:16,629 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2819002.6666666665, ans=0.125 2023-10-09 17:05:17,940 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2023-10-09 17:05:18,465 INFO [train.py:1031] (1/4) Epoch 14, batch 19350, loss[loss=0.1539, simple_loss=0.226, pruned_loss=0.0303, ctc_loss=0.05306, over 16711.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2863, pruned_loss=0.06098, ctc_loss=0.1085, over 3289080.44 frames. ], batch size: 140, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:05:24,237 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2819049.3333333335, ans=0.125 2023-10-09 17:05:26,976 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2819049.3333333335, ans=0.0 2023-10-09 17:05:27,236 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.35 vs. limit=10.0 2023-10-09 17:05:31,590 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.39 vs. 
limit=10.0 2023-10-09 17:05:47,254 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.91 vs. limit=10.0 2023-10-09 17:05:47,760 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2819142.6666666665, ans=0.125 2023-10-09 17:06:08,083 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2819236.0, ans=10.0 2023-10-09 17:06:18,200 INFO [train.py:1031] (1/4) Epoch 14, batch 19400, loss[loss=0.196, simple_loss=0.2626, pruned_loss=0.04802, ctc_loss=0.08334, over 16890.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2797, pruned_loss=0.05759, ctc_loss=0.1027, over 3300197.08 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:06:25,712 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.971e+02 3.609e+02 4.450e+02 6.456e+02, threshold=7.218e+02, percent-clipped=0.0 2023-10-09 17:06:49,110 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2819376.0, ans=0.125 2023-10-09 17:07:19,289 INFO [train.py:1031] (1/4) Epoch 14, batch 19450, loss[loss=0.2443, simple_loss=0.3235, pruned_loss=0.05993, ctc_loss=0.113, over 16935.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2787, pruned_loss=0.05908, ctc_loss=0.1052, over 3301475.97 frames. ], batch size: 309, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:07:19,599 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2819516.0, ans=0.125 2023-10-09 17:07:19,640 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2819516.0, ans=0.125 2023-10-09 17:08:07,425 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2819656.0, ans=0.0 2023-10-09 17:08:09,437 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2819702.6666666665, ans=0.125 2023-10-09 17:08:18,568 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.99 vs. limit=10.0 2023-10-09 17:08:19,327 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2819702.6666666665, ans=0.0 2023-10-09 17:08:21,534 INFO [train.py:1031] (1/4) Epoch 14, batch 19500, loss[loss=0.2235, simple_loss=0.2735, pruned_loss=0.06529, ctc_loss=0.107, over 16679.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.2817, pruned_loss=0.05915, ctc_loss=0.1053, over 3306907.14 frames. 
], batch size: 130, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:08:24,042 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819749.3333333335, ans=0.1 2023-10-09 17:08:28,408 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2819749.3333333335, ans=0.125 2023-10-09 17:08:31,248 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 3.015e+02 3.593e+02 4.173e+02 8.054e+02, threshold=7.186e+02, percent-clipped=2.0 2023-10-09 17:08:40,642 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.33 vs. limit=15.0 2023-10-09 17:08:44,479 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2819842.6666666665, ans=0.0 2023-10-09 17:09:21,238 INFO [train.py:1031] (1/4) Epoch 14, batch 19550, loss[loss=0.3181, simple_loss=0.3445, pruned_loss=0.1069, ctc_loss=0.1948, over 16739.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2864, pruned_loss=0.06233, ctc_loss=0.1106, over 3309286.63 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:09:38,523 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2820029.3333333335, ans=0.125 2023-10-09 17:10:04,830 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2820122.6666666665, ans=0.1 2023-10-09 17:10:10,605 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2820169.3333333335, ans=0.125 2023-10-09 17:10:17,512 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2820169.3333333335, ans=0.1 2023-10-09 17:10:24,084 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2820216.0, ans=0.0 2023-10-09 17:10:24,798 INFO [train.py:1031] (1/4) Epoch 14, batch 19600, loss[loss=0.2273, simple_loss=0.2972, pruned_loss=0.058, ctc_loss=0.1033, over 16870.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2835, pruned_loss=0.06177, ctc_loss=0.1097, over 3297732.82 frames. ], batch size: 310, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:10:26,119 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2820216.0, ans=0.125 2023-10-09 17:10:35,225 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.075e+02 3.430e+02 4.007e+02 6.363e+02, threshold=6.860e+02, percent-clipped=0.0 2023-10-09 17:10:47,003 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2820262.6666666665, ans=0.1 2023-10-09 17:11:03,535 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2820356.0, ans=0.05 2023-10-09 17:11:11,964 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2820356.0, ans=0.125 2023-10-09 17:11:13,965 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.13 vs. 
limit=22.5 2023-10-09 17:11:18,612 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2820402.6666666665, ans=0.0 2023-10-09 17:11:28,205 INFO [train.py:1031] (1/4) Epoch 14, batch 19650, loss[loss=0.2464, simple_loss=0.2962, pruned_loss=0.073, ctc_loss=0.1266, over 16870.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2854, pruned_loss=0.06307, ctc_loss=0.1117, over 3302891.93 frames. ], batch size: 258, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:12:12,734 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2023-10-09 17:12:26,869 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2023-10-09 17:12:30,820 INFO [train.py:1031] (1/4) Epoch 14, batch 19700, loss[loss=0.2482, simple_loss=0.2971, pruned_loss=0.07667, ctc_loss=0.1149, over 11000.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2854, pruned_loss=0.06543, ctc_loss=0.1155, over 3286620.49 frames. ], batch size: 35, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:12:42,625 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.670e+02 3.372e+02 3.843e+02 4.494e+02 9.285e+02, threshold=7.687e+02, percent-clipped=3.0 2023-10-09 17:12:52,056 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2820729.3333333335, ans=0.125 2023-10-09 17:12:53,526 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2023-10-09 17:13:31,668 INFO [train.py:1031] (1/4) Epoch 14, batch 19750, loss[loss=0.196, simple_loss=0.2798, pruned_loss=0.03999, ctc_loss=0.08051, over 16748.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2869, pruned_loss=0.06437, ctc_loss=0.1141, over 3292455.52 frames. ], batch size: 201, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:13:49,063 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=22.5 2023-10-09 17:13:54,954 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2820962.6666666665, ans=0.125 2023-10-09 17:14:00,711 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2821009.3333333335, ans=0.07 2023-10-09 17:14:02,245 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.06 vs. limit=22.5 2023-10-09 17:14:04,855 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2821009.3333333335, ans=0.125 2023-10-09 17:14:14,929 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2821056.0, ans=0.125 2023-10-09 17:14:34,797 INFO [train.py:1031] (1/4) Epoch 14, batch 19800, loss[loss=0.2436, simple_loss=0.3079, pruned_loss=0.06651, ctc_loss=0.116, over 16745.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2907, pruned_loss=0.06513, ctc_loss=0.1153, over 3296218.35 frames. 
2023-10-09 17:14:43,256 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0
2023-10-09 17:14:45,054 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2821149.3333333335, ans=0.125
2023-10-09 17:14:47,456 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.293e+02 3.299e+02 3.756e+02 4.593e+02 7.524e+02, threshold=7.512e+02, percent-clipped=0.0
2023-10-09 17:15:02,295 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=15.0
2023-10-09 17:15:14,619 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2821289.3333333335, ans=0.125
2023-10-09 17:15:17,464 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2821289.3333333335, ans=0.2
2023-10-09 17:15:25,044 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2821336.0, ans=0.125
2023-10-09 17:15:30,539 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2821336.0, ans=0.0
2023-10-09 17:15:38,927 INFO [train.py:1031] (1/4) Epoch 14, batch 19850, loss[loss=0.2413, simple_loss=0.2944, pruned_loss=0.0703, ctc_loss=0.1187, over 16834.00 frames. ], tot_loss[loss=0.2379, simple_loss=0.2939, pruned_loss=0.06725, ctc_loss=0.1185, over 3302314.60 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:15:57,725 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2821429.3333333335, ans=0.0
2023-10-09 17:16:24,704 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2821522.6666666665, ans=0.0
2023-10-09 17:16:25,811 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2821522.6666666665, ans=0.0
2023-10-09 17:16:32,248 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=21.37 vs. limit=15.0
2023-10-09 17:16:39,917 INFO [train.py:1031] (1/4) Epoch 14, batch 19900, loss[loss=0.2342, simple_loss=0.2885, pruned_loss=0.0676, ctc_loss=0.1116, over 16844.00 frames. ], tot_loss[loss=0.2416, simple_loss=0.2965, pruned_loss=0.06897, ctc_loss=0.1215, over 3295058.83 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 4.0
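Each optim.py line summarizes the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max). In every entry in this section the printed threshold is exactly Clipping_scale times the median, e.g. 2.0 x 3.756e+02 = 7.512e+02 just above, and percent-clipped is the fraction of recent batches whose norm exceeded it. A sketch of that bookkeeping under those assumptions; the window size and function name are illustrative, not icefall's exact optim.py code:

    import torch

    # Sketch only: clip at clipping_scale * median of recently seen grad norms.
    def clip_by_recent_median(model, recent_norms, clipping_scale=2.0, window=200):
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        recent_norms.append(norm)
        del recent_norms[:-window]                      # keep a sliding window
        t = torch.tensor(recent_norms)
        q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()        # 2.0 * median, as logged
        if norm > threshold:                            # counted as "clipped"
            for g in grads:
                g.mul_(threshold / norm)
        return q.tolist(), threshold, norm > threshold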
2023-10-09 17:16:51,470 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2821662.6666666665, ans=0.2
2023-10-09 17:16:54,365 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.744e+02 3.692e+02 4.204e+02 4.980e+02 8.655e+02, threshold=8.408e+02, percent-clipped=2.0
2023-10-09 17:17:09,801 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2821709.3333333335, ans=0.125
2023-10-09 17:17:22,607 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2821756.0, ans=0.125
2023-10-09 17:17:39,575 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2821802.6666666665, ans=0.2
2023-10-09 17:17:40,567 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2821849.3333333335, ans=0.125
2023-10-09 17:17:41,865 INFO [train.py:1031] (1/4) Epoch 14, batch 19950, loss[loss=0.2269, simple_loss=0.2849, pruned_loss=0.06376, ctc_loss=0.1033, over 16812.00 frames. ], tot_loss[loss=0.2421, simple_loss=0.2952, pruned_loss=0.06999, ctc_loss=0.1226, over 3299462.39 frames. ], batch size: 164, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:17:54,157 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2821896.0, ans=0.125
2023-10-09 17:17:55,347 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0
2023-10-09 17:17:55,354 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.52 vs. limit=10.0
2023-10-09 17:18:23,247 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0
2023-10-09 17:18:42,975 INFO [train.py:1031] (1/4) Epoch 14, batch 20000, loss[loss=0.3134, simple_loss=0.3362, pruned_loss=0.107, ctc_loss=0.1913, over 16676.00 frames. ], tot_loss[loss=0.2449, simple_loss=0.2971, pruned_loss=0.07144, ctc_loss=0.1249, over 3294488.47 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:18:43,532 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0
2023-10-09 17:18:58,234 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.512e+02 3.390e+02 3.727e+02 4.517e+02 8.491e+02, threshold=7.454e+02, percent-clipped=1.0
2023-10-09 17:18:58,579 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2822129.3333333335, ans=0.125
2023-10-09 17:19:09,144 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=22.5
2023-10-09 17:19:13,730 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2822176.0, ans=0.125
2023-10-09 17:19:18,307 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2822176.0, ans=0.125
2023-10-09 17:19:36,375 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:19:44,644 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2822269.3333333335, ans=0.5
2023-10-09 17:19:46,370 INFO [train.py:1031] (1/4) Epoch 14, batch 20050, loss[loss=0.1785, simple_loss=0.2271, pruned_loss=0.0488, ctc_loss=0.08048, over 16778.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.2889, pruned_loss=0.06913, ctc_loss=0.1201, over 3285226.90 frames. ], batch size: 140, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:20:07,438 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2822362.6666666665, ans=0.125
2023-10-09 17:20:28,654 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2822456.0, ans=0.0
2023-10-09 17:20:44,789 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2822502.6666666665, ans=0.0
2023-10-09 17:20:46,599 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2822502.6666666665, ans=0.1
2023-10-09 17:20:50,089 INFO [train.py:1031] (1/4) Epoch 14, batch 20100, loss[loss=0.2475, simple_loss=0.3036, pruned_loss=0.0696, ctc_loss=0.1304, over 15214.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2819, pruned_loss=0.06657, ctc_loss=0.1152, over 3278165.90 frames. ], batch size: 527, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:20:52,178 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2822549.3333333335, ans=0.1
2023-10-09 17:21:04,304 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=12.0
2023-10-09 17:21:07,903 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+02 3.352e+02 3.979e+02 4.568e+02 7.750e+02, threshold=7.958e+02, percent-clipped=1.0
2023-10-09 17:21:13,269 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:21:16,691 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=22.5
2023-10-09 17:21:26,776 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2822642.6666666665, ans=0.0
2023-10-09 17:21:54,794 INFO [train.py:1031] (1/4) Epoch 14, batch 20150, loss[loss=0.3, simple_loss=0.377, pruned_loss=0.08124, ctc_loss=0.1512, over 16785.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2882, pruned_loss=0.0657, ctc_loss=0.1151, over 3279643.82 frames. ], batch size: 328, lr: 2.55e-03, grad_scale: 2.0
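The Whitening lines compare a per-module statistic against a (possibly scheduled) limit. One standard whiteness measure with this shape, and the one sketched below, is num_channels * tr(C^2) / tr(C)^2 for the channel covariance C of a group: it equals 1.0 when the covariance is proportional to the identity (perfectly "white") and grows as variance concentrates in a few directions. Whether this is exactly the metric in scaling.py is an assumption; the sketch is illustrative:

    import torch

    # Sketch: a whiteness metric that is 1.0 for an isotropic covariance and
    # grows as the eigenvalue spectrum becomes more uneven.
    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        n, c = x.shape                                   # (frames, channels)
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n                  # (groups, ch, ch)
        tr_cov = cov.diagonal(dim1=1, dim2=2).sum(dim=1)           # tr(C)
        tr_cov2 = (cov * cov.transpose(1, 2)).sum(dim=(1, 2))      # tr(C @ C)
        return (cov.shape[-1] * tr_cov2 / tr_cov**2).mean().item()

    # Near-white features sit close to 1; entries like "metric=21.37 vs.
    # limit=15.0" mark where an auxiliary whitening penalty would engage.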
2023-10-09 17:22:04,424 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:22:10,514 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2822829.3333333335, ans=0.125
2023-10-09 17:22:55,974 INFO [train.py:1031] (1/4) Epoch 14, batch 20200, loss[loss=0.2183, simple_loss=0.2769, pruned_loss=0.06092, ctc_loss=0.0946, over 16769.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2933, pruned_loss=0.06591, ctc_loss=0.1162, over 3289669.97 frames. ], batch size: 111, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:22:56,267 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2823016.0, ans=0.125
2023-10-09 17:22:58,471 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2823016.0, ans=0.1
2023-10-09 17:23:05,026 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2823016.0, ans=0.0
2023-10-09 17:23:07,188 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2823062.6666666665, ans=0.2
2023-10-09 17:23:12,687 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+02 3.410e+02 4.005e+02 4.580e+02 8.040e+02, threshold=8.011e+02, percent-clipped=1.0
2023-10-09 17:23:17,817 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=22.5
2023-10-09 17:23:44,360 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2823202.6666666665, ans=0.125
2023-10-09 17:23:48,272 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2823202.6666666665, ans=0.2
2023-10-09 17:23:55,008 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2823249.3333333335, ans=0.0
2023-10-09 17:23:55,833 INFO [train.py:1031] (1/4) Epoch 14, batch 20250, loss[loss=0.1962, simple_loss=0.2585, pruned_loss=0.0494, ctc_loss=0.08784, over 16684.00 frames. ], tot_loss[loss=0.2338, simple_loss=0.2907, pruned_loss=0.06538, ctc_loss=0.1153, over 3301448.12 frames. ], batch size: 151, lr: 2.55e-03, grad_scale: 1.0
2023-10-09 17:24:15,988 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:24:58,237 INFO [train.py:1031] (1/4) Epoch 14, batch 20300, loss[loss=0.2172, simple_loss=0.2652, pruned_loss=0.06197, ctc_loss=0.1129, over 16756.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2859, pruned_loss=0.06217, ctc_loss=0.1097, over 3297352.35 frames. ], batch size: 328, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:25:08,020 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0
2023-10-09 17:25:09,209 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2823482.6666666665, ans=0.1
2023-10-09 17:25:11,424 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2823529.3333333335, ans=0.125
2023-10-09 17:25:18,638 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.404e+02 3.144e+02 3.729e+02 4.448e+02 8.440e+02, threshold=7.458e+02, percent-clipped=1.0
2023-10-09 17:25:37,460 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.75 vs. limit=12.0
2023-10-09 17:25:43,613 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2823622.6666666665, ans=0.125
2023-10-09 17:26:00,039 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2823716.0, ans=0.1
2023-10-09 17:26:00,747 INFO [train.py:1031] (1/4) Epoch 14, batch 20350, loss[loss=0.2064, simple_loss=0.2517, pruned_loss=0.06051, ctc_loss=0.1002, over 16806.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.2789, pruned_loss=0.06164, ctc_loss=0.1087, over 3293456.76 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:26:13,697 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2823762.6666666665, ans=0.1
2023-10-09 17:26:21,208 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0
2023-10-09 17:26:26,747 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:26:28,292 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2823809.3333333335, ans=0.125
2023-10-09 17:26:28,333 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2823809.3333333335, ans=0.2
2023-10-09 17:26:32,780 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2823809.3333333335, ans=0.125
2023-10-09 17:26:40,115 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0
2023-10-09 17:26:59,908 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2823902.6666666665, ans=0.09899494936611666
2023-10-09 17:27:02,819 INFO [train.py:1031] (1/4) Epoch 14, batch 20400, loss[loss=0.2912, simple_loss=0.343, pruned_loss=0.09011, ctc_loss=0.1481, over 16353.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.278, pruned_loss=0.0622, ctc_loss=0.1083, over 3290571.65 frames. ], batch size: 414, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:27:23,297 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.551e+02 3.333e+02 4.109e+02 4.919e+02 1.143e+03, threshold=8.217e+02, percent-clipped=3.0
2023-10-09 17:28:03,052 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2824136.0, ans=0.2
2023-10-09 17:28:05,983 INFO [train.py:1031] (1/4) Epoch 14, batch 20450, loss[loss=0.2513, simple_loss=0.3075, pruned_loss=0.07327, ctc_loss=0.1214, over 16293.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2772, pruned_loss=0.06153, ctc_loss=0.1059, over 3288871.86 frames. ], batch size: 414, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:28:07,450 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2824182.6666666665, ans=0.0
2023-10-09 17:28:11,735 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2824182.6666666665, ans=0.125
2023-10-09 17:28:29,894 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0
2023-10-09 17:28:46,252 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2824322.6666666665, ans=0.0
2023-10-09 17:28:56,156 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2824322.6666666665, ans=0.125
2023-10-09 17:28:59,322 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2824369.3333333335, ans=0.5
2023-10-09 17:28:59,649 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0
2023-10-09 17:29:09,118 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5
2023-10-09 17:29:11,359 INFO [train.py:1031] (1/4) Epoch 14, batch 20500, loss[loss=0.2534, simple_loss=0.3419, pruned_loss=0.06033, ctc_loss=0.1105, over 16888.00 frames. ], tot_loss[loss=0.219, simple_loss=0.278, pruned_loss=0.05946, ctc_loss=0.1028, over 3280562.68 frames. ], batch size: 292, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:29:26,007 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2824462.6666666665, ans=0.2
2023-10-09 17:29:32,927 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+02 3.206e+02 4.100e+02 5.464e+02 8.452e+02, threshold=8.200e+02, percent-clipped=1.0
2023-10-09 17:29:47,503 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=12.0
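The grad_scale value in the training lines (4.0, 2.0, 8.0, ...) tracks the dynamic fp16 loss scale, which is cut back on overflow and grown again after a run of clean steps; values this small relative to GradScaler's default init_scale of 2**16 suggest the scale has been backed off many times. A sketch of the standard PyTorch mechanism, plausibly what produces this field (the helper names and defaults below are PyTorch's, not this run's settings):

    import torch

    # Sketch: standard dynamic loss scaling for fp16 training; model,
    # optimizer, and compute_loss are placeholders.
    scaler = torch.cuda.amp.GradScaler()      # init_scale=2**16 by default

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()         # backward on the scaled loss
        scaler.step(optimizer)                # unscales; skips step on inf/nan
        scaler.update()                       # halves scale on overflow, else grows
        return scaler.get_scale()             # plausibly the logged "grad_scale"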
2023-10-09 17:29:55,505 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2824556.0, ans=0.125
2023-10-09 17:29:56,490 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2824556.0, ans=0.125
2023-10-09 17:29:56,517 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2824556.0, ans=0.09899494936611666
2023-10-09 17:30:15,065 INFO [train.py:1031] (1/4) Epoch 14, batch 20550, loss[loss=0.1612, simple_loss=0.195, pruned_loss=0.04786, ctc_loss=0.07932, over 9842.00 frames. ], tot_loss[loss=0.223, simple_loss=0.286, pruned_loss=0.05936, ctc_loss=0.1035, over 3280399.45 frames. ], batch size: 36, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:30:30,121 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2824696.0, ans=0.1
2023-10-09 17:30:31,044 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2824696.0, ans=0.1
2023-10-09 17:30:53,153 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2824789.3333333335, ans=0.125
2023-10-09 17:31:01,902 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2824789.3333333335, ans=0.125
2023-10-09 17:31:02,233 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=12.0
2023-10-09 17:31:07,250 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2824836.0, ans=0.125
2023-10-09 17:31:17,512 INFO [train.py:1031] (1/4) Epoch 14, batch 20600, loss[loss=0.237, simple_loss=0.3028, pruned_loss=0.06266, ctc_loss=0.1149, over 16837.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2912, pruned_loss=0.06044, ctc_loss=0.1059, over 3278371.38 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:31:23,430 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2824882.6666666665, ans=0.125
2023-10-09 17:31:27,417 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2824882.6666666665, ans=0.04949747468305833
2023-10-09 17:31:32,021 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2824929.3333333335, ans=0.2
2023-10-09 17:31:40,864 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.711e+02 4.411e+02 5.380e+02 7.131e+02, threshold=8.823e+02, percent-clipped=0.0
2023-10-09 17:31:41,253 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2824929.3333333335, ans=0.125
2023-10-09 17:31:46,020 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0
2023-10-09 17:31:52,169 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2824976.0, ans=0.0
2023-10-09 17:31:52,574 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.79 vs. limit=12.0
2023-10-09 17:31:58,071 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2825022.6666666665, ans=0.125
2023-10-09 17:32:02,459 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2825022.6666666665, ans=0.0
2023-10-09 17:32:20,107 INFO [train.py:1031] (1/4) Epoch 14, batch 20650, loss[loss=0.241, simple_loss=0.2978, pruned_loss=0.06814, ctc_loss=0.1195, over 16931.00 frames. ], tot_loss[loss=0.2339, simple_loss=0.2952, pruned_loss=0.0639, ctc_loss=0.1118, over 3284612.07 frames. ], batch size: 243, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:32:27,133 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2825116.0, ans=0.05
2023-10-09 17:32:34,217 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.47 vs. limit=6.0
2023-10-09 17:32:35,984 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0
2023-10-09 17:32:40,059 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2825162.6666666665, ans=0.0
2023-10-09 17:32:45,695 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2825209.3333333335, ans=0.0
2023-10-09 17:32:45,765 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2825209.3333333335, ans=0.2
2023-10-09 17:32:54,961 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:32:55,982 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2825209.3333333335, ans=0.1
2023-10-09 17:33:05,658 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2825256.0, ans=0.0
2023-10-09 17:33:15,713 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2825302.6666666665, ans=0.1
2023-10-09 17:33:21,886 INFO [train.py:1031] (1/4) Epoch 14, batch 20700, loss[loss=0.2363, simple_loss=0.281, pruned_loss=0.07209, ctc_loss=0.1184, over 16756.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.294, pruned_loss=0.06543, ctc_loss=0.1146, over 3283616.20 frames. ], batch size: 140, lr: 2.55e-03, grad_scale: 8.0
2023-10-09 17:33:45,256 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+02 3.306e+02 3.691e+02 4.281e+02 9.878e+02, threshold=7.382e+02, percent-clipped=2.0
2023-10-09 17:33:51,618 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2825442.6666666665, ans=0.0
2023-10-09 17:33:54,370 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2825442.6666666665, ans=0.0
2023-10-09 17:33:55,927 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2825442.6666666665, ans=0.0
2023-10-09 17:34:06,001 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2825489.3333333335, ans=0.0
2023-10-09 17:34:18,785 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2825536.0, ans=0.125
2023-10-09 17:34:18,986 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2825536.0, ans=15.0
2023-10-09 17:34:22,092 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2825582.6666666665, ans=0.125
2023-10-09 17:34:22,949 INFO [train.py:1031] (1/4) Epoch 14, batch 20750, loss[loss=0.2277, simple_loss=0.2935, pruned_loss=0.06043, ctc_loss=0.1027, over 16776.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2928, pruned_loss=0.06648, ctc_loss=0.1166, over 3295524.30 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:34:38,330 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0
2023-10-09 17:34:44,236 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2825629.3333333335, ans=0.125
2023-10-09 17:34:53,476 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2825676.0, ans=0.0
2023-10-09 17:34:58,825 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=22.5
2023-10-09 17:35:00,649 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2825722.6666666665, ans=0.125
2023-10-09 17:35:06,309 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2825722.6666666665, ans=0.2
2023-10-09 17:35:23,262 INFO [train.py:1031] (1/4) Epoch 14, batch 20800, loss[loss=0.1931, simple_loss=0.2463, pruned_loss=0.05319, ctc_loss=0.08386, over 16774.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2929, pruned_loss=0.06584, ctc_loss=0.1165, over 3295749.41 frames. ], batch size: 130, lr: 2.55e-03, grad_scale: 4.0
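The tot_loss fields are not independent: throughout this section the total satisfies loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss, i.e. the pruned-transducer loss plus a down-weighted simple (lattice) loss and a down-weighted CTC loss. For the batch 20800 entry just above: 0.5 * 0.2929 + 0.06584 + 0.2 * 0.1165 = 0.2356. A sketch of that combination; the scale constants are read off the arithmetic of this log, and any warm-up ramp applied to them early in training is omitted:

    # Sketch: combined transducer + CTC objective implied by the logged numbers.
    SIMPLE_LOSS_SCALE = 0.5
    CTC_LOSS_SCALE = 0.2

    def total_loss(simple_loss, pruned_loss, ctc_loss):
        return SIMPLE_LOSS_SCALE * simple_loss + pruned_loss + CTC_LOSS_SCALE * ctc_loss

    # batch 20800 tot_loss above reproduces to within rounding:
    assert abs(total_loss(0.2929, 0.06584, 0.1165) - 0.2356) < 1e-4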
2023-10-09 17:35:29,888 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2825816.0, ans=0.07
2023-10-09 17:35:46,253 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.527e+02 3.235e+02 3.640e+02 4.210e+02 8.474e+02, threshold=7.280e+02, percent-clipped=1.0
2023-10-09 17:35:49,593 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=22.5
2023-10-09 17:35:51,476 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2825909.3333333335, ans=10.0
2023-10-09 17:35:57,963 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2825956.0, ans=0.2
2023-10-09 17:36:05,278 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=23.39 vs. limit=22.5
2023-10-09 17:36:07,973 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2825956.0, ans=10.0
2023-10-09 17:36:09,498 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0
2023-10-09 17:36:22,181 INFO [train.py:1031] (1/4) Epoch 14, batch 20850, loss[loss=0.2063, simple_loss=0.2757, pruned_loss=0.04963, ctc_loss=0.09383, over 16748.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2872, pruned_loss=0.06201, ctc_loss=0.111, over 3291607.78 frames. ], batch size: 130, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:36:30,464 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2826049.3333333335, ans=0.2
2023-10-09 17:37:07,581 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2826189.3333333335, ans=0.0
2023-10-09 17:37:14,139 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2826236.0, ans=0.125
2023-10-09 17:37:20,427 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2826236.0, ans=0.025
2023-10-09 17:37:22,228 INFO [train.py:1031] (1/4) Epoch 14, batch 20900, loss[loss=0.1846, simple_loss=0.2388, pruned_loss=0.04826, ctc_loss=0.08488, over 16656.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2838, pruned_loss=0.0598, ctc_loss=0.1073, over 3292392.72 frames. ], batch size: 151, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:37:22,620 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2826282.6666666665, ans=0.125
2023-10-09 17:37:24,597 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.25 vs. limit=15.0
2023-10-09 17:37:25,241 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2826282.6666666665, ans=0.125
2023-10-09 17:37:45,255 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.12 vs. limit=10.0
2023-10-09 17:37:48,711 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.791e+02 3.163e+02 3.693e+02 7.251e+02, threshold=6.327e+02, percent-clipped=0.0
2023-10-09 17:37:54,485 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2826376.0, ans=0.1
2023-10-09 17:38:05,862 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2826422.6666666665, ans=0.0
2023-10-09 17:38:22,287 INFO [train.py:1031] (1/4) Epoch 14, batch 20950, loss[loss=0.2131, simple_loss=0.258, pruned_loss=0.06253, ctc_loss=0.1079, over 17139.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2773, pruned_loss=0.05948, ctc_loss=0.1061, over 3295334.50 frames. ], batch size: 87, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:38:25,739 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2826516.0, ans=0.125
2023-10-09 17:38:46,096 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2826609.3333333335, ans=0.5
2023-10-09 17:38:59,052 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2826656.0, ans=0.0
2023-10-09 17:39:12,255 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2826702.6666666665, ans=0.0
2023-10-09 17:39:15,936 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2826702.6666666665, ans=15.0
2023-10-09 17:39:23,274 INFO [train.py:1031] (1/4) Epoch 14, batch 21000, loss[loss=0.2301, simple_loss=0.2859, pruned_loss=0.06458, ctc_loss=0.1125, over 16797.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2791, pruned_loss=0.0617, ctc_loss=0.1091, over 3305719.12 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:39:23,274 INFO [train.py:1054] (1/4) Computing validation loss
2023-10-09 17:39:33,408 INFO [zipformer.py:1853] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0686, 2.2415, 2.0533, 4.0456], device='cuda:1')
2023-10-09 17:39:41,352 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2348, simple_loss=0.3049, pruned_loss=0.06333, ctc_loss=0.09533, over 1796401.00 frames.
2023-10-09 17:39:41,352 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14570MB
2023-10-09 17:39:54,001 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2826796.0, ans=0.0
2023-10-09 17:39:54,933 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2826796.0, ans=0.2
2023-10-09 17:40:00,174 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0
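At batch 21000 training pauses to compute a validation loss over the full dev set (1796401.00 frames above) and reports the peak GPU memory so far. A minimal sketch of such a periodic hook, assuming validation runs every fixed number of batches; compute_loss and the interval constant are illustrative names, not icefall's exact API:

    import torch

    # Sketch of a periodic validation hook. VALID_INTERVAL and compute_loss
    # are placeholders.
    VALID_INTERVAL = 3000

    def maybe_validate(model, valid_dl, batch_idx, device, compute_loss):
        if batch_idx % VALID_INTERVAL != 0:
            return
        model.eval()
        with torch.no_grad():
            tot, frames = 0.0, 0
            for batch in valid_dl:
                loss, n = compute_loss(model, batch)  # (summed loss, num frames)
                tot, frames = tot + loss, frames + n
        model.train()
        print(f"validation: loss={tot / frames:.4f}, over {frames} frames")
        mb = torch.cuda.max_memory_allocated(device) // 2**20
        print(f"Maximum memory allocated so far is {mb}MB")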
2023-10-09 17:40:07,007 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.638e+02 3.238e+02 3.624e+02 4.210e+02 7.239e+02, threshold=7.249e+02, percent-clipped=3.0
2023-10-09 17:40:15,899 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2826889.3333333335, ans=0.0
2023-10-09 17:40:20,559 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0
2023-10-09 17:40:32,355 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2826936.0, ans=0.125
2023-10-09 17:40:36,262 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2826936.0, ans=0.125
2023-10-09 17:40:37,329 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2826936.0, ans=0.125
2023-10-09 17:40:38,360 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2826982.6666666665, ans=0.1
2023-10-09 17:40:39,035 INFO [train.py:1031] (1/4) Epoch 14, batch 21050, loss[loss=0.2552, simple_loss=0.3291, pruned_loss=0.06705, ctc_loss=0.1182, over 15020.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2839, pruned_loss=0.06153, ctc_loss=0.1088, over 3282899.67 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:41:19,561 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2827122.6666666665, ans=0.1
2023-10-09 17:41:30,569 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2827169.3333333335, ans=0.0
2023-10-09 17:41:36,264 INFO [train.py:1031] (1/4) Epoch 14, batch 21100, loss[loss=0.2102, simple_loss=0.2685, pruned_loss=0.0565, ctc_loss=0.09713, over 16706.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2837, pruned_loss=0.06047, ctc_loss=0.1054, over 3282583.49 frames. ], batch size: 130, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:41:57,578 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2827262.6666666665, ans=0.125
2023-10-09 17:42:05,336 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.720e+02 3.068e+02 3.590e+02 8.081e+02, threshold=6.137e+02, percent-clipped=1.0
2023-10-09 17:42:10,480 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2827309.3333333335, ans=0.0
2023-10-09 17:42:29,500 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2827402.6666666665, ans=0.0
2023-10-09 17:42:37,593 INFO [train.py:1031] (1/4) Epoch 14, batch 21150, loss[loss=0.2077, simple_loss=0.2647, pruned_loss=0.05657, ctc_loss=0.09368, over 16966.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2807, pruned_loss=0.06083, ctc_loss=0.1052, over 3275381.64 frames. ], batch size: 86, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:42:44,673 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:42:49,826 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2827496.0, ans=0.125
2023-10-09 17:42:58,089 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2827496.0, ans=0.04949747468305833
2023-10-09 17:43:08,419 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2827542.6666666665, ans=0.025
2023-10-09 17:43:11,157 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2827542.6666666665, ans=0.125
2023-10-09 17:43:16,036 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2827589.3333333335, ans=0.125
2023-10-09 17:43:22,461 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2827589.3333333335, ans=0.0
2023-10-09 17:43:36,527 INFO [train.py:1031] (1/4) Epoch 14, batch 21200, loss[loss=0.2009, simple_loss=0.2559, pruned_loss=0.05498, ctc_loss=0.08999, over 16920.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2763, pruned_loss=0.06145, ctc_loss=0.1063, over 3256710.31 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:43:52,184 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2827729.3333333335, ans=0.125
2023-10-09 17:43:56,548 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2827729.3333333335, ans=0.0
2023-10-09 17:44:04,293 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2827776.0, ans=0.125
2023-10-09 17:44:07,100 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.239e+02 3.845e+02 5.038e+02 8.843e+02, threshold=7.690e+02, percent-clipped=9.0
2023-10-09 17:44:17,753 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2827822.6666666665, ans=0.125
2023-10-09 17:44:25,016 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=2827869.3333333335, ans=0.02
2023-10-09 17:44:34,703 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2827869.3333333335, ans=0.0
2023-10-09 17:44:39,125 INFO [train.py:1031] (1/4) Epoch 14, batch 21250, loss[loss=0.2104, simple_loss=0.2872, pruned_loss=0.04972, ctc_loss=0.08548, over 16758.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2759, pruned_loss=0.0592, ctc_loss=0.1031, over 3265925.98 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:44:46,331 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2827916.0, ans=0.125
2023-10-09 17:44:52,207 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2827962.6666666665, ans=0.035
2023-10-09 17:44:54,425 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2827962.6666666665, ans=0.1
2023-10-09 17:44:54,502 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2827962.6666666665, ans=0.95
2023-10-09 17:44:59,174 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2827962.6666666665, ans=0.125
2023-10-09 17:45:07,950 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2828009.3333333335, ans=0.125
2023-10-09 17:45:18,305 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=12.0
2023-10-09 17:45:25,607 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2828056.0, ans=0.125
2023-10-09 17:45:34,767 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=22.5
2023-10-09 17:45:42,995 INFO [train.py:1031] (1/4) Epoch 14, batch 21300, loss[loss=0.2039, simple_loss=0.256, pruned_loss=0.05708, ctc_loss=0.09424, over 16748.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2898, pruned_loss=0.06311, ctc_loss=0.1101, over 3272625.44 frames. ], batch size: 121, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:46:12,729 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0
2023-10-09 17:46:14,301 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.457e+02 3.461e+02 4.159e+02 5.409e+02 1.290e+03, threshold=8.318e+02, percent-clipped=7.0
2023-10-09 17:46:43,736 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.19 vs. limit=5.0
2023-10-09 17:46:45,012 INFO [train.py:1031] (1/4) Epoch 14, batch 21350, loss[loss=0.208, simple_loss=0.288, pruned_loss=0.04605, ctc_loss=0.08996, over 16440.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.289, pruned_loss=0.0612, ctc_loss=0.1074, over 3281780.88 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:46:46,965 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2828382.6666666665, ans=0.125
2023-10-09 17:46:48,200 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2828382.6666666665, ans=0.125
2023-10-09 17:47:14,323 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2828476.0, ans=0.0
2023-10-09 17:47:15,986 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2828476.0, ans=0.0
2023-10-09 17:47:33,539 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2828569.3333333335, ans=15.0
2023-10-09 17:47:38,111 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2828569.3333333335, ans=0.1
2023-10-09 17:47:42,537 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2828569.3333333335, ans=0.2
2023-10-09 17:47:47,070 INFO [train.py:1031] (1/4) Epoch 14, batch 21400, loss[loss=0.2086, simple_loss=0.2668, pruned_loss=0.05626, ctc_loss=0.09475, over 10677.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2864, pruned_loss=0.06265, ctc_loss=0.11, over 3280794.61 frames. ], batch size: 35, lr: 2.55e-03, grad_scale: 8.0
2023-10-09 17:48:00,794 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2828662.6666666665, ans=0.2
2023-10-09 17:48:06,232 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2828662.6666666665, ans=0.125
2023-10-09 17:48:10,603 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2828709.3333333335, ans=0.0
2023-10-09 17:48:16,171 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.28 vs. limit=15.0
2023-10-09 17:48:18,724 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2828709.3333333335, ans=0.125
2023-10-09 17:48:19,467 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+02 3.090e+02 3.533e+02 3.983e+02 1.095e+03, threshold=7.067e+02, percent-clipped=1.0
2023-10-09 17:48:29,923 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2828756.0, ans=0.125
2023-10-09 17:48:33,117 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2828756.0, ans=10.0
2023-10-09 17:48:35,011 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0
2023-10-09 17:48:46,356 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2828802.6666666665, ans=0.125
2023-10-09 17:48:48,679 INFO [train.py:1031] (1/4) Epoch 14, batch 21450, loss[loss=0.2118, simple_loss=0.2587, pruned_loss=0.06252, ctc_loss=0.0997, over 16599.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2817, pruned_loss=0.06377, ctc_loss=0.1116, over 3278295.65 frames. ], batch size: 151, lr: 2.55e-03, grad_scale: 1.0
2023-10-09 17:48:58,522 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2828849.3333333335, ans=0.0
2023-10-09 17:49:02,284 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2828896.0, ans=0.2
2023-10-09 17:49:06,858 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2828896.0, ans=6.0
2023-10-09 17:49:11,449 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2828942.6666666665, ans=0.0
2023-10-09 17:49:17,414 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0
2023-10-09 17:49:40,876 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2829036.0, ans=0.125
2023-10-09 17:49:42,038 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2829036.0, ans=0.125
2023-10-09 17:49:49,290 INFO [train.py:1031] (1/4) Epoch 14, batch 21500, loss[loss=0.2042, simple_loss=0.2537, pruned_loss=0.057, ctc_loss=0.1017, over 16951.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2773, pruned_loss=0.06353, ctc_loss=0.1109, over 3292984.22 frames. ], batch size: 216, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:49:58,132 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:50:05,958 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=15.0
2023-10-09 17:50:06,585 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2829129.3333333335, ans=0.125
2023-10-09 17:50:22,938 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+02 3.091e+02 3.543e+02 4.001e+02 7.738e+02, threshold=7.086e+02, percent-clipped=2.0
2023-10-09 17:50:27,422 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2829222.6666666665, ans=0.125
2023-10-09 17:50:49,221 INFO [train.py:1031] (1/4) Epoch 14, batch 21550, loss[loss=0.213, simple_loss=0.2808, pruned_loss=0.05251, ctc_loss=0.1002, over 16746.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2727, pruned_loss=0.06271, ctc_loss=0.1097, over 3297387.08 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:50:58,838 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2829316.0, ans=0.0
2023-10-09 17:50:58,924 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2829316.0, ans=0.1
2023-10-09 17:51:15,447 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.15 vs. limit=15.0
2023-10-09 17:51:25,295 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2829409.3333333335, ans=0.125
2023-10-09 17:51:40,065 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:51:42,931 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2829502.6666666665, ans=0.1
2023-10-09 17:51:52,385 INFO [train.py:1031] (1/4) Epoch 14, batch 21600, loss[loss=0.2646, simple_loss=0.3184, pruned_loss=0.07796, ctc_loss=0.1373, over 16778.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2768, pruned_loss=0.06302, ctc_loss=0.1106, over 3296122.98 frames. ], batch size: 308, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:52:00,195 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2829549.3333333335, ans=0.125
2023-10-09 17:52:19,764 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2829642.6666666665, ans=0.0
2023-10-09 17:52:29,888 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+02 3.318e+02 3.916e+02 4.621e+02 6.071e+02, threshold=7.833e+02, percent-clipped=0.0
2023-10-09 17:52:55,797 INFO [train.py:1031] (1/4) Epoch 14, batch 21650, loss[loss=0.2618, simple_loss=0.3169, pruned_loss=0.07558, ctc_loss=0.1389, over 16840.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2843, pruned_loss=0.06646, ctc_loss=0.1165, over 3296264.15 frames. ], batch size: 309, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:52:59,281 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0
2023-10-09 17:53:15,365 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.84 vs. limit=5.0
2023-10-09 17:53:18,569 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2829829.3333333335, ans=0.0
2023-10-09 17:53:32,262 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2829876.0, ans=0.125
2023-10-09 17:53:35,462 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0
2023-10-09 17:53:59,381 INFO [train.py:1031] (1/4) Epoch 14, batch 21700, loss[loss=0.2917, simple_loss=0.3332, pruned_loss=0.09107, ctc_loss=0.1701, over 16635.00 frames. ], tot_loss[loss=0.2385, simple_loss=0.2905, pruned_loss=0.06904, ctc_loss=0.1208, over 3299327.60 frames. ], batch size: 351, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:54:00,432 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2830016.0, ans=0.0
2023-10-09 17:54:07,290 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2830016.0, ans=0.125
2023-10-09 17:54:21,762 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2830062.6666666665, ans=0.0
2023-10-09 17:54:25,041 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2830109.3333333335, ans=0.125
2023-10-09 17:54:28,651 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2830109.3333333335, ans=0.2
2023-10-09 17:54:34,705 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.610e+02 3.454e+02 3.938e+02 4.640e+02 9.291e+02, threshold=7.877e+02, percent-clipped=1.0
2023-10-09 17:54:58,955 INFO [train.py:1031] (1/4) Epoch 14, batch 21750, loss[loss=0.2251, simple_loss=0.2957, pruned_loss=0.05556, ctc_loss=0.1085, over 16387.00 frames. ], tot_loss[loss=0.2394, simple_loss=0.2941, pruned_loss=0.06844, ctc_loss=0.1196, over 3283886.53 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:55:19,262 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.39 vs. limit=10.0
2023-10-09 17:55:23,618 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2830342.6666666665, ans=0.1
2023-10-09 17:55:35,991 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2830389.3333333335, ans=0.125
2023-10-09 17:55:39,679 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2830389.3333333335, ans=0.125
2023-10-09 17:55:47,905 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2830436.0, ans=0.0
2023-10-09 17:56:00,634 INFO [train.py:1031] (1/4) Epoch 14, batch 21800, loss[loss=0.1468, simple_loss=0.2313, pruned_loss=0.0229, ctc_loss=0.04145, over 16839.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2888, pruned_loss=0.06404, ctc_loss=0.1119, over 3285822.94 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 8.0
2023-10-09 17:56:37,329 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.581e+02 3.060e+02 4.394e+02 8.007e+02, threshold=6.120e+02, percent-clipped=1.0
2023-10-09 17:56:50,109 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2830669.3333333335, ans=0.0
2023-10-09 17:56:55,573 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0
2023-10-09 17:57:03,733 INFO [train.py:1031] (1/4) Epoch 14, batch 21850, loss[loss=0.2391, simple_loss=0.344, pruned_loss=0.04886, ctc_loss=0.09125, over 15136.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.285, pruned_loss=0.05973, ctc_loss=0.105, over 3287774.55 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 8.0
2023-10-09 17:57:04,585 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.27 vs. limit=12.0
2023-10-09 17:57:05,532 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0
2023-10-09 17:57:09,420 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2830716.0, ans=0.0
2023-10-09 17:57:13,153 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2830716.0, ans=0.0
2023-10-09 17:57:18,974 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2830762.6666666665, ans=0.1
2023-10-09 17:57:35,292 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2830809.3333333335, ans=0.125
2023-10-09 17:58:02,542 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2830902.6666666665, ans=0.07
2023-10-09 17:58:06,349 INFO [train.py:1031] (1/4) Epoch 14, batch 21900, loss[loss=0.2544, simple_loss=0.3019, pruned_loss=0.07696, ctc_loss=0.1325, over 16796.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2895, pruned_loss=0.06105, ctc_loss=0.1072, over 3296912.61 frames. ], batch size: 164, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:58:17,309 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2830949.3333333335, ans=0.0
2023-10-09 17:58:25,370 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2830996.0, ans=0.1
2023-10-09 17:58:31,789 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0
2023-10-09 17:58:37,097 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2831042.6666666665, ans=0.125
2023-10-09 17:58:40,942 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2831042.6666666665, ans=0.125
2023-10-09 17:58:47,463 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+02 3.180e+02 3.668e+02 4.478e+02 7.065e+02, threshold=7.335e+02, percent-clipped=3.0
2023-10-09 17:58:54,440 WARNING [train.py:1204] (1/4) Exclude cut with ID X0000003684_17524832_S00712_sp1.1 from training. Number of frames (before subsampling): 130. Number of frames (after subsampling): 31. Text: 哒哒哒哒哒哒哒哒哒哒哒哒. Tokens: ['▁', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>']. Number of tokens: 37
2023-10-09 17:59:00,178 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=22.5
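The WARNING above shows why the cut was dropped: the 130-frame utterance leaves only 31 encoder frames after subsampling, fewer than its 37 tokens, so neither the CTC nor the transducer alignment is feasible (both need at least one output frame per token). A sketch of such a filter; the length formula below reproduces the 130 -> 31 reduction in the log, but treating it as the exact icefall rule, and the function name, are assumptions:

    # Sketch: drop utterances whose token sequence cannot fit into the
    # subsampled frame sequence.
    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        frames_after = ((num_frames - 7) // 2 + 1) // 2  # conv front-end subsampling
        return frames_after >= num_tokens

    print(keep_cut(130, 37))  # False -> excluded, matching the WARNING above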
2023-10-09 17:59:01,944 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2831136.0, ans=0.125 2023-10-09 17:59:07,442 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2831136.0, ans=0.0 2023-10-09 17:59:10,948 INFO [train.py:1031] (1/4) Epoch 14, batch 21950, loss[loss=0.3119, simple_loss=0.3907, pruned_loss=0.08561, ctc_loss=0.155, over 15147.00 frames. ], tot_loss[loss=0.2384, simple_loss=0.298, pruned_loss=0.06628, ctc_loss=0.1155, over 3294376.89 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:59:21,814 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2831182.6666666665, ans=0.1 2023-10-09 17:59:25,723 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2831229.3333333335, ans=0.2 2023-10-09 17:59:32,685 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2831229.3333333335, ans=0.0 2023-10-09 17:59:37,133 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2831276.0, ans=0.0 2023-10-09 17:59:41,672 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2831276.0, ans=0.09899494936611666 2023-10-09 17:59:51,431 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2831322.6666666665, ans=0.0 2023-10-09 18:00:09,048 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2831369.3333333335, ans=0.1 2023-10-09 18:00:14,568 INFO [train.py:1031] (1/4) Epoch 14, batch 22000, loss[loss=0.2266, simple_loss=0.2809, pruned_loss=0.06321, ctc_loss=0.1144, over 16808.00 frames. ], tot_loss[loss=0.2502, simple_loss=0.3089, pruned_loss=0.071, ctc_loss=0.1239, over 3295301.58 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:00:42,384 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2831509.3333333335, ans=0.0 2023-10-09 18:00:43,387 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2831509.3333333335, ans=0.125 2023-10-09 18:00:45,784 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0 2023-10-09 18:00:47,581 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.41 vs.
limit=15.0 2023-10-09 18:00:55,419 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.792e+02 3.995e+02 5.154e+02 7.072e+02 9.807e+02, threshold=1.031e+03, percent-clipped=19.0 2023-10-09 18:01:05,411 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2831602.6666666665, ans=0.0 2023-10-09 18:01:14,133 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2831602.6666666665, ans=0.1 2023-10-09 18:01:14,190 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2831602.6666666665, ans=0.1 2023-10-09 18:01:17,525 INFO [train.py:1031] (1/4) Epoch 14, batch 22050, loss[loss=0.1956, simple_loss=0.2466, pruned_loss=0.05263, ctc_loss=0.09828, over 16815.00 frames. ], tot_loss[loss=0.2439, simple_loss=0.2996, pruned_loss=0.06973, ctc_loss=0.1218, over 3301009.94 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:01:22,828 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=22.5 2023-10-09 18:01:52,141 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2831742.6666666665, ans=0.0 2023-10-09 18:02:00,707 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2831789.3333333335, ans=0.0 2023-10-09 18:02:10,780 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2831836.0, ans=0.125 2023-10-09 18:02:10,859 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2831836.0, ans=0.125 2023-10-09 18:02:13,688 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2831836.0, ans=0.5 2023-10-09 18:02:22,388 INFO [train.py:1031] (1/4) Epoch 14, batch 22100, loss[loss=0.1936, simple_loss=0.2348, pruned_loss=0.05807, ctc_loss=0.09035, over 16655.00 frames. ], tot_loss[loss=0.24, simple_loss=0.2947, pruned_loss=0.0688, ctc_loss=0.1192, over 3301823.86 frames. ], batch size: 110, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:02:23,742 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2831882.6666666665, ans=0.2 2023-10-09 18:02:37,660 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2831929.3333333335, ans=0.125 2023-10-09 18:02:44,122 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.16 vs. 
limit=10.0 2023-10-09 18:02:49,336 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2831976.0, ans=0.125 2023-10-09 18:03:04,527 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+02 3.384e+02 3.750e+02 4.334e+02 8.202e+02, threshold=7.499e+02, percent-clipped=0.0 2023-10-09 18:03:10,259 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2832069.3333333335, ans=0.0 2023-10-09 18:03:21,224 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2832069.3333333335, ans=0.125 2023-10-09 18:03:22,964 INFO [train.py:1031] (1/4) Epoch 14, batch 22150, loss[loss=0.2508, simple_loss=0.3047, pruned_loss=0.07239, ctc_loss=0.1303, over 15205.00 frames. ], tot_loss[loss=0.2432, simple_loss=0.2971, pruned_loss=0.07038, ctc_loss=0.1212, over 3299757.27 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:03:31,212 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2832116.0, ans=0.2 2023-10-09 18:03:31,332 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2832116.0, ans=0.1 2023-10-09 18:03:34,511 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2832116.0, ans=0.2 2023-10-09 18:03:41,227 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2832162.6666666665, ans=0.1 2023-10-09 18:03:41,383 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0 2023-10-09 18:03:46,344 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2832162.6666666665, ans=0.125 2023-10-09 18:04:25,076 INFO [train.py:1031] (1/4) Epoch 14, batch 22200, loss[loss=0.3003, simple_loss=0.3416, pruned_loss=0.09398, ctc_loss=0.1774, over 16691.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.298, pruned_loss=0.07082, ctc_loss=0.1226, over 3299033.32 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:04:52,814 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0 2023-10-09 18:04:58,789 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2832489.3333333335, ans=0.0 2023-10-09 18:05:03,935 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=22.5 2023-10-09 18:05:06,089 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.358e+02 3.127e+02 3.515e+02 4.166e+02 8.841e+02, threshold=7.030e+02, percent-clipped=1.0 2023-10-09 18:05:07,462 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2832489.3333333335, ans=0.125 2023-10-09 18:05:14,272 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. 
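limit=15.0

The [optim.py:471] records are self-consistent with median-based clipping: the five numbers are the min/25%/50%/75%/max of recently observed gradient norms, and the logged threshold equals Clipping_scale times the median (in the most recent record above, 2.0 * 3.515e+02 = 7.030e+02); percent-clipped is the share of recent batches whose norm exceeded the threshold. A sketch under those assumptions, purely illustrative rather than the optimizer's actual code:

    import torch

    def clip_gradients(grads, norm_history, clipping_scale=2.0):
        # Track the global grad norm and clip against a threshold set to
        # clipping_scale times the median of recently observed norms.
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        norm_history.append(norm.item())
        quartiles = torch.quantile(
            torch.tensor(norm_history),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2].item()
        if norm.item() > threshold:
            for g in grads:
                g.mul_(threshold / norm)  # rescale rather than zero out
        return quartiles, threshold
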
2023-10-09 18:05:16,200 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2832536.0, ans=0.0 2023-10-09 18:05:22,205 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2832536.0, ans=0.125 2023-10-09 18:05:24,121 INFO [train.py:1031] (1/4) Epoch 14, batch 22250, loss[loss=0.2297, simple_loss=0.297, pruned_loss=0.06015, ctc_loss=0.1056, over 16794.00 frames. ], tot_loss[loss=0.2428, simple_loss=0.2987, pruned_loss=0.06927, ctc_loss=0.1209, over 3300945.27 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:05:27,573 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2832582.6666666665, ans=0.0 2023-10-09 18:06:12,856 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2832769.3333333335, ans=0.04949747468305833 2023-10-09 18:06:25,776 INFO [train.py:1031] (1/4) Epoch 14, batch 22300, loss[loss=0.2658, simple_loss=0.2964, pruned_loss=0.08702, ctc_loss=0.1526, over 16536.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.2995, pruned_loss=0.07093, ctc_loss=0.1239, over 3308087.77 frames. ], batch size: 416, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:06:35,640 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2832816.0, ans=0.0 2023-10-09 18:06:36,712 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2832862.6666666665, ans=0.0 2023-10-09 18:06:51,710 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.50 vs. limit=22.5 2023-10-09 18:06:55,586 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2832909.3333333335, ans=0.125 2023-10-09 18:07:03,252 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2832956.0, ans=0.0 2023-10-09 18:07:07,708 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.458e+02 3.881e+02 4.380e+02 7.162e+02, threshold=7.762e+02, percent-clipped=2.0 2023-10-09 18:07:25,992 INFO [train.py:1031] (1/4) Epoch 14, batch 22350, loss[loss=0.3257, simple_loss=0.3594, pruned_loss=0.1075, ctc_loss=0.1926, over 16596.00 frames. ], tot_loss[loss=0.2485, simple_loss=0.301, pruned_loss=0.07261, ctc_loss=0.1269, over 3314245.70 frames. ], batch size: 351, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:07:35,325 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2833049.3333333335, ans=0.0 2023-10-09 18:07:38,104 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2833096.0, ans=0.125 2023-10-09 18:08:22,085 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2833236.0, ans=0.0 2023-10-09 18:08:27,619 INFO [train.py:1031] (1/4) Epoch 14, batch 22400, loss[loss=0.2476, simple_loss=0.3476, pruned_loss=0.05391, ctc_loss=0.09943, over 15165.00 frames. ], tot_loss[loss=0.2482, simple_loss=0.302, pruned_loss=0.07201, ctc_loss=0.126, over 3287473.06 frames.
], batch size: 525, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:08:34,444 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2833282.6666666665, ans=0.125 2023-10-09 18:08:34,810 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.06 vs. limit=15.0 2023-10-09 18:08:48,843 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2833329.3333333335, ans=0.2 2023-10-09 18:08:52,035 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2833376.0, ans=0.125 2023-10-09 18:09:11,652 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.558e+02 3.421e+02 3.975e+02 5.211e+02 8.186e+02, threshold=7.949e+02, percent-clipped=2.0 2023-10-09 18:09:28,591 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2833516.0, ans=0.015 2023-10-09 18:09:29,995 INFO [train.py:1031] (1/4) Epoch 14, batch 22450, loss[loss=0.244, simple_loss=0.2976, pruned_loss=0.07213, ctc_loss=0.1155, over 16800.00 frames. ], tot_loss[loss=0.2482, simple_loss=0.3028, pruned_loss=0.07176, ctc_loss=0.1255, over 3280359.21 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:09:40,067 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:09:45,766 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=12.0 2023-10-09 18:10:04,291 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.41 vs. limit=10.0 2023-10-09 18:10:26,643 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.57 vs. limit=10.0 2023-10-09 18:10:31,932 INFO [train.py:1031] (1/4) Epoch 14, batch 22500, loss[loss=0.2031, simple_loss=0.2538, pruned_loss=0.05581, ctc_loss=0.102, over 16790.00 frames. ], tot_loss[loss=0.2462, simple_loss=0.2996, pruned_loss=0.07144, ctc_loss=0.1249, over 3289999.92 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:10:33,121 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2833749.3333333335, ans=0.125 2023-10-09 18:10:36,985 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2833749.3333333335, ans=0.0 2023-10-09 18:10:51,541 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2833796.0, ans=0.125 2023-10-09 18:10:54,801 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2833842.6666666665, ans=0.125 2023-10-09 18:11:17,901 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+02 3.228e+02 3.590e+02 3.967e+02 7.433e+02, threshold=7.180e+02, percent-clipped=0.0 2023-10-09 18:11:32,583 INFO [train.py:1031] (1/4) Epoch 14, batch 22550, loss[loss=0.1911, simple_loss=0.2623, pruned_loss=0.04421, ctc_loss=0.07865, over 15193.00 frames. 
], tot_loss[loss=0.2386, simple_loss=0.2906, pruned_loss=0.06909, ctc_loss=0.1208, over 3299008.19 frames. ], batch size: 527, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:11:42,706 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2833982.6666666665, ans=0.125 2023-10-09 18:11:44,979 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2834029.3333333335, ans=15.0 2023-10-09 18:11:54,275 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=22.5 2023-10-09 18:12:10,637 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2834122.6666666665, ans=0.125 2023-10-09 18:12:11,619 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=2834122.6666666665, ans=0.2 2023-10-09 18:12:33,380 INFO [train.py:1031] (1/4) Epoch 14, batch 22600, loss[loss=0.212, simple_loss=0.2805, pruned_loss=0.05269, ctc_loss=0.09503, over 16382.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2843, pruned_loss=0.06445, ctc_loss=0.1132, over 3293143.05 frames. ], batch size: 416, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:12:38,694 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2834216.0, ans=0.2 2023-10-09 18:12:39,101 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=22.5 2023-10-09 18:13:01,052 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2834309.3333333335, ans=0.1 2023-10-09 18:13:17,082 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=22.5 2023-10-09 18:13:20,265 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+02 2.912e+02 3.400e+02 4.128e+02 6.956e+02, threshold=6.801e+02, percent-clipped=0.0 2023-10-09 18:13:27,808 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2834402.6666666665, ans=0.125 2023-10-09 18:13:34,039 INFO [train.py:1031] (1/4) Epoch 14, batch 22650, loss[loss=0.2265, simple_loss=0.273, pruned_loss=0.06579, ctc_loss=0.1207, over 16774.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2796, pruned_loss=0.06378, ctc_loss=0.1121, over 3290803.08 frames. ], batch size: 292, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:13:35,499 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2834449.3333333335, ans=0.125 2023-10-09 18:13:36,584 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2834449.3333333335, ans=0.125 2023-10-09 18:13:38,897 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.55 vs. 
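limit=15.0

The ubiquitous [scaling.py:199] ScheduledFloat records trace hyperparameters (skip rates, dropout probabilities, balancer settings) that are functions of batch_count rather than constants; each ans= value is the schedule evaluated at the current batch count. A plausible reading is piecewise-linear interpolation between (batch_count, value) breakpoints, sketched below with made-up breakpoints; the schedule shape and numbers are assumptions for illustration only:

    def scheduled_float(batch_count: float, schedule) -> float:
        # 'schedule' is a sorted list of (batch_count, value) breakpoints.
        # The value is held flat outside the breakpoints and linearly
        # interpolated between them.
        b0, v0 = schedule[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in schedule[1:]:
            if batch_count <= b1:
                return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
            b0, v0 = b1, v1
        return v0

    # Hypothetical schedule: a skip rate annealed from 0.5 down to 0.0;
    # this late in training it sits at its final value, as many records show.
    print(scheduled_float(2834449.0, [(0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0)]))
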
2023-10-09 18:13:40,611 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2834449.3333333335, ans=0.0 2023-10-09 18:13:49,297 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2023-10-09 18:13:53,210 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2834496.0, ans=0.09899494936611666 2023-10-09 18:14:06,497 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2023-10-09 18:14:23,153 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=22.5 2023-10-09 18:14:26,429 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0 2023-10-09 18:14:29,810 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2834636.0, ans=0.015 2023-10-09 18:14:35,174 INFO [train.py:1031] (1/4) Epoch 14, batch 22700, loss[loss=0.246, simple_loss=0.2969, pruned_loss=0.07336, ctc_loss=0.121, over 16833.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2782, pruned_loss=0.06473, ctc_loss=0.1135, over 3290626.81 frames. ], batch size: 141, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:14:36,818 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0 2023-10-09 18:14:40,717 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2834682.6666666665, ans=0.125 2023-10-09 18:14:47,893 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2834729.3333333335, ans=0.125 2023-10-09 18:15:00,055 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2834776.0, ans=0.0 2023-10-09 18:15:05,562 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2834776.0, ans=0.125 2023-10-09 18:15:06,676 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2834776.0, ans=0.0 2023-10-09 18:15:15,411 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2023-10-09 18:15:24,588 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+02 3.390e+02 4.032e+02 4.588e+02 8.428e+02, threshold=8.064e+02, percent-clipped=2.0 2023-10-09 18:15:35,812 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2834869.3333333335, ans=0.09899494936611666 2023-10-09 18:15:36,856 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2834916.0, ans=0.0 2023-10-09 18:15:37,583 INFO [train.py:1031] (1/4) Epoch 14, batch 22750, loss[loss=0.2637, simple_loss=0.3067, pruned_loss=0.08204, ctc_loss=0.1416, over 16704.00 frames.
], tot_loss[loss=0.2339, simple_loss=0.2843, pruned_loss=0.06796, ctc_loss=0.1189, over 3290484.02 frames. ], batch size: 328, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:15:40,145 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.38 vs. limit=10.0 2023-10-09 18:15:56,187 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2834962.6666666665, ans=0.1 2023-10-09 18:15:59,346 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0 2023-10-09 18:16:14,739 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2835056.0, ans=0.0 2023-10-09 18:16:39,367 INFO [train.py:1031] (1/4) Epoch 14, batch 22800, loss[loss=0.2661, simple_loss=0.3101, pruned_loss=0.08301, ctc_loss=0.1404, over 16852.00 frames. ], tot_loss[loss=0.2414, simple_loss=0.2903, pruned_loss=0.07128, ctc_loss=0.1247, over 3300706.98 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:17:05,342 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2835242.6666666665, ans=0.125 2023-10-09 18:17:11,569 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2835242.6666666665, ans=0.125 2023-10-09 18:17:17,486 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2835289.3333333335, ans=0.125 2023-10-09 18:17:18,469 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2835289.3333333335, ans=0.0 2023-10-09 18:17:18,870 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2835289.3333333335, ans=22.5 2023-10-09 18:17:26,389 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2835289.3333333335, ans=15.0 2023-10-09 18:17:28,723 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.550e+02 3.223e+02 3.755e+02 4.885e+02 7.657e+02, threshold=7.509e+02, percent-clipped=0.0 2023-10-09 18:17:39,518 INFO [train.py:1031] (1/4) Epoch 14, batch 22850, loss[loss=0.2097, simple_loss=0.2759, pruned_loss=0.05242, ctc_loss=0.09658, over 16832.00 frames. ], tot_loss[loss=0.2391, simple_loss=0.2919, pruned_loss=0.06894, ctc_loss=0.1209, over 3300433.71 frames. 
], batch size: 272, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:17:58,572 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2835429.3333333335, ans=0.0 2023-10-09 18:17:59,614 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:18:07,735 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2835476.0, ans=0.0 2023-10-09 18:18:19,148 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2835522.6666666665, ans=0.125 2023-10-09 18:18:34,921 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:18:38,785 INFO [train.py:1031] (1/4) Epoch 14, batch 22900, loss[loss=0.2318, simple_loss=0.2912, pruned_loss=0.06338, ctc_loss=0.1139, over 16989.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2902, pruned_loss=0.06736, ctc_loss=0.1181, over 3300130.73 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:18:40,670 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2835616.0, ans=0.125 2023-10-09 18:18:53,382 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=15.0 2023-10-09 18:18:53,949 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2835662.6666666665, ans=0.0 2023-10-09 18:19:05,362 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=22.5 2023-10-09 18:19:14,604 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2835756.0, ans=0.1 2023-10-09 18:19:28,354 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2835802.6666666665, ans=0.0 2023-10-09 18:19:29,094 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.399e+02 3.037e+02 3.390e+02 3.855e+02 5.718e+02, threshold=6.781e+02, percent-clipped=0.0 2023-10-09 18:19:30,577 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2835802.6666666665, ans=0.95 2023-10-09 18:19:40,763 INFO [train.py:1031] (1/4) Epoch 14, batch 22950, loss[loss=0.2192, simple_loss=0.2722, pruned_loss=0.06233, ctc_loss=0.104, over 16733.00 frames. ], tot_loss[loss=0.235, simple_loss=0.2881, pruned_loss=0.06737, ctc_loss=0.1178, over 3313849.99 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:19:52,824 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. 
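limit=15.0

The [scaling.py:979] Whitening records compare a whiteness statistic of a module's activations against that module's limit; in the record just above the metric is 13.73 vs. a limit of 15.0, so the activations are still acceptably close to white and no penalty applies. One way to quantify whiteness, shown purely as an illustration and not necessarily the exact metric used here, is the normalized sum of squared covariance eigenvalues, which equals 1.0 when the channel covariance is isotropic and grows as energy concentrates in a few directions:

    import torch

    def whiteness_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)   # eigenvalues of the covariance
        n = eigs.numel()
        return (n * (eigs ** 2).sum() / eigs.sum() ** 2).item()

    torch.manual_seed(0)
    print(whiteness_metric(torch.randn(20000, 384)))  # ~1.0 for white noise
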
2023-10-09 18:20:03,036 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2835896.0, ans=0.125 2023-10-09 18:20:15,190 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2835942.6666666665, ans=0.09899494936611666 2023-10-09 18:20:25,448 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2835989.3333333335, ans=0.2 2023-10-09 18:20:37,256 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2836036.0, ans=0.0 2023-10-09 18:20:42,885 INFO [train.py:1031] (1/4) Epoch 14, batch 23000, loss[loss=0.2232, simple_loss=0.2727, pruned_loss=0.06505, ctc_loss=0.1088, over 16735.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2891, pruned_loss=0.06533, ctc_loss=0.1146, over 3283924.55 frames. ], batch size: 140, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:20:57,502 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2023-10-09 18:20:58,093 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2836129.3333333335, ans=0.2 2023-10-09 18:21:08,006 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=12.0 2023-10-09 18:21:12,329 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2836176.0, ans=0.04949747468305833 2023-10-09 18:21:21,031 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2836222.6666666665, ans=0.125 2023-10-09 18:21:25,916 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2836222.6666666665, ans=0.0 2023-10-09 18:21:28,074 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2836222.6666666665, ans=0.125 2023-10-09 18:21:36,007 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+02 3.336e+02 3.961e+02 4.906e+02 8.428e+02, threshold=7.922e+02, percent-clipped=4.0 2023-10-09 18:21:41,314 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2836269.3333333335, ans=0.0 2023-10-09 18:21:43,816 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.87 vs. limit=15.0 2023-10-09 18:21:45,236 INFO [train.py:1031] (1/4) Epoch 14, batch 23050, loss[loss=0.2228, simple_loss=0.2849, pruned_loss=0.05931, ctc_loss=0.1053, over 16897.00 frames. ], tot_loss[loss=0.2375, simple_loss=0.2928, pruned_loss=0.06739, ctc_loss=0.1183, over 3279967.01 frames. ], batch size: 215, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:22:02,588 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.34 vs.
limit=22.5 2023-10-09 18:22:05,108 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2836362.6666666665, ans=0.125 2023-10-09 18:22:09,726 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2836409.3333333335, ans=0.125 2023-10-09 18:22:47,969 INFO [train.py:1031] (1/4) Epoch 14, batch 23100, loss[loss=0.192, simple_loss=0.2679, pruned_loss=0.04169, ctc_loss=0.08197, over 16778.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2895, pruned_loss=0.06381, ctc_loss=0.1128, over 3273624.79 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:22:55,842 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2836549.3333333335, ans=0.2 2023-10-09 18:23:02,858 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5 2023-10-09 18:23:17,079 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2836642.6666666665, ans=0.125 2023-10-09 18:23:20,168 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0 2023-10-09 18:23:28,117 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2836689.3333333335, ans=0.1 2023-10-09 18:23:40,308 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2836736.0, ans=0.04949747468305833 2023-10-09 18:23:41,038 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.931e+02 3.346e+02 4.278e+02 6.701e+02, threshold=6.692e+02, percent-clipped=0.0 2023-10-09 18:23:42,350 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2836736.0, ans=0.125 2023-10-09 18:23:50,130 INFO [train.py:1031] (1/4) Epoch 14, batch 23150, loss[loss=0.1892, simple_loss=0.2425, pruned_loss=0.04991, ctc_loss=0.09011, over 16810.00 frames. ], tot_loss[loss=0.227, simple_loss=0.285, pruned_loss=0.06234, ctc_loss=0.1107, over 3285889.62 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:23:57,567 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2836782.6666666665, ans=0.1 2023-10-09 18:24:14,564 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2836876.0, ans=0.1 2023-10-09 18:24:18,694 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.50 vs. limit=10.0 2023-10-09 18:24:24,108 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2023-10-09 18:24:27,259 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2023-10-09 18:24:50,638 INFO [train.py:1031] (1/4) Epoch 14, batch 23200, loss[loss=0.2333, simple_loss=0.3259, pruned_loss=0.05002, ctc_loss=0.1015, over 15122.00 frames. 
], tot_loss[loss=0.2245, simple_loss=0.2819, pruned_loss=0.06161, ctc_loss=0.1096, over 3292267.95 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 18:24:55,913 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2837016.0, ans=0.0 2023-10-09 18:25:15,081 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2023-10-09 18:25:23,189 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2837109.3333333335, ans=0.125 2023-10-09 18:25:23,264 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2837109.3333333335, ans=0.125 2023-10-09 18:25:34,807 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2837156.0, ans=0.125 2023-10-09 18:25:35,963 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2837156.0, ans=0.125 2023-10-09 18:25:42,424 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2837202.6666666665, ans=0.125 2023-10-09 18:25:43,461 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2837202.6666666665, ans=0.125 2023-10-09 18:25:47,045 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.426e+02 3.050e+02 3.396e+02 3.920e+02 6.096e+02, threshold=6.792e+02, percent-clipped=0.0 2023-10-09 18:25:53,636 INFO [train.py:1031] (1/4) Epoch 14, batch 23250, loss[loss=0.2161, simple_loss=0.2685, pruned_loss=0.0605, ctc_loss=0.1068, over 16983.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2818, pruned_loss=0.06169, ctc_loss=0.1096, over 3291066.19 frames. ], batch size: 82, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:26:00,771 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2837249.3333333335, ans=0.0 2023-10-09 18:26:22,850 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2023-10-09 18:26:36,426 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2837389.3333333335, ans=0.09899494936611666 2023-10-09 18:26:40,557 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.84 vs. limit=15.0 2023-10-09 18:26:49,039 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2837436.0, ans=0.125 2023-10-09 18:26:59,152 INFO [train.py:1031] (1/4) Epoch 14, batch 23300, loss[loss=0.233, simple_loss=0.2954, pruned_loss=0.06242, ctc_loss=0.1146, over 16692.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2785, pruned_loss=0.06177, ctc_loss=0.1094, over 3303358.93 frames. 
], batch size: 151, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:26:59,437 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2837482.6666666665, ans=0.1 2023-10-09 18:27:12,783 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2837529.3333333335, ans=0.125 2023-10-09 18:27:14,250 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2837529.3333333335, ans=22.5 2023-10-09 18:27:17,034 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2837529.3333333335, ans=0.1 2023-10-09 18:27:22,527 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2837529.3333333335, ans=0.0 2023-10-09 18:27:23,550 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2837576.0, ans=0.125 2023-10-09 18:27:30,050 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2837576.0, ans=0.0 2023-10-09 18:27:54,755 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2837669.3333333335, ans=0.0 2023-10-09 18:27:57,269 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+02 3.134e+02 3.806e+02 4.608e+02 8.711e+02, threshold=7.613e+02, percent-clipped=4.0 2023-10-09 18:28:01,936 INFO [train.py:1031] (1/4) Epoch 14, batch 23350, loss[loss=0.1999, simple_loss=0.2578, pruned_loss=0.05341, ctc_loss=0.08811, over 16728.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.2758, pruned_loss=0.06035, ctc_loss=0.107, over 3306583.47 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:28:09,469 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2837716.0, ans=0.05 2023-10-09 18:29:03,743 INFO [train.py:1031] (1/4) Epoch 14, batch 23400, loss[loss=0.2008, simple_loss=0.247, pruned_loss=0.05762, ctc_loss=0.09844, over 16655.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2713, pruned_loss=0.06039, ctc_loss=0.1065, over 3305882.28 frames. ], batch size: 110, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:29:20,260 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2837996.0, ans=0.0 2023-10-09 18:30:00,414 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 3.093e+02 3.637e+02 4.189e+02 1.057e+03, threshold=7.274e+02, percent-clipped=1.0 2023-10-09 18:30:00,795 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2838136.0, ans=0.07 2023-10-09 18:30:04,495 INFO [train.py:1031] (1/4) Epoch 14, batch 23450, loss[loss=0.1942, simple_loss=0.2514, pruned_loss=0.05023, ctc_loss=0.09141, over 16755.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2677, pruned_loss=0.06002, ctc_loss=0.1054, over 3306336.58 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:30:43,428 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.33 vs. 
limit=22.5 2023-10-09 18:31:02,153 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2838369.3333333335, ans=0.125 2023-10-09 18:31:06,571 INFO [train.py:1031] (1/4) Epoch 14, batch 23500, loss[loss=0.219, simple_loss=0.2714, pruned_loss=0.06262, ctc_loss=0.1035, over 16956.00 frames. ], tot_loss[loss=0.2158, simple_loss=0.2675, pruned_loss=0.06069, ctc_loss=0.1068, over 3310537.94 frames. ], batch size: 82, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 18:31:27,861 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2838462.6666666665, ans=0.09899494936611666 2023-10-09 18:31:34,715 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2838509.3333333335, ans=0.0 2023-10-09 18:32:01,848 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2838602.6666666665, ans=0.035 2023-10-09 18:32:03,951 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2838602.6666666665, ans=0.0 2023-10-09 18:32:03,972 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2838602.6666666665, ans=0.125 2023-10-09 18:32:05,663 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.576e+02 3.415e+02 3.721e+02 4.306e+02 1.300e+03, threshold=7.442e+02, percent-clipped=1.0 2023-10-09 18:32:07,715 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2838649.3333333335, ans=0.0 2023-10-09 18:32:08,412 INFO [train.py:1031] (1/4) Epoch 14, batch 23550, loss[loss=0.1893, simple_loss=0.2463, pruned_loss=0.04985, ctc_loss=0.08174, over 16954.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.2726, pruned_loss=0.06321, ctc_loss=0.111, over 3313661.38 frames. ], batch size: 78, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:32:11,676 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2838649.3333333335, ans=0.125 2023-10-09 18:32:13,220 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.19 vs. limit=22.5 2023-10-09 18:32:34,826 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2838742.6666666665, ans=0.125 2023-10-09 18:32:42,148 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2838742.6666666665, ans=0.125 2023-10-09 18:32:44,105 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.01 vs. limit=10.0 2023-10-09 18:32:44,778 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2838789.3333333335, ans=0.0 2023-10-09 18:32:54,316 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.92 vs. 
limit=15.0 2023-10-09 18:32:58,408 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2838836.0, ans=0.0 2023-10-09 18:33:05,954 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2838836.0, ans=0.0 2023-10-09 18:33:08,875 INFO [train.py:1031] (1/4) Epoch 14, batch 23600, loss[loss=0.2191, simple_loss=0.2755, pruned_loss=0.06112, ctc_loss=0.1011, over 16909.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.2691, pruned_loss=0.06263, ctc_loss=0.1098, over 3308399.48 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:33:24,303 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2838929.3333333335, ans=0.125 2023-10-09 18:33:26,957 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2838929.3333333335, ans=0.1 2023-10-09 18:34:09,459 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+02 2.978e+02 3.333e+02 3.973e+02 8.640e+02, threshold=6.667e+02, percent-clipped=1.0 2023-10-09 18:34:10,552 INFO [train.py:1031] (1/4) Epoch 14, batch 23650, loss[loss=0.2336, simple_loss=0.3141, pruned_loss=0.05646, ctc_loss=0.1005, over 16940.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2715, pruned_loss=0.06186, ctc_loss=0.1085, over 3305255.02 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:34:25,771 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2839162.6666666665, ans=0.125 2023-10-09 18:34:29,659 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2839162.6666666665, ans=0.125 2023-10-09 18:34:33,900 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2839209.3333333335, ans=0.125 2023-10-09 18:34:42,709 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2839209.3333333335, ans=0.2 2023-10-09 18:35:04,195 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2839302.6666666665, ans=0.1 2023-10-09 18:35:07,913 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2839302.6666666665, ans=0.09899494936611666 2023-10-09 18:35:11,978 INFO [train.py:1031] (1/4) Epoch 14, batch 23700, loss[loss=0.1762, simple_loss=0.2525, pruned_loss=0.03731, ctc_loss=0.06318, over 16835.00 frames. ], tot_loss[loss=0.215, simple_loss=0.272, pruned_loss=0.05841, ctc_loss=0.1027, over 3290423.14 frames. 
], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:35:16,718 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2839349.3333333335, ans=0.125 2023-10-09 18:35:29,028 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2839396.0, ans=0.125 2023-10-09 18:35:31,217 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2839396.0, ans=0.125 2023-10-09 18:35:39,725 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2839442.6666666665, ans=0.125 2023-10-09 18:35:50,079 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2023-10-09 18:35:52,326 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2839489.3333333335, ans=0.125 2023-10-09 18:36:05,105 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2839536.0, ans=0.125 2023-10-09 18:36:07,703 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2839536.0, ans=10.0 2023-10-09 18:36:11,209 INFO [train.py:1031] (1/4) Epoch 14, batch 23750, loss[loss=0.2341, simple_loss=0.3178, pruned_loss=0.05447, ctc_loss=0.1037, over 15250.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2765, pruned_loss=0.05872, ctc_loss=0.1038, over 3302919.67 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 1.0 2023-10-09 18:36:12,957 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+02 2.785e+02 3.356e+02 4.379e+02 6.615e+02, threshold=6.712e+02, percent-clipped=0.0 2023-10-09 18:36:33,889 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2839676.0, ans=0.1 2023-10-09 18:36:35,837 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2839676.0, ans=0.1 2023-10-09 18:36:45,020 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2839676.0, ans=0.1 2023-10-09 18:37:11,734 INFO [train.py:1031] (1/4) Epoch 14, batch 23800, loss[loss=0.1337, simple_loss=0.1867, pruned_loss=0.02983, ctc_loss=0.0527, over 12014.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2766, pruned_loss=0.05622, ctc_loss=0.09999, over 3302174.08 frames. ], batch size: 44, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:37:15,225 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2839816.0, ans=0.0 2023-10-09 18:37:29,870 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=2839862.6666666665, ans=15.0 2023-10-09 18:37:29,943 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.60 vs. 
limit=22.5 2023-10-09 18:37:41,119 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2839909.3333333335, ans=0.0 2023-10-09 18:37:44,208 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2839909.3333333335, ans=0.0 2023-10-09 18:37:47,317 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.04 vs. limit=10.0 2023-10-09 18:37:51,929 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2839956.0, ans=0.0 2023-10-09 18:38:12,935 INFO [train.py:1031] (1/4) Epoch 14, batch 23850, loss[loss=0.2644, simple_loss=0.3537, pruned_loss=0.0647, ctc_loss=0.1144, over 15194.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2821, pruned_loss=0.05624, ctc_loss=0.1001, over 3298289.29 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:38:14,610 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 3.207e+02 4.081e+02 4.991e+02 8.849e+02, threshold=8.163e+02, percent-clipped=8.0 2023-10-09 18:38:17,893 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2840049.3333333335, ans=0.125 2023-10-09 18:38:21,193 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2840049.3333333335, ans=0.0 2023-10-09 18:38:30,149 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2840096.0, ans=0.0 2023-10-09 18:38:30,179 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2840096.0, ans=0.0 2023-10-09 18:38:38,007 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:38:38,986 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2840142.6666666665, ans=0.125 2023-10-09 18:38:52,433 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2840189.3333333335, ans=0.125 2023-10-09 18:39:09,421 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2840236.0, ans=0.125 2023-10-09 18:39:13,694 INFO [train.py:1031] (1/4) Epoch 14, batch 23900, loss[loss=0.2527, simple_loss=0.3061, pruned_loss=0.07455, ctc_loss=0.1254, over 16990.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2835, pruned_loss=0.05789, ctc_loss=0.1025, over 3299963.85 frames. ], batch size: 86, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:39:21,579 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2840282.6666666665, ans=0.125 2023-10-09 18:39:48,662 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2840376.0, ans=0.07 2023-10-09 18:40:15,806 INFO [train.py:1031] (1/4) Epoch 14, batch 23950, loss[loss=0.2386, simple_loss=0.2926, pruned_loss=0.06978, ctc_loss=0.1124, over 16951.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2819, pruned_loss=0.06009, ctc_loss=0.1056, over 3310277.46 frames. 
], batch size: 90, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:40:16,838 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.526e+02 3.283e+02 3.829e+02 4.670e+02 8.731e+02, threshold=7.659e+02, percent-clipped=1.0 2023-10-09 18:40:21,778 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2840516.0, ans=0.125 2023-10-09 18:40:37,642 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2840562.6666666665, ans=0.125 2023-10-09 18:40:45,030 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2840609.3333333335, ans=0.125 2023-10-09 18:40:56,876 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:40:56,909 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2840656.0, ans=0.5 2023-10-09 18:40:56,967 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2840656.0, ans=0.125 2023-10-09 18:41:15,727 INFO [train.py:1031] (1/4) Epoch 14, batch 24000, loss[loss=0.1981, simple_loss=0.2823, pruned_loss=0.0413, ctc_loss=0.0782, over 16919.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2819, pruned_loss=0.06164, ctc_loss=0.1081, over 3317449.26 frames. ], batch size: 215, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:41:15,727 INFO [train.py:1054] (1/4) Computing validation loss 2023-10-09 18:41:25,357 INFO [zipformer.py:1853] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.0394, 2.3323, 4.5316, 1.7156], device='cuda:1') 2023-10-09 18:41:33,415 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2354, simple_loss=0.3014, pruned_loss=0.06541, ctc_loss=0.09632, over 1796401.00 frames. 2023-10-09 18:41:33,415 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14570MB 2023-10-09 18:41:44,536 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2840796.0, ans=0.95 2023-10-09 18:41:50,278 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0 2023-10-09 18:42:02,698 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0 2023-10-09 18:42:10,216 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.20 vs. limit=15.0 2023-10-09 18:42:19,189 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:42:19,710 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.31 vs. limit=12.0 2023-10-09 18:42:22,491 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2840936.0, ans=0.5 2023-10-09 18:42:36,284 INFO [train.py:1031] (1/4) Epoch 14, batch 24050, loss[loss=0.2732, simple_loss=0.3257, pruned_loss=0.0819, ctc_loss=0.1422, over 16794.00 frames. 
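], tot_loss[loss=0.2271, simple_loss=0.2858, pruned_loss=0.06224, ctc_loss=0.1097, over 3305893.46 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 2.0

A few records above, at batch 24000, the loop pauses training to compute a validation loss (train.py:1054 through 1064), reporting a frame-weighted loss over 1796401.00 frames and the peak GPU memory. A minimal sketch of such a validation pass follows; the batch layout and the model's return values are assumptions for illustration, not the icefall API:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader, device):
        model.eval()                      # disable dropout and other noise
        tot_loss, tot_frames = 0.0, 0.0
        for batch in valid_loader:
            feats = batch["features"].to(device)     # assumed batch layout
            loss, num_frames = model(feats, batch)   # assumed interface
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
        model.train()
        print(f"validation: loss={tot_loss / tot_frames:.4f}, "
              f"over {tot_frames:.2f} frames")
        print(f"Maximum memory allocated so far is "
              f"{torch.cuda.max_memory_allocated() // 2**20}MB")
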
], tot_loss[loss=0.2271, simple_loss=0.2858, pruned_loss=0.06224, ctc_loss=0.1097, over 3305893.46 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:42:40,008 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.406e+02 3.196e+02 3.829e+02 4.589e+02 8.519e+02, threshold=7.658e+02, percent-clipped=2.0 2023-10-09 18:42:49,280 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2023-10-09 18:42:50,208 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2841029.3333333335, ans=0.0 2023-10-09 18:43:07,991 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2841076.0, ans=0.125 2023-10-09 18:43:31,144 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2841169.3333333335, ans=0.0 2023-10-09 18:43:37,903 INFO [train.py:1031] (1/4) Epoch 14, batch 24100, loss[loss=0.2213, simple_loss=0.2899, pruned_loss=0.05735, ctc_loss=0.09506, over 16754.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2895, pruned_loss=0.06446, ctc_loss=0.1132, over 3303135.29 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:43:38,721 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-10-09 18:44:12,604 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=22.5 2023-10-09 18:44:15,225 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2841356.0, ans=0.025 2023-10-09 18:44:16,286 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2841356.0, ans=0.125 2023-10-09 18:44:31,612 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2841402.6666666665, ans=0.125 2023-10-09 18:44:33,677 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2841402.6666666665, ans=0.1 2023-10-09 18:44:39,362 INFO [train.py:1031] (1/4) Epoch 14, batch 24150, loss[loss=0.1974, simple_loss=0.2577, pruned_loss=0.0501, ctc_loss=0.09208, over 16811.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2819, pruned_loss=0.06114, ctc_loss=0.1079, over 3300986.88 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:44:43,200 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.029e+02 3.494e+02 3.950e+02 7.485e+02, threshold=6.988e+02, percent-clipped=0.0 2023-10-09 18:44:53,023 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2841496.0, ans=0.2 2023-10-09 18:45:17,421 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.03 vs. 
limit=15.0 2023-10-09 18:45:23,900 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2841589.3333333335, ans=0.125 2023-10-09 18:45:42,158 INFO [train.py:1031] (1/4) Epoch 14, batch 24200, loss[loss=0.1894, simple_loss=0.2489, pruned_loss=0.04849, ctc_loss=0.08209, over 16716.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.2785, pruned_loss=0.05776, ctc_loss=0.1028, over 3293144.97 frames. ], batch size: 140, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:45:49,561 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2841682.6666666665, ans=0.025 2023-10-09 18:46:15,285 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2841776.0, ans=0.125 2023-10-09 18:46:25,129 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2841822.6666666665, ans=0.125 2023-10-09 18:46:29,065 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=22.5 2023-10-09 18:46:43,488 INFO [train.py:1031] (1/4) Epoch 14, batch 24250, loss[loss=0.2332, simple_loss=0.2875, pruned_loss=0.0672, ctc_loss=0.111, over 16849.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.2759, pruned_loss=0.05714, ctc_loss=0.1016, over 3293018.75 frames. ], batch size: 164, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:46:47,015 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2841916.0, ans=0.2 2023-10-09 18:46:49,469 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+02 2.981e+02 3.499e+02 4.269e+02 8.354e+02, threshold=6.999e+02, percent-clipped=3.0 2023-10-09 18:47:01,603 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=12.0 2023-10-09 18:47:21,351 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2842056.0, ans=0.125 2023-10-09 18:47:23,500 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2842056.0, ans=0.2 2023-10-09 18:47:27,256 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2023-10-09 18:47:44,948 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2842102.6666666665, ans=0.125 2023-10-09 18:47:46,789 INFO [train.py:1031] (1/4) Epoch 14, batch 24300, loss[loss=0.2385, simple_loss=0.2773, pruned_loss=0.07226, ctc_loss=0.1379, over 15313.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2825, pruned_loss=0.06146, ctc_loss=0.1088, over 3293114.82 frames. ], batch size: 527, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:47:59,662 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.09 vs. 
limit=15.0 2023-10-09 18:48:18,265 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2842242.6666666665, ans=0.125 2023-10-09 18:48:33,038 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2842289.3333333335, ans=0.0 2023-10-09 18:48:42,776 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2842336.0, ans=0.125 2023-10-09 18:48:42,790 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2842336.0, ans=0.0 2023-10-09 18:48:48,964 INFO [train.py:1031] (1/4) Epoch 14, batch 24350, loss[loss=0.2288, simple_loss=0.2726, pruned_loss=0.06835, ctc_loss=0.1208, over 16762.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2857, pruned_loss=0.06207, ctc_loss=0.1097, over 3290605.50 frames. ], batch size: 140, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:48:49,228 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2842382.6666666665, ans=0.125 2023-10-09 18:48:55,800 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.629e+02 3.451e+02 4.035e+02 4.756e+02 1.145e+03, threshold=8.070e+02, percent-clipped=2.0 2023-10-09 18:48:57,155 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2842382.6666666665, ans=0.125 2023-10-09 18:49:24,444 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2842522.6666666665, ans=0.0 2023-10-09 18:49:24,517 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2842522.6666666665, ans=0.125 2023-10-09 18:49:26,233 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2842522.6666666665, ans=0.1 2023-10-09 18:49:29,194 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:49:34,249 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2842522.6666666665, ans=0.0 2023-10-09 18:49:49,969 INFO [train.py:1031] (1/4) Epoch 14, batch 24400, loss[loss=0.2274, simple_loss=0.2893, pruned_loss=0.062, ctc_loss=0.104, over 16983.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2848, pruned_loss=0.06334, ctc_loss=0.1118, over 3291258.83 frames. ], batch size: 216, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:49:59,419 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2842616.0, ans=0.125 2023-10-09 18:50:02,551 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2842662.6666666665, ans=0.0 2023-10-09 18:50:06,285 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.62 vs. 
limit=6.0 2023-10-09 18:50:08,026 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2842662.6666666665, ans=0.125 2023-10-09 18:50:27,748 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2842756.0, ans=0.125 2023-10-09 18:50:47,119 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2842802.6666666665, ans=0.1 2023-10-09 18:50:48,314 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2842802.6666666665, ans=0.05 2023-10-09 18:50:50,579 INFO [train.py:1031] (1/4) Epoch 14, batch 24450, loss[loss=0.2337, simple_loss=0.2727, pruned_loss=0.07308, ctc_loss=0.1213, over 16721.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2839, pruned_loss=0.06411, ctc_loss=0.1131, over 3282756.93 frames. ], batch size: 151, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:50:54,751 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2842849.3333333335, ans=0.0 2023-10-09 18:50:57,483 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.474e+02 3.798e+02 4.507e+02 6.680e+02, threshold=7.596e+02, percent-clipped=0.0 2023-10-09 18:51:38,676 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2843036.0, ans=0.0 2023-10-09 18:51:51,711 INFO [train.py:1031] (1/4) Epoch 14, batch 24500, loss[loss=0.2063, simple_loss=0.2521, pruned_loss=0.06111, ctc_loss=0.09563, over 16790.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2805, pruned_loss=0.06352, ctc_loss=0.1109, over 3276098.01 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:51:53,994 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2023-10-09 18:51:54,736 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2843082.6666666665, ans=0.0 2023-10-09 18:51:56,830 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2843082.6666666665, ans=0.1 2023-10-09 18:52:02,786 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2843129.3333333335, ans=0.1 2023-10-09 18:52:05,510 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2843129.3333333335, ans=0.125 2023-10-09 18:52:08,135 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2843129.3333333335, ans=0.0 2023-10-09 18:52:33,028 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2843222.6666666665, ans=0.125 2023-10-09 18:52:54,587 INFO [train.py:1031] (1/4) Epoch 14, batch 24550, loss[loss=0.2416, simple_loss=0.3304, pruned_loss=0.05425, ctc_loss=0.1107, over 16452.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2782, pruned_loss=0.06149, ctc_loss=0.1068, over 3280140.65 frames. 
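
A note on the attn_weights_entropy tensor logged during the validation pass at batch 24000 above: it reads as a per-head entropy of the softmaxed self-attention weights, a cheap diagnostic for heads that collapse to near-one-hot attention (entropy toward 0) or stay near-uniform (entropy near the log of the key count). A hedged sketch of that computation, with shapes assumed rather than taken from zipformer.py:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), rows already sum to 1.
    # H = -sum_k p_k log p_k per query, averaged over queries -> one value per head.
    h = -(attn * (attn + 1e-20).log()).sum(dim=-1)
    return h.mean(dim=-1)

attn = torch.softmax(torch.randn(4, 10, 10), dim=-1)
print(attn_weights_entropy(attn))  # a 4-element tensor, one entry per head, like the log's
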
], batch size: 416, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:53:01,234 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=22.5 2023-10-09 18:53:03,099 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.233e+02 3.410e+02 4.193e+02 5.169e+02 8.028e+02, threshold=8.385e+02, percent-clipped=3.0 2023-10-09 18:53:12,548 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2843362.6666666665, ans=0.125 2023-10-09 18:53:15,918 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2843362.6666666665, ans=0.125 2023-10-09 18:53:41,228 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2843456.0, ans=0.0 2023-10-09 18:53:46,449 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=22.5 2023-10-09 18:53:57,854 INFO [train.py:1031] (1/4) Epoch 14, batch 24600, loss[loss=0.2882, simple_loss=0.3293, pruned_loss=0.09262, ctc_loss=0.1548, over 16506.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2817, pruned_loss=0.0615, ctc_loss=0.1074, over 3290001.09 frames. ], batch size: 416, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:54:12,086 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2843596.0, ans=0.125 2023-10-09 18:54:23,695 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2843642.6666666665, ans=0.0 2023-10-09 18:54:37,895 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2843689.3333333335, ans=0.0 2023-10-09 18:54:43,498 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2843689.3333333335, ans=0.0 2023-10-09 18:54:43,704 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.67 vs. limit=15.0 2023-10-09 18:54:44,476 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2843689.3333333335, ans=0.04949747468305833 2023-10-09 18:54:47,208 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2843689.3333333335, ans=0.1 2023-10-09 18:54:49,363 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2843736.0, ans=0.0 2023-10-09 18:55:00,153 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2843736.0, ans=0.0 2023-10-09 18:55:02,707 INFO [train.py:1031] (1/4) Epoch 14, batch 24650, loss[loss=0.2578, simple_loss=0.3144, pruned_loss=0.0755, ctc_loss=0.1257, over 16585.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.29, pruned_loss=0.06498, ctc_loss=0.1138, over 3297195.48 frames. 
], batch size: 110, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:55:03,041 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2843782.6666666665, ans=0.5 2023-10-09 18:55:13,667 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+02 3.365e+02 3.995e+02 4.722e+02 9.808e+02, threshold=7.989e+02, percent-clipped=0.0 2023-10-09 18:55:20,770 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2843829.3333333335, ans=0.05 2023-10-09 18:55:45,624 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0 2023-10-09 18:55:54,333 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=12.0 2023-10-09 18:56:00,447 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2843969.3333333335, ans=0.125 2023-10-09 18:56:06,105 INFO [train.py:1031] (1/4) Epoch 14, batch 24700, loss[loss=0.3018, simple_loss=0.3515, pruned_loss=0.09539, ctc_loss=0.1535, over 16752.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.2987, pruned_loss=0.06623, ctc_loss=0.1162, over 3289103.83 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:56:19,832 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2844062.6666666665, ans=0.125 2023-10-09 18:56:27,635 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2844062.6666666665, ans=0.125 2023-10-09 18:56:28,725 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2844062.6666666665, ans=0.05 2023-10-09 18:56:37,810 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2023-10-09 18:56:45,292 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2844156.0, ans=0.1 2023-10-09 18:56:51,831 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2844156.0, ans=0.125 2023-10-09 18:57:10,469 INFO [train.py:1031] (1/4) Epoch 14, batch 24750, loss[loss=0.2388, simple_loss=0.3028, pruned_loss=0.06533, ctc_loss=0.1105, over 16869.00 frames. ], tot_loss[loss=0.2432, simple_loss=0.3014, pruned_loss=0.06854, ctc_loss=0.1195, over 3281451.74 frames. 
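
The bracketed loss summaries in these train.py:1031 records decompose consistently: throughout this log, loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss, and the same identity holds inside the tot_loss brackets. This is plain arithmetic on the logged values, e.g. for the batch 24700 record above:

# batch 24700 above: loss=0.3018, simple_loss=0.3515, pruned_loss=0.09539, ctc_loss=0.1535
simple, pruned, ctc = 0.3515, 0.09539, 0.1535
print(round(0.5 * simple + pruned + 0.2 * ctc, 4))  # -> 0.3018, matching the log

# and the tot_loss bracket of the same record:
simple, pruned, ctc = 0.2987, 0.06623, 0.1162
print(round(0.5 * simple + pruned + 0.2 * ctc, 4))  # -> 0.2388, matching tot_loss=0.2388
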
], batch size: 202, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:57:23,591 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.844e+02 3.622e+02 4.141e+02 4.992e+02 1.091e+03, threshold=8.281e+02, percent-clipped=4.0 2023-10-09 18:57:28,448 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2844296.0, ans=0.2 2023-10-09 18:57:34,998 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2844296.0, ans=0.125 2023-10-09 18:58:17,286 INFO [train.py:1031] (1/4) Epoch 14, batch 24800, loss[loss=0.2963, simple_loss=0.3448, pruned_loss=0.09234, ctc_loss=0.1581, over 16561.00 frames. ], tot_loss[loss=0.2407, simple_loss=0.2991, pruned_loss=0.06771, ctc_loss=0.1171, over 3285380.34 frames. ], batch size: 351, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:58:41,384 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2844576.0, ans=0.0 2023-10-09 18:58:44,143 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2844576.0, ans=0.1 2023-10-09 18:58:57,836 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2844622.6666666665, ans=0.125 2023-10-09 18:59:11,923 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-10-09 18:59:20,853 INFO [train.py:1031] (1/4) Epoch 14, batch 24850, loss[loss=0.2652, simple_loss=0.3178, pruned_loss=0.07933, ctc_loss=0.135, over 16914.00 frames. ], tot_loss[loss=0.2428, simple_loss=0.3, pruned_loss=0.06904, ctc_loss=0.1189, over 3280732.14 frames. ], batch size: 258, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 18:59:26,153 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2844716.0, ans=0.2 2023-10-09 18:59:34,837 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.41 vs. limit=10.0 2023-10-09 18:59:35,025 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.490e+02 3.266e+02 3.931e+02 4.617e+02 8.041e+02, threshold=7.862e+02, percent-clipped=0.0 2023-10-09 18:59:37,782 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2844762.6666666665, ans=0.1 2023-10-09 18:59:51,302 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2844809.3333333335, ans=0.2 2023-10-09 18:59:59,333 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2844856.0, ans=0.0 2023-10-09 19:00:17,804 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.27 vs. limit=6.0 2023-10-09 19:00:27,327 INFO [train.py:1031] (1/4) Epoch 14, batch 24900, loss[loss=0.3439, simple_loss=0.4048, pruned_loss=0.1036, ctc_loss=0.1891, over 16677.00 frames. ], tot_loss[loss=0.2471, simple_loss=0.3048, pruned_loss=0.07036, ctc_loss=0.1216, over 3275471.23 frames. 
], batch size: 384, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:00:43,987 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0 2023-10-09 19:00:58,065 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.87 vs. limit=15.0 2023-10-09 19:01:01,558 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=15.0 2023-10-09 19:01:11,262 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2023-10-09 19:01:24,787 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2845136.0, ans=10.0 2023-10-09 19:01:29,983 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2845182.6666666665, ans=0.04949747468305833 2023-10-09 19:01:30,761 INFO [train.py:1031] (1/4) Epoch 14, batch 24950, loss[loss=0.2391, simple_loss=0.2987, pruned_loss=0.06704, ctc_loss=0.1136, over 16813.00 frames. ], tot_loss[loss=0.2463, simple_loss=0.3064, pruned_loss=0.06912, ctc_loss=0.1198, over 3272754.79 frames. ], batch size: 272, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 19:01:31,101 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2845182.6666666665, ans=0.09899494936611666 2023-10-09 19:01:43,764 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2845229.3333333335, ans=0.0 2023-10-09 19:01:46,519 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+02 3.532e+02 4.120e+02 4.965e+02 9.701e+02, threshold=8.240e+02, percent-clipped=4.0 2023-10-09 19:01:57,509 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2845276.0, ans=0.1 2023-10-09 19:02:13,803 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2845322.6666666665, ans=0.1 2023-10-09 19:02:18,617 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2845322.6666666665, ans=0.125 2023-10-09 19:02:29,368 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2845369.3333333335, ans=0.125 2023-10-09 19:02:31,455 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2845416.0, ans=0.125 2023-10-09 19:02:31,534 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2845416.0, ans=0.0 2023-10-09 19:02:32,823 INFO [train.py:1031] (1/4) Epoch 14, batch 25000, loss[loss=0.2183, simple_loss=0.2803, pruned_loss=0.0568, ctc_loss=0.1067, over 16897.00 frames. ], tot_loss[loss=0.2442, simple_loss=0.3024, pruned_loss=0.06903, ctc_loss=0.1197, over 3287342.42 frames. 
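
On the scaling.py:979 Whitening records ("metric=X vs. limit=Y"): the metric behaves like a measure of how far the centered covariance of a module's activations is from a scaled identity within each channel group, equal to 1.0 when all eigenvalues are equal and growing as a few directions dominate; presumably a whitening penalty only engages once the metric crosses its limit. The sketch below is one metric with exactly that behavior, an assumption about the spirit rather than a copy of scaling.py:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns 1.0 iff each group's centered
    # covariance is isotropic; grows above 1.0 as the eigenvalue spread widens.
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                            # (groups, d, d)
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean()             # mean eigenvalue
    mean_eig_sq = (cov @ cov).diagonal(dim1=1, dim2=2).mean()  # mean squared eigenvalue
    return mean_eig_sq / (mean_eig ** 2 + 1e-20)

x = torch.randn(4000, 64)
print(whitening_metric(x))  # close to 1.0 for white input; compare metric=4.87 vs. limit=15.0 above
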
], batch size: 273, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:03:00,743 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2845509.3333333335, ans=0.2 2023-10-09 19:03:07,758 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2845556.0, ans=0.0 2023-10-09 19:03:15,363 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2845556.0, ans=0.2 2023-10-09 19:03:16,830 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=15.0 2023-10-09 19:03:18,464 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2845556.0, ans=0.125 2023-10-09 19:03:31,556 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-10-09 19:03:33,095 INFO [train.py:1031] (1/4) Epoch 14, batch 25050, loss[loss=0.2149, simple_loss=0.2646, pruned_loss=0.06197, ctc_loss=0.1035, over 16898.00 frames. ], tot_loss[loss=0.241, simple_loss=0.2972, pruned_loss=0.06861, ctc_loss=0.1192, over 3302652.31 frames. ], batch size: 189, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 19:03:46,361 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2023-10-09 19:03:50,008 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.415e+02 3.393e+02 3.859e+02 4.552e+02 1.527e+03, threshold=7.717e+02, percent-clipped=2.0 2023-10-09 19:03:54,219 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2845696.0, ans=0.125 2023-10-09 19:04:01,019 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2845742.6666666665, ans=0.2 2023-10-09 19:04:21,233 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2845836.0, ans=0.125 2023-10-09 19:04:24,372 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2845836.0, ans=6.0 2023-10-09 19:04:34,840 INFO [train.py:1031] (1/4) Epoch 14, batch 25100, loss[loss=0.2179, simple_loss=0.2713, pruned_loss=0.06041, ctc_loss=0.1091, over 16779.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.2931, pruned_loss=0.06669, ctc_loss=0.1162, over 3309671.64 frames. 
], batch size: 309, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:04:52,211 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2845929.3333333335, ans=0.125 2023-10-09 19:04:55,950 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2845929.3333333335, ans=0.125 2023-10-09 19:05:01,457 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2845976.0, ans=0.125 2023-10-09 19:05:11,063 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2846022.6666666665, ans=0.0 2023-10-09 19:05:11,553 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0 2023-10-09 19:05:36,319 INFO [train.py:1031] (1/4) Epoch 14, batch 25150, loss[loss=0.232, simple_loss=0.2929, pruned_loss=0.0643, ctc_loss=0.1062, over 16986.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2867, pruned_loss=0.06482, ctc_loss=0.113, over 3303627.17 frames. ], batch size: 86, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:05:52,020 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.349e+02 2.975e+02 3.474e+02 4.105e+02 7.010e+02, threshold=6.948e+02, percent-clipped=0.0 2023-10-09 19:05:58,352 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2846209.3333333335, ans=0.0 2023-10-09 19:06:27,890 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2846302.6666666665, ans=0.125 2023-10-09 19:06:34,776 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2846349.3333333335, ans=0.125 2023-10-09 19:06:36,059 INFO [train.py:1031] (1/4) Epoch 14, batch 25200, loss[loss=0.2021, simple_loss=0.2543, pruned_loss=0.05552, ctc_loss=0.09745, over 16942.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2839, pruned_loss=0.06502, ctc_loss=0.1135, over 3300014.20 frames. ], batch size: 215, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:06:47,088 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2846396.0, ans=0.0 2023-10-09 19:07:21,783 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2846489.3333333335, ans=0.125 2023-10-09 19:07:35,929 INFO [train.py:1031] (1/4) Epoch 14, batch 25250, loss[loss=0.2378, simple_loss=0.2725, pruned_loss=0.07525, ctc_loss=0.1316, over 16671.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2816, pruned_loss=0.06539, ctc_loss=0.1141, over 3311302.49 frames. 
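
The optim.py:471 records pin the clipping rule down to its bookkeeping: the five numbers are min/25%/50%/75%/max quartiles of recent gradient norms, and in every record here the threshold equals Clipping_scale times the median, up to display rounding (just above: 2.0 * 3.474e+02 = 6.948e+02, exactly the logged threshold), with percent-clipped the share of batches whose norm exceeded it. A sketch under those observations; the history length and exact percentile bookkeeping are guesses:

from collections import deque
import torch

class QuartileGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 400):
        self.scale = clipping_scale
        self.norms = deque(maxlen=history)   # recent global grad norms

    def clip_(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.sqrt(sum((g.float() ** 2).sum() for g in grads))
        self.norms.append(float(norm))
        s = sorted(self.norms)
        quartiles = [s[int(q * (len(s) - 1))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.scale * quartiles[2]   # Clipping_scale * median, as logged
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)
        return quartiles, threshold
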
], batch size: 416, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 19:07:36,149 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2846582.6666666665, ans=0.1 2023-10-09 19:07:38,932 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=2846582.6666666665, ans=0.02 2023-10-09 19:07:38,963 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2846582.6666666665, ans=0.125 2023-10-09 19:07:40,790 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2846582.6666666665, ans=0.2 2023-10-09 19:07:56,503 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.603e+02 3.269e+02 3.734e+02 4.463e+02 8.122e+02, threshold=7.469e+02, percent-clipped=1.0 2023-10-09 19:07:56,826 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2846629.3333333335, ans=0.0 2023-10-09 19:08:00,684 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2023-10-09 19:08:38,210 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2023-10-09 19:08:39,562 INFO [train.py:1031] (1/4) Epoch 14, batch 25300, loss[loss=0.2855, simple_loss=0.3678, pruned_loss=0.07392, ctc_loss=0.1386, over 15303.00 frames. ], tot_loss[loss=0.2346, simple_loss=0.2881, pruned_loss=0.06698, ctc_loss=0.1176, over 3306449.11 frames. ], batch size: 527, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:08:50,339 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2846816.0, ans=0.125 2023-10-09 19:08:57,906 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2846862.6666666665, ans=0.0 2023-10-09 19:09:01,711 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2846862.6666666665, ans=0.07 2023-10-09 19:09:22,352 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2846956.0, ans=0.125 2023-10-09 19:09:26,693 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2846956.0, ans=0.125 2023-10-09 19:09:34,094 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2847002.6666666665, ans=0.125 2023-10-09 19:09:41,238 INFO [train.py:1031] (1/4) Epoch 14, batch 25350, loss[loss=0.2172, simple_loss=0.2786, pruned_loss=0.05777, ctc_loss=0.1008, over 16915.00 frames. ], tot_loss[loss=0.2411, simple_loss=0.2962, pruned_loss=0.0688, ctc_loss=0.1209, over 3304037.80 frames. 
], batch size: 202, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:09:54,015 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2847096.0, ans=0.125 2023-10-09 19:09:56,118 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2847096.0, ans=0.0 2023-10-09 19:10:01,554 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.479e+02 4.151e+02 5.048e+02 8.470e+02, threshold=8.302e+02, percent-clipped=4.0 2023-10-09 19:10:11,989 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2847142.6666666665, ans=0.1 2023-10-09 19:10:23,334 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2847189.3333333335, ans=0.0 2023-10-09 19:10:30,850 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2847236.0, ans=0.2 2023-10-09 19:10:41,702 INFO [train.py:1031] (1/4) Epoch 14, batch 25400, loss[loss=0.2404, simple_loss=0.2904, pruned_loss=0.07044, ctc_loss=0.124, over 16953.00 frames. ], tot_loss[loss=0.2399, simple_loss=0.2935, pruned_loss=0.06892, ctc_loss=0.1211, over 3314874.92 frames. ], batch size: 309, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:11:03,254 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2847329.3333333335, ans=0.1 2023-10-09 19:11:16,768 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2847422.6666666665, ans=0.125 2023-10-09 19:11:32,219 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2847469.3333333335, ans=0.05 2023-10-09 19:11:38,034 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2847469.3333333335, ans=0.0 2023-10-09 19:11:39,066 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2847469.3333333335, ans=0.1 2023-10-09 19:11:40,829 INFO [train.py:1031] (1/4) Epoch 14, batch 25450, loss[loss=0.2127, simple_loss=0.276, pruned_loss=0.05594, ctc_loss=0.09385, over 16785.00 frames. ], tot_loss[loss=0.2373, simple_loss=0.2898, pruned_loss=0.06841, ctc_loss=0.1199, over 3316428.22 frames. ], batch size: 102, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:11:52,966 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2847562.6666666665, ans=0.125 2023-10-09 19:12:01,164 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.595e+02 3.142e+02 3.636e+02 4.300e+02 1.054e+03, threshold=7.273e+02, percent-clipped=3.0 2023-10-09 19:12:02,968 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2847562.6666666665, ans=0.125 2023-10-09 19:12:37,290 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2847702.6666666665, ans=0.125 2023-10-09 19:12:41,752 INFO [train.py:1031] (1/4) Epoch 14, batch 25500, loss[loss=0.1891, simple_loss=0.2595, pruned_loss=0.04315, ctc_loss=0.08113, over 16786.00 frames. 
], tot_loss[loss=0.2319, simple_loss=0.2853, pruned_loss=0.06602, ctc_loss=0.1161, over 3322282.65 frames. ], batch size: 272, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:12:45,134 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.75 vs. limit=12.0 2023-10-09 19:12:47,551 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2847749.3333333335, ans=0.125 2023-10-09 19:12:49,589 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2847749.3333333335, ans=0.125 2023-10-09 19:12:59,105 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2847796.0, ans=0.125 2023-10-09 19:12:59,320 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2023-10-09 19:13:01,809 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2847796.0, ans=0.125 2023-10-09 19:13:02,872 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2847796.0, ans=0.07 2023-10-09 19:13:05,044 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2847796.0, ans=0.125 2023-10-09 19:13:05,052 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2847796.0, ans=0.125 2023-10-09 19:13:13,222 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.93 vs. limit=10.0 2023-10-09 19:13:34,985 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2847936.0, ans=0.0 2023-10-09 19:13:41,366 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2847936.0, ans=0.125 2023-10-09 19:13:44,908 INFO [train.py:1031] (1/4) Epoch 14, batch 25550, loss[loss=0.2668, simple_loss=0.292, pruned_loss=0.08869, ctc_loss=0.1603, over 15223.00 frames. ], tot_loss[loss=0.236, simple_loss=0.2882, pruned_loss=0.068, ctc_loss=0.1193, over 3317264.32 frames. ], batch size: 527, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:13:49,360 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2847982.6666666665, ans=0.125 2023-10-09 19:13:49,553 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. 
limit=6.0 2023-10-09 19:13:51,420 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2847982.6666666665, ans=0.2 2023-10-09 19:14:04,733 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2848029.3333333335, ans=0.1 2023-10-09 19:14:07,200 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+02 3.271e+02 3.768e+02 4.486e+02 1.096e+03, threshold=7.537e+02, percent-clipped=1.0 2023-10-09 19:14:14,161 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2848076.0, ans=0.2 2023-10-09 19:14:15,276 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2848076.0, ans=0.2 2023-10-09 19:14:17,954 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=22.5 2023-10-09 19:14:18,950 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2023-10-09 19:14:34,121 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2848169.3333333335, ans=0.0 2023-10-09 19:14:45,706 INFO [train.py:1031] (1/4) Epoch 14, batch 25600, loss[loss=0.2444, simple_loss=0.2982, pruned_loss=0.07038, ctc_loss=0.1247, over 16204.00 frames. ], tot_loss[loss=0.2412, simple_loss=0.2921, pruned_loss=0.07038, ctc_loss=0.1236, over 3311616.52 frames. ], batch size: 463, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:14:46,356 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2023-10-09 19:15:13,727 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2023-10-09 19:15:15,816 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2848309.3333333335, ans=0.125 2023-10-09 19:15:40,142 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.06 vs. limit=10.0 2023-10-09 19:15:45,278 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2848402.6666666665, ans=0.0 2023-10-09 19:15:47,650 INFO [train.py:1031] (1/4) Epoch 14, batch 25650, loss[loss=0.2694, simple_loss=0.3103, pruned_loss=0.08284, ctc_loss=0.157, over 15211.00 frames. ], tot_loss[loss=0.2473, simple_loss=0.2988, pruned_loss=0.07242, ctc_loss=0.1274, over 3308267.78 frames. 
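
The grad_scale field at the end of each batch summary, cycling through 1.0, 2.0 and 4.0 in this stretch, is the current loss-scaling factor of mixed-precision training: the scaler halves the factor when scaled fp16 gradients overflow to inf/nan (that step is skipped) and grows it back after a run of clean steps. A usage sketch of the standard PyTorch mechanism, not necessarily this script's own scaler:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

# Per step, with model, optimizer, and compute_loss defined elsewhere:
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)        # skipped internally if grads overflowed
#   scaler.update()               # backoff (x0.5) on overflow, growth (x2) otherwise
#   print(scaler.get_scale())     # the grad_scale value these summaries report
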
], batch size: 527, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:16:11,372 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.452e+02 3.570e+02 3.954e+02 4.505e+02 1.083e+03, threshold=7.908e+02, percent-clipped=2.0 2023-10-09 19:16:16,556 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2848542.6666666665, ans=0.125 2023-10-09 19:16:22,011 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2848542.6666666665, ans=0.125 2023-10-09 19:16:22,061 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2848542.6666666665, ans=0.025 2023-10-09 19:16:24,609 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.42 vs. limit=10.0 2023-10-09 19:16:34,394 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=22.5 2023-10-09 19:16:36,318 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.95 vs. limit=22.5 2023-10-09 19:16:49,693 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2848682.6666666665, ans=0.125 2023-10-09 19:16:50,496 INFO [train.py:1031] (1/4) Epoch 14, batch 25700, loss[loss=0.2693, simple_loss=0.3092, pruned_loss=0.08556, ctc_loss=0.146, over 16808.00 frames. ], tot_loss[loss=0.2526, simple_loss=0.304, pruned_loss=0.07451, ctc_loss=0.1305, over 3304533.96 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:17:02,354 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=22.5 2023-10-09 19:17:06,867 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2848729.3333333335, ans=0.125 2023-10-09 19:17:11,397 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2848729.3333333335, ans=0.0 2023-10-09 19:17:11,410 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2848729.3333333335, ans=0.125 2023-10-09 19:17:14,905 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2848776.0, ans=0.09899494936611666 2023-10-09 19:17:16,060 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2848776.0, ans=0.125 2023-10-09 19:17:32,321 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.46 vs. 
limit=15.0 2023-10-09 19:17:33,528 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2848822.6666666665, ans=0.125 2023-10-09 19:17:38,923 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2848869.3333333335, ans=0.0 2023-10-09 19:17:44,390 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2848869.3333333335, ans=0.125 2023-10-09 19:17:51,107 INFO [train.py:1031] (1/4) Epoch 14, batch 25750, loss[loss=0.2273, simple_loss=0.2726, pruned_loss=0.06915, ctc_loss=0.1093, over 16830.00 frames. ], tot_loss[loss=0.2536, simple_loss=0.3053, pruned_loss=0.07482, ctc_loss=0.1308, over 3301681.94 frames. ], batch size: 121, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:18:06,800 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2848962.6666666665, ans=0.125 2023-10-09 19:18:17,301 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+02 3.581e+02 3.886e+02 4.426e+02 7.686e+02, threshold=7.772e+02, percent-clipped=0.0 2023-10-09 19:18:23,132 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2849009.3333333335, ans=0.125 2023-10-09 19:18:56,344 INFO [train.py:1031] (1/4) Epoch 14, batch 25800, loss[loss=0.2351, simple_loss=0.327, pruned_loss=0.05195, ctc_loss=0.09836, over 15090.00 frames. ], tot_loss[loss=0.2459, simple_loss=0.3019, pruned_loss=0.07022, ctc_loss=0.1236, over 3300928.41 frames. ], batch size: 526, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:19:01,861 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=22.5 2023-10-09 19:19:06,719 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0 2023-10-09 19:19:32,949 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.76 vs. limit=22.5 2023-10-09 19:19:43,396 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2849289.3333333335, ans=0.125 2023-10-09 19:19:57,606 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2849336.0, ans=0.125 2023-10-09 19:19:59,396 INFO [train.py:1031] (1/4) Epoch 14, batch 25850, loss[loss=0.2371, simple_loss=0.3051, pruned_loss=0.06228, ctc_loss=0.1112, over 16855.00 frames. ], tot_loss[loss=0.2419, simple_loss=0.2993, pruned_loss=0.06836, ctc_loss=0.1197, over 3297407.97 frames. 
], batch size: 309, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:19:59,630 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2849382.6666666665, ans=0.04949747468305833 2023-10-09 19:20:04,016 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2849382.6666666665, ans=0.0 2023-10-09 19:20:10,765 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:20:24,801 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.590e+02 3.413e+02 3.966e+02 4.957e+02 9.645e+02, threshold=7.933e+02, percent-clipped=3.0 2023-10-09 19:21:00,864 INFO [train.py:1031] (1/4) Epoch 14, batch 25900, loss[loss=0.2107, simple_loss=0.3013, pruned_loss=0.04421, ctc_loss=0.07921, over 16290.00 frames. ], tot_loss[loss=0.2373, simple_loss=0.2947, pruned_loss=0.06683, ctc_loss=0.1158, over 3296638.69 frames. ], batch size: 463, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:21:01,199 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2849616.0, ans=0.0 2023-10-09 19:21:13,054 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2849662.6666666665, ans=0.125 2023-10-09 19:21:16,948 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2849662.6666666665, ans=0.0 2023-10-09 19:21:20,787 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2849662.6666666665, ans=0.0 2023-10-09 19:21:32,865 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2849709.3333333335, ans=0.125 2023-10-09 19:22:01,802 INFO [train.py:1031] (1/4) Epoch 14, batch 25950, loss[loss=0.216, simple_loss=0.2789, pruned_loss=0.05647, ctc_loss=0.1005, over 16938.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2896, pruned_loss=0.06277, ctc_loss=0.1092, over 3293636.01 frames. ], batch size: 309, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:22:21,343 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0 2023-10-09 19:22:28,776 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+02 2.824e+02 3.535e+02 4.166e+02 1.027e+03, threshold=7.071e+02, percent-clipped=2.0 2023-10-09 19:22:29,111 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2849942.6666666665, ans=0.1 2023-10-09 19:22:30,185 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2849942.6666666665, ans=0.125 2023-10-09 19:22:31,698 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2849942.6666666665, ans=0.125 2023-10-09 19:22:40,586 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2849989.3333333335, ans=6.0 2023-10-09 19:23:02,807 INFO [train.py:1031] (1/4) Epoch 14, batch 26000, loss[loss=0.2708, simple_loss=0.3063, pruned_loss=0.08619, ctc_loss=0.1572, over 16953.00 frames. 
], tot_loss[loss=0.2288, simple_loss=0.2868, pruned_loss=0.06331, ctc_loss=0.1102, over 3304708.94 frames. ], batch size: 309, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:23:08,060 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2850082.6666666665, ans=0.0 2023-10-09 19:23:11,140 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2850082.6666666665, ans=0.125 2023-10-09 19:23:23,047 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2850129.3333333335, ans=0.0 2023-10-09 19:23:28,627 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2850176.0, ans=0.1 2023-10-09 19:23:43,882 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2850222.6666666665, ans=0.0 2023-10-09 19:23:45,442 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2850222.6666666665, ans=0.125 2023-10-09 19:24:01,727 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2850269.3333333335, ans=0.125 2023-10-09 19:24:04,540 INFO [train.py:1031] (1/4) Epoch 14, batch 26050, loss[loss=0.2053, simple_loss=0.2873, pruned_loss=0.04576, ctc_loss=0.07943, over 16789.00 frames. ], tot_loss[loss=0.227, simple_loss=0.286, pruned_loss=0.0623, ctc_loss=0.1084, over 3296671.55 frames. ], batch size: 164, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:24:09,837 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2850316.0, ans=0.125 2023-10-09 19:24:09,849 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2850316.0, ans=0.125 2023-10-09 19:24:20,751 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.72 vs. limit=15.0 2023-10-09 19:24:31,163 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.011e+02 3.542e+02 4.270e+02 6.836e+02, threshold=7.085e+02, percent-clipped=0.0 2023-10-09 19:24:39,199 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2850456.0, ans=0.2 2023-10-09 19:24:48,523 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2850456.0, ans=0.125 2023-10-09 19:24:51,669 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2850502.6666666665, ans=0.0 2023-10-09 19:24:53,701 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=12.0 2023-10-09 19:25:04,426 INFO [train.py:1031] (1/4) Epoch 14, batch 26100, loss[loss=0.2137, simple_loss=0.2853, pruned_loss=0.054, ctc_loss=0.08554, over 16849.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2893, pruned_loss=0.06195, ctc_loss=0.1067, over 3292044.67 frames. 
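
In these summaries, loss[...] is the current batch alone, while tot_loss[...] is averaged "over ~3.3e6 frames", i.e. a frame-weighted average over a large trailing window, which is why it drifts smoothly while the per-batch loss jumps between roughly 0.19 and 0.30. The exact bookkeeping (reset or decay schedule) is not visible in the log; a sliding-window sketch of the idea:

from collections import deque

class FrameWeightedAverage:
    def __init__(self, max_frames: float = 3_300_000.0):
        self.buf = deque()
        self.frames = 0.0
        self.weighted = 0.0
        self.max_frames = max_frames

    def update(self, loss: float, frames: float) -> None:
        self.buf.append((loss, frames))
        self.frames += frames
        self.weighted += loss * frames
        while self.frames > self.max_frames and len(self.buf) > 1:
            l, f = self.buf.popleft()
            self.frames -= f
            self.weighted -= l * f

    def value(self) -> float:
        return self.weighted / self.frames

tracker = FrameWeightedAverage()
tracker.update(0.2708, 16953.0)   # the batch 26000 loss[...] record above
print(tracker.value())            # one batch so far -> 0.2708; smooths as batches accumulate
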
], batch size: 121, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:25:12,259 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2850549.3333333335, ans=0.1 2023-10-09 19:25:12,450 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2023-10-09 19:25:14,653 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.18 vs. limit=22.5 2023-10-09 19:25:24,445 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2023-10-09 19:26:06,160 INFO [train.py:1031] (1/4) Epoch 14, batch 26150, loss[loss=0.2751, simple_loss=0.3158, pruned_loss=0.08815, ctc_loss=0.1451, over 16916.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2919, pruned_loss=0.06396, ctc_loss=0.1103, over 3288009.69 frames. ], batch size: 292, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:26:10,318 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2850782.6666666665, ans=0.125 2023-10-09 19:26:36,019 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+02 3.218e+02 3.789e+02 4.435e+02 6.214e+02, threshold=7.579e+02, percent-clipped=0.0 2023-10-09 19:26:36,712 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0 2023-10-09 19:26:59,097 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2850969.3333333335, ans=0.1 2023-10-09 19:27:07,873 INFO [train.py:1031] (1/4) Epoch 14, batch 26200, loss[loss=0.2302, simple_loss=0.2881, pruned_loss=0.06425, ctc_loss=0.1094, over 16813.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2889, pruned_loss=0.06389, ctc_loss=0.1098, over 3293344.70 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:27:15,399 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.61 vs. limit=12.0 2023-10-09 19:27:17,104 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2851016.0, ans=0.125 2023-10-09 19:27:33,469 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2851109.3333333335, ans=10.0 2023-10-09 19:27:42,419 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2851109.3333333335, ans=0.125 2023-10-09 19:28:08,073 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.13 vs. limit=10.0 2023-10-09 19:28:09,474 INFO [train.py:1031] (1/4) Epoch 14, batch 26250, loss[loss=0.2007, simple_loss=0.2776, pruned_loss=0.04556, ctc_loss=0.08183, over 15145.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.2796, pruned_loss=0.06103, ctc_loss=0.1046, over 3296315.67 frames. 
], batch size: 529, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:28:15,778 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0 2023-10-09 19:28:21,786 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2851296.0, ans=0.125 2023-10-09 19:28:30,789 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=10.0 2023-10-09 19:28:38,339 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2851342.6666666665, ans=0.125 2023-10-09 19:28:43,498 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 3.073e+02 4.030e+02 5.136e+02 8.779e+02, threshold=8.059e+02, percent-clipped=2.0 2023-10-09 19:28:50,315 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2851389.3333333335, ans=0.125 2023-10-09 19:29:11,868 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=15.0 2023-10-09 19:29:13,859 INFO [train.py:1031] (1/4) Epoch 14, batch 26300, loss[loss=0.2541, simple_loss=0.3051, pruned_loss=0.07446, ctc_loss=0.1357, over 15258.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2839, pruned_loss=0.06103, ctc_loss=0.1052, over 3302473.97 frames. ], batch size: 527, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:29:23,857 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2851482.6666666665, ans=0.0 2023-10-09 19:29:35,088 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2851529.3333333335, ans=0.125 2023-10-09 19:29:41,324 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2023-10-09 19:29:42,681 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2851576.0, ans=0.125 2023-10-09 19:29:56,951 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2851622.6666666665, ans=0.025 2023-10-09 19:29:57,329 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.27 vs. limit=15.0 2023-10-09 19:30:00,794 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2851622.6666666665, ans=0.125 2023-10-09 19:30:03,903 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2023-10-09 19:30:06,018 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.53 vs. 
limit=10.0 2023-10-09 19:30:09,643 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2851669.3333333335, ans=0.125 2023-10-09 19:30:18,137 INFO [train.py:1031] (1/4) Epoch 14, batch 26350, loss[loss=0.2847, simple_loss=0.3325, pruned_loss=0.0916, ctc_loss=0.1345, over 11987.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2913, pruned_loss=0.06395, ctc_loss=0.1114, over 3291584.33 frames. ], batch size: 36, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:30:29,436 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2851762.6666666665, ans=0.2 2023-10-09 19:30:49,815 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+02 3.520e+02 4.150e+02 4.845e+02 1.370e+03, threshold=8.299e+02, percent-clipped=2.0 2023-10-09 19:30:53,579 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2851809.3333333335, ans=0.125 2023-10-09 19:31:05,559 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2851856.0, ans=0.1 2023-10-09 19:31:17,654 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2023-10-09 19:31:20,350 INFO [train.py:1031] (1/4) Epoch 14, batch 26400, loss[loss=0.1903, simple_loss=0.2499, pruned_loss=0.04965, ctc_loss=0.07855, over 12923.00 frames. ], tot_loss[loss=0.2339, simple_loss=0.2925, pruned_loss=0.06498, ctc_loss=0.1134, over 3289173.80 frames. ], batch size: 38, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:32:18,664 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:32:20,973 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=22.5 2023-10-09 19:32:24,383 INFO [train.py:1031] (1/4) Epoch 14, batch 26450, loss[loss=0.2088, simple_loss=0.2654, pruned_loss=0.05699, ctc_loss=0.09557, over 16775.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2896, pruned_loss=0.06271, ctc_loss=0.1097, over 3287284.09 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:32:31,323 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2852182.6666666665, ans=0.1 2023-10-09 19:32:35,527 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2852229.3333333335, ans=0.1 2023-10-09 19:32:58,050 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+02 3.048e+02 3.586e+02 4.298e+02 7.757e+02, threshold=7.171e+02, percent-clipped=0.0 2023-10-09 19:33:04,542 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2852322.6666666665, ans=6.0 2023-10-09 19:33:05,909 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2852322.6666666665, ans=0.0 2023-10-09 19:33:28,751 INFO [train.py:1031] (1/4) Epoch 14, batch 26500, loss[loss=0.2261, simple_loss=0.2766, pruned_loss=0.06642, ctc_loss=0.1067, over 16822.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.291, pruned_loss=0.0635, ctc_loss=0.1106, over 3285887.25 frames. 
], batch size: 121, lr: 2.54e-03, grad_scale: 8.0 2023-10-09 19:33:42,168 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2852462.6666666665, ans=0.05 2023-10-09 19:34:11,528 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2852556.0, ans=0.125 2023-10-09 19:34:21,603 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2852602.6666666665, ans=0.125 2023-10-09 19:34:30,314 INFO [train.py:1031] (1/4) Epoch 14, batch 26550, loss[loss=0.315, simple_loss=0.3522, pruned_loss=0.1012, ctc_loss=0.1888, over 16721.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.2943, pruned_loss=0.0662, ctc_loss=0.1151, over 3287177.98 frames. ], batch size: 384, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:34:42,928 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2023-10-09 19:34:45,888 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2852696.0, ans=0.125 2023-10-09 19:35:06,511 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+02 3.537e+02 4.198e+02 5.222e+02 9.143e+02, threshold=8.395e+02, percent-clipped=3.0 2023-10-09 19:35:21,531 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:35:32,289 INFO [train.py:1031] (1/4) Epoch 14, batch 26600, loss[loss=0.2121, simple_loss=0.26, pruned_loss=0.06153, ctc_loss=0.1029, over 16655.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.2983, pruned_loss=0.06562, ctc_loss=0.1148, over 3287353.50 frames. ], batch size: 102, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:35:45,918 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2852929.3333333335, ans=0.125 2023-10-09 19:35:49,772 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2852929.3333333335, ans=0.125 2023-10-09 19:35:52,410 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2852929.3333333335, ans=10.0 2023-10-09 19:36:05,007 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2852976.0, ans=0.125 2023-10-09 19:36:23,913 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2853069.3333333335, ans=0.125 2023-10-09 19:36:34,527 INFO [train.py:1031] (1/4) Epoch 14, batch 26650, loss[loss=0.2127, simple_loss=0.2915, pruned_loss=0.04815, ctc_loss=0.09391, over 16814.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2954, pruned_loss=0.06152, ctc_loss=0.109, over 3293066.08 frames. 
], batch size: 291, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 19:36:36,985 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2853116.0, ans=0.0 2023-10-09 19:36:49,624 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2853162.6666666665, ans=0.125 2023-10-09 19:36:50,783 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2853162.6666666665, ans=0.0 2023-10-09 19:36:52,389 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2853162.6666666665, ans=0.125 2023-10-09 19:37:01,861 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2853209.3333333335, ans=0.05 2023-10-09 19:37:01,909 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2853209.3333333335, ans=0.1 2023-10-09 19:37:10,950 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 3.027e+02 3.490e+02 4.414e+02 7.979e+02, threshold=6.980e+02, percent-clipped=0.0 2023-10-09 19:37:35,081 INFO [train.py:1031] (1/4) Epoch 14, batch 26700, loss[loss=0.1863, simple_loss=0.2436, pruned_loss=0.04822, ctc_loss=0.08133, over 16832.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2903, pruned_loss=0.05961, ctc_loss=0.1063, over 3300369.89 frames. ], batch size: 189, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:37:45,728 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2853349.3333333335, ans=0.2 2023-10-09 19:37:46,749 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2853396.0, ans=0.125 2023-10-09 19:38:36,852 INFO [train.py:1031] (1/4) Epoch 14, batch 26750, loss[loss=0.1909, simple_loss=0.2583, pruned_loss=0.04467, ctc_loss=0.08553, over 16877.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2827, pruned_loss=0.05923, ctc_loss=0.1051, over 3304210.22 frames. ], batch size: 244, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:38:46,778 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2853582.6666666665, ans=0.1 2023-10-09 19:38:51,546 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2023-10-09 19:38:55,789 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-10-09 19:39:14,605 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.317e+02 3.221e+02 3.735e+02 4.264e+02 6.455e+02, threshold=7.471e+02, percent-clipped=0.0 2023-10-09 19:39:26,736 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2853769.3333333335, ans=0.0 2023-10-09 19:39:38,951 INFO [train.py:1031] (1/4) Epoch 14, batch 26800, loss[loss=0.2143, simple_loss=0.2556, pruned_loss=0.06305, ctc_loss=0.1172, over 16110.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2782, pruned_loss=0.05885, ctc_loss=0.1041, over 3304817.69 frames. 
], batch size: 463, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:39:39,311 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2853816.0, ans=0.125 2023-10-09 19:39:44,484 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2853816.0, ans=0.0 2023-10-09 19:40:36,469 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2854002.6666666665, ans=15.0 2023-10-09 19:40:41,947 INFO [train.py:1031] (1/4) Epoch 14, batch 26850, loss[loss=0.2385, simple_loss=0.2937, pruned_loss=0.06754, ctc_loss=0.1207, over 16716.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.2826, pruned_loss=0.06211, ctc_loss=0.1097, over 3312182.37 frames. ], batch size: 151, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:40:48,727 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2854049.3333333335, ans=10.0 2023-10-09 19:40:56,266 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2854096.0, ans=0.2 2023-10-09 19:41:21,412 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+02 3.526e+02 4.022e+02 4.797e+02 9.323e+02, threshold=8.043e+02, percent-clipped=3.0 2023-10-09 19:41:27,883 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2854189.3333333335, ans=0.04949747468305833 2023-10-09 19:41:36,722 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2854236.0, ans=0.125 2023-10-09 19:41:45,194 INFO [train.py:1031] (1/4) Epoch 14, batch 26900, loss[loss=0.2217, simple_loss=0.2911, pruned_loss=0.05498, ctc_loss=0.1058, over 16741.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2877, pruned_loss=0.06225, ctc_loss=0.1103, over 3315120.85 frames. ], batch size: 188, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:42:10,571 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2854376.0, ans=0.0 2023-10-09 19:42:33,676 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2854422.6666666665, ans=0.0 2023-10-09 19:42:33,741 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2854422.6666666665, ans=0.0 2023-10-09 19:42:40,415 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2854469.3333333335, ans=0.1 2023-10-09 19:42:46,329 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2854516.0, ans=0.125 2023-10-09 19:42:47,786 INFO [train.py:1031] (1/4) Epoch 14, batch 26950, loss[loss=0.224, simple_loss=0.2709, pruned_loss=0.06777, ctc_loss=0.1041, over 16850.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2895, pruned_loss=0.06209, ctc_loss=0.1103, over 3310587.97 frames. 
], batch size: 78, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:42:57,576 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2854516.0, ans=0.125 2023-10-09 19:43:05,186 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.64 vs. limit=12.0 2023-10-09 19:43:16,322 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=22.5 2023-10-09 19:43:16,869 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:43:17,952 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2854609.3333333335, ans=0.2 2023-10-09 19:43:21,102 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2854609.3333333335, ans=15.0 2023-10-09 19:43:26,397 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.573e+02 3.110e+02 3.559e+02 4.212e+02 9.939e+02, threshold=7.118e+02, percent-clipped=2.0 2023-10-09 19:43:33,630 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2854656.0, ans=0.0 2023-10-09 19:43:37,420 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2854702.6666666665, ans=0.0 2023-10-09 19:43:48,324 INFO [train.py:1031] (1/4) Epoch 14, batch 27000, loss[loss=0.2051, simple_loss=0.2394, pruned_loss=0.06297, ctc_loss=0.112, over 16367.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2834, pruned_loss=0.06261, ctc_loss=0.1106, over 3308989.46 frames. ], batch size: 417, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:43:48,325 INFO [train.py:1054] (1/4) Computing validation loss 2023-10-09 19:44:01,347 INFO [zipformer.py:1853] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.1212, 2.4883, 4.9643, 1.9504], device='cuda:1') 2023-10-09 19:44:04,115 INFO [zipformer.py:1853] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0096, 3.3346, 3.4221, 3.5080], device='cuda:1') 2023-10-09 19:44:06,707 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2336, simple_loss=0.3018, pruned_loss=0.06376, ctc_loss=0.09459, over 1796401.00 frames. 2023-10-09 19:44:06,708 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14570MB 2023-10-09 19:44:09,851 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2854749.3333333335, ans=0.2 2023-10-09 19:44:48,050 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=22.5 2023-10-09 19:45:01,221 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2023-10-09 19:45:06,468 INFO [train.py:1031] (1/4) Epoch 14, batch 27050, loss[loss=0.2091, simple_loss=0.2516, pruned_loss=0.06176, ctc_loss=0.1075, over 16244.00 frames. ], tot_loss[loss=0.2205, simple_loss=0.2771, pruned_loss=0.06069, ctc_loss=0.1064, over 3307471.89 frames. 
], batch size: 466, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:45:10,729 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2854982.6666666665, ans=0.125 2023-10-09 19:45:15,920 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2854982.6666666665, ans=15.0 2023-10-09 19:45:20,745 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2023-10-09 19:45:23,397 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2855029.3333333335, ans=0.07 2023-10-09 19:45:26,759 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2855029.3333333335, ans=0.0 2023-10-09 19:45:28,601 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2855076.0, ans=0.125 2023-10-09 19:45:43,661 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=2855122.6666666665, ans=0.1 2023-10-09 19:45:44,970 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.824e+02 3.206e+02 4.209e+02 1.336e+03, threshold=6.413e+02, percent-clipped=5.0 2023-10-09 19:46:05,133 INFO [train.py:1031] (1/4) Epoch 14, batch 27100, loss[loss=0.2288, simple_loss=0.2753, pruned_loss=0.068, ctc_loss=0.1156, over 16762.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2719, pruned_loss=0.05919, ctc_loss=0.1028, over 3311262.03 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:46:18,068 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2855262.6666666665, ans=10.0 2023-10-09 19:46:20,164 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2855262.6666666665, ans=0.125 2023-10-09 19:46:39,215 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2855356.0, ans=0.5 2023-10-09 19:46:52,507 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2855402.6666666665, ans=0.0 2023-10-09 19:47:04,172 INFO [train.py:1031] (1/4) Epoch 14, batch 27150, loss[loss=0.167, simple_loss=0.2329, pruned_loss=0.03767, ctc_loss=0.06428, over 14191.00 frames. ], tot_loss[loss=0.2151, simple_loss=0.27, pruned_loss=0.05949, ctc_loss=0.1029, over 3289885.35 frames. 
], batch size: 51, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:47:04,426 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2855449.3333333335, ans=0.125 2023-10-09 19:47:23,176 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2855496.0, ans=0.04949747468305833 2023-10-09 19:47:46,371 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.370e+02 3.038e+02 3.523e+02 4.275e+02 1.319e+03, threshold=7.047e+02, percent-clipped=7.0 2023-10-09 19:47:48,375 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2855589.3333333335, ans=0.125 2023-10-09 19:47:48,813 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=15.0 2023-10-09 19:47:52,015 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=2855589.3333333335, ans=12.0 2023-10-09 19:47:59,308 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2855636.0, ans=0.0 2023-10-09 19:48:02,017 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2855636.0, ans=0.2 2023-10-09 19:48:05,720 INFO [train.py:1031] (1/4) Epoch 14, batch 27200, loss[loss=0.3034, simple_loss=0.3767, pruned_loss=0.08479, ctc_loss=0.1515, over 16612.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2792, pruned_loss=0.06034, ctc_loss=0.1051, over 3286540.80 frames. ], batch size: 351, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:48:37,016 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2855776.0, ans=0.1 2023-10-09 19:48:39,885 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2855776.0, ans=0.0 2023-10-09 19:49:00,097 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2855869.3333333335, ans=0.125 2023-10-09 19:49:06,309 INFO [train.py:1031] (1/4) Epoch 14, batch 27250, loss[loss=0.1857, simple_loss=0.241, pruned_loss=0.04957, ctc_loss=0.0782, over 12029.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2808, pruned_loss=0.06039, ctc_loss=0.1054, over 3294138.67 frames. ], batch size: 36, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:49:36,572 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2856009.3333333335, ans=0.1 2023-10-09 19:49:50,667 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2856056.0, ans=0.125 2023-10-09 19:49:51,930 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 3.213e+02 3.949e+02 4.744e+02 1.249e+03, threshold=7.899e+02, percent-clipped=6.0 2023-10-09 19:50:10,442 INFO [train.py:1031] (1/4) Epoch 14, batch 27300, loss[loss=0.1944, simple_loss=0.2698, pruned_loss=0.04436, ctc_loss=0.07557, over 16856.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2759, pruned_loss=0.05925, ctc_loss=0.1038, over 3290064.02 frames. 
], batch size: 228, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:50:46,234 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2856242.6666666665, ans=0.125 2023-10-09 19:50:47,340 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:51:13,360 INFO [train.py:1031] (1/4) Epoch 14, batch 27350, loss[loss=0.184, simple_loss=0.2562, pruned_loss=0.04145, ctc_loss=0.07239, over 16864.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.2745, pruned_loss=0.057, ctc_loss=0.1005, over 3285787.52 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:51:21,848 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2856382.6666666665, ans=0.125 2023-10-09 19:51:22,269 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2023-10-09 19:51:38,273 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2856476.0, ans=0.2 2023-10-09 19:51:42,754 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2856476.0, ans=0.125 2023-10-09 19:51:58,974 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.714e+02 3.156e+02 4.138e+02 1.229e+03, threshold=6.312e+02, percent-clipped=2.0 2023-10-09 19:52:07,355 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2856569.3333333335, ans=0.1 2023-10-09 19:52:15,065 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0 2023-10-09 19:52:15,420 INFO [train.py:1031] (1/4) Epoch 14, batch 27400, loss[loss=0.1562, simple_loss=0.2362, pruned_loss=0.02803, ctc_loss=0.05039, over 16858.00 frames. ], tot_loss[loss=0.2089, simple_loss=0.2715, pruned_loss=0.05398, ctc_loss=0.09571, over 3293342.66 frames. 
], batch size: 189, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:52:23,793 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2856616.0, ans=0.125 2023-10-09 19:52:31,941 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2856662.6666666665, ans=0.125 2023-10-09 19:52:41,071 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2856709.3333333335, ans=0.1 2023-10-09 19:52:43,693 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2856709.3333333335, ans=0.125 2023-10-09 19:52:43,830 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2856709.3333333335, ans=0.125 2023-10-09 19:52:47,517 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:52:58,719 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2856756.0, ans=0.0 2023-10-09 19:53:14,353 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2856849.3333333335, ans=0.2 2023-10-09 19:53:15,196 INFO [train.py:1031] (1/4) Epoch 14, batch 27450, loss[loss=0.1979, simple_loss=0.2532, pruned_loss=0.05258, ctc_loss=0.09385, over 16767.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2677, pruned_loss=0.05421, ctc_loss=0.09628, over 3293280.47 frames. ], batch size: 242, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:53:21,680 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2856849.3333333335, ans=0.125 2023-10-09 19:53:22,676 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:53:40,505 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0 2023-10-09 19:53:46,068 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2856942.6666666665, ans=0.125 2023-10-09 19:54:00,072 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.825e+02 3.198e+02 4.029e+02 6.832e+02, threshold=6.397e+02, percent-clipped=4.0 2023-10-09 19:54:16,221 INFO [train.py:1031] (1/4) Epoch 14, batch 27500, loss[loss=0.2216, simple_loss=0.2801, pruned_loss=0.05927, ctc_loss=0.1116, over 16423.00 frames. ], tot_loss[loss=0.2058, simple_loss=0.2664, pruned_loss=0.05353, ctc_loss=0.09522, over 3279096.74 frames. 
], batch size: 416, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:54:30,411 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2857129.3333333335, ans=0.5 2023-10-09 19:54:49,286 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2857176.0, ans=0.1 2023-10-09 19:54:56,416 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2857222.6666666665, ans=0.125 2023-10-09 19:55:07,885 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2857269.3333333335, ans=0.2 2023-10-09 19:55:17,198 INFO [train.py:1031] (1/4) Epoch 14, batch 27550, loss[loss=0.204, simple_loss=0.2488, pruned_loss=0.05953, ctc_loss=0.1002, over 16658.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.2662, pruned_loss=0.05431, ctc_loss=0.09642, over 3287724.75 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:55:26,861 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2857316.0, ans=0.125 2023-10-09 19:55:32,674 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2857362.6666666665, ans=0.1 2023-10-09 19:55:34,853 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2857362.6666666665, ans=0.125 2023-10-09 19:55:36,525 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2857362.6666666665, ans=0.2 2023-10-09 19:55:39,373 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2857362.6666666665, ans=0.2 2023-10-09 19:55:40,744 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2023-10-09 19:55:41,997 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2857409.3333333335, ans=0.2 2023-10-09 19:55:54,729 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0 2023-10-09 19:56:06,684 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.165e+02 3.747e+02 4.293e+02 1.170e+03, threshold=7.493e+02, percent-clipped=3.0 2023-10-09 19:56:09,773 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2857502.6666666665, ans=0.125 2023-10-09 19:56:20,205 INFO [train.py:1031] (1/4) Epoch 14, batch 27600, loss[loss=0.2796, simple_loss=0.31, pruned_loss=0.09303, ctc_loss=0.1579, over 16676.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2669, pruned_loss=0.05575, ctc_loss=0.09887, over 3285772.00 frames. 
], batch size: 386, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:56:40,741 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2857596.0, ans=0.1 2023-10-09 19:56:42,492 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2857596.0, ans=0.2 2023-10-09 19:56:55,843 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2023-10-09 19:57:01,502 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2857689.3333333335, ans=0.125 2023-10-09 19:57:04,995 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2857689.3333333335, ans=22.5 2023-10-09 19:57:16,175 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2857736.0, ans=0.125 2023-10-09 19:57:16,180 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2857736.0, ans=0.5 2023-10-09 19:57:19,213 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2857736.0, ans=0.1 2023-10-09 19:57:21,377 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2857782.6666666665, ans=0.125 2023-10-09 19:57:22,126 INFO [train.py:1031] (1/4) Epoch 14, batch 27650, loss[loss=0.1653, simple_loss=0.229, pruned_loss=0.03808, ctc_loss=0.0635, over 16718.00 frames. ], tot_loss[loss=0.2107, simple_loss=0.2695, pruned_loss=0.05602, ctc_loss=0.09955, over 3290897.46 frames. ], batch size: 102, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:57:59,206 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2857922.6666666665, ans=0.0 2023-10-09 19:58:10,217 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+02 3.178e+02 3.688e+02 4.513e+02 1.131e+03, threshold=7.375e+02, percent-clipped=1.0 2023-10-09 19:58:16,615 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2857969.3333333335, ans=0.125 2023-10-09 19:58:22,686 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2857969.3333333335, ans=0.125 2023-10-09 19:58:24,467 INFO [train.py:1031] (1/4) Epoch 14, batch 27700, loss[loss=0.235, simple_loss=0.2711, pruned_loss=0.07512, ctc_loss=0.1216, over 11744.00 frames. ], tot_loss[loss=0.2115, simple_loss=0.2676, pruned_loss=0.05744, ctc_loss=0.1013, over 3282833.01 frames. 
], batch size: 35, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:58:39,289 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2858062.6666666665, ans=0.125 2023-10-09 19:58:40,220 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2858062.6666666665, ans=0.125 2023-10-09 19:58:50,475 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:58:51,822 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2023-10-09 19:58:55,409 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2858109.3333333335, ans=0.0 2023-10-09 19:59:04,160 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2858156.0, ans=0.0 2023-10-09 19:59:24,059 INFO [train.py:1031] (1/4) Epoch 14, batch 27750, loss[loss=0.258, simple_loss=0.296, pruned_loss=0.08308, ctc_loss=0.1345, over 16930.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2662, pruned_loss=0.05907, ctc_loss=0.1039, over 3296757.78 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:59:28,488 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=22.5 2023-10-09 19:59:30,277 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2858249.3333333335, ans=0.0 2023-10-09 19:59:33,497 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2858249.3333333335, ans=0.1 2023-10-09 19:59:40,665 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2858296.0, ans=0.125 2023-10-09 19:59:47,592 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2858342.6666666665, ans=0.025 2023-10-09 19:59:53,982 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2858342.6666666665, ans=0.0 2023-10-09 20:00:04,979 WARNING [train.py:1204] (1/4) Exclude cut with ID R0014_M0086-0174-157 from training. Number of frames (before subsampling): 147. Number of frames (after subsampling): 35. Text: 你买多少东西一会儿他就送你这么多东西啊啊三大桶那三大桶得用多少时间就啊. Tokens: ['▁你', '买', '多', '少', '东', '西', '一', '会', '儿', '他', '就', '送', '你', '这', '么', '多', '东', '西', '啊', '啊', '三', '大', '<0xE6>', '<0xA1>', '<0xB6>', '那', '三', '大', '<0xE6>', '<0xA1>', '<0xB6>', '得', '用', '多', '少', '时', '间', '就', '啊']. Number of tokens: 39 2023-10-09 20:00:14,002 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+02 3.399e+02 3.890e+02 4.499e+02 8.877e+02, threshold=7.779e+02, percent-clipped=2.0 2023-10-09 20:00:24,188 INFO [train.py:1031] (1/4) Epoch 14, batch 27800, loss[loss=0.1931, simple_loss=0.2546, pruned_loss=0.04928, ctc_loss=0.08278, over 16887.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.2664, pruned_loss=0.06048, ctc_loss=0.1063, over 3294587.02 frames. 
], batch size: 95, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:00:27,721 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2858482.6666666665, ans=0.1 2023-10-09 20:00:34,422 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2858482.6666666665, ans=0.125 2023-10-09 20:01:10,573 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2858622.6666666665, ans=0.2 2023-10-09 20:01:15,532 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2858669.3333333335, ans=0.125 2023-10-09 20:01:27,840 INFO [train.py:1031] (1/4) Epoch 14, batch 27850, loss[loss=0.3341, simple_loss=0.3661, pruned_loss=0.1095, ctc_loss=0.2079, over 16721.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2759, pruned_loss=0.06389, ctc_loss=0.1133, over 3299153.36 frames. ], batch size: 384, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 20:01:29,336 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2858716.0, ans=0.025 2023-10-09 20:01:31,418 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2858716.0, ans=0.125 2023-10-09 20:01:53,047 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:01:55,736 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2858809.3333333335, ans=0.125 2023-10-09 20:02:18,205 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.539e+02 3.601e+02 4.394e+02 5.369e+02 1.444e+03, threshold=8.787e+02, percent-clipped=3.0 2023-10-09 20:02:27,373 INFO [train.py:1031] (1/4) Epoch 14, batch 27900, loss[loss=0.2164, simple_loss=0.2922, pruned_loss=0.05148, ctc_loss=0.09413, over 16889.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2807, pruned_loss=0.06301, ctc_loss=0.1133, over 3304391.05 frames. ], batch size: 292, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:02:34,832 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2858949.3333333335, ans=0.2 2023-10-09 20:02:44,451 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2858996.0, ans=0.0 2023-10-09 20:02:45,705 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2858996.0, ans=22.5 2023-10-09 20:02:52,139 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:02:55,876 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2859042.6666666665, ans=0.125 2023-10-09 20:03:07,055 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2859089.3333333335, ans=0.2 2023-10-09 20:03:08,340 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.29 vs. 
limit=15.0 2023-10-09 20:03:21,680 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2859136.0, ans=0.0 2023-10-09 20:03:29,819 INFO [train.py:1031] (1/4) Epoch 14, batch 27950, loss[loss=0.1767, simple_loss=0.2651, pruned_loss=0.03132, ctc_loss=0.06413, over 16880.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2777, pruned_loss=0.05831, ctc_loss=0.1059, over 3299801.32 frames. ], batch size: 258, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:03:34,448 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2859182.6666666665, ans=0.125 2023-10-09 20:03:36,552 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2859182.6666666665, ans=0.125 2023-10-09 20:03:46,692 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2023-10-09 20:03:57,232 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2859276.0, ans=0.125 2023-10-09 20:04:21,887 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.805e+02 3.200e+02 4.012e+02 8.186e+02, threshold=6.399e+02, percent-clipped=0.0 2023-10-09 20:04:25,429 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0 2023-10-09 20:04:31,540 INFO [train.py:1031] (1/4) Epoch 14, batch 28000, loss[loss=0.181, simple_loss=0.2423, pruned_loss=0.04436, ctc_loss=0.07724, over 16739.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2726, pruned_loss=0.05662, ctc_loss=0.1027, over 3294856.84 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:04:38,319 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2859416.0, ans=0.0 2023-10-09 20:04:49,980 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.81 vs. limit=10.0 2023-10-09 20:05:03,072 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=12.0 2023-10-09 20:05:06,654 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2859509.3333333335, ans=0.1 2023-10-09 20:05:16,953 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2859556.0, ans=0.125 2023-10-09 20:05:33,909 INFO [train.py:1031] (1/4) Epoch 14, batch 28050, loss[loss=0.2097, simple_loss=0.255, pruned_loss=0.06088, ctc_loss=0.1068, over 16830.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.2698, pruned_loss=0.05801, ctc_loss=0.1045, over 3299578.35 frames. 
], batch size: 176, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:05:46,405 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2859696.0, ans=0.125 2023-10-09 20:05:46,422 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2859696.0, ans=0.07 2023-10-09 20:05:54,111 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2859696.0, ans=0.1 2023-10-09 20:05:58,483 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2859742.6666666665, ans=0.0 2023-10-09 20:06:14,909 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2859789.3333333335, ans=0.0 2023-10-09 20:06:17,963 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.13 vs. limit=10.0 2023-10-09 20:06:25,713 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.244e+02 3.661e+02 4.395e+02 6.655e+02, threshold=7.321e+02, percent-clipped=2.0 2023-10-09 20:06:26,632 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2859836.0, ans=0.125 2023-10-09 20:06:34,744 INFO [train.py:1031] (1/4) Epoch 14, batch 28100, loss[loss=0.2167, simple_loss=0.2649, pruned_loss=0.06332, ctc_loss=0.105, over 16738.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2711, pruned_loss=0.06027, ctc_loss=0.1077, over 3306224.45 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:06:47,542 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2859929.3333333335, ans=0.125 2023-10-09 20:06:47,549 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2859929.3333333335, ans=0.2 2023-10-09 20:06:48,566 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2859929.3333333335, ans=0.125 2023-10-09 20:06:52,016 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2859929.3333333335, ans=0.0 2023-10-09 20:07:11,293 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2859976.0, ans=0.125 2023-10-09 20:07:23,608 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2860022.6666666665, ans=0.025 2023-10-09 20:07:27,912 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2860069.3333333335, ans=0.2 2023-10-09 20:07:38,441 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2860116.0, ans=0.125 2023-10-09 20:07:39,119 INFO [train.py:1031] (1/4) Epoch 14, batch 28150, loss[loss=0.2303, simple_loss=0.3314, pruned_loss=0.04548, ctc_loss=0.09588, over 15498.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2805, pruned_loss=0.06059, ctc_loss=0.1092, over 3302229.06 frames. 
], batch size: 529, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:07:43,732 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2860116.0, ans=0.2 2023-10-09 20:08:07,185 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0 2023-10-09 20:08:08,211 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2860209.3333333335, ans=0.0 2023-10-09 20:08:26,875 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.63 vs. limit=6.0 2023-10-09 20:08:27,004 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0 2023-10-09 20:08:28,461 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2860302.6666666665, ans=0.1 2023-10-09 20:08:34,457 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.258e+02 3.630e+02 4.315e+02 7.484e+02, threshold=7.260e+02, percent-clipped=1.0 2023-10-09 20:08:41,523 INFO [train.py:1031] (1/4) Epoch 14, batch 28200, loss[loss=0.2687, simple_loss=0.313, pruned_loss=0.08265, ctc_loss=0.1479, over 16878.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2878, pruned_loss=0.06301, ctc_loss=0.1129, over 3297159.38 frames. ], batch size: 242, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:08:49,659 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2860349.3333333335, ans=0.1 2023-10-09 20:09:24,334 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2860489.3333333335, ans=0.05 2023-10-09 20:09:30,671 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2860536.0, ans=0.0 2023-10-09 20:09:43,226 INFO [train.py:1031] (1/4) Epoch 14, batch 28250, loss[loss=0.2217, simple_loss=0.268, pruned_loss=0.06593, ctc_loss=0.1087, over 16555.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2898, pruned_loss=0.06589, ctc_loss=0.1166, over 3296667.94 frames. ], batch size: 110, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:09:47,190 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2860582.6666666665, ans=0.1 2023-10-09 20:09:58,905 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2860629.3333333335, ans=0.1 2023-10-09 20:10:22,734 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2860722.6666666665, ans=0.125 2023-10-09 20:10:41,217 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+02 3.506e+02 4.003e+02 4.873e+02 1.007e+03, threshold=8.006e+02, percent-clipped=4.0 2023-10-09 20:10:46,100 INFO [train.py:1031] (1/4) Epoch 14, batch 28300, loss[loss=0.2188, simple_loss=0.2678, pruned_loss=0.06324, ctc_loss=0.1081, over 16817.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2883, pruned_loss=0.06666, ctc_loss=0.1173, over 3296284.65 frames. 
], batch size: 228, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:10:51,510 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2023-10-09 20:11:10,579 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2023-10-09 20:11:11,755 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=22.5 2023-10-09 20:11:17,202 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2023-10-09 20:11:26,759 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2860956.0, ans=0.1 2023-10-09 20:11:38,844 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2861002.6666666665, ans=0.125 2023-10-09 20:11:40,874 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2861002.6666666665, ans=0.125 2023-10-09 20:11:40,916 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2861002.6666666665, ans=0.0 2023-10-09 20:11:48,191 INFO [train.py:1031] (1/4) Epoch 14, batch 28350, loss[loss=0.1799, simple_loss=0.2362, pruned_loss=0.04615, ctc_loss=0.07818, over 16662.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2829, pruned_loss=0.06578, ctc_loss=0.1152, over 3294366.79 frames. ], batch size: 140, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:11:53,566 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-10-09 20:12:07,021 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2861096.0, ans=0.025 2023-10-09 20:12:16,349 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2023-10-09 20:12:23,560 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=15.0 2023-10-09 20:12:31,160 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2861189.3333333335, ans=0.0 2023-10-09 20:12:32,325 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2861189.3333333335, ans=0.0 2023-10-09 20:12:41,591 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.34 vs. 
limit=22.5 2023-10-09 20:12:42,375 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2861236.0, ans=0.2 2023-10-09 20:12:46,034 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.563e+02 3.327e+02 3.829e+02 4.439e+02 7.732e+02, threshold=7.659e+02, percent-clipped=0.0 2023-10-09 20:12:46,440 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2861236.0, ans=0.125 2023-10-09 20:12:46,698 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-10-09 20:12:50,327 INFO [train.py:1031] (1/4) Epoch 14, batch 28400, loss[loss=0.2311, simple_loss=0.311, pruned_loss=0.05532, ctc_loss=0.1016, over 16860.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2861, pruned_loss=0.06642, ctc_loss=0.1165, over 3301722.30 frames. ], batch size: 228, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:13:06,205 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2861329.3333333335, ans=0.07 2023-10-09 20:13:43,647 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0 2023-10-09 20:13:44,361 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2861469.3333333335, ans=0.1 2023-10-09 20:13:48,362 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2861469.3333333335, ans=0.0 2023-10-09 20:13:56,935 INFO [train.py:1031] (1/4) Epoch 14, batch 28450, loss[loss=0.2165, simple_loss=0.3037, pruned_loss=0.04721, ctc_loss=0.08733, over 16863.00 frames. ], tot_loss[loss=0.2368, simple_loss=0.2938, pruned_loss=0.06641, ctc_loss=0.1171, over 3299681.06 frames. ], batch size: 242, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:14:03,241 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2861516.0, ans=0.0 2023-10-09 20:14:26,393 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2861609.3333333335, ans=0.07 2023-10-09 20:14:31,324 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2861609.3333333335, ans=0.125 2023-10-09 20:14:36,263 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2861656.0, ans=0.0 2023-10-09 20:14:42,166 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2861656.0, ans=0.125 2023-10-09 20:14:58,087 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.469e+02 3.582e+02 4.557e+02 5.514e+02 1.079e+03, threshold=9.115e+02, percent-clipped=9.0 2023-10-09 20:15:01,353 INFO [train.py:1031] (1/4) Epoch 14, batch 28500, loss[loss=0.2084, simple_loss=0.301, pruned_loss=0.04184, ctc_loss=0.08016, over 16881.00 frames. ], tot_loss[loss=0.243, simple_loss=0.3036, pruned_loss=0.06728, ctc_loss=0.1195, over 3293764.57 frames. 
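The optim.py Clipping_scale entries describe adaptive gradient clipping: five running grad-norm quartiles are reported, and the threshold tracks Clipping_scale times the logged median (2.0 x 3.829e+02 = 7.658e+02 above, matching threshold=7.659e+02 up to rounding). A simplified stand-in for this scheme, assuming a sliding window of recent norms rather than the optimizer's actual bookkeeping:

from collections import deque
import torch

class MedianGradClipper:
    """Clip to clipping_scale x the median of a window of recent grad norms."""
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_clipped = 0
        self.num_steps = 0

    def clip_(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        self.norms.append(float(norm))
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.scale * median
        self.num_steps += 1
        if norm > threshold:
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        return float(norm), threshold, 100.0 * self.num_clipped / self.num_steps

p = torch.nn.Parameter(torch.randn(10))
p.grad = torch.randn(10)
print(MedianGradClipper().clip_([p]))  # (grad norm, threshold, percent-clipped)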
], batch size: 176, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:15:03,885 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2861749.3333333335, ans=0.07 2023-10-09 20:15:14,773 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2861796.0, ans=0.125 2023-10-09 20:15:31,177 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2861842.6666666665, ans=0.2 2023-10-09 20:16:03,042 INFO [train.py:1031] (1/4) Epoch 14, batch 28550, loss[loss=0.1535, simple_loss=0.242, pruned_loss=0.02398, ctc_loss=0.04232, over 16858.00 frames. ], tot_loss[loss=0.2339, simple_loss=0.2995, pruned_loss=0.06204, ctc_loss=0.1106, over 3291192.27 frames. ], batch size: 228, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:16:30,667 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.31 vs. limit=15.0 2023-10-09 20:16:41,978 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2862122.6666666665, ans=0.125 2023-10-09 20:16:50,452 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2862169.3333333335, ans=10.0 2023-10-09 20:17:00,496 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.790e+02 3.333e+02 3.902e+02 5.980e+02, threshold=6.666e+02, percent-clipped=0.0 2023-10-09 20:17:03,187 INFO [train.py:1031] (1/4) Epoch 14, batch 28600, loss[loss=0.2294, simple_loss=0.2847, pruned_loss=0.06602, ctc_loss=0.105, over 16912.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2938, pruned_loss=0.06024, ctc_loss=0.1073, over 3304054.88 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:17:54,532 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2862402.6666666665, ans=0.1 2023-10-09 20:18:05,183 INFO [train.py:1031] (1/4) Epoch 14, batch 28650, loss[loss=0.2438, simple_loss=0.3009, pruned_loss=0.06705, ctc_loss=0.1316, over 16620.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2896, pruned_loss=0.06063, ctc_loss=0.1073, over 3298290.57 frames. ], batch size: 351, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:18:13,498 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.55 vs. limit=10.0 2023-10-09 20:18:18,120 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2862496.0, ans=0.125 2023-10-09 20:18:33,740 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2862542.6666666665, ans=0.125 2023-10-09 20:18:42,930 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.47 vs. 
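The tot_loss entries are internally consistent with loss = simple_loss_scale * simple_loss + pruned_loss + ctc_loss_scale * ctc_loss, using this run's configured scales of 0.5 and 0.2. A quick arithmetic check against the batch 28550 entry above:

simple_loss_scale, ctc_loss_scale = 0.5, 0.2
loss = simple_loss_scale * 0.2995 + 0.06204 + ctc_loss_scale * 0.1106
print(round(loss, 4))  # 0.2339, the tot_loss reported at epoch 14, batch 28550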
limit=12.0 2023-10-09 20:18:47,514 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2862589.3333333335, ans=0.1 2023-10-09 20:18:47,563 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2862589.3333333335, ans=0.025 2023-10-09 20:19:01,539 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2862636.0, ans=0.125 2023-10-09 20:19:05,964 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+02 2.995e+02 3.402e+02 4.215e+02 9.672e+02, threshold=6.804e+02, percent-clipped=2.0 2023-10-09 20:19:07,092 INFO [train.py:1031] (1/4) Epoch 14, batch 28700, loss[loss=0.176, simple_loss=0.238, pruned_loss=0.04278, ctc_loss=0.07107, over 16823.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2853, pruned_loss=0.05777, ctc_loss=0.1028, over 3295077.02 frames. ], batch size: 121, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:19:13,324 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2862682.6666666665, ans=0.125 2023-10-09 20:19:49,171 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2862822.6666666665, ans=0.025 2023-10-09 20:20:07,318 INFO [train.py:1031] (1/4) Epoch 14, batch 28750, loss[loss=0.205, simple_loss=0.2708, pruned_loss=0.05022, ctc_loss=0.09696, over 16878.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2826, pruned_loss=0.05661, ctc_loss=0.1009, over 3297835.34 frames. ], batch size: 215, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:20:13,552 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2862916.0, ans=15.0 2023-10-09 20:20:28,594 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2862962.6666666665, ans=0.125 2023-10-09 20:20:31,153 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2863009.3333333335, ans=0.125 2023-10-09 20:20:56,587 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2863102.6666666665, ans=0.125 2023-10-09 20:20:58,189 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2863102.6666666665, ans=0.1 2023-10-09 20:21:04,594 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2863102.6666666665, ans=0.125 2023-10-09 20:21:09,038 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 3.101e+02 3.665e+02 4.221e+02 6.562e+02, threshold=7.330e+02, percent-clipped=0.0 2023-10-09 20:21:09,064 INFO [train.py:1031] (1/4) Epoch 14, batch 28800, loss[loss=0.252, simple_loss=0.2815, pruned_loss=0.08231, ctc_loss=0.1447, over 16556.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2821, pruned_loss=0.05876, ctc_loss=0.1043, over 3304513.68 frames. ], batch size: 416, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:21:15,464 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.47 vs. 
limit=22.5 2023-10-09 20:21:23,459 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2863196.0, ans=0.125 2023-10-09 20:21:25,786 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=15.0 2023-10-09 20:21:31,633 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2863196.0, ans=15.0 2023-10-09 20:21:39,212 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2863242.6666666665, ans=0.0 2023-10-09 20:21:40,323 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2863242.6666666665, ans=0.1 2023-10-09 20:21:55,957 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2863289.3333333335, ans=0.125 2023-10-09 20:22:03,562 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2863336.0, ans=0.125 2023-10-09 20:22:05,265 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2863336.0, ans=0.125 2023-10-09 20:22:06,306 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2863336.0, ans=0.2 2023-10-09 20:22:10,814 INFO [train.py:1031] (1/4) Epoch 14, batch 28850, loss[loss=0.2227, simple_loss=0.2688, pruned_loss=0.06652, ctc_loss=0.109, over 16804.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2787, pruned_loss=0.06015, ctc_loss=0.1065, over 3309309.09 frames. ], batch size: 102, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:22:24,245 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2863429.3333333335, ans=0.0 2023-10-09 20:22:56,441 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2863522.6666666665, ans=0.125 2023-10-09 20:22:58,611 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2863569.3333333335, ans=0.125 2023-10-09 20:23:03,536 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2863569.3333333335, ans=0.1 2023-10-09 20:23:12,083 INFO [train.py:1031] (1/4) Epoch 14, batch 28900, loss[loss=0.224, simple_loss=0.2625, pruned_loss=0.06945, ctc_loss=0.1164, over 16656.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2736, pruned_loss=0.06072, ctc_loss=0.1068, over 3309079.31 frames. 
], batch size: 110, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:23:13,123 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.719e+02 3.415e+02 3.744e+02 4.568e+02 8.890e+02, threshold=7.488e+02, percent-clipped=1.0 2023-10-09 20:23:37,754 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2863709.3333333335, ans=0.125 2023-10-09 20:23:43,123 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2863709.3333333335, ans=0.125 2023-10-09 20:23:47,658 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0 2023-10-09 20:23:54,556 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2863756.0, ans=0.125 2023-10-09 20:24:13,690 INFO [train.py:1031] (1/4) Epoch 14, batch 28950, loss[loss=0.175, simple_loss=0.2373, pruned_loss=0.04173, ctc_loss=0.07294, over 16832.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2726, pruned_loss=0.0608, ctc_loss=0.1058, over 3302961.37 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:24:58,916 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=12.0 2023-10-09 20:25:15,082 INFO [train.py:1031] (1/4) Epoch 14, batch 29000, loss[loss=0.1895, simple_loss=0.2769, pruned_loss=0.03718, ctc_loss=0.0692, over 16578.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.27, pruned_loss=0.05895, ctc_loss=0.1021, over 3299369.00 frames. ], batch size: 351, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:25:15,373 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:25:17,199 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+02 3.225e+02 3.785e+02 4.643e+02 9.976e+02, threshold=7.570e+02, percent-clipped=3.0 2023-10-09 20:25:37,499 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2864129.3333333335, ans=0.0 2023-10-09 20:26:15,126 INFO [train.py:1031] (1/4) Epoch 14, batch 29050, loss[loss=0.2225, simple_loss=0.2721, pruned_loss=0.06387, ctc_loss=0.1128, over 16936.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2709, pruned_loss=0.05866, ctc_loss=0.1019, over 3297942.21 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:26:15,447 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2864316.0, ans=0.0 2023-10-09 20:26:16,529 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2864316.0, ans=0.125 2023-10-09 20:26:27,233 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2864362.6666666665, ans=0.2 2023-10-09 20:26:49,969 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2864409.3333333335, ans=0.1 2023-10-09 20:27:17,013 INFO [train.py:1031] (1/4) Epoch 14, batch 29100, loss[loss=0.2595, simple_loss=0.3045, pruned_loss=0.07918, ctc_loss=0.1402, over 16917.00 frames. 
], tot_loss[loss=0.2202, simple_loss=0.2742, pruned_loss=0.06161, ctc_loss=0.1073, over 3308141.26 frames. ], batch size: 292, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:27:19,533 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2864549.3333333335, ans=0.125 2023-10-09 20:27:20,255 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+02 3.447e+02 3.769e+02 4.635e+02 6.729e+02, threshold=7.539e+02, percent-clipped=0.0 2023-10-09 20:27:50,756 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-10-09 20:28:02,746 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2864689.3333333335, ans=0.125 2023-10-09 20:28:06,430 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2864736.0, ans=0.2 2023-10-09 20:28:10,656 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2864736.0, ans=0.2 2023-10-09 20:28:16,818 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2023-10-09 20:28:18,188 INFO [train.py:1031] (1/4) Epoch 14, batch 29150, loss[loss=0.2352, simple_loss=0.2834, pruned_loss=0.07079, ctc_loss=0.1138, over 16768.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2787, pruned_loss=0.06385, ctc_loss=0.1115, over 3315547.97 frames. ], batch size: 140, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:28:19,492 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.07 vs. limit=6.0 2023-10-09 20:28:25,042 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2023-10-09 20:28:52,437 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.69 vs. limit=10.0 2023-10-09 20:28:54,088 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2864876.0, ans=0.125 2023-10-09 20:29:15,417 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2864969.3333333335, ans=0.2 2023-10-09 20:29:18,688 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2864969.3333333335, ans=0.125 2023-10-09 20:29:22,662 INFO [train.py:1031] (1/4) Epoch 14, batch 29200, loss[loss=0.2144, simple_loss=0.2766, pruned_loss=0.057, ctc_loss=0.09526, over 16800.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2817, pruned_loss=0.06428, ctc_loss=0.1123, over 3309081.43 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:29:27,765 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.04 vs. 
limit=22.5 2023-10-09 20:29:28,273 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.765e+02 3.299e+02 3.814e+02 4.330e+02 6.435e+02, threshold=7.628e+02, percent-clipped=0.0 2023-10-09 20:29:35,829 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2865062.6666666665, ans=0.125 2023-10-09 20:29:36,896 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2865062.6666666665, ans=0.125 2023-10-09 20:29:49,513 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2865109.3333333335, ans=0.125 2023-10-09 20:30:09,534 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2865156.0, ans=10.0 2023-10-09 20:30:24,818 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2865202.6666666665, ans=0.1 2023-10-09 20:30:27,649 INFO [train.py:1031] (1/4) Epoch 14, batch 29250, loss[loss=0.297, simple_loss=0.3594, pruned_loss=0.0876, ctc_loss=0.1482, over 16802.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2853, pruned_loss=0.06274, ctc_loss=0.1101, over 3307420.97 frames. ], batch size: 272, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:30:38,661 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=12.0 2023-10-09 20:30:42,596 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2865296.0, ans=0.125 2023-10-09 20:30:50,887 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2023-10-09 20:31:01,892 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2023-10-09 20:31:20,874 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2865436.0, ans=0.125 2023-10-09 20:31:23,552 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2865436.0, ans=0.2 2023-10-09 20:31:26,706 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2865436.0, ans=0.05 2023-10-09 20:31:32,754 INFO [train.py:1031] (1/4) Epoch 14, batch 29300, loss[loss=0.2696, simple_loss=0.3133, pruned_loss=0.08474, ctc_loss=0.1409, over 16567.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2953, pruned_loss=0.065, ctc_loss=0.1143, over 3289627.92 frames. 
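The scaling.py Whitening entries compare a per-module statistic against a limit. The metric measures how far the covariance of a module's output, split into num_groups channel groups, is from a multiple of the identity, with 1.0 meaning perfectly white. One plausible formulation, mean diag(C^2) / (mean diag C)^2, is sketched below; treat the exact formula as an assumption rather than what scaling.py computes:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); channels split into num_groups groups
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n          # per-group covariance
    d1 = cov.diagonal(dim1=1, dim2=2).mean()              # mean diag(C)
    d2 = torch.matmul(cov, cov).diagonal(dim1=1, dim2=2).mean()  # mean diag(C^2)
    return float(d2 / (d1 ** 2 + 1e-20))

white = torch.randn(10000, 256)
print(whitening_metric(white, num_groups=1))             # close to 1.0
print(whitening_metric(white[:, :128].repeat(1, 2), 1))  # ~2: duplicated channels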
], batch size: 416, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:31:36,694 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2865482.6666666665, ans=0.0 2023-10-09 20:31:37,828 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2865482.6666666665, ans=0.0 2023-10-09 20:31:38,516 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.349e+02 3.153e+02 3.767e+02 4.679e+02 9.052e+02, threshold=7.535e+02, percent-clipped=4.0 2023-10-09 20:32:28,864 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2865669.3333333335, ans=0.1 2023-10-09 20:32:33,854 INFO [train.py:1031] (1/4) Epoch 14, batch 29350, loss[loss=0.2158, simple_loss=0.2586, pruned_loss=0.06536, ctc_loss=0.1057, over 16786.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2925, pruned_loss=0.06565, ctc_loss=0.115, over 3287225.11 frames. ], batch size: 121, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:32:50,013 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2023-10-09 20:32:56,125 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2865762.6666666665, ans=0.04949747468305833 2023-10-09 20:33:01,435 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2865809.3333333335, ans=0.2 2023-10-09 20:33:03,023 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2865809.3333333335, ans=0.0 2023-10-09 20:33:08,925 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2865809.3333333335, ans=0.125 2023-10-09 20:33:20,084 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2865856.0, ans=0.125 2023-10-09 20:33:21,720 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2865856.0, ans=0.0 2023-10-09 20:33:28,139 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2023-10-09 20:33:28,911 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2865902.6666666665, ans=0.0 2023-10-09 20:33:29,910 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2865902.6666666665, ans=0.1 2023-10-09 20:33:36,260 INFO [train.py:1031] (1/4) Epoch 14, batch 29400, loss[loss=0.1751, simple_loss=0.2521, pruned_loss=0.03613, ctc_loss=0.0643, over 16914.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2859, pruned_loss=0.06228, ctc_loss=0.1094, over 3281749.49 frames. 
], batch size: 229, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:33:44,077 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 2.925e+02 3.429e+02 4.063e+02 7.311e+02, threshold=6.858e+02, percent-clipped=0.0 2023-10-09 20:33:50,372 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2865996.0, ans=0.125 2023-10-09 20:33:59,226 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2865996.0, ans=0.125 2023-10-09 20:34:09,783 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2866042.6666666665, ans=0.2 2023-10-09 20:34:12,037 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=12.0 2023-10-09 20:34:25,666 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2866089.3333333335, ans=0.0 2023-10-09 20:34:30,861 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2866136.0, ans=0.0 2023-10-09 20:34:38,645 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.10 vs. limit=10.0 2023-10-09 20:34:40,014 INFO [train.py:1031] (1/4) Epoch 14, batch 29450, loss[loss=0.2768, simple_loss=0.341, pruned_loss=0.07526, ctc_loss=0.1554, over 16557.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2808, pruned_loss=0.05817, ctc_loss=0.1033, over 3287173.80 frames. ], batch size: 351, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:34:42,086 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2866182.6666666665, ans=0.125 2023-10-09 20:35:01,579 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2866229.3333333335, ans=0.0 2023-10-09 20:35:23,430 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2866322.6666666665, ans=0.2 2023-10-09 20:35:23,745 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-10-09 20:35:27,706 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2866322.6666666665, ans=0.0 2023-10-09 20:35:35,737 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2866369.3333333335, ans=0.0 2023-10-09 20:35:43,430 INFO [train.py:1031] (1/4) Epoch 14, batch 29500, loss[loss=0.2491, simple_loss=0.312, pruned_loss=0.06733, ctc_loss=0.1288, over 16573.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.2838, pruned_loss=0.05644, ctc_loss=0.1015, over 3291158.68 frames. 
], batch size: 351, lr: 2.54e-03, grad_scale: 8.0 2023-10-09 20:35:51,252 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+02 2.902e+02 3.659e+02 4.459e+02 8.520e+02, threshold=7.319e+02, percent-clipped=6.0 2023-10-09 20:36:02,763 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2866462.6666666665, ans=0.1 2023-10-09 20:36:44,252 INFO [train.py:1031] (1/4) Epoch 14, batch 29550, loss[loss=0.2116, simple_loss=0.2492, pruned_loss=0.06394, ctc_loss=0.1155, over 16770.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2782, pruned_loss=0.05598, ctc_loss=0.1006, over 3290003.60 frames. ], batch size: 141, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:36:50,042 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.06 vs. limit=15.0 2023-10-09 20:36:51,179 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.79 vs. limit=15.0 2023-10-09 20:37:15,930 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=2866742.6666666665, ans=22.5 2023-10-09 20:37:42,097 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2866836.0, ans=0.125 2023-10-09 20:37:44,950 INFO [train.py:1031] (1/4) Epoch 14, batch 29600, loss[loss=0.2078, simple_loss=0.2782, pruned_loss=0.05077, ctc_loss=0.08958, over 16888.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.274, pruned_loss=0.05628, ctc_loss=0.1008, over 3303200.26 frames. ], batch size: 258, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:37:47,778 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=12.0 2023-10-09 20:37:54,761 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 3.047e+02 3.582e+02 4.028e+02 6.950e+02, threshold=7.163e+02, percent-clipped=0.0 2023-10-09 20:37:59,825 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2866929.3333333335, ans=0.0 2023-10-09 20:38:14,760 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2866976.0, ans=0.1 2023-10-09 20:38:46,697 INFO [train.py:1031] (1/4) Epoch 14, batch 29650, loss[loss=0.21, simple_loss=0.2762, pruned_loss=0.05319, ctc_loss=0.09359, over 16773.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.2757, pruned_loss=0.05675, ctc_loss=0.1013, over 3301048.34 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:38:47,001 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2867116.0, ans=0.0 2023-10-09 20:38:57,486 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=2867116.0, ans=15.0 2023-10-09 20:39:19,036 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2867209.3333333335, ans=0.1 2023-10-09 20:39:41,290 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. 
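The grad_scale value on each tot_loss line (2.0, 4.0, 8.0, ...) moves the way a dynamic fp16 loss scale does, consistent with 'use_fp16': True in this run: halved after an overflow, grown again after a streak of clean steps. The standard PyTorch pattern, shown with a toy model rather than the actual training code:

import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(80, 2000).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=4.0, enabled=(device == "cuda"))

x = torch.randn(8, 80, device=device)
y = torch.randint(2000, (8,), device=device)
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    loss = F.cross_entropy(model(x), y)
scaler.scale(loss).backward()  # backward pass on the scaled loss
scaler.step(opt)               # unscales grads; skips the step on inf/nan
scaler.update()                # shrinks the scale on overflow, grows it otherwise
opt.zero_grad()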
limit=22.5 2023-10-09 20:39:48,359 INFO [train.py:1031] (1/4) Epoch 14, batch 29700, loss[loss=0.2484, simple_loss=0.2976, pruned_loss=0.07325, ctc_loss=0.1317, over 16901.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2778, pruned_loss=0.05848, ctc_loss=0.1039, over 3305773.51 frames. ], batch size: 202, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:39:59,297 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+02 3.266e+02 3.794e+02 4.396e+02 1.319e+03, threshold=7.588e+02, percent-clipped=2.0 2023-10-09 20:40:03,463 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2867396.0, ans=0.125 2023-10-09 20:40:15,633 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2867442.6666666665, ans=0.0 2023-10-09 20:40:26,750 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2867489.3333333335, ans=0.125 2023-10-09 20:40:31,438 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2867489.3333333335, ans=0.125 2023-10-09 20:40:41,862 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2867536.0, ans=0.1 2023-10-09 20:40:50,194 INFO [train.py:1031] (1/4) Epoch 14, batch 29750, loss[loss=0.2325, simple_loss=0.2891, pruned_loss=0.06465, ctc_loss=0.1167, over 16914.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2786, pruned_loss=0.06027, ctc_loss=0.1068, over 3308307.51 frames. ], batch size: 228, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:40:56,445 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2867582.6666666665, ans=0.125 2023-10-09 20:41:02,722 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2867629.3333333335, ans=0.125 2023-10-09 20:41:52,828 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2867816.0, ans=0.2 2023-10-09 20:41:53,581 INFO [train.py:1031] (1/4) Epoch 14, batch 29800, loss[loss=0.2179, simple_loss=0.2885, pruned_loss=0.05464, ctc_loss=0.09485, over 16658.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.281, pruned_loss=0.06285, ctc_loss=0.1108, over 3306141.32 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:41:53,913 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2867816.0, ans=0.125 2023-10-09 20:41:57,735 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2867816.0, ans=0.05 2023-10-09 20:42:05,752 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.683e+02 3.252e+02 3.750e+02 4.690e+02 1.156e+03, threshold=7.500e+02, percent-clipped=2.0 2023-10-09 20:42:13,757 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2867862.6666666665, ans=0.2 2023-10-09 20:42:21,039 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.41 vs. 
limit=15.0 2023-10-09 20:42:22,744 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2867909.3333333335, ans=0.0 2023-10-09 20:42:30,881 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2867956.0, ans=0.0 2023-10-09 20:42:30,909 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=2867956.0, ans=0.1 2023-10-09 20:42:43,170 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2023-10-09 20:42:56,196 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2868049.3333333335, ans=0.125 2023-10-09 20:42:56,940 INFO [train.py:1031] (1/4) Epoch 14, batch 29850, loss[loss=0.252, simple_loss=0.3042, pruned_loss=0.07324, ctc_loss=0.1336, over 16913.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.2903, pruned_loss=0.06377, ctc_loss=0.1129, over 3300512.14 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:42:58,384 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2868049.3333333335, ans=0.0 2023-10-09 20:43:41,542 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2023-10-09 20:43:42,727 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.80 vs. limit=10.0 2023-10-09 20:43:54,688 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2868236.0, ans=0.1 2023-10-09 20:44:02,050 INFO [train.py:1031] (1/4) Epoch 14, batch 29900, loss[loss=0.2588, simple_loss=0.3099, pruned_loss=0.07686, ctc_loss=0.1351, over 16828.00 frames. ], tot_loss[loss=0.238, simple_loss=0.2947, pruned_loss=0.06708, ctc_loss=0.1179, over 3297460.27 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:44:13,151 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2868329.3333333335, ans=0.125 2023-10-09 20:44:15,803 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+02 3.520e+02 3.961e+02 4.963e+02 1.132e+03, threshold=7.922e+02, percent-clipped=8.0 2023-10-09 20:44:25,019 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2868329.3333333335, ans=0.125 2023-10-09 20:44:38,697 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2023-10-09 20:45:04,793 INFO [train.py:1031] (1/4) Epoch 14, batch 29950, loss[loss=0.1914, simple_loss=0.2384, pruned_loss=0.05353, ctc_loss=0.09307, over 16758.00 frames. ], tot_loss[loss=0.2408, simple_loss=0.2971, pruned_loss=0.06847, ctc_loss=0.1189, over 3302962.24 frames. 
], batch size: 95, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:45:21,964 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2868562.6666666665, ans=0.125 2023-10-09 20:45:40,592 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2868656.0, ans=0.0 2023-10-09 20:45:40,699 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2868656.0, ans=0.125 2023-10-09 20:45:58,889 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2868702.6666666665, ans=0.0 2023-10-09 20:45:59,110 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=22.5 2023-10-09 20:46:05,466 INFO [train.py:1031] (1/4) Epoch 14, batch 30000, loss[loss=0.2372, simple_loss=0.3154, pruned_loss=0.0573, ctc_loss=0.1111, over 16269.00 frames. ], tot_loss[loss=0.2409, simple_loss=0.3004, pruned_loss=0.06727, ctc_loss=0.1171, over 3294164.40 frames. ], batch size: 463, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 20:46:05,467 INFO [train.py:1054] (1/4) Computing validation loss 2023-10-09 20:46:20,039 INFO [zipformer.py:1853] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6594, 2.2434, 4.7575, 4.2143], device='cuda:1') 2023-10-09 20:46:22,663 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2308, simple_loss=0.3022, pruned_loss=0.06118, ctc_loss=0.09249, over 1796401.00 frames. 2023-10-09 20:46:22,664 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14570MB 2023-10-09 20:46:36,772 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.188e+02 3.941e+02 4.902e+02 7.309e+02, threshold=7.881e+02, percent-clipped=0.0 2023-10-09 20:46:41,492 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2868796.0, ans=0.0 2023-10-09 20:46:56,186 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2868842.6666666665, ans=0.0 2023-10-09 20:47:06,059 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2868889.3333333335, ans=0.125 2023-10-09 20:47:07,765 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2868889.3333333335, ans=0.125 2023-10-09 20:47:10,303 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.92 vs. limit=15.0 2023-10-09 20:47:14,382 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=15.0 2023-10-09 20:47:19,512 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2868936.0, ans=0.0 2023-10-09 20:47:24,788 INFO [train.py:1031] (1/4) Epoch 14, batch 30050, loss[loss=0.1832, simple_loss=0.2653, pruned_loss=0.03765, ctc_loss=0.06462, over 16835.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.2976, pruned_loss=0.06598, ctc_loss=0.1152, over 3298078.55 frames. 
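The validation block above fires exactly at batch 30000 because this run validates every valid_interval = 3000 batches. The loop structure implied by these lines, with stand-in functions in place of the real loss computation:

import logging
logging.basicConfig(level=logging.INFO)

valid_interval = 3000  # from this run's configuration

def train_step(batch_idx):  # stand-in for the real forward/backward step
    return 0.2409

def compute_validation_loss():  # stand-in for averaging over the dev set
    return 0.2308

for batch_idx in range(29990, 30011):
    train_loss = train_step(batch_idx)
    if batch_idx % valid_interval == 0:
        logging.info(f"validation at batch {batch_idx}: "
                     f"loss={compute_validation_loss():.4f}")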
], batch size: 176, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:47:55,881 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2869076.0, ans=0.0 2023-10-09 20:47:59,379 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2869076.0, ans=0.125 2023-10-09 20:48:02,229 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2869122.6666666665, ans=0.125 2023-10-09 20:48:03,341 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:48:11,437 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2869122.6666666665, ans=0.125 2023-10-09 20:48:25,619 INFO [train.py:1031] (1/4) Epoch 14, batch 30100, loss[loss=0.2671, simple_loss=0.3375, pruned_loss=0.07117, ctc_loss=0.136, over 16434.00 frames. ], tot_loss[loss=0.235, simple_loss=0.2968, pruned_loss=0.06404, ctc_loss=0.113, over 3300913.61 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:48:43,029 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.464e+02 3.136e+02 3.710e+02 4.656e+02 9.667e+02, threshold=7.419e+02, percent-clipped=2.0 2023-10-09 20:49:08,236 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2869356.0, ans=0.95 2023-10-09 20:49:26,432 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2869449.3333333335, ans=0.1 2023-10-09 20:49:27,216 INFO [train.py:1031] (1/4) Epoch 14, batch 30150, loss[loss=0.2344, simple_loss=0.294, pruned_loss=0.06615, ctc_loss=0.1063, over 16726.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2963, pruned_loss=0.06363, ctc_loss=0.1126, over 3292099.73 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:49:28,580 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2869449.3333333335, ans=0.0 2023-10-09 20:49:44,956 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2023-10-09 20:49:52,590 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2869542.6666666665, ans=0.0 2023-10-09 20:50:00,142 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2869542.6666666665, ans=0.125 2023-10-09 20:50:25,632 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2869636.0, ans=0.09899494936611666 2023-10-09 20:50:27,445 INFO [train.py:1031] (1/4) Epoch 14, batch 30200, loss[loss=0.2419, simple_loss=0.3019, pruned_loss=0.06773, ctc_loss=0.116, over 16891.00 frames. ], tot_loss[loss=0.2369, simple_loss=0.2973, pruned_loss=0.06527, ctc_loss=0.1149, over 3277654.93 frames. 
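The attn_weights_entropy tensor printed inside the validation block is a per-head diagnostic of how peaked a self-attention module's weight distributions are. A sketch of one way such values can be computed, assuming entropies averaged over query positions:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, tgt_len, src_len); each row sums to 1 after softmax
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # entropy per query position
    return ent.mean(dim=-1)                           # averaged: one value per head

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # 4 values; uniform rows would give ln(50) ~ 3.9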
], batch size: 82, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:50:45,495 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.358e+02 3.166e+02 3.687e+02 4.321e+02 7.960e+02, threshold=7.375e+02, percent-clipped=2.0 2023-10-09 20:50:52,110 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2869776.0, ans=0.125 2023-10-09 20:51:08,310 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2869822.6666666665, ans=0.0 2023-10-09 20:51:28,774 INFO [train.py:1031] (1/4) Epoch 14, batch 30250, loss[loss=0.2329, simple_loss=0.2883, pruned_loss=0.06642, ctc_loss=0.1114, over 16727.00 frames. ], tot_loss[loss=0.2408, simple_loss=0.2994, pruned_loss=0.06743, ctc_loss=0.1184, over 3268993.06 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:51:37,760 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2869916.0, ans=0.0 2023-10-09 20:51:40,398 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2869962.6666666665, ans=0.2 2023-10-09 20:51:43,302 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2869962.6666666665, ans=0.125 2023-10-09 20:51:47,334 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.02 vs. limit=15.0 2023-10-09 20:52:01,439 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2023-10-09 20:52:05,995 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2870056.0, ans=0.125 2023-10-09 20:52:07,373 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2023-10-09 20:52:19,620 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2870102.6666666665, ans=0.0 2023-10-09 20:52:32,397 INFO [train.py:1031] (1/4) Epoch 14, batch 30300, loss[loss=0.2579, simple_loss=0.3096, pruned_loss=0.07457, ctc_loss=0.1426, over 16838.00 frames. ], tot_loss[loss=0.2438, simple_loss=0.3005, pruned_loss=0.06926, ctc_loss=0.1214, over 3249874.54 frames. 
], batch size: 228, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:52:33,846 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2870149.3333333335, ans=0.07 2023-10-09 20:52:48,960 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2870196.0, ans=0.125 2023-10-09 20:52:51,951 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+02 3.410e+02 3.951e+02 4.930e+02 7.071e+02, threshold=7.902e+02, percent-clipped=0.0 2023-10-09 20:53:02,326 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:53:08,699 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2870289.3333333335, ans=0.0 2023-10-09 20:53:33,824 INFO [train.py:1031] (1/4) Epoch 14, batch 30350, loss[loss=0.2389, simple_loss=0.2884, pruned_loss=0.07038, ctc_loss=0.1215, over 16789.00 frames. ], tot_loss[loss=0.242, simple_loss=0.2974, pruned_loss=0.06914, ctc_loss=0.1209, over 3240399.17 frames. ], batch size: 130, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:53:56,886 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2870476.0, ans=0.1 2023-10-09 20:54:34,586 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=12.0 2023-10-09 20:54:35,117 INFO [train.py:1031] (1/4) Epoch 14, batch 30400, loss[loss=0.2106, simple_loss=0.263, pruned_loss=0.05908, ctc_loss=0.1001, over 16862.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.2929, pruned_loss=0.06903, ctc_loss=0.1202, over 3246243.30 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:54:44,052 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2870616.0, ans=0.0 2023-10-09 20:54:54,414 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.508e+02 3.255e+02 4.087e+02 4.759e+02 9.430e+02, threshold=8.174e+02, percent-clipped=1.0 2023-10-09 20:55:08,292 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2870709.3333333335, ans=0.2 2023-10-09 20:55:08,354 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2870709.3333333335, ans=0.1 2023-10-09 20:55:16,470 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2870756.0, ans=0.125 2023-10-09 20:55:26,518 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-10-09 20:55:35,536 INFO [train.py:1031] (1/4) Epoch 14, batch 30450, loss[loss=0.2011, simple_loss=0.2466, pruned_loss=0.05733, ctc_loss=0.1025, over 16714.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.2858, pruned_loss=0.06792, ctc_loss=0.1187, over 3253306.86 frames. 
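The bypass.skip_rate and bypass.scale_min entries (e.g. ans=0.07 at the start of this stretch) point at per-module bypass connections: during training a module's output can be skipped, or its contribution scaled down, with some probability, in the spirit of stochastic depth. A generic sketch of the skip half of that idea, not the scaling.py implementation:

import torch
import torch.nn as nn

def bypass(module: nn.Module, x: torch.Tensor, skip_rate: float,
           training: bool = True) -> torch.Tensor:
    # With probability skip_rate, pass the input straight through.
    if training and torch.rand(()).item() < skip_rate:
        return x
    return module(x)

layer = nn.Linear(16, 16)
x = torch.randn(2, 16)
y = bypass(layer, x, skip_rate=0.07)  # 0.07 as in the ans=0.07 entries above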
], batch size: 201, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:55:49,815 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2870896.0, ans=0.0 2023-10-09 20:55:55,672 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2870896.0, ans=0.0 2023-10-09 20:56:04,185 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2870942.6666666665, ans=0.1 2023-10-09 20:56:38,905 INFO [train.py:1031] (1/4) Epoch 14, batch 30500, loss[loss=0.2324, simple_loss=0.3339, pruned_loss=0.04676, ctc_loss=0.09337, over 15169.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.283, pruned_loss=0.06592, ctc_loss=0.1154, over 3252015.07 frames. ], batch size: 527, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:57:00,011 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+02 3.155e+02 3.681e+02 4.532e+02 7.008e+02, threshold=7.361e+02, percent-clipped=0.0 2023-10-09 20:57:01,599 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2871129.3333333335, ans=0.125 2023-10-09 20:57:09,401 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2871176.0, ans=0.0 2023-10-09 20:57:28,997 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2871269.3333333335, ans=0.0 2023-10-09 20:57:41,471 INFO [train.py:1031] (1/4) Epoch 14, batch 30550, loss[loss=0.2334, simple_loss=0.2776, pruned_loss=0.07014, ctc_loss=0.1222, over 16753.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2871, pruned_loss=0.06559, ctc_loss=0.1151, over 3251913.51 frames. ], batch size: 130, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:58:01,895 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:58:18,259 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:58:36,457 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2871502.6666666665, ans=0.125 2023-10-09 20:58:41,005 INFO [train.py:1031] (1/4) Epoch 14, batch 30600, loss[loss=0.198, simple_loss=0.2533, pruned_loss=0.0523, ctc_loss=0.09528, over 16843.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.286, pruned_loss=0.06633, ctc_loss=0.1161, over 3263946.62 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:58:41,515 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2023-10-09 20:58:45,813 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. 
limit=15.0 2023-10-09 20:58:46,651 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2871549.3333333335, ans=0.09899494936611666 2023-10-09 20:59:01,071 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+02 3.194e+02 3.624e+02 4.233e+02 1.074e+03, threshold=7.249e+02, percent-clipped=2.0 2023-10-09 20:59:21,298 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2871689.3333333335, ans=0.07 2023-10-09 20:59:27,636 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2871736.0, ans=0.2 2023-10-09 20:59:32,897 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2871736.0, ans=0.2 2023-10-09 20:59:40,079 INFO [train.py:1031] (1/4) Epoch 14, batch 30650, loss[loss=0.2224, simple_loss=0.2704, pruned_loss=0.064, ctc_loss=0.1158, over 16785.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2811, pruned_loss=0.0655, ctc_loss=0.1146, over 3278318.13 frames. ], batch size: 329, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:00:07,278 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=15.0 2023-10-09 21:00:31,344 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2871969.3333333335, ans=0.125 2023-10-09 21:00:37,475 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2871969.3333333335, ans=0.09899494936611666 2023-10-09 21:00:41,202 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2872016.0, ans=0.125 2023-10-09 21:00:41,982 INFO [train.py:1031] (1/4) Epoch 14, batch 30700, loss[loss=0.1702, simple_loss=0.2342, pruned_loss=0.03961, ctc_loss=0.06766, over 16814.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2742, pruned_loss=0.06151, ctc_loss=0.1079, over 3280856.20 frames. ], batch size: 95, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:00:51,989 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=12.0 2023-10-09 21:01:06,276 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+02 3.086e+02 3.703e+02 4.369e+02 9.445e+02, threshold=7.405e+02, percent-clipped=1.0 2023-10-09 21:01:06,724 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2872109.3333333335, ans=0.125 2023-10-09 21:01:07,622 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2872109.3333333335, ans=0.125 2023-10-09 21:01:08,899 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2872109.3333333335, ans=0.125 2023-10-09 21:01:10,199 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. 
limit=6.0 2023-10-09 21:01:15,365 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2872109.3333333335, ans=0.0 2023-10-09 21:01:28,837 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2872156.0, ans=0.0 2023-10-09 21:01:46,062 INFO [train.py:1031] (1/4) Epoch 14, batch 30750, loss[loss=0.2115, simple_loss=0.2537, pruned_loss=0.06323, ctc_loss=0.107, over 16784.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2756, pruned_loss=0.06118, ctc_loss=0.1062, over 3285959.47 frames. ], batch size: 121, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:01:49,166 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2872249.3333333335, ans=0.125 2023-10-09 21:02:34,699 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2872389.3333333335, ans=0.125 2023-10-09 21:02:50,792 INFO [train.py:1031] (1/4) Epoch 14, batch 30800, loss[loss=0.2409, simple_loss=0.3176, pruned_loss=0.06055, ctc_loss=0.1081, over 16864.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.287, pruned_loss=0.06324, ctc_loss=0.1099, over 3294079.32 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:03:01,689 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2872482.6666666665, ans=0.0 2023-10-09 21:03:14,992 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-10-09 21:03:16,584 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.601e+02 3.887e+02 4.535e+02 5.921e+02 9.056e+02, threshold=9.070e+02, percent-clipped=5.0 2023-10-09 21:03:28,205 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2872622.6666666665, ans=0.2 2023-10-09 21:03:36,169 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.10 vs. limit=6.0 2023-10-09 21:03:44,897 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.81 vs. limit=5.0 2023-10-09 21:03:48,590 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=12.0 2023-10-09 21:03:54,404 INFO [train.py:1031] (1/4) Epoch 14, batch 30850, loss[loss=0.1815, simple_loss=0.2285, pruned_loss=0.0499, ctc_loss=0.08665, over 16734.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2866, pruned_loss=0.06393, ctc_loss=0.1114, over 3298887.53 frames. 
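
A note on the `train.py:1031` entries above: each one reports a per-batch loss together with its components (simple_loss, pruned_loss, ctc_loss) and a frame-weighted running total. The headline loss in these logs is consistent with a fixed weighted sum of the components, loss = 0.5*simple_loss + 1.0*pruned_loss + 0.2*ctc_loss (e.g. 0.5*0.3339 + 0.04676 + 0.2*0.09337 = 0.2324 at batch 30500). The weights here are inferred from the logged numbers, not read out of the training code; a minimal sketch that checks the relation against a pasted log line:

```python
import re

# Weights inferred by fitting the logged loss values; treat as an
# assumption about the recipe, not ground truth.
SIMPLE_SCALE, PRUNED_SCALE, CTC_SCALE = 0.5, 1.0, 0.2

LOSS_RE = re.compile(
    r"loss=(?P<loss>[\d.]+), simple_loss=(?P<simple>[\d.]+), "
    r"pruned_loss=(?P<pruned>[\d.]+), ctc_loss=(?P<ctc>[\d.]+)"
)

def check_line(line: str, tol: float = 5e-4) -> bool:
    """True if the headline loss equals the weighted component sum."""
    m = LOSS_RE.search(line)
    if m is None:
        raise ValueError("no loss fields found")
    recon = (SIMPLE_SCALE * float(m["simple"])
             + PRUNED_SCALE * float(m["pruned"])
             + CTC_SCALE * float(m["ctc"]))
    return abs(float(m["loss"]) - recon) < tol

line = ("loss[loss=0.2324, simple_loss=0.3339, pruned_loss=0.04676, "
        "ctc_loss=0.09337, over 15169.00 frames. ]")
print(check_line(line))  # True: 0.5*0.3339 + 0.04676 + 0.2*0.09337 = 0.23239
```

The same identity holds, to rounding, for every per-batch loss[...] entry in this stretch of the log.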
], batch size: 102, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:04:06,839 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2872762.6666666665, ans=0.125 2023-10-09 21:04:32,714 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2872856.0, ans=0.125 2023-10-09 21:04:53,801 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2872902.6666666665, ans=0.125 2023-10-09 21:04:56,197 INFO [train.py:1031] (1/4) Epoch 14, batch 30900, loss[loss=0.1916, simple_loss=0.2376, pruned_loss=0.05293, ctc_loss=0.09934, over 16091.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2789, pruned_loss=0.0621, ctc_loss=0.1085, over 3299621.83 frames. ], batch size: 463, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:05:07,160 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2872996.0, ans=0.125 2023-10-09 21:05:20,050 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+02 3.130e+02 3.635e+02 4.208e+02 6.076e+02, threshold=7.270e+02, percent-clipped=0.0 2023-10-09 21:05:21,932 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2873042.6666666665, ans=0.1 2023-10-09 21:05:40,044 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2873089.3333333335, ans=0.0 2023-10-09 21:05:42,044 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:05:42,350 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.36 vs. limit=10.0 2023-10-09 21:05:49,116 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2873136.0, ans=0.2 2023-10-09 21:05:56,102 INFO [train.py:1031] (1/4) Epoch 14, batch 30950, loss[loss=0.2663, simple_loss=0.2938, pruned_loss=0.08855, ctc_loss=0.1542, over 16540.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.2757, pruned_loss=0.06131, ctc_loss=0.1074, over 3304966.02 frames. ], batch size: 415, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:06:13,485 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2873229.3333333335, ans=0.125 2023-10-09 21:06:21,713 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2873276.0, ans=0.1 2023-10-09 21:06:57,946 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2873416.0, ans=0.125 2023-10-09 21:06:58,772 INFO [train.py:1031] (1/4) Epoch 14, batch 31000, loss[loss=0.1997, simple_loss=0.2571, pruned_loss=0.05308, ctc_loss=0.0904, over 16601.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2772, pruned_loss=0.06251, ctc_loss=0.1094, over 3301115.47 frames. 
], batch size: 110, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:07:02,940 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2873416.0, ans=0.1 2023-10-09 21:07:21,024 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2873462.6666666665, ans=0.125 2023-10-09 21:07:25,522 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.598e+02 3.246e+02 3.902e+02 4.981e+02 7.271e+02, threshold=7.805e+02, percent-clipped=1.0 2023-10-09 21:07:28,527 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2873509.3333333335, ans=0.0 2023-10-09 21:07:32,109 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0 2023-10-09 21:07:42,208 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2873556.0, ans=0.2 2023-10-09 21:07:42,216 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2873556.0, ans=0.125 2023-10-09 21:07:51,326 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2873602.6666666665, ans=0.1 2023-10-09 21:07:58,606 INFO [train.py:1031] (1/4) Epoch 14, batch 31050, loss[loss=0.1716, simple_loss=0.2438, pruned_loss=0.03617, ctc_loss=0.06743, over 16786.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.2747, pruned_loss=0.05945, ctc_loss=0.1041, over 3306214.06 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:07:58,904 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2873649.3333333335, ans=0.0 2023-10-09 21:08:00,987 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2873649.3333333335, ans=0.2 2023-10-09 21:08:23,168 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2873742.6666666665, ans=0.07 2023-10-09 21:08:41,142 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2873789.3333333335, ans=0.125 2023-10-09 21:08:48,078 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2873836.0, ans=0.125 2023-10-09 21:08:58,998 INFO [train.py:1031] (1/4) Epoch 14, batch 31100, loss[loss=0.1955, simple_loss=0.2561, pruned_loss=0.04997, ctc_loss=0.08756, over 16754.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.2716, pruned_loss=0.05816, ctc_loss=0.1018, over 3307554.28 frames. 
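
The `scaling.py:199` lines track ScheduledFloat hyperparameters: each named knob (skip rates, balancer probs, dropout p) is a function of `batch_count`, and `ans` is its current value. A minimal piecewise-linear stand-in for such a schedule is sketched below; the breakpoints are hypothetical, and the real class in icefall's scaling.py carries extra machinery, but a long-decayed schedule of this shape would be consistent with the `ans=0.0` readings for skip rates at these batch counts:

```python
from bisect import bisect_right
from typing import Tuple

class PiecewiseLinearFloat:
    """A float interpolated piecewise-linearly in batch_count.

    Illustrative stand-in for a ScheduledFloat-style schedule; sketch only.
    """

    def __init__(self, *points: Tuple[float, float]):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Hypothetical schedule: a skip rate decaying from 0.3 to 0.0 by 20k batches.
conv_skip_rate = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.0))
for bc in (0, 10_000, 2_870_896):
    print(f"batch_count={bc}, ans={conv_skip_rate.value(bc)}")
```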
], batch size: 102, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:09:16,948 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2873929.3333333335, ans=0.2 2023-10-09 21:09:26,091 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.900e+02 3.232e+02 3.684e+02 6.119e+02, threshold=6.464e+02, percent-clipped=0.0 2023-10-09 21:09:29,580 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2873976.0, ans=0.2 2023-10-09 21:09:30,621 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2873976.0, ans=0.125 2023-10-09 21:09:40,026 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2874022.6666666665, ans=0.125 2023-10-09 21:09:53,981 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2874069.3333333335, ans=0.125 2023-10-09 21:09:57,958 INFO [train.py:1031] (1/4) Epoch 14, batch 31150, loss[loss=0.2144, simple_loss=0.2883, pruned_loss=0.05108, ctc_loss=0.09567, over 16797.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.2729, pruned_loss=0.0596, ctc_loss=0.1044, over 3308731.54 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:10:00,321 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2874116.0, ans=0.125 2023-10-09 21:10:20,124 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.37 vs. limit=5.0 2023-10-09 21:10:25,621 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2874209.3333333335, ans=0.1 2023-10-09 21:10:57,601 INFO [train.py:1031] (1/4) Epoch 14, batch 31200, loss[loss=0.2257, simple_loss=0.275, pruned_loss=0.0659, ctc_loss=0.1117, over 16867.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2718, pruned_loss=0.06025, ctc_loss=0.1052, over 3294001.13 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:10:57,889 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2874349.3333333335, ans=0.1 2023-10-09 21:10:58,142 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.19 vs. 
limit=6.0 2023-10-09 21:11:05,339 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2874349.3333333335, ans=0.07 2023-10-09 21:11:06,497 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2874349.3333333335, ans=0.125 2023-10-09 21:11:11,658 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2874396.0, ans=0.125 2023-10-09 21:11:17,356 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2874396.0, ans=0.0 2023-10-09 21:11:27,521 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+02 3.224e+02 3.763e+02 4.515e+02 7.909e+02, threshold=7.526e+02, percent-clipped=5.0 2023-10-09 21:11:29,022 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2874442.6666666665, ans=0.1 2023-10-09 21:11:51,575 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.80 vs. limit=10.0 2023-10-09 21:11:57,394 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2874582.6666666665, ans=0.125 2023-10-09 21:11:58,148 INFO [train.py:1031] (1/4) Epoch 14, batch 31250, loss[loss=0.2243, simple_loss=0.2732, pruned_loss=0.06415, ctc_loss=0.1177, over 16752.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2693, pruned_loss=0.06027, ctc_loss=0.1053, over 3282377.01 frames. ], batch size: 292, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:12:01,611 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2874582.6666666665, ans=0.125 2023-10-09 21:12:02,675 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2874582.6666666665, ans=0.0 2023-10-09 21:12:06,040 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2874582.6666666665, ans=0.2 2023-10-09 21:12:06,044 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2874582.6666666665, ans=0.0 2023-10-09 21:12:06,462 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=15.0 2023-10-09 21:12:15,448 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2023-10-09 21:12:23,434 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2023-10-09 21:12:24,019 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2874676.0, ans=0.1 2023-10-09 21:12:38,030 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.97 vs. 
limit=22.5 2023-10-09 21:12:52,454 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:13:01,145 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2874816.0, ans=0.1 2023-10-09 21:13:01,898 INFO [train.py:1031] (1/4) Epoch 14, batch 31300, loss[loss=0.1921, simple_loss=0.2452, pruned_loss=0.05149, ctc_loss=0.09024, over 16700.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2664, pruned_loss=0.06033, ctc_loss=0.1053, over 3285072.19 frames. ], batch size: 202, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:13:06,721 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2874816.0, ans=0.2 2023-10-09 21:13:32,854 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.069e+02 3.507e+02 3.932e+02 8.166e+02, threshold=7.015e+02, percent-clipped=1.0 2023-10-09 21:13:42,583 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2874956.0, ans=0.0 2023-10-09 21:13:53,712 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2875002.6666666665, ans=0.125 2023-10-09 21:14:03,890 INFO [train.py:1031] (1/4) Epoch 14, batch 31350, loss[loss=0.1848, simple_loss=0.2342, pruned_loss=0.04968, ctc_loss=0.09026, over 16857.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.2633, pruned_loss=0.06017, ctc_loss=0.1052, over 3287950.74 frames. ], batch size: 189, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:14:08,380 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2875049.3333333335, ans=0.2 2023-10-09 21:14:43,115 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=15.0 2023-10-09 21:14:54,965 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2875236.0, ans=0.125 2023-10-09 21:15:01,323 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2875282.6666666665, ans=0.0 2023-10-09 21:15:02,106 INFO [train.py:1031] (1/4) Epoch 14, batch 31400, loss[loss=0.1845, simple_loss=0.2487, pruned_loss=0.04384, ctc_loss=0.08144, over 16629.00 frames. ], tot_loss[loss=0.211, simple_loss=0.2603, pruned_loss=0.05991, ctc_loss=0.1046, over 3294617.91 frames. 
], batch size: 151, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:15:12,536 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2875282.6666666665, ans=0.125 2023-10-09 21:15:15,859 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2875329.3333333335, ans=0.0 2023-10-09 21:15:34,909 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+02 3.143e+02 3.682e+02 4.471e+02 1.037e+03, threshold=7.364e+02, percent-clipped=4.0 2023-10-09 21:15:52,223 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2875469.3333333335, ans=0.2 2023-10-09 21:15:54,809 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2875469.3333333335, ans=0.04949747468305833 2023-10-09 21:15:57,536 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2875469.3333333335, ans=0.125 2023-10-09 21:16:03,129 INFO [train.py:1031] (1/4) Epoch 14, batch 31450, loss[loss=0.1831, simple_loss=0.2273, pruned_loss=0.05, ctc_loss=0.09705, over 16337.00 frames. ], tot_loss[loss=0.2089, simple_loss=0.2593, pruned_loss=0.05875, ctc_loss=0.1027, over 3301754.69 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:16:04,551 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2875516.0, ans=0.125 2023-10-09 21:16:18,663 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2875562.6666666665, ans=0.0 2023-10-09 21:17:00,306 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2875702.6666666665, ans=0.05 2023-10-09 21:17:06,120 INFO [train.py:1031] (1/4) Epoch 14, batch 31500, loss[loss=0.216, simple_loss=0.2553, pruned_loss=0.06663, ctc_loss=0.1088, over 16785.00 frames. ], tot_loss[loss=0.2108, simple_loss=0.2597, pruned_loss=0.06004, ctc_loss=0.1048, over 3310406.89 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:17:07,080 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2875749.3333333335, ans=0.2 2023-10-09 21:17:20,813 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2875796.0, ans=0.0 2023-10-09 21:17:34,540 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2875842.6666666665, ans=0.125 2023-10-09 21:17:40,594 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.158e+02 3.690e+02 4.602e+02 7.979e+02, threshold=7.380e+02, percent-clipped=2.0 2023-10-09 21:18:09,325 INFO [train.py:1031] (1/4) Epoch 14, batch 31550, loss[loss=0.2215, simple_loss=0.2618, pruned_loss=0.06782, ctc_loss=0.114, over 16750.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2686, pruned_loss=0.06231, ctc_loss=0.1085, over 3315002.29 frames. 
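
The `optim.py:471` entries summarize adaptive gradient clipping: periodically the optimizer prints quartiles of recent gradient norms (min / 25% / median / 75% / max), the active clipping threshold, and the fraction of recent steps that were clipped. In these logs the threshold is consistently Clipping_scale times the median norm (e.g. 2.0 x 3.682e+02 = 7.364e+02 just above). A sketch of that mechanism; the window size is an assumption, and this is not the project's actual optim.py:

```python
from collections import deque
import torch

class MedianGradClipper:
    """Clip at clipping_scale * median of recent total grad norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent total grad norms
        self.num_clipped = 0
        self.num_steps = 0

    def __call__(self, params) -> float:
        grads = [p.grad.flatten() for p in params if p.grad is not None]
        norm = torch.linalg.vector_norm(torch.cat(grads)).item()
        self.norms.append(norm)
        hist = torch.tensor(list(self.norms))
        # Quartiles as logged: min / 25% / median / 75% / max.
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()
        self.num_steps += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return threshold

clipper = MedianGradClipper(clipping_scale=2.0)
w = torch.randn(4, 4, requires_grad=True)
(w ** 2).sum().backward()
print(f"threshold={clipper([w]):.3e}, "
      f"percent-clipped={100 * clipper.num_clipped / clipper.num_steps:.1f}")
```

With a threshold tied to the running median, percent-clipped stays low in steady state and spikes only when the norm distribution shifts, which matches the 0-8% values reported here.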
], batch size: 130, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:18:53,346 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2876122.6666666665, ans=0.125 2023-10-09 21:18:59,109 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2876169.3333333335, ans=15.0 2023-10-09 21:19:00,722 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2876169.3333333335, ans=0.0 2023-10-09 21:19:08,936 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-10-09 21:19:09,418 INFO [train.py:1031] (1/4) Epoch 14, batch 31600, loss[loss=0.2643, simple_loss=0.2906, pruned_loss=0.08948, ctc_loss=0.1476, over 16650.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2736, pruned_loss=0.06438, ctc_loss=0.1119, over 3315942.29 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:19:14,073 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2876216.0, ans=0.5 2023-10-09 21:19:14,415 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.17 vs. limit=15.0 2023-10-09 21:19:45,054 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.637e+02 3.252e+02 3.692e+02 4.282e+02 8.692e+02, threshold=7.384e+02, percent-clipped=4.0 2023-10-09 21:19:57,776 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2876356.0, ans=0.0 2023-10-09 21:20:03,596 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2876402.6666666665, ans=0.125 2023-10-09 21:20:13,361 INFO [train.py:1031] (1/4) Epoch 14, batch 31650, loss[loss=0.2199, simple_loss=0.3002, pruned_loss=0.05168, ctc_loss=0.09063, over 16814.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2768, pruned_loss=0.06437, ctc_loss=0.112, over 3322825.76 frames. ], batch size: 272, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:20:14,745 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2876449.3333333335, ans=0.2 2023-10-09 21:20:15,229 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2023-10-09 21:20:38,418 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:20:48,709 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2876542.6666666665, ans=0.125 2023-10-09 21:21:00,879 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.43 vs. limit=15.0 2023-10-09 21:21:15,706 INFO [train.py:1031] (1/4) Epoch 14, batch 31700, loss[loss=0.3348, simple_loss=0.3525, pruned_loss=0.1181, ctc_loss=0.2023, over 16676.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2777, pruned_loss=0.06233, ctc_loss=0.109, over 3313535.84 frames. 
], batch size: 384, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:21:39,770 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2876729.3333333335, ans=0.2 2023-10-09 21:21:52,053 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2876776.0, ans=0.0 2023-10-09 21:21:52,728 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+02 3.115e+02 3.904e+02 4.739e+02 1.536e+03, threshold=7.807e+02, percent-clipped=3.0 2023-10-09 21:22:18,292 INFO [train.py:1031] (1/4) Epoch 14, batch 31750, loss[loss=0.2734, simple_loss=0.321, pruned_loss=0.08289, ctc_loss=0.1503, over 16589.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2843, pruned_loss=0.06527, ctc_loss=0.1143, over 3317439.22 frames. ], batch size: 350, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:22:24,526 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2876916.0, ans=15.0 2023-10-09 21:22:31,089 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2876962.6666666665, ans=0.05 2023-10-09 21:22:42,794 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2877009.3333333335, ans=0.1 2023-10-09 21:22:46,446 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2023-10-09 21:23:00,140 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=15.0 2023-10-09 21:23:00,164 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=15.0 2023-10-09 21:23:11,242 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2877102.6666666665, ans=0.125 2023-10-09 21:23:12,382 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2877102.6666666665, ans=0.125 2023-10-09 21:23:20,605 INFO [train.py:1031] (1/4) Epoch 14, batch 31800, loss[loss=0.253, simple_loss=0.2919, pruned_loss=0.07906, ctc_loss=0.1399, over 16622.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.285, pruned_loss=0.06598, ctc_loss=0.1155, over 3317520.94 frames. 
], batch size: 351, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:23:20,797 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2877149.3333333335, ans=0.1 2023-10-09 21:23:21,934 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2877149.3333333335, ans=0.035 2023-10-09 21:23:33,401 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2877196.0, ans=0.0 2023-10-09 21:23:57,867 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+02 3.284e+02 3.680e+02 4.274e+02 9.032e+02, threshold=7.360e+02, percent-clipped=1.0 2023-10-09 21:24:08,945 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:24:15,359 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:24:18,607 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2877336.0, ans=0.125 2023-10-09 21:24:21,254 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2877382.6666666665, ans=0.125 2023-10-09 21:24:22,030 INFO [train.py:1031] (1/4) Epoch 14, batch 31850, loss[loss=0.2011, simple_loss=0.2558, pruned_loss=0.05424, ctc_loss=0.09471, over 16279.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.282, pruned_loss=0.06602, ctc_loss=0.1153, over 3303459.43 frames. ], batch size: 70, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:24:40,077 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2877429.3333333335, ans=0.1 2023-10-09 21:24:45,714 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2877476.0, ans=10.0 2023-10-09 21:24:54,828 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:25:23,224 INFO [train.py:1031] (1/4) Epoch 14, batch 31900, loss[loss=0.2421, simple_loss=0.2652, pruned_loss=0.08096, ctc_loss=0.1426, over 16563.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2762, pruned_loss=0.06501, ctc_loss=0.1132, over 3311321.91 frames. 
], batch size: 384, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:25:29,618 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2877616.0, ans=0.1 2023-10-09 21:25:47,466 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:25:55,106 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2877709.3333333335, ans=0.125 2023-10-09 21:25:59,808 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2877756.0, ans=0.125 2023-10-09 21:26:03,164 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+02 3.194e+02 3.623e+02 4.182e+02 7.324e+02, threshold=7.246e+02, percent-clipped=0.0 2023-10-09 21:26:24,435 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2877849.3333333335, ans=0.125 2023-10-09 21:26:25,781 INFO [train.py:1031] (1/4) Epoch 14, batch 31950, loss[loss=0.1637, simple_loss=0.2256, pruned_loss=0.03728, ctc_loss=0.06807, over 16815.00 frames. ], tot_loss[loss=0.2159, simple_loss=0.2677, pruned_loss=0.06076, ctc_loss=0.1064, over 3310283.63 frames. ], batch size: 189, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:26:38,641 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2877896.0, ans=0.125 2023-10-09 21:26:43,035 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2023-10-09 21:26:44,922 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2023-10-09 21:26:57,560 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2877942.6666666665, ans=0.125 2023-10-09 21:27:26,901 INFO [train.py:1031] (1/4) Epoch 14, batch 32000, loss[loss=0.1886, simple_loss=0.2446, pruned_loss=0.04899, ctc_loss=0.08675, over 16661.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2637, pruned_loss=0.06024, ctc_loss=0.1056, over 3311843.93 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:27:28,941 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2878082.6666666665, ans=0.125 2023-10-09 21:27:33,639 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2023-10-09 21:27:45,510 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2878129.3333333335, ans=0.125 2023-10-09 21:27:46,914 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. 
limit=15.0 2023-10-09 21:28:02,938 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2878176.0, ans=0.125 2023-10-09 21:28:03,060 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2878176.0, ans=0.125 2023-10-09 21:28:05,470 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2023-10-09 21:28:06,910 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+02 3.047e+02 3.544e+02 4.263e+02 6.076e+02, threshold=7.087e+02, percent-clipped=0.0 2023-10-09 21:28:17,971 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2878269.3333333335, ans=0.125 2023-10-09 21:28:30,828 INFO [train.py:1031] (1/4) Epoch 14, batch 32050, loss[loss=0.1961, simple_loss=0.2629, pruned_loss=0.04733, ctc_loss=0.08667, over 16719.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2663, pruned_loss=0.05836, ctc_loss=0.1028, over 3303990.92 frames. ], batch size: 130, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:28:57,199 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2878409.3333333335, ans=0.2 2023-10-09 21:29:02,000 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2878409.3333333335, ans=0.125 2023-10-09 21:29:05,221 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2878409.3333333335, ans=0.125 2023-10-09 21:29:17,723 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0 2023-10-09 21:29:33,753 INFO [train.py:1031] (1/4) Epoch 14, batch 32100, loss[loss=0.2462, simple_loss=0.2942, pruned_loss=0.07419, ctc_loss=0.1244, over 16535.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2734, pruned_loss=0.05748, ctc_loss=0.1018, over 3293155.50 frames. ], batch size: 110, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:30:13,060 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 2.944e+02 3.415e+02 4.141e+02 9.202e+02, threshold=6.830e+02, percent-clipped=4.0 2023-10-09 21:30:27,674 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2878736.0, ans=0.1 2023-10-09 21:30:32,512 INFO [train.py:1031] (1/4) Epoch 14, batch 32150, loss[loss=0.2296, simple_loss=0.2677, pruned_loss=0.06944, ctc_loss=0.1318, over 16367.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2741, pruned_loss=0.05635, ctc_loss=0.0996, over 3296513.70 frames. 
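
The `scaling.py:979` Whitening lines compare a per-module statistic against a limit (e.g. `metric=5.99 vs. limit=15.0` above), with a penalty presumably applied only when the metric drifts above the limit. One formulation consistent with these numbers (always >= 1, equal to 1 exactly when each group's channel covariance is a multiple of the identity, i.e. the features are "white") is mean(diag(C^2)) / mean(diag(C))^2. A sketch under that assumed definition, not icefall's exact code:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """How far channel covariance is from a multiple of the identity.

    x: (num_frames, num_channels); channels are split into num_groups
    groups and the metric is averaged over groups. Returns >= 1.0,
    with 1.0 iff each group's covariance is c * I.
    """
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    c = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, c).transpose(0, 1)  # (g, f, c)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames                  # (g, c, c)
    mean_diag = cov.diagonal(dim1=1, dim2=2).mean(dim=1)      # tr(C) / c
    mean_diag_sq = (cov @ cov).diagonal(dim1=1, dim2=2).mean(dim=1)
    return (mean_diag_sq / mean_diag ** 2).mean().item()

white = torch.randn(10_000, 128)                 # ~white features
skewed = white * torch.linspace(0.1, 3.0, 128)   # anisotropic variances
print(whitening_metric(white, num_groups=4))     # close to 1
print(whitening_metric(skewed, num_groups=4))    # noticeably above 1
```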
], batch size: 415, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:30:39,350 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2878782.6666666665, ans=0.0 2023-10-09 21:30:55,701 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2878876.0, ans=0.5 2023-10-09 21:31:02,557 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2878876.0, ans=0.2 2023-10-09 21:31:09,171 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2878922.6666666665, ans=0.0 2023-10-09 21:31:11,322 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2878922.6666666665, ans=0.1 2023-10-09 21:31:14,212 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-10-09 21:31:14,945 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2878922.6666666665, ans=0.0 2023-10-09 21:31:16,011 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2878922.6666666665, ans=0.125 2023-10-09 21:31:19,080 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2878969.3333333335, ans=0.1 2023-10-09 21:31:33,111 INFO [train.py:1031] (1/4) Epoch 14, batch 32200, loss[loss=0.2026, simple_loss=0.2481, pruned_loss=0.05864, ctc_loss=0.09961, over 16787.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2699, pruned_loss=0.05702, ctc_loss=0.1003, over 3286879.79 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:31:58,460 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2879109.3333333335, ans=0.125 2023-10-09 21:32:06,564 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2879109.3333333335, ans=0.0 2023-10-09 21:32:06,947 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-10-09 21:32:14,182 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+02 3.053e+02 3.349e+02 3.952e+02 6.213e+02, threshold=6.698e+02, percent-clipped=0.0 2023-10-09 21:32:20,318 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2879202.6666666665, ans=0.0 2023-10-09 21:32:24,675 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2879202.6666666665, ans=0.0 2023-10-09 21:32:32,677 INFO [train.py:1031] (1/4) Epoch 14, batch 32250, loss[loss=0.1851, simple_loss=0.2344, pruned_loss=0.05165, ctc_loss=0.08141, over 16807.00 frames. ], tot_loss[loss=0.2108, simple_loss=0.266, pruned_loss=0.05764, ctc_loss=0.1009, over 3289751.05 frames. 
], batch size: 141, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:33:03,373 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2879342.6666666665, ans=0.125 2023-10-09 21:33:29,745 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.12 vs. limit=10.0 2023-10-09 21:33:33,810 INFO [train.py:1031] (1/4) Epoch 14, batch 32300, loss[loss=0.1769, simple_loss=0.2393, pruned_loss=0.04219, ctc_loss=0.07523, over 16685.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.2651, pruned_loss=0.05868, ctc_loss=0.103, over 3294474.81 frames. ], batch size: 140, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:33:35,219 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2879482.6666666665, ans=0.125 2023-10-09 21:33:55,052 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2879529.3333333335, ans=0.125 2023-10-09 21:34:13,722 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0 2023-10-09 21:34:15,106 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2879622.6666666665, ans=0.0 2023-10-09 21:34:19,604 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.554e+02 3.404e+02 3.971e+02 4.753e+02 7.959e+02, threshold=7.942e+02, percent-clipped=3.0 2023-10-09 21:34:36,670 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2879669.3333333335, ans=0.125 2023-10-09 21:34:39,152 INFO [train.py:1031] (1/4) Epoch 14, batch 32350, loss[loss=0.2125, simple_loss=0.2853, pruned_loss=0.05217, ctc_loss=0.08832, over 16815.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.274, pruned_loss=0.05942, ctc_loss=0.1056, over 3304725.36 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:34:39,459 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2879716.0, ans=0.0 2023-10-09 21:34:51,334 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2879762.6666666665, ans=0.0 2023-10-09 21:34:58,583 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.51 vs. 
limit=15.0 2023-10-09 21:34:59,967 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2879762.6666666665, ans=0.1 2023-10-09 21:35:12,096 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2879809.3333333335, ans=0.125 2023-10-09 21:35:13,147 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:35:20,700 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2879856.0, ans=0.2 2023-10-09 21:35:29,096 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2879902.6666666665, ans=0.5 2023-10-09 21:35:31,337 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0 2023-10-09 21:35:31,832 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2879902.6666666665, ans=0.07 2023-10-09 21:35:40,798 INFO [train.py:1031] (1/4) Epoch 14, batch 32400, loss[loss=0.2272, simple_loss=0.2775, pruned_loss=0.06493, ctc_loss=0.1178, over 16937.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2761, pruned_loss=0.0596, ctc_loss=0.1058, over 3302373.30 frames. ], batch size: 292, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:35:42,197 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2879949.3333333335, ans=0.0 2023-10-09 21:35:55,504 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2023-10-09 21:36:01,101 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2879996.0, ans=0.125 2023-10-09 21:36:09,403 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2880042.6666666665, ans=0.125 2023-10-09 21:36:10,428 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2880042.6666666665, ans=0.0 2023-10-09 21:36:22,696 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2880089.3333333335, ans=0.125 2023-10-09 21:36:26,219 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.219e+02 3.562e+02 4.144e+02 6.944e+02, threshold=7.124e+02, percent-clipped=0.0 2023-10-09 21:36:43,386 INFO [train.py:1031] (1/4) Epoch 14, batch 32450, loss[loss=0.2022, simple_loss=0.2663, pruned_loss=0.05146, ctc_loss=0.088, over 16889.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.2745, pruned_loss=0.06052, ctc_loss=0.107, over 3307082.72 frames. 
], batch size: 82, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:36:53,864 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2880182.6666666665, ans=0.125 2023-10-09 21:37:09,871 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2880276.0, ans=0.125 2023-10-09 21:37:09,920 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2880276.0, ans=0.125 2023-10-09 21:37:09,975 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2880276.0, ans=0.0 2023-10-09 21:37:11,890 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2880276.0, ans=0.125 2023-10-09 21:37:18,985 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2880322.6666666665, ans=0.125 2023-10-09 21:37:39,159 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2880369.3333333335, ans=0.2 2023-10-09 21:37:44,360 INFO [train.py:1031] (1/4) Epoch 14, batch 32500, loss[loss=0.1842, simple_loss=0.2397, pruned_loss=0.04741, ctc_loss=0.08472, over 16713.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2688, pruned_loss=0.0603, ctc_loss=0.1065, over 3306009.27 frames. ], batch size: 130, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:37:44,664 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2880416.0, ans=0.0 2023-10-09 21:37:53,196 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2880416.0, ans=0.125 2023-10-09 21:38:02,003 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2880462.6666666665, ans=0.125 2023-10-09 21:38:03,761 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2880462.6666666665, ans=0.2 2023-10-09 21:38:15,047 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.69 vs. limit=15.0 2023-10-09 21:38:29,825 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=12.0 2023-10-09 21:38:31,978 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.513e+02 2.966e+02 3.455e+02 3.936e+02 8.435e+02, threshold=6.910e+02, percent-clipped=1.0 2023-10-09 21:38:35,443 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2880602.6666666665, ans=10.0 2023-10-09 21:38:37,134 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2880602.6666666665, ans=0.95 2023-10-09 21:38:46,475 INFO [train.py:1031] (1/4) Epoch 14, batch 32550, loss[loss=0.189, simple_loss=0.2502, pruned_loss=0.04612, ctc_loss=0.08885, over 16865.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.2621, pruned_loss=0.05594, ctc_loss=0.09878, over 3305949.71 frames. 
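
The grad_scale value in the batch summaries bounces among 2.0, 4.0, and 8.0, the signature of dynamic loss scaling under fp16 training: the scaler backs off (halves) when it sees inf/nan gradients and grows the scale again after a run of clean steps. A generic sketch of that pattern with torch's amp utilities (requires a CUDA device); the model, optimizer, and growth settings are placeholders, not this recipe's train.py:

```python
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(80, 2000).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=2.53e-3)

# backoff_factor=0.5 halves the scale on overflow; growth_interval
# controls how many clean steps pass before it is doubled again.
scaler = GradScaler(init_scale=8.0, growth_factor=2.0,
                    backoff_factor=0.5, growth_interval=2000)

for step in range(100):
    x = torch.randn(16, 80, device="cuda")
    with autocast():                      # fp16 forward
        loss = model(x).square().mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()         # scaled backward
    scaler.step(optimizer)                # skips the step on inf/nan grads
    scaler.update()                       # halve on overflow, grow later
    if step % 50 == 0:
        print(f"step {step}: grad_scale={scaler.get_scale()}")
```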
], batch size: 293, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:38:54,778 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=2880649.3333333335, ans=0.2 2023-10-09 21:39:02,503 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2880696.0, ans=0.125 2023-10-09 21:39:04,564 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2880696.0, ans=0.125 2023-10-09 21:39:25,328 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2880789.3333333335, ans=0.04949747468305833 2023-10-09 21:39:40,740 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2880836.0, ans=0.125 2023-10-09 21:39:47,249 INFO [train.py:1031] (1/4) Epoch 14, batch 32600, loss[loss=0.2091, simple_loss=0.2588, pruned_loss=0.06029, ctc_loss=0.09725, over 16062.00 frames. ], tot_loss[loss=0.203, simple_loss=0.2582, pruned_loss=0.05466, ctc_loss=0.09641, over 3297600.05 frames. ], batch size: 70, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:39:57,761 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2880929.3333333335, ans=0.125 2023-10-09 21:40:17,373 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2880976.0, ans=0.09899494936611666 2023-10-09 21:40:28,971 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2881022.6666666665, ans=0.125 2023-10-09 21:40:34,041 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.902e+02 3.409e+02 5.024e+02 1.088e+03, threshold=6.817e+02, percent-clipped=5.0 2023-10-09 21:40:48,729 INFO [train.py:1031] (1/4) Epoch 14, batch 32650, loss[loss=0.2191, simple_loss=0.2755, pruned_loss=0.06155, ctc_loss=0.09924, over 16690.00 frames. ], tot_loss[loss=0.2076, simple_loss=0.2642, pruned_loss=0.05594, ctc_loss=0.09785, over 3279158.22 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:40:49,064 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2881116.0, ans=0.0 2023-10-09 21:41:02,220 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=22.5 2023-10-09 21:41:12,278 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=15.0 2023-10-09 21:41:33,780 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2881256.0, ans=0.125 2023-10-09 21:41:49,234 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2881302.6666666665, ans=0.125 2023-10-09 21:41:52,682 INFO [train.py:1031] (1/4) Epoch 14, batch 32700, loss[loss=0.2729, simple_loss=0.3276, pruned_loss=0.07999, ctc_loss=0.1457, over 16844.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.2758, pruned_loss=0.06018, ctc_loss=0.1051, over 3275643.78 frames. 
], batch size: 309, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:42:00,236 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2881349.3333333335, ans=0.125 2023-10-09 21:42:05,145 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2881396.0, ans=0.95 2023-10-09 21:42:12,117 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:42:41,824 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.809e+02 3.539e+02 4.014e+02 5.290e+02 1.076e+03, threshold=8.028e+02, percent-clipped=8.0 2023-10-09 21:42:55,734 INFO [train.py:1031] (1/4) Epoch 14, batch 32750, loss[loss=0.2356, simple_loss=0.288, pruned_loss=0.06731, ctc_loss=0.1213, over 16943.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2826, pruned_loss=0.06362, ctc_loss=0.1112, over 3280708.36 frames. ], batch size: 292, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:43:03,413 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2881582.6666666665, ans=0.125 2023-10-09 21:43:05,051 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2881582.6666666665, ans=0.09899494936611666 2023-10-09 21:43:42,574 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2881722.6666666665, ans=0.125 2023-10-09 21:43:57,092 INFO [train.py:1031] (1/4) Epoch 14, batch 32800, loss[loss=0.2176, simple_loss=0.2796, pruned_loss=0.05901, ctc_loss=0.0938, over 16856.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2818, pruned_loss=0.06415, ctc_loss=0.1119, over 3288048.97 frames. ], batch size: 95, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:44:10,703 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.55 vs. limit=22.5 2023-10-09 21:44:22,566 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2881909.3333333335, ans=0.2 2023-10-09 21:44:37,563 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2881956.0, ans=0.125 2023-10-09 21:44:37,864 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2023-10-09 21:44:46,104 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.217e+02 3.697e+02 4.305e+02 8.023e+02, threshold=7.395e+02, percent-clipped=0.0 2023-10-09 21:44:46,934 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2882002.6666666665, ans=10.0 2023-10-09 21:44:50,602 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2882002.6666666665, ans=0.1 2023-10-09 21:44:55,786 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2023-10-09 21:44:57,216 INFO [train.py:1031] (1/4) Epoch 14, batch 32850, loss[loss=0.2143, simple_loss=0.2433, pruned_loss=0.06755, ctc_loss=0.1254, over 15536.00 frames. 
2023-10-09 21:45:06,014 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2882049.3333333335, ans=0.2
2023-10-09 21:45:26,310 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2882142.6666666665, ans=0.2
2023-10-09 21:45:52,169 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2882236.0, ans=0.05
2023-10-09 21:45:59,355 INFO [train.py:1031] (1/4) Epoch 14, batch 32900, loss[loss=0.1929, simple_loss=0.2682, pruned_loss=0.04355, ctc_loss=0.07654, over 16756.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2823, pruned_loss=0.06437, ctc_loss=0.1122, over 3296851.03 frames. ], batch size: 140, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:46:03,616 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2882282.6666666665, ans=0.2
2023-10-09 21:46:06,075 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2882282.6666666665, ans=0.0
2023-10-09 21:46:20,318 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2882329.3333333335, ans=0.09899494936611666
2023-10-09 21:46:21,430 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2882329.3333333335, ans=0.125
2023-10-09 21:46:51,627 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.541e+02 3.233e+02 3.650e+02 4.547e+02 8.623e+02, threshold=7.299e+02, percent-clipped=2.0
2023-10-09 21:47:02,688 INFO [train.py:1031] (1/4) Epoch 14, batch 32950, loss[loss=0.2455, simple_loss=0.2844, pruned_loss=0.07557, ctc_loss=0.1388, over 15272.00 frames. ], tot_loss[loss=0.2324, simple_loss=0.2881, pruned_loss=0.06545, ctc_loss=0.1146, over 3292719.41 frames. ], batch size: 526, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:47:04,786 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.38 vs. limit=10.0
2023-10-09 21:47:22,797 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2882562.6666666665, ans=0.1
2023-10-09 21:47:32,267 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2882609.3333333335, ans=0.035
2023-10-09 21:47:52,463 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2882702.6666666665, ans=0.05
2023-10-09 21:48:03,473 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2882702.6666666665, ans=0.09899494936611666
2023-10-09 21:48:05,296 INFO [train.py:1031] (1/4) Epoch 14, batch 33000, loss[loss=0.2115, simple_loss=0.26, pruned_loss=0.06066, ctc_loss=0.1044, over 16791.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2898, pruned_loss=0.06765, ctc_loss=0.1179, over 3288677.05 frames. ], batch size: 130, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:48:05,297 INFO [train.py:1054] (1/4) Computing validation loss
2023-10-09 21:48:23,060 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2327, simple_loss=0.3031, pruned_loss=0.06268, ctc_loss=0.09218, over 1796401.00 frames.
2023-10-09 21:48:23,060 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14570MB
2023-10-09 21:48:32,031 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2882749.3333333335, ans=0.05
2023-10-09 21:48:51,037 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2882842.6666666665, ans=0.125
2023-10-09 21:48:58,518 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2882889.3333333335, ans=0.1
2023-10-09 21:48:58,722 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0
2023-10-09 21:49:06,133 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2882889.3333333335, ans=0.125
2023-10-09 21:49:13,414 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.435e+02 3.950e+02 5.096e+02 8.924e+02, threshold=7.899e+02, percent-clipped=1.0
2023-10-09 21:49:24,081 INFO [train.py:1031] (1/4) Epoch 14, batch 33050, loss[loss=0.209, simple_loss=0.2416, pruned_loss=0.06465, ctc_loss=0.1178, over 15522.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.2886, pruned_loss=0.06853, ctc_loss=0.1194, over 3298077.32 frames. ], batch size: 526, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:49:25,040 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2882982.6666666665, ans=0.1
2023-10-09 21:49:25,957 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2882982.6666666665, ans=0.1
2023-10-09 21:49:31,828 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2882982.6666666665, ans=0.0
2023-10-09 21:49:44,642 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2883029.3333333335, ans=0.125
2023-10-09 21:49:46,064 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=12.0
2023-10-09 21:49:49,767 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2883076.0, ans=0.0
2023-10-09 21:50:03,010 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=22.5
2023-10-09 21:50:12,094 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2883169.3333333335, ans=0.0
2023-10-09 21:50:15,753 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2883169.3333333335, ans=0.0
2023-10-09 21:50:25,700 INFO [train.py:1031] (1/4) Epoch 14, batch 33100, loss[loss=0.214, simple_loss=0.2366, pruned_loss=0.07052, ctc_loss=0.1261, over 15616.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2861, pruned_loss=0.0681, ctc_loss=0.1187, over 3310114.48 frames. ], batch size: 530, lr: 2.53e-03, grad_scale: 4.0
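The train.py:1054/1063/1064 messages above bracket a periodic validation pass. A hedged sketch of such a step, assuming a model interface that returns (loss, num_frames) per batch; the function and variable names are illustrative, but torch.cuda.max_memory_allocated is the real counter behind the "Maximum memory allocated" line:

```python
import logging
import torch

def compute_validation_loss(model, valid_dl, device):
    # Switch to eval mode, accumulate frame-normalized loss, switch back.
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = model(batch)  # assumed interface
            tot_loss += loss.item()
            tot_frames += num_frames
    model.train()
    logging.info(
        f"validation: loss={tot_loss / tot_frames:.4f}, "
        f"over {tot_frames:.2f} frames."
    )
    # Peak device memory since the start of the process, as in the log:
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    logging.info(f"Maximum memory allocated so far is {mb}MB")
```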
2023-10-09 21:51:18,564 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.087e+02 3.637e+02 4.211e+02 8.906e+02, threshold=7.275e+02, percent-clipped=1.0
2023-10-09 21:51:25,227 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2883402.6666666665, ans=0.125
2023-10-09 21:51:28,124 INFO [train.py:1031] (1/4) Epoch 14, batch 33150, loss[loss=0.2061, simple_loss=0.2864, pruned_loss=0.04539, ctc_loss=0.08733, over 16861.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2816, pruned_loss=0.06487, ctc_loss=0.1133, over 3304493.75 frames. ], batch size: 309, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:51:46,617 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2883496.0, ans=0.1
2023-10-09 21:51:51,304 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2883496.0, ans=0.125
2023-10-09 21:52:09,626 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0
2023-10-09 21:52:16,831 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.23 vs. limit=10.0
2023-10-09 21:52:31,978 INFO [train.py:1031] (1/4) Epoch 14, batch 33200, loss[loss=0.2159, simple_loss=0.2576, pruned_loss=0.06376, ctc_loss=0.1165, over 16376.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2801, pruned_loss=0.06266, ctc_loss=0.1103, over 3295285.82 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:53:00,926 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2883776.0, ans=0.0
2023-10-09 21:53:05,096 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.83 vs. limit=15.0
2023-10-09 21:53:12,785 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2883822.6666666665, ans=0.0
2023-10-09 21:53:20,954 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0
2023-10-09 21:53:25,119 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+02 3.107e+02 3.465e+02 4.067e+02 6.400e+02, threshold=6.930e+02, percent-clipped=0.0
2023-10-09 21:53:28,597 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2883869.3333333335, ans=0.2
2023-10-09 21:53:31,931 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2883916.0, ans=0.125
2023-10-09 21:53:32,624 INFO [train.py:1031] (1/4) Epoch 14, batch 33250, loss[loss=0.1997, simple_loss=0.2442, pruned_loss=0.0571, ctc_loss=0.1023, over 16793.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.2751, pruned_loss=0.06221, ctc_loss=0.1094, over 3287683.77 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:53:33,378 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0
2023-10-09 21:53:54,138 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.71 vs. limit=6.0
2023-10-09 21:54:09,118 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2884056.0, ans=0.125
2023-10-09 21:54:13,713 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2884056.0, ans=0.125
2023-10-09 21:54:18,163 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2884056.0, ans=0.125
2023-10-09 21:54:32,669 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2884102.6666666665, ans=0.2
2023-10-09 21:54:34,330 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2884149.3333333335, ans=0.2
2023-10-09 21:54:35,073 INFO [train.py:1031] (1/4) Epoch 14, batch 33300, loss[loss=0.2143, simple_loss=0.252, pruned_loss=0.06683, ctc_loss=0.1077, over 16067.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2692, pruned_loss=0.06149, ctc_loss=0.1079, over 3291970.37 frames. ], batch size: 70, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:55:07,246 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2884242.6666666665, ans=0.5
2023-10-09 21:55:11,227 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2884242.6666666665, ans=0.125
2023-10-09 21:55:22,789 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2884289.3333333335, ans=0.0
2023-10-09 21:55:32,063 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.141e+02 3.663e+02 4.502e+02 8.687e+02, threshold=7.326e+02, percent-clipped=2.0
2023-10-09 21:55:38,503 INFO [train.py:1031] (1/4) Epoch 14, batch 33350, loss[loss=0.252, simple_loss=0.331, pruned_loss=0.06272, ctc_loss=0.1186, over 16875.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2736, pruned_loss=0.06214, ctc_loss=0.1096, over 3301127.97 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:56:03,377 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0
2023-10-09 21:56:10,727 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2884476.0, ans=0.125
2023-10-09 21:56:39,485 INFO [train.py:1031] (1/4) Epoch 14, batch 33400, loss[loss=0.215, simple_loss=0.2703, pruned_loss=0.05939, ctc_loss=0.102, over 16933.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2776, pruned_loss=0.06301, ctc_loss=0.1109, over 3294883.88 frames. ], batch size: 202, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:56:44,078 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5
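The optim.py:471 lines report five quantiles (min, 25%, median, 75%, max) of recent gradient norms, and in each message the threshold equals Clipping_scale times the median (e.g. 8.028e+02 = 2.0 x 4.014e+02, 7.395e+02 = 2.0 x 3.697e+02). A sketch of a clipper that reproduces that rule; the window size, logging cadence, and the exact rescaling mechanics are assumptions here, not the actual optim.py code:

```python
import torch

class QuartileClipper:
    """Clip gradients against clipping_scale x median of recent norms."""

    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []       # rolling buffer of recent grad norms
        self.clipped = 0
        self.seen = 0

    def clip_(self, params):
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        # min / 25% / 50% / 75% / max, as printed in the log:
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()
        self.seen += 1
        if norm > threshold:
            self.clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        return q, threshold, 100.0 * self.clipped / self.seen
```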
2023-10-09 21:56:46,851 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2884616.0, ans=0.0
2023-10-09 21:56:48,889 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2884616.0, ans=0.0
2023-10-09 21:56:59,684 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2884662.6666666665, ans=0.025
2023-10-09 21:57:09,463 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2884709.3333333335, ans=0.0
2023-10-09 21:57:12,315 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2884709.3333333335, ans=0.125
2023-10-09 21:57:29,263 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2884802.6666666665, ans=0.125
2023-10-09 21:57:32,112 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2884802.6666666665, ans=0.125
2023-10-09 21:57:36,673 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.586e+02 3.309e+02 3.821e+02 4.723e+02 1.099e+03, threshold=7.641e+02, percent-clipped=5.0
2023-10-09 21:57:42,138 INFO [train.py:1031] (1/4) Epoch 14, batch 33450, loss[loss=0.2103, simple_loss=0.2935, pruned_loss=0.04704, ctc_loss=0.08249, over 16355.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2804, pruned_loss=0.06344, ctc_loss=0.1116, over 3284428.34 frames. ], batch size: 463, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:57:46,541 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2884849.3333333335, ans=0.125
2023-10-09 21:58:40,832 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2885036.0, ans=0.0
2023-10-09 21:58:47,411 INFO [train.py:1031] (1/4) Epoch 14, batch 33500, loss[loss=0.2207, simple_loss=0.2595, pruned_loss=0.06783, ctc_loss=0.1153, over 10721.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2824, pruned_loss=0.06365, ctc_loss=0.1105, over 3283360.16 frames. ], batch size: 35, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:58:49,167 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=15.0
2023-10-09 21:59:04,058 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2885129.3333333335, ans=0.125
2023-10-09 21:59:28,353 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0
2023-10-09 21:59:28,451 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.61 vs. limit=22.5
2023-10-09 21:59:46,063 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+02 3.525e+02 4.202e+02 5.122e+02 8.777e+02, threshold=8.403e+02, percent-clipped=5.0
2023-10-09 21:59:48,887 INFO [train.py:1031] (1/4) Epoch 14, batch 33550, loss[loss=0.2145, simple_loss=0.2702, pruned_loss=0.0589, ctc_loss=0.1023, over 16897.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2789, pruned_loss=0.06344, ctc_loss=0.1098, over 3273984.45 frames. ], batch size: 78, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:59:51,336 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2885316.0, ans=0.0
2023-10-09 22:00:05,406 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2885362.6666666665, ans=0.0
2023-10-09 22:00:16,522 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2885409.3333333335, ans=0.0
2023-10-09 22:00:17,566 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2885409.3333333335, ans=0.2
2023-10-09 22:00:26,537 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2885456.0, ans=0.2
2023-10-09 22:00:30,551 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.11 vs. limit=12.0
2023-10-09 22:00:49,678 INFO [train.py:1031] (1/4) Epoch 14, batch 33600, loss[loss=0.1991, simple_loss=0.2403, pruned_loss=0.05886, ctc_loss=0.1007, over 16816.00 frames. ], tot_loss[loss=0.2214, simple_loss=0.2734, pruned_loss=0.06297, ctc_loss=0.109, over 3279845.78 frames. ], batch size: 141, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:00:50,906 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2885549.3333333335, ans=0.0
2023-10-09 22:00:56,845 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2885549.3333333335, ans=0.125
2023-10-09 22:00:58,777 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2885549.3333333335, ans=10.0
2023-10-09 22:00:59,152 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.80 vs. limit=22.5
2023-10-09 22:01:20,311 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2885642.6666666665, ans=0.125
2023-10-09 22:01:48,016 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 3.170e+02 3.773e+02 4.552e+02 1.576e+03, threshold=7.545e+02, percent-clipped=1.0
2023-10-09 22:01:49,718 INFO [train.py:1031] (1/4) Epoch 14, batch 33650, loss[loss=0.2892, simple_loss=0.2939, pruned_loss=0.1048, ctc_loss=0.1874, over 16561.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2687, pruned_loss=0.06249, ctc_loss=0.1083, over 3281477.30 frames. ], batch size: 384, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:02:09,146 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2885829.3333333335, ans=0.1
2023-10-09 22:02:17,449 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2885876.0, ans=0.025
2023-10-09 22:02:30,606 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2885922.6666666665, ans=0.125
2023-10-09 22:02:42,619 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2885969.3333333335, ans=0.125
2023-10-09 22:02:43,620 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2885969.3333333335, ans=0.0
2023-10-09 22:02:52,482 INFO [train.py:1031] (1/4) Epoch 14, batch 33700, loss[loss=0.2372, simple_loss=0.2848, pruned_loss=0.07147, ctc_loss=0.1166, over 16869.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2735, pruned_loss=0.06516, ctc_loss=0.1131, over 3280750.28 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:02:54,926 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2886016.0, ans=0.1
2023-10-09 22:03:03,490 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=12.0
2023-10-09 22:03:11,292 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2886062.6666666665, ans=0.07
2023-10-09 22:03:17,096 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2886109.3333333335, ans=0.0
2023-10-09 22:03:17,383 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.30 vs. limit=15.0
2023-10-09 22:03:19,245 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2886109.3333333335, ans=0.0
2023-10-09 22:03:20,291 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2886109.3333333335, ans=0.0
2023-10-09 22:03:27,053 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.81 vs. limit=22.5
2023-10-09 22:03:29,067 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0
2023-10-09 22:03:37,733 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.59 vs. limit=12.0
2023-10-09 22:03:42,520 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2886202.6666666665, ans=0.0
2023-10-09 22:03:50,575 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2886202.6666666665, ans=0.0
2023-10-09 22:03:52,829 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+02 3.271e+02 3.899e+02 4.405e+02 9.865e+02, threshold=7.797e+02, percent-clipped=1.0
2023-10-09 22:03:52,856 INFO [train.py:1031] (1/4) Epoch 14, batch 33750, loss[loss=0.2379, simple_loss=0.2913, pruned_loss=0.06774, ctc_loss=0.1225, over 16996.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2781, pruned_loss=0.06707, ctc_loss=0.1165, over 3289444.48 frames. ], batch size: 293, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:04:11,267 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2886296.0, ans=0.0
2023-10-09 22:04:20,299 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=22.5
2023-10-09 22:04:42,121 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2886436.0, ans=0.1
2023-10-09 22:04:49,039 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2886436.0, ans=0.125
2023-10-09 22:04:50,793 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2886436.0, ans=0.1
2023-10-09 22:04:54,305 INFO [train.py:1031] (1/4) Epoch 14, batch 33800, loss[loss=0.2545, simple_loss=0.2787, pruned_loss=0.08608, ctc_loss=0.1455, over 16674.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2767, pruned_loss=0.06654, ctc_loss=0.1156, over 3298032.09 frames. ], batch size: 384, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:05:03,391 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2886482.6666666665, ans=0.125
2023-10-09 22:05:03,639 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=15.0
2023-10-09 22:05:16,541 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2886529.3333333335, ans=0.125
2023-10-09 22:05:23,460 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2886576.0, ans=0.125
2023-10-09 22:05:32,626 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2886622.6666666665, ans=0.125
2023-10-09 22:05:55,375 INFO [train.py:1031] (1/4) Epoch 14, batch 33850, loss[loss=0.22, simple_loss=0.2632, pruned_loss=0.06414, ctc_loss=0.1213, over 16750.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.271, pruned_loss=0.06518, ctc_loss=0.1135, over 3304773.73 frames. ], batch size: 328, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:05:56,413 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+02 3.178e+02 3.599e+02 4.092e+02 7.716e+02, threshold=7.198e+02, percent-clipped=0.0
2023-10-09 22:06:03,332 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2886716.0, ans=0.2
2023-10-09 22:06:48,008 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2886902.6666666665, ans=0.125
2023-10-09 22:06:56,611 INFO [train.py:1031] (1/4) Epoch 14, batch 33900, loss[loss=0.2195, simple_loss=0.2852, pruned_loss=0.05732, ctc_loss=0.09765, over 16789.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2734, pruned_loss=0.06534, ctc_loss=0.1135, over 3294256.83 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:06:59,474 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=22.5
2023-10-09 22:07:00,732 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2886949.3333333335, ans=0.0
2023-10-09 22:07:38,690 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.87 vs. limit=22.5
2023-10-09 22:07:42,436 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2887089.3333333335, ans=0.125
2023-10-09 22:07:43,882 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.52 vs. limit=12.0
2023-10-09 22:07:46,290 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2887136.0, ans=0.125
2023-10-09 22:07:46,389 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2887136.0, ans=0.125
2023-10-09 22:07:49,101 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2887136.0, ans=0.0
2023-10-09 22:07:54,225 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0
2023-10-09 22:07:57,194 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2887136.0, ans=0.1
2023-10-09 22:07:59,530 INFO [train.py:1031] (1/4) Epoch 14, batch 33950, loss[loss=0.2185, simple_loss=0.3105, pruned_loss=0.0448, ctc_loss=0.09219, over 16918.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2816, pruned_loss=0.06285, ctc_loss=0.1098, over 3299188.29 frames. ], batch size: 258, lr: 2.53e-03, grad_scale: 1.0
2023-10-09 22:08:03,405 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.763e+02 3.464e+02 4.205e+02 4.959e+02 7.578e+02, threshold=8.409e+02, percent-clipped=4.0
2023-10-09 22:08:35,355 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2887276.0, ans=0.1
2023-10-09 22:09:02,845 INFO [train.py:1031] (1/4) Epoch 14, batch 34000, loss[loss=0.2259, simple_loss=0.3177, pruned_loss=0.04945, ctc_loss=0.0879, over 16761.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.2966, pruned_loss=0.06265, ctc_loss=0.1115, over 3293206.91 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 2.0
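In every train.py:1031 line the headline loss is consistent with a fixed weighting of the three parts, loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss; for batch 34000 above, 0.5 * 0.3177 + 0.04945 + 0.2 * 0.0879 = 0.2259. A small check of that identity; the parameter names here are illustrative:

```python
def combined_loss(simple_loss, pruned_loss, ctc_loss,
                  simple_scale=0.5, ctc_scale=0.2):
    # Weighted sum consistent with the printed per-batch numbers.
    return simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss

# Checked against two entries from this log:
assert abs(combined_loss(0.3177, 0.04945, 0.0879) - 0.2259) < 5e-4   # batch 34000
assert abs(combined_loss(0.2762, 0.02892, 0.05743) - 0.1785) < 5e-4  # batch 35000
```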
2023-10-09 22:09:15,355 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2887462.6666666665, ans=0.2
2023-10-09 22:10:03,845 INFO [train.py:1031] (1/4) Epoch 14, batch 34050, loss[loss=0.2136, simple_loss=0.2721, pruned_loss=0.05628, ctc_loss=0.1064, over 16417.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2939, pruned_loss=0.06076, ctc_loss=0.1085, over 3293466.62 frames. ], batch size: 466, lr: 2.53e-03, grad_scale: 1.0
2023-10-09 22:10:08,601 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+02 3.098e+02 3.845e+02 4.884e+02 8.519e+02, threshold=7.690e+02, percent-clipped=1.0
2023-10-09 22:10:13,867 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2887649.3333333335, ans=15.0
2023-10-09 22:10:27,754 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2887742.6666666665, ans=0.2
2023-10-09 22:10:31,584 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.44 vs. limit=15.0
2023-10-09 22:10:35,284 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2887742.6666666665, ans=15.0
2023-10-09 22:11:04,701 INFO [train.py:1031] (1/4) Epoch 14, batch 34100, loss[loss=0.2184, simple_loss=0.2913, pruned_loss=0.05373, ctc_loss=0.09486, over 16858.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2911, pruned_loss=0.06177, ctc_loss=0.1102, over 3302122.90 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:11:07,122 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2887882.6666666665, ans=0.125
2023-10-09 22:11:13,429 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2887882.6666666665, ans=0.0
2023-10-09 22:11:25,943 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2887929.3333333335, ans=0.2
2023-10-09 22:11:41,811 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2888022.6666666665, ans=0.1
2023-10-09 22:11:48,790 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2888022.6666666665, ans=0.125
2023-10-09 22:11:52,943 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2888022.6666666665, ans=0.125
2023-10-09 22:12:05,345 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2888116.0, ans=0.0
2023-10-09 22:12:05,995 INFO [train.py:1031] (1/4) Epoch 14, batch 34150, loss[loss=0.2377, simple_loss=0.2763, pruned_loss=0.07348, ctc_loss=0.1301, over 15291.00 frames. ], tot_loss[loss=0.232, simple_loss=0.292, pruned_loss=0.06341, ctc_loss=0.1127, over 3303067.08 frames. ], batch size: 527, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:12:10,659 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=2888116.0, ans=0.025
2023-10-09 22:12:11,414 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+02 3.257e+02 3.702e+02 4.193e+02 7.598e+02, threshold=7.404e+02, percent-clipped=0.0
2023-10-09 22:12:16,376 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0
2023-10-09 22:12:39,025 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2888209.3333333335, ans=0.125
2023-10-09 22:12:54,623 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=15.0
2023-10-09 22:13:08,599 INFO [train.py:1031] (1/4) Epoch 14, batch 34200, loss[loss=0.2072, simple_loss=0.2523, pruned_loss=0.06087, ctc_loss=0.1007, over 16696.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2886, pruned_loss=0.064, ctc_loss=0.1129, over 3299247.82 frames. ], batch size: 140, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:13:31,861 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2888442.6666666665, ans=0.1
2023-10-09 22:13:49,331 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2888489.3333333335, ans=0.0
2023-10-09 22:13:53,120 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2888489.3333333335, ans=0.125
2023-10-09 22:14:09,171 INFO [train.py:1031] (1/4) Epoch 14, batch 34250, loss[loss=0.2251, simple_loss=0.2733, pruned_loss=0.06355, ctc_loss=0.1244, over 15180.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2805, pruned_loss=0.06248, ctc_loss=0.1099, over 3290488.46 frames. ], batch size: 526, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:14:09,950 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0
2023-10-09 22:14:15,724 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.386e+02 3.191e+02 3.616e+02 4.129e+02 7.013e+02, threshold=7.231e+02, percent-clipped=0.0
2023-10-09 22:14:21,146 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0
2023-10-09 22:14:55,060 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2888722.6666666665, ans=0.125
2023-10-09 22:14:56,993 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2888722.6666666665, ans=0.0
2023-10-09 22:15:10,747 INFO [train.py:1031] (1/4) Epoch 14, batch 34300, loss[loss=0.2183, simple_loss=0.2775, pruned_loss=0.058, ctc_loss=0.1078, over 16998.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2785, pruned_loss=0.06314, ctc_loss=0.111, over 3290739.97 frames. ], batch size: 258, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:15:11,060 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2888816.0, ans=0.125
2023-10-09 22:15:11,477 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.52 vs. limit=15.0
2023-10-09 22:15:19,630 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2888816.0, ans=0.125
2023-10-09 22:15:22,666 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2888862.6666666665, ans=0.0
2023-10-09 22:15:30,568 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2888862.6666666665, ans=0.0
2023-10-09 22:16:01,114 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:16:09,862 INFO [train.py:1031] (1/4) Epoch 14, batch 34350, loss[loss=0.2001, simple_loss=0.2353, pruned_loss=0.06052, ctc_loss=0.1096, over 15591.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2779, pruned_loss=0.06341, ctc_loss=0.111, over 3291717.40 frames. ], batch size: 526, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:16:10,200 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2889049.3333333335, ans=0.125
2023-10-09 22:16:16,841 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.449e+02 3.283e+02 3.799e+02 4.453e+02 1.021e+03, threshold=7.599e+02, percent-clipped=4.0
2023-10-09 22:16:27,265 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2889096.0, ans=0.125
2023-10-09 22:16:27,719 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.65 vs. limit=15.0
2023-10-09 22:16:35,677 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=15.0
2023-10-09 22:17:00,119 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2889236.0, ans=0.0
2023-10-09 22:17:10,485 INFO [train.py:1031] (1/4) Epoch 14, batch 34400, loss[loss=0.2159, simple_loss=0.2739, pruned_loss=0.05899, ctc_loss=0.0997, over 16870.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2783, pruned_loss=0.06437, ctc_loss=0.1127, over 3291871.31 frames. ], batch size: 121, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:17:15,039 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2889282.6666666665, ans=0.125
2023-10-09 22:17:37,039 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2889376.0, ans=0.1
2023-10-09 22:17:52,471 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2889422.6666666665, ans=0.1
2023-10-09 22:18:08,749 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2889469.3333333335, ans=0.125
2023-10-09 22:18:11,066 INFO [train.py:1031] (1/4) Epoch 14, batch 34450, loss[loss=0.2231, simple_loss=0.2808, pruned_loss=0.06152, ctc_loss=0.106, over 16768.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2782, pruned_loss=0.06507, ctc_loss=0.1134, over 3287816.54 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:18:15,333 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2889516.0, ans=0.0
2023-10-09 22:18:19,265 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.547e+02 3.186e+02 3.591e+02 4.331e+02 7.838e+02, threshold=7.182e+02, percent-clipped=2.0
2023-10-09 22:18:31,944 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0
2023-10-09 22:18:32,867 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0
2023-10-09 22:18:47,003 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2889609.3333333335, ans=0.035
2023-10-09 22:18:58,041 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2889656.0, ans=0.0
2023-10-09 22:18:59,776 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2889656.0, ans=0.125
2023-10-09 22:19:09,452 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0
2023-10-09 22:19:11,286 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2889702.6666666665, ans=0.07
2023-10-09 22:19:14,167 INFO [train.py:1031] (1/4) Epoch 14, batch 34500, loss[loss=0.2527, simple_loss=0.3283, pruned_loss=0.06558, ctc_loss=0.1149, over 16903.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.2859, pruned_loss=0.06693, ctc_loss=0.1166, over 3301764.61 frames. ], batch size: 292, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:19:20,575 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5
2023-10-09 22:19:22,252 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.37 vs. limit=6.0
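One way to read the Whitening messages (scaling.py:979): each compares a measured "whiteness" metric of a module's output covariance against a limit, logging only when the metric is notable. The sketch below shows one such metric (mean squared covariance eigenvalue over squared mean eigenvalue, per channel group); this is an illustration of the idea, not the exact icefall formula:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); num_channels divisible by num_groups.
    # Returns ~1 for near-white features; grows as energy concentrates
    # in a few directions.
    num_frames, num_channels = x.shape
    group = num_channels // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * group:(g + 1) * group]
        cov = (xg.T @ xg) / num_frames
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append(((eigs ** 2).mean() / eigs.mean() ** 2).item())
    return sum(metrics) / num_groups

x = torch.randn(1000, 384)     # near-white input
print(whitening_metric(x, 1))  # slightly above 1.0 from sampling noise,
                               # well under limits like 15.0 or 22.5
```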
2023-10-09 22:19:26,695 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2889796.0, ans=0.125
2023-10-09 22:19:32,731 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.67 vs. limit=10.0
2023-10-09 22:19:37,490 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.11 vs. limit=15.0
2023-10-09 22:20:11,391 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2889936.0, ans=0.125
2023-10-09 22:20:11,575 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=22.5
2023-10-09 22:20:20,487 INFO [train.py:1031] (1/4) Epoch 14, batch 34550, loss[loss=0.3007, simple_loss=0.3387, pruned_loss=0.09776, ctc_loss=0.1678, over 16524.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2926, pruned_loss=0.06552, ctc_loss=0.1152, over 3293572.80 frames. ], batch size: 416, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:20:29,434 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=15.0
2023-10-09 22:20:30,360 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.650e+02 3.661e+02 4.529e+02 6.004e+02 9.470e+02, threshold=9.059e+02, percent-clipped=10.0
2023-10-09 22:20:41,166 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2890029.3333333335, ans=0.2
2023-10-09 22:20:41,268 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2890029.3333333335, ans=0.1
2023-10-09 22:20:45,519 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0
2023-10-09 22:21:10,163 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2890169.3333333335, ans=0.125
2023-10-09 22:21:24,119 INFO [train.py:1031] (1/4) Epoch 14, batch 34600, loss[loss=0.2144, simple_loss=0.2923, pruned_loss=0.04979, ctc_loss=0.09266, over 15164.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2905, pruned_loss=0.06365, ctc_loss=0.1122, over 3288178.99 frames. ], batch size: 527, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:21:34,136 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2890216.0, ans=0.125
2023-10-09 22:21:40,780 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2890262.6666666665, ans=0.125
2023-10-09 22:21:52,302 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2890309.3333333335, ans=0.125
2023-10-09 22:21:53,395 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2890309.3333333335, ans=0.0
2023-10-09 22:21:54,911 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.21 vs. limit=15.0
2023-10-09 22:21:56,111 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2890309.3333333335, ans=0.0
2023-10-09 22:22:15,020 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=22.5
2023-10-09 22:22:25,918 INFO [train.py:1031] (1/4) Epoch 14, batch 34650, loss[loss=0.1945, simple_loss=0.2649, pruned_loss=0.04486, ctc_loss=0.08582, over 16900.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2845, pruned_loss=0.06025, ctc_loss=0.1067, over 3292977.90 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:22:27,318 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2890449.3333333335, ans=0.0
2023-10-09 22:22:33,041 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=22.5
2023-10-09 22:22:37,035 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+02 2.928e+02 3.445e+02 4.113e+02 6.666e+02, threshold=6.890e+02, percent-clipped=0.0
2023-10-09 22:22:41,192 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2890496.0, ans=0.0
2023-10-09 22:22:46,070 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2890496.0, ans=0.125
2023-10-09 22:23:02,702 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2890589.3333333335, ans=0.0
2023-10-09 22:23:09,620 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2890589.3333333335, ans=0.0
2023-10-09 22:23:09,653 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2890589.3333333335, ans=10.0
2023-10-09 22:23:10,549 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=2890589.3333333335, ans=0.025
2023-10-09 22:23:13,306 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2890589.3333333335, ans=0.125
2023-10-09 22:23:27,779 INFO [train.py:1031] (1/4) Epoch 14, batch 34700, loss[loss=0.2438, simple_loss=0.2922, pruned_loss=0.07168, ctc_loss=0.1298, over 16867.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.284, pruned_loss=0.06198, ctc_loss=0.1095, over 3299238.78 frames. ], batch size: 291, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:23:40,183 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=15.0
2023-10-09 22:23:47,097 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2890729.3333333335, ans=0.125
2023-10-09 22:24:03,570 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2890776.0, ans=0.1
2023-10-09 22:24:23,325 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2890869.3333333335, ans=0.0
2023-10-09 22:24:31,575 INFO [train.py:1031] (1/4) Epoch 14, batch 34750, loss[loss=0.2325, simple_loss=0.2844, pruned_loss=0.06987, ctc_loss=0.1021, over 12283.00 frames. ], tot_loss[loss=0.2347, simple_loss=0.289, pruned_loss=0.06667, ctc_loss=0.1176, over 3294212.80 frames. ], batch size: 38, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:24:40,801 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2890916.0, ans=0.125
2023-10-09 22:24:42,693 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.997e+02 3.549e+02 4.003e+02 4.772e+02 8.039e+02, threshold=8.005e+02, percent-clipped=2.0
2023-10-09 22:24:58,403 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2891009.3333333335, ans=0.125
2023-10-09 22:25:13,512 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2891056.0, ans=0.0
2023-10-09 22:25:20,013 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2891102.6666666665, ans=0.125
2023-10-09 22:25:24,756 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2891102.6666666665, ans=0.1
2023-10-09 22:25:31,230 INFO [train.py:1031] (1/4) Epoch 14, batch 34800, loss[loss=0.2169, simple_loss=0.2781, pruned_loss=0.05759, ctc_loss=0.1012, over 16848.00 frames. ], tot_loss[loss=0.236, simple_loss=0.2883, pruned_loss=0.06793, ctc_loss=0.1195, over 3305532.69 frames. ], batch size: 176, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:25:35,740 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2891149.3333333335, ans=0.125
2023-10-09 22:25:40,630 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2891149.3333333335, ans=0.2
2023-10-09 22:26:07,384 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0
2023-10-09 22:26:21,099 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2891336.0, ans=0.125
2023-10-09 22:26:33,344 INFO [train.py:1031] (1/4) Epoch 14, batch 34850, loss[loss=0.2738, simple_loss=0.2863, pruned_loss=0.0957, ctc_loss=0.175, over 16651.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2843, pruned_loss=0.06761, ctc_loss=0.1187, over 3303676.62 frames. ], batch size: 384, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:26:36,339 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2891382.6666666665, ans=0.0
2023-10-09 22:26:41,134 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2891382.6666666665, ans=0.125
2023-10-09 22:26:46,831 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+02 3.209e+02 3.596e+02 4.244e+02 8.793e+02, threshold=7.192e+02, percent-clipped=1.0
2023-10-09 22:27:12,944 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2891522.6666666665, ans=0.1
2023-10-09 22:27:18,853 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2891522.6666666665, ans=0.0
2023-10-09 22:27:26,960 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.77 vs. limit=6.0
2023-10-09 22:27:35,832 INFO [train.py:1031] (1/4) Epoch 14, batch 34900, loss[loss=0.1842, simple_loss=0.2416, pruned_loss=0.04638, ctc_loss=0.08516, over 16099.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2783, pruned_loss=0.06666, ctc_loss=0.1168, over 3296016.95 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:27:59,213 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=22.5
2023-10-09 22:28:38,132 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2891849.3333333335, ans=0.0
2023-10-09 22:28:38,140 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2891849.3333333335, ans=0.1
2023-10-09 22:28:38,934 INFO [train.py:1031] (1/4) Epoch 14, batch 34950, loss[loss=0.255, simple_loss=0.3009, pruned_loss=0.0762, ctc_loss=0.1418, over 16611.00 frames. ], tot_loss[loss=0.2298, simple_loss=0.2804, pruned_loss=0.06633, ctc_loss=0.1162, over 3298503.16 frames. ], batch size: 418, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:28:54,409 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+02 3.348e+02 3.779e+02 4.801e+02 1.162e+03, threshold=7.559e+02, percent-clipped=3.0
2023-10-09 22:29:38,620 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2892036.0, ans=0.1
2023-10-09 22:29:42,586 INFO [train.py:1031] (1/4) Epoch 14, batch 35000, loss[loss=0.1785, simple_loss=0.2762, pruned_loss=0.02892, ctc_loss=0.05743, over 16262.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.2825, pruned_loss=0.06603, ctc_loss=0.116, over 3295056.58 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:29:55,917 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2892129.3333333335, ans=0.125
2023-10-09 22:30:00,670 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0
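The fluctuating "grad_scale" field in the train.py:1031 lines (8.0 -> 4.0 -> 2.0 -> 1.0 and back above) is the usual behaviour of a mixed-precision loss scaler that halves the scale when gradients overflow and grows it back after a stretch of clean steps. A minimal sketch with torch.cuda.amp.GradScaler; fp16 training and this exact step structure are assumptions here:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=2000)

def train_step(model, optimizer, batch, device):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch.to(device))
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if the grads overflowed
    scaler.update()         # halves the scale on overflow, else grows it
    return loss.detach(), scaler.get_scale()  # cf. the logged grad_scale
```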
2023-10-09 22:30:00,861 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.26 vs. limit=15.0
2023-10-09 22:30:14,837 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2892176.0, ans=0.0
2023-10-09 22:30:21,570 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2892222.6666666665, ans=0.125
2023-10-09 22:30:34,879 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0
2023-10-09 22:30:37,895 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:30:40,640 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2892269.3333333335, ans=0.07
2023-10-09 22:30:48,022 INFO [train.py:1031] (1/4) Epoch 14, batch 35050, loss[loss=0.22, simple_loss=0.2856, pruned_loss=0.05632, ctc_loss=0.1042, over 16787.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.2853, pruned_loss=0.06562, ctc_loss=0.1161, over 3295385.38 frames. ], batch size: 188, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:30:59,761 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2892362.6666666665, ans=0.125
2023-10-09 22:31:04,383 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+02 3.170e+02 3.753e+02 4.510e+02 9.970e+02, threshold=7.506e+02, percent-clipped=2.0
2023-10-09 22:31:27,706 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2892456.0, ans=0.0
2023-10-09 22:31:43,789 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:31:46,506 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2892502.6666666665, ans=0.125
2023-10-09 22:31:51,702 INFO [train.py:1031] (1/4) Epoch 14, batch 35100, loss[loss=0.2051, simple_loss=0.2769, pruned_loss=0.04774, ctc_loss=0.09464, over 16927.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.2874, pruned_loss=0.06477, ctc_loss=0.1151, over 3307061.99 frames. ], batch size: 258, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:32:13,334 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2892596.0, ans=0.1
2023-10-09 22:32:14,663 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0
2023-10-09 22:32:19,048 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2892642.6666666665, ans=0.125
2023-10-09 22:32:38,473 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0
2023-10-09 22:32:54,767 INFO [train.py:1031] (1/4) Epoch 14, batch 35150, loss[loss=0.234, simple_loss=0.2961, pruned_loss=0.06358, ctc_loss=0.1118, over 16834.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2891, pruned_loss=0.06564, ctc_loss=0.1167, over 3298136.29 frames. ], batch size: 188, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:33:12,901 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.271e+02 3.877e+02 4.489e+02 9.044e+02, threshold=7.754e+02, percent-clipped=1.0
2023-10-09 22:33:13,633 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0
2023-10-09 22:33:42,114 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2892922.6666666665, ans=0.125
2023-10-09 22:33:56,338 INFO [train.py:1031] (1/4) Epoch 14, batch 35200, loss[loss=0.1893, simple_loss=0.2755, pruned_loss=0.03805, ctc_loss=0.06757, over 16869.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2885, pruned_loss=0.06299, ctc_loss=0.112, over 3307467.26 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:34:03,294 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0
2023-10-09 22:34:21,238 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:34:26,286 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2893109.3333333335, ans=0.0
2023-10-09 22:34:38,737 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2893156.0, ans=0.0
2023-10-09 22:34:45,234 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2893202.6666666665, ans=0.1
2023-10-09 22:34:59,203 INFO [train.py:1031] (1/4) Epoch 14, batch 35250, loss[loss=0.2544, simple_loss=0.3177, pruned_loss=0.06939, ctc_loss=0.1308, over 16887.00 frames. ], tot_loss[loss=0.2257, simple_loss=0.2849, pruned_loss=0.06145, ctc_loss=0.1089, over 3307364.65 frames. ], batch size: 258, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:35:06,453 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2893249.3333333335, ans=0.2
2023-10-09 22:35:19,505 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+02 2.987e+02 3.599e+02 4.398e+02 6.579e+02, threshold=7.198e+02, percent-clipped=0.0
2023-10-09 22:35:28,715 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0
2023-10-09 22:35:40,613 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2893389.3333333335, ans=0.125
2023-10-09 22:35:47,441 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2893389.3333333335, ans=0.0
2023-10-09 22:35:52,460 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2893436.0, ans=0.1
2023-10-09 22:36:01,347 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2893436.0, ans=0.2
2023-10-09 22:36:05,972 INFO [train.py:1031] (1/4) Epoch 14, batch 35300, loss[loss=0.2695, simple_loss=0.3558, pruned_loss=0.06552, ctc_loss=0.1306, over 15068.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.2948, pruned_loss=0.06345, ctc_loss=0.1126, over 3306197.36 frames. ], batch size: 527, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:36:07,427 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2893482.6666666665, ans=0.0
2023-10-09 22:36:11,666 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2893482.6666666665, ans=0.125
2023-10-09 22:36:18,282 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2893529.3333333335, ans=0.0
2023-10-09 22:36:27,297 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0
2023-10-09 22:36:42,173 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2893576.0, ans=0.125
2023-10-09 22:36:42,336 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2893576.0, ans=0.125
2023-10-09 22:36:45,446 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0
2023-10-09 22:36:47,991 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2893622.6666666665, ans=0.0
2023-10-09 22:36:51,083 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2893622.6666666665, ans=0.125
2023-10-09 22:36:56,726 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2893669.3333333335, ans=0.125
2023-10-09 22:36:59,615 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2893669.3333333335, ans=0.125
2023-10-09 22:37:10,998 INFO [train.py:1031] (1/4) Epoch 14, batch 35350, loss[loss=0.2519, simple_loss=0.3062, pruned_loss=0.07178, ctc_loss=0.1352, over 16174.00 frames. ], tot_loss[loss=0.2391, simple_loss=0.2984, pruned_loss=0.0664, ctc_loss=0.1175, over 3306564.24 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:37:20,354 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2893716.0, ans=0.125
2023-10-09 22:37:31,481 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+02 3.418e+02 3.862e+02 4.842e+02 9.244e+02, threshold=7.725e+02, percent-clipped=2.0
2023-10-09 22:37:37,673 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2893809.3333333335, ans=0.0
2023-10-09 22:37:48,295 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2893856.0, ans=0.1
2023-10-09 22:38:02,596 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2893902.6666666665, ans=0.125
2023-10-09 22:38:11,759 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2893902.6666666665, ans=0.0
2023-10-09 22:38:14,148 INFO [train.py:1031] (1/4) Epoch 14, batch 35400, loss[loss=0.2535, simple_loss=0.3095, pruned_loss=0.07108, ctc_loss=0.1382, over 15238.00 frames. ], tot_loss[loss=0.2426, simple_loss=0.3032, pruned_loss=0.06717, ctc_loss=0.119, over 3302577.88 frames. ], batch size: 527, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:38:22,632 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2893949.3333333335, ans=0.0
2023-10-09 22:38:35,051 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2893996.0, ans=0.125
2023-10-09 22:38:41,097 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2894042.6666666665, ans=0.125
2023-10-09 22:38:43,063 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2894042.6666666665, ans=0.07
2023-10-09 22:38:51,499 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2894089.3333333335, ans=0.125
2023-10-09 22:38:54,032 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=22.5
2023-10-09 22:38:59,148 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2894089.3333333335, ans=0.125
2023-10-09 22:39:06,791 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2894136.0, ans=0.125
2023-10-09 22:39:14,642 INFO [train.py:1031] (1/4) Epoch 14, batch 35450, loss[loss=0.189, simple_loss=0.2424, pruned_loss=0.05125, ctc_loss=0.08249, over 11241.00 frames. ], tot_loss[loss=0.2369, simple_loss=0.2954, pruned_loss=0.06583, ctc_loss=0.1167, over 3294460.30 frames.
], batch size: 35, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:39:30,260 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2894229.3333333335, ans=0.0 2023-10-09 22:39:36,532 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.322e+02 3.230e+02 3.810e+02 4.860e+02 8.869e+02, threshold=7.620e+02, percent-clipped=1.0 2023-10-09 22:39:54,208 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2894322.6666666665, ans=0.0 2023-10-09 22:40:13,294 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0 2023-10-09 22:40:17,594 INFO [train.py:1031] (1/4) Epoch 14, batch 35500, loss[loss=0.2497, simple_loss=0.3129, pruned_loss=0.06797, ctc_loss=0.1266, over 16709.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.2924, pruned_loss=0.06683, ctc_loss=0.1184, over 3298866.60 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:40:42,750 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2894509.3333333335, ans=0.125 2023-10-09 22:40:45,124 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2023-10-09 22:40:48,355 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2894509.3333333335, ans=0.125 2023-10-09 22:40:54,233 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2894556.0, ans=0.125 2023-10-09 22:41:06,153 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2023-10-09 22:41:20,535 INFO [train.py:1031] (1/4) Epoch 14, batch 35550, loss[loss=0.2509, simple_loss=0.3016, pruned_loss=0.07488, ctc_loss=0.1261, over 16819.00 frames. ], tot_loss[loss=0.2398, simple_loss=0.294, pruned_loss=0.06866, ctc_loss=0.1208, over 3297823.95 frames. ], batch size: 164, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:41:23,105 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2894649.3333333335, ans=0.125 2023-10-09 22:41:32,758 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2894696.0, ans=0.125 2023-10-09 22:41:42,149 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.738e+02 3.683e+02 4.220e+02 5.051e+02 8.035e+02, threshold=8.441e+02, percent-clipped=1.0 2023-10-09 22:42:01,210 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2023-10-09 22:42:09,582 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2894836.0, ans=0.1 2023-10-09 22:42:14,155 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2894836.0, ans=0.0 2023-10-09 22:42:22,017 INFO [train.py:1031] (1/4) Epoch 14, batch 35600, loss[loss=0.2125, simple_loss=0.2779, pruned_loss=0.05627, ctc_loss=0.08608, over 16789.00 frames. 
], tot_loss[loss=0.2401, simple_loss=0.2944, pruned_loss=0.0688, ctc_loss=0.1207, over 3295756.57 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:42:33,129 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2894929.3333333335, ans=0.0 2023-10-09 22:42:48,580 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2894976.0, ans=0.2 2023-10-09 22:42:59,986 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2895022.6666666665, ans=10.0 2023-10-09 22:43:01,418 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=12.0 2023-10-09 22:43:20,849 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.83 vs. limit=10.0 2023-10-09 22:43:23,147 INFO [train.py:1031] (1/4) Epoch 14, batch 35650, loss[loss=0.1631, simple_loss=0.2181, pruned_loss=0.04034, ctc_loss=0.0684, over 16444.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2885, pruned_loss=0.0639, ctc_loss=0.1126, over 3295570.81 frames. ], batch size: 70, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:43:33,483 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2895116.0, ans=0.0 2023-10-09 22:43:46,393 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2895162.6666666665, ans=10.0 2023-10-09 22:43:47,003 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.989e+02 3.692e+02 4.285e+02 1.206e+03, threshold=7.384e+02, percent-clipped=2.0 2023-10-09 22:44:03,862 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2895256.0, ans=0.125 2023-10-09 22:44:07,757 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2895256.0, ans=0.0 2023-10-09 22:44:19,454 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2895302.6666666665, ans=0.0 2023-10-09 22:44:26,166 INFO [train.py:1031] (1/4) Epoch 14, batch 35700, loss[loss=0.2851, simple_loss=0.3196, pruned_loss=0.09248, ctc_loss=0.1641, over 16441.00 frames. ], tot_loss[loss=0.235, simple_loss=0.2933, pruned_loss=0.06526, ctc_loss=0.1152, over 3291453.77 frames. ], batch size: 414, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:44:32,569 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2895349.3333333335, ans=0.125 2023-10-09 22:44:37,455 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2895396.0, ans=0.125 2023-10-09 22:44:47,473 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.78 vs. 
limit=15.0 2023-10-09 22:44:58,377 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2895442.6666666665, ans=0.0 2023-10-09 22:44:59,722 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.56 vs. limit=15.0 2023-10-09 22:45:02,554 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2895489.3333333335, ans=0.125 2023-10-09 22:45:22,585 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2895536.0, ans=0.0 2023-10-09 22:45:27,098 INFO [train.py:1031] (1/4) Epoch 14, batch 35750, loss[loss=0.2287, simple_loss=0.2797, pruned_loss=0.0645, ctc_loss=0.1216, over 16941.00 frames. ], tot_loss[loss=0.2359, simple_loss=0.2925, pruned_loss=0.06626, ctc_loss=0.1171, over 3297468.66 frames. ], batch size: 242, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:45:28,284 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2023-10-09 22:45:29,140 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2023-10-09 22:45:53,023 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.764e+02 4.390e+02 5.354e+02 1.212e+03, threshold=8.781e+02, percent-clipped=8.0 2023-10-09 22:46:11,994 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2895722.6666666665, ans=0.125 2023-10-09 22:46:29,802 INFO [train.py:1031] (1/4) Epoch 14, batch 35800, loss[loss=0.2247, simple_loss=0.2783, pruned_loss=0.06495, ctc_loss=0.1028, over 16766.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2929, pruned_loss=0.06819, ctc_loss=0.1198, over 3305833.92 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:46:33,244 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2895816.0, ans=0.035 2023-10-09 22:46:39,744 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2895816.0, ans=0.1 2023-10-09 22:46:45,938 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2895862.6666666665, ans=10.0 2023-10-09 22:46:48,033 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2895862.6666666665, ans=0.0 2023-10-09 22:47:19,386 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2896002.6666666665, ans=0.0 2023-10-09 22:47:31,696 INFO [train.py:1031] (1/4) Epoch 14, batch 35850, loss[loss=0.2695, simple_loss=0.3334, pruned_loss=0.07526, ctc_loss=0.1376, over 16808.00 frames. ], tot_loss[loss=0.2418, simple_loss=0.2967, pruned_loss=0.06917, ctc_loss=0.1213, over 3301174.12 frames. 
], batch size: 308, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:47:37,546 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2896049.3333333335, ans=0.1 2023-10-09 22:47:57,701 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 3.376e+02 4.105e+02 5.188e+02 8.758e+02, threshold=8.210e+02, percent-clipped=0.0 2023-10-09 22:48:12,833 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2896189.3333333335, ans=0.125 2023-10-09 22:48:20,492 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2896236.0, ans=0.125 2023-10-09 22:48:32,282 INFO [train.py:1031] (1/4) Epoch 14, batch 35900, loss[loss=0.204, simple_loss=0.2973, pruned_loss=0.03879, ctc_loss=0.08284, over 16878.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2936, pruned_loss=0.06353, ctc_loss=0.112, over 3305250.90 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:48:35,338 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.68 vs. limit=22.5 2023-10-09 22:48:43,552 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2896282.6666666665, ans=0.125 2023-10-09 22:48:44,893 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0 2023-10-09 22:48:55,664 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2896329.3333333335, ans=0.09899494936611666 2023-10-09 22:49:10,108 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=22.5 2023-10-09 22:49:14,851 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2896422.6666666665, ans=0.0 2023-10-09 22:49:14,904 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2896422.6666666665, ans=0.05 2023-10-09 22:49:28,556 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2896469.3333333335, ans=0.0 2023-10-09 22:49:36,332 INFO [train.py:1031] (1/4) Epoch 14, batch 35950, loss[loss=0.1779, simple_loss=0.2515, pruned_loss=0.0387, ctc_loss=0.06733, over 16797.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2917, pruned_loss=0.06017, ctc_loss=0.1069, over 3307298.81 frames. ], batch size: 176, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:49:57,809 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2896562.6666666665, ans=0.1 2023-10-09 22:50:03,309 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2896609.3333333335, ans=0.1 2023-10-09 22:50:03,677 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.79 vs. 
limit=22.5 2023-10-09 22:50:04,032 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.666e+02 3.384e+02 4.357e+02 7.839e+02, threshold=6.768e+02, percent-clipped=0.0 2023-10-09 22:50:13,404 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2896656.0, ans=0.0 2023-10-09 22:50:28,147 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2896702.6666666665, ans=0.125 2023-10-09 22:50:38,133 INFO [train.py:1031] (1/4) Epoch 14, batch 36000, loss[loss=0.1855, simple_loss=0.2526, pruned_loss=0.04361, ctc_loss=0.078, over 16908.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2798, pruned_loss=0.05426, ctc_loss=0.09665, over 3311201.32 frames. ], batch size: 292, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:50:38,134 INFO [train.py:1054] (1/4) Computing validation loss 2023-10-09 22:50:58,874 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2335, simple_loss=0.304, pruned_loss=0.06295, ctc_loss=0.09275, over 1796401.00 frames. 2023-10-09 22:50:58,875 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14570MB 2023-10-09 22:51:19,257 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2896796.0, ans=0.125 2023-10-09 22:51:37,828 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-10-09 22:51:46,855 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2896936.0, ans=0.2 2023-10-09 22:51:53,412 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2896936.0, ans=0.125 2023-10-09 22:51:59,929 INFO [train.py:1031] (1/4) Epoch 14, batch 36050, loss[loss=0.2055, simple_loss=0.2722, pruned_loss=0.05057, ctc_loss=0.09389, over 16917.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2772, pruned_loss=0.05507, ctc_loss=0.09766, over 3308775.44 frames. ], batch size: 242, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:52:02,113 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 22:52:05,094 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. 
limit=15.0 2023-10-09 22:52:18,311 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2897029.3333333335, ans=0.125 2023-10-09 22:52:18,374 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2897029.3333333335, ans=0.125 2023-10-09 22:52:23,428 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2897029.3333333335, ans=0.5 2023-10-09 22:52:29,190 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.810e+02 3.555e+02 4.396e+02 7.920e+02, threshold=7.110e+02, percent-clipped=1.0 2023-10-09 22:52:47,098 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2897122.6666666665, ans=0.0 2023-10-09 22:52:56,690 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2897169.3333333335, ans=0.125 2023-10-09 22:52:59,466 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2897169.3333333335, ans=0.0 2023-10-09 22:53:01,390 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0 2023-10-09 22:53:03,014 INFO [train.py:1031] (1/4) Epoch 14, batch 36100, loss[loss=0.2711, simple_loss=0.3253, pruned_loss=0.08162, ctc_loss=0.134, over 17024.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.2826, pruned_loss=0.05991, ctc_loss=0.1061, over 3311961.85 frames. ], batch size: 91, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:53:08,977 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2897216.0, ans=0.1 2023-10-09 22:53:24,456 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2897262.6666666665, ans=0.125 2023-10-09 22:54:02,952 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2897402.6666666665, ans=0.125 2023-10-09 22:54:06,391 INFO [train.py:1031] (1/4) Epoch 14, batch 36150, loss[loss=0.202, simple_loss=0.2511, pruned_loss=0.05752, ctc_loss=0.09478, over 10685.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2851, pruned_loss=0.06199, ctc_loss=0.1096, over 3309285.46 frames. ], batch size: 37, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:54:06,939 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=22.5 2023-10-09 22:54:07,733 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2897449.3333333335, ans=0.2 2023-10-09 22:54:24,207 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2897496.0, ans=0.125 2023-10-09 22:54:36,550 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.766e+02 3.412e+02 4.167e+02 5.128e+02 1.236e+03, threshold=8.334e+02, percent-clipped=3.0 2023-10-09 22:54:37,266 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.57 vs. 
limit=10.0 2023-10-09 22:54:46,361 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2897589.3333333335, ans=0.035 2023-10-09 22:54:50,331 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2897589.3333333335, ans=0.125 2023-10-09 22:55:09,643 INFO [train.py:1031] (1/4) Epoch 14, batch 36200, loss[loss=0.2307, simple_loss=0.2894, pruned_loss=0.06285, ctc_loss=0.1156, over 16755.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.2917, pruned_loss=0.06435, ctc_loss=0.1144, over 3314642.13 frames. ], batch size: 164, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:55:14,983 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0 2023-10-09 22:55:20,947 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0 2023-10-09 22:55:27,709 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2897729.3333333335, ans=0.1 2023-10-09 22:55:33,533 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.49 vs. limit=6.0 2023-10-09 22:55:45,921 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0 2023-10-09 22:55:53,854 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-10-09 22:55:59,805 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2897869.3333333335, ans=0.0 2023-10-09 22:56:11,656 INFO [train.py:1031] (1/4) Epoch 14, batch 36250, loss[loss=0.223, simple_loss=0.2854, pruned_loss=0.05953, ctc_loss=0.1039, over 16880.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2942, pruned_loss=0.06363, ctc_loss=0.1143, over 3309996.26 frames. ], batch size: 215, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:56:20,708 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2897916.0, ans=0.0 2023-10-09 22:56:27,112 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2897962.6666666665, ans=0.125 2023-10-09 22:56:31,547 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. 
limit=15.0 2023-10-09 22:56:42,282 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.544e+02 3.450e+02 4.069e+02 4.879e+02 1.069e+03, threshold=8.138e+02, percent-clipped=4.0 2023-10-09 22:56:42,632 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2898009.3333333335, ans=0.125 2023-10-09 22:57:03,577 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2898102.6666666665, ans=0.2 2023-10-09 22:57:03,595 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2898102.6666666665, ans=0.2 2023-10-09 22:57:10,690 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2898102.6666666665, ans=0.05 2023-10-09 22:57:13,566 INFO [train.py:1031] (1/4) Epoch 14, batch 36300, loss[loss=0.2073, simple_loss=0.2799, pruned_loss=0.05018, ctc_loss=0.08601, over 16806.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.292, pruned_loss=0.06371, ctc_loss=0.1141, over 3309510.39 frames. ], batch size: 188, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:57:29,987 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2023-10-09 22:57:46,733 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2898242.6666666665, ans=0.0 2023-10-09 22:58:01,264 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2898289.3333333335, ans=0.2 2023-10-09 22:58:14,032 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.67 vs. limit=15.0 2023-10-09 22:58:16,200 INFO [train.py:1031] (1/4) Epoch 14, batch 36350, loss[loss=0.2674, simple_loss=0.3131, pruned_loss=0.08229, ctc_loss=0.1427, over 16891.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2939, pruned_loss=0.06554, ctc_loss=0.1167, over 3307966.32 frames. ], batch size: 242, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:58:24,282 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2898382.6666666665, ans=0.0 2023-10-09 22:58:37,284 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2898429.3333333335, ans=0.0 2023-10-09 22:58:48,784 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.608e+02 3.450e+02 4.170e+02 4.968e+02 1.204e+03, threshold=8.340e+02, percent-clipped=3.0 2023-10-09 22:58:59,611 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2898522.6666666665, ans=0.125 2023-10-09 22:59:07,287 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2898569.3333333335, ans=0.0 2023-10-09 22:59:10,961 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2898569.3333333335, ans=0.1 2023-10-09 22:59:19,342 INFO [train.py:1031] (1/4) Epoch 14, batch 36400, loss[loss=0.2686, simple_loss=0.2959, pruned_loss=0.08961, ctc_loss=0.1552, over 16500.00 frames. 
], tot_loss[loss=0.2359, simple_loss=0.2922, pruned_loss=0.0663, ctc_loss=0.1172, over 3306779.76 frames. ], batch size: 351, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:59:20,553 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2898616.0, ans=0.125 2023-10-09 22:59:38,083 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.67 vs. limit=6.0 2023-10-09 22:59:43,617 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2898709.3333333335, ans=0.0 2023-10-09 22:59:47,409 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2898709.3333333335, ans=0.2 2023-10-09 22:59:54,563 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2898709.3333333335, ans=0.125 2023-10-09 23:00:03,506 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2898756.0, ans=0.2 2023-10-09 23:00:21,503 INFO [train.py:1031] (1/4) Epoch 14, batch 36450, loss[loss=0.2366, simple_loss=0.2669, pruned_loss=0.07838, ctc_loss=0.1239, over 16910.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2852, pruned_loss=0.06573, ctc_loss=0.1155, over 3306172.08 frames. ], batch size: 78, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:00:28,187 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2898849.3333333335, ans=0.1 2023-10-09 23:00:42,059 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2898896.0, ans=0.125 2023-10-09 23:00:42,118 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2898896.0, ans=0.0 2023-10-09 23:00:51,412 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2898942.6666666665, ans=0.0 2023-10-09 23:00:54,965 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+02 3.085e+02 3.494e+02 4.091e+02 1.458e+03, threshold=6.988e+02, percent-clipped=1.0 2023-10-09 23:00:55,602 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=22.5 2023-10-09 23:00:56,807 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.74 vs. limit=15.0 2023-10-09 23:01:03,229 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2898989.3333333335, ans=0.125 2023-10-09 23:01:03,623 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.67 vs. limit=22.5 2023-10-09 23:01:09,964 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2898989.3333333335, ans=0.0 2023-10-09 23:01:12,307 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.08 vs. 
limit=10.0 2023-10-09 23:01:24,211 INFO [train.py:1031] (1/4) Epoch 14, batch 36500, loss[loss=0.2292, simple_loss=0.2731, pruned_loss=0.0693, ctc_loss=0.1165, over 16872.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2781, pruned_loss=0.06423, ctc_loss=0.1128, over 3301669.79 frames. ], batch size: 78, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:01:29,433 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2899082.6666666665, ans=0.2 2023-10-09 23:01:45,419 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2899129.3333333335, ans=0.125 2023-10-09 23:02:02,894 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2899222.6666666665, ans=0.125 2023-10-09 23:02:26,331 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2899316.0, ans=0.125 2023-10-09 23:02:27,713 INFO [train.py:1031] (1/4) Epoch 14, batch 36550, loss[loss=0.2256, simple_loss=0.2892, pruned_loss=0.06022, ctc_loss=0.1037, over 16759.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.277, pruned_loss=0.0635, ctc_loss=0.1113, over 3299362.04 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:02:37,236 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2899316.0, ans=0.0 2023-10-09 23:02:47,242 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2023-10-09 23:02:49,158 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2899362.6666666665, ans=0.125 2023-10-09 23:03:01,173 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.250e+02 3.665e+02 4.225e+02 1.129e+03, threshold=7.330e+02, percent-clipped=1.0 2023-10-09 23:03:06,790 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-10-09 23:03:17,569 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2899502.6666666665, ans=0.125 2023-10-09 23:03:28,066 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2899549.3333333335, ans=0.125 2023-10-09 23:03:28,788 INFO [train.py:1031] (1/4) Epoch 14, batch 36600, loss[loss=0.2135, simple_loss=0.2502, pruned_loss=0.06631, ctc_loss=0.1106, over 16740.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2756, pruned_loss=0.06225, ctc_loss=0.1092, over 3304576.72 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:03:45,244 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2899596.0, ans=0.0 2023-10-09 23:03:59,034 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=22.5 2023-10-09 23:04:30,819 INFO [train.py:1031] (1/4) Epoch 14, batch 36650, loss[loss=0.1992, simple_loss=0.2526, pruned_loss=0.05487, ctc_loss=0.09026, over 15999.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2696, pruned_loss=0.06065, ctc_loss=0.1065, over 3302158.39 frames. 
], batch size: 70, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:04:56,361 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2899876.0, ans=0.125 2023-10-09 23:04:56,384 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2899876.0, ans=0.0 2023-10-09 23:04:56,401 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2899876.0, ans=0.07 2023-10-09 23:05:06,012 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+02 3.036e+02 3.411e+02 4.060e+02 1.638e+03, threshold=6.823e+02, percent-clipped=3.0 2023-10-09 23:05:33,282 INFO [train.py:1031] (1/4) Epoch 14, batch 36700, loss[loss=0.2289, simple_loss=0.2817, pruned_loss=0.06611, ctc_loss=0.1098, over 16921.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2652, pruned_loss=0.06018, ctc_loss=0.1055, over 3311824.54 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:05:35,744 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2900016.0, ans=0.0 2023-10-09 23:06:08,429 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2900156.0, ans=0.0 2023-10-09 23:06:11,679 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2023-10-09 23:06:13,237 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2900156.0, ans=0.125 2023-10-09 23:06:18,524 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2900156.0, ans=0.125 2023-10-09 23:06:33,071 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2900249.3333333335, ans=0.125 2023-10-09 23:06:34,443 INFO [train.py:1031] (1/4) Epoch 14, batch 36750, loss[loss=0.2402, simple_loss=0.2865, pruned_loss=0.0732, ctc_loss=0.1185, over 16583.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2668, pruned_loss=0.0618, ctc_loss=0.1077, over 3313797.70 frames. 
], batch size: 110, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:06:37,829 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2900249.3333333335, ans=0.0 2023-10-09 23:06:56,769 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2900342.6666666665, ans=0.1 2023-10-09 23:07:09,706 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+02 3.153e+02 3.511e+02 4.063e+02 5.415e+02, threshold=7.022e+02, percent-clipped=0.0 2023-10-09 23:07:28,809 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2900436.0, ans=0.125 2023-10-09 23:07:28,818 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2900436.0, ans=0.0 2023-10-09 23:07:28,849 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2900436.0, ans=0.125 2023-10-09 23:07:34,282 INFO [train.py:1031] (1/4) Epoch 14, batch 36800, loss[loss=0.1975, simple_loss=0.2488, pruned_loss=0.05502, ctc_loss=0.09049, over 16722.00 frames. ], tot_loss[loss=0.2179, simple_loss=0.2689, pruned_loss=0.06194, ctc_loss=0.1074, over 3319836.69 frames. ], batch size: 151, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:07:39,685 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2023-10-09 23:08:09,993 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2900622.6666666665, ans=0.125 2023-10-09 23:08:14,767 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2900622.6666666665, ans=0.125 2023-10-09 23:08:23,603 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2900669.3333333335, ans=0.2 2023-10-09 23:08:35,594 INFO [train.py:1031] (1/4) Epoch 14, batch 36850, loss[loss=0.2827, simple_loss=0.3314, pruned_loss=0.08799, ctc_loss=0.1451, over 16843.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.2713, pruned_loss=0.06212, ctc_loss=0.1071, over 3314986.19 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:08:41,844 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2900716.0, ans=0.125 2023-10-09 23:08:52,528 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. 
limit=15.0 2023-10-09 23:09:11,521 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2900809.3333333335, ans=0.125 2023-10-09 23:09:16,101 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.524e+02 3.452e+02 4.218e+02 5.067e+02 9.154e+02, threshold=8.437e+02, percent-clipped=6.0 2023-10-09 23:09:20,324 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2900856.0, ans=0.0 2023-10-09 23:09:22,284 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=22.5 2023-10-09 23:09:24,102 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2900856.0, ans=0.125 2023-10-09 23:09:38,573 INFO [train.py:1031] (1/4) Epoch 14, batch 36900, loss[loss=0.2441, simple_loss=0.2892, pruned_loss=0.07496, ctc_loss=0.1229, over 17045.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2783, pruned_loss=0.06529, ctc_loss=0.113, over 3302794.04 frames. ], batch size: 216, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:09:55,966 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2900996.0, ans=0.1 2023-10-09 23:10:02,049 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2900996.0, ans=0.0 2023-10-09 23:10:24,340 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.50 vs. limit=15.0 2023-10-09 23:10:26,101 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2901089.3333333335, ans=0.125 2023-10-09 23:10:43,336 INFO [train.py:1031] (1/4) Epoch 14, batch 36950, loss[loss=0.2461, simple_loss=0.3102, pruned_loss=0.06826, ctc_loss=0.1139, over 12470.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2848, pruned_loss=0.06866, ctc_loss=0.1193, over 3297552.32 frames. ], batch size: 38, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:10:51,450 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2901182.6666666665, ans=0.1 2023-10-09 23:10:56,774 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2901229.3333333335, ans=0.125 2023-10-09 23:11:08,271 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=22.5 2023-10-09 23:11:25,048 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.810e+02 3.608e+02 4.056e+02 4.983e+02 1.030e+03, threshold=8.112e+02, percent-clipped=3.0 2023-10-09 23:11:38,256 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2901369.3333333335, ans=0.1 2023-10-09 23:11:40,310 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2901369.3333333335, ans=0.125 2023-10-09 23:11:46,857 INFO [train.py:1031] (1/4) Epoch 14, batch 37000, loss[loss=0.2327, simple_loss=0.2753, pruned_loss=0.07094, ctc_loss=0.1207, over 16729.00 frames. 
], tot_loss[loss=0.2399, simple_loss=0.2917, pruned_loss=0.06977, ctc_loss=0.1213, over 3297814.20 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:12:10,107 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2901462.6666666665, ans=0.1 2023-10-09 23:12:26,158 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2901556.0, ans=0.1 2023-10-09 23:12:49,891 INFO [train.py:1031] (1/4) Epoch 14, batch 37050, loss[loss=0.191, simple_loss=0.2433, pruned_loss=0.05162, ctc_loss=0.08867, over 16696.00 frames. ], tot_loss[loss=0.2347, simple_loss=0.2863, pruned_loss=0.06794, ctc_loss=0.1182, over 3301127.46 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:13:23,732 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=22.5 2023-10-09 23:13:31,556 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+02 3.263e+02 3.806e+02 4.315e+02 8.340e+02, threshold=7.611e+02, percent-clipped=1.0 2023-10-09 23:13:51,985 INFO [train.py:1031] (1/4) Epoch 14, batch 37100, loss[loss=0.2088, simple_loss=0.2604, pruned_loss=0.05859, ctc_loss=0.1001, over 16807.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.279, pruned_loss=0.06619, ctc_loss=0.1153, over 3297725.08 frames. ], batch size: 273, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:14:02,904 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2901929.3333333335, ans=0.1 2023-10-09 23:14:19,968 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2023-10-09 23:14:20,082 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=22.5 2023-10-09 23:14:53,065 INFO [train.py:1031] (1/4) Epoch 14, batch 37150, loss[loss=0.1912, simple_loss=0.2434, pruned_loss=0.05101, ctc_loss=0.09247, over 16735.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2725, pruned_loss=0.06518, ctc_loss=0.1138, over 3303296.81 frames. 
], batch size: 215, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:15:10,545 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2902162.6666666665, ans=0.125 2023-10-09 23:15:13,232 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2902162.6666666665, ans=0.125 2023-10-09 23:15:13,283 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2902162.6666666665, ans=0.125 2023-10-09 23:15:16,390 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:15:23,441 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2902209.3333333335, ans=0.125 2023-10-09 23:15:24,693 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2902209.3333333335, ans=0.1 2023-10-09 23:15:34,534 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.232e+02 3.053e+02 3.584e+02 4.083e+02 7.481e+02, threshold=7.169e+02, percent-clipped=0.0 2023-10-09 23:15:40,991 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2023-10-09 23:15:45,980 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2902302.6666666665, ans=0.0 2023-10-09 23:15:54,260 INFO [train.py:1031] (1/4) Epoch 14, batch 37200, loss[loss=0.2167, simple_loss=0.2663, pruned_loss=0.0649, ctc_loss=0.09345, over 11370.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2766, pruned_loss=0.06337, ctc_loss=0.1112, over 3300395.93 frames. ], batch size: 39, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:16:16,032 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.28 vs. limit=22.5 2023-10-09 23:16:28,823 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2902489.3333333335, ans=0.125 2023-10-09 23:16:31,453 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2902489.3333333335, ans=0.125 2023-10-09 23:16:36,678 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.90 vs. 
limit=22.5 2023-10-09 23:16:42,480 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2902536.0, ans=0.125 2023-10-09 23:16:44,628 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2902536.0, ans=0.1 2023-10-09 23:16:47,741 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2902536.0, ans=0.1 2023-10-09 23:16:51,532 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2902536.0, ans=0.125 2023-10-09 23:16:53,192 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2902582.6666666665, ans=0.125 2023-10-09 23:16:53,922 INFO [train.py:1031] (1/4) Epoch 14, batch 37250, loss[loss=0.2168, simple_loss=0.2814, pruned_loss=0.05718, ctc_loss=0.0946, over 16835.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2815, pruned_loss=0.06174, ctc_loss=0.1088, over 3310302.62 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:17:07,530 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2902629.3333333335, ans=0.125 2023-10-09 23:17:21,525 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2902676.0, ans=0.125 2023-10-09 23:17:36,506 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 2.936e+02 3.384e+02 3.917e+02 6.225e+02, threshold=6.767e+02, percent-clipped=0.0 2023-10-09 23:17:54,216 INFO [train.py:1031] (1/4) Epoch 14, batch 37300, loss[loss=0.1974, simple_loss=0.2598, pruned_loss=0.05106, ctc_loss=0.08244, over 16724.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.28, pruned_loss=0.06081, ctc_loss=0.1072, over 3307089.59 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:18:01,941 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2902816.0, ans=0.2 2023-10-09 23:18:06,727 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0 2023-10-09 23:18:08,490 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2902862.6666666665, ans=0.1 2023-10-09 23:18:12,221 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.49 vs. limit=22.5 2023-10-09 23:18:21,065 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2902909.3333333335, ans=0.125 2023-10-09 23:18:26,173 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0 2023-10-09 23:18:42,635 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2903002.6666666665, ans=0.09899494936611666 2023-10-09 23:18:55,646 INFO [train.py:1031] (1/4) Epoch 14, batch 37350, loss[loss=0.2124, simple_loss=0.2731, pruned_loss=0.05676, ctc_loss=0.09537, over 16787.00 frames. 
], tot_loss[loss=0.2214, simple_loss=0.2816, pruned_loss=0.05953, ctc_loss=0.1052, over 3305848.67 frames. ], batch size: 202, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:19:05,799 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2903049.3333333335, ans=0.0 2023-10-09 23:19:13,947 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-10-09 23:19:14,650 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2903096.0, ans=0.0 2023-10-09 23:19:16,024 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=22.5 2023-10-09 23:19:23,416 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2903142.6666666665, ans=0.125 2023-10-09 23:19:24,478 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2903142.6666666665, ans=0.09899494936611666 2023-10-09 23:19:25,513 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2903142.6666666665, ans=0.2 2023-10-09 23:19:32,273 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2903189.3333333335, ans=0.125 2023-10-09 23:19:38,026 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+02 2.949e+02 3.528e+02 4.105e+02 1.147e+03, threshold=7.057e+02, percent-clipped=0.0 2023-10-09 23:19:51,807 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2903236.0, ans=0.025 2023-10-09 23:19:54,562 INFO [train.py:1031] (1/4) Epoch 14, batch 37400, loss[loss=0.1996, simple_loss=0.2502, pruned_loss=0.05543, ctc_loss=0.0952, over 16710.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2767, pruned_loss=0.0596, ctc_loss=0.1047, over 3303010.04 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:20:10,605 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2903329.3333333335, ans=0.125 2023-10-09 23:20:24,958 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2903376.0, ans=0.0 2023-10-09 23:20:55,593 INFO [train.py:1031] (1/4) Epoch 14, batch 37450, loss[loss=0.2471, simple_loss=0.3028, pruned_loss=0.07233, ctc_loss=0.117, over 16596.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2761, pruned_loss=0.0592, ctc_loss=0.1042, over 3298598.12 frames. ], batch size: 110, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:21:05,764 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2903516.0, ans=0.0 2023-10-09 23:21:06,222 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.53 vs. 
limit=15.0 2023-10-09 23:21:41,546 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+02 2.982e+02 3.878e+02 4.493e+02 7.805e+02, threshold=7.755e+02, percent-clipped=2.0 2023-10-09 23:21:50,278 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2023-10-09 23:21:58,570 INFO [train.py:1031] (1/4) Epoch 14, batch 37500, loss[loss=0.2375, simple_loss=0.3216, pruned_loss=0.05599, ctc_loss=0.1037, over 16234.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.2794, pruned_loss=0.06065, ctc_loss=0.1066, over 3281065.04 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:22:06,162 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2903749.3333333335, ans=0.125 2023-10-09 23:22:10,599 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2903796.0, ans=0.125 2023-10-09 23:22:11,653 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:22:31,413 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2903842.6666666665, ans=0.2 2023-10-09 23:22:57,862 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=22.5 2023-10-09 23:22:58,675 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2903982.6666666665, ans=0.125 2023-10-09 23:22:59,330 INFO [train.py:1031] (1/4) Epoch 14, batch 37550, loss[loss=0.2059, simple_loss=0.2687, pruned_loss=0.05217, ctc_loss=0.09672, over 16818.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2783, pruned_loss=0.05847, ctc_loss=0.1032, over 3280880.87 frames. ], batch size: 310, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:23:10,878 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=2904029.3333333335, ans=22.5 2023-10-09 23:23:15,357 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:23:30,790 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0 2023-10-09 23:23:41,140 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2904122.6666666665, ans=0.2 2023-10-09 23:23:46,184 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 2.897e+02 3.320e+02 4.036e+02 7.809e+02, threshold=6.640e+02, percent-clipped=1.0 2023-10-09 23:23:59,292 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2904216.0, ans=0.125 2023-10-09 23:24:00,672 INFO [train.py:1031] (1/4) Epoch 14, batch 37600, loss[loss=0.1892, simple_loss=0.2282, pruned_loss=0.0551, ctc_loss=0.1001, over 16082.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.2734, pruned_loss=0.05835, ctc_loss=0.1029, over 3275869.77 frames. 
], batch size: 463, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:24:03,032 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2904216.0, ans=0.125 2023-10-09 23:24:08,352 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2904216.0, ans=0.0 2023-10-09 23:24:09,688 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2023-10-09 23:24:23,395 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2904309.3333333335, ans=0.0 2023-10-09 23:24:26,536 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2904309.3333333335, ans=0.2 2023-10-09 23:24:27,550 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2904309.3333333335, ans=0.125 2023-10-09 23:24:34,191 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2904309.3333333335, ans=0.1 2023-10-09 23:24:40,311 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2023-10-09 23:24:55,779 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. limit=12.0 2023-10-09 23:24:59,293 INFO [train.py:1031] (1/4) Epoch 14, batch 37650, loss[loss=0.2231, simple_loss=0.2708, pruned_loss=0.0654, ctc_loss=0.1114, over 16898.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2734, pruned_loss=0.06062, ctc_loss=0.1065, over 3266240.21 frames. ], batch size: 202, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:25:37,886 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=22.5 2023-10-09 23:25:48,255 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.460e+02 3.454e+02 4.118e+02 4.727e+02 1.151e+03, threshold=8.236e+02, percent-clipped=7.0 2023-10-09 23:25:50,675 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2904636.0, ans=0.0 2023-10-09 23:25:58,489 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2904636.0, ans=0.125 2023-10-09 23:26:01,811 INFO [train.py:1031] (1/4) Epoch 14, batch 37700, loss[loss=0.178, simple_loss=0.2804, pruned_loss=0.02789, ctc_loss=0.0495, over 16270.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2748, pruned_loss=0.05998, ctc_loss=0.1053, over 3262481.69 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:26:13,829 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. 
limit=15.0 2023-10-09 23:26:43,779 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2904822.6666666665, ans=0.125 2023-10-09 23:26:45,530 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:27:01,455 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2904869.3333333335, ans=0.125 2023-10-09 23:27:05,162 INFO [train.py:1031] (1/4) Epoch 14, batch 37750, loss[loss=0.2379, simple_loss=0.3073, pruned_loss=0.06247, ctc_loss=0.1087, over 16852.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.2739, pruned_loss=0.05601, ctc_loss=0.09897, over 3267350.69 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:27:25,259 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2904962.6666666665, ans=0.125 2023-10-09 23:27:37,024 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2905009.3333333335, ans=0.1 2023-10-09 23:27:40,820 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2905009.3333333335, ans=0.125 2023-10-09 23:27:52,762 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2905056.0, ans=0.125 2023-10-09 23:27:56,214 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+02 2.858e+02 3.601e+02 4.405e+02 1.102e+03, threshold=7.202e+02, percent-clipped=1.0 2023-10-09 23:28:03,033 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2905102.6666666665, ans=0.125 2023-10-09 23:28:07,549 INFO [train.py:1031] (1/4) Epoch 14, batch 37800, loss[loss=0.2494, simple_loss=0.3291, pruned_loss=0.06122, ctc_loss=0.1181, over 16686.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2811, pruned_loss=0.05787, ctc_loss=0.1027, over 3283697.39 frames. ], batch size: 271, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:28:10,715 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2905149.3333333335, ans=0.125 2023-10-09 23:28:12,896 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=2905149.3333333335, ans=0.1 2023-10-09 23:28:13,378 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.84 vs. 
limit=15.0 2023-10-09 23:28:20,050 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2905196.0, ans=0.125 2023-10-09 23:28:35,810 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2905242.6666666665, ans=0.0 2023-10-09 23:28:46,442 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2905289.3333333335, ans=0.1 2023-10-09 23:28:54,598 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2905289.3333333335, ans=0.2 2023-10-09 23:29:06,863 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2905336.0, ans=0.125 2023-10-09 23:29:08,609 INFO [train.py:1031] (1/4) Epoch 14, batch 37850, loss[loss=0.2161, simple_loss=0.3006, pruned_loss=0.04846, ctc_loss=0.08672, over 16895.00 frames. ], tot_loss[loss=0.2203, simple_loss=0.2861, pruned_loss=0.05693, ctc_loss=0.1015, over 3289941.58 frames. ], batch size: 292, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:29:25,753 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2905429.3333333335, ans=0.125 2023-10-09 23:29:53,112 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2905522.6666666665, ans=0.0 2023-10-09 23:30:00,964 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+02 3.190e+02 3.752e+02 4.348e+02 7.334e+02, threshold=7.503e+02, percent-clipped=1.0 2023-10-09 23:30:01,369 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2905569.3333333335, ans=0.2 2023-10-09 23:30:13,338 INFO [train.py:1031] (1/4) Epoch 14, batch 37900, loss[loss=0.2467, simple_loss=0.3034, pruned_loss=0.07177, ctc_loss=0.116, over 16656.00 frames. ], tot_loss[loss=0.2257, simple_loss=0.2896, pruned_loss=0.05974, ctc_loss=0.106, over 3300528.76 frames. ], batch size: 111, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:30:17,885 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2905616.0, ans=0.0 2023-10-09 23:30:21,210 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2905616.0, ans=0.0 2023-10-09 23:30:26,742 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2905662.6666666665, ans=0.125 2023-10-09 23:30:28,348 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.74 vs. limit=15.0 2023-10-09 23:30:47,730 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2023-10-09 23:30:58,700 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2905756.0, ans=0.0 2023-10-09 23:31:13,597 INFO [train.py:1031] (1/4) Epoch 14, batch 37950, loss[loss=0.2151, simple_loss=0.275, pruned_loss=0.05854, ctc_loss=0.09522, over 16914.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2906, pruned_loss=0.0629, ctc_loss=0.1109, over 3303786.24 frames. 
], batch size: 78, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:31:35,241 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.56 vs. limit=22.5 2023-10-09 23:31:42,695 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2905942.6666666665, ans=0.2 2023-10-09 23:31:43,692 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2905942.6666666665, ans=0.125 2023-10-09 23:32:05,288 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.613e+02 3.279e+02 3.863e+02 4.623e+02 8.979e+02, threshold=7.726e+02, percent-clipped=3.0 2023-10-09 23:32:15,504 INFO [train.py:1031] (1/4) Epoch 14, batch 38000, loss[loss=0.2115, simple_loss=0.2529, pruned_loss=0.06343, ctc_loss=0.1081, over 16754.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.284, pruned_loss=0.06301, ctc_loss=0.1108, over 3304305.98 frames. ], batch size: 188, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:32:16,240 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.95 vs. limit=15.0 2023-10-09 23:32:22,839 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2906082.6666666665, ans=0.2 2023-10-09 23:32:46,144 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=12.0 2023-10-09 23:32:48,875 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2906176.0, ans=0.125 2023-10-09 23:33:03,093 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=22.5 2023-10-09 23:33:16,626 INFO [train.py:1031] (1/4) Epoch 14, batch 38050, loss[loss=0.2266, simple_loss=0.2948, pruned_loss=0.05722, ctc_loss=0.1102, over 16870.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2826, pruned_loss=0.06444, ctc_loss=0.1133, over 3298727.69 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:33:54,733 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-10-09 23:33:59,686 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:34:07,703 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2906502.6666666665, ans=0.0 2023-10-09 23:34:08,099 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5 2023-10-09 23:34:10,170 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+02 3.275e+02 3.696e+02 4.420e+02 6.425e+02, threshold=7.391e+02, percent-clipped=0.0 2023-10-09 23:34:12,717 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.73 vs. limit=10.0 2023-10-09 23:34:18,440 INFO [train.py:1031] (1/4) Epoch 14, batch 38100, loss[loss=0.3141, simple_loss=0.3372, pruned_loss=0.1078, ctc_loss=0.1888, over 16648.00 frames. 
], tot_loss[loss=0.2312, simple_loss=0.2852, pruned_loss=0.06563, ctc_loss=0.115, over 3301336.23 frames. ], batch size: 384, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:34:35,168 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2906596.0, ans=0.125 2023-10-09 23:34:35,178 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2906596.0, ans=0.0 2023-10-09 23:34:36,785 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2906596.0, ans=0.5 2023-10-09 23:35:23,914 INFO [train.py:1031] (1/4) Epoch 14, batch 38150, loss[loss=0.2819, simple_loss=0.3542, pruned_loss=0.07693, ctc_loss=0.1393, over 16738.00 frames. ], tot_loss[loss=0.2439, simple_loss=0.2989, pruned_loss=0.06978, ctc_loss=0.1235, over 3297666.48 frames. ], batch size: 271, lr: 2.52e-03, grad_scale: 1.0 2023-10-09 23:35:35,435 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2906782.6666666665, ans=0.1 2023-10-09 23:35:38,805 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2906829.3333333335, ans=0.2 2023-10-09 23:35:40,537 INFO [scaling.py:979] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.22 vs. limit=5.0 2023-10-09 23:35:49,851 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2906876.0, ans=0.0 2023-10-09 23:36:06,373 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.49 vs. limit=15.0 2023-10-09 23:36:10,120 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2906922.6666666665, ans=0.125 2023-10-09 23:36:22,910 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.043e+02 3.850e+02 4.496e+02 5.551e+02 1.259e+03, threshold=8.992e+02, percent-clipped=8.0 2023-10-09 23:36:27,229 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2906969.3333333335, ans=0.125 2023-10-09 23:36:29,637 INFO [train.py:1031] (1/4) Epoch 14, batch 38200, loss[loss=0.2463, simple_loss=0.3078, pruned_loss=0.06968, ctc_loss=0.1134, over 16747.00 frames. ], tot_loss[loss=0.2489, simple_loss=0.3038, pruned_loss=0.0716, ctc_loss=0.1269, over 3296382.99 frames. ], batch size: 121, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:36:36,173 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2907016.0, ans=0.125 2023-10-09 23:36:40,794 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2907016.0, ans=0.125 2023-10-09 23:37:07,680 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2907156.0, ans=0.1 2023-10-09 23:37:21,643 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2907202.6666666665, ans=0.125 2023-10-09 23:37:33,239 INFO [train.py:1031] (1/4) Epoch 14, batch 38250, loss[loss=0.2133, simple_loss=0.2731, pruned_loss=0.05685, ctc_loss=0.09946, over 16769.00 frames. 
], tot_loss[loss=0.2457, simple_loss=0.3038, pruned_loss=0.0692, ctc_loss=0.123, over 3294948.93 frames. ], batch size: 151, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:37:58,973 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2907342.6666666665, ans=0.125 2023-10-09 23:38:29,122 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+02 3.325e+02 3.787e+02 4.498e+02 1.020e+03, threshold=7.574e+02, percent-clipped=1.0 2023-10-09 23:38:34,806 INFO [train.py:1031] (1/4) Epoch 14, batch 38300, loss[loss=0.2194, simple_loss=0.2685, pruned_loss=0.06344, ctc_loss=0.1086, over 16955.00 frames. ], tot_loss[loss=0.2421, simple_loss=0.2989, pruned_loss=0.06839, ctc_loss=0.1211, over 3304846.02 frames. ], batch size: 243, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:38:40,546 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2907482.6666666665, ans=0.1 2023-10-09 23:39:02,896 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2907576.0, ans=0.0 2023-10-09 23:39:09,986 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2907576.0, ans=0.1 2023-10-09 23:39:14,741 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2907622.6666666665, ans=0.125 2023-10-09 23:39:19,756 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2907622.6666666665, ans=0.2 2023-10-09 23:39:19,849 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2907622.6666666665, ans=0.0 2023-10-09 23:39:25,330 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2907669.3333333335, ans=0.0 2023-10-09 23:39:36,990 INFO [train.py:1031] (1/4) Epoch 14, batch 38350, loss[loss=0.2425, simple_loss=0.3072, pruned_loss=0.06518, ctc_loss=0.1185, over 16641.00 frames. ], tot_loss[loss=0.2441, simple_loss=0.3015, pruned_loss=0.06898, ctc_loss=0.1218, over 3305353.86 frames. 
], batch size: 151, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:39:41,777 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2907716.0, ans=0.125 2023-10-09 23:39:48,921 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2907762.6666666665, ans=0.1 2023-10-09 23:39:50,021 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2907762.6666666665, ans=0.125 2023-10-09 23:40:23,092 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:40:24,016 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2907856.0, ans=0.125 2023-10-09 23:40:24,035 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:40:29,975 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2907902.6666666665, ans=0.125 2023-10-09 23:40:35,688 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+02 3.571e+02 4.561e+02 5.514e+02 1.040e+03, threshold=9.121e+02, percent-clipped=3.0 2023-10-09 23:40:41,305 INFO [train.py:1031] (1/4) Epoch 14, batch 38400, loss[loss=0.2851, simple_loss=0.3109, pruned_loss=0.09504, ctc_loss=0.1728, over 15278.00 frames. ], tot_loss[loss=0.2485, simple_loss=0.3057, pruned_loss=0.07069, ctc_loss=0.1248, over 3302598.19 frames. ], batch size: 527, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:40:45,664 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2907949.3333333335, ans=0.125 2023-10-09 23:40:52,495 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2907949.3333333335, ans=0.125 2023-10-09 23:41:08,133 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2908042.6666666665, ans=0.125 2023-10-09 23:41:25,779 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2908089.3333333335, ans=0.5 2023-10-09 23:41:32,724 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0 2023-10-09 23:41:44,680 INFO [train.py:1031] (1/4) Epoch 14, batch 38450, loss[loss=0.2625, simple_loss=0.3213, pruned_loss=0.07527, ctc_loss=0.1328, over 16630.00 frames. ], tot_loss[loss=0.2489, simple_loss=0.3059, pruned_loss=0.0709, ctc_loss=0.125, over 3303335.48 frames. 
], batch size: 351, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:41:53,385 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2908182.6666666665, ans=0.5 2023-10-09 23:41:54,380 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2908182.6666666665, ans=0.125 2023-10-09 23:42:20,015 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2908276.0, ans=10.0 2023-10-09 23:42:21,906 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=12.0 2023-10-09 23:42:42,961 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.189e+02 3.714e+02 4.508e+02 1.225e+03, threshold=7.428e+02, percent-clipped=2.0 2023-10-09 23:42:47,070 INFO [train.py:1031] (1/4) Epoch 14, batch 38500, loss[loss=0.2121, simple_loss=0.2587, pruned_loss=0.06151, ctc_loss=0.1061, over 16574.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.3046, pruned_loss=0.06889, ctc_loss=0.1215, over 3298535.93 frames. ], batch size: 70, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:42:47,343 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2908416.0, ans=0.0 2023-10-09 23:42:52,708 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2908416.0, ans=0.125 2023-10-09 23:43:04,143 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2908462.6666666665, ans=0.125 2023-10-09 23:43:16,350 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=22.5 2023-10-09 23:43:34,484 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2908556.0, ans=0.125 2023-10-09 23:43:35,584 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=2908602.6666666665, ans=0.2 2023-10-09 23:43:42,274 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2908602.6666666665, ans=0.125 2023-10-09 23:43:47,375 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2908602.6666666665, ans=0.0 2023-10-09 23:43:48,621 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.75 vs. limit=10.0 2023-10-09 23:43:49,234 INFO [train.py:1031] (1/4) Epoch 14, batch 38550, loss[loss=0.2257, simple_loss=0.2822, pruned_loss=0.06395, ctc_loss=0.1032, over 16939.00 frames. ], tot_loss[loss=0.2456, simple_loss=0.3028, pruned_loss=0.06967, ctc_loss=0.1224, over 3297596.37 frames. 
], batch size: 82, lr: 2.52e-03, grad_scale: 1.0 2023-10-09 23:43:57,451 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2908649.3333333335, ans=0.125 2023-10-09 23:44:04,891 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2908696.0, ans=0.125 2023-10-09 23:44:09,841 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2908696.0, ans=0.1 2023-10-09 23:44:10,764 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2908696.0, ans=0.2 2023-10-09 23:44:19,571 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2908742.6666666665, ans=0.05 2023-10-09 23:44:48,216 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.444e+02 3.231e+02 3.757e+02 4.445e+02 8.253e+02, threshold=7.513e+02, percent-clipped=2.0 2023-10-09 23:44:49,816 INFO [train.py:1031] (1/4) Epoch 14, batch 38600, loss[loss=0.2839, simple_loss=0.3067, pruned_loss=0.09655, ctc_loss=0.1703, over 16880.00 frames. ], tot_loss[loss=0.2427, simple_loss=0.2975, pruned_loss=0.0696, ctc_loss=0.1218, over 3309755.03 frames. ], batch size: 384, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:44:50,463 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.35 vs. limit=10.0 2023-10-09 23:45:24,648 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2908976.0, ans=0.1 2023-10-09 23:45:32,476 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2023-10-09 23:45:43,884 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2909069.3333333335, ans=0.125 2023-10-09 23:45:49,162 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2909069.3333333335, ans=0.125 2023-10-09 23:45:51,560 INFO [train.py:1031] (1/4) Epoch 14, batch 38650, loss[loss=0.2354, simple_loss=0.2831, pruned_loss=0.06885, ctc_loss=0.125, over 16900.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2922, pruned_loss=0.06859, ctc_loss=0.1198, over 3310890.13 frames. ], batch size: 292, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:46:12,725 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2909162.6666666665, ans=0.125 2023-10-09 23:46:16,973 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2909209.3333333335, ans=0.1 2023-10-09 23:46:34,132 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-10-09 23:46:35,258 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.87 vs. 
limit=6.0 2023-10-09 23:46:35,933 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2909256.0, ans=0.0 2023-10-09 23:46:53,352 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=15.0 2023-10-09 23:46:54,836 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+02 3.287e+02 3.697e+02 4.587e+02 9.347e+02, threshold=7.394e+02, percent-clipped=1.0 2023-10-09 23:46:54,864 INFO [train.py:1031] (1/4) Epoch 14, batch 38700, loss[loss=0.2268, simple_loss=0.2816, pruned_loss=0.06452, ctc_loss=0.1074, over 16813.00 frames. ], tot_loss[loss=0.2393, simple_loss=0.2908, pruned_loss=0.06956, ctc_loss=0.1219, over 3314308.52 frames. ], batch size: 141, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:47:07,632 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2909396.0, ans=0.2 2023-10-09 23:47:09,191 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2909396.0, ans=0.125 2023-10-09 23:47:20,922 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2909442.6666666665, ans=0.05 2023-10-09 23:47:34,124 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2909489.3333333335, ans=0.025 2023-10-09 23:47:38,286 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2909489.3333333335, ans=0.0 2023-10-09 23:47:39,472 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2909489.3333333335, ans=0.125 2023-10-09 23:47:41,785 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2909489.3333333335, ans=0.0 2023-10-09 23:47:48,860 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2909536.0, ans=0.125 2023-10-09 23:47:58,600 INFO [train.py:1031] (1/4) Epoch 14, batch 38750, loss[loss=0.2081, simple_loss=0.2799, pruned_loss=0.04953, ctc_loss=0.09307, over 16855.00 frames. ], tot_loss[loss=0.2403, simple_loss=0.2926, pruned_loss=0.06953, ctc_loss=0.1222, over 3317071.62 frames. ], batch size: 176, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:48:17,298 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.12 vs. 
limit=15.0 2023-10-09 23:48:18,586 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2909629.3333333335, ans=0.0 2023-10-09 23:48:19,829 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2909629.3333333335, ans=0.125 2023-10-09 23:48:25,636 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2909676.0, ans=0.0 2023-10-09 23:48:45,497 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2909722.6666666665, ans=0.125 2023-10-09 23:49:02,386 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+02 3.418e+02 4.112e+02 5.352e+02 1.046e+03, threshold=8.225e+02, percent-clipped=4.0 2023-10-09 23:49:02,414 INFO [train.py:1031] (1/4) Epoch 14, batch 38800, loss[loss=0.2723, simple_loss=0.3605, pruned_loss=0.06737, ctc_loss=0.1231, over 16478.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.2951, pruned_loss=0.06625, ctc_loss=0.1169, over 3311982.89 frames. ], batch size: 416, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:49:03,741 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2909816.0, ans=0.125 2023-10-09 23:49:06,619 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2909816.0, ans=0.0 2023-10-09 23:49:22,678 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2909862.6666666665, ans=0.2 2023-10-09 23:49:34,872 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:49:38,878 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2909956.0, ans=0.125 2023-10-09 23:49:54,057 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2910002.6666666665, ans=0.125 2023-10-09 23:50:04,803 INFO [train.py:1031] (1/4) Epoch 14, batch 38850, loss[loss=0.2299, simple_loss=0.2897, pruned_loss=0.06161, ctc_loss=0.1173, over 16899.00 frames. ], tot_loss[loss=0.2369, simple_loss=0.2974, pruned_loss=0.06508, ctc_loss=0.1157, over 3307649.56 frames. ], batch size: 215, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:50:27,196 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2910096.0, ans=0.1 2023-10-09 23:50:28,228 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2910096.0, ans=0.07 2023-10-09 23:50:29,188 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2910142.6666666665, ans=0.1 2023-10-09 23:50:35,045 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2910142.6666666665, ans=0.125 2023-10-09 23:50:48,046 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. 
limit=15.0 2023-10-09 23:50:58,699 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2910236.0, ans=0.07 2023-10-09 23:51:06,339 INFO [train.py:1031] (1/4) Epoch 14, batch 38900, loss[loss=0.282, simple_loss=0.3169, pruned_loss=0.09095, ctc_loss=0.1628, over 16849.00 frames. ], tot_loss[loss=0.237, simple_loss=0.2954, pruned_loss=0.06588, ctc_loss=0.117, over 3315808.12 frames. ], batch size: 328, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:51:06,716 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2910282.6666666665, ans=0.0 2023-10-09 23:51:07,979 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.806e+02 3.456e+02 4.311e+02 5.586e+02 1.002e+03, threshold=8.621e+02, percent-clipped=2.0 2023-10-09 23:51:12,592 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2910282.6666666665, ans=0.1 2023-10-09 23:51:34,536 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2910376.0, ans=0.2 2023-10-09 23:51:56,668 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2023-10-09 23:52:06,934 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2910469.3333333335, ans=0.2 2023-10-09 23:52:09,370 INFO [train.py:1031] (1/4) Epoch 14, batch 38950, loss[loss=0.2156, simple_loss=0.267, pruned_loss=0.06001, ctc_loss=0.1105, over 16451.00 frames. ], tot_loss[loss=0.2366, simple_loss=0.2918, pruned_loss=0.06698, ctc_loss=0.1186, over 3308306.79 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:52:20,594 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2910516.0, ans=0.125 2023-10-09 23:52:27,266 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=12.0 2023-10-09 23:52:27,997 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2910562.6666666665, ans=0.125 2023-10-09 23:52:31,226 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=15.0 2023-10-09 23:52:33,767 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2023-10-09 23:52:50,140 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0 2023-10-09 23:52:58,713 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2910656.0, ans=0.125 2023-10-09 23:53:14,606 INFO [train.py:1031] (1/4) Epoch 14, batch 39000, loss[loss=0.2422, simple_loss=0.2949, pruned_loss=0.07028, ctc_loss=0.1221, over 16937.00 frames. ], tot_loss[loss=0.2371, simple_loss=0.2913, pruned_loss=0.06754, ctc_loss=0.1193, over 3301062.25 frames. 
], batch size: 258, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:53:14,606 INFO [train.py:1054] (1/4) Computing validation loss 2023-10-09 23:53:33,417 INFO [train.py:1063] (1/4) Epoch 14, validation: loss=0.2363, simple_loss=0.3035, pruned_loss=0.06558, ctc_loss=0.09478, over 1796401.00 frames. 2023-10-09 23:53:33,417 INFO [train.py:1064] (1/4) Maximum memory allocated so far is 14570MB 2023-10-09 23:53:34,086 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.06 vs. limit=12.0 2023-10-09 23:53:35,527 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.616e+02 3.267e+02 3.662e+02 4.475e+02 7.642e+02, threshold=7.323e+02, percent-clipped=0.0 2023-10-09 23:53:46,792 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.43 vs. limit=15.0 2023-10-09 23:54:05,707 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2910842.6666666665, ans=0.1 2023-10-09 23:54:09,712 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2910889.3333333335, ans=0.125 2023-10-09 23:54:17,216 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2910889.3333333335, ans=0.0 2023-10-09 23:54:27,073 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2910936.0, ans=0.04949747468305833 2023-10-09 23:54:27,100 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2910936.0, ans=0.1 2023-10-09 23:54:34,748 INFO [train.py:1031] (1/4) Epoch 14, batch 39050, loss[loss=0.2046, simple_loss=0.2409, pruned_loss=0.06274, ctc_loss=0.1071, over 16726.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2908, pruned_loss=0.06891, ctc_loss=0.1212, over 3301145.22 frames. ], batch size: 201, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:54:38,578 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2910982.6666666665, ans=15.0 2023-10-09 23:54:49,000 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2911029.3333333335, ans=0.1 2023-10-09 23:55:13,686 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2911122.6666666665, ans=0.0 2023-10-09 23:55:34,217 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:55:35,633 INFO [train.py:1031] (1/4) Epoch 14, batch 39100, loss[loss=0.2701, simple_loss=0.2714, pruned_loss=0.09912, ctc_loss=0.1764, over 16563.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2837, pruned_loss=0.06789, ctc_loss=0.1189, over 3301118.67 frames. 
], batch size: 384, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:55:39,883 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+02 3.243e+02 3.619e+02 4.225e+02 8.592e+02, threshold=7.239e+02, percent-clipped=2.0 2023-10-09 23:55:41,918 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2911216.0, ans=0.125 2023-10-09 23:55:50,830 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2911262.6666666665, ans=0.125 2023-10-09 23:55:53,572 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2911262.6666666665, ans=0.125 2023-10-09 23:55:57,567 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2911262.6666666665, ans=0.125 2023-10-09 23:56:37,077 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:56:39,877 INFO [train.py:1031] (1/4) Epoch 14, batch 39150, loss[loss=0.2728, simple_loss=0.3603, pruned_loss=0.06694, ctc_loss=0.1284, over 16796.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2896, pruned_loss=0.0676, ctc_loss=0.1185, over 3298353.01 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:56:45,828 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2911449.3333333335, ans=0.125 2023-10-09 23:56:52,466 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2911496.0, ans=0.04949747468305833 2023-10-09 23:57:14,281 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2911542.6666666665, ans=0.125 2023-10-09 23:57:35,750 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2023-10-09 23:57:44,713 INFO [train.py:1031] (1/4) Epoch 14, batch 39200, loss[loss=0.1606, simple_loss=0.1981, pruned_loss=0.0474, ctc_loss=0.07093, over 16600.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2915, pruned_loss=0.06595, ctc_loss=0.1157, over 3297787.20 frames. ], batch size: 70, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:57:49,007 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.698e+02 3.956e+02 5.075e+02 6.707e+02 1.311e+03, threshold=1.015e+03, percent-clipped=19.0 2023-10-09 23:57:51,951 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2911682.6666666665, ans=0.125 2023-10-09 23:58:38,741 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2911869.3333333335, ans=0.2 2023-10-09 23:58:47,249 INFO [train.py:1031] (1/4) Epoch 14, batch 39250, loss[loss=0.1624, simple_loss=0.2077, pruned_loss=0.04445, ctc_loss=0.0707, over 16589.00 frames. ], tot_loss[loss=0.2338, simple_loss=0.2918, pruned_loss=0.06523, ctc_loss=0.1132, over 3292347.82 frames. 
], batch size: 110, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:58:49,249 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2911916.0, ans=0.1 2023-10-09 23:59:08,342 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2911962.6666666665, ans=0.0 2023-10-09 23:59:09,709 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=22.5 2023-10-09 23:59:13,266 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2911962.6666666665, ans=0.125 2023-10-09 23:59:44,346 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2912102.6666666665, ans=0.125 2023-10-09 23:59:53,290 INFO [train.py:1031] (1/4) Epoch 14, batch 39300, loss[loss=0.2047, simple_loss=0.2179, pruned_loss=0.07318, ctc_loss=0.1132, over 11061.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2904, pruned_loss=0.06323, ctc_loss=0.1095, over 3281373.29 frames. ], batch size: 39, lr: 2.52e-03, grad_scale: 4.0 2023-10-10 00:00:00,604 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.735e+02 3.228e+02 3.774e+02 4.947e+02 8.395e+02, threshold=7.547e+02, percent-clipped=0.0 2023-10-10 00:00:08,905 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2912196.0, ans=0.0 2023-10-10 00:00:10,718 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2912196.0, ans=0.125 2023-10-10 00:00:13,577 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2912196.0, ans=0.125 2023-10-10 00:00:26,267 INFO [scaling.py:1069] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-10 00:00:45,472 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=12.0 2023-10-10 00:00:46,120 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2912336.0, ans=0.0 2023-10-10 00:00:57,813 INFO [train.py:1031] (1/4) Epoch 14, batch 39350, loss[loss=0.2094, simple_loss=0.2904, pruned_loss=0.04718, ctc_loss=0.08527, over 16760.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2885, pruned_loss=0.06049, ctc_loss=0.1054, over 3287900.86 frames. ], batch size: 188, lr: 2.52e-03, grad_scale: 2.0 2023-10-10 00:01:01,146 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.39 vs. 
limit=15.0 2023-10-10 00:01:01,797 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2912382.6666666665, ans=0.035 2023-10-10 00:01:17,869 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2912429.3333333335, ans=0.2 2023-10-10 00:01:21,864 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2912476.0, ans=0.125 2023-10-10 00:01:30,760 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2912476.0, ans=0.04949747468305833 2023-10-10 00:01:32,875 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2912476.0, ans=0.2 2023-10-10 00:01:33,984 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2912522.6666666665, ans=0.125 2023-10-10 00:01:48,617 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2912569.3333333335, ans=0.0 2023-10-10 00:01:51,698 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2912569.3333333335, ans=0.0 2023-10-10 00:01:59,508 INFO [train.py:1031] (1/4) Epoch 14, batch 39400, loss[loss=0.1928, simple_loss=0.253, pruned_loss=0.04919, ctc_loss=0.08533, over 16805.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2871, pruned_loss=0.05963, ctc_loss=0.1045, over 3286905.66 frames. ], batch size: 164, lr: 2.52e-03, grad_scale: 2.0 2023-10-10 00:02:06,854 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.092e+02 4.162e+02 5.159e+02 1.181e+03, threshold=8.323e+02, percent-clipped=5.0 2023-10-10 00:02:09,039 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2912616.0, ans=0.1 2023-10-10 00:02:17,168 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2023-10-10 00:02:44,002 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2912756.0, ans=0.125 2023-10-10 00:02:51,354 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2912802.6666666665, ans=0.125 2023-10-10 00:02:51,442 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2912802.6666666665, ans=0.125 2023-10-10 00:02:59,395 INFO [train.py:1031] (1/4) Epoch 14, batch 39450, loss[loss=0.2081, simple_loss=0.2583, pruned_loss=0.0594, ctc_loss=0.09794, over 16937.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2804, pruned_loss=0.05908, ctc_loss=0.1033, over 3280394.78 frames. 
], batch size: 82, lr: 2.51e-03, grad_scale: 1.0 2023-10-10 00:03:03,480 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2912849.3333333335, ans=0.0 2023-10-10 00:03:16,171 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2912896.0, ans=0.2 2023-10-10 00:03:29,782 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2912942.6666666665, ans=0.1 2023-10-10 00:03:39,828 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2912989.3333333335, ans=0.125 2023-10-10 00:03:42,576 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2912989.3333333335, ans=0.5 2023-10-10 00:03:43,658 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2912989.3333333335, ans=0.0 2023-10-10 00:03:54,889 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2913036.0, ans=0.07 2023-10-10 00:03:54,929 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2913036.0, ans=0.09899494936611666 2023-10-10 00:04:00,452 INFO [train.py:1031] (1/4) Epoch 14, batch 39500, loss[loss=0.1696, simple_loss=0.2339, pruned_loss=0.03819, ctc_loss=0.07225, over 16852.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.271, pruned_loss=0.05456, ctc_loss=0.0955, over 3288537.15 frames. ], batch size: 202, lr: 2.51e-03, grad_scale: 2.0 2023-10-10 00:04:10,626 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.656e+02 3.168e+02 3.988e+02 1.383e+03, threshold=6.335e+02, percent-clipped=1.0 2023-10-10 00:04:11,093 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2913082.6666666665, ans=0.5 2023-10-10 00:04:21,200 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0 2023-10-10 00:04:31,831 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2913176.0, ans=0.125 2023-10-10 00:04:51,793 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2913269.3333333335, ans=0.1 2023-10-10 00:04:54,644 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2913269.3333333335, ans=0.2 2023-10-10 00:04:54,938 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.90 vs. limit=6.0 2023-10-10 00:04:57,766 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2913269.3333333335, ans=0.1 2023-10-10 00:05:01,690 INFO [train.py:1031] (1/4) Epoch 14, batch 39550, loss[loss=0.2225, simple_loss=0.2855, pruned_loss=0.05803, ctc_loss=0.1084, over 16952.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2713, pruned_loss=0.05631, ctc_loss=0.09855, over 3298821.50 frames. 
], batch size: 309, lr: 2.51e-03, grad_scale: 2.0 2023-10-10 00:05:15,687 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2913362.6666666665, ans=0.0 2023-10-10 00:05:16,073 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=22.5 2023-10-10 00:05:18,436 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2913362.6666666665, ans=0.125 2023-10-10 00:05:21,116 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2913362.6666666665, ans=0.125 2023-10-10 00:05:24,812 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2913362.6666666665, ans=0.0 2023-10-10 00:05:42,834 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2913456.0, ans=0.0 2023-10-10 00:05:48,436 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2913456.0, ans=0.2 2023-10-10 00:05:55,653 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.55 vs. limit=15.0 2023-10-10 00:06:03,880 INFO [train.py:1031] (1/4) Epoch 14, batch 39600, loss[loss=0.1879, simple_loss=0.252, pruned_loss=0.04697, ctc_loss=0.07473, over 16831.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.2717, pruned_loss=0.05478, ctc_loss=0.09592, over 3305437.63 frames. ], batch size: 176, lr: 2.51e-03, grad_scale: 4.0 2023-10-10 00:06:08,341 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.50 vs. limit=15.0 2023-10-10 00:06:11,403 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2913549.3333333335, ans=0.125 2023-10-10 00:06:13,820 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+02 3.076e+02 3.392e+02 3.895e+02 1.156e+03, threshold=6.785e+02, percent-clipped=2.0 2023-10-10 00:06:35,051 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2913642.6666666665, ans=0.125 2023-10-10 00:06:44,938 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.57 vs. limit=15.0 2023-10-10 00:06:56,195 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2913736.0, ans=0.0 2023-10-10 00:07:03,058 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2023-10-10 00:07:06,391 INFO [train.py:1031] (1/4) Epoch 14, batch 39650, loss[loss=0.2703, simple_loss=0.3357, pruned_loss=0.07692, ctc_loss=0.1276, over 16784.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.2809, pruned_loss=0.05984, ctc_loss=0.1047, over 3310686.76 frames. 
2023-10-10 00:07:21,944 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2913829.3333333335, ans=0.125
2023-10-10 00:07:38,795 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2913876.0, ans=0.07
2023-10-10 00:07:48,094 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2913922.6666666665, ans=0.07
2023-10-10 00:08:06,320 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2913969.3333333335, ans=0.2
2023-10-10 00:08:09,799 INFO [train.py:1031] (1/4) Epoch 14, batch 39700, loss[loss=0.2639, simple_loss=0.3178, pruned_loss=0.07622, ctc_loss=0.1438, over 16834.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.287, pruned_loss=0.06435, ctc_loss=0.1128, over 3302070.13 frames. ], batch size: 309, lr: 2.51e-03, grad_scale: 4.0
2023-10-10 00:08:21,352 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.803e+02 3.745e+02 4.250e+02 5.439e+02 1.201e+03, threshold=8.500e+02, percent-clipped=8.0
2023-10-10 00:08:28,295 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2914062.6666666665, ans=0.125
2023-10-10 00:08:28,634 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0
2023-10-10 00:08:35,922 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2914109.3333333335, ans=0.1
2023-10-10 00:08:37,020 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2914109.3333333335, ans=0.125
2023-10-10 00:08:38,068 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2914109.3333333335, ans=0.125
2023-10-10 00:08:46,117 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=22.5
2023-10-10 00:08:48,081 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2914156.0, ans=0.04949747468305833
2023-10-10 00:08:53,937 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2914156.0, ans=0.2
2023-10-10 00:09:09,831 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0
2023-10-10 00:09:11,180 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2914202.6666666665, ans=0.1
2023-10-10 00:09:13,569 INFO [train.py:1031] (1/4) Epoch 14, batch 39750, loss[loss=0.2048, simple_loss=0.2583, pruned_loss=0.05619, ctc_loss=0.09705, over 16857.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2869, pruned_loss=0.06593, ctc_loss=0.1154, over 3310304.24 frames. ], batch size: 258, lr: 2.51e-03, grad_scale: 2.0
2023-10-10 00:09:16,107 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.20 vs. limit=10.0
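The scaling.py "ScheduledFloat" records log per-module hyperparameters (skip rates, dropout probabilities, balancer bounds) whose current value "ans" is a function of the global batch_count. One plausible reading is a piecewise-linear schedule over batch count; the sketch below illustrates that idea (the class, names, and breakpoints here are invented for illustration, not taken from scaling.py):

```python
# Hypothetical piecewise-linear schedule keyed on the global batch count.
import bisect

class PiecewiseLinearFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# At batch_count ~2.9e6, far past any early breakpoints, such a schedule has
# settled at its final value, consistent with the steady "ans" values above.
dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(2914109.3333333335))  # -> 0.1
```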
2023-10-10 00:09:20,228 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2914249.3333333335, ans=0.125
2023-10-10 00:09:20,275 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2914249.3333333335, ans=0.125
2023-10-10 00:09:22,324 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2914249.3333333335, ans=0.05
2023-10-10 00:09:31,302 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2914296.0, ans=0.125
2023-10-10 00:09:43,504 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2914342.6666666665, ans=0.125
2023-10-10 00:09:53,566 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0
2023-10-10 00:10:13,894 INFO [train.py:1031] (1/4) Epoch 14, batch 39800, loss[loss=0.2161, simple_loss=0.2591, pruned_loss=0.0647, ctc_loss=0.109, over 16596.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2789, pruned_loss=0.06441, ctc_loss=0.1123, over 3307176.97 frames. ], batch size: 110, lr: 2.51e-03, grad_scale: 4.0
2023-10-10 00:10:26,701 INFO [optim.py:471] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.361e+02 3.167e+02 3.567e+02 4.087e+02 1.118e+03, threshold=7.135e+02, percent-clipped=1.0
2023-10-10 00:10:36,954 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0
2023-10-10 00:10:43,071 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0
2023-10-10 00:10:44,029 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2914576.0, ans=6.0
2023-10-10 00:10:45,810 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2914576.0, ans=0.0
2023-10-10 00:10:51,495 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2914622.6666666665, ans=0.125
2023-10-10 00:10:59,127 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2914622.6666666665, ans=0.0
2023-10-10 00:11:02,799 INFO [scaling.py:979] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.21 vs. limit=12.0
2023-10-10 00:11:07,275 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2914669.3333333335, ans=0.0
2023-10-10 00:11:13,385 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2914669.3333333335, ans=0.1
2023-10-10 00:11:15,153 INFO [train.py:1031] (1/4) Epoch 14, batch 39850, loss[loss=0.2099, simple_loss=0.249, pruned_loss=0.06279, ctc_loss=0.1133, over 16832.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2728, pruned_loss=0.06359, ctc_loss=0.1109, over 3300937.52 frames. ], batch size: 330, lr: 2.51e-03, grad_scale: 4.0
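The "Whitening" records compare a per-module whitening metric against a scheduled limit (e.g. metric=12.01 vs. limit=15.0 above; note that a whitening_limit itself appears as a ScheduledFloat in this stretch). A hedged sketch of one way such a metric can be defined, as the eigenvalue spread of the per-group feature covariance, which is 1.0 for perfectly white (isotropic) features and grows with anisotropy; this is an illustration of the concept, not scaling.py's code:

```python
# Hypothetical whitening metric: mean squared eigenvalue over squared mean
# eigenvalue of the per-group covariance (>= 1.0, with 1.0 meaning "white").
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels), num_channels divisible by num_groups.
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups  # channels per group
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)        # center each group
    cov = x.transpose(1, 2) @ x / num_frames   # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)          # (num_groups, cpg), all >= 0
    return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))

x = torch.randn(1000, 256)  # near-white input
# Modestly above 1.0 (sampling noise); strongly correlated features score
# much higher, which is what the scheduled limit would guard against.
print(whitening_metric(x, num_groups=1))
```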
2023-10-10 00:11:20,789 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2914716.0, ans=0.0
2023-10-10 00:11:46,269 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2914809.3333333335, ans=0.125
2023-10-10 00:11:54,781 INFO [scaling.py:199] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2914856.0, ans=0.125
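Every tot_loss record in this stretch reports lr: 2.51e-03. Assuming the Eden schedule from icefall's optim.py (reconstructed below from its published formula; a sketch, not a verbatim copy), the run's startup settings base_lr=0.045, lr_batches=7500, and lr_epochs=3.5 reproduce that value at roughly global step 5.85e5 in epoch 14:

```python
# Eden learning-rate schedule, reconstructed for illustration.
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# The global step here is an estimate inferred from the log, not a logged value.
print(f"{eden_lr(0.045, batch=585_000, epoch=14):.2e}")  # ~2.51e-03
```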