2023-10-09 10:20:55,467 INFO [train.py:1099] (0/4) Training started
2023-10-09 10:20:55,486 INFO [train.py:1109] (0/4) Device: cuda:0
2023-10-09 10:20:55,489 INFO [train.py:1121] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '821ebc378e7fb99b8adc81950227963332821e01', 'k2-git-date': 'Wed Jul 19 15:38:25 2023', 'lhotse-version': '1.16.0.dev+git.1db4d97a.clean', 'torch-version': '1.11.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'dev_multi_zh-hans', 'icefall-git-sha1': '919793d-dirty', 'icefall-git-date': 'Thu Sep 7 21:06:37 2023', 'icefall-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/icefall-1.0-py3.9.egg', 'k2-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/k2-1.24.3.dev20230721+cuda10.2.torch1.11.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/lhotse-1.16.0.dev0+git.1db4d97a.clean-py3.9.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-1-1220091118-57c4d55446-mvd6x', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 30, 'start_epoch': 14, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp-w-ctc'), 'bpe_model': 'data/lang_bpe_2000/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 700, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'vocab_size': 2000}
2023-10-09 10:20:55,489 INFO [train.py:1123] (0/4) About to create model
2023-10-09 10:20:56,085 INFO [train.py:1127] (0/4) Number of model parameters: 69651511
2023-10-09 10:20:56,684 INFO [checkpoint.py:112] (0/4) Loading checkpoint from zipformer/exp-w-ctc/epoch-13.pt
2023-10-09 10:20:58,247 INFO [checkpoint.py:131] (0/4) Loading averaged model
2023-10-09 10:21:02,632 INFO [train.py:1142] (0/4) Using DDP
2023-10-09 10:21:05,303 INFO [train.py:1154] (0/4) Loading optimizer state dict
2023-10-09 10:21:05,863 INFO [train.py:1162] (0/4) Loading scheduler state dict
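Note: the lr values printed with every training batch below (e.g. lr: 2.60e-03) come from the scheduler whose state is restored above. Assuming this recipe uses icefall's Eden scheduler, the rate is derived from base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 in the config dump. A minimal sketch; the adjusted batch count of ~585k used in the example is an assumption, not taken from the log:

```python
# Hedged sketch of an Eden-style learning-rate schedule, using base_lr=0.045,
# lr_batches=7500, lr_epochs=3.5 from the config dump above. The exact formula
# in this icefall checkout may differ slightly; treat this as illustrative.

def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Learning rate that decays smoothly in both batch index and epoch."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Around epoch 14 with an (assumed) adjusted batch count of ~585k this gives
# ~2.5e-03, the same order as the logged lr: 2.60e-03.
print(f"{eden_lr(0.045, batch=585_000, epoch=14):.2e}")
```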
2023-10-09 10:21:05,864 INFO [multi_dataset.py:52] (0/4) About to get multidataset train cuts
2023-10-09 10:21:05,864 INFO [multi_dataset.py:55] (0/4) Loading THCHS-30 in lazy mode
2023-10-09 10:21:05,921 INFO [multi_dataset.py:61] (0/4) Loading Aishell-1 in lazy mode
2023-10-09 10:21:05,924 INFO [multi_dataset.py:67] (0/4) Loading Aishell-2 in lazy mode
2023-10-09 10:21:05,926 INFO [multi_dataset.py:73] (0/4) Loading Aishell-4 in lazy mode
2023-10-09 10:21:05,931 INFO [multi_dataset.py:85] (0/4) Loading ST-CMDS in lazy mode
2023-10-09 10:21:05,932 INFO [multi_dataset.py:89] (0/4) Loading Primewords in lazy mode
2023-10-09 10:21:05,934 INFO [multi_dataset.py:95] (0/4) Loading MagicData in lazy mode
2023-10-09 10:21:05,935 INFO [multi_dataset.py:101] (0/4) Loading Aidatatang_200zh in lazy mode
2023-10-09 10:21:05,936 INFO [multi_dataset.py:107] (0/4) Loading Ali-Meeting in lazy mode
2023-10-09 10:21:05,938 INFO [multi_dataset.py:113] (0/4) Loading WeNetSpeech in lazy mode
2023-10-09 10:21:05,943 INFO [multi_dataset.py:119] (0/4) Loading KeSpeech in lazy mode
2023-10-09 10:22:53,903 INFO [asr_datamodule.py:218] (0/4) Enable MUSAN
2023-10-09 10:22:53,903 INFO [asr_datamodule.py:219] (0/4) About to get Musan cuts
2023-10-09 10:22:56,084 INFO [asr_datamodule.py:243] (0/4) Enable SpecAugment
2023-10-09 10:22:56,084 INFO [asr_datamodule.py:244] (0/4) Time warp factor: 80
2023-10-09 10:22:56,084 INFO [asr_datamodule.py:254] (0/4) Num frame mask: 10
2023-10-09 10:22:56,084 INFO [asr_datamodule.py:267] (0/4) About to create train dataset
2023-10-09 10:22:56,085 INFO [asr_datamodule.py:294] (0/4) Using DynamicBucketingSampler.
2023-10-09 10:22:59,940 INFO [asr_datamodule.py:309] (0/4) About to create train dataloader
2023-10-09 10:22:59,941 INFO [multi_dataset.py:161] (0/4) About to get multidataset dev cuts
2023-10-09 10:22:59,941 INFO [multi_dataset.py:164] (0/4) Loading Aidatatang_200zh DEV set in lazy mode
2023-10-09 10:22:59,943 INFO [multi_dataset.py:170] (0/4) Loading Aishell DEV set in lazy mode
2023-10-09 10:22:59,944 INFO [multi_dataset.py:176] (0/4) Loading Aishell-2 DEV set in lazy mode
2023-10-09 10:22:59,945 INFO [multi_dataset.py:182] (0/4) Loading Ali-Meeting DEV set in lazy mode
2023-10-09 10:22:59,946 INFO [multi_dataset.py:188] (0/4) Loading MagicData DEV set in lazy mode
2023-10-09 10:22:59,948 INFO [multi_dataset.py:194] (0/4) Loading KeSpeech DEV set in lazy mode
2023-10-09 10:22:59,950 INFO [multi_dataset.py:203] (0/4) Loading WeNetSpeech DEV set in lazy mode
2023-10-09 10:22:59,951 INFO [asr_datamodule.py:340] (0/4) About to create dev dataset
2023-10-09 10:23:00,444 INFO [asr_datamodule.py:357] (0/4) About to create dev dataloader
2023-10-09 10:23:00,444 INFO [train.py:1243] (0/4) Loading grad scaler state dict
2023-10-09 10:23:19,455 INFO [train.py:1031] (0/4) Epoch 14, batch 0, loss[loss=0.2173, simple_loss=0.2703, pruned_loss=0.06081, ctc_loss=0.1066, over 16740.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2703, pruned_loss=0.06081, ctc_loss=0.1066, over 16740.00 frames. ], batch size: 272, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:23:19,456 INFO [train.py:1054] (0/4) Computing validation loss
2023-10-09 10:23:33,189 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2325, simple_loss=0.3081, pruned_loss=0.06029, ctc_loss=0.09091, over 1796401.00 frames.
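Note: each loss[...] / tot_loss[...] record reports the combined objective and its parts (simple_loss, pruned_loss, ctc_loss). A hedged sketch of how these are typically combined in this recipe family, using warm_step=2000, simple_loss_scale=0.5 and ctc_loss_scale=0.2 from the config; the warm-up ramp below is an assumption modeled on similar icefall recipes, and the function name is illustrative:

```python
import torch

def combine_losses(simple_loss: torch.Tensor,
                   pruned_loss: torch.Tensor,
                   ctc_loss: torch.Tensor,
                   batch_idx_train: int,
                   warm_step: int = 2000,          # from the config dump
                   simple_loss_scale: float = 0.5,
                   ctc_loss_scale: float = 0.2) -> torch.Tensor:
    # Assumed warm-up: early on, the simple (non-pruned) transducer loss
    # dominates; after warm_step the pruned loss takes over at full weight.
    warmed = batch_idx_train >= warm_step
    s = simple_loss_scale if warmed else \
        1.0 - (batch_idx_train / warm_step) * (1.0 - simple_loss_scale)
    p = 1.0 if warmed else 0.1 + 0.9 * (batch_idx_train / warm_step)
    return s * simple_loss + p * pruned_loss + ctc_loss_scale * ctc_loss
```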
2023-10-09 10:23:33,190 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 12828MB
2023-10-09 10:23:46,071 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2728796.0, ans=0.125
2023-10-09 10:23:48,865 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.610e+02 3.348e+02 4.018e+02 4.917e+02 9.056e+02, threshold=8.035e+02, percent-clipped=7.0
2023-10-09 10:24:08,235 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0
2023-10-09 10:24:19,568 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0
2023-10-09 10:24:20,675 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=22.5
2023-10-09 10:24:31,241 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2728936.0, ans=0.125
2023-10-09 10:24:33,525 INFO [train.py:1031] (0/4) Epoch 14, batch 50, loss[loss=0.2143, simple_loss=0.2857, pruned_loss=0.05329, ctc_loss=0.09055, over 16830.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2827, pruned_loss=0.06307, ctc_loss=0.109, over 749482.30 frames. ], batch size: 164, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:24:47,299 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2729029.3333333335, ans=0.125
2023-10-09 10:24:49,374 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2729029.3333333335, ans=0.125
2023-10-09 10:25:09,099 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2729122.6666666665, ans=0.125
2023-10-09 10:25:16,549 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0
2023-10-09 10:25:18,673 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.64 vs. limit=15.0
2023-10-09 10:25:26,446 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2729169.3333333335, ans=10.0
2023-10-09 10:25:31,421 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2729169.3333333335, ans=0.0
2023-10-09 10:25:34,395 INFO [train.py:1031] (0/4) Epoch 14, batch 100, loss[loss=0.2592, simple_loss=0.3368, pruned_loss=0.06795, ctc_loss=0.1142, over 16830.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.2987, pruned_loss=0.06693, ctc_loss=0.1164, over 1310263.25 frames. ], batch size: 164, lr: 2.60e-03, grad_scale: 4.0
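Note: the [optim.py:471] lines summarize recent gradient norms: five quartiles (min, 25%, median, 75%, max), a clipping threshold, and the percentage of batches whose gradients exceeded it. In the entries above, threshold is approximately Clipping_scale times the median quartile (2.0 x 4.018e+02 is 8.036e+02, matching the logged 8.035e+02). A sketch reproducing these statistics; the function name is illustrative, not icefall's API:

```python
import torch

def grad_norm_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """grad_norms: 1-D tensor of recent per-batch gradient norms."""
    q = torch.quantile(grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    # Clip relative to the median norm: threshold = clipping_scale * median.
    threshold = clipping_scale * q[2]
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    print(f"Clipping_scale={clipping_scale}, grad-norm quartiles "
          + " ".join(f"{float(v):.3e}" for v in q)
          + f", threshold={float(threshold):.3e}"
          + f", percent-clipped={float(percent_clipped):.1f}")
```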
2023-10-09 10:25:49,380 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.490e+02 3.274e+02 3.784e+02 4.390e+02 8.009e+02, threshold=7.568e+02, percent-clipped=0.0
2023-10-09 10:25:50,833 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2729262.6666666665, ans=0.125
2023-10-09 10:25:54,203 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0
2023-10-09 10:26:15,478 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2729356.0, ans=0.2
2023-10-09 10:26:28,475 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2729402.6666666665, ans=0.0
2023-10-09 10:26:28,897 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.10 vs. limit=22.5
2023-10-09 10:26:34,425 INFO [train.py:1031] (0/4) Epoch 14, batch 150, loss[loss=0.2285, simple_loss=0.2992, pruned_loss=0.05819, ctc_loss=0.1033, over 16878.00 frames. ], tot_loss[loss=0.2439, simple_loss=0.3097, pruned_loss=0.06589, ctc_loss=0.116, over 1753395.28 frames. ], batch size: 176, lr: 2.60e-03, grad_scale: 1.0
2023-10-09 10:26:41,233 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2729449.3333333335, ans=0.1
2023-10-09 10:27:02,905 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2729542.6666666665, ans=0.125
2023-10-09 10:27:03,982 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2729542.6666666665, ans=0.1
2023-10-09 10:27:30,078 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2729636.0, ans=0.0
2023-10-09 10:27:31,200 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2729636.0, ans=0.125
2023-10-09 10:27:35,285 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2729682.6666666665, ans=0.125
2023-10-09 10:27:36,037 INFO [train.py:1031] (0/4) Epoch 14, batch 200, loss[loss=0.3191, simple_loss=0.3817, pruned_loss=0.09387, ctc_loss=0.172, over 16676.00 frames. ], tot_loss[loss=0.2489, simple_loss=0.314, pruned_loss=0.0679, ctc_loss=0.1199, over 2099613.98 frames. ], batch size: 384, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:27:54,277 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.044e+02 3.577e+02 4.251e+02 7.739e+02, threshold=7.154e+02, percent-clipped=1.0
2023-10-09 10:28:35,930 INFO [train.py:1031] (0/4) Epoch 14, batch 250, loss[loss=0.2272, simple_loss=0.3259, pruned_loss=0.0469, ctc_loss=0.08674, over 16266.00 frames. ], tot_loss[loss=0.2488, simple_loss=0.315, pruned_loss=0.06735, ctc_loss=0.1196, over 2361232.24 frames. ], batch size: 463, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:28:38,486 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0
2023-10-09 10:28:56,075 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2729962.6666666665, ans=0.0
2023-10-09 10:29:25,352 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=2730102.6666666665, ans=0.2
2023-10-09 10:29:28,438 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2730102.6666666665, ans=0.125
2023-10-09 10:29:37,214 INFO [train.py:1031] (0/4) Epoch 14, batch 300, loss[loss=0.2427, simple_loss=0.3074, pruned_loss=0.06511, ctc_loss=0.1195, over 16767.00 frames. ], tot_loss[loss=0.2422, simple_loss=0.3085, pruned_loss=0.06491, ctc_loss=0.1153, over 2570181.86 frames. ], batch size: 272, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:29:47,434 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2730149.3333333335, ans=0.125
2023-10-09 10:29:47,720 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=15.0
2023-10-09 10:29:53,967 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0
2023-10-09 10:29:54,696 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2730196.0, ans=0.0
2023-10-09 10:29:56,421 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.367e+02 3.126e+02 3.650e+02 4.282e+02 7.513e+02, threshold=7.299e+02, percent-clipped=1.0
2023-10-09 10:29:56,835 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2730196.0, ans=0.125
2023-10-09 10:30:01,492 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=22.5
2023-10-09 10:30:19,812 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2730289.3333333335, ans=0.125
2023-10-09 10:30:38,035 INFO [train.py:1031] (0/4) Epoch 14, batch 350, loss[loss=0.2969, simple_loss=0.3187, pruned_loss=0.1017, ctc_loss=0.1792, over 16761.00 frames. ], tot_loss[loss=0.2472, simple_loss=0.3089, pruned_loss=0.06854, ctc_loss=0.1211, over 2742479.91 frames. ], batch size: 384, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:30:46,353 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2730382.6666666665, ans=0.125
2023-10-09 10:31:20,456 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2730522.6666666665, ans=0.1
2023-10-09 10:31:38,087 INFO [train.py:1031] (0/4) Epoch 14, batch 400, loss[loss=0.2472, simple_loss=0.2878, pruned_loss=0.07692, ctc_loss=0.132, over 16994.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.3021, pruned_loss=0.06896, ctc_loss=0.1215, over 2875001.49 frames. ], batch size: 202, lr: 2.60e-03, grad_scale: 8.0
2023-10-09 10:31:41,469 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.53 vs. limit=15.0
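Note: the recurring [scaling.py:199] entries print the current value (ans=...) of named ScheduledFloat hyper-parameters as a function of batch_count. A minimal sketch of such a piecewise-linear schedule; the real ScheduledFloat class in scaling.py carries extra machinery, and the knot values below are made up for illustration:

```python
class ScheduledFloatSketch:
    """Piecewise-linear schedule: the value is interpolated between
    (batch_count, value) knots and held constant beyond the last knot."""

    def __init__(self, *points: "tuple[float, float]"):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# By batch_count ~2.7e6 most schedules have long since reached their final
# knot, which is why the same ans values (0.125, 0.0, 0.2, ...) repeat below.
prob = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.125))  # hypothetical knots
print(prob(2728796.0))  # -> 0.125
```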
2023-10-09 10:31:43,892 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2730616.0, ans=0.09899494936611666
2023-10-09 10:31:43,894 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:31:57,468 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.680e+02 3.285e+02 3.968e+02 4.685e+02 8.332e+02, threshold=7.936e+02, percent-clipped=1.0
2023-10-09 10:32:02,768 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2730709.3333333335, ans=0.0
2023-10-09 10:32:10,822 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2730709.3333333335, ans=0.125
2023-10-09 10:32:15,636 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2730756.0, ans=0.025
2023-10-09 10:32:25,317 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2730756.0, ans=0.0
2023-10-09 10:32:25,330 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2730756.0, ans=0.0
2023-10-09 10:32:27,031 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2730802.6666666665, ans=0.125
2023-10-09 10:32:39,447 INFO [train.py:1031] (0/4) Epoch 14, batch 450, loss[loss=0.2309, simple_loss=0.3347, pruned_loss=0.04527, ctc_loss=0.0915, over 15224.00 frames. ], tot_loss[loss=0.2427, simple_loss=0.3002, pruned_loss=0.06849, ctc_loss=0.1205, over 2963786.54 frames. ], batch size: 526, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:32:57,244 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2730896.0, ans=0.125
2023-10-09 10:33:01,792 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2730896.0, ans=0.1
2023-10-09 10:33:17,740 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2730989.3333333335, ans=0.0
2023-10-09 10:33:19,963 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=15.0
2023-10-09 10:33:31,769 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0
2023-10-09 10:33:40,962 INFO [train.py:1031] (0/4) Epoch 14, batch 500, loss[loss=0.2115, simple_loss=0.2603, pruned_loss=0.06014, ctc_loss=0.1062, over 16843.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.2941, pruned_loss=0.06611, ctc_loss=0.1164, over 3034630.55 frames. ], batch size: 229, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:33:47,454 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2731082.6666666665, ans=0.0
2023-10-09 10:33:53,917 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2731129.3333333335, ans=0.125
2023-10-09 10:34:00,456 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+02 3.135e+02 3.674e+02 4.514e+02 8.848e+02, threshold=7.348e+02, percent-clipped=4.0
2023-10-09 10:34:21,316 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.99 vs. limit=15.0
2023-10-09 10:34:38,235 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2731269.3333333335, ans=0.125
2023-10-09 10:34:40,390 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2731316.0, ans=0.1
2023-10-09 10:34:41,176 INFO [train.py:1031] (0/4) Epoch 14, batch 550, loss[loss=0.1843, simple_loss=0.2396, pruned_loss=0.04858, ctc_loss=0.07931, over 16907.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2851, pruned_loss=0.06441, ctc_loss=0.1128, over 3101542.46 frames. ], batch size: 90, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:34:55,575 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2731362.6666666665, ans=0.125
2023-10-09 10:35:14,728 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0
2023-10-09 10:35:16,730 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0
2023-10-09 10:35:17,010 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.11 vs. limit=15.0
2023-10-09 10:35:21,975 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2731456.0, ans=0.125
2023-10-09 10:35:22,029 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2731456.0, ans=0.05
2023-10-09 10:35:42,184 INFO [train.py:1031] (0/4) Epoch 14, batch 600, loss[loss=0.2111, simple_loss=0.261, pruned_loss=0.06057, ctc_loss=0.1002, over 17009.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2797, pruned_loss=0.06388, ctc_loss=0.112, over 3152571.37 frames. ], batch size: 96, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:35:50,751 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0
2023-10-09 10:35:57,233 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0
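Note: the [scaling.py:979] Whitening entries compare a whiteness metric of a module's activations against a limit. The metric is 1.0 when each channel group's covariance is a multiple of the identity ("white" features) and grows as channels become correlated or unequally scaled. A hedged sketch of one way to compute such a metric; the exact formula in scaling.py may differ:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns a value >= 1.0 that equals 1.0
    iff each group's (uncentered) covariance is a multiple of the identity."""
    n, c = x.shape
    cpg = c // num_groups                                  # channels per group
    xg = x.reshape(n, num_groups, cpg).transpose(0, 1)     # (groups, n, cpg)
    covar = torch.matmul(xg.transpose(1, 2), xg)           # (groups, cpg, cpg)
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean()      # mean eigenvalue proxy
    mean_sq = (covar ** 2).sum() / (num_groups * cpg)      # mean squared entry
    return mean_sq / (mean_diag ** 2 + 1e-20)

x = torch.randn(100_000, 64)      # near-white input
print(whitening_metric(x))        # ~1.0; correlated channels push it higher
```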
2023-10-09 10:36:02,848 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+02 3.049e+02 3.429e+02 4.091e+02 7.448e+02, threshold=6.859e+02, percent-clipped=1.0
2023-10-09 10:36:08,006 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2731642.6666666665, ans=0.0
2023-10-09 10:36:16,233 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2731642.6666666665, ans=0.0
2023-10-09 10:36:19,524 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2731689.3333333335, ans=0.125
2023-10-09 10:36:26,665 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=2731689.3333333335, ans=0.2
2023-10-09 10:36:35,916 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2731736.0, ans=0.125
2023-10-09 10:36:43,667 INFO [train.py:1031] (0/4) Epoch 14, batch 650, loss[loss=0.2292, simple_loss=0.2662, pruned_loss=0.07321, ctc_loss=0.1142, over 16544.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2736, pruned_loss=0.06272, ctc_loss=0.11, over 3182343.07 frames. ], batch size: 110, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:36:43,994 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2731782.6666666665, ans=0.1
2023-10-09 10:36:44,238 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0
2023-10-09 10:37:36,448 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2731969.3333333335, ans=0.125
2023-10-09 10:37:43,523 INFO [train.py:1031] (0/4) Epoch 14, batch 700, loss[loss=0.188, simple_loss=0.268, pruned_loss=0.03938, ctc_loss=0.07319, over 16890.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2725, pruned_loss=0.05998, ctc_loss=0.1059, over 3216968.76 frames. ], batch size: 258, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:37:52,347 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0
2023-10-09 10:37:53,012 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2732016.0, ans=0.95
2023-10-09 10:38:01,192 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2732062.6666666665, ans=0.2
2023-10-09 10:38:03,494 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2732062.6666666665, ans=0.1
2023-10-09 10:38:06,355 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.869e+02 3.199e+02 3.835e+02 8.884e+02, threshold=6.398e+02, percent-clipped=1.0
2023-10-09 10:38:07,751 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2732109.3333333335, ans=0.125
2023-10-09 10:38:07,802 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2732109.3333333335, ans=0.0
2023-10-09 10:38:09,793 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=22.5
2023-10-09 10:38:15,036 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2732109.3333333335, ans=0.125
2023-10-09 10:38:19,340 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2732156.0, ans=0.0
2023-10-09 10:38:32,260 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2732202.6666666665, ans=0.125
2023-10-09 10:38:36,063 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:38:40,889 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2732202.6666666665, ans=0.125
2023-10-09 10:38:44,810 INFO [train.py:1031] (0/4) Epoch 14, batch 750, loss[loss=0.2305, simple_loss=0.2962, pruned_loss=0.06092, ctc_loss=0.1076, over 16737.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2823, pruned_loss=0.05867, ctc_loss=0.1052, over 3238893.13 frames. ], batch size: 130, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:38:49,234 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0
2023-10-09 10:38:56,277 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2732249.3333333335, ans=0.125
2023-10-09 10:39:25,636 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2732389.3333333335, ans=0.0
2023-10-09 10:39:30,902 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2732389.3333333335, ans=0.125
2023-10-09 10:39:34,365 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2732389.3333333335, ans=0.125
2023-10-09 10:39:48,672 INFO [train.py:1031] (0/4) Epoch 14, batch 800, loss[loss=0.2419, simple_loss=0.3072, pruned_loss=0.06477, ctc_loss=0.1177, over 16809.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2959, pruned_loss=0.06109, ctc_loss=0.1097, over 3258424.19 frames. ], batch size: 164, lr: 2.60e-03, grad_scale: 4.0
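Note: use_fp16: True in the config and the "Loading grad scaler state dict" entry earlier imply mixed-precision training with a gradient scaler; the grad_scale printed with each batch (1.0, 2.0, 4.0, 8.0, ...) is the scaler's current loss scale, which grows when steps succeed and halves on overflow. A minimal sketch of the standard PyTorch 1.11 pattern, with model and batch handling simplified:

```python
import torch

scaler = torch.cuda.amp.GradScaler(enabled=True)

def training_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=True):
        loss = model(batch)            # forward runs in fp16 where safe
    scaler.scale(loss).backward()      # scale loss to avoid fp16 underflow
    scaler.step(optimizer)             # unscales grads; skips step on inf/nan
    scaler.update()                    # adjusts the scale (the logged grad_scale)
```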
2023-10-09 10:40:08,879 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2732529.3333333335, ans=0.125
2023-10-09 10:40:12,733 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.375e+02 4.289e+02 5.326e+02 8.856e+02, threshold=8.578e+02, percent-clipped=11.0
2023-10-09 10:40:50,026 INFO [train.py:1031] (0/4) Epoch 14, batch 850, loss[loss=0.1888, simple_loss=0.2268, pruned_loss=0.05543, ctc_loss=0.0998, over 16920.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2989, pruned_loss=0.06087, ctc_loss=0.1093, over 3246205.06 frames. ], batch size: 78, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:40:52,454 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2732716.0, ans=0.125
2023-10-09 10:41:11,994 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2732762.6666666665, ans=0.0
2023-10-09 10:41:46,577 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2732902.6666666665, ans=0.125
2023-10-09 10:41:46,937 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=12.0
2023-10-09 10:41:49,388 INFO [train.py:1031] (0/4) Epoch 14, batch 900, loss[loss=0.2215, simple_loss=0.2748, pruned_loss=0.06405, ctc_loss=0.1004, over 16762.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.2975, pruned_loss=0.06104, ctc_loss=0.1092, over 3254032.90 frames. ], batch size: 111, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:42:04,613 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2732996.0, ans=0.0
2023-10-09 10:42:17,044 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.380e+02 3.289e+02 4.047e+02 4.926e+02 9.646e+02, threshold=8.093e+02, percent-clipped=3.0
2023-10-09 10:42:20,805 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2733042.6666666665, ans=0.0
2023-10-09 10:42:27,013 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2733089.3333333335, ans=0.125
2023-10-09 10:42:51,355 INFO [train.py:1031] (0/4) Epoch 14, batch 950, loss[loss=0.2779, simple_loss=0.339, pruned_loss=0.08099, ctc_loss=0.1373, over 16944.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2953, pruned_loss=0.06271, ctc_loss=0.1115, over 3273209.47 frames. ], batch size: 258, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:43:07,091 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=22.5
2023-10-09 10:43:20,039 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2733276.0, ans=0.125
2023-10-09 10:43:21,131 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2733276.0, ans=0.125
2023-10-09 10:43:29,888 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0
2023-10-09 10:43:35,454 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2733322.6666666665, ans=0.125
2023-10-09 10:43:43,941 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2733369.3333333335, ans=0.1
2023-10-09 10:43:47,737 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2733369.3333333335, ans=0.125
2023-10-09 10:43:51,564 INFO [train.py:1031] (0/4) Epoch 14, batch 1000, loss[loss=0.2313, simple_loss=0.2835, pruned_loss=0.06716, ctc_loss=0.1119, over 16673.00 frames. ], tot_loss[loss=0.2402, simple_loss=0.3018, pruned_loss=0.06595, ctc_loss=0.1168, over 3273707.11 frames. ], batch size: 151, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:43:56,581 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2733416.0, ans=0.1
2023-10-09 10:44:03,082 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2733462.6666666665, ans=0.125
2023-10-09 10:44:05,132 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2733462.6666666665, ans=0.1
2023-10-09 10:44:08,378 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2733462.6666666665, ans=0.0
2023-10-09 10:44:18,369 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.732e+02 3.383e+02 4.196e+02 5.191e+02 1.287e+03, threshold=8.392e+02, percent-clipped=5.0
2023-10-09 10:44:30,747 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2733556.0, ans=0.125
2023-10-09 10:44:39,131 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:44:40,043 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2733602.6666666665, ans=0.125
2023-10-09 10:44:42,089 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.27 vs. limit=22.5
2023-10-09 10:44:43,779 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.18 vs. limit=15.0
2023-10-09 10:44:48,979 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2733602.6666666665, ans=0.125
2023-10-09 10:44:52,742 INFO [train.py:1031] (0/4) Epoch 14, batch 1050, loss[loss=0.2824, simple_loss=0.3077, pruned_loss=0.09474, ctc_loss=0.1694, over 16707.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.2985, pruned_loss=0.06575, ctc_loss=0.1163, over 3281901.49 frames. ], batch size: 384, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:45:05,112 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2733696.0, ans=0.125
2023-10-09 10:45:19,560 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2733742.6666666665, ans=0.125
2023-10-09 10:45:20,650 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2733742.6666666665, ans=0.0
2023-10-09 10:45:21,772 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2733742.6666666665, ans=0.125
2023-10-09 10:45:52,761 INFO [train.py:1031] (0/4) Epoch 14, batch 1100, loss[loss=0.2323, simple_loss=0.2851, pruned_loss=0.06605, ctc_loss=0.1184, over 16926.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2935, pruned_loss=0.06615, ctc_loss=0.1165, over 3294768.15 frames. ], batch size: 292, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:45:56,159 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2733882.6666666665, ans=0.125
2023-10-09 10:46:21,063 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.487e+02 3.252e+02 3.598e+02 4.155e+02 7.430e+02, threshold=7.195e+02, percent-clipped=0.0
2023-10-09 10:46:52,568 INFO [train.py:1031] (0/4) Epoch 14, batch 1150, loss[loss=0.2427, simple_loss=0.2713, pruned_loss=0.07872, ctc_loss=0.1415, over 16560.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2877, pruned_loss=0.06526, ctc_loss=0.1147, over 3291773.70 frames. ], batch size: 353, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:46:57,931 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.78 vs. limit=22.5
2023-10-09 10:47:22,509 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2734209.3333333335, ans=0.125
2023-10-09 10:47:27,197 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2734256.0, ans=0.09899494936611666
2023-10-09 10:47:44,744 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2734302.6666666665, ans=0.0
2023-10-09 10:47:47,322 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2734302.6666666665, ans=0.125
2023-10-09 10:47:51,158 INFO [train.py:1031] (0/4) Epoch 14, batch 1200, loss[loss=0.1948, simple_loss=0.2431, pruned_loss=0.05335, ctc_loss=0.09957, over 16667.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2801, pruned_loss=0.0643, ctc_loss=0.1132, over 3293382.49 frames. ], batch size: 151, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:47:55,019 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.10 vs. limit=15.0
2023-10-09 10:48:00,349 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2734349.3333333335, ans=0.125
2023-10-09 10:48:00,399 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2734349.3333333335, ans=0.5
2023-10-09 10:48:08,466 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2734396.0, ans=0.0
2023-10-09 10:48:20,270 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+02 2.978e+02 3.440e+02 3.913e+02 6.490e+02, threshold=6.880e+02, percent-clipped=0.0
2023-10-09 10:48:51,817 INFO [train.py:1031] (0/4) Epoch 14, batch 1250, loss[loss=0.2897, simple_loss=0.3126, pruned_loss=0.09862, ctc_loss=0.1738, over 16747.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2784, pruned_loss=0.06511, ctc_loss=0.1146, over 3302945.22 frames. ], batch size: 353, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:48:57,714 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2734582.6666666665, ans=0.1
2023-10-09 10:49:23,997 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2734676.0, ans=0.2
2023-10-09 10:49:52,488 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2734816.0, ans=0.2
2023-10-09 10:49:53,653 INFO [train.py:1031] (0/4) Epoch 14, batch 1300, loss[loss=0.223, simple_loss=0.2895, pruned_loss=0.05796, ctc_loss=0.1016, over 17000.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2784, pruned_loss=0.06596, ctc_loss=0.1158, over 3299238.24 frames. ], batch size: 86, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:49:57,441 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0
2023-10-09 10:50:25,186 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.632e+02 3.518e+02 3.904e+02 4.606e+02 8.060e+02, threshold=7.809e+02, percent-clipped=2.0
2023-10-09 10:50:29,513 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=22.5
2023-10-09 10:50:37,163 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2734956.0, ans=0.1
2023-10-09 10:50:39,330 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2734956.0, ans=0.125
2023-10-09 10:50:43,323 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.44 vs. limit=6.0
2023-10-09 10:50:49,453 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2735002.6666666665, ans=0.125
2023-10-09 10:50:54,236 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2735049.3333333335, ans=0.125
2023-10-09 10:50:54,947 INFO [train.py:1031] (0/4) Epoch 14, batch 1350, loss[loss=0.2349, simple_loss=0.2724, pruned_loss=0.07315, ctc_loss=0.1278, over 16659.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2758, pruned_loss=0.06604, ctc_loss=0.1162, over 3297859.42 frames. ], batch size: 151, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:51:13,433 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2735096.0, ans=0.125
2023-10-09 10:51:30,836 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0
2023-10-09 10:51:44,760 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2735236.0, ans=0.0
2023-10-09 10:51:55,208 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2735282.6666666665, ans=0.125
2023-10-09 10:51:55,301 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2735282.6666666665, ans=0.125
2023-10-09 10:51:55,993 INFO [train.py:1031] (0/4) Epoch 14, batch 1400, loss[loss=0.2118, simple_loss=0.2647, pruned_loss=0.05933, ctc_loss=0.1004, over 16835.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2724, pruned_loss=0.06595, ctc_loss=0.1155, over 3295649.11 frames. ], batch size: 259, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:52:17,023 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0
2023-10-09 10:52:18,860 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:52:24,706 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2735376.0, ans=0.125
2023-10-09 10:52:26,600 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=15.0
2023-10-09 10:52:27,996 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+02 3.259e+02 3.795e+02 4.545e+02 1.175e+03, threshold=7.590e+02, percent-clipped=1.0
2023-10-09 10:52:28,392 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2735376.0, ans=0.125
2023-10-09 10:52:53,451 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2735469.3333333335, ans=0.125
2023-10-09 10:52:55,876 INFO [train.py:1031] (0/4) Epoch 14, batch 1450, loss[loss=0.2524, simple_loss=0.3107, pruned_loss=0.07045, ctc_loss=0.1329, over 16582.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.2738, pruned_loss=0.06363, ctc_loss=0.1115, over 3295368.78 frames. ], batch size: 351, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:52:56,238 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2735516.0, ans=0.2
2023-10-09 10:53:07,385 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2735562.6666666665, ans=0.125
2023-10-09 10:53:13,149 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2735562.6666666665, ans=0.125
2023-10-09 10:53:23,353 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0
2023-10-09 10:53:25,885 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2735609.3333333335, ans=0.125
2023-10-09 10:53:30,819 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.09 vs. limit=6.0
2023-10-09 10:53:34,083 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2735656.0, ans=0.1
2023-10-09 10:53:47,552 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.13 vs. limit=22.5
2023-10-09 10:53:57,018 INFO [train.py:1031] (0/4) Epoch 14, batch 1500, loss[loss=0.2972, simple_loss=0.3128, pruned_loss=0.1041, ctc_loss=0.1836, over 16874.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2747, pruned_loss=0.06352, ctc_loss=0.1114, over 3296466.50 frames. ], batch size: 384, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:54:00,946 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2735749.3333333335, ans=0.0
2023-10-09 10:54:32,301 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+02 3.308e+02 3.843e+02 4.778e+02 1.080e+03, threshold=7.686e+02, percent-clipped=1.0
2023-10-09 10:54:42,981 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2735889.3333333335, ans=0.035
2023-10-09 10:55:00,101 INFO [train.py:1031] (0/4) Epoch 14, batch 1550, loss[loss=0.185, simple_loss=0.2635, pruned_loss=0.03858, ctc_loss=0.07325, over 16877.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2746, pruned_loss=0.06284, ctc_loss=0.1102, over 3288621.95 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:55:21,744 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2736029.3333333335, ans=0.1
2023-10-09 10:55:26,041 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2736076.0, ans=0.0
2023-10-09 10:55:30,190 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2736076.0, ans=0.0
2023-10-09 10:55:41,701 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2736122.6666666665, ans=0.125
2023-10-09 10:55:54,174 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2736169.3333333335, ans=0.2
2023-10-09 10:56:01,581 INFO [train.py:1031] (0/4) Epoch 14, batch 1600, loss[loss=0.1772, simple_loss=0.2453, pruned_loss=0.03992, ctc_loss=0.07334, over 16721.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.271, pruned_loss=0.05898, ctc_loss=0.1039, over 3285037.10 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 10:56:02,828 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2736216.0, ans=0.0
2023-10-09 10:56:16,576 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2736262.6666666665, ans=0.0
2023-10-09 10:56:20,348 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2736262.6666666665, ans=0.125
2023-10-09 10:56:29,986 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2736309.3333333335, ans=0.125
2023-10-09 10:56:36,432 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+02 2.648e+02 3.123e+02 3.834e+02 1.151e+03, threshold=6.247e+02, percent-clipped=2.0
2023-10-09 10:56:45,356 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2736356.0, ans=0.125
2023-10-09 10:56:59,887 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2736402.6666666665, ans=0.1
2023-10-09 10:57:01,606 INFO [train.py:1031] (0/4) Epoch 14, batch 1650, loss[loss=0.2327, simple_loss=0.3004, pruned_loss=0.06267, ctc_loss=0.09951, over 16834.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.2722, pruned_loss=0.05994, ctc_loss=0.1054, over 3298188.94 frames. ], batch size: 176, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:57:20,490 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=15.0
2023-10-09 10:57:25,034 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2736542.6666666665, ans=0.0
2023-10-09 10:57:31,049 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2736542.6666666665, ans=0.2
2023-10-09 10:58:03,246 INFO [train.py:1031] (0/4) Epoch 14, batch 1700, loss[loss=0.2289, simple_loss=0.2897, pruned_loss=0.06294, ctc_loss=0.1055, over 16751.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2794, pruned_loss=0.06364, ctc_loss=0.1116, over 3295857.48 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 10:58:22,341 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2736729.3333333335, ans=0.0
2023-10-09 10:58:28,463 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2736776.0, ans=0.125
2023-10-09 10:58:38,774 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+02 3.292e+02 3.848e+02 4.651e+02 1.016e+03, threshold=7.697e+02, percent-clipped=4.0
2023-10-09 10:58:43,370 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2736822.6666666665, ans=0.0
2023-10-09 10:58:47,121 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2736822.6666666665, ans=0.0
2023-10-09 10:58:50,884 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2736822.6666666665, ans=0.125
2023-10-09 10:59:04,580 INFO [train.py:1031] (0/4) Epoch 14, batch 1750, loss[loss=0.2588, simple_loss=0.315, pruned_loss=0.07655, ctc_loss=0.1238, over 17109.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2829, pruned_loss=0.06508, ctc_loss=0.1139, over 3297641.34 frames. ], batch size: 83, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:59:10,141 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2736916.0, ans=0.125
2023-10-09 10:59:45,719 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2737056.0, ans=0.0
2023-10-09 11:00:05,526 INFO [train.py:1031] (0/4) Epoch 14, batch 1800, loss[loss=0.2303, simple_loss=0.3002, pruned_loss=0.05844, ctc_loss=0.1089, over 16857.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2823, pruned_loss=0.06376, ctc_loss=0.112, over 3299417.85 frames. ], batch size: 328, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:00:10,584 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2737149.3333333335, ans=0.125
2023-10-09 11:00:13,432 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2737149.3333333335, ans=0.0
2023-10-09 11:00:19,393 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2737196.0, ans=0.0
2023-10-09 11:00:25,150 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2737196.0, ans=0.0
2023-10-09 11:00:43,557 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 2.917e+02 3.383e+02 3.800e+02 1.043e+03, threshold=6.767e+02, percent-clipped=1.0
2023-10-09 11:00:48,911 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0
2023-10-09 11:00:53,799 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2737336.0, ans=0.125
2023-10-09 11:00:55,733 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0
2023-10-09 11:01:00,819 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2737336.0, ans=0.125
2023-10-09 11:01:03,760 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2737336.0, ans=0.125
2023-10-09 11:01:06,588 INFO [train.py:1031] (0/4) Epoch 14, batch 1850, loss[loss=0.2709, simple_loss=0.331, pruned_loss=0.07789, ctc_loss=0.1376, over 16539.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2851, pruned_loss=0.06202, ctc_loss=0.1092, over 3308180.59 frames. ], batch size: 416, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:01:07,294 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0
2023-10-09 11:01:08,049 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2737382.6666666665, ans=0.2
2023-10-09 11:01:08,084 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2737382.6666666665, ans=0.125
2023-10-09 11:01:16,254 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.73 vs. limit=10.0
2023-10-09 11:01:23,783 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.65 vs. limit=15.0
2023-10-09 11:01:33,636 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2737476.0, ans=0.0
2023-10-09 11:01:49,286 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2737522.6666666665, ans=0.0
2023-10-09 11:02:06,438 INFO [train.py:1031] (0/4) Epoch 14, batch 1900, loss[loss=0.2503, simple_loss=0.2934, pruned_loss=0.07644, ctc_loss=0.1357, over 16632.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2866, pruned_loss=0.0625, ctc_loss=0.11, over 3313277.60 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:02:15,544 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2737616.0, ans=0.0
2023-10-09 11:02:19,951 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0
2023-10-09 11:02:26,235 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2737662.6666666665, ans=0.125
2023-10-09 11:02:26,351 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2737662.6666666665, ans=0.125
2023-10-09 11:02:27,837 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.49 vs. limit=15.0
limit=15.0 2023-10-09 11:02:32,227 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2737709.3333333335, ans=0.125 2023-10-09 11:02:43,659 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.102e+02 3.624e+02 4.440e+02 7.780e+02, threshold=7.248e+02, percent-clipped=1.0 2023-10-09 11:02:49,757 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.89 vs. limit=15.0 2023-10-09 11:03:06,281 INFO [train.py:1031] (0/4) Epoch 14, batch 1950, loss[loss=0.2229, simple_loss=0.266, pruned_loss=0.06679, ctc_loss=0.1154, over 16641.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2884, pruned_loss=0.06242, ctc_loss=0.1102, over 3311307.16 frames. ], batch size: 111, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:03:07,515 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0 2023-10-09 11:03:28,670 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2737896.0, ans=0.1 2023-10-09 11:03:40,309 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2737942.6666666665, ans=0.125 2023-10-09 11:03:45,043 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2737989.3333333335, ans=0.05 2023-10-09 11:04:08,694 INFO [train.py:1031] (0/4) Epoch 14, batch 2000, loss[loss=0.2345, simple_loss=0.285, pruned_loss=0.06872, ctc_loss=0.1165, over 16764.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2913, pruned_loss=0.06423, ctc_loss=0.1135, over 3312509.09 frames. ], batch size: 121, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:04:25,911 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2738129.3333333335, ans=0.0 2023-10-09 11:04:37,374 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2738176.0, ans=0.125 2023-10-09 11:04:48,440 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.567e+02 3.379e+02 3.817e+02 4.663e+02 9.562e+02, threshold=7.635e+02, percent-clipped=5.0 2023-10-09 11:04:55,765 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2738269.3333333335, ans=0.0 2023-10-09 11:05:09,398 INFO [train.py:1031] (0/4) Epoch 14, batch 2050, loss[loss=0.3069, simple_loss=0.3356, pruned_loss=0.1012, ctc_loss=0.1896, over 16795.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.2974, pruned_loss=0.06704, ctc_loss=0.119, over 3308057.84 frames. 
], batch size: 384, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:05:09,743 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2738316.0, ans=0.125 2023-10-09 11:05:09,780 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2738316.0, ans=0.125 2023-10-09 11:05:22,566 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2738362.6666666665, ans=0.0 2023-10-09 11:05:55,139 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:05:59,335 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2738502.6666666665, ans=0.1 2023-10-09 11:06:03,014 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.72 vs. limit=22.5 2023-10-09 11:06:10,278 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2738549.3333333335, ans=0.125 2023-10-09 11:06:10,925 INFO [train.py:1031] (0/4) Epoch 14, batch 2100, loss[loss=0.2515, simple_loss=0.2902, pruned_loss=0.07934, ctc_loss=0.1356, over 16602.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.295, pruned_loss=0.06745, ctc_loss=0.1192, over 3314121.49 frames. ], batch size: 416, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:06:30,974 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2738596.0, ans=0.125 2023-10-09 11:06:37,862 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2023-10-09 11:06:53,249 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+02 3.150e+02 3.651e+02 4.552e+02 6.884e+02, threshold=7.301e+02, percent-clipped=0.0 2023-10-09 11:06:54,785 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2738689.3333333335, ans=0.0 2023-10-09 11:06:55,721 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2738689.3333333335, ans=0.125 2023-10-09 11:07:12,666 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2738782.6666666665, ans=0.0 2023-10-09 11:07:13,875 INFO [train.py:1031] (0/4) Epoch 14, batch 2150, loss[loss=0.2436, simple_loss=0.3495, pruned_loss=0.05053, ctc_loss=0.09182, over 15123.00 frames. ], tot_loss[loss=0.2368, simple_loss=0.2951, pruned_loss=0.06592, ctc_loss=0.1167, over 3296434.76 frames. ], batch size: 527, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:07:29,710 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2023-10-09 11:08:08,085 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2738969.3333333335, ans=0.2 2023-10-09 11:08:14,606 INFO [train.py:1031] (0/4) Epoch 14, batch 2200, loss[loss=0.2384, simple_loss=0.3009, pruned_loss=0.06582, ctc_loss=0.1108, over 17046.00 frames. 
], tot_loss[loss=0.2361, simple_loss=0.2935, pruned_loss=0.06603, ctc_loss=0.1165, over 3299349.19 frames. ], batch size: 86, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:08:47,524 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2739109.3333333335, ans=0.0 2023-10-09 11:08:48,855 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.07 vs. limit=22.5 2023-10-09 11:08:58,372 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 3.214e+02 3.681e+02 4.721e+02 1.015e+03, threshold=7.363e+02, percent-clipped=4.0 2023-10-09 11:08:58,713 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2739156.0, ans=0.0 2023-10-09 11:09:16,600 INFO [train.py:1031] (0/4) Epoch 14, batch 2250, loss[loss=0.1951, simple_loss=0.2513, pruned_loss=0.05096, ctc_loss=0.0922, over 16851.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2872, pruned_loss=0.06527, ctc_loss=0.1153, over 3301378.86 frames. ], batch size: 189, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:09:26,540 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=12.0 2023-10-09 11:09:43,712 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2739342.6666666665, ans=0.2 2023-10-09 11:09:48,491 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2739342.6666666665, ans=0.0 2023-10-09 11:10:04,363 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2023-10-09 11:10:05,221 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2739436.0, ans=0.125 2023-10-09 11:10:13,877 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2739436.0, ans=0.125 2023-10-09 11:10:18,410 INFO [train.py:1031] (0/4) Epoch 14, batch 2300, loss[loss=0.2417, simple_loss=0.2897, pruned_loss=0.07219, ctc_loss=0.1232, over 16715.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2801, pruned_loss=0.06396, ctc_loss=0.1126, over 3311085.02 frames. 
], batch size: 102, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:10:35,869 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2739529.3333333335, ans=0.125 2023-10-09 11:11:00,480 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2739622.6666666665, ans=0.125 2023-10-09 11:11:04,760 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.458e+02 3.279e+02 3.727e+02 4.728e+02 7.971e+02, threshold=7.454e+02, percent-clipped=1.0 2023-10-09 11:11:14,493 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2739669.3333333335, ans=0.125 2023-10-09 11:11:20,456 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2739716.0, ans=0.0 2023-10-09 11:11:21,217 INFO [train.py:1031] (0/4) Epoch 14, batch 2350, loss[loss=0.2342, simple_loss=0.2733, pruned_loss=0.07065, ctc_loss=0.1344, over 15293.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.283, pruned_loss=0.06562, ctc_loss=0.1156, over 3296096.08 frames. ], batch size: 527, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 11:11:48,985 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2739809.3333333335, ans=0.1 2023-10-09 11:11:59,545 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2739856.0, ans=0.125 2023-10-09 11:12:02,278 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2739856.0, ans=0.125 2023-10-09 11:12:22,657 INFO [train.py:1031] (0/4) Epoch 14, batch 2400, loss[loss=0.2165, simple_loss=0.265, pruned_loss=0.06164, ctc_loss=0.1121, over 16252.00 frames. ], tot_loss[loss=0.2346, simple_loss=0.2863, pruned_loss=0.06767, ctc_loss=0.1187, over 3306550.61 frames. ], batch size: 70, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:12:37,008 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2739996.0, ans=0.125 2023-10-09 11:13:00,546 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2740089.3333333335, ans=0.0 2023-10-09 11:13:09,494 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+02 3.337e+02 3.917e+02 4.663e+02 1.051e+03, threshold=7.833e+02, percent-clipped=2.0 2023-10-09 11:13:09,802 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2740089.3333333335, ans=0.5 2023-10-09 11:13:23,082 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2740136.0, ans=0.1 2023-10-09 11:13:23,609 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.74 vs. limit=15.0 2023-10-09 11:13:25,548 INFO [train.py:1031] (0/4) Epoch 14, batch 2450, loss[loss=0.1993, simple_loss=0.2721, pruned_loss=0.04556, ctc_loss=0.08838, over 16957.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.285, pruned_loss=0.06784, ctc_loss=0.1189, over 3298993.35 frames. 
], batch size: 293, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:13:44,974 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2740229.3333333335, ans=0.125 2023-10-09 11:14:28,353 INFO [train.py:1031] (0/4) Epoch 14, batch 2500, loss[loss=0.1906, simple_loss=0.2665, pruned_loss=0.04119, ctc_loss=0.08076, over 16874.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2794, pruned_loss=0.06242, ctc_loss=0.1101, over 3295482.70 frames. ], batch size: 292, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:14:33,769 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2740416.0, ans=0.125 2023-10-09 11:14:39,314 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2740416.0, ans=0.0 2023-10-09 11:14:43,024 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2740462.6666666665, ans=0.125 2023-10-09 11:14:43,117 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2740462.6666666665, ans=0.125 2023-10-09 11:14:51,676 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2740462.6666666665, ans=0.125 2023-10-09 11:14:54,270 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2740509.3333333335, ans=0.04949747468305833 2023-10-09 11:15:17,464 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.820e+02 3.218e+02 3.802e+02 1.081e+03, threshold=6.436e+02, percent-clipped=2.0 2023-10-09 11:15:33,264 INFO [train.py:1031] (0/4) Epoch 14, batch 2550, loss[loss=0.241, simple_loss=0.3005, pruned_loss=0.06964, ctc_loss=0.1054, over 16870.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2846, pruned_loss=0.06304, ctc_loss=0.11, over 3294771.84 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:15:42,978 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2740649.3333333335, ans=0.0 2023-10-09 11:16:03,168 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2023-10-09 11:16:35,625 INFO [train.py:1031] (0/4) Epoch 14, batch 2600, loss[loss=0.2466, simple_loss=0.3029, pruned_loss=0.06959, ctc_loss=0.128, over 16454.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2831, pruned_loss=0.0625, ctc_loss=0.109, over 3290147.89 frames. ], batch size: 350, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:16:58,007 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=15.0 2023-10-09 11:17:05,671 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2740976.0, ans=0.0 2023-10-09 11:17:15,990 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2741022.6666666665, ans=0.0 2023-10-09 11:17:18,111 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.02 vs. 
limit=15.0
2023-10-09 11:17:23,702 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.364e+02 2.979e+02 3.641e+02 4.453e+02 7.344e+02, threshold=7.282e+02, percent-clipped=4.0
2023-10-09 11:17:27,303 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=22.5
2023-10-09 11:17:37,765 INFO [train.py:1031] (0/4) Epoch 14, batch 2650, loss[loss=0.2811, simple_loss=0.3288, pruned_loss=0.08486, ctc_loss=0.1589, over 16513.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2847, pruned_loss=0.06127, ctc_loss=0.1075, over 3289631.75 frames. ], batch size: 415, lr: 2.59e-03, grad_scale: 1.0
2023-10-09 11:17:40,783 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2741116.0, ans=0.125
2023-10-09 11:17:48,591 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2741116.0, ans=0.0
2023-10-09 11:17:57,266 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2741162.6666666665, ans=0.025
2023-10-09 11:17:58,358 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2741162.6666666665, ans=0.0
2023-10-09 11:17:59,965 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2741162.6666666665, ans=0.05
2023-10-09 11:18:04,705 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2741209.3333333335, ans=0.2
2023-10-09 11:18:35,112 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2741302.6666666665, ans=0.07
2023-10-09 11:18:38,488 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2741349.3333333335, ans=0.0
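In every optim.py:471 line in this log, the reported threshold is exactly 2.0 times the median grad-norm quartile (here 2.0 x 3.641e+02 = 7.282e+02), matching Clipping_scale=2.0, and percent-clipped reports how often the global gradient norm exceeded that threshold. One plausible reading of those numbers is clipping to a multiple of the median of recently observed norms; the sketch below implements that reading and is an assumption, not icefall's actual optimizer code.

    import collections
    import statistics

    import torch

    class MedianGradClipper:
        """Hedged sketch: clip the global grad norm to clip_scale times the
        median of recently observed norms, reproducing the logged
        threshold = Clipping_scale * median relationship."""

        def __init__(self, clip_scale: float = 2.0, window: int = 128):
            self.clip_scale = clip_scale
            self.history = collections.deque(maxlen=window)
            self.steps = 0
            self.clipped = 0

        def __call__(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack(
                [p.grad.detach().norm() for p in params])).item()
            self.history.append(norm)
            threshold = self.clip_scale * statistics.median(self.history)
            self.steps += 1
            if norm > threshold:
                self.clipped += 1  # drives the percent-clipped statistic
                torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
            return threshold

Under this reading, percent-clipped would be 100 * clipped / steps over each logging interval.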
2023-10-09 11:18:39,292 INFO [train.py:1031] (0/4) Epoch 14, batch 2700, loss[loss=0.2474, simple_loss=0.3185, pruned_loss=0.06519, ctc_loss=0.1148, over 16892.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2887, pruned_loss=0.06411, ctc_loss=0.1124, over 3284225.70 frames. ], batch size: 292, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:18:42,552 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2741349.3333333335, ans=0.0
2023-10-09 11:18:55,264 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2741396.0, ans=0.125
2023-10-09 11:19:04,943 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741442.6666666665, ans=0.1
2023-10-09 11:19:09,886 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2741442.6666666665, ans=0.0
2023-10-09 11:19:16,532 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2741489.3333333335, ans=0.1
2023-10-09 11:19:18,771 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2741489.3333333335, ans=0.05
2023-10-09 11:19:28,129 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2741489.3333333335, ans=0.05
2023-10-09 11:19:31,012 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+02 3.579e+02 4.156e+02 4.960e+02 1.400e+03, threshold=8.312e+02, percent-clipped=4.0
2023-10-09 11:19:41,658 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2741582.6666666665, ans=0.1
2023-10-09 11:19:42,377 INFO [train.py:1031] (0/4) Epoch 14, batch 2750, loss[loss=0.2157, simple_loss=0.3251, pruned_loss=0.03922, ctc_loss=0.06942, over 15106.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.2934, pruned_loss=0.06372, ctc_loss=0.1121, over 3282968.00 frames. ], batch size: 526, lr: 2.59e-03, grad_scale: 1.0
2023-10-09 11:19:53,759 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=22.5
2023-10-09 11:20:10,044 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2741676.0, ans=0.1
2023-10-09 11:20:11,104 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741676.0, ans=0.1
2023-10-09 11:20:14,121 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2741676.0, ans=6.0
2023-10-09 11:20:18,767 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741722.6666666665, ans=0.1
2023-10-09 11:20:19,176 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0
2023-10-09 11:20:20,506 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2741722.6666666665, ans=0.2
2023-10-09 11:20:44,739 INFO [train.py:1031] (0/4) Epoch 14, batch 2800, loss[loss=0.1885, simple_loss=0.2897, pruned_loss=0.0319, ctc_loss=0.05844, over 15064.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2888, pruned_loss=0.05946, ctc_loss=0.1054, over 3286826.20 frames.
], batch size: 526, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:20:54,178 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.07 vs. limit=10.0 2023-10-09 11:21:10,123 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2741909.3333333335, ans=0.125 2023-10-09 11:21:33,673 INFO [scaling.py:979] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=8.0 2023-10-09 11:21:35,708 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+02 3.033e+02 3.735e+02 4.727e+02 1.179e+03, threshold=7.471e+02, percent-clipped=1.0 2023-10-09 11:21:41,424 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2742002.6666666665, ans=0.125 2023-10-09 11:21:47,213 INFO [train.py:1031] (0/4) Epoch 14, batch 2850, loss[loss=0.1964, simple_loss=0.2567, pruned_loss=0.05048, ctc_loss=0.08782, over 16766.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2866, pruned_loss=0.05675, ctc_loss=0.101, over 3284229.18 frames. ], batch size: 164, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:22:16,279 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2742142.6666666665, ans=0.2 2023-10-09 11:22:51,982 INFO [train.py:1031] (0/4) Epoch 14, batch 2900, loss[loss=0.2082, simple_loss=0.2859, pruned_loss=0.04699, ctc_loss=0.09102, over 16796.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.288, pruned_loss=0.05506, ctc_loss=0.0985, over 3282996.85 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:23:02,237 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2742282.6666666665, ans=0.09899494936611666 2023-10-09 11:23:26,994 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2742376.0, ans=0.125 2023-10-09 11:23:35,067 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2742422.6666666665, ans=0.2 2023-10-09 11:23:35,948 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2742422.6666666665, ans=0.125 2023-10-09 11:23:37,084 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2742422.6666666665, ans=0.125 2023-10-09 11:23:38,741 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2742422.6666666665, ans=0.125 2023-10-09 11:23:43,139 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+02 3.117e+02 3.737e+02 4.874e+02 8.025e+02, threshold=7.473e+02, percent-clipped=2.0 2023-10-09 11:23:52,694 INFO [train.py:1031] (0/4) Epoch 14, batch 2950, loss[loss=0.2202, simple_loss=0.2696, pruned_loss=0.06294, ctc_loss=0.1123, over 16393.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2898, pruned_loss=0.05614, ctc_loss=0.09995, over 3287809.60 frames. ], batch size: 463, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:24:00,769 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.80 vs. 
limit=15.0
2023-10-09 11:24:16,787 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2742609.3333333335, ans=0.0
2023-10-09 11:24:22,485 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=12.0
2023-10-09 11:24:31,672 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2742656.0, ans=0.125
2023-10-09 11:24:55,799 INFO [train.py:1031] (0/4) Epoch 14, batch 3000, loss[loss=0.2963, simple_loss=0.3342, pruned_loss=0.09381, ctc_loss=0.1772, over 16616.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2879, pruned_loss=0.05902, ctc_loss=0.1049, over 3296187.92 frames. ], batch size: 351, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:24:55,800 INFO [train.py:1054] (0/4) Computing validation loss
2023-10-09 11:25:13,600 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2392, simple_loss=0.3062, pruned_loss=0.06637, ctc_loss=0.09863, over 1796401.00 frames.
2023-10-09 11:25:13,600 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14582MB
2023-10-09 11:25:24,690 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2742796.0, ans=0.1
2023-10-09 11:25:24,695 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2742796.0, ans=0.125
2023-10-09 11:25:31,337 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2742796.0, ans=0.0
2023-10-09 11:25:34,389 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2742796.0, ans=0.125
2023-10-09 11:25:38,045 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2742842.6666666665, ans=0.1
2023-10-09 11:25:43,782 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:25:57,228 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2742889.3333333335, ans=0.125
2023-10-09 11:26:01,732 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2742936.0, ans=0.125
2023-10-09 11:26:05,142 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+02 3.037e+02 3.527e+02 4.152e+02 6.631e+02, threshold=7.054e+02, percent-clipped=0.0
2023-10-09 11:26:14,950 INFO [train.py:1031] (0/4) Epoch 14, batch 3050, loss[loss=0.2191, simple_loss=0.265, pruned_loss=0.06653, ctc_loss=0.1004, over 16771.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2827, pruned_loss=0.05914, ctc_loss=0.105, over 3295374.50 frames. ], batch size: 95, lr: 2.59e-03, grad_scale: 4.0
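At batch 3000 above (train.py:1054-1064), training pauses to compute a validation loss over a held-out set and then reports the peak CUDA memory allocated so far. A minimal sketch of such a mid-epoch validation pass follows; the model(batch) interface returning a per-batch loss and frame count is an illustrative assumption, not icefall's real API.

    import torch

    def compute_validation_loss(model, valid_loader, device) -> float:
        # Hedged sketch of the "Computing validation loss" step logged above.
        # `model(batch) -> (loss, num_frames)` is an assumed interface.
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = model(batch)
                tot_loss += loss.item() * num_frames  # frame-weighted sum
                tot_frames += num_frames
        model.train()
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.4f}, "
              f"over {tot_frames:.2f} frames; max memory {peak_mb}MB")
        return tot_loss / tot_frames

Frame-weighting matches the "over 1796401.00 frames" accounting in the validation line, and torch.cuda.max_memory_allocated is the standard way to obtain the logged peak-memory figure.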
2023-10-09 11:26:19,212 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.02 vs. limit=15.0
2023-10-09 11:26:26,759 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2743029.3333333335, ans=10.0
2023-10-09 11:26:32,484 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2743029.3333333335, ans=0.0
2023-10-09 11:26:32,732 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=15.0
2023-10-09 11:26:40,908 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2743076.0, ans=0.0
2023-10-09 11:27:15,160 INFO [train.py:1031] (0/4) Epoch 14, batch 3100, loss[loss=0.2542, simple_loss=0.2922, pruned_loss=0.07898, ctc_loss=0.1456, over 16467.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2758, pruned_loss=0.05899, ctc_loss=0.1043, over 3300159.38 frames. ], batch size: 351, lr: 2.59e-03, grad_scale: 8.0
2023-10-09 11:27:26,177 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2743262.6666666665, ans=0.125
2023-10-09 11:27:50,845 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2743356.0, ans=0.125
2023-10-09 11:27:54,408 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2743356.0, ans=0.0
2023-10-09 11:27:57,475 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2743356.0, ans=0.0
2023-10-09 11:27:58,048 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.01 vs. limit=10.0
2023-10-09 11:28:01,278 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2743356.0, ans=0.125
2023-10-09 11:28:07,997 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+02 2.923e+02 3.339e+02 4.092e+02 6.355e+02, threshold=6.678e+02, percent-clipped=0.0
2023-10-09 11:28:15,796 INFO [train.py:1031] (0/4) Epoch 14, batch 3150, loss[loss=0.2104, simple_loss=0.2853, pruned_loss=0.04982, ctc_loss=0.08996, over 16343.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2697, pruned_loss=0.05715, ctc_loss=0.1005, over 3291455.42 frames. ], batch size: 466, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:28:17,746 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.01 vs. limit=22.5
2023-10-09 11:28:26,386 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2743449.3333333335, ans=0.125
2023-10-09 11:28:48,345 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.67 vs. limit=22.5
2023-10-09 11:29:17,393 INFO [train.py:1031] (0/4) Epoch 14, batch 3200, loss[loss=0.2299, simple_loss=0.3028, pruned_loss=0.0572, ctc_loss=0.1066, over 16871.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.2751, pruned_loss=0.05762, ctc_loss=0.1019, over 3288957.33 frames. ], batch size: 242, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:29:19,736 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2743682.6666666665, ans=0.0
2023-10-09 11:29:27,567 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0
2023-10-09 11:29:27,742 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=22.5
2023-10-09 11:29:34,648 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2743729.3333333335, ans=0.125
2023-10-09 11:29:54,492 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:29:55,841 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=15.0
2023-10-09 11:30:08,809 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2743869.3333333335, ans=0.125
2023-10-09 11:30:12,210 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.450e+02 3.362e+02 3.915e+02 4.671e+02 1.064e+03, threshold=7.829e+02, percent-clipped=5.0
2023-10-09 11:30:18,612 INFO [train.py:1031] (0/4) Epoch 14, batch 3250, loss[loss=0.2489, simple_loss=0.3079, pruned_loss=0.07225, ctc_loss=0.1133, over 17007.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2786, pruned_loss=0.0603, ctc_loss=0.1059, over 3291685.58 frames. ], batch size: 86, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:30:40,533 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-w-ctc/checkpoint-588000.pt
2023-10-09 11:30:49,876 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0
2023-10-09 11:30:50,597 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2744009.3333333335, ans=0.0
2023-10-09 11:31:18,447 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.72 vs. limit=10.0
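The checkpoint.py:75 line above writes a batch-indexed checkpoint (checkpoint-588000.pt) into the experiment directory alongside the per-epoch files, presumably at a fixed interval of training batches. A hedged sketch of such a periodic save follows; the save interval and the exact contents of the saved dict are assumptions, not the library's confirmed behavior.

    from pathlib import Path

    import torch

    def maybe_save_checkpoint(model, optimizer, scheduler, batch_idx: int,
                              exp_dir: Path, save_every_n: int) -> None:
        # Hedged sketch: periodically persist training state under a
        # batch-indexed name such as checkpoint-588000.pt, as logged above.
        if batch_idx == 0 or batch_idx % save_every_n != 0:
            return
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "batch_idx_train": batch_idx,
            },
            exp_dir / f"checkpoint-{batch_idx}.pt",
        )

Saving optimizer and scheduler state alongside the model is what makes the "Loading optimizer state dict" / "Loading scheduler state dict" resume steps possible.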
2023-10-09 11:31:23,918 INFO [train.py:1031] (0/4) Epoch 14, batch 3300, loss[loss=0.3034, simple_loss=0.3401, pruned_loss=0.09914, ctc_loss=0.1711, over 16527.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2858, pruned_loss=0.06315, ctc_loss=0.111, over 3290661.38 frames. ], batch size: 416, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:31:38,535 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2744196.0, ans=0.1
2023-10-09 11:31:39,544 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2744196.0, ans=0.025
2023-10-09 11:31:47,753 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2744196.0, ans=0.2
2023-10-09 11:32:16,491 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2744336.0, ans=0.125
2023-10-09 11:32:20,914 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+02 3.186e+02 3.803e+02 4.439e+02 1.060e+03, threshold=7.606e+02, percent-clipped=1.0
2023-10-09 11:32:26,285 INFO [train.py:1031] (0/4) Epoch 14, batch 3350, loss[loss=0.2152, simple_loss=0.2382, pruned_loss=0.07051, ctc_loss=0.1282, over 15312.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2843, pruned_loss=0.06412, ctc_loss=0.1127, over 3299146.68 frames. ], batch size: 527, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:32:38,590 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.40 vs. limit=10.0
2023-10-09 11:32:51,506 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2744476.0, ans=0.0
2023-10-09 11:32:54,672 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2744476.0, ans=0.04949747468305833
2023-10-09 11:33:14,677 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0
2023-10-09 11:33:18,248 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2744569.3333333335, ans=0.125
2023-10-09 11:33:29,577 INFO [train.py:1031] (0/4) Epoch 14, batch 3400, loss[loss=0.2713, simple_loss=0.3161, pruned_loss=0.08377, ctc_loss=0.1473, over 16519.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2829, pruned_loss=0.06421, ctc_loss=0.1128, over 3295102.31 frames.
], batch size: 351, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 11:33:29,935 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2744616.0, ans=0.125 2023-10-09 11:33:47,171 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2744662.6666666665, ans=0.125 2023-10-09 11:34:05,475 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2744756.0, ans=0.1 2023-10-09 11:34:15,683 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2744756.0, ans=0.1 2023-10-09 11:34:18,726 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2744802.6666666665, ans=0.125 2023-10-09 11:34:26,691 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.487e+02 3.089e+02 3.600e+02 4.217e+02 8.048e+02, threshold=7.200e+02, percent-clipped=1.0 2023-10-09 11:34:30,974 INFO [train.py:1031] (0/4) Epoch 14, batch 3450, loss[loss=0.2795, simple_loss=0.3377, pruned_loss=0.08125, ctc_loss=0.1469, over 16614.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2873, pruned_loss=0.06395, ctc_loss=0.1127, over 3300627.58 frames. ], batch size: 351, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:34:33,568 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2744849.3333333335, ans=0.2 2023-10-09 11:34:42,454 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:34:42,473 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2744896.0, ans=0.125 2023-10-09 11:34:43,827 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2023-10-09 11:34:58,536 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2744942.6666666665, ans=0.0 2023-10-09 11:35:27,943 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2745036.0, ans=15.0 2023-10-09 11:35:32,419 INFO [train.py:1031] (0/4) Epoch 14, batch 3500, loss[loss=0.2132, simple_loss=0.2691, pruned_loss=0.05976, ctc_loss=0.09433, over 16793.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2835, pruned_loss=0.06123, ctc_loss=0.108, over 3304576.37 frames. ], batch size: 202, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 11:35:35,436 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2745082.6666666665, ans=0.0 2023-10-09 11:35:38,082 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2745082.6666666665, ans=0.125 2023-10-09 11:35:50,043 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.34 vs. 
limit=15.0 2023-10-09 11:35:53,379 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2745129.3333333335, ans=0.125 2023-10-09 11:36:10,034 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2745222.6666666665, ans=0.2 2023-10-09 11:36:11,530 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2745222.6666666665, ans=0.0 2023-10-09 11:36:12,921 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.69 vs. limit=15.0 2023-10-09 11:36:15,385 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2745222.6666666665, ans=0.125 2023-10-09 11:36:22,614 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2745269.3333333335, ans=0.125 2023-10-09 11:36:28,663 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 2.946e+02 3.396e+02 4.316e+02 6.919e+02, threshold=6.792e+02, percent-clipped=0.0 2023-10-09 11:36:31,829 INFO [train.py:1031] (0/4) Epoch 14, batch 3550, loss[loss=0.1692, simple_loss=0.2158, pruned_loss=0.04558, ctc_loss=0.07885, over 16183.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2765, pruned_loss=0.05972, ctc_loss=0.1052, over 3305348.10 frames. ], batch size: 70, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:36:34,180 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.38 vs. limit=15.0 2023-10-09 11:36:47,194 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2745362.6666666665, ans=0.125 2023-10-09 11:36:55,087 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2745409.3333333335, ans=0.125 2023-10-09 11:37:29,373 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2745502.6666666665, ans=0.125 2023-10-09 11:37:31,032 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2745502.6666666665, ans=0.125 2023-10-09 11:37:32,881 INFO [train.py:1031] (0/4) Epoch 14, batch 3600, loss[loss=0.2015, simple_loss=0.2505, pruned_loss=0.05622, ctc_loss=0.1002, over 16783.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2696, pruned_loss=0.05934, ctc_loss=0.1042, over 3309528.28 frames. 
], batch size: 215, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:37:44,856 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2745596.0, ans=0.2
2023-10-09 11:38:21,645 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2745736.0, ans=0.0
2023-10-09 11:38:31,939 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+02 3.169e+02 3.614e+02 4.285e+02 9.204e+02, threshold=7.228e+02, percent-clipped=2.0
2023-10-09 11:38:32,339 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2745782.6666666665, ans=0.0
2023-10-09 11:38:33,632 INFO [train.py:1031] (0/4) Epoch 14, batch 3650, loss[loss=0.2091, simple_loss=0.2555, pruned_loss=0.06057, ctc_loss=0.104, over 16622.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2663, pruned_loss=0.06058, ctc_loss=0.1062, over 3311101.79 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:38:41,261 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0
2023-10-09 11:38:45,783 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2745829.3333333335, ans=0.125
2023-10-09 11:38:45,875 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2745829.3333333335, ans=0.05
2023-10-09 11:39:14,816 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2745922.6666666665, ans=0.125
2023-10-09 11:39:15,823 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2745922.6666666665, ans=0.0
2023-10-09 11:39:36,568 INFO [train.py:1031] (0/4) Epoch 14, batch 3700, loss[loss=0.3373, simple_loss=0.3524, pruned_loss=0.1172, ctc_loss=0.2196, over 16639.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2719, pruned_loss=0.06383, ctc_loss=0.1117, over 3299928.23 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:39:53,217 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2746062.6666666665, ans=0.2
2023-10-09 11:39:57,634 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=12.0
2023-10-09 11:39:58,293 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2746062.6666666665, ans=0.125
2023-10-09 11:40:12,267 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2746109.3333333335, ans=0.125
2023-10-09 11:40:12,337 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2746109.3333333335, ans=0.125
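The scaling.py:199 lines that dominate this log print ScheduledFloat values: module hyperparameters such as dropout_p, the various *_skip_rate fields, and balancer probabilities whose current value ("ans") is a function of the global batch_count. One common scheme consistent with these lines is a piecewise-linear schedule over batch count; the sketch below implements that scheme with invented breakpoints, and the real class presumably carries more machinery (defaults, clamping, per-module state).

    class ScheduledFloatSketch:
        # Hedged sketch of a batch-count-driven schedule like the
        # "ScheduledFloat: name=..., batch_count=..., ans=..." lines above:
        # piecewise-linear interpolation between (batch_count, value) points.
        def __init__(self, *points: tuple[float, float]):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Illustrative breakpoints only: a skip rate decaying from 0.3 to 0.0 over
    # the first 20k batches would log ans=0.0 at batch_count ~2.7e6, as above.
    sched = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.0))
    assert sched.value(2745596.0) == 0.0

This would explain why, deep into training at batch_count ~2.7 million, most skip rates log as ans=0.0 while quantities like balancer prob sit at their floor values (e.g. ans=0.125).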
2023-10-09 11:40:40,095 INFO [train.py:1031] (0/4) Epoch 14, batch 3750, loss[loss=0.244, simple_loss=0.2918, pruned_loss=0.07365, ctc_loss=0.1222, over 16757.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2787, pruned_loss=0.06662, ctc_loss=0.1169, over 3298083.47 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 1.0
2023-10-09 11:40:41,104 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+02 3.323e+02 3.693e+02 4.050e+02 7.078e+02, threshold=7.386e+02, percent-clipped=0.0
2023-10-09 11:41:00,884 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2746296.0, ans=0.0
2023-10-09 11:41:07,667 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2746342.6666666665, ans=0.2
2023-10-09 11:41:43,228 INFO [train.py:1031] (0/4) Epoch 14, batch 3800, loss[loss=0.2144, simple_loss=0.2661, pruned_loss=0.05887, ctc_loss=0.1121, over 16162.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2856, pruned_loss=0.06908, ctc_loss=0.1212, over 3294474.00 frames. ], batch size: 463, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:42:12,337 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2746576.0, ans=0.2
2023-10-09 11:42:25,138 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=22.5
2023-10-09 11:42:26,392 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0
2023-10-09 11:42:35,751 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0
2023-10-09 11:42:44,571 INFO [train.py:1031] (0/4) Epoch 14, batch 3850, loss[loss=0.2307, simple_loss=0.2665, pruned_loss=0.07112, ctc_loss=0.1317, over 16706.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.281, pruned_loss=0.06784, ctc_loss=0.1191, over 3292971.91 frames. ], batch size: 309, lr: 2.59e-03, grad_scale: 1.0
2023-10-09 11:42:47,252 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.380e+02 3.112e+02 3.545e+02 3.960e+02 7.617e+02, threshold=7.089e+02, percent-clipped=1.0
2023-10-09 11:42:47,929 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.54 vs. limit=15.0
2023-10-09 11:42:58,401 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2746762.6666666665, ans=0.1
2023-10-09 11:43:31,759 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2746856.0, ans=0.2
2023-10-09 11:43:46,339 INFO [train.py:1031] (0/4) Epoch 14, batch 3900, loss[loss=0.2127, simple_loss=0.2889, pruned_loss=0.04901, ctc_loss=0.09615, over 16763.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2801, pruned_loss=0.06556, ctc_loss=0.1154, over 3292618.27 frames.
], batch size: 272, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:43:46,737 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2746949.3333333335, ans=0.125 2023-10-09 11:44:07,114 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2746996.0, ans=0.125 2023-10-09 11:44:09,398 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2747042.6666666665, ans=0.0 2023-10-09 11:44:10,480 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2747042.6666666665, ans=0.125 2023-10-09 11:44:16,941 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2747042.6666666665, ans=0.125 2023-10-09 11:44:24,615 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:44:40,733 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2747136.0, ans=0.1 2023-10-09 11:44:47,994 INFO [train.py:1031] (0/4) Epoch 14, batch 3950, loss[loss=0.2043, simple_loss=0.2576, pruned_loss=0.05547, ctc_loss=0.1001, over 16788.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2781, pruned_loss=0.06473, ctc_loss=0.1135, over 3291167.11 frames. ], batch size: 258, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:44:50,714 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+02 3.084e+02 3.430e+02 4.061e+02 1.180e+03, threshold=6.860e+02, percent-clipped=1.0 2023-10-09 11:44:51,116 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:44:53,222 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2747182.6666666665, ans=0.1 2023-10-09 11:45:16,557 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2747276.0, ans=15.0 2023-10-09 11:45:32,808 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2747322.6666666665, ans=0.125 2023-10-09 11:45:40,522 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2747369.3333333335, ans=0.2 2023-10-09 11:45:44,650 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0 2023-10-09 11:45:45,481 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:45:49,252 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2747416.0, ans=0.0 2023-10-09 11:45:50,017 INFO [train.py:1031] (0/4) Epoch 14, batch 4000, loss[loss=0.2613, simple_loss=0.3134, pruned_loss=0.07815, ctc_loss=0.1321, over 16739.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2773, pruned_loss=0.06511, ctc_loss=0.1141, over 3288278.49 frames. 
], batch size: 111, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:45:55,911 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=22.5 2023-10-09 11:46:02,680 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2747462.6666666665, ans=0.0 2023-10-09 11:46:04,570 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2747462.6666666665, ans=0.125 2023-10-09 11:46:10,869 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2747462.6666666665, ans=0.125 2023-10-09 11:46:11,244 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=22.5 2023-10-09 11:46:11,942 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2747462.6666666665, ans=0.0 2023-10-09 11:46:23,815 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2747509.3333333335, ans=0.2 2023-10-09 11:46:27,015 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2747556.0, ans=0.1 2023-10-09 11:46:30,691 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2747556.0, ans=0.125 2023-10-09 11:46:35,697 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2747556.0, ans=0.125 2023-10-09 11:46:44,184 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2747602.6666666665, ans=0.125 2023-10-09 11:46:51,638 INFO [train.py:1031] (0/4) Epoch 14, batch 4050, loss[loss=0.2627, simple_loss=0.3012, pruned_loss=0.08234, ctc_loss=0.1488, over 16751.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2819, pruned_loss=0.06733, ctc_loss=0.1178, over 3292922.76 frames. ], batch size: 353, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:46:54,422 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.332e+02 3.986e+02 4.544e+02 6.934e+02, threshold=7.972e+02, percent-clipped=1.0 2023-10-09 11:46:55,749 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2747649.3333333335, ans=0.125 2023-10-09 11:47:02,075 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2747649.3333333335, ans=0.0 2023-10-09 11:47:07,192 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. 
limit=6.0 2023-10-09 11:47:10,631 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2747696.0, ans=0.125 2023-10-09 11:47:14,823 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2747696.0, ans=0.0 2023-10-09 11:47:45,063 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2747836.0, ans=0.2 2023-10-09 11:47:52,883 INFO [train.py:1031] (0/4) Epoch 14, batch 4100, loss[loss=0.2205, simple_loss=0.2813, pruned_loss=0.05931, ctc_loss=0.1025, over 16790.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2816, pruned_loss=0.06774, ctc_loss=0.1181, over 3296728.90 frames. ], batch size: 272, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:47:54,283 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:47:54,305 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2747882.6666666665, ans=0.0 2023-10-09 11:48:08,956 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0 2023-10-09 11:48:17,652 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2747976.0, ans=0.125 2023-10-09 11:48:39,768 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2748022.6666666665, ans=0.125 2023-10-09 11:48:54,162 INFO [train.py:1031] (0/4) Epoch 14, batch 4150, loss[loss=0.2084, simple_loss=0.2636, pruned_loss=0.05647, ctc_loss=0.1004, over 16841.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2811, pruned_loss=0.06609, ctc_loss=0.1151, over 3300411.87 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:48:57,964 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+02 3.212e+02 3.640e+02 4.119e+02 7.384e+02, threshold=7.280e+02, percent-clipped=0.0 2023-10-09 11:49:04,793 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2748116.0, ans=0.125 2023-10-09 11:49:12,345 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2748162.6666666665, ans=0.2 2023-10-09 11:49:23,083 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2748209.3333333335, ans=0.125 2023-10-09 11:49:28,588 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2748209.3333333335, ans=0.125 2023-10-09 11:49:30,205 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2748209.3333333335, ans=0.95 2023-10-09 11:49:30,487 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.58 vs. 
2023-10-09 11:49:52,991 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2748302.6666666665, ans=0.0
2023-10-09 11:49:56,389 INFO [train.py:1031] (0/4) Epoch 14, batch 4200, loss[loss=0.2002, simple_loss=0.2604, pruned_loss=0.05117, ctc_loss=0.09408, over 16930.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2773, pruned_loss=0.064, ctc_loss=0.1107, over 3309731.98 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:50:02,779 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=2748349.3333333335, ans=12.0
2023-10-09 11:50:14,918 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2748396.0, ans=0.1
2023-10-09 11:50:21,458 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2748442.6666666665, ans=0.0
2023-10-09 11:50:33,061 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2748489.3333333335, ans=0.05
2023-10-09 11:50:39,838 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2748489.3333333335, ans=0.125
2023-10-09 11:50:40,889 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2748489.3333333335, ans=0.125
2023-10-09 11:50:55,433 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2748582.6666666665, ans=0.125
2023-10-09 11:50:56,840 INFO [train.py:1031] (0/4) Epoch 14, batch 4250, loss[loss=0.284, simple_loss=0.3038, pruned_loss=0.09687, ctc_loss=0.1764, over 16841.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2793, pruned_loss=0.06564, ctc_loss=0.1135, over 3319043.49 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:51:03,484 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+02 3.251e+02 3.808e+02 4.615e+02 8.624e+02, threshold=7.616e+02, percent-clipped=2.0
2023-10-09 11:51:22,826 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2748676.0, ans=0.0
2023-10-09 11:51:32,092 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2748676.0, ans=0.0
2023-10-09 11:51:44,571 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:51:58,813 INFO [scaling.py:979] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. limit=5.0
2023-10-09 11:51:58,943 INFO [train.py:1031] (0/4) Epoch 14, batch 4300, loss[loss=0.2589, simple_loss=0.3289, pruned_loss=0.06943, ctc_loss=0.1253, over 16978.00 frames. ], tot_loss[loss=0.2324, simple_loss=0.2856, pruned_loss=0.06656, ctc_loss=0.1154, over 3308472.81 frames. ], batch size: 258, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:52:08,692 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2748816.0, ans=0.0
2023-10-09 11:52:09,845 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2748816.0, ans=0.125
2023-10-09 11:52:11,077 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2748862.6666666665, ans=0.125
2023-10-09 11:52:23,355 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2748862.6666666665, ans=0.0
2023-10-09 11:52:29,356 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5
2023-10-09 11:52:35,474 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2748909.3333333335, ans=0.125
2023-10-09 11:52:36,592 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2748909.3333333335, ans=0.125
2023-10-09 11:52:38,737 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2748956.0, ans=0.125
2023-10-09 11:52:44,545 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=12.0
2023-10-09 11:52:58,434 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2749002.6666666665, ans=0.125
2023-10-09 11:53:04,760 INFO [train.py:1031] (0/4) Epoch 14, batch 4350, loss[loss=0.2218, simple_loss=0.2569, pruned_loss=0.06951, ctc_loss=0.1193, over 16534.00 frames. ], tot_loss[loss=0.2393, simple_loss=0.2919, pruned_loss=0.06932, ctc_loss=0.1199, over 3292105.40 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:53:11,362 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+02 3.477e+02 4.126e+02 5.175e+02 8.890e+02, threshold=8.251e+02, percent-clipped=2.0
2023-10-09 11:53:13,841 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2749049.3333333335, ans=0.0
2023-10-09 11:53:35,392 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0
2023-10-09 11:54:06,537 INFO [train.py:1031] (0/4) Epoch 14, batch 4400, loss[loss=0.1995, simple_loss=0.2603, pruned_loss=0.05269, ctc_loss=0.08338, over 16874.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.287, pruned_loss=0.06838, ctc_loss=0.1174, over 3286228.60 frames. ], batch size: 215, lr: 2.59e-03, grad_scale: 4.0
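
Note on the per-batch train.py:1031 lines: each reports a current-batch loss[...] and a running tot_loss[...]. Plugging the logged components into the scales from the run configuration at the top of this log (simple_loss_scale=0.5, ctc_loss_scale=0.2) reproduces the leading loss figure; the combination below is inferred from the numbers, not quoted from train.py.

```python
# Inferred combination: loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss
def combined_loss(simple_loss, pruned_loss, ctc_loss,
                  simple_scale=0.5, ctc_scale=0.2):
    return simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss

# Check against the batch 4050 tot_loss earlier in this log:
val = combined_loss(0.2819, 0.06733, 0.1178)
print(f"{val:.4f}")  # 0.2318, matching the logged loss=0.2319 up to rounding
```
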
2023-10-09 11:54:19,220 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2749329.3333333335, ans=0.125
2023-10-09 11:54:22,231 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2749329.3333333335, ans=0.125
2023-10-09 11:54:32,572 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2749376.0, ans=0.5
2023-10-09 11:54:44,418 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2749422.6666666665, ans=0.125
2023-10-09 11:55:08,713 INFO [train.py:1031] (0/4) Epoch 14, batch 4450, loss[loss=0.3006, simple_loss=0.3159, pruned_loss=0.1057, ctc_loss=0.185, over 16833.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2825, pruned_loss=0.06677, ctc_loss=0.114, over 3298450.07 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:55:16,085 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.485e+02 3.136e+02 3.606e+02 4.303e+02 6.153e+02, threshold=7.211e+02, percent-clipped=0.0
2023-10-09 11:55:39,640 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2749609.3333333335, ans=0.1
2023-10-09 11:55:42,820 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0
2023-10-09 11:56:07,466 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.96 vs. limit=15.0
2023-10-09 11:56:10,359 INFO [train.py:1031] (0/4) Epoch 14, batch 4500, loss[loss=0.2498, simple_loss=0.3063, pruned_loss=0.07133, ctc_loss=0.1264, over 16898.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2822, pruned_loss=0.0671, ctc_loss=0.1148, over 3300946.27 frames. ], batch size: 176, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:56:19,831 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2749749.3333333335, ans=0.0
2023-10-09 11:56:25,705 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:56:33,811 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2749796.0, ans=0.1
2023-10-09 11:56:56,632 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2749889.3333333335, ans=0.125
2023-10-09 11:56:57,766 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2749889.3333333335, ans=0.125
2023-10-09 11:57:04,858 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2749936.0, ans=0.125
2023-10-09 11:57:07,258 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=22.5
2023-10-09 11:57:08,908 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2749936.0, ans=0.125
2023-10-09 11:57:12,371 INFO [train.py:1031] (0/4) Epoch 14, batch 4550, loss[loss=0.2323, simple_loss=0.3094, pruned_loss=0.05777, ctc_loss=0.09921, over 16928.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2865, pruned_loss=0.06625, ctc_loss=0.1138, over 3306948.92 frames. ], batch size: 243, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:57:20,843 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+02 3.133e+02 3.571e+02 4.090e+02 7.081e+02, threshold=7.142e+02, percent-clipped=0.0
2023-10-09 11:57:24,382 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2750029.3333333335, ans=0.125
2023-10-09 11:58:10,676 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2750169.3333333335, ans=0.0
2023-10-09 11:58:15,117 INFO [train.py:1031] (0/4) Epoch 14, batch 4600, loss[loss=0.2101, simple_loss=0.2776, pruned_loss=0.05267, ctc_loss=0.09307, over 16730.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2898, pruned_loss=0.06649, ctc_loss=0.1146, over 3309336.14 frames. ], batch size: 140, lr: 2.59e-03, grad_scale: 8.0
2023-10-09 11:58:19,734 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2750216.0, ans=0.0
2023-10-09 11:58:21,157 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.01 vs. limit=22.5
2023-10-09 11:58:25,410 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2750216.0, ans=0.1
2023-10-09 11:58:28,099 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2750262.6666666665, ans=0.125
2023-10-09 11:58:34,862 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2023-10-09 11:59:09,512 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=12.0
2023-10-09 11:59:18,164 INFO [train.py:1031] (0/4) Epoch 14, batch 4650, loss[loss=0.1958, simple_loss=0.3029, pruned_loss=0.03161, ctc_loss=0.06359, over 16292.00 frames. ], tot_loss[loss=0.2373, simple_loss=0.2932, pruned_loss=0.0674, ctc_loss=0.1168, over 3299549.28 frames. ], batch size: 463, lr: 2.59e-03, grad_scale: 2.0
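
Note on the optim.py:471 lines: the five numbers after "grad-norm quartiles" are order statistics of recent gradient norms (min, 25%, median, 75%, max), and in every entry in this log the threshold equals Clipping_scale times the median (for batch 4550 above, 2.0 x 3.571e+02 = 7.142e+02); percent-clipped tracks how often recent norms exceeded it. A small sketch of that bookkeeping, with hypothetical helper names:

```python
import torch

def grad_norm_stats(norms: torch.Tensor, clipping_scale: float = 2.0):
    """norms: 1-D tensor of recent per-batch gradient norms."""
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]           # scale * median
    pct_clipped = 100.0 * (norms > threshold).float().mean()
    return q, threshold, pct_clipped

# The batch 4050 quartiles from earlier in the log:
norms = torch.tensor([241.7, 333.2, 398.6, 454.4, 693.4])
q, thr, pct = grad_norm_stats(norms)
print(thr.item())  # 797.2, i.e. the logged threshold=7.972e+02
```
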
2023-10-09 11:59:23,228 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 11:59:26,057 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2750449.3333333335, ans=0.125
2023-10-09 11:59:28,967 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.652e+02 3.249e+02 3.763e+02 4.381e+02 6.611e+02, threshold=7.526e+02, percent-clipped=0.0
2023-10-09 11:59:58,273 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2750589.3333333335, ans=0.125
2023-10-09 12:00:02,476 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.11 vs. limit=15.0
2023-10-09 12:00:07,048 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=15.0
2023-10-09 12:00:08,985 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2750636.0, ans=0.2
2023-10-09 12:00:09,003 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2750636.0, ans=0.2
2023-10-09 12:00:14,492 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2750636.0, ans=0.125
2023-10-09 12:00:19,912 INFO [train.py:1031] (0/4) Epoch 14, batch 4700, loss[loss=0.211, simple_loss=0.2686, pruned_loss=0.05729, ctc_loss=0.09697, over 16787.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2895, pruned_loss=0.06341, ctc_loss=0.1104, over 3309141.91 frames. ], batch size: 164, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:00:27,672 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2750682.6666666665, ans=0.125
2023-10-09 12:00:51,285 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2750776.0, ans=0.125
2023-10-09 12:01:00,913 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2750822.6666666665, ans=0.125
2023-10-09 12:01:21,796 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2750916.0, ans=0.125
2023-10-09 12:01:22,369 INFO [train.py:1031] (0/4) Epoch 14, batch 4750, loss[loss=0.2554, simple_loss=0.3243, pruned_loss=0.06733, ctc_loss=0.1295, over 16834.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2904, pruned_loss=0.06385, ctc_loss=0.1115, over 3303401.88 frames. ], batch size: 328, lr: 2.59e-03, grad_scale: 1.0
2023-10-09 12:01:22,722 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2750916.0, ans=0.125
2023-10-09 12:01:35,194 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.335e+02 3.139e+02 3.749e+02 4.382e+02 2.421e+03, threshold=7.497e+02, percent-clipped=2.0
2023-10-09 12:01:40,479 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2750962.6666666665, ans=0.0
2023-10-09 12:01:41,652 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:01:46,071 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2751009.3333333335, ans=0.0
2023-10-09 12:02:09,980 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.32 vs. limit=15.0
2023-10-09 12:02:21,907 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2751102.6666666665, ans=0.07
2023-10-09 12:02:24,043 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2751149.3333333335, ans=0.0
2023-10-09 12:02:24,701 INFO [train.py:1031] (0/4) Epoch 14, batch 4800, loss[loss=0.2298, simple_loss=0.2825, pruned_loss=0.06552, ctc_loss=0.1151, over 16749.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2917, pruned_loss=0.06322, ctc_loss=0.1108, over 3312905.40 frames. ], batch size: 140, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 12:02:29,352 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.94 vs. limit=15.0
2023-10-09 12:02:49,004 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2751242.6666666665, ans=0.1
2023-10-09 12:03:09,613 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:03:11,102 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0
2023-10-09 12:03:12,349 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2751289.3333333335, ans=0.035
2023-10-09 12:03:15,201 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2751336.0, ans=0.125
2023-10-09 12:03:20,713 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2751336.0, ans=0.0
2023-10-09 12:03:28,563 INFO [train.py:1031] (0/4) Epoch 14, batch 4850, loss[loss=0.2405, simple_loss=0.2833, pruned_loss=0.07145, ctc_loss=0.1371, over 15241.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2964, pruned_loss=0.06669, ctc_loss=0.1173, over 3308997.79 frames. ], batch size: 529, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 12:03:28,840 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2751382.6666666665, ans=0.0
2023-10-09 12:03:38,272 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2751382.6666666665, ans=0.125
2023-10-09 12:03:42,635 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.280e+02 3.688e+02 4.479e+02 9.310e+02, threshold=7.375e+02, percent-clipped=2.0
2023-10-09 12:03:58,299 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2751476.0, ans=0.0
2023-10-09 12:04:00,273 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0
2023-10-09 12:04:11,449 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2751522.6666666665, ans=0.0
2023-10-09 12:04:17,070 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2751522.6666666665, ans=0.0
2023-10-09 12:04:31,378 INFO [train.py:1031] (0/4) Epoch 14, batch 4900, loss[loss=0.2577, simple_loss=0.3233, pruned_loss=0.06986, ctc_loss=0.131, over 16518.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.2955, pruned_loss=0.06566, ctc_loss=0.1155, over 3305722.94 frames. ], batch size: 416, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 12:04:59,107 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2751709.3333333335, ans=0.125
2023-10-09 12:05:05,616 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2751709.3333333335, ans=0.0
2023-10-09 12:05:13,794 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.99 vs. limit=15.0
2023-10-09 12:05:36,550 INFO [train.py:1031] (0/4) Epoch 14, batch 4950, loss[loss=0.2508, simple_loss=0.3026, pruned_loss=0.07368, ctc_loss=0.1291, over 16849.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2964, pruned_loss=0.0667, ctc_loss=0.117, over 3310158.78 frames. ], batch size: 291, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 12:05:51,600 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.251e+02 3.637e+02 4.222e+02 8.685e+02, threshold=7.275e+02, percent-clipped=2.0
2023-10-09 12:05:51,931 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2751896.0, ans=0.04949747468305833
2023-10-09 12:06:13,501 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2751989.3333333335, ans=0.0
2023-10-09 12:06:28,505 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2752036.0, ans=0.0
2023-10-09 12:06:35,238 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2752036.0, ans=0.0
2023-10-09 12:06:36,696 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5
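
Note on the scaling.py:979 "Whitening: ... metric=... vs. limit=..." lines: these come from a constraint that pushes feature covariance toward identity ("white") and only intervenes when a measured anisotropy metric exceeds the logged limit. The metric below (mean squared eigenvalue of the per-group covariance over the squared mean eigenvalue) is one plausible formulation used for illustration, not necessarily the exact icefall formula.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """x: (num_frames, num_channels); channels split into num_groups."""
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n      # (groups, d, d)
    eigs = torch.linalg.eigvalsh(cov)                 # per-group eigenvalues
    # equals 1.0 when all eigenvalues match (perfectly white features),
    # and grows as the covariance becomes more anisotropic
    metric = (eigs ** 2).mean() / (eigs.mean() ** 2)
    return metric.item()

x = torch.randn(1000, 256)
print(whitening_metric(x, num_groups=1))  # modestly above 1.0 for random data
```
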
2023-10-09 12:06:39,358 INFO [train.py:1031] (0/4) Epoch 14, batch 5000, loss[loss=0.2577, simple_loss=0.2783, pruned_loss=0.08776, ctc_loss=0.154, over 16500.00 frames. ], tot_loss[loss=0.2366, simple_loss=0.2915, pruned_loss=0.06735, ctc_loss=0.1178, over 3304286.41 frames. ], batch size: 351, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:06:39,760 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:06:44,007 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2752082.6666666665, ans=0.0
2023-10-09 12:06:55,436 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2752129.3333333335, ans=0.1
2023-10-09 12:07:02,947 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2752176.0, ans=0.2
2023-10-09 12:07:41,581 INFO [train.py:1031] (0/4) Epoch 14, batch 5050, loss[loss=0.2328, simple_loss=0.3051, pruned_loss=0.05788, ctc_loss=0.1117, over 16808.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2852, pruned_loss=0.06543, ctc_loss=0.1148, over 3305488.71 frames. ], batch size: 329, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:07:47,348 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2752316.0, ans=0.1
2023-10-09 12:07:56,490 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.463e+02 3.308e+02 3.761e+02 4.513e+02 1.207e+03, threshold=7.522e+02, percent-clipped=1.0
2023-10-09 12:07:57,839 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2752362.6666666665, ans=0.1
2023-10-09 12:07:58,867 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2752362.6666666665, ans=0.2
2023-10-09 12:07:59,848 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2752362.6666666665, ans=0.0
2023-10-09 12:08:08,932 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2752409.3333333335, ans=0.125
2023-10-09 12:08:17,672 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2752456.0, ans=0.125
2023-10-09 12:08:42,048 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.39 vs. limit=10.0
2023-10-09 12:08:42,500 INFO [train.py:1031] (0/4) Epoch 14, batch 5100, loss[loss=0.2831, simple_loss=0.3318, pruned_loss=0.0854, ctc_loss=0.1592, over 16830.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2896, pruned_loss=0.06468, ctc_loss=0.114, over 3299827.39 frames. ], batch size: 329, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:08:49,274 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2752549.3333333335, ans=0.0
2023-10-09 12:08:51,999 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2752549.3333333335, ans=0.125
2023-10-09 12:08:55,639 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2752596.0, ans=0.1
2023-10-09 12:09:28,275 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2752689.3333333335, ans=0.125
2023-10-09 12:09:39,459 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2752736.0, ans=0.125
2023-10-09 12:09:43,373 INFO [train.py:1031] (0/4) Epoch 14, batch 5150, loss[loss=0.2291, simple_loss=0.2719, pruned_loss=0.06912, ctc_loss=0.1203, over 16858.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.291, pruned_loss=0.06639, ctc_loss=0.1167, over 3301437.78 frames. ], batch size: 176, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 12:10:00,362 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.450e+02 3.267e+02 3.761e+02 4.649e+02 7.424e+02, threshold=7.522e+02, percent-clipped=0.0
2023-10-09 12:10:02,343 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2752829.3333333335, ans=0.1
2023-10-09 12:10:06,031 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2752829.3333333335, ans=0.0
2023-10-09 12:10:07,978 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0
2023-10-09 12:10:21,389 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2752922.6666666665, ans=0.125
2023-10-09 12:10:45,573 INFO [train.py:1031] (0/4) Epoch 14, batch 5200, loss[loss=0.2466, simple_loss=0.3086, pruned_loss=0.06701, ctc_loss=0.1265, over 16468.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.292, pruned_loss=0.06623, ctc_loss=0.1168, over 3302743.53 frames. ], batch size: 416, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:10:55,219 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2753016.0, ans=0.0
2023-10-09 12:10:59,081 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2753062.6666666665, ans=0.125
2023-10-09 12:11:00,009 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2753062.6666666665, ans=0.1
2023-10-09 12:11:04,233 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2753062.6666666665, ans=0.125
2023-10-09 12:11:09,110 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2753062.6666666665, ans=0.125
2023-10-09 12:11:12,336 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2753109.3333333335, ans=0.1
2023-10-09 12:11:14,327 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2753109.3333333335, ans=0.125
2023-10-09 12:11:19,157 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2753109.3333333335, ans=0.125
2023-10-09 12:11:39,794 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2753202.6666666665, ans=0.1
2023-10-09 12:11:41,942 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2753202.6666666665, ans=0.2
2023-10-09 12:11:47,486 INFO [train.py:1031] (0/4) Epoch 14, batch 5250, loss[loss=0.1937, simple_loss=0.2497, pruned_loss=0.05127, ctc_loss=0.08771, over 16674.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2846, pruned_loss=0.06499, ctc_loss=0.1143, over 3301989.91 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 12:12:05,605 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+02 2.914e+02 3.261e+02 3.772e+02 6.960e+02, threshold=6.522e+02, percent-clipped=0.0
2023-10-09 12:12:35,849 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2753436.0, ans=0.2
2023-10-09 12:12:39,129 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2753436.0, ans=0.2
2023-10-09 12:12:44,602 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2753436.0, ans=0.5
2023-10-09 12:12:48,625 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:12:49,312 INFO [train.py:1031] (0/4) Epoch 14, batch 5300, loss[loss=0.2762, simple_loss=0.3431, pruned_loss=0.07588, ctc_loss=0.1435, over 16837.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.2898, pruned_loss=0.06769, ctc_loss=0.1187, over 3301336.02 frames. ], batch size: 242, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:12:49,724 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2753482.6666666665, ans=0.125
2023-10-09 12:12:54,214 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2753482.6666666665, ans=0.95
2023-10-09 12:12:58,759 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0
2023-10-09 12:13:06,226 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2753529.3333333335, ans=0.0
2023-10-09 12:13:12,416 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2753529.3333333335, ans=0.125
2023-10-09 12:13:24,510 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2753576.0, ans=0.125
2023-10-09 12:13:34,144 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2753622.6666666665, ans=0.125
2023-10-09 12:13:38,105 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2753622.6666666665, ans=0.125
2023-10-09 12:13:42,550 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=22.5
2023-10-09 12:13:51,900 INFO [train.py:1031] (0/4) Epoch 14, batch 5350, loss[loss=0.2592, simple_loss=0.3202, pruned_loss=0.07329, ctc_loss=0.129, over 16775.00 frames. ], tot_loss[loss=0.2458, simple_loss=0.2998, pruned_loss=0.07101, ctc_loss=0.1243, over 3301298.68 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 12:14:12,278 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.795e+02 3.646e+02 4.307e+02 5.553e+02 1.031e+03, threshold=8.614e+02, percent-clipped=13.0
2023-10-09 12:14:12,664 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2753762.6666666665, ans=0.1
2023-10-09 12:14:22,450 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0
2023-10-09 12:14:28,870 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2753856.0, ans=0.0
2023-10-09 12:14:43,993 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2753902.6666666665, ans=0.0
2023-10-09 12:14:44,970 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2753902.6666666665, ans=0.0
2023-10-09 12:14:51,873 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0
2023-10-09 12:14:52,590 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2753902.6666666665, ans=0.2
2023-10-09 12:14:54,878 INFO [train.py:1031] (0/4) Epoch 14, batch 5400, loss[loss=0.213, simple_loss=0.2912, pruned_loss=0.05053, ctc_loss=0.08468, over 16743.00 frames. ], tot_loss[loss=0.2456, simple_loss=0.3, pruned_loss=0.07083, ctc_loss=0.1239, over 3297236.75 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 2.0
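
Note on the balancer entries: lines such as "balancer...prob, ans=0.125", "balancer.max_positive, ans=0.95", "balancer2.min_positive, ans=0.05", "balancer2.min_abs, ans=0.5" and "balancer1.max_abs, ans=10.0" describe a Balancer that nudges per-channel activation statistics back into a target range, with "prob" being the probability that the (relatively expensive) check runs on a given batch. Below is a toy, forward-only view of the statistics it inspects; the icefall module works through autograd, so this is an illustration only.

```python
import torch

def balancer_violations(x: torch.Tensor,
                        min_positive=0.05, max_positive=0.95,
                        min_abs=0.5, max_abs=10.0):
    """x: (num_frames, num_channels); returns per-channel violation masks."""
    frac_positive = (x > 0).float().mean(dim=0)   # fraction of positive values
    mean_abs = x.abs().mean(dim=0)                # per-channel mean magnitude
    return {
        "too_few_positive": frac_positive < min_positive,
        "too_many_positive": frac_positive > max_positive,
        "too_small": mean_abs < min_abs,
        "too_large": mean_abs > max_abs,
    }

torch.manual_seed(0)
stats = balancer_violations(torch.randn(500, 256))
print({k: int(v.sum()) for k, v in stats.items()})
```
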
2023-10-09 12:15:23,287 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2754042.6666666665, ans=0.125
2023-10-09 12:15:27,614 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2754042.6666666665, ans=0.1
2023-10-09 12:15:34,905 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2754089.3333333335, ans=10.0
2023-10-09 12:15:43,926 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2754136.0, ans=0.125
2023-10-09 12:15:48,905 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2754136.0, ans=0.2
2023-10-09 12:15:53,767 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2754136.0, ans=0.1
2023-10-09 12:15:55,502 INFO [train.py:1031] (0/4) Epoch 14, batch 5450, loss[loss=0.1957, simple_loss=0.2403, pruned_loss=0.0561, ctc_loss=0.09737, over 16632.00 frames. ], tot_loss[loss=0.2393, simple_loss=0.2921, pruned_loss=0.06906, ctc_loss=0.1208, over 3292745.92 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 12:16:04,927 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.45 vs. limit=6.0
2023-10-09 12:16:16,184 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+02 3.013e+02 3.420e+02 3.920e+02 8.304e+02, threshold=6.840e+02, percent-clipped=0.0
2023-10-09 12:16:16,946 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2754229.3333333335, ans=0.125
2023-10-09 12:16:19,613 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2754276.0, ans=0.05
2023-10-09 12:16:22,950 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2754276.0, ans=0.0
2023-10-09 12:16:23,107 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.79 vs. limit=10.0
2023-10-09 12:16:25,990 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2754276.0, ans=0.125
2023-10-09 12:16:35,750 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:16:48,077 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.38 vs. limit=22.5
2023-10-09 12:16:54,730 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2754369.3333333335, ans=0.125
2023-10-09 12:16:57,618 INFO [train.py:1031] (0/4) Epoch 14, batch 5500, loss[loss=0.2862, simple_loss=0.3106, pruned_loss=0.09787, ctc_loss=0.1649, over 16587.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.288, pruned_loss=0.06906, ctc_loss=0.1207, over 3298378.15 frames. ], batch size: 418, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:16:58,385 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=6.0
2023-10-09 12:17:10,548 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2754462.6666666665, ans=0.125
2023-10-09 12:17:17,826 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=15.0
2023-10-09 12:17:21,320 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2754509.3333333335, ans=0.1
2023-10-09 12:17:42,666 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2754556.0, ans=0.1
2023-10-09 12:17:50,097 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2754602.6666666665, ans=0.125
2023-10-09 12:17:58,544 INFO [train.py:1031] (0/4) Epoch 14, batch 5550, loss[loss=0.2169, simple_loss=0.2961, pruned_loss=0.0488, ctc_loss=0.1001, over 16875.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.2897, pruned_loss=0.06865, ctc_loss=0.1204, over 3300714.07 frames. ], batch size: 243, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:18:08,454 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:18:10,688 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2754696.0, ans=0.125
2023-10-09 12:18:18,621 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.568e+02 3.038e+02 3.521e+02 4.365e+02 6.662e+02, threshold=7.043e+02, percent-clipped=0.0
2023-10-09 12:18:22,109 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2754742.6666666665, ans=0.125
2023-10-09 12:18:24,236 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2754742.6666666665, ans=0.125
2023-10-09 12:18:33,145 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=15.0
2023-10-09 12:18:48,953 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2754836.0, ans=0.125
2023-10-09 12:18:51,587 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2754836.0, ans=0.125
2023-10-09 12:18:55,221 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2754836.0, ans=0.125
2023-10-09 12:18:59,758 INFO [train.py:1031] (0/4) Epoch 14, batch 5600, loss[loss=0.2035, simple_loss=0.2609, pruned_loss=0.05306, ctc_loss=0.09968, over 16917.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2878, pruned_loss=0.06664, ctc_loss=0.1173, over 3297030.99 frames. ], batch size: 202, lr: 2.59e-03, grad_scale: 8.0
2023-10-09 12:19:00,118 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2754882.6666666665, ans=0.125
2023-10-09 12:19:06,988 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2754882.6666666665, ans=0.125
2023-10-09 12:19:07,136 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.31 vs. limit=10.0
2023-10-09 12:20:00,859 INFO [train.py:1031] (0/4) Epoch 14, batch 5650, loss[loss=0.2299, simple_loss=0.293, pruned_loss=0.06255, ctc_loss=0.1039, over 16793.00 frames. ], tot_loss[loss=0.2324, simple_loss=0.2856, pruned_loss=0.06628, ctc_loss=0.1167, over 3301579.85 frames. ], batch size: 164, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:20:06,475 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2755116.0, ans=0.0
2023-10-09 12:20:08,460 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2755116.0, ans=0.2
2023-10-09 12:20:15,202 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2755162.6666666665, ans=0.125
2023-10-09 12:20:18,244 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=22.5
2023-10-09 12:20:21,984 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2755162.6666666665, ans=0.0
2023-10-09 12:20:22,587 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+02 3.073e+02 3.464e+02 4.035e+02 6.010e+02, threshold=6.928e+02, percent-clipped=0.0
2023-10-09 12:20:51,947 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2755302.6666666665, ans=0.0
2023-10-09 12:20:51,949 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2755302.6666666665, ans=0.125
2023-10-09 12:21:00,527 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2755349.3333333335, ans=10.0
2023-10-09 12:21:01,266 INFO [train.py:1031] (0/4) Epoch 14, batch 5700, loss[loss=0.2297, simple_loss=0.2625, pruned_loss=0.07069, ctc_loss=0.1387, over 15531.00 frames. ], tot_loss[loss=0.2317, simple_loss=0.2842, pruned_loss=0.06632, ctc_loss=0.1165, over 3306635.50 frames. ], batch size: 529, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:21:10,924 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:21:40,737 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2755489.3333333335, ans=0.125
2023-10-09 12:22:04,413 INFO [train.py:1031] (0/4) Epoch 14, batch 5750, loss[loss=0.2414, simple_loss=0.3055, pruned_loss=0.06547, ctc_loss=0.116, over 16251.00 frames. ], tot_loss[loss=0.225, simple_loss=0.28, pruned_loss=0.06285, ctc_loss=0.1109, over 3303262.83 frames. ], batch size: 463, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:22:28,327 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+02 3.061e+02 3.546e+02 4.294e+02 7.342e+02, threshold=7.092e+02, percent-clipped=2.0
2023-10-09 12:22:32,607 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2755676.0, ans=0.125
2023-10-09 12:22:43,704 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2755722.6666666665, ans=0.125
2023-10-09 12:22:56,888 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0
2023-10-09 12:23:07,367 INFO [train.py:1031] (0/4) Epoch 14, batch 5800, loss[loss=0.2464, simple_loss=0.2877, pruned_loss=0.07761, ctc_loss=0.1245, over 16555.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2831, pruned_loss=0.0644, ctc_loss=0.1137, over 3310223.22 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:23:09,701 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2755816.0, ans=0.0
2023-10-09 12:23:10,086 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.24 vs. limit=22.5
2023-10-09 12:23:11,259 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0
2023-10-09 12:23:16,935 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2755816.0, ans=0.04949747468305833
2023-10-09 12:23:23,792 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2755862.6666666665, ans=0.125
2023-10-09 12:23:29,285 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2755909.3333333335, ans=0.125
2023-10-09 12:23:52,440 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2755956.0, ans=0.125
2023-10-09 12:24:06,438 INFO [train.py:1031] (0/4) Epoch 14, batch 5850, loss[loss=0.215, simple_loss=0.251, pruned_loss=0.06789, ctc_loss=0.108, over 16814.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.2795, pruned_loss=0.06498, ctc_loss=0.1147, over 3301064.17 frames. ], batch size: 121, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:24:21,289 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2756096.0, ans=0.125
2023-10-09 12:24:31,816 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 3.138e+02 3.574e+02 4.171e+02 9.183e+02, threshold=7.147e+02, percent-clipped=2.0
2023-10-09 12:24:48,735 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=2756189.3333333335, ans=0.2
2023-10-09 12:25:05,812 INFO [train.py:1031] (0/4) Epoch 14, batch 5900, loss[loss=0.2223, simple_loss=0.2709, pruned_loss=0.06578, ctc_loss=0.1051, over 16503.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2754, pruned_loss=0.06476, ctc_loss=0.1137, over 3282881.64 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:25:35,810 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2756376.0, ans=0.125
2023-10-09 12:25:55,848 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0
2023-10-09 12:26:06,818 INFO [train.py:1031] (0/4) Epoch 14, batch 5950, loss[loss=0.2179, simple_loss=0.2683, pruned_loss=0.06221, ctc_loss=0.1077, over 16733.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.275, pruned_loss=0.0653, ctc_loss=0.1149, over 3281596.45 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 12:26:15,160 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2756516.0, ans=0.0
2023-10-09 12:26:17,283 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2756562.6666666665, ans=0.2
2023-10-09 12:26:31,791 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+02 3.192e+02 3.462e+02 4.093e+02 6.652e+02, threshold=6.925e+02, percent-clipped=0.0
2023-10-09 12:26:37,360 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:26:50,328 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2756656.0, ans=0.2
2023-10-09 12:27:06,816 INFO [train.py:1031] (0/4) Epoch 14, batch 6000, loss[loss=0.2023, simple_loss=0.3047, pruned_loss=0.03606, ctc_loss=0.0696, over 16312.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2803, pruned_loss=0.06468, ctc_loss=0.114, over 3275673.13 frames. ], batch size: 463, lr: 2.59e-03, grad_scale: 8.0
2023-10-09 12:27:06,817 INFO [train.py:1054] (0/4) Computing validation loss
2023-10-09 12:27:23,521 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2297, simple_loss=0.3012, pruned_loss=0.0607, ctc_loss=0.09172, over 1796401.00 frames.
2023-10-09 12:27:23,522 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14582MB
2023-10-09 12:27:45,468 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0
2023-10-09 12:27:46,228 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2756796.0, ans=0.025
2023-10-09 12:28:22,260 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2756936.0, ans=0.2
2023-10-09 12:28:23,223 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2756982.6666666665, ans=0.95
2023-10-09 12:28:23,964 INFO [train.py:1031] (0/4) Epoch 14, batch 6050, loss[loss=0.1904, simple_loss=0.2481, pruned_loss=0.04949, ctc_loss=0.08432, over 16755.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2731, pruned_loss=0.06079, ctc_loss=0.1075, over 3275108.19 frames. ], batch size: 242, lr: 2.59e-03, grad_scale: 2.0
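
Note on the validation entries at batch 6000: training pauses, a validation loss is computed over ~1.8M frames, peak memory is logged, and training resumes, mirroring the valid_interval mechanism in the run configuration. A minimal sketch of such a periodic hook, assuming a compute_loss(model, batch) helper and a valid_dl dataloader (both names are illustrative, not icefall's exact API):

```python
import torch

@torch.no_grad()
def run_validation(model, valid_dl, compute_loss):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in valid_dl:
        loss, num_frames = compute_loss(model, batch)
        tot_loss += loss.item() * num_frames   # frame-weighted sum
        tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames               # average loss per frame

# inside the training loop, gated on the batch counter:
# if batch_idx_train % valid_interval == 0:
#     valid_loss = run_validation(model, valid_dl, compute_loss)
```
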
], batch size: 242, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:28:31,877 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2756982.6666666665, ans=0.0 2023-10-09 12:28:41,676 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2757029.3333333335, ans=0.1 2023-10-09 12:28:49,956 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2757076.0, ans=0.0 2023-10-09 12:28:49,961 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2757076.0, ans=0.0 2023-10-09 12:28:52,470 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.832e+02 3.391e+02 4.153e+02 6.756e+02, threshold=6.782e+02, percent-clipped=0.0 2023-10-09 12:29:12,150 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2757169.3333333335, ans=0.0 2023-10-09 12:29:13,683 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=6.0 2023-10-09 12:29:24,169 INFO [train.py:1031] (0/4) Epoch 14, batch 6100, loss[loss=0.2166, simple_loss=0.271, pruned_loss=0.05999, ctc_loss=0.1053, over 16826.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2696, pruned_loss=0.06108, ctc_loss=0.108, over 3275276.01 frames. ], batch size: 121, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:29:29,485 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2757216.0, ans=0.0 2023-10-09 12:29:47,705 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2757262.6666666665, ans=0.125 2023-10-09 12:29:53,022 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2757309.3333333335, ans=0.125 2023-10-09 12:29:58,681 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2757309.3333333335, ans=0.0 2023-10-09 12:30:01,577 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2757356.0, ans=0.2 2023-10-09 12:30:01,982 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2023-10-09 12:30:07,457 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2757356.0, ans=0.0 2023-10-09 12:30:25,863 INFO [train.py:1031] (0/4) Epoch 14, batch 6150, loss[loss=0.2083, simple_loss=0.2878, pruned_loss=0.04631, ctc_loss=0.09074, over 16300.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2745, pruned_loss=0.05956, ctc_loss=0.1058, over 3274070.28 frames. ], batch size: 463, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:30:42,377 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.12 vs. 
limit=15.0 2023-10-09 12:30:48,703 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2757496.0, ans=0.1 2023-10-09 12:30:56,530 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+02 2.946e+02 3.372e+02 3.986e+02 9.785e+02, threshold=6.744e+02, percent-clipped=2.0 2023-10-09 12:31:03,758 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2757589.3333333335, ans=0.125 2023-10-09 12:31:16,678 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2757636.0, ans=0.125 2023-10-09 12:31:18,374 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=12.0 2023-10-09 12:31:26,108 INFO [train.py:1031] (0/4) Epoch 14, batch 6200, loss[loss=0.233, simple_loss=0.2783, pruned_loss=0.07016, ctc_loss=0.1186, over 16781.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2761, pruned_loss=0.06042, ctc_loss=0.107, over 3268421.65 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:31:29,157 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2757682.6666666665, ans=0.125 2023-10-09 12:31:34,843 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2757682.6666666665, ans=0.0 2023-10-09 12:31:36,904 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2757682.6666666665, ans=0.125 2023-10-09 12:31:43,899 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2757729.3333333335, ans=0.0 2023-10-09 12:31:44,119 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=22.5 2023-10-09 12:31:46,451 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2757729.3333333335, ans=0.125 2023-10-09 12:32:06,865 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2757822.6666666665, ans=0.0 2023-10-09 12:32:08,299 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2023-10-09 12:32:10,062 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2757822.6666666665, ans=0.125 2023-10-09 12:32:11,096 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2757822.6666666665, ans=0.2 2023-10-09 12:32:25,320 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2757916.0, ans=0.0 2023-10-09 12:32:25,366 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2757916.0, ans=0.0 2023-10-09 12:32:26,032 INFO [train.py:1031] (0/4) Epoch 14, batch 6250, loss[loss=0.2061, simple_loss=0.2896, pruned_loss=0.04418, ctc_loss=0.0853, over 16828.00 frames. 
], tot_loss[loss=0.22, simple_loss=0.2771, pruned_loss=0.06023, ctc_loss=0.1065, over 3282925.26 frames. ], batch size: 242, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:32:34,563 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5 2023-10-09 12:32:40,635 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2757962.6666666665, ans=0.0 2023-10-09 12:32:55,891 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+02 2.973e+02 3.416e+02 4.008e+02 8.801e+02, threshold=6.831e+02, percent-clipped=1.0 2023-10-09 12:33:13,305 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2758102.6666666665, ans=0.125 2023-10-09 12:33:26,290 INFO [train.py:1031] (0/4) Epoch 14, batch 6300, loss[loss=0.1827, simple_loss=0.2569, pruned_loss=0.04008, ctc_loss=0.07086, over 16795.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2763, pruned_loss=0.05637, ctc_loss=0.1003, over 3292520.94 frames. ], batch size: 164, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:33:33,388 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2758149.3333333335, ans=0.1 2023-10-09 12:33:53,241 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2758242.6666666665, ans=0.0 2023-10-09 12:34:12,579 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2758289.3333333335, ans=0.0 2023-10-09 12:34:28,480 INFO [train.py:1031] (0/4) Epoch 14, batch 6350, loss[loss=0.2331, simple_loss=0.2819, pruned_loss=0.06717, ctc_loss=0.1248, over 15305.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2767, pruned_loss=0.0573, ctc_loss=0.1017, over 3287228.61 frames. ], batch size: 529, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:34:30,228 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.34 vs. limit=6.0 2023-10-09 12:34:35,971 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2758382.6666666665, ans=0.04949747468305833 2023-10-09 12:34:37,721 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.04 vs. limit=15.0 2023-10-09 12:34:51,833 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2758429.3333333335, ans=0.125 2023-10-09 12:34:59,013 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2758476.0, ans=0.125 2023-10-09 12:35:00,733 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 3.028e+02 3.570e+02 4.973e+02 1.101e+03, threshold=7.141e+02, percent-clipped=8.0 2023-10-09 12:35:11,265 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2758522.6666666665, ans=0.125 2023-10-09 12:35:32,011 INFO [train.py:1031] (0/4) Epoch 14, batch 6400, loss[loss=0.2085, simple_loss=0.2506, pruned_loss=0.06251, ctc_loss=0.1035, over 16549.00 frames. 
], tot_loss[loss=0.2196, simple_loss=0.2807, pruned_loss=0.05854, ctc_loss=0.1038, over 3294834.41 frames. ], batch size: 110, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:35:35,315 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2758616.0, ans=0.05 2023-10-09 12:36:01,399 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2758709.3333333335, ans=0.125 2023-10-09 12:36:01,456 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:36:11,098 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2758756.0, ans=0.125 2023-10-09 12:36:18,077 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.38 vs. limit=15.0 2023-10-09 12:36:34,414 INFO [train.py:1031] (0/4) Epoch 14, batch 6450, loss[loss=0.2386, simple_loss=0.2913, pruned_loss=0.06925, ctc_loss=0.1183, over 16822.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2924, pruned_loss=0.06124, ctc_loss=0.1083, over 3294689.37 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:36:34,721 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2758849.3333333335, ans=0.0 2023-10-09 12:37:05,228 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2023-10-09 12:37:07,767 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2758942.6666666665, ans=0.125 2023-10-09 12:37:09,127 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.527e+02 4.142e+02 5.294e+02 1.315e+03, threshold=8.284e+02, percent-clipped=10.0 2023-10-09 12:37:24,847 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2759036.0, ans=0.125 2023-10-09 12:37:25,876 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2759036.0, ans=0.0 2023-10-09 12:37:26,835 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2759036.0, ans=0.125 2023-10-09 12:37:31,767 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2759036.0, ans=0.1 2023-10-09 12:37:33,350 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.05 vs. limit=22.5 2023-10-09 12:37:37,501 INFO [train.py:1031] (0/4) Epoch 14, batch 6500, loss[loss=0.2152, simple_loss=0.2864, pruned_loss=0.05257, ctc_loss=0.09698, over 16816.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.297, pruned_loss=0.06434, ctc_loss=0.1136, over 3280813.00 frames. 
], batch size: 164, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 12:37:43,347 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2759082.6666666665, ans=0.125 2023-10-09 12:37:47,523 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2759082.6666666665, ans=0.125 2023-10-09 12:37:56,947 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2759129.3333333335, ans=0.05 2023-10-09 12:38:08,770 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2759176.0, ans=10.0 2023-10-09 12:38:15,507 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2759222.6666666665, ans=0.125 2023-10-09 12:38:25,947 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2759269.3333333335, ans=0.0 2023-10-09 12:38:35,707 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2759269.3333333335, ans=0.0 2023-10-09 12:38:39,324 INFO [train.py:1031] (0/4) Epoch 14, batch 6550, loss[loss=0.2276, simple_loss=0.3191, pruned_loss=0.04887, ctc_loss=0.09585, over 16841.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2958, pruned_loss=0.06211, ctc_loss=0.1098, over 3275431.99 frames. ], batch size: 309, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:38:53,126 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2759362.6666666665, ans=0.125 2023-10-09 12:39:13,987 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.392e+02 3.110e+02 3.501e+02 4.783e+02 9.305e+02, threshold=7.003e+02, percent-clipped=1.0 2023-10-09 12:39:41,349 INFO [train.py:1031] (0/4) Epoch 14, batch 6600, loss[loss=0.2089, simple_loss=0.2628, pruned_loss=0.0579, ctc_loss=0.09781, over 16925.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2922, pruned_loss=0.06134, ctc_loss=0.1083, over 3280969.29 frames. ], batch size: 82, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 12:39:44,420 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:39:54,312 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=15.0 2023-10-09 12:40:38,108 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2759736.0, ans=0.0 2023-10-09 12:40:43,319 INFO [train.py:1031] (0/4) Epoch 14, batch 6650, loss[loss=0.2189, simple_loss=0.29, pruned_loss=0.0537, ctc_loss=0.1009, over 16286.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2855, pruned_loss=0.06094, ctc_loss=0.1075, over 3286792.61 frames. 
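Each train.py:1031 entry above reports loss[...] for the current batch and tot_loss[...] as a frame-weighted aggregate over roughly the last 3.3M frames. The logged numbers are internally consistent with the overall loss being an interpolation of the pruned-transducer terms plus a scaled CTC term: for batch 6550 above, 0.5 * 0.3191 + 0.04887 + 0.2 * 0.09585 ~= 0.2276. The sketch below encodes that relation; the scale names are assumptions, the values 0.5 and 0.2 are inferred from the log entries themselves, and the running average is simplified to a cumulative one (the real tracker appears to decay or reset old batches, since the frame count stays near 3.3M).

    # Sketch of the combination consistent with the numbers logged above:
    #   loss = simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss
    def combined_loss(simple_loss, pruned_loss, ctc_loss,
                      simple_scale=0.5, ctc_scale=0.2):
        return simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss

    class RunningLoss:
        # Frame-weighted average, like "tot_loss[..., over N frames.]";
        # simplified to a cumulative (non-decaying) form for illustration.
        def __init__(self):
            self.sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, num_frames: float) -> None:
            self.sum += batch_loss * num_frames
            self.frames += num_frames

        @property
        def value(self) -> float:
            return self.sum / max(self.frames, 1.0)

    # Reproduces the batch 6550 entry to within rounding:
    assert abs(combined_loss(0.3191, 0.04887, 0.09585) - 0.2276) < 1e-3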
], batch size: 466, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:40:54,892 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2759829.3333333335, ans=0.1 2023-10-09 12:41:07,948 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2759876.0, ans=0.125 2023-10-09 12:41:18,756 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.057e+02 3.351e+02 3.889e+02 6.888e+02, threshold=6.703e+02, percent-clipped=0.0 2023-10-09 12:41:33,509 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2759969.3333333335, ans=0.025 2023-10-09 12:41:35,754 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2759969.3333333335, ans=0.125 2023-10-09 12:41:45,278 INFO [train.py:1031] (0/4) Epoch 14, batch 6700, loss[loss=0.2511, simple_loss=0.3299, pruned_loss=0.06232, ctc_loss=0.1193, over 16758.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.2866, pruned_loss=0.06057, ctc_loss=0.107, over 3277310.11 frames. ], batch size: 272, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 12:42:08,734 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2760062.6666666665, ans=0.0 2023-10-09 12:42:26,895 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2760156.0, ans=0.0 2023-10-09 12:42:27,965 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2760156.0, ans=0.125 2023-10-09 12:42:39,498 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2760202.6666666665, ans=0.125 2023-10-09 12:42:45,802 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.46 vs. limit=22.5 2023-10-09 12:42:48,656 INFO [train.py:1031] (0/4) Epoch 14, batch 6750, loss[loss=0.299, simple_loss=0.405, pruned_loss=0.07058, ctc_loss=0.1295, over 15088.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.3007, pruned_loss=0.0642, ctc_loss=0.1143, over 3276945.92 frames. ], batch size: 527, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 12:43:17,080 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2760342.6666666665, ans=0.0 2023-10-09 12:43:23,449 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2760342.6666666665, ans=0.0 2023-10-09 12:43:25,357 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+02 3.277e+02 3.937e+02 4.777e+02 6.969e+02, threshold=7.873e+02, percent-clipped=1.0 2023-10-09 12:43:49,734 INFO [train.py:1031] (0/4) Epoch 14, batch 6800, loss[loss=0.2658, simple_loss=0.3001, pruned_loss=0.08567, ctc_loss=0.1503, over 16595.00 frames. ], tot_loss[loss=0.2402, simple_loss=0.3009, pruned_loss=0.06623, ctc_loss=0.1179, over 3281338.33 frames. 
], batch size: 416, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 12:43:50,071 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2760482.6666666665, ans=0.125 2023-10-09 12:43:58,673 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2760482.6666666665, ans=0.1 2023-10-09 12:43:59,646 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2760482.6666666665, ans=0.125 2023-10-09 12:44:17,915 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2760576.0, ans=6.0 2023-10-09 12:44:19,864 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2760576.0, ans=0.125 2023-10-09 12:44:25,291 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2760576.0, ans=0.125 2023-10-09 12:44:32,450 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2760622.6666666665, ans=0.0 2023-10-09 12:44:34,428 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2760622.6666666665, ans=0.125 2023-10-09 12:44:51,404 INFO [train.py:1031] (0/4) Epoch 14, batch 6850, loss[loss=0.183, simple_loss=0.2492, pruned_loss=0.04328, ctc_loss=0.0756, over 16724.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2975, pruned_loss=0.06454, ctc_loss=0.1147, over 3281018.48 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 12:44:56,635 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2760716.0, ans=0.0 2023-10-09 12:45:03,744 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2760762.6666666665, ans=0.0 2023-10-09 12:45:08,093 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2760762.6666666665, ans=0.5 2023-10-09 12:45:28,872 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 3.178e+02 3.827e+02 4.505e+02 1.079e+03, threshold=7.655e+02, percent-clipped=2.0 2023-10-09 12:45:48,911 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2760902.6666666665, ans=0.1 2023-10-09 12:45:54,910 INFO [train.py:1031] (0/4) Epoch 14, batch 6900, loss[loss=0.2875, simple_loss=0.3382, pruned_loss=0.08923, ctc_loss=0.1458, over 16995.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2991, pruned_loss=0.06541, ctc_loss=0.1165, over 3279959.07 frames. ], batch size: 91, lr: 2.58e-03, grad_scale: 8.0 2023-10-09 12:46:03,530 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2760949.3333333335, ans=0.0 2023-10-09 12:46:07,626 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.00 vs. 
limit=15.0 2023-10-09 12:46:15,464 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2760996.0, ans=0.125 2023-10-09 12:46:49,402 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2761136.0, ans=0.0 2023-10-09 12:46:56,001 INFO [train.py:1031] (0/4) Epoch 14, batch 6950, loss[loss=0.2015, simple_loss=0.2876, pruned_loss=0.04134, ctc_loss=0.08206, over 16804.00 frames. ], tot_loss[loss=0.241, simple_loss=0.3006, pruned_loss=0.06692, ctc_loss=0.1186, over 3293239.22 frames. ], batch size: 272, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:47:01,213 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2761182.6666666665, ans=0.125 2023-10-09 12:47:34,797 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+02 3.216e+02 3.562e+02 4.288e+02 5.901e+02, threshold=7.125e+02, percent-clipped=0.0 2023-10-09 12:47:55,787 INFO [train.py:1031] (0/4) Epoch 14, batch 7000, loss[loss=0.2408, simple_loss=0.2808, pruned_loss=0.07483, ctc_loss=0.1281, over 16681.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.296, pruned_loss=0.06515, ctc_loss=0.1161, over 3302684.23 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:48:18,841 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2023-10-09 12:48:24,320 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2761509.3333333335, ans=0.125 2023-10-09 12:48:39,048 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:48:47,542 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.54 vs. limit=22.5 2023-10-09 12:48:49,771 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2761602.6666666665, ans=0.2 2023-10-09 12:48:49,856 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2761602.6666666665, ans=0.1 2023-10-09 12:48:56,169 INFO [train.py:1031] (0/4) Epoch 14, batch 7050, loss[loss=0.1729, simple_loss=0.2329, pruned_loss=0.04158, ctc_loss=0.07437, over 16827.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2892, pruned_loss=0.06383, ctc_loss=0.1137, over 3307649.46 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:49:04,534 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.21 vs. limit=15.0 2023-10-09 12:49:05,013 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2761649.3333333335, ans=0.125 2023-10-09 12:49:10,619 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. 
limit=10.0 2023-10-09 12:49:12,430 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2761696.0, ans=0.125 2023-10-09 12:49:35,169 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-10-09 12:49:38,088 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 2.795e+02 3.132e+02 3.641e+02 6.976e+02, threshold=6.264e+02, percent-clipped=0.0 2023-10-09 12:49:41,061 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2761789.3333333335, ans=0.0 2023-10-09 12:49:41,125 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2761789.3333333335, ans=0.0 2023-10-09 12:49:43,190 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2761789.3333333335, ans=0.125 2023-10-09 12:49:44,250 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2761836.0, ans=0.1 2023-10-09 12:49:57,633 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2761882.6666666665, ans=0.0 2023-10-09 12:49:58,341 INFO [train.py:1031] (0/4) Epoch 14, batch 7100, loss[loss=0.2603, simple_loss=0.273, pruned_loss=0.09104, ctc_loss=0.1637, over 16608.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2801, pruned_loss=0.06175, ctc_loss=0.1097, over 3307202.29 frames. ], batch size: 386, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:50:00,830 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2761882.6666666665, ans=0.0 2023-10-09 12:50:25,360 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2761976.0, ans=0.09899494936611666 2023-10-09 12:50:39,494 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2762022.6666666665, ans=0.2 2023-10-09 12:50:51,292 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2762069.3333333335, ans=0.0 2023-10-09 12:50:57,249 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2762069.3333333335, ans=0.0 2023-10-09 12:50:59,430 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2762116.0, ans=0.125 2023-10-09 12:51:00,099 INFO [train.py:1031] (0/4) Epoch 14, batch 7150, loss[loss=0.1991, simple_loss=0.2269, pruned_loss=0.06238, ctc_loss=0.1165, over 15475.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2732, pruned_loss=0.06087, ctc_loss=0.1078, over 3314449.64 frames. 
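The optim.py:471 lines above expose the clipping rule in the numbers themselves: five quantiles (min, quartiles, max) of recent gradient norms are printed, and the threshold equals Clipping_scale times the median, e.g. 2.0 * 3.132e+02 = 6.264e+02 in the entry just above. A hedged sketch of that scheme follows; the window size, reporting cadence, and method names are assumptions.

    # Hedged sketch: clip to Clipping_scale * median over a window of recent
    # gradient norms, reporting quantiles and whether this step was clipped.
    from collections import deque
    import torch

    class MedianGradClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)

        def clip_(self, parameters):
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.norms.append(norm)
            q = torch.quantile(
                torch.tensor(list(self.norms)),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
            )
            threshold = self.scale * q[2].item()  # Clipping_scale * median
            clipped = norm > threshold
            if clipped:
                for g in grads:
                    g.mul_(threshold / norm)      # rescale gradients in place
            return q.tolist(), threshold, clipped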
], batch size: 529, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:51:20,484 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2762162.6666666665, ans=0.2 2023-10-09 12:51:26,204 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2762209.3333333335, ans=15.0 2023-10-09 12:51:43,559 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+02 3.162e+02 3.626e+02 4.175e+02 1.632e+03, threshold=7.251e+02, percent-clipped=2.0 2023-10-09 12:51:59,130 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2762302.6666666665, ans=0.1 2023-10-09 12:52:00,868 INFO [train.py:1031] (0/4) Epoch 14, batch 7200, loss[loss=0.1895, simple_loss=0.2269, pruned_loss=0.05809, ctc_loss=0.08972, over 16391.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2742, pruned_loss=0.0629, ctc_loss=0.1111, over 3322001.66 frames. ], batch size: 70, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:52:19,254 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2762396.0, ans=0.2 2023-10-09 12:52:22,279 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2762396.0, ans=0.125 2023-10-09 12:52:45,134 INFO [scaling.py:979] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.82 vs. limit=8.0 2023-10-09 12:53:03,088 INFO [train.py:1031] (0/4) Epoch 14, batch 7250, loss[loss=0.2023, simple_loss=0.2768, pruned_loss=0.04757, ctc_loss=0.08188, over 16826.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2762, pruned_loss=0.0636, ctc_loss=0.1121, over 3323492.47 frames. ], batch size: 164, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:53:07,667 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2762582.6666666665, ans=0.2 2023-10-09 12:53:17,790 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.71 vs. limit=22.5 2023-10-09 12:53:23,754 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-w-ctc/checkpoint-592000.pt 2023-10-09 12:53:49,738 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.375e+02 3.091e+02 3.554e+02 4.025e+02 7.139e+02, threshold=7.107e+02, percent-clipped=0.0 2023-10-09 12:53:54,328 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=2762769.3333333335, ans=0.02 2023-10-09 12:53:54,390 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2762769.3333333335, ans=0.1 2023-10-09 12:54:02,520 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2762769.3333333335, ans=0.0 2023-10-09 12:54:02,575 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2762769.3333333335, ans=0.125 2023-10-09 12:54:07,975 INFO [train.py:1031] (0/4) Epoch 14, batch 7300, loss[loss=0.2363, simple_loss=0.2871, pruned_loss=0.06874, ctc_loss=0.1198, over 16774.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2784, pruned_loss=0.06194, ctc_loss=0.1097, over 3307480.87 frames. 
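Besides the per-epoch files, the checkpoint.py:75 entry above writes a mid-epoch snapshot named by the global batch index (checkpoint-592000.pt). Below is a minimal sketch of that kind of periodic batch-level saving; the interval, argument names, and the exact contents of the saved dict are assumptions.

    # Illustrative periodic batch-level checkpointing; 592000 would be the
    # global batch index at save time. Interval and dict contents assumed.
    from pathlib import Path
    import torch

    def maybe_save_batch_checkpoint(model, optimizer, scheduler,
                                    batch_idx_train: int, exp_dir: Path,
                                    every_n: int = 4000):
        if batch_idx_train == 0 or batch_idx_train % every_n != 0:
            return None
        path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "batch_idx_train": batch_idx_train,
            },
            path,
        )
        return path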
], batch size: 291, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 12:54:09,564 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0 2023-10-09 12:54:39,172 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2023-10-09 12:54:56,955 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2763002.6666666665, ans=0.125 2023-10-09 12:55:00,026 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2763002.6666666665, ans=0.125 2023-10-09 12:55:07,497 INFO [train.py:1031] (0/4) Epoch 14, batch 7350, loss[loss=0.2674, simple_loss=0.3048, pruned_loss=0.08673, ctc_loss=0.1411, over 16598.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2776, pruned_loss=0.06263, ctc_loss=0.1101, over 3303440.34 frames. ], batch size: 110, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:55:28,095 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2763096.0, ans=0.125 2023-10-09 12:55:38,244 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=22.5 2023-10-09 12:55:47,442 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.06 vs. limit=6.0 2023-10-09 12:55:49,261 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2763189.3333333335, ans=0.125 2023-10-09 12:55:50,542 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+02 3.022e+02 3.585e+02 4.093e+02 1.089e+03, threshold=7.169e+02, percent-clipped=4.0 2023-10-09 12:56:07,715 INFO [train.py:1031] (0/4) Epoch 14, batch 7400, loss[loss=0.2381, simple_loss=0.2877, pruned_loss=0.0702, ctc_loss=0.1205, over 16790.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.279, pruned_loss=0.06356, ctc_loss=0.1112, over 3308722.18 frames. ], batch size: 121, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 12:56:16,958 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2023-10-09 12:56:32,714 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2763376.0, ans=0.125 2023-10-09 12:56:33,793 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2763376.0, ans=0.125 2023-10-09 12:56:39,132 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2763376.0, ans=0.125 2023-10-09 12:56:43,873 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2763422.6666666665, ans=0.125 2023-10-09 12:56:59,635 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.20 vs. 
limit=10.0 2023-10-09 12:57:00,297 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2763469.3333333335, ans=0.0 2023-10-09 12:57:04,319 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.28 vs. limit=6.0 2023-10-09 12:57:09,478 INFO [train.py:1031] (0/4) Epoch 14, batch 7450, loss[loss=0.2239, simple_loss=0.2974, pruned_loss=0.05595, ctc_loss=0.09627, over 16756.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.285, pruned_loss=0.06341, ctc_loss=0.1113, over 3305186.12 frames. ], batch size: 164, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 12:57:19,971 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2763516.0, ans=0.0 2023-10-09 12:57:22,640 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2763562.6666666665, ans=0.125 2023-10-09 12:57:37,177 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2763609.3333333335, ans=0.0 2023-10-09 12:57:38,344 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2763609.3333333335, ans=10.0 2023-10-09 12:57:58,526 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 3.056e+02 3.585e+02 4.525e+02 9.951e+02, threshold=7.170e+02, percent-clipped=3.0 2023-10-09 12:57:59,051 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2763656.0, ans=10.0 2023-10-09 12:58:00,033 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2763702.6666666665, ans=0.125 2023-10-09 12:58:08,433 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2763702.6666666665, ans=0.1 2023-10-09 12:58:12,843 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2763749.3333333335, ans=0.2 2023-10-09 12:58:13,552 INFO [train.py:1031] (0/4) Epoch 14, batch 7500, loss[loss=0.1982, simple_loss=0.2718, pruned_loss=0.04506, ctc_loss=0.0861, over 16673.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2866, pruned_loss=0.06066, ctc_loss=0.1078, over 3301297.94 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:58:22,185 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.99 vs. limit=12.0 2023-10-09 12:58:42,348 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2763842.6666666665, ans=0.0 2023-10-09 12:58:45,859 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2763842.6666666665, ans=0.125 2023-10-09 12:58:47,205 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=12.0 2023-10-09 12:58:52,131 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.54 vs. 
limit=22.5 2023-10-09 12:58:56,404 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2763889.3333333335, ans=0.125 2023-10-09 12:59:07,361 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2763936.0, ans=0.05 2023-10-09 12:59:13,831 INFO [train.py:1031] (0/4) Epoch 14, batch 7550, loss[loss=0.1949, simple_loss=0.2584, pruned_loss=0.04875, ctc_loss=0.0847, over 16780.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2855, pruned_loss=0.05854, ctc_loss=0.1041, over 3297571.54 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:59:23,963 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2763982.6666666665, ans=0.125 2023-10-09 12:59:51,020 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2764122.6666666665, ans=0.0 2023-10-09 12:59:59,785 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.575e+02 3.477e+02 4.054e+02 5.168e+02 9.952e+02, threshold=8.108e+02, percent-clipped=5.0 2023-10-09 13:00:03,076 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-10-09 13:00:04,926 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2764169.3333333335, ans=0.0 2023-10-09 13:00:14,753 INFO [train.py:1031] (0/4) Epoch 14, batch 7600, loss[loss=0.1882, simple_loss=0.2261, pruned_loss=0.05613, ctc_loss=0.09519, over 10684.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2847, pruned_loss=0.06006, ctc_loss=0.1061, over 3296581.91 frames. ], batch size: 37, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:00:19,223 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=22.5 2023-10-09 13:00:20,901 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2764216.0, ans=0.2 2023-10-09 13:00:25,256 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2764216.0, ans=0.125 2023-10-09 13:00:25,422 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2023-10-09 13:00:26,312 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2764262.6666666665, ans=0.125 2023-10-09 13:00:26,357 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2764262.6666666665, ans=0.125 2023-10-09 13:01:15,035 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2764449.3333333335, ans=0.2 2023-10-09 13:01:15,236 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2023-10-09 13:01:16,412 INFO [train.py:1031] (0/4) Epoch 14, batch 7650, loss[loss=0.2444, simple_loss=0.2938, pruned_loss=0.07329, ctc_loss=0.121, over 16996.00 frames. 
], tot_loss[loss=0.2273, simple_loss=0.2851, pruned_loss=0.06271, ctc_loss=0.1101, over 3310121.89 frames. ], batch size: 86, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:01:35,926 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2764496.0, ans=0.125 2023-10-09 13:01:43,696 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2764542.6666666665, ans=0.125 2023-10-09 13:02:03,962 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+02 3.219e+02 3.717e+02 4.421e+02 1.818e+03, threshold=7.434e+02, percent-clipped=3.0 2023-10-09 13:02:05,329 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2764636.0, ans=0.125 2023-10-09 13:02:16,470 INFO [train.py:1031] (0/4) Epoch 14, batch 7700, loss[loss=0.1671, simple_loss=0.2404, pruned_loss=0.0343, ctc_loss=0.06321, over 16679.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2834, pruned_loss=0.06225, ctc_loss=0.1093, over 3310802.34 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:02:22,702 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2764682.6666666665, ans=0.0 2023-10-09 13:02:28,022 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2764729.3333333335, ans=0.125 2023-10-09 13:02:35,572 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2764729.3333333335, ans=0.1 2023-10-09 13:03:01,437 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2764822.6666666665, ans=0.125 2023-10-09 13:03:17,588 INFO [train.py:1031] (0/4) Epoch 14, batch 7750, loss[loss=0.1977, simple_loss=0.2561, pruned_loss=0.0519, ctc_loss=0.08869, over 16967.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.282, pruned_loss=0.06281, ctc_loss=0.11, over 3312093.50 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:03:22,010 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:03:25,238 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2764916.0, ans=0.125 2023-10-09 13:03:27,371 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2764916.0, ans=0.125 2023-10-09 13:04:04,665 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2765056.0, ans=0.125 2023-10-09 13:04:08,220 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.438e+02 3.190e+02 3.453e+02 4.126e+02 8.582e+02, threshold=6.905e+02, percent-clipped=1.0 2023-10-09 13:04:20,483 INFO [train.py:1031] (0/4) Epoch 14, batch 7800, loss[loss=0.2215, simple_loss=0.2725, pruned_loss=0.06464, ctc_loss=0.1032, over 16798.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2812, pruned_loss=0.06369, ctc_loss=0.1106, over 3318196.95 frames. ], batch size: 201, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:04:29,763 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.50 vs. 
limit=15.0 2023-10-09 13:04:33,375 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2765196.0, ans=0.2 2023-10-09 13:04:37,161 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2765196.0, ans=0.0 2023-10-09 13:04:46,432 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2765242.6666666665, ans=0.1 2023-10-09 13:04:49,178 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2765242.6666666665, ans=0.0 2023-10-09 13:05:20,497 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2765336.0, ans=0.1 2023-10-09 13:05:23,385 INFO [train.py:1031] (0/4) Epoch 14, batch 7850, loss[loss=0.2831, simple_loss=0.3511, pruned_loss=0.08045, ctc_loss=0.1357, over 16817.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2846, pruned_loss=0.06408, ctc_loss=0.1101, over 3296105.00 frames. ], batch size: 328, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:05:37,798 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2765429.3333333335, ans=0.125 2023-10-09 13:06:05,614 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2765522.6666666665, ans=0.2 2023-10-09 13:06:05,874 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-10-09 13:06:07,801 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2765522.6666666665, ans=0.2 2023-10-09 13:06:14,910 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+02 3.051e+02 3.784e+02 4.491e+02 1.708e+03, threshold=7.568e+02, percent-clipped=4.0 2023-10-09 13:06:26,265 INFO [train.py:1031] (0/4) Epoch 14, batch 7900, loss[loss=0.2909, simple_loss=0.3435, pruned_loss=0.08792, ctc_loss=0.1562, over 16574.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2899, pruned_loss=0.06351, ctc_loss=0.1098, over 3298335.71 frames. ], batch size: 350, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:06:53,943 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2765709.3333333335, ans=0.125 2023-10-09 13:06:58,098 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2023-10-09 13:07:27,538 INFO [train.py:1031] (0/4) Epoch 14, batch 7950, loss[loss=0.2185, simple_loss=0.2802, pruned_loss=0.05715, ctc_loss=0.1064, over 16940.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2884, pruned_loss=0.06262, ctc_loss=0.1085, over 3290038.72 frames. 
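Many of the ScheduledFloat names in this log belong to "balancer" sub-modules, whose scheduled constants (prob, min_positive, max_abs, min_abs) bound per-channel activation statistics. The general idea is that the balancer is an identity in the forward pass and adds a small corrective term to the gradient when a channel's statistics drift out of range, applied stochastically with probability prob. The sketch below illustrates that mechanism in a deliberately simplified form (no prob sampling, crude corrections); icefall's actual corrections differ in detail.

    # Hedged, simplified sketch of a gradient "balancer": identity forward,
    # gradient nudges for out-of-range channel statistics in backward.
    import torch

    class BalanceGrad(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, min_positive: float, max_abs: float, grad_scale: float):
            ctx.save_for_backward(x)
            ctx.cfg = (min_positive, max_abs, grad_scale)
            return x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            min_positive, max_abs, grad_scale = ctx.cfg
            # Fraction of positive values per channel (dim 0 = frames).
            pos_frac = (x > 0).float().mean(dim=0, keepdim=True)
            # Negative extra gradient raises channels with too few positives...
            extra = -(pos_frac < min_positive).float().expand_as(x)
            # ...and a sign-aligned term shrinks values beyond max_abs.
            extra = extra + (x.abs() > max_abs).float() * x.sign()
            scale = grad_scale * grad_out.abs().mean()
            return grad_out + extra * scale, None, None, None

    # Usage (constants would come from schedules like those logged above):
    # y = BalanceGrad.apply(x, 0.05, 10.0, 0.01)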
], batch size: 309, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:07:28,209 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=2765849.3333333335, ans=15.0 2023-10-09 13:07:42,018 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2765896.0, ans=0.125 2023-10-09 13:08:09,325 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2765989.3333333335, ans=0.125 2023-10-09 13:08:14,612 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2765989.3333333335, ans=0.5 2023-10-09 13:08:19,020 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 2.992e+02 3.305e+02 3.981e+02 8.015e+02, threshold=6.609e+02, percent-clipped=1.0 2023-10-09 13:08:28,686 INFO [train.py:1031] (0/4) Epoch 14, batch 8000, loss[loss=0.2151, simple_loss=0.2728, pruned_loss=0.05868, ctc_loss=0.09998, over 16999.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2861, pruned_loss=0.06353, ctc_loss=0.1103, over 3302046.63 frames. ], batch size: 243, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:08:33,569 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2023-10-09 13:08:47,803 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.24 vs. limit=15.0 2023-10-09 13:08:58,014 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2766176.0, ans=0.0 2023-10-09 13:09:00,188 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2766176.0, ans=0.0 2023-10-09 13:09:00,262 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2766176.0, ans=0.025 2023-10-09 13:09:29,364 INFO [train.py:1031] (0/4) Epoch 14, batch 8050, loss[loss=0.2203, simple_loss=0.2842, pruned_loss=0.05819, ctc_loss=0.09974, over 16941.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2848, pruned_loss=0.06456, ctc_loss=0.1121, over 3307522.72 frames. ], batch size: 90, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:09:35,987 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2766316.0, ans=0.015 2023-10-09 13:10:16,327 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2766456.0, ans=0.2 2023-10-09 13:10:22,705 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+02 3.190e+02 3.616e+02 4.250e+02 6.056e+02, threshold=7.233e+02, percent-clipped=0.0 2023-10-09 13:10:30,817 INFO [train.py:1031] (0/4) Epoch 14, batch 8100, loss[loss=0.2208, simple_loss=0.2769, pruned_loss=0.0622, ctc_loss=0.1009, over 17055.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2831, pruned_loss=0.0658, ctc_loss=0.1143, over 3315445.80 frames. 
], batch size: 259, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:10:38,511 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2766549.3333333335, ans=0.2 2023-10-09 13:11:28,449 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2766736.0, ans=0.125 2023-10-09 13:11:31,971 INFO [train.py:1031] (0/4) Epoch 14, batch 8150, loss[loss=0.2052, simple_loss=0.2644, pruned_loss=0.05364, ctc_loss=0.0968, over 16711.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2814, pruned_loss=0.06605, ctc_loss=0.1149, over 3308718.47 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:11:35,490 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=15.0 2023-10-09 13:11:52,104 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2766829.3333333335, ans=0.0 2023-10-09 13:12:05,839 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=2766876.0, ans=0.1 2023-10-09 13:12:09,688 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2766922.6666666665, ans=0.0 2023-10-09 13:12:17,543 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2023-10-09 13:12:26,044 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 3.114e+02 3.645e+02 4.244e+02 8.238e+02, threshold=7.291e+02, percent-clipped=3.0 2023-10-09 13:12:32,817 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2767016.0, ans=0.125 2023-10-09 13:12:33,448 INFO [train.py:1031] (0/4) Epoch 14, batch 8200, loss[loss=0.2639, simple_loss=0.3271, pruned_loss=0.07438, ctc_loss=0.1299, over 16828.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2819, pruned_loss=0.06384, ctc_loss=0.1117, over 3310374.70 frames. ], batch size: 328, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:13:37,128 INFO [train.py:1031] (0/4) Epoch 14, batch 8250, loss[loss=0.2431, simple_loss=0.3329, pruned_loss=0.05556, ctc_loss=0.1055, over 16798.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2914, pruned_loss=0.06282, ctc_loss=0.1112, over 3319730.88 frames. ], batch size: 308, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:13:49,055 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2767296.0, ans=0.0 2023-10-09 13:13:51,739 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2767296.0, ans=0.125 2023-10-09 13:13:54,828 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2767296.0, ans=0.0 2023-10-09 13:14:04,882 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.33 vs. 
limit=15.0 2023-10-09 13:14:08,522 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2767342.6666666665, ans=0.125 2023-10-09 13:14:32,770 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.653e+02 3.032e+02 3.705e+02 6.938e+02, threshold=6.064e+02, percent-clipped=0.0 2023-10-09 13:14:40,099 INFO [train.py:1031] (0/4) Epoch 14, batch 8300, loss[loss=0.1933, simple_loss=0.2639, pruned_loss=0.04591, ctc_loss=0.07751, over 16809.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2909, pruned_loss=0.0594, ctc_loss=0.106, over 3309582.22 frames. ], batch size: 176, lr: 2.58e-03, grad_scale: 8.0 2023-10-09 13:14:50,932 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2767482.6666666665, ans=0.125 2023-10-09 13:15:00,710 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2767529.3333333335, ans=0.05 2023-10-09 13:15:12,866 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:15:28,658 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2767669.3333333335, ans=0.125 2023-10-09 13:15:36,825 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2767669.3333333335, ans=0.0 2023-10-09 13:15:42,537 INFO [train.py:1031] (0/4) Epoch 14, batch 8350, loss[loss=0.199, simple_loss=0.3067, pruned_loss=0.03254, ctc_loss=0.06547, over 15023.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2918, pruned_loss=0.0578, ctc_loss=0.1036, over 3304301.63 frames. ], batch size: 527, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:15:47,801 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2767716.0, ans=0.2 2023-10-09 13:15:47,817 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2767716.0, ans=0.1 2023-10-09 13:16:01,222 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2767762.6666666665, ans=0.0 2023-10-09 13:16:06,242 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2767809.3333333335, ans=0.0 2023-10-09 13:16:25,471 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2767856.0, ans=0.1 2023-10-09 13:16:25,688 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-10-09 13:16:38,122 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.846e+02 3.361e+02 4.142e+02 6.688e+02, threshold=6.722e+02, percent-clipped=2.0 2023-10-09 13:16:44,684 INFO [train.py:1031] (0/4) Epoch 14, batch 8400, loss[loss=0.189, simple_loss=0.2685, pruned_loss=0.0399, ctc_loss=0.07396, over 16851.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.2847, pruned_loss=0.05359, ctc_loss=0.09678, over 3284627.78 frames. 
], batch size: 228, lr: 2.58e-03, grad_scale: 8.0 2023-10-09 13:17:22,814 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2768089.3333333335, ans=0.125 2023-10-09 13:17:42,758 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2768136.0, ans=0.0 2023-10-09 13:17:45,053 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2768136.0, ans=0.125 2023-10-09 13:17:48,486 INFO [train.py:1031] (0/4) Epoch 14, batch 8450, loss[loss=0.2772, simple_loss=0.3827, pruned_loss=0.06207, ctc_loss=0.1188, over 16313.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2907, pruned_loss=0.05505, ctc_loss=0.09978, over 3286373.65 frames. ], batch size: 463, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:18:05,418 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2768229.3333333335, ans=0.125 2023-10-09 13:18:10,412 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2768229.3333333335, ans=0.125 2023-10-09 13:18:39,154 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2768369.3333333335, ans=0.125 2023-10-09 13:18:45,221 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 3.438e+02 4.157e+02 5.474e+02 9.633e+02, threshold=8.314e+02, percent-clipped=10.0 2023-10-09 13:18:48,293 INFO [train.py:1031] (0/4) Epoch 14, batch 8500, loss[loss=0.2296, simple_loss=0.2784, pruned_loss=0.06687, ctc_loss=0.1176, over 16843.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2933, pruned_loss=0.05752, ctc_loss=0.1039, over 3298046.28 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:19:16,225 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2768509.3333333335, ans=0.1 2023-10-09 13:19:19,372 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:19:24,806 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2768556.0, ans=0.1 2023-10-09 13:19:33,154 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2768556.0, ans=0.125 2023-10-09 13:19:38,962 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0 2023-10-09 13:19:40,860 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2768602.6666666665, ans=0.125 2023-10-09 13:19:40,892 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:19:48,996 INFO [train.py:1031] (0/4) Epoch 14, batch 8550, loss[loss=0.3026, simple_loss=0.3151, pruned_loss=0.1076, ctc_loss=0.1875, over 16826.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2906, pruned_loss=0.05962, ctc_loss=0.1068, over 3304475.88 frames. 
], batch size: 384, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:19:54,099 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.11 vs. limit=10.0 2023-10-09 13:19:54,742 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2768649.3333333335, ans=0.2 2023-10-09 13:19:55,850 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2768649.3333333335, ans=0.125 2023-10-09 13:20:29,382 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2768789.3333333335, ans=0.2 2023-10-09 13:20:51,139 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.120e+02 3.650e+02 4.330e+02 6.615e+02, threshold=7.300e+02, percent-clipped=0.0 2023-10-09 13:20:53,244 INFO [train.py:1031] (0/4) Epoch 14, batch 8600, loss[loss=0.2071, simple_loss=0.2862, pruned_loss=0.04499, ctc_loss=0.09513, over 16775.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.289, pruned_loss=0.05894, ctc_loss=0.1052, over 3306371.19 frames. ], batch size: 272, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:21:05,834 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2768929.3333333335, ans=0.125 2023-10-09 13:21:13,070 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2768929.3333333335, ans=0.0 2023-10-09 13:21:22,301 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2023-10-09 13:21:26,556 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.07 vs. limit=15.0 2023-10-09 13:21:32,008 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2769022.6666666665, ans=0.125 2023-10-09 13:21:35,241 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2769022.6666666665, ans=0.2 2023-10-09 13:21:38,345 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.97 vs. limit=10.0 2023-10-09 13:21:47,038 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=22.5 2023-10-09 13:21:56,024 INFO [train.py:1031] (0/4) Epoch 14, batch 8650, loss[loss=0.156, simple_loss=0.2376, pruned_loss=0.02668, ctc_loss=0.05239, over 16796.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2838, pruned_loss=0.05552, ctc_loss=0.09927, over 3305629.09 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:22:59,313 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+02 2.781e+02 3.323e+02 4.118e+02 1.274e+03, threshold=6.646e+02, percent-clipped=1.0 2023-10-09 13:23:00,363 INFO [train.py:1031] (0/4) Epoch 14, batch 8700, loss[loss=0.1931, simple_loss=0.2836, pruned_loss=0.03668, ctc_loss=0.07303, over 16779.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2824, pruned_loss=0.05359, ctc_loss=0.09606, over 3302793.08 frames. 
], batch size: 272, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:23:18,815 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=22.5 2023-10-09 13:23:21,037 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-10-09 13:23:37,917 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2769489.3333333335, ans=0.125 2023-10-09 13:23:47,102 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2769489.3333333335, ans=0.0 2023-10-09 13:24:00,638 INFO [train.py:1031] (0/4) Epoch 14, batch 8750, loss[loss=0.169, simple_loss=0.2398, pruned_loss=0.03685, ctc_loss=0.06148, over 16730.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.2839, pruned_loss=0.05316, ctc_loss=0.09592, over 3304557.54 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:24:03,733 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2769582.6666666665, ans=0.125 2023-10-09 13:24:06,958 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2769582.6666666665, ans=0.125 2023-10-09 13:24:18,913 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:24:32,733 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-10-09 13:24:54,087 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2769769.3333333335, ans=0.125 2023-10-09 13:25:02,700 INFO [train.py:1031] (0/4) Epoch 14, batch 8800, loss[loss=0.1688, simple_loss=0.2474, pruned_loss=0.03364, ctc_loss=0.05719, over 16730.00 frames. ], tot_loss[loss=0.2088, simple_loss=0.2813, pruned_loss=0.05002, ctc_loss=0.09067, over 3301361.57 frames. 
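The Whitening lines report, per module, how far the channel covariance of an activation is from isotropic, compared against a limit (e.g. metric=8.91 vs. limit=22.5 above); values above the limit are what the whitening machinery pushes back against. The exact formula behind scaling.py's metric is not reproduced here; the sketch below uses a common proxy, the mean squared eigenvalue over the squared mean eigenvalue of the covariance, which is 1.0 for perfectly white features and approaches num_channels as variance concentrates in a few directions:

import torch

# Proxy whitening diagnostic; an assumption for illustration, not the
# exact formula behind the scaling.py numbers above.
def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into groups
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    covar = torch.matmul(x.transpose(1, 2), x) / n   # per-group covariance
    eigs = torch.linalg.eigvalsh(covar)              # real eigenvalues
    metric = (eigs ** 2).mean() / (eigs.mean() ** 2)
    return metric.item()

feats = torch.randn(400, 256)     # near-white input -> small metric
print(whitening_metric(feats))    # compare against a limit such as 15.0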
], batch size: 140, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:25:03,717 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.673e+02 3.234e+02 4.604e+02 9.306e+02, threshold=6.469e+02, percent-clipped=8.0 2023-10-09 13:25:04,004 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2769816.0, ans=0.125 2023-10-09 13:25:36,591 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2769909.3333333335, ans=0.07 2023-10-09 13:25:45,554 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2769956.0, ans=0.0 2023-10-09 13:25:50,467 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2769956.0, ans=0.2 2023-10-09 13:25:53,901 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2770002.6666666665, ans=0.0 2023-10-09 13:25:58,052 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2770002.6666666665, ans=0.125 2023-10-09 13:26:05,140 INFO [train.py:1031] (0/4) Epoch 14, batch 8850, loss[loss=0.1581, simple_loss=0.2208, pruned_loss=0.03591, ctc_loss=0.05876, over 16849.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2758, pruned_loss=0.04582, ctc_loss=0.08327, over 3302688.91 frames. ], batch size: 121, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:26:08,402 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2023-10-09 13:26:24,813 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.07 vs. limit=22.5 2023-10-09 13:26:38,383 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2770142.6666666665, ans=0.0 2023-10-09 13:26:52,048 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=22.5 2023-10-09 13:27:05,691 INFO [train.py:1031] (0/4) Epoch 14, batch 8900, loss[loss=0.2327, simple_loss=0.2762, pruned_loss=0.06989, ctc_loss=0.1234, over 16624.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2731, pruned_loss=0.04588, ctc_loss=0.08309, over 3307851.99 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:27:08,453 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.336e+02 2.693e+02 3.509e+02 6.659e+02, threshold=5.387e+02, percent-clipped=1.0 2023-10-09 13:27:29,506 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2770376.0, ans=0.125 2023-10-09 13:27:53,507 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2770422.6666666665, ans=0.125 2023-10-09 13:27:54,580 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2770469.3333333335, ans=0.125 2023-10-09 13:28:08,425 INFO [train.py:1031] (0/4) Epoch 14, batch 8950, loss[loss=0.196, simple_loss=0.253, pruned_loss=0.0515, ctc_loss=0.0899, over 16925.00 frames. 
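grad_scale in the batch summaries fluctuates between 1.0 and 8.0, the signature of dynamic loss scaling for mixed-precision training: the scale is doubled after a run of overflow-free steps and cut back when a non-finite gradient appears. A sketch of that policy; the growth interval and factors are assumptions:

# Dynamic loss scaling as suggested by the fluctuating grad_scale values
# above (2.0 -> 4.0 -> 8.0 -> 2.0 ...). Constants are illustrative.
class DynamicLossScaler:
    def __init__(self, init_scale=2.0, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            self.scale = max(self.scale / 2.0, 1.0 / 32768.0)  # back off on overflow
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= 2.0                              # cautiously grow back
                self._good_steps = 0
        return self.scale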
], tot_loss[loss=0.205, simple_loss=0.2736, pruned_loss=0.05023, ctc_loss=0.09, over 3313366.09 frames. ], batch size: 121, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:28:14,943 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2770516.0, ans=0.1 2023-10-09 13:28:17,732 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2770516.0, ans=0.04949747468305833 2023-10-09 13:28:26,206 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2770562.6666666665, ans=0.2 2023-10-09 13:28:32,931 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.54 vs. limit=15.0 2023-10-09 13:28:34,672 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2770609.3333333335, ans=0.125 2023-10-09 13:28:48,228 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2770656.0, ans=0.125 2023-10-09 13:28:56,862 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2770702.6666666665, ans=0.025 2023-10-09 13:29:10,856 INFO [train.py:1031] (0/4) Epoch 14, batch 9000, loss[loss=0.211, simple_loss=0.2417, pruned_loss=0.06609, ctc_loss=0.1202, over 16330.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2703, pruned_loss=0.05343, ctc_loss=0.09532, over 3314861.62 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:29:10,857 INFO [train.py:1054] (0/4) Computing validation loss 2023-10-09 13:29:26,448 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2412, simple_loss=0.3097, pruned_loss=0.06635, ctc_loss=0.1001, over 1796401.00 frames. 2023-10-09 13:29:26,449 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14582MB 2023-10-09 13:29:29,126 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.435e+02 3.467e+02 3.875e+02 4.625e+02 8.873e+02, threshold=7.750e+02, percent-clipped=12.0 2023-10-09 13:29:44,008 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2770796.0, ans=0.1 2023-10-09 13:30:06,366 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2023-10-09 13:30:11,241 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2770889.3333333335, ans=0.125 2023-10-09 13:30:28,024 INFO [train.py:1031] (0/4) Epoch 14, batch 9050, loss[loss=0.2058, simple_loss=0.2537, pruned_loss=0.0583, ctc_loss=0.1032, over 16980.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.2674, pruned_loss=0.05535, ctc_loss=0.0984, over 3316213.01 frames. ], batch size: 86, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:30:36,461 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.43 vs. 
limit=12.0 2023-10-09 13:30:50,125 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2771029.3333333335, ans=0.2 2023-10-09 13:31:01,794 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2771076.0, ans=0.1 2023-10-09 13:31:29,168 INFO [train.py:1031] (0/4) Epoch 14, batch 9100, loss[loss=0.2147, simple_loss=0.2617, pruned_loss=0.06104, ctc_loss=0.1143, over 16596.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2635, pruned_loss=0.05582, ctc_loss=0.09923, over 3309871.94 frames. ], batch size: 384, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:31:34,400 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+02 2.919e+02 3.286e+02 3.918e+02 6.845e+02, threshold=6.573e+02, percent-clipped=0.0 2023-10-09 13:31:49,585 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2771262.6666666665, ans=0.125 2023-10-09 13:32:02,383 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=22.5 2023-10-09 13:32:11,173 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.74 vs. limit=15.0 2023-10-09 13:32:11,592 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2771356.0, ans=0.125 2023-10-09 13:32:26,236 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2771402.6666666665, ans=0.125 2023-10-09 13:32:30,942 INFO [train.py:1031] (0/4) Epoch 14, batch 9150, loss[loss=0.1867, simple_loss=0.2542, pruned_loss=0.04418, ctc_loss=0.07726, over 16807.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.2621, pruned_loss=0.05304, ctc_loss=0.09475, over 3308458.24 frames. ], batch size: 164, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:32:32,772 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2771449.3333333335, ans=0.125 2023-10-09 13:32:58,050 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2023-10-09 13:33:18,957 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2771636.0, ans=0.0 2023-10-09 13:33:30,897 INFO [train.py:1031] (0/4) Epoch 14, batch 9200, loss[loss=0.2283, simple_loss=0.2717, pruned_loss=0.06964, ctc_loss=0.1139, over 16821.00 frames. ], tot_loss[loss=0.2083, simple_loss=0.266, pruned_loss=0.05561, ctc_loss=0.09855, over 3315755.25 frames. ], batch size: 121, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:33:32,697 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.83 vs. 
limit=15.0 2023-10-09 13:33:37,289 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.803e+02 3.406e+02 4.271e+02 8.869e+02, threshold=6.811e+02, percent-clipped=4.0 2023-10-09 13:34:29,635 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2771869.3333333335, ans=0.125 2023-10-09 13:34:29,657 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2771869.3333333335, ans=0.125 2023-10-09 13:34:32,046 INFO [train.py:1031] (0/4) Epoch 14, batch 9250, loss[loss=0.227, simple_loss=0.2824, pruned_loss=0.06332, ctc_loss=0.1126, over 16765.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2696, pruned_loss=0.05772, ctc_loss=0.1019, over 3308175.54 frames. ], batch size: 291, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:34:42,446 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2771916.0, ans=0.125 2023-10-09 13:34:43,832 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=22.5 2023-10-09 13:34:44,550 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2771962.6666666665, ans=0.1 2023-10-09 13:34:49,820 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2023-10-09 13:35:18,298 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2772056.0, ans=0.125 2023-10-09 13:35:22,382 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.83 vs. limit=15.0 2023-10-09 13:35:28,548 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2772102.6666666665, ans=0.125 2023-10-09 13:35:32,217 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2772102.6666666665, ans=0.125 2023-10-09 13:35:33,981 INFO [train.py:1031] (0/4) Epoch 14, batch 9300, loss[loss=0.2179, simple_loss=0.2958, pruned_loss=0.04989, ctc_loss=0.1002, over 16761.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.2698, pruned_loss=0.0562, ctc_loss=0.09988, over 3315581.86 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:35:35,576 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. 
limit=6.0 2023-10-09 13:35:38,071 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2772149.3333333335, ans=0.0 2023-10-09 13:35:41,378 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 2.952e+02 3.315e+02 3.911e+02 8.519e+02, threshold=6.629e+02, percent-clipped=4.0 2023-10-09 13:36:03,459 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2772242.6666666665, ans=0.0 2023-10-09 13:36:11,834 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2772289.3333333335, ans=0.0 2023-10-09 13:36:35,774 INFO [train.py:1031] (0/4) Epoch 14, batch 9350, loss[loss=0.1944, simple_loss=0.2752, pruned_loss=0.04182, ctc_loss=0.07485, over 16835.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2769, pruned_loss=0.05808, ctc_loss=0.1037, over 3318393.85 frames. ], batch size: 242, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:36:45,306 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2772382.6666666665, ans=0.0 2023-10-09 13:36:48,032 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2772429.3333333335, ans=0.05 2023-10-09 13:36:48,281 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2023-10-09 13:36:52,299 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2772429.3333333335, ans=0.2 2023-10-09 13:37:11,715 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=15.0 2023-10-09 13:37:20,635 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2772522.6666666665, ans=0.125 2023-10-09 13:37:28,230 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2772569.3333333335, ans=0.2 2023-10-09 13:37:31,861 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2772569.3333333335, ans=0.1 2023-10-09 13:37:39,102 INFO [train.py:1031] (0/4) Epoch 14, batch 9400, loss[loss=0.2042, simple_loss=0.2982, pruned_loss=0.03934, ctc_loss=0.07884, over 16906.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2832, pruned_loss=0.05761, ctc_loss=0.1034, over 3315145.96 frames. 
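At batch 9000 above, training pauses for a validation pass ("Computing validation loss"), reports a frame-weighted validation loss over roughly 1.8M frames, and prints the peak GPU memory so far. A sketch of such a pass; compute_loss and valid_dl are hypothetical stand-ins for the recipe's own loss function and validation dataloader:

import torch

# Periodic validation in the shape of the batch-9000 records above.
def run_validation(model, valid_dl, compute_loss, device) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = compute_loss(model, batch)  # loss summed over frames
            tot_loss += loss.item()
            tot_frames += num_frames
    model.train()
    if torch.cuda.is_available():
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")
    return tot_loss / tot_frames  # per-frame validation loss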
], batch size: 243, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:37:44,268 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2772616.0, ans=0.1 2023-10-09 13:37:46,565 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 3.369e+02 4.289e+02 5.603e+02 1.054e+03, threshold=8.577e+02, percent-clipped=14.0 2023-10-09 13:37:59,047 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2772662.6666666665, ans=0.04949747468305833 2023-10-09 13:38:04,260 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2772709.3333333335, ans=0.125 2023-10-09 13:38:08,995 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2772709.3333333335, ans=0.125 2023-10-09 13:38:21,679 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-10-09 13:38:41,002 INFO [train.py:1031] (0/4) Epoch 14, batch 9450, loss[loss=0.2133, simple_loss=0.2858, pruned_loss=0.05233, ctc_loss=0.09035, over 16859.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2867, pruned_loss=0.05593, ctc_loss=0.1011, over 3314783.29 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:38:58,794 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2772896.0, ans=0.125 2023-10-09 13:39:14,385 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:39:17,135 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2772989.3333333335, ans=0.5 2023-10-09 13:39:19,823 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:39:26,307 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2772989.3333333335, ans=0.0 2023-10-09 13:39:37,605 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2773036.0, ans=0.125 2023-10-09 13:39:43,172 INFO [train.py:1031] (0/4) Epoch 14, batch 9500, loss[loss=0.254, simple_loss=0.2977, pruned_loss=0.07766, ctc_loss=0.1377, over 16917.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.2862, pruned_loss=0.05845, ctc_loss=0.1047, over 3306083.70 frames. ], batch size: 292, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:39:46,607 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2773082.6666666665, ans=0.125 2023-10-09 13:39:51,849 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 3.046e+02 3.571e+02 4.127e+02 8.787e+02, threshold=7.141e+02, percent-clipped=1.0 2023-10-09 13:39:58,024 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.77 vs. 
limit=15.0 2023-10-09 13:40:06,277 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2773129.3333333335, ans=0.07 2023-10-09 13:40:18,871 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2773176.0, ans=0.0 2023-10-09 13:40:40,995 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2023-10-09 13:40:46,270 INFO [train.py:1031] (0/4) Epoch 14, batch 9550, loss[loss=0.233, simple_loss=0.2831, pruned_loss=0.06807, ctc_loss=0.1169, over 16875.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2887, pruned_loss=0.06291, ctc_loss=0.1117, over 3300656.31 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:40:52,070 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2773316.0, ans=0.2 2023-10-09 13:41:02,303 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2773362.6666666665, ans=0.1 2023-10-09 13:41:21,095 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2023-10-09 13:41:24,624 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2773456.0, ans=0.1 2023-10-09 13:41:38,091 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2773502.6666666665, ans=0.2 2023-10-09 13:41:48,552 INFO [train.py:1031] (0/4) Epoch 14, batch 9600, loss[loss=0.2478, simple_loss=0.2967, pruned_loss=0.07366, ctc_loss=0.1287, over 16855.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2935, pruned_loss=0.06601, ctc_loss=0.1168, over 3312665.47 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:42:00,632 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.675e+02 3.309e+02 3.670e+02 4.199e+02 1.268e+03, threshold=7.340e+02, percent-clipped=3.0 2023-10-09 13:42:01,331 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0 2023-10-09 13:42:08,566 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2773596.0, ans=0.125 2023-10-09 13:42:08,580 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2773596.0, ans=0.125 2023-10-09 13:42:20,708 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2023-10-09 13:42:29,065 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.69 vs. limit=10.0 2023-10-09 13:42:48,701 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2773736.0, ans=0.1 2023-10-09 13:42:52,928 INFO [train.py:1031] (0/4) Epoch 14, batch 9650, loss[loss=0.2076, simple_loss=0.2785, pruned_loss=0.05109, ctc_loss=0.08661, over 16729.00 frames. 
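Each loss[...] record factors into simple_loss, pruned_loss and ctc_loss components, and the totals here are reproduced by weighting simple_loss by 0.5 and ctc_loss by 0.2: for the batch-9650 record above, 0.5 * 0.2785 + 0.05109 + 0.2 * 0.08661 ≈ 0.2077, matching the logged loss=0.2076 up to the rounding of the printed components. The weights below are inferred from the logged numbers, not read from the code:

# Combining the logged loss components with the weights that reproduce
# the printed totals in this log.
def combine_losses(simple_loss, pruned_loss, ctc_loss,
                   simple_scale=0.5, ctc_scale=0.2):
    return simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss

print(combine_losses(0.2785, 0.05109, 0.08661))  # ~0.2077, vs. logged 0.2076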
], tot_loss[loss=0.2415, simple_loss=0.2978, pruned_loss=0.06838, ctc_loss=0.1211, over 3304558.85 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:43:02,484 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.30 vs. limit=15.0 2023-10-09 13:43:17,745 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2773876.0, ans=0.2 2023-10-09 13:43:47,705 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2773969.3333333335, ans=0.09899494936611666 2023-10-09 13:43:47,931 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2023-10-09 13:43:50,491 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2773969.3333333335, ans=0.125 2023-10-09 13:43:55,643 INFO [train.py:1031] (0/4) Epoch 14, batch 9700, loss[loss=0.1989, simple_loss=0.2762, pruned_loss=0.04484, ctc_loss=0.07987, over 16782.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.296, pruned_loss=0.0652, ctc_loss=0.1157, over 3301549.03 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:43:58,228 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2774016.0, ans=0.125 2023-10-09 13:44:06,800 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.889e+02 3.444e+02 4.302e+02 1.235e+03, threshold=6.889e+02, percent-clipped=2.0 2023-10-09 13:44:56,793 INFO [train.py:1031] (0/4) Epoch 14, batch 9750, loss[loss=0.2095, simple_loss=0.2548, pruned_loss=0.06095, ctc_loss=0.1057, over 16849.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2884, pruned_loss=0.06401, ctc_loss=0.1131, over 3297501.94 frames. ], batch size: 189, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:45:18,789 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2774296.0, ans=0.1 2023-10-09 13:45:29,864 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2774342.6666666665, ans=0.2 2023-10-09 13:45:49,384 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2774436.0, ans=0.5 2023-10-09 13:45:49,517 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=22.5 2023-10-09 13:45:51,746 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. 
limit=15.0 2023-10-09 13:45:54,687 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2774436.0, ans=0.125 2023-10-09 13:45:55,781 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2774436.0, ans=0.0 2023-10-09 13:45:55,818 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2774436.0, ans=0.125 2023-10-09 13:45:59,206 INFO [train.py:1031] (0/4) Epoch 14, batch 9800, loss[loss=0.2096, simple_loss=0.2779, pruned_loss=0.05198, ctc_loss=0.09322, over 16833.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2853, pruned_loss=0.06252, ctc_loss=0.1106, over 3299423.65 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:46:04,461 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2774482.6666666665, ans=0.125 2023-10-09 13:46:04,557 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2774482.6666666665, ans=0.07 2023-10-09 13:46:11,661 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+02 3.062e+02 3.512e+02 4.119e+02 7.038e+02, threshold=7.024e+02, percent-clipped=1.0 2023-10-09 13:46:26,510 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2774576.0, ans=0.125 2023-10-09 13:46:40,030 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2774622.6666666665, ans=0.09899494936611666 2023-10-09 13:46:40,031 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2774622.6666666665, ans=0.0 2023-10-09 13:46:54,997 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2774669.3333333335, ans=0.125 2023-10-09 13:47:00,285 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2774716.0, ans=0.1 2023-10-09 13:47:01,107 INFO [train.py:1031] (0/4) Epoch 14, batch 9850, loss[loss=0.2347, simple_loss=0.2857, pruned_loss=0.06689, ctc_loss=0.1251, over 15345.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2848, pruned_loss=0.0619, ctc_loss=0.1095, over 3289695.21 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:47:22,916 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.82 vs. limit=10.0 2023-10-09 13:47:27,949 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2774809.3333333335, ans=0.1 2023-10-09 13:47:43,019 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2774856.0, ans=0.05 2023-10-09 13:47:44,669 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:47:47,078 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.43 vs. 
limit=15.0 2023-10-09 13:47:55,950 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2774902.6666666665, ans=0.125 2023-10-09 13:48:02,737 INFO [train.py:1031] (0/4) Epoch 14, batch 9900, loss[loss=0.1955, simple_loss=0.2494, pruned_loss=0.05354, ctc_loss=0.0865, over 16925.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2797, pruned_loss=0.06153, ctc_loss=0.1083, over 3291004.35 frames. ], batch size: 78, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:48:04,787 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2774949.3333333335, ans=0.1 2023-10-09 13:48:08,371 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2774949.3333333335, ans=0.05 2023-10-09 13:48:16,789 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+02 2.878e+02 3.183e+02 3.713e+02 1.156e+03, threshold=6.367e+02, percent-clipped=1.0 2023-10-09 13:48:19,371 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2774996.0, ans=0.125 2023-10-09 13:48:26,730 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2775042.6666666665, ans=0.0 2023-10-09 13:48:32,551 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2775042.6666666665, ans=0.125 2023-10-09 13:48:39,914 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2023-10-09 13:48:51,571 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2775136.0, ans=0.0 2023-10-09 13:48:56,996 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2775136.0, ans=0.125 2023-10-09 13:48:58,152 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2775136.0, ans=0.125 2023-10-09 13:49:05,473 INFO [train.py:1031] (0/4) Epoch 14, batch 9950, loss[loss=0.1831, simple_loss=0.2503, pruned_loss=0.04252, ctc_loss=0.07714, over 16958.00 frames. ], tot_loss[loss=0.2158, simple_loss=0.2726, pruned_loss=0.0588, ctc_loss=0.1034, over 3302039.94 frames. 
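The balancer entries (prob, min_positive, min_abs, and so on) constrain per-channel activation statistics such as the fraction of positive values and the mean absolute value. The sketch below only measures the two statistics being constrained; the actual Balancer additionally applies a gradient correction, with probability prob, to channels that violate the bounds, which is omitted here:

import torch

# Measuring the per-channel statistics that the balancer knobs above
# (min_positive, min_abs, ...) constrain. Thresholds are illustrative.
def balancer_stats(x: torch.Tensor, min_positive=0.05, min_abs=0.5):
    # x: (num_frames, num_channels)
    frac_positive = (x > 0).float().mean(dim=0)  # per-channel P(x > 0)
    mean_abs = x.abs().mean(dim=0)               # per-channel E|x|
    bad_sign = frac_positive < min_positive      # channels almost always negative
    bad_mag = mean_abs < min_abs                 # channels collapsing toward zero
    return frac_positive, mean_abs, (bad_sign | bad_mag).sum().item()

acts = torch.randn(1000, 384)
_, _, num_bad = balancer_stats(acts)
print(f"{num_bad} channels outside the balancer's target range")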
], batch size: 86, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:49:06,787 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2775182.6666666665, ans=0.125 2023-10-09 13:49:20,044 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2775229.3333333335, ans=0.1 2023-10-09 13:49:23,907 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2775229.3333333335, ans=0.125 2023-10-09 13:49:26,006 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:49:27,656 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=2775229.3333333335, ans=0.02 2023-10-09 13:49:34,743 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2775276.0, ans=0.125 2023-10-09 13:49:40,218 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2775276.0, ans=0.125 2023-10-09 13:49:57,701 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2775369.3333333335, ans=0.1 2023-10-09 13:49:58,966 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.60 vs. limit=10.0 2023-10-09 13:50:08,695 INFO [train.py:1031] (0/4) Epoch 14, batch 10000, loss[loss=0.2005, simple_loss=0.2532, pruned_loss=0.05645, ctc_loss=0.08727, over 16962.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.2694, pruned_loss=0.0567, ctc_loss=0.1002, over 3307365.87 frames. ], batch size: 82, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:50:19,522 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2775416.0, ans=0.05 2023-10-09 13:50:24,894 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+02 2.835e+02 3.193e+02 3.667e+02 1.150e+03, threshold=6.386e+02, percent-clipped=3.0 2023-10-09 13:50:25,239 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2775462.6666666665, ans=0.125 2023-10-09 13:50:25,301 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:50:29,323 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.85 vs. limit=22.5 2023-10-09 13:50:38,635 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2775509.3333333335, ans=0.2 2023-10-09 13:50:51,065 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.32 vs. limit=15.0 2023-10-09 13:51:10,625 INFO [train.py:1031] (0/4) Epoch 14, batch 10050, loss[loss=0.2295, simple_loss=0.2699, pruned_loss=0.0701, ctc_loss=0.1222, over 16742.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.2648, pruned_loss=0.05702, ctc_loss=0.1009, over 3310044.10 frames. 
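The bypass.scale_min and bypass.skip_rate knobs that recur throughout this log belong to a residual-style bypass: each sublayer's output is mixed with its input through a learned per-channel scale clamped from below by scale_min, and during training the sublayer can be skipped outright with probability skip_rate. A sketch of that wiring, with illustrative shapes and initialization:

import torch

# Bypass mixing with stochastic sublayer skipping, as suggested by the
# scale_min / skip_rate entries above. Wiring details are assumptions.
class Bypass(torch.nn.Module):
    def __init__(self, num_channels, scale_min=0.2, skip_rate=0.05):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.full((num_channels,), 0.5))
        self.scale_min = scale_min
        self.skip_rate = skip_rate

    def forward(self, x_in, x_out):
        if self.training and torch.rand(()) < self.skip_rate:
            return x_in                       # skip the sublayer entirely
        s = self.scale.clamp(self.scale_min, 1.0)
        return x_in + s * (x_out - x_in)      # per-channel interpolation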
], batch size: 272, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:51:13,719 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2775649.3333333335, ans=0.05 2023-10-09 13:51:21,957 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2775649.3333333335, ans=0.125 2023-10-09 13:51:25,183 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2775696.0, ans=0.125 2023-10-09 13:51:43,749 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2775742.6666666665, ans=0.125 2023-10-09 13:52:13,733 INFO [train.py:1031] (0/4) Epoch 14, batch 10100, loss[loss=0.1774, simple_loss=0.2331, pruned_loss=0.04575, ctc_loss=0.07572, over 16674.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2619, pruned_loss=0.05638, ctc_loss=0.09961, over 3315152.61 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:52:17,460 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=22.5 2023-10-09 13:52:18,339 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2775882.6666666665, ans=0.1 2023-10-09 13:52:30,294 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.211e+02 2.838e+02 3.162e+02 3.584e+02 6.355e+02, threshold=6.323e+02, percent-clipped=0.0 2023-10-09 13:52:50,871 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2776022.6666666665, ans=0.125 2023-10-09 13:52:53,614 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2776022.6666666665, ans=0.125 2023-10-09 13:53:12,916 INFO [train.py:1031] (0/4) Epoch 14, batch 10150, loss[loss=0.2463, simple_loss=0.2978, pruned_loss=0.0727, ctc_loss=0.1237, over 16895.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2628, pruned_loss=0.05842, ctc_loss=0.1028, over 3318591.55 frames. ], batch size: 292, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:53:14,336 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2776116.0, ans=0.125 2023-10-09 13:53:41,094 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2776209.3333333335, ans=0.0 2023-10-09 13:54:12,026 INFO [train.py:1031] (0/4) Epoch 14, batch 10200, loss[loss=0.2, simple_loss=0.2516, pruned_loss=0.05457, ctc_loss=0.09804, over 16791.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.265, pruned_loss=0.06018, ctc_loss=0.1056, over 3319549.64 frames. 
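The learning rate is nearly flat across these records (2.58e-03 for thousands of batches before ticking down), consistent with a schedule that decays polynomially in both batch count and epoch, as in the Eden-style schedules commonly paired with Zipformer recipes. The sketch below shows that functional form with illustrative constants; it does not reproduce this run's exact step and epoch bookkeeping:

# Eden-style learning-rate shape: slow polynomial decay in both the
# batch and epoch dimensions. All constants here are illustrative.
def eden_lr(base_lr, batch, epoch, lr_batches=5000.0, lr_epochs=4.0):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(base_lr=0.05, batch=50_000, epoch=10))  # decays very slowly late on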
], batch size: 228, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:54:24,038 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2776396.0, ans=0.1 2023-10-09 13:54:28,917 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.256e+02 3.649e+02 4.243e+02 9.669e+02, threshold=7.298e+02, percent-clipped=6.0 2023-10-09 13:54:43,700 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2776442.6666666665, ans=0.125 2023-10-09 13:54:50,251 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2776489.3333333335, ans=0.0 2023-10-09 13:55:12,754 INFO [train.py:1031] (0/4) Epoch 14, batch 10250, loss[loss=0.208, simple_loss=0.2523, pruned_loss=0.06126, ctc_loss=0.1028, over 16626.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.264, pruned_loss=0.06131, ctc_loss=0.1072, over 3314684.72 frames. ], batch size: 110, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:55:14,331 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.27 vs. limit=10.0 2023-10-09 13:55:17,852 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2776582.6666666665, ans=0.125 2023-10-09 13:55:27,155 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2776629.3333333335, ans=0.2 2023-10-09 13:55:48,428 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2776722.6666666665, ans=0.125 2023-10-09 13:55:53,146 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2776722.6666666665, ans=0.1 2023-10-09 13:55:54,150 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2776722.6666666665, ans=0.2 2023-10-09 13:56:10,023 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2776769.3333333335, ans=0.2 2023-10-09 13:56:10,051 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2776769.3333333335, ans=0.125 2023-10-09 13:56:13,436 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:56:14,048 INFO [train.py:1031] (0/4) Epoch 14, batch 10300, loss[loss=0.236, simple_loss=0.2778, pruned_loss=0.07042, ctc_loss=0.1335, over 16822.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2651, pruned_loss=0.0631, ctc_loss=0.1103, over 3316844.93 frames. 
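tot_loss is reported over a fractional frame count (3,316,844.93 frames above), which suggests a decayed, frame-weighted running sum rather than a plain cumulative total: per-batch loss sums and frame counts are folded in while older batches are gradually down-weighted. A sketch with an assumed decay constant:

# Frame-weighted running loss with exponential decay; the decay value
# is an assumption chosen only to illustrate the fractional frame totals.
class RunningLoss:
    def __init__(self, decay=0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float):
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames

    @property
    def per_frame(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)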
], batch size: 272, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:56:16,542 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2776816.0, ans=0.0 2023-10-09 13:56:24,887 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2776862.6666666665, ans=0.0 2023-10-09 13:56:30,115 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2776862.6666666665, ans=0.0 2023-10-09 13:56:33,609 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.662e+02 3.333e+02 3.833e+02 4.530e+02 9.139e+02, threshold=7.666e+02, percent-clipped=3.0 2023-10-09 13:56:35,730 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2776862.6666666665, ans=0.0 2023-10-09 13:56:42,583 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2776909.3333333335, ans=0.125 2023-10-09 13:57:16,373 INFO [train.py:1031] (0/4) Epoch 14, batch 10350, loss[loss=0.1821, simple_loss=0.2951, pruned_loss=0.02419, ctc_loss=0.05182, over 16342.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2677, pruned_loss=0.06287, ctc_loss=0.1105, over 3319231.89 frames. ], batch size: 463, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:57:26,318 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2777049.3333333335, ans=0.125 2023-10-09 13:57:27,955 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2777096.0, ans=0.0 2023-10-09 13:57:59,432 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2777189.3333333335, ans=0.125 2023-10-09 13:58:15,665 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2777236.0, ans=0.0 2023-10-09 13:58:18,009 INFO [train.py:1031] (0/4) Epoch 14, batch 10400, loss[loss=0.1663, simple_loss=0.2567, pruned_loss=0.0274, ctc_loss=0.05298, over 16850.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2682, pruned_loss=0.05862, ctc_loss=0.1035, over 3323098.35 frames. 
], batch size: 202, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:58:18,370 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2777282.6666666665, ans=0.125 2023-10-09 13:58:37,078 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.958e+02 3.554e+02 4.330e+02 8.227e+02, threshold=7.107e+02, percent-clipped=1.0 2023-10-09 13:58:38,326 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2777329.3333333335, ans=0.125 2023-10-09 13:59:07,826 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2777469.3333333335, ans=0.125 2023-10-09 13:59:14,502 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2777469.3333333335, ans=0.2 2023-10-09 13:59:18,284 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2777469.3333333335, ans=0.0 2023-10-09 13:59:20,077 INFO [train.py:1031] (0/4) Epoch 14, batch 10450, loss[loss=0.2544, simple_loss=0.3062, pruned_loss=0.07528, ctc_loss=0.1304, over 16714.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.274, pruned_loss=0.06039, ctc_loss=0.1066, over 3326555.21 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:59:36,423 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2777562.6666666665, ans=0.125 2023-10-09 13:59:36,461 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2777562.6666666665, ans=0.0 2023-10-09 13:59:49,153 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2777609.3333333335, ans=0.0 2023-10-09 13:59:50,098 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2777609.3333333335, ans=0.125 2023-10-09 14:00:05,588 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2777656.0, ans=0.5 2023-10-09 14:00:20,703 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2777749.3333333335, ans=0.125 2023-10-09 14:00:21,480 INFO [train.py:1031] (0/4) Epoch 14, batch 10500, loss[loss=0.1997, simple_loss=0.2569, pruned_loss=0.05204, ctc_loss=0.09631, over 16795.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2746, pruned_loss=0.06279, ctc_loss=0.1102, over 3326002.48 frames. ], batch size: 215, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 14:00:21,754 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2777749.3333333335, ans=0.0 2023-10-09 14:00:36,584 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.80 vs. 
limit=15.0 2023-10-09 14:00:40,487 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2777796.0, ans=0.0 2023-10-09 14:00:43,436 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.453e+02 3.496e+02 3.857e+02 4.755e+02 1.181e+03, threshold=7.715e+02, percent-clipped=1.0 2023-10-09 14:00:43,828 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2777796.0, ans=0.1 2023-10-09 14:00:51,932 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:00:55,296 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.35 vs. limit=10.0 2023-10-09 14:00:57,368 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-10-09 14:01:04,782 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2777889.3333333335, ans=0.0 2023-10-09 14:01:17,300 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=22.5 2023-10-09 14:01:20,288 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2777936.0, ans=0.1 2023-10-09 14:01:22,068 INFO [train.py:1031] (0/4) Epoch 14, batch 10550, loss[loss=0.2272, simple_loss=0.2827, pruned_loss=0.06316, ctc_loss=0.1134, over 16737.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2723, pruned_loss=0.06211, ctc_loss=0.1094, over 3317808.91 frames. ], batch size: 291, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 14:01:26,720 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2777982.6666666665, ans=0.1 2023-10-09 14:01:27,694 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2777982.6666666665, ans=0.125 2023-10-09 14:01:32,315 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2778029.3333333335, ans=0.0 2023-10-09 14:01:38,764 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0 2023-10-09 14:01:52,596 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:02:00,811 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2778122.6666666665, ans=0.125 2023-10-09 14:02:07,141 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:02:20,571 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.48 vs. limit=15.0 2023-10-09 14:02:24,163 INFO [train.py:1031] (0/4) Epoch 14, batch 10600, loss[loss=0.2261, simple_loss=0.2876, pruned_loss=0.05985, ctc_loss=0.1124, over 16206.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2748, pruned_loss=0.06086, ctc_loss=0.1076, over 3313493.98 frames. 
], batch size: 463, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 14:02:47,791 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+02 3.164e+02 3.650e+02 4.243e+02 8.211e+02, threshold=7.299e+02, percent-clipped=2.0 2023-10-09 14:02:48,360 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2023-10-09 14:02:58,914 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2778309.3333333335, ans=0.0 2023-10-09 14:03:11,843 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2778356.0, ans=0.1 2023-10-09 14:03:20,104 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2778402.6666666665, ans=0.125 2023-10-09 14:03:26,249 INFO [train.py:1031] (0/4) Epoch 14, batch 10650, loss[loss=0.2045, simple_loss=0.2589, pruned_loss=0.0554, ctc_loss=0.09814, over 16747.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2786, pruned_loss=0.06208, ctc_loss=0.1093, over 3305655.21 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 14:03:32,639 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2778449.3333333335, ans=0.2 2023-10-09 14:04:06,035 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2778589.3333333335, ans=0.125 2023-10-09 14:04:07,684 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2778589.3333333335, ans=0.125 2023-10-09 14:04:25,758 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2778636.0, ans=0.0 2023-10-09 14:04:28,567 INFO [train.py:1031] (0/4) Epoch 14, batch 10700, loss[loss=0.2104, simple_loss=0.262, pruned_loss=0.05883, ctc_loss=0.1029, over 16809.00 frames. ], tot_loss[loss=0.2162, simple_loss=0.2733, pruned_loss=0.05885, ctc_loss=0.1033, over 3297718.86 frames. ], batch size: 141, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:04:52,617 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+02 3.058e+02 3.576e+02 4.175e+02 9.953e+02, threshold=7.153e+02, percent-clipped=1.0 2023-10-09 14:04:55,105 INFO [scaling.py:979] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.04 vs. limit=5.0 2023-10-09 14:05:00,363 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.10 vs. 
limit=12.0 2023-10-09 14:05:09,839 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2778822.6666666665, ans=0.125 2023-10-09 14:05:12,977 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2778822.6666666665, ans=0.0 2023-10-09 14:05:18,697 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2778869.3333333335, ans=0.125 2023-10-09 14:05:27,444 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2778869.3333333335, ans=0.1 2023-10-09 14:05:32,627 INFO [train.py:1031] (0/4) Epoch 14, batch 10750, loss[loss=0.2631, simple_loss=0.3102, pruned_loss=0.07836, ctc_loss=0.1481, over 16796.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2803, pruned_loss=0.06272, ctc_loss=0.1097, over 3303663.95 frames. ], batch size: 309, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:05:32,818 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2778916.0, ans=0.0 2023-10-09 14:05:42,779 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2778916.0, ans=0.09899494936611666 2023-10-09 14:05:59,026 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2779009.3333333335, ans=0.0 2023-10-09 14:05:59,391 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=22.5 2023-10-09 14:06:02,700 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.10 vs. limit=22.5 2023-10-09 14:06:07,405 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2779009.3333333335, ans=0.125 2023-10-09 14:06:21,740 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=12.0 2023-10-09 14:06:35,717 INFO [train.py:1031] (0/4) Epoch 14, batch 10800, loss[loss=0.2263, simple_loss=0.242, pruned_loss=0.07545, ctc_loss=0.1491, over 15416.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2788, pruned_loss=0.06379, ctc_loss=0.1115, over 3304732.52 frames. ], batch size: 529, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:06:46,670 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2779196.0, ans=0.2 2023-10-09 14:06:53,195 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2779196.0, ans=0.1 2023-10-09 14:07:01,261 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.600e+02 3.349e+02 3.657e+02 4.515e+02 8.469e+02, threshold=7.313e+02, percent-clipped=4.0 2023-10-09 14:07:12,948 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2779289.3333333335, ans=0.125 2023-10-09 14:07:15,645 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.39 vs. 
2023-10-09 14:07:28,572 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2779336.0, ans=0.125
2023-10-09 14:07:33,764 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:07:36,231 INFO [train.py:1031] (0/4) Epoch 14, batch 10850, loss[loss=0.2019, simple_loss=0.2506, pruned_loss=0.05618, ctc_loss=0.1023, over 16707.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2746, pruned_loss=0.064, ctc_loss=0.1118, over 3304671.45 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:07:41,313 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2779382.6666666665, ans=0.125
2023-10-09 14:07:46,836 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=22.5
2023-10-09 14:07:49,190 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0
2023-10-09 14:07:53,916 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=15.0
2023-10-09 14:08:12,597 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=12.0
2023-10-09 14:08:19,794 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2779522.6666666665, ans=0.1
2023-10-09 14:08:38,603 INFO [train.py:1031] (0/4) Epoch 14, batch 10900, loss[loss=0.2007, simple_loss=0.2515, pruned_loss=0.05508, ctc_loss=0.09936, over 16794.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2691, pruned_loss=0.06332, ctc_loss=0.111, over 3300859.07 frames. ], batch size: 202, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:08:50,678 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2779662.6666666665, ans=0.125
2023-10-09 14:09:05,441 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.210e+02 3.855e+02 4.821e+02 1.226e+03, threshold=7.710e+02, percent-clipped=2.0
2023-10-09 14:09:14,500 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2023-10-09 14:09:29,122 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2779802.6666666665, ans=0.125
2023-10-09 14:09:31,353 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2779802.6666666665, ans=0.125
2023-10-09 14:09:39,565 INFO [train.py:1031] (0/4) Epoch 14, batch 10950, loss[loss=0.1792, simple_loss=0.2146, pruned_loss=0.05273, ctc_loss=0.09574, over 16115.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2648, pruned_loss=0.06224, ctc_loss=0.1092, over 3300516.65 frames. ], batch size: 466, lr: 2.57e-03, grad_scale: 2.0
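Each scaling.py:199 entry reports a ScheduledFloat: a hyperparameter whose current value (`ans`) is a function of the global `batch_count`. A plausible minimal model is piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are invented for the example, and the real class in scaling.py carries more machinery:

```python
# Sketch only: a piecewise-linear schedule over batch_count, mimicking what
# the "ScheduledFloat: ... batch_count=..., ans=..." lines report. The
# breakpoints here are made up for illustration.
import bisect

class ScheduledFloatSketch:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

skip_rate = ScheduledFloatSketch((0, 0.5), (20000, 0.025), (50000, 0.0))
print(skip_rate.value(2778309.33))  # -> 0.0: fully annealed this late in training
```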
2023-10-09 14:09:59,695 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2779896.0, ans=0.05
2023-10-09 14:10:09,500 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.22 vs. limit=10.0
2023-10-09 14:10:42,343 INFO [train.py:1031] (0/4) Epoch 14, batch 11000, loss[loss=0.2442, simple_loss=0.2897, pruned_loss=0.07296, ctc_loss=0.1321, over 16767.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2657, pruned_loss=0.06328, ctc_loss=0.111, over 3303785.44 frames. ], batch size: 308, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:11:11,556 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.328e+02 3.883e+02 5.018e+02 9.874e+02, threshold=7.766e+02, percent-clipped=3.0
2023-10-09 14:11:33,975 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2780269.3333333335, ans=0.09899494936611666
2023-10-09 14:11:46,299 INFO [train.py:1031] (0/4) Epoch 14, batch 11050, loss[loss=0.2136, simple_loss=0.2731, pruned_loss=0.0562, ctc_loss=0.1042, over 16978.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2738, pruned_loss=0.06662, ctc_loss=0.1163, over 3301520.55 frames. ], batch size: 86, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:11:55,541 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2780316.0, ans=0.2
2023-10-09 14:11:58,727 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0
2023-10-09 14:12:07,042 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2780362.6666666665, ans=0.125
2023-10-09 14:12:08,572 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2780362.6666666665, ans=0.0
2023-10-09 14:12:13,604 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2780409.3333333335, ans=0.0
2023-10-09 14:12:19,119 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2780409.3333333335, ans=0.125
2023-10-09 14:12:28,168 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2780456.0, ans=0.125
2023-10-09 14:12:38,089 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2780502.6666666665, ans=0.2
2023-10-09 14:12:49,093 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2780549.3333333335, ans=0.0
2023-10-09 14:12:49,759 INFO [train.py:1031] (0/4) Epoch 14, batch 11100, loss[loss=0.1684, simple_loss=0.2067, pruned_loss=0.04801, ctc_loss=0.08525, over 16890.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2767, pruned_loss=0.06468, ctc_loss=0.1128, over 3298068.69 frames. ], batch size: 78, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:13:01,663 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2780596.0, ans=0.2
2023-10-09 14:13:18,549 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.716e+02 3.611e+02 4.307e+02 5.885e+02 1.880e+03, threshold=8.614e+02, percent-clipped=7.0
2023-10-09 14:13:29,266 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2780689.3333333335, ans=0.125
2023-10-09 14:13:33,545 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.36 vs. limit=10.0
2023-10-09 14:13:38,622 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2780736.0, ans=0.125
2023-10-09 14:13:48,730 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2780736.0, ans=0.125
2023-10-09 14:13:51,662 INFO [train.py:1031] (0/4) Epoch 14, batch 11150, loss[loss=0.2343, simple_loss=0.273, pruned_loss=0.07363, ctc_loss=0.121, over 16745.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2743, pruned_loss=0.06405, ctc_loss=0.1114, over 3289091.55 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:14:09,999 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2780829.3333333335, ans=0.0
2023-10-09 14:14:21,883 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2780876.0, ans=0.0
2023-10-09 14:14:50,518 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0
2023-10-09 14:14:53,092 INFO [train.py:1031] (0/4) Epoch 14, batch 11200, loss[loss=0.26, simple_loss=0.3256, pruned_loss=0.07138, ctc_loss=0.1289, over 16182.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2751, pruned_loss=0.06501, ctc_loss=0.1134, over 3297142.60 frames. ], batch size: 463, lr: 2.57e-03, grad_scale: 4.0
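In the train.py:1031 entries, `loss[...]` is the current batch while `tot_loss[..., over N frames]` is a frame-weighted running average. The totals hover near 3.3e6 frames, roughly 200 typical batches' worth, which is consistent with a decayed sum that forgets with factor (1 - 1/200) per step. A sketch under that assumption:

```python
# Assumed mechanism: an exponentially decayed, frame-weighted sum whose
# steady-state frame count is batch_frames * reset_interval. Reconstruction
# for illustration; see the recipe's train.py for the real tracker.
def update_tot_loss(tot_weighted_loss, tot_frames, batch_loss, batch_frames,
                    reset_interval=200):
    decay = 1.0 - 1.0 / reset_interval
    tot_weighted_loss = tot_weighted_loss * decay + batch_loss * batch_frames
    tot_frames = tot_frames * decay + batch_frames
    return tot_weighted_loss, tot_frames  # report tot_weighted_loss / tot_frames

tot_l, tot_f = 0.0, 0.0
for _ in range(1000):
    tot_l, tot_f = update_tot_loss(tot_l, tot_f, 0.22, 16500.0)
print(round(tot_f))  # ~3.3e6 frames at steady state, matching the log's totals
```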
2023-10-09 14:15:10,044 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2781062.6666666665, ans=0.1
2023-10-09 14:15:17,599 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2781109.3333333335, ans=0.1
2023-10-09 14:15:25,074 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+02 3.168e+02 3.497e+02 4.095e+02 1.585e+03, threshold=6.993e+02, percent-clipped=3.0
2023-10-09 14:15:26,531 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2781109.3333333335, ans=0.125
2023-10-09 14:15:36,202 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2781156.0, ans=0.1
2023-10-09 14:15:47,355 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2781202.6666666665, ans=0.125
2023-10-09 14:15:49,485 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2781202.6666666665, ans=0.2
2023-10-09 14:15:55,968 INFO [train.py:1031] (0/4) Epoch 14, batch 11250, loss[loss=0.2351, simple_loss=0.2939, pruned_loss=0.06645, ctc_loss=0.1082, over 16545.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.285, pruned_loss=0.06534, ctc_loss=0.1141, over 3296327.54 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:16:18,435 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-w-ctc/checkpoint-596000.pt
2023-10-09 14:16:21,066 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2781296.0, ans=0.125
2023-10-09 14:16:26,059 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2781342.6666666665, ans=0.0
2023-10-09 14:16:37,035 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0
2023-10-09 14:16:46,999 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2781389.3333333335, ans=0.1
2023-10-09 14:16:55,444 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0
2023-10-09 14:17:02,416 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2781482.6666666665, ans=0.2
2023-10-09 14:17:03,085 INFO [train.py:1031] (0/4) Epoch 14, batch 11300, loss[loss=0.2066, simple_loss=0.2954, pruned_loss=0.04304, ctc_loss=0.07937, over 16960.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2893, pruned_loss=0.06389, ctc_loss=0.1122, over 3302741.84 frames. ], batch size: 258, lr: 2.57e-03, grad_scale: 4.0
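The checkpoint.py:75 entry fires on a round global batch number (596000), i.e. interval-based checkpointing keyed on the optimizer-step count. A minimal sketch in that spirit; the filename pattern matches the log, while the argument names and the exact contents of the saved dict are assumptions:

```python
# Illustrative periodic checkpointing keyed on the global batch index.
# Everything except the "checkpoint-<N>.pt" naming is an assumption.
from pathlib import Path
import torch

def maybe_save(model, optimizer, batch_idx_train, exp_dir, save_every_n=4000):
    if batch_idx_train % save_every_n != 0:
        return None
    path = Path(exp_dir) / f"checkpoint-{batch_idx_train}.pt"
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx_train}, path)
    return path  # e.g. zipformer/exp-w-ctc/checkpoint-596000.pt at step 596000
```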
2023-10-09 14:17:05,039 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2781482.6666666665, ans=0.125
2023-10-09 14:17:06,094 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2781482.6666666665, ans=0.1
2023-10-09 14:17:14,433 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2781529.3333333335, ans=0.0
2023-10-09 14:17:28,870 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2781576.0, ans=0.0
2023-10-09 14:17:33,254 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2781576.0, ans=0.125
2023-10-09 14:17:33,912 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+02 3.091e+02 3.848e+02 4.979e+02 9.254e+02, threshold=7.696e+02, percent-clipped=6.0
2023-10-09 14:18:03,517 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2781716.0, ans=0.1
2023-10-09 14:18:04,196 INFO [train.py:1031] (0/4) Epoch 14, batch 11350, loss[loss=0.2142, simple_loss=0.2676, pruned_loss=0.06013, ctc_loss=0.1011, over 16687.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2872, pruned_loss=0.06162, ctc_loss=0.1085, over 3286160.35 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:18:29,814 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5
2023-10-09 14:18:34,454 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2781809.3333333335, ans=0.0
2023-10-09 14:18:39,955 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=22.5
2023-10-09 14:18:43,119 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=22.5
2023-10-09 14:18:52,772 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2781902.6666666665, ans=0.125
2023-10-09 14:19:00,347 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2781902.6666666665, ans=0.2
2023-10-09 14:19:00,576 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0
2023-10-09 14:19:05,821 INFO [train.py:1031] (0/4) Epoch 14, batch 11400, loss[loss=0.2526, simple_loss=0.2904, pruned_loss=0.08156, ctc_loss=0.1292, over 16773.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2842, pruned_loss=0.06254, ctc_loss=0.1096, over 3298094.76 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:19:06,207 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2781949.3333333335, ans=0.125
2023-10-09 14:19:10,022 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2781949.3333333335, ans=0.2
2023-10-09 14:19:37,885 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.191e+02 3.489e+02 4.256e+02 5.952e+02, threshold=6.979e+02, percent-clipped=0.0
2023-10-09 14:19:39,773 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2782042.6666666665, ans=0.1
2023-10-09 14:20:07,765 INFO [train.py:1031] (0/4) Epoch 14, batch 11450, loss[loss=0.2155, simple_loss=0.2574, pruned_loss=0.06497, ctc_loss=0.1091, over 16506.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2816, pruned_loss=0.06351, ctc_loss=0.1111, over 3290819.98 frames. ], batch size: 466, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:20:14,351 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0
2023-10-09 14:20:48,637 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2782322.6666666665, ans=0.1
2023-10-09 14:20:52,748 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2782322.6666666665, ans=15.0
2023-10-09 14:21:08,509 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=22.5
2023-10-09 14:21:08,869 INFO [train.py:1031] (0/4) Epoch 14, batch 11500, loss[loss=0.2352, simple_loss=0.2792, pruned_loss=0.07247, ctc_loss=0.1157, over 16723.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2838, pruned_loss=0.06628, ctc_loss=0.1159, over 3300853.28 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:21:21,799 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2782462.6666666665, ans=0.0
2023-10-09 14:21:24,466 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2782462.6666666665, ans=0.04949747468305833
2023-10-09 14:21:43,380 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2782509.3333333335, ans=0.125
2023-10-09 14:21:44,092 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.359e+02 3.820e+02 4.365e+02 7.019e+02, threshold=7.640e+02, percent-clipped=1.0
2023-10-09 14:21:47,262 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2782556.0, ans=0.035
2023-10-09 14:21:59,401 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2782602.6666666665, ans=0.1
2023-10-09 14:22:01,465 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0
2023-10-09 14:22:11,638 INFO [train.py:1031] (0/4) Epoch 14, batch 11550, loss[loss=0.2042, simple_loss=0.2573, pruned_loss=0.05554, ctc_loss=0.09991, over 16798.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2875, pruned_loss=0.06756, ctc_loss=0.118, over 3301046.31 frames. ], batch size: 121, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:22:26,726 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2782696.0, ans=0.125
2023-10-09 14:22:52,944 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2782789.3333333335, ans=0.1
2023-10-09 14:23:02,278 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2782836.0, ans=0.05
2023-10-09 14:23:13,431 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2782836.0, ans=0.2
2023-10-09 14:23:15,896 INFO [train.py:1031] (0/4) Epoch 14, batch 11600, loss[loss=0.3536, simple_loss=0.4098, pruned_loss=0.1081, ctc_loss=0.203, over 16557.00 frames. ], tot_loss[loss=0.2373, simple_loss=0.294, pruned_loss=0.06681, ctc_loss=0.1174, over 3299388.11 frames. ], batch size: 351, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:23:21,571 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0
2023-10-09 14:23:24,313 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0
2023-10-09 14:23:31,487 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2782929.3333333335, ans=0.0
2023-10-09 14:23:47,489 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2782976.0, ans=0.125
2023-10-09 14:23:52,925 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+02 3.363e+02 4.073e+02 4.861e+02 8.872e+02, threshold=8.146e+02, percent-clipped=3.0
2023-10-09 14:23:54,422 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2783022.6666666665, ans=0.1
2023-10-09 14:24:09,910 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:24:19,982 INFO [train.py:1031] (0/4) Epoch 14, batch 11650, loss[loss=0.189, simple_loss=0.2416, pruned_loss=0.0514, ctc_loss=0.0841, over 16719.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.2951, pruned_loss=0.06667, ctc_loss=0.1172, over 3295675.36 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:24:44,386 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2783209.3333333335, ans=0.0
2023-10-09 14:24:44,438 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2783209.3333333335, ans=0.125
2023-10-09 14:24:44,634 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0
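The scaling.py:979 entries compare a per-activation whitening metric against a limit. One standard diagnostic with this flavor measures how anisotropic the channel covariance is: it equals 1.0 for perfectly white features and grows as a few directions dominate, so "metric=5.01 vs. limit=15.0" reads as an activation comfortably inside its allowed anisotropy budget. The definition below is an assumption about the metric, not a copy of scaling.py, and it covers only the single-group case:

```python
# Plausible reconstruction of a whitening diagnostic: the ratio of the mean
# squared covariance eigenvalue to the squared mean eigenvalue, computed as
# trace(cov @ cov) * C / trace(cov)^2. Definition assumed, not confirmed.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (frames, channels); zero-mean each channel first
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    num_channels = cov.shape[0]
    return (torch.trace(cov @ cov) * num_channels / torch.trace(cov) ** 2).item()

print(whitening_metric(torch.randn(10000, 192)))  # ~1.0 for white noise
```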
2023-10-09 14:25:05,732 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=12.0
2023-10-09 14:25:14,116 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2783302.6666666665, ans=0.1
2023-10-09 14:25:23,312 INFO [train.py:1031] (0/4) Epoch 14, batch 11700, loss[loss=0.2092, simple_loss=0.2514, pruned_loss=0.06194, ctc_loss=0.1078, over 16729.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2895, pruned_loss=0.06619, ctc_loss=0.116, over 3298867.80 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:25:23,676 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2783349.3333333335, ans=0.2
2023-10-09 14:25:25,562 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2783349.3333333335, ans=0.0
2023-10-09 14:25:43,564 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2783396.0, ans=0.2
2023-10-09 14:25:53,253 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2783442.6666666665, ans=10.0
2023-10-09 14:25:58,214 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.451e+02 4.279e+02 5.142e+02 9.107e+02, threshold=8.558e+02, percent-clipped=4.0
2023-10-09 14:26:12,484 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=22.5
2023-10-09 14:26:14,978 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2783536.0, ans=0.2
2023-10-09 14:26:18,368 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.91 vs. limit=10.0
2023-10-09 14:26:19,254 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2783536.0, ans=0.125
2023-10-09 14:26:21,233 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2783536.0, ans=0.1
2023-10-09 14:26:21,754 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0
2023-10-09 14:26:23,053 INFO [train.py:1031] (0/4) Epoch 14, batch 11750, loss[loss=0.2223, simple_loss=0.2763, pruned_loss=0.06361, ctc_loss=0.1027, over 12604.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2822, pruned_loss=0.06515, ctc_loss=0.114, over 3296892.91 frames. ], batch size: 35, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:26:32,466 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2783582.6666666665, ans=0.1
2023-10-09 14:27:07,414 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2783722.6666666665, ans=0.0
2023-10-09 14:27:08,399 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2783722.6666666665, ans=0.125
2023-10-09 14:27:10,460 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2783769.3333333335, ans=0.2
2023-10-09 14:27:24,342 INFO [train.py:1031] (0/4) Epoch 14, batch 11800, loss[loss=0.2057, simple_loss=0.2476, pruned_loss=0.06216, ctc_loss=0.09832, over 16274.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2764, pruned_loss=0.0643, ctc_loss=0.1124, over 3296075.22 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:27:50,296 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2783909.3333333335, ans=0.125
2023-10-09 14:27:58,213 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2783909.3333333335, ans=0.1
2023-10-09 14:27:58,228 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2783909.3333333335, ans=0.0
2023-10-09 14:27:58,425 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0
2023-10-09 14:28:03,482 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.479e+02 3.032e+02 3.578e+02 4.296e+02 8.317e+02, threshold=7.156e+02, percent-clipped=0.0
2023-10-09 14:28:13,367 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2783956.0, ans=0.125
2023-10-09 14:28:29,803 INFO [train.py:1031] (0/4) Epoch 14, batch 11850, loss[loss=0.2134, simple_loss=0.259, pruned_loss=0.06347, ctc_loss=0.1022, over 16782.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2826, pruned_loss=0.06357, ctc_loss=0.1115, over 3298576.69 frames. ], batch size: 121, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:28:43,363 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2784096.0, ans=0.125
2023-10-09 14:29:01,338 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2784142.6666666665, ans=0.125
2023-10-09 14:29:10,074 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2784189.3333333335, ans=0.2
2023-10-09 14:29:11,399 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.02 vs. limit=22.5
2023-10-09 14:29:14,845 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2784189.3333333335, ans=0.2
2023-10-09 14:29:26,199 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2784236.0, ans=0.1
2023-10-09 14:29:33,116 INFO [train.py:1031] (0/4) Epoch 14, batch 11900, loss[loss=0.2129, simple_loss=0.2961, pruned_loss=0.04779, ctc_loss=0.0853, over 15226.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2884, pruned_loss=0.0627, ctc_loss=0.1106, over 3288732.56 frames. ], batch size: 526, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:29:41,263 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=22.5
2023-10-09 14:29:42,566 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2784282.6666666665, ans=0.125
2023-10-09 14:30:10,494 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2784422.6666666665, ans=0.5
2023-10-09 14:30:14,076 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.208e+02 3.772e+02 4.590e+02 1.035e+03, threshold=7.543e+02, percent-clipped=4.0
2023-10-09 14:30:26,920 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2784469.3333333335, ans=0.125
2023-10-09 14:30:27,668 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2784469.3333333335, ans=0.125
2023-10-09 14:30:36,488 INFO [train.py:1031] (0/4) Epoch 14, batch 11950, loss[loss=0.2539, simple_loss=0.3161, pruned_loss=0.07124, ctc_loss=0.1228, over 16787.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2909, pruned_loss=0.06448, ctc_loss=0.1132, over 3284738.81 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:31:14,314 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2784656.0, ans=0.1
2023-10-09 14:31:35,027 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:31:36,386 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=22.5
2023-10-09 14:31:40,165 INFO [train.py:1031] (0/4) Epoch 14, batch 12000, loss[loss=0.2622, simple_loss=0.3375, pruned_loss=0.06903, ctc_loss=0.1221, over 16855.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.2932, pruned_loss=0.06409, ctc_loss=0.1132, over 3283401.77 frames. ], batch size: 308, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:31:40,166 INFO [train.py:1054] (0/4) Computing validation loss
2023-10-09 14:31:54,601 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2358, simple_loss=0.3055, pruned_loss=0.064, ctc_loss=0.09509, over 1796401.00 frames.
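At batch 12000 the trainer pauses for a validation pass (the train.py:1054-1063 entries above): no gradients, frame-weighted averaging over the dev set, then a peak-memory report. A sketch of that pattern, where the model/loader interface is invented for the example; only torch.cuda.max_memory_allocated is the real PyTorch call:

```python
# Sketch of the validation step. Assumption: the model returns a
# (summed_loss, num_frames) pair per batch; the real recipe differs in detail.
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, device="cuda"):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in valid_loader:
        loss, frames = model(batch)  # assumed interface
        tot_loss += loss.item()
        tot_frames += frames
    model.train()
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.4f}, "
          f"over {tot_frames:.2f} frames; max memory {mb}MB")
    return tot_loss / tot_frames
```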
2023-10-09 14:31:54,602 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14587MB
2023-10-09 14:32:11,733 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2784796.0, ans=0.1
2023-10-09 14:32:16,102 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2784796.0, ans=0.0
2023-10-09 14:32:22,615 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:32:34,562 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0
2023-10-09 14:32:36,798 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+02 3.474e+02 4.193e+02 5.077e+02 1.283e+03, threshold=8.386e+02, percent-clipped=9.0
2023-10-09 14:32:47,809 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:32:48,028 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0
2023-10-09 14:33:00,831 INFO [train.py:1031] (0/4) Epoch 14, batch 12050, loss[loss=0.281, simple_loss=0.3406, pruned_loss=0.08313, ctc_loss=0.1379, over 16855.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2938, pruned_loss=0.06477, ctc_loss=0.1131, over 3290567.32 frames. ], batch size: 329, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:33:02,292 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2784982.6666666665, ans=0.125
2023-10-09 14:33:08,115 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2784982.6666666665, ans=0.125
2023-10-09 14:33:15,057 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5
2023-10-09 14:33:44,789 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2785122.6666666665, ans=0.125
2023-10-09 14:33:45,828 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2785122.6666666665, ans=0.0
2023-10-09 14:33:51,618 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2785169.3333333335, ans=0.0
2023-10-09 14:33:56,426 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2785169.3333333335, ans=0.125
2023-10-09 14:34:03,707 INFO [train.py:1031] (0/4) Epoch 14, batch 12100, loss[loss=0.3307, simple_loss=0.3415, pruned_loss=0.1194, ctc_loss=0.2028, over 16797.00 frames. ], tot_loss[loss=0.2373, simple_loss=0.2957, pruned_loss=0.06635, ctc_loss=0.1156, over 3295813.34 frames. ], batch size: 384, lr: 2.57e-03, grad_scale: 8.0
2023-10-09 14:34:12,522 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2785216.0, ans=0.125
2023-10-09 14:34:45,439 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.634e+02 3.559e+02 4.280e+02 5.187e+02 9.097e+02, threshold=8.560e+02, percent-clipped=2.0
2023-10-09 14:35:06,631 INFO [train.py:1031] (0/4) Epoch 14, batch 12150, loss[loss=0.2392, simple_loss=0.306, pruned_loss=0.06421, ctc_loss=0.1098, over 16898.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.2959, pruned_loss=0.06686, ctc_loss=0.117, over 3304342.74 frames. ], batch size: 202, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:35:08,831 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2785449.3333333335, ans=0.0
2023-10-09 14:35:50,308 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2785589.3333333335, ans=0.0
2023-10-09 14:36:05,683 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.96 vs. limit=10.0
2023-10-09 14:36:09,749 INFO [train.py:1031] (0/4) Epoch 14, batch 12200, loss[loss=0.1615, simple_loss=0.2032, pruned_loss=0.04542, ctc_loss=0.0722, over 16560.00 frames. ], tot_loss[loss=0.2476, simple_loss=0.3099, pruned_loss=0.06823, ctc_loss=0.1219, over 3295401.85 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:36:21,869 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2785729.3333333335, ans=0.125
2023-10-09 14:36:30,597 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2785729.3333333335, ans=0.1
2023-10-09 14:36:36,362 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2785776.0, ans=0.1
2023-10-09 14:36:36,737 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0
2023-10-09 14:36:38,237 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=22.5
2023-10-09 14:36:52,409 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.614e+02 4.622e+02 6.130e+02 1.347e+03, threshold=9.244e+02, percent-clipped=11.0
2023-10-09 14:37:09,699 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2785869.3333333335, ans=0.0
2023-10-09 14:37:12,092 INFO [train.py:1031] (0/4) Epoch 14, batch 12250, loss[loss=0.205, simple_loss=0.2517, pruned_loss=0.05871, ctc_loss=0.1021, over 16802.00 frames. ], tot_loss[loss=0.2414, simple_loss=0.3022, pruned_loss=0.06648, ctc_loss=0.1189, over 3293189.17 frames. ], batch size: 215, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:37:15,636 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2785916.0, ans=0.125
2023-10-09 14:37:16,973 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=22.5
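The `grad_scale` field that moves between 2.0, 4.0 and 8.0 in these entries is the fp16 loss-scale: it is halved when a step overflows and grown back after a run of clean steps. The stock PyTorch equivalent of that mechanism is sketched below; this run uses its own scaler variant, so treat this as the generic pattern rather than the recipe's code (requires a CUDA device):

```python
# Generic fp16 loss-scaling with PyTorch's GradScaler; init_scale/growth
# values here are illustrative, not this run's settings.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=2000)
model = torch.nn.Linear(8, 8).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.045)
x = torch.randn(4, 8, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()
scaler.step(opt)   # skips the update if inf/nan gradients were found
scaler.update()    # halves the scale on overflow, grows it otherwise
print(scaler.get_scale())
```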
2023-10-09 14:37:24,282 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2785962.6666666665, ans=0.125
2023-10-09 14:37:41,202 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2786009.3333333335, ans=0.0
2023-10-09 14:37:50,332 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2786056.0, ans=0.0
2023-10-09 14:37:58,802 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2786102.6666666665, ans=0.0
2023-10-09 14:38:06,242 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0
2023-10-09 14:38:12,141 INFO [train.py:1031] (0/4) Epoch 14, batch 12300, loss[loss=0.1999, simple_loss=0.2261, pruned_loss=0.06305, ctc_loss=0.1191, over 15529.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.2908, pruned_loss=0.06476, ctc_loss=0.1153, over 3289842.11 frames. ], batch size: 529, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:38:14,778 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2786149.3333333335, ans=0.1
2023-10-09 14:38:39,980 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2786242.6666666665, ans=0.0
2023-10-09 14:38:45,805 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=22.5
2023-10-09 14:38:54,471 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2786289.3333333335, ans=0.0
2023-10-09 14:38:55,170 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+02 3.070e+02 3.744e+02 4.897e+02 1.313e+03, threshold=7.488e+02, percent-clipped=1.0
2023-10-09 14:39:01,418 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2786336.0, ans=0.125
2023-10-09 14:39:13,380 INFO [train.py:1031] (0/4) Epoch 14, batch 12350, loss[loss=0.2079, simple_loss=0.2633, pruned_loss=0.05615, ctc_loss=0.1007, over 16530.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2938, pruned_loss=0.06583, ctc_loss=0.1171, over 3290425.19 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:39:13,785 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2786382.6666666665, ans=0.2
2023-10-09 14:39:24,461 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2786429.3333333335, ans=0.0
2023-10-09 14:39:25,531 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2786429.3333333335, ans=0.1
2023-10-09 14:39:34,831 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2786429.3333333335, ans=0.07
2023-10-09 14:39:50,498 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2786522.6666666665, ans=0.125
2023-10-09 14:39:55,707 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2786522.6666666665, ans=0.2
2023-10-09 14:40:05,049 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2786569.3333333335, ans=0.0
2023-10-09 14:40:14,937 INFO [train.py:1031] (0/4) Epoch 14, batch 12400, loss[loss=0.256, simple_loss=0.3178, pruned_loss=0.07003, ctc_loss=0.1352, over 16814.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2893, pruned_loss=0.06241, ctc_loss=0.1116, over 3296876.61 frames. ], batch size: 309, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:40:16,345 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2786616.0, ans=0.0
2023-10-09 14:40:19,127 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2786616.0, ans=0.1
2023-10-09 14:40:36,188 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2786662.6666666665, ans=0.125
2023-10-09 14:40:48,681 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2786709.3333333335, ans=0.125
2023-10-09 14:41:00,375 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.233e+02 3.612e+02 4.098e+02 6.929e+02, threshold=7.223e+02, percent-clipped=0.0
2023-10-09 14:41:02,894 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2786756.0, ans=0.125
2023-10-09 14:41:07,982 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0
2023-10-09 14:41:14,068 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2786802.6666666665, ans=0.0
2023-10-09 14:41:17,629 INFO [train.py:1031] (0/4) Epoch 14, batch 12450, loss[loss=0.2505, simple_loss=0.3247, pruned_loss=0.06413, ctc_loss=0.1203, over 15211.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2896, pruned_loss=0.06141, ctc_loss=0.1103, over 3296098.47 frames. ], batch size: 529, lr: 2.57e-03, grad_scale: 4.0
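The learning rate decays very slowly here (2.58e-03 early in the section, 2.57e-03 for the rest), which is consistent with an Eden-style schedule that is inverse-quarter-power in both the batch and epoch counts. The formula below is a hedged approximation with assumed constants; it lands near 2.5e-03 in this region of training, in the neighborhood of the logged value but not exactly on it:

```python
# Sketch of an Eden-style LR schedule; exponents and constants are
# assumptions, and the recipe's actual scheduler has additional terms.
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 596000, 14):.2e}")  # ~2.5e-03 at this point in the run
```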
2023-10-09 14:41:51,594 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2786942.6666666665, ans=0.125
2023-10-09 14:41:54,674 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2786989.3333333335, ans=0.0
2023-10-09 14:41:58,828 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2786989.3333333335, ans=0.0
2023-10-09 14:42:05,756 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2786989.3333333335, ans=0.125
2023-10-09 14:42:07,191 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0
2023-10-09 14:42:12,051 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2023-10-09 14:42:19,503 INFO [train.py:1031] (0/4) Epoch 14, batch 12500, loss[loss=0.2161, simple_loss=0.274, pruned_loss=0.05801, ctc_loss=0.1055, over 16848.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.289, pruned_loss=0.06012, ctc_loss=0.1083, over 3298070.01 frames. ], batch size: 215, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:42:23,321 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.70 vs. limit=15.0
2023-10-09 14:42:43,993 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2787176.0, ans=0.1
2023-10-09 14:42:45,024 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2787176.0, ans=0.125
2023-10-09 14:43:07,996 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+02 2.957e+02 3.304e+02 4.556e+02 8.176e+02, threshold=6.608e+02, percent-clipped=1.0
2023-10-09 14:43:09,825 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2023-10-09 14:43:23,453 INFO [train.py:1031] (0/4) Epoch 14, batch 12550, loss[loss=0.1756, simple_loss=0.2352, pruned_loss=0.04348, ctc_loss=0.07284, over 16554.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2874, pruned_loss=0.05885, ctc_loss=0.106, over 3302933.84 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:43:35,775 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.33 vs. limit=10.0
2023-10-09 14:43:52,850 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2787409.3333333335, ans=0.1
2023-10-09 14:43:55,269 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0
2023-10-09 14:43:55,936 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2787409.3333333335, ans=0.1
2023-10-09 14:43:57,136 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2787409.3333333335, ans=0.0
2023-10-09 14:44:03,047 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2787456.0, ans=10.0
2023-10-09 14:44:11,283 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0
2023-10-09 14:44:12,459 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0
2023-10-09 14:44:23,449 INFO [train.py:1031] (0/4) Epoch 14, batch 12600, loss[loss=0.1916, simple_loss=0.2583, pruned_loss=0.04594, ctc_loss=0.08239, over 16718.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2832, pruned_loss=0.056, ctc_loss=0.1013, over 3300831.02 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:44:27,004 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2787549.3333333335, ans=0.0
2023-10-09 14:44:27,084 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2787549.3333333335, ans=0.0
2023-10-09 14:44:29,525 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=15.0
2023-10-09 14:44:44,588 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2787596.0, ans=0.0
2023-10-09 14:44:47,892 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2787642.6666666665, ans=0.125
2023-10-09 14:45:07,401 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2787689.3333333335, ans=0.0
2023-10-09 14:45:11,285 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 3.144e+02 3.491e+02 4.128e+02 9.398e+02, threshold=6.982e+02, percent-clipped=1.0
2023-10-09 14:45:12,994 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0
2023-10-09 14:45:24,788 INFO [train.py:1031] (0/4) Epoch 14, batch 12650, loss[loss=0.2055, simple_loss=0.2553, pruned_loss=0.05805, ctc_loss=0.09927, over 16777.00 frames. ], tot_loss[loss=0.22, simple_loss=0.2823, pruned_loss=0.05805, ctc_loss=0.1041, over 3308068.71 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:45:41,003 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2787829.3333333335, ans=0.1
2023-10-09 14:46:16,417 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2787969.3333333335, ans=0.125
2023-10-09 14:46:17,584 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2787969.3333333335, ans=0.07
2023-10-09 14:46:22,024 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=12.0
2023-10-09 14:46:26,369 INFO [train.py:1031] (0/4) Epoch 14, batch 12700, loss[loss=0.1946, simple_loss=0.2423, pruned_loss=0.05459, ctc_loss=0.09445, over 16666.00 frames. ], tot_loss[loss=0.2203, simple_loss=0.2784, pruned_loss=0.05977, ctc_loss=0.1066, over 3307370.35 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:46:32,410 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2788016.0, ans=0.125
2023-10-09 14:46:43,638 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:46:43,652 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2788062.6666666665, ans=0.0
2023-10-09 14:46:44,547 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2788062.6666666665, ans=0.04949747468305833
2023-10-09 14:46:44,606 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2788062.6666666665, ans=0.125
2023-10-09 14:46:44,948 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0
2023-10-09 14:46:49,181 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2788062.6666666665, ans=0.0
2023-10-09 14:46:55,650 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2788109.3333333335, ans=0.125
2023-10-09 14:47:12,757 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2788156.0, ans=10.0
2023-10-09 14:47:15,779 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.438e+02 3.456e+02 3.994e+02 4.833e+02 1.526e+03, threshold=7.989e+02, percent-clipped=4.0
2023-10-09 14:47:27,068 INFO [train.py:1031] (0/4) Epoch 14, batch 12750, loss[loss=0.2194, simple_loss=0.2776, pruned_loss=0.06021, ctc_loss=0.1019, over 16681.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2796, pruned_loss=0.06249, ctc_loss=0.111, over 3304712.60 frames. ], batch size: 121, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:47:58,864 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0
2023-10-09 14:48:17,615 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2788436.0, ans=0.0
2023-10-09 14:48:29,383 INFO [train.py:1031] (0/4) Epoch 14, batch 12800, loss[loss=0.2906, simple_loss=0.3758, pruned_loss=0.07432, ctc_loss=0.1418, over 15081.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.288, pruned_loss=0.06484, ctc_loss=0.1154, over 3306429.59 frames. ], batch size: 526, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:48:32,398 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2788482.6666666665, ans=0.2
2023-10-09 14:48:32,459 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2788482.6666666665, ans=0.1
2023-10-09 14:49:05,739 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=15.0
2023-10-09 14:49:07,416 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2788622.6666666665, ans=0.1
2023-10-09 14:49:18,036 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.556e+02 3.551e+02 3.934e+02 4.932e+02 8.018e+02, threshold=7.868e+02, percent-clipped=1.0
2023-10-09 14:49:28,922 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2788669.3333333335, ans=0.0
2023-10-09 14:49:30,703 INFO [train.py:1031] (0/4) Epoch 14, batch 12850, loss[loss=0.308, simple_loss=0.3415, pruned_loss=0.1022, ctc_loss=0.1753, over 16701.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2932, pruned_loss=0.06607, ctc_loss=0.1175, over 3299139.69 frames. ], batch size: 384, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:49:54,177 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2788762.6666666665, ans=0.125
2023-10-09 14:50:04,421 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2788809.3333333335, ans=0.0
2023-10-09 14:50:32,944 INFO [train.py:1031] (0/4) Epoch 14, batch 12900, loss[loss=0.2605, simple_loss=0.3399, pruned_loss=0.06639, ctc_loss=0.1209, over 15265.00 frames. ], tot_loss[loss=0.2427, simple_loss=0.2989, pruned_loss=0.06889, ctc_loss=0.1219, over 3285494.96 frames. ], batch size: 526, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:50:39,198 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:50:54,239 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2788996.0, ans=0.0
2023-10-09 14:50:55,394 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2788996.0, ans=0.1
2023-10-09 14:51:01,390 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2789042.6666666665, ans=10.0
2023-10-09 14:51:02,429 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2789042.6666666665, ans=0.125
2023-10-09 14:51:07,160 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2789042.6666666665, ans=0.0
2023-10-09 14:51:26,856 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+02 3.447e+02 3.800e+02 4.409e+02 9.438e+02, threshold=7.600e+02, percent-clipped=3.0
2023-10-09 14:51:29,321 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2789136.0, ans=0.125
2023-10-09 14:51:35,868 INFO [train.py:1031] (0/4) Epoch 14, batch 12950, loss[loss=0.2089, simple_loss=0.2962, pruned_loss=0.04483, ctc_loss=0.08, over 16498.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2998, pruned_loss=0.06532, ctc_loss=0.1169, over 3292600.42 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:51:47,360 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:52:10,385 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2789276.0, ans=0.125
2023-10-09 14:52:26,802 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2789369.3333333335, ans=0.0
2023-10-09 14:52:34,506 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2789369.3333333335, ans=0.125
2023-10-09 14:52:36,322 INFO [train.py:1031] (0/4) Epoch 14, batch 13000, loss[loss=0.2172, simple_loss=0.2623, pruned_loss=0.06504, ctc_loss=0.1051, over 16802.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2902, pruned_loss=0.06168, ctc_loss=0.1101, over 3295518.61 frames. ], batch size: 95, lr: 2.57e-03, grad_scale: 2.0
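The scaling.py:1069 `WithLoss` entries report the running sum of an auxiliary penalty attached to an activation (here various self_attn_weights tensors); a value of 0.000e+00 means the tensor stayed inside its allowed range. A generic sketch of the attach-and-log pattern, with the penalty definition assumed rather than taken from scaling.py:

```python
# Generic attach-and-log pattern: compute a penalty from an activation,
# expose it for the training loss, and log its magnitude. The specific
# penalty (excess absolute value beyond a limit) is an assumption.
import torch

class WithPenalty(torch.nn.Module):
    def __init__(self, name, limit=10.0, scale=1e-4):
        super().__init__()
        self.name, self.limit, self.scale = name, limit, scale
        self.aux_loss = torch.tensor(0.0)

    def forward(self, x):
        penalty = (x.abs() - self.limit).clamp(min=0.0).sum()
        self.aux_loss = self.scale * penalty  # add this to the training loss
        print(f"WithLoss: name={self.name}, loss-sum={penalty.item():.3e}")
        return x

mod = WithPenalty("encoder.layers.0.self_attn_weights")
out = mod(torch.randn(4, 16))  # prints loss-sum=0.000e+00 for in-range input
```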
], batch size: 95, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:52:52,184 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2789462.6666666665, ans=0.0 2023-10-09 14:53:10,276 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2789509.3333333335, ans=10.0 2023-10-09 14:53:26,721 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2789602.6666666665, ans=0.125 2023-10-09 14:53:28,013 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.825e+02 3.282e+02 3.972e+02 1.143e+03, threshold=6.563e+02, percent-clipped=1.0 2023-10-09 14:53:36,424 INFO [train.py:1031] (0/4) Epoch 14, batch 13050, loss[loss=0.22, simple_loss=0.2654, pruned_loss=0.06379, ctc_loss=0.1177, over 16826.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2815, pruned_loss=0.06084, ctc_loss=0.1082, over 3297029.43 frames. ], batch size: 228, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:53:37,813 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2789649.3333333335, ans=0.125 2023-10-09 14:53:39,937 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=2789649.3333333335, ans=0.02 2023-10-09 14:53:43,722 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2789649.3333333335, ans=0.04949747468305833 2023-10-09 14:53:45,854 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2789649.3333333335, ans=0.1 2023-10-09 14:54:02,681 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-10-09 14:54:14,992 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2789789.3333333335, ans=0.0 2023-10-09 14:54:24,791 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2789836.0, ans=0.125 2023-10-09 14:54:37,341 INFO [train.py:1031] (0/4) Epoch 14, batch 13100, loss[loss=0.2921, simple_loss=0.3503, pruned_loss=0.08581, ctc_loss=0.1556, over 16935.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2809, pruned_loss=0.0632, ctc_loss=0.1118, over 3303591.05 frames. 
], batch size: 330, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:54:55,517 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2789929.3333333335, ans=0.125 2023-10-09 14:55:00,413 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2789929.3333333335, ans=0.95 2023-10-09 14:55:13,928 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2789976.0, ans=0.0 2023-10-09 14:55:20,530 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2790022.6666666665, ans=0.125 2023-10-09 14:55:21,504 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2790022.6666666665, ans=0.125 2023-10-09 14:55:23,797 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2023-10-09 14:55:26,434 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2790022.6666666665, ans=0.125 2023-10-09 14:55:29,340 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2790069.3333333335, ans=0.125 2023-10-09 14:55:32,623 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+02 3.251e+02 4.048e+02 5.157e+02 1.010e+03, threshold=8.097e+02, percent-clipped=11.0 2023-10-09 14:55:42,104 INFO [train.py:1031] (0/4) Epoch 14, batch 13150, loss[loss=0.2486, simple_loss=0.3217, pruned_loss=0.06353, ctc_loss=0.1208, over 16915.00 frames. ], tot_loss[loss=0.233, simple_loss=0.2906, pruned_loss=0.06466, ctc_loss=0.115, over 3294825.45 frames. ], batch size: 215, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:56:08,017 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.65 vs. limit=22.5 2023-10-09 14:56:08,735 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2790209.3333333335, ans=0.0 2023-10-09 14:56:20,353 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2790256.0, ans=0.1 2023-10-09 14:56:20,398 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2790256.0, ans=0.0 2023-10-09 14:56:26,472 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2790256.0, ans=0.125 2023-10-09 14:56:37,560 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2790302.6666666665, ans=0.0 2023-10-09 14:56:45,812 INFO [train.py:1031] (0/4) Epoch 14, batch 13200, loss[loss=0.2534, simple_loss=0.2932, pruned_loss=0.07822, ctc_loss=0.143, over 15249.00 frames. ], tot_loss[loss=0.2421, simple_loss=0.2982, pruned_loss=0.06855, ctc_loss=0.122, over 3293645.04 frames. 
], batch size: 527, lr: 2.57e-03, grad_scale: 8.0 2023-10-09 14:56:54,917 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2790349.3333333335, ans=0.2 2023-10-09 14:57:01,170 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2790396.0, ans=0.125 2023-10-09 14:57:01,597 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.37 vs. limit=10.0 2023-10-09 14:57:08,859 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2790396.0, ans=0.125 2023-10-09 14:57:20,950 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2790442.6666666665, ans=0.1 2023-10-09 14:57:29,542 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2790489.3333333335, ans=0.125 2023-10-09 14:57:38,932 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:57:41,600 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+02 3.287e+02 3.760e+02 4.559e+02 7.411e+02, threshold=7.519e+02, percent-clipped=0.0 2023-10-09 14:57:48,113 INFO [train.py:1031] (0/4) Epoch 14, batch 13250, loss[loss=0.2201, simple_loss=0.2757, pruned_loss=0.06023, ctc_loss=0.1098, over 16882.00 frames. ], tot_loss[loss=0.2401, simple_loss=0.2987, pruned_loss=0.06694, ctc_loss=0.1194, over 3291974.12 frames. ], batch size: 216, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:58:40,626 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=2790769.3333333335, ans=0.1 2023-10-09 14:58:47,307 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2790769.3333333335, ans=0.125 2023-10-09 14:58:49,205 INFO [train.py:1031] (0/4) Epoch 14, batch 13300, loss[loss=0.2167, simple_loss=0.2662, pruned_loss=0.06197, ctc_loss=0.1082, over 16689.00 frames. ], tot_loss[loss=0.2352, simple_loss=0.2913, pruned_loss=0.06609, ctc_loss=0.1173, over 3294848.93 frames. ], batch size: 272, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:59:04,016 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.37 vs. 
limit=15.0 2023-10-09 14:59:11,714 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2790862.6666666665, ans=0.2 2023-10-09 14:59:16,608 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2790909.3333333335, ans=0.125 2023-10-09 14:59:25,579 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2790909.3333333335, ans=0.0 2023-10-09 14:59:40,052 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2791002.6666666665, ans=0.2 2023-10-09 14:59:47,826 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.520e+02 3.361e+02 3.791e+02 4.833e+02 1.183e+03, threshold=7.583e+02, percent-clipped=5.0 2023-10-09 14:59:52,774 INFO [train.py:1031] (0/4) Epoch 14, batch 13350, loss[loss=0.1783, simple_loss=0.2264, pruned_loss=0.04825, ctc_loss=0.08401, over 16680.00 frames. ], tot_loss[loss=0.2321, simple_loss=0.29, pruned_loss=0.06431, ctc_loss=0.1142, over 3288046.88 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:00:05,578 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2791096.0, ans=0.2 2023-10-09 15:00:06,456 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2791096.0, ans=0.0 2023-10-09 15:00:10,179 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2023-10-09 15:00:11,454 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2791096.0, ans=0.125 2023-10-09 15:00:17,308 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2791142.6666666665, ans=0.0 2023-10-09 15:00:31,104 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2791189.3333333335, ans=0.0 2023-10-09 15:00:48,506 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2791236.0, ans=0.125 2023-10-09 15:00:55,836 INFO [train.py:1031] (0/4) Epoch 14, batch 13400, loss[loss=0.2721, simple_loss=0.3127, pruned_loss=0.08713, ctc_loss=0.1432, over 16789.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2903, pruned_loss=0.06479, ctc_loss=0.1136, over 3285964.01 frames. 
], batch size: 309, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:01:00,734 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2791282.6666666665, ans=0.125 2023-10-09 15:01:10,449 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2791329.3333333335, ans=0.125 2023-10-09 15:01:15,793 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2791329.3333333335, ans=0.125 2023-10-09 15:01:33,092 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2791422.6666666665, ans=0.1 2023-10-09 15:01:40,151 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-10-09 15:01:41,206 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2023-10-09 15:01:55,332 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+02 3.467e+02 4.135e+02 5.221e+02 9.023e+02, threshold=8.270e+02, percent-clipped=2.0 2023-10-09 15:01:55,626 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2791469.3333333335, ans=0.125 2023-10-09 15:01:55,700 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2791469.3333333335, ans=0.0 2023-10-09 15:01:57,438 INFO [train.py:1031] (0/4) Epoch 14, batch 13450, loss[loss=0.1971, simple_loss=0.2481, pruned_loss=0.05426, ctc_loss=0.09393, over 16787.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2854, pruned_loss=0.064, ctc_loss=0.1117, over 3291422.67 frames. ], batch size: 215, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:02:04,400 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2791516.0, ans=0.5 2023-10-09 15:02:43,834 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2791656.0, ans=0.125 2023-10-09 15:02:59,443 INFO [train.py:1031] (0/4) Epoch 14, batch 13500, loss[loss=0.1748, simple_loss=0.2511, pruned_loss=0.03583, ctc_loss=0.06727, over 16856.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2806, pruned_loss=0.06151, ctc_loss=0.1075, over 3292230.76 frames. 
], batch size: 189, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:02:59,763 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2791749.3333333335, ans=0.0 2023-10-09 15:03:14,880 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2791796.0, ans=0.0 2023-10-09 15:03:39,460 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2791889.3333333335, ans=0.2 2023-10-09 15:03:56,281 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2791936.0, ans=0.125 2023-10-09 15:04:00,114 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2791936.0, ans=0.0 2023-10-09 15:04:01,768 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.985e+02 3.501e+02 4.657e+02 8.617e+02, threshold=7.002e+02, percent-clipped=1.0 2023-10-09 15:04:01,794 INFO [train.py:1031] (0/4) Epoch 14, batch 13550, loss[loss=0.2721, simple_loss=0.3159, pruned_loss=0.08471, ctc_loss=0.1472, over 16914.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2819, pruned_loss=0.06138, ctc_loss=0.1077, over 3295309.34 frames. ], batch size: 292, lr: 2.57e-03, grad_scale: 0.5 2023-10-09 15:04:08,182 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2791982.6666666665, ans=0.125 2023-10-09 15:04:49,639 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2792122.6666666665, ans=0.1 2023-10-09 15:05:05,326 INFO [train.py:1031] (0/4) Epoch 14, batch 13600, loss[loss=0.2323, simple_loss=0.3086, pruned_loss=0.05611, ctc_loss=0.1097, over 16818.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.288, pruned_loss=0.06426, ctc_loss=0.1124, over 3300914.64 frames. ], batch size: 242, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:05:08,294 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2792216.0, ans=0.125 2023-10-09 15:05:11,478 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2023-10-09 15:05:22,292 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=12.0 2023-10-09 15:05:24,631 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2792262.6666666665, ans=0.0 2023-10-09 15:05:41,423 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2792309.3333333335, ans=0.125 2023-10-09 15:05:49,447 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.98 vs. 
limit=15.0 2023-10-09 15:05:52,149 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2792356.0, ans=0.0 2023-10-09 15:05:52,215 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2792356.0, ans=0.125 2023-10-09 15:05:54,943 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2792402.6666666665, ans=0.1 2023-10-09 15:06:08,500 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 3.317e+02 4.234e+02 5.653e+02 1.556e+03, threshold=8.468e+02, percent-clipped=11.0 2023-10-09 15:06:08,528 INFO [train.py:1031] (0/4) Epoch 14, batch 13650, loss[loss=0.2317, simple_loss=0.3081, pruned_loss=0.05699, ctc_loss=0.1031, over 16793.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2902, pruned_loss=0.06156, ctc_loss=0.1088, over 3306526.25 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:06:12,697 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:06:14,321 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2792449.3333333335, ans=0.125 2023-10-09 15:06:19,180 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2792449.3333333335, ans=0.1 2023-10-09 15:06:37,828 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0 2023-10-09 15:07:01,539 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2792636.0, ans=0.0 2023-10-09 15:07:02,652 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2792636.0, ans=0.125 2023-10-09 15:07:11,263 INFO [train.py:1031] (0/4) Epoch 14, batch 13700, loss[loss=0.1955, simple_loss=0.2769, pruned_loss=0.04253, ctc_loss=0.07246, over 16803.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2946, pruned_loss=0.06161, ctc_loss=0.1091, over 3306680.30 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:07:12,689 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2792682.6666666665, ans=0.125 2023-10-09 15:07:26,906 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2792729.3333333335, ans=0.125 2023-10-09 15:07:29,266 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.86 vs. 
limit=15.0 2023-10-09 15:07:41,185 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2792776.0, ans=0.2 2023-10-09 15:07:53,348 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2792822.6666666665, ans=0.1 2023-10-09 15:08:00,110 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2792822.6666666665, ans=0.2 2023-10-09 15:08:15,042 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 3.067e+02 3.765e+02 4.506e+02 1.005e+03, threshold=7.530e+02, percent-clipped=2.0 2023-10-09 15:08:15,069 INFO [train.py:1031] (0/4) Epoch 14, batch 13750, loss[loss=0.2615, simple_loss=0.3164, pruned_loss=0.07708, ctc_loss=0.1308, over 16743.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.295, pruned_loss=0.06011, ctc_loss=0.1073, over 3308639.64 frames. ], batch size: 95, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:08:36,171 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2792962.6666666665, ans=0.0 2023-10-09 15:08:56,030 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2793056.0, ans=0.0 2023-10-09 15:09:03,898 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2793056.0, ans=0.125 2023-10-09 15:09:12,186 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2793102.6666666665, ans=0.125 2023-10-09 15:09:17,801 INFO [train.py:1031] (0/4) Epoch 14, batch 13800, loss[loss=0.2432, simple_loss=0.2999, pruned_loss=0.06967, ctc_loss=0.1182, over 16898.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2984, pruned_loss=0.06368, ctc_loss=0.113, over 3309008.92 frames. ], batch size: 258, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:09:43,776 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2793242.6666666665, ans=0.125 2023-10-09 15:09:47,614 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2793242.6666666665, ans=0.125 2023-10-09 15:09:53,666 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2023-10-09 15:09:57,836 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2793289.3333333335, ans=0.125 2023-10-09 15:10:21,760 INFO [train.py:1031] (0/4) Epoch 14, batch 13850, loss[loss=0.2198, simple_loss=0.2323, pruned_loss=0.07397, ctc_loss=0.1484, over 15404.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2919, pruned_loss=0.06407, ctc_loss=0.1133, over 3304371.48 frames. 
], batch size: 529, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:10:22,842 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.613e+02 3.299e+02 3.702e+02 4.236e+02 7.153e+02, threshold=7.404e+02, percent-clipped=0.0 2023-10-09 15:10:29,883 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2793382.6666666665, ans=0.125 2023-10-09 15:10:31,067 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-10-09 15:10:35,909 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2793429.3333333335, ans=0.0 2023-10-09 15:10:52,985 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2793476.0, ans=0.1 2023-10-09 15:10:56,133 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2793476.0, ans=0.1 2023-10-09 15:11:01,100 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2793522.6666666665, ans=10.0 2023-10-09 15:11:05,208 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2793522.6666666665, ans=0.0 2023-10-09 15:11:25,341 INFO [train.py:1031] (0/4) Epoch 14, batch 13900, loss[loss=0.272, simple_loss=0.3043, pruned_loss=0.08682, ctc_loss=0.1652, over 16390.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2872, pruned_loss=0.06377, ctc_loss=0.113, over 3312106.31 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:11:25,815 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2793616.0, ans=0.0 2023-10-09 15:11:43,666 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2793662.6666666665, ans=0.125 2023-10-09 15:12:28,098 INFO [train.py:1031] (0/4) Epoch 14, batch 13950, loss[loss=0.2672, simple_loss=0.3334, pruned_loss=0.07499, ctc_loss=0.1277, over 16869.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2941, pruned_loss=0.06417, ctc_loss=0.1139, over 3305557.87 frames. ], batch size: 309, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:12:30,206 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.564e+02 3.293e+02 3.736e+02 4.752e+02 8.901e+02, threshold=7.472e+02, percent-clipped=3.0 2023-10-09 15:13:22,221 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2794036.0, ans=0.5 2023-10-09 15:13:28,908 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2794036.0, ans=0.0 2023-10-09 15:13:28,949 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2794036.0, ans=0.0 2023-10-09 15:13:31,758 INFO [train.py:1031] (0/4) Epoch 14, batch 14000, loss[loss=0.2385, simple_loss=0.2942, pruned_loss=0.06817, ctc_loss=0.1161, over 16822.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2977, pruned_loss=0.06642, ctc_loss=0.1173, over 3294008.61 frames. 
], batch size: 188, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:13:31,978 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2794082.6666666665, ans=0.125 2023-10-09 15:13:34,409 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2794082.6666666665, ans=0.125 2023-10-09 15:13:42,189 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2794082.6666666665, ans=0.125 2023-10-09 15:13:49,833 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:13:58,300 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2794176.0, ans=0.0 2023-10-09 15:14:06,212 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.59 vs. limit=15.0 2023-10-09 15:14:34,477 INFO [train.py:1031] (0/4) Epoch 14, batch 14050, loss[loss=0.2409, simple_loss=0.2667, pruned_loss=0.07887, ctc_loss=0.1435, over 16379.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.296, pruned_loss=0.0654, ctc_loss=0.1156, over 3296792.45 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:14:38,946 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+02 3.126e+02 3.568e+02 4.195e+02 6.339e+02, threshold=7.137e+02, percent-clipped=0.0 2023-10-09 15:14:43,525 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2794316.0, ans=0.125 2023-10-09 15:15:00,870 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2794409.3333333335, ans=0.0 2023-10-09 15:15:12,882 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2794456.0, ans=0.04949747468305833 2023-10-09 15:15:37,073 INFO [train.py:1031] (0/4) Epoch 14, batch 14100, loss[loss=0.2129, simple_loss=0.2565, pruned_loss=0.06326, ctc_loss=0.1073, over 16647.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2868, pruned_loss=0.06398, ctc_loss=0.1134, over 3302403.82 frames. ], batch size: 140, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:16:00,055 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2794642.6666666665, ans=0.125 2023-10-09 15:16:06,433 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2794642.6666666665, ans=0.125 2023-10-09 15:16:08,745 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2794642.6666666665, ans=0.1 2023-10-09 15:16:16,393 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.29 vs. 
limit=15.0 2023-10-09 15:16:18,996 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2794689.3333333335, ans=0.0 2023-10-09 15:16:36,220 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2794736.0, ans=0.0 2023-10-09 15:16:37,980 INFO [train.py:1031] (0/4) Epoch 14, batch 14150, loss[loss=0.2126, simple_loss=0.2519, pruned_loss=0.06526, ctc_loss=0.107, over 16681.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2793, pruned_loss=0.06333, ctc_loss=0.112, over 3302559.50 frames. ], batch size: 140, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:16:44,073 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.061e+02 3.515e+02 4.416e+02 9.283e+02, threshold=7.030e+02, percent-clipped=2.0 2023-10-09 15:16:54,487 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2794829.3333333335, ans=0.1 2023-10-09 15:17:01,790 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2794829.3333333335, ans=0.04949747468305833 2023-10-09 15:17:05,213 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=15.0 2023-10-09 15:17:16,153 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2794922.6666666665, ans=0.025 2023-10-09 15:17:28,412 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2794969.3333333335, ans=0.2 2023-10-09 15:17:29,380 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2794969.3333333335, ans=0.1 2023-10-09 15:17:31,564 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2794969.3333333335, ans=0.07 2023-10-09 15:17:34,083 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=12.0 2023-10-09 15:17:39,322 INFO [train.py:1031] (0/4) Epoch 14, batch 14200, loss[loss=0.1826, simple_loss=0.2535, pruned_loss=0.0412, ctc_loss=0.07287, over 16806.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2747, pruned_loss=0.06153, ctc_loss=0.1089, over 3302815.32 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:17:50,527 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2795016.0, ans=0.1 2023-10-09 15:17:59,808 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2795062.6666666665, ans=0.09899494936611666 2023-10-09 15:18:07,952 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:18:08,932 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2795109.3333333335, ans=0.125 2023-10-09 15:18:25,098 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.84 vs. 
limit=12.0 2023-10-09 15:18:29,787 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2795202.6666666665, ans=0.0 2023-10-09 15:18:29,813 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2795202.6666666665, ans=0.125 2023-10-09 15:18:43,118 INFO [train.py:1031] (0/4) Epoch 14, batch 14250, loss[loss=0.2853, simple_loss=0.3237, pruned_loss=0.09115, ctc_loss=0.1617, over 16856.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2789, pruned_loss=0.06332, ctc_loss=0.1117, over 3299847.88 frames. ], batch size: 309, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:18:49,205 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 2.880e+02 3.480e+02 3.927e+02 7.059e+02, threshold=6.960e+02, percent-clipped=1.0 2023-10-09 15:18:55,581 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2795296.0, ans=0.125 2023-10-09 15:19:02,064 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2795296.0, ans=0.0 2023-10-09 15:19:06,421 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2795296.0, ans=10.0 2023-10-09 15:19:33,732 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2795436.0, ans=0.125 2023-10-09 15:19:41,202 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2795436.0, ans=0.125 2023-10-09 15:19:44,954 INFO [train.py:1031] (0/4) Epoch 14, batch 14300, loss[loss=0.2073, simple_loss=0.2695, pruned_loss=0.0534, ctc_loss=0.09577, over 17019.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2833, pruned_loss=0.06466, ctc_loss=0.1146, over 3304371.78 frames. ], batch size: 216, lr: 2.57e-03, grad_scale: 8.0 2023-10-09 15:20:05,593 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2795529.3333333335, ans=0.95 2023-10-09 15:20:10,142 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.28 vs. limit=12.0 2023-10-09 15:20:43,826 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2795669.3333333335, ans=0.0 2023-10-09 15:20:47,326 INFO [train.py:1031] (0/4) Epoch 14, batch 14350, loss[loss=0.205, simple_loss=0.2624, pruned_loss=0.05482, ctc_loss=0.09518, over 16841.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2828, pruned_loss=0.06559, ctc_loss=0.1155, over 3308095.55 frames. ], batch size: 95, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:20:53,861 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.121e+02 3.540e+02 4.017e+02 5.602e+02, threshold=7.080e+02, percent-clipped=0.0 2023-10-09 15:21:01,833 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2795762.6666666665, ans=0.125 2023-10-09 15:21:50,346 INFO [train.py:1031] (0/4) Epoch 14, batch 14400, loss[loss=0.2498, simple_loss=0.3018, pruned_loss=0.07287, ctc_loss=0.1301, over 16770.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2827, pruned_loss=0.06514, ctc_loss=0.1153, over 3314471.80 frames. 
], batch size: 328, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:21:59,503 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2795949.3333333335, ans=0.0 2023-10-09 15:22:05,746 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2023-10-09 15:22:21,033 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2796042.6666666665, ans=0.0 2023-10-09 15:22:34,654 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2796089.3333333335, ans=0.0 2023-10-09 15:22:37,218 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2796089.3333333335, ans=0.125 2023-10-09 15:22:41,116 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2796136.0, ans=0.125 2023-10-09 15:22:49,884 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2796136.0, ans=0.125 2023-10-09 15:22:53,855 INFO [train.py:1031] (0/4) Epoch 14, batch 14450, loss[loss=0.2463, simple_loss=0.3194, pruned_loss=0.06306, ctc_loss=0.1176, over 16203.00 frames. ], tot_loss[loss=0.234, simple_loss=0.287, pruned_loss=0.0669, ctc_loss=0.1181, over 3309763.20 frames. ], batch size: 463, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:23:00,762 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+02 3.304e+02 3.703e+02 4.462e+02 6.927e+02, threshold=7.405e+02, percent-clipped=0.0 2023-10-09 15:23:27,484 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2796276.0, ans=0.125 2023-10-09 15:23:31,017 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2796322.6666666665, ans=0.1 2023-10-09 15:23:33,158 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2796322.6666666665, ans=0.125 2023-10-09 15:23:37,005 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2796322.6666666665, ans=0.125 2023-10-09 15:23:46,585 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2796369.3333333335, ans=0.125 2023-10-09 15:23:54,681 INFO [train.py:1031] (0/4) Epoch 14, batch 14500, loss[loss=0.2016, simple_loss=0.2571, pruned_loss=0.05404, ctc_loss=0.09511, over 16863.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2871, pruned_loss=0.06559, ctc_loss=0.1156, over 3316152.64 frames. ], batch size: 215, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:24:04,344 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0 2023-10-09 15:24:12,205 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. 
limit=6.0 2023-10-09 15:24:28,076 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2796509.3333333335, ans=0.07 2023-10-09 15:24:48,075 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2796602.6666666665, ans=0.1 2023-10-09 15:24:56,620 INFO [train.py:1031] (0/4) Epoch 14, batch 14550, loss[loss=0.1901, simple_loss=0.2415, pruned_loss=0.05132, ctc_loss=0.0902, over 16745.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2795, pruned_loss=0.0638, ctc_loss=0.1122, over 3316164.24 frames. ], batch size: 176, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:25:05,844 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.462e+02 3.180e+02 3.818e+02 4.474e+02 1.185e+03, threshold=7.637e+02, percent-clipped=2.0 2023-10-09 15:25:06,171 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2796649.3333333335, ans=0.125 2023-10-09 15:25:30,828 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2796742.6666666665, ans=0.125 2023-10-09 15:25:37,182 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2796789.3333333335, ans=0.1 2023-10-09 15:25:56,523 INFO [train.py:1031] (0/4) Epoch 14, batch 14600, loss[loss=0.2202, simple_loss=0.2811, pruned_loss=0.05768, ctc_loss=0.1098, over 16805.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2803, pruned_loss=0.06408, ctc_loss=0.1123, over 3310744.58 frames. ], batch size: 272, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:25:58,219 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=22.5 2023-10-09 15:26:05,414 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2796882.6666666665, ans=0.125 2023-10-09 15:26:10,307 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=22.5 2023-10-09 15:26:56,318 INFO [train.py:1031] (0/4) Epoch 14, batch 14650, loss[loss=0.2312, simple_loss=0.2781, pruned_loss=0.06803, ctc_loss=0.1206, over 16936.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2812, pruned_loss=0.06483, ctc_loss=0.1133, over 3309061.67 frames. ], batch size: 309, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:27:05,750 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+02 3.040e+02 3.470e+02 3.930e+02 6.552e+02, threshold=6.941e+02, percent-clipped=0.0 2023-10-09 15:27:06,117 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2797116.0, ans=0.2 2023-10-09 15:27:14,474 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.34 vs. 
limit=15.0 2023-10-09 15:27:19,218 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2797162.6666666665, ans=0.0 2023-10-09 15:27:21,345 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2797209.3333333335, ans=0.125 2023-10-09 15:27:35,436 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2797256.0, ans=0.0 2023-10-09 15:27:38,490 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2797256.0, ans=0.125 2023-10-09 15:27:57,842 INFO [train.py:1031] (0/4) Epoch 14, batch 14700, loss[loss=0.2672, simple_loss=0.278, pruned_loss=0.09441, ctc_loss=0.1692, over 16599.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2778, pruned_loss=0.06417, ctc_loss=0.1125, over 3309336.37 frames. ], batch size: 384, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:28:00,350 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2797349.3333333335, ans=0.95 2023-10-09 15:28:09,097 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2797396.0, ans=0.125 2023-10-09 15:28:39,440 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2797489.3333333335, ans=0.09899494936611666 2023-10-09 15:28:59,990 INFO [train.py:1031] (0/4) Epoch 14, batch 14750, loss[loss=0.202, simple_loss=0.2528, pruned_loss=0.05452, ctc_loss=0.1054, over 16766.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.272, pruned_loss=0.06377, ctc_loss=0.112, over 3305260.35 frames. ], batch size: 272, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:29:11,214 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2797629.3333333335, ans=0.1 2023-10-09 15:29:11,861 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+02 3.078e+02 3.394e+02 3.991e+02 6.777e+02, threshold=6.787e+02, percent-clipped=0.0 2023-10-09 15:29:24,853 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2797676.0, ans=0.125 2023-10-09 15:29:37,419 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2797722.6666666665, ans=0.125 2023-10-09 15:29:59,551 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2797769.3333333335, ans=0.035 2023-10-09 15:30:01,444 INFO [train.py:1031] (0/4) Epoch 14, batch 14800, loss[loss=0.2673, simple_loss=0.3042, pruned_loss=0.08672, ctc_loss=0.1424, over 16775.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2763, pruned_loss=0.06547, ctc_loss=0.1147, over 3311015.76 frames. 
], batch size: 121, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:30:01,797 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2797816.0, ans=0.07 2023-10-09 15:30:09,557 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2797816.0, ans=0.0 2023-10-09 15:30:14,328 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.26 vs. limit=10.0 2023-10-09 15:30:25,725 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2797909.3333333335, ans=0.125 2023-10-09 15:30:31,212 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2797909.3333333335, ans=0.04949747468305833 2023-10-09 15:30:38,763 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2797956.0, ans=0.0 2023-10-09 15:30:50,165 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2797956.0, ans=0.2 2023-10-09 15:31:05,101 INFO [train.py:1031] (0/4) Epoch 14, batch 14850, loss[loss=0.1832, simple_loss=0.241, pruned_loss=0.04643, ctc_loss=0.08146, over 16664.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2781, pruned_loss=0.06693, ctc_loss=0.1168, over 3293772.90 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:31:16,853 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.615e+02 3.104e+02 3.584e+02 4.093e+02 5.889e+02, threshold=7.167e+02, percent-clipped=0.0 2023-10-09 15:31:30,770 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2798142.6666666665, ans=0.125 2023-10-09 15:31:40,045 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2023-10-09 15:32:08,276 INFO [train.py:1031] (0/4) Epoch 14, batch 14900, loss[loss=0.182, simple_loss=0.2319, pruned_loss=0.04983, ctc_loss=0.08109, over 16114.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2743, pruned_loss=0.06595, ctc_loss=0.1153, over 3299726.64 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:32:08,662 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2798282.6666666665, ans=0.025 2023-10-09 15:32:15,990 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2798282.6666666665, ans=0.125 2023-10-09 15:32:23,213 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2798329.3333333335, ans=0.2 2023-10-09 15:32:55,762 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2798422.6666666665, ans=0.05 2023-10-09 15:33:11,269 INFO [train.py:1031] (0/4) Epoch 14, batch 14950, loss[loss=0.2811, simple_loss=0.3299, pruned_loss=0.08816, ctc_loss=0.1401, over 13216.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2739, pruned_loss=0.06513, ctc_loss=0.1139, over 3296251.76 frames. 
], batch size: 38, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:33:22,002 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2798516.0, ans=0.025 2023-10-09 15:33:25,247 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2798562.6666666665, ans=0.125 2023-10-09 15:33:25,938 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+02 3.074e+02 3.344e+02 3.882e+02 5.335e+02, threshold=6.688e+02, percent-clipped=0.0 2023-10-09 15:33:46,570 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2798609.3333333335, ans=0.0 2023-10-09 15:33:50,941 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2798656.0, ans=0.125 2023-10-09 15:34:00,096 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=15.0 2023-10-09 15:34:13,087 INFO [train.py:1031] (0/4) Epoch 14, batch 15000, loss[loss=0.2051, simple_loss=0.277, pruned_loss=0.0492, ctc_loss=0.08708, over 16826.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2765, pruned_loss=0.06493, ctc_loss=0.1135, over 3300659.83 frames. ], batch size: 242, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:34:13,088 INFO [train.py:1054] (0/4) Computing validation loss 2023-10-09 15:34:29,427 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2384, simple_loss=0.3088, pruned_loss=0.06452, ctc_loss=0.09761, over 1796401.00 frames. 2023-10-09 15:34:29,428 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14587MB 2023-10-09 15:34:40,063 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2798749.3333333335, ans=0.04949747468305833 2023-10-09 15:34:45,866 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2798796.0, ans=0.125 2023-10-09 15:34:47,051 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2798796.0, ans=0.125 2023-10-09 15:35:17,529 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2798889.3333333335, ans=0.125 2023-10-09 15:35:22,649 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.75 vs. limit=10.0 2023-10-09 15:35:32,334 INFO [train.py:1031] (0/4) Epoch 14, batch 15050, loss[loss=0.1997, simple_loss=0.2666, pruned_loss=0.04965, ctc_loss=0.08381, over 16778.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.2752, pruned_loss=0.06288, ctc_loss=0.11, over 3302365.16 frames. 
], batch size: 188, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:35:38,449 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2798982.6666666665, ans=0.125 2023-10-09 15:35:42,805 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2798982.6666666665, ans=0.0 2023-10-09 15:35:49,234 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.412e+02 3.126e+02 3.487e+02 4.278e+02 6.504e+02, threshold=6.973e+02, percent-clipped=0.0 2023-10-09 15:36:05,204 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=22.5 2023-10-09 15:36:16,359 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2799122.6666666665, ans=0.125 2023-10-09 15:36:17,844 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=22.5 2023-10-09 15:36:21,779 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2799169.3333333335, ans=0.0 2023-10-09 15:36:27,706 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:36:27,799 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=2799169.3333333335, ans=0.1 2023-10-09 15:36:35,024 INFO [train.py:1031] (0/4) Epoch 14, batch 15100, loss[loss=0.2322, simple_loss=0.2899, pruned_loss=0.06537, ctc_loss=0.1097, over 16933.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2779, pruned_loss=0.06357, ctc_loss=0.1103, over 3303422.26 frames. ], batch size: 215, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:36:52,374 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2799262.6666666665, ans=0.0 2023-10-09 15:37:15,485 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=22.5 2023-10-09 15:37:35,265 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2799402.6666666665, ans=0.1 2023-10-09 15:37:36,280 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2799449.3333333335, ans=0.125 2023-10-09 15:37:37,650 INFO [train.py:1031] (0/4) Epoch 14, batch 15150, loss[loss=0.2394, simple_loss=0.286, pruned_loss=0.07096, ctc_loss=0.1273, over 16401.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.284, pruned_loss=0.06552, ctc_loss=0.1138, over 3309308.84 frames. 
], batch size: 416, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:37:43,324 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2799449.3333333335, ans=0.2 2023-10-09 15:37:47,967 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2799496.0, ans=0.0 2023-10-09 15:37:55,143 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.319e+02 4.410e+02 5.242e+02 1.151e+03, threshold=8.819e+02, percent-clipped=3.0 2023-10-09 15:37:57,709 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2799496.0, ans=0.125 2023-10-09 15:38:18,295 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.52 vs. limit=15.0 2023-10-09 15:38:33,287 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2799636.0, ans=0.0 2023-10-09 15:38:34,288 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2799636.0, ans=0.95 2023-10-09 15:38:34,341 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2799636.0, ans=0.1 2023-10-09 15:38:38,457 INFO [train.py:1031] (0/4) Epoch 14, batch 15200, loss[loss=0.1979, simple_loss=0.2625, pruned_loss=0.04893, ctc_loss=0.08886, over 16793.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2831, pruned_loss=0.06336, ctc_loss=0.1102, over 3314277.64 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:38:46,171 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2799682.6666666665, ans=0.125 2023-10-09 15:39:04,772 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2023-10-09 15:39:07,979 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=22.5 2023-10-09 15:39:24,426 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.27 vs. limit=15.0 2023-10-09 15:39:29,332 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2023-10-09 15:39:36,126 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:39:38,263 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2799869.3333333335, ans=0.1 2023-10-09 15:39:40,021 INFO [train.py:1031] (0/4) Epoch 14, batch 15250, loss[loss=0.1686, simple_loss=0.2542, pruned_loss=0.03081, ctc_loss=0.05331, over 16786.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2804, pruned_loss=0.06013, ctc_loss=0.1048, over 3314174.24 frames. 
2023-10-09 15:39:40,021 INFO [train.py:1031] (0/4) Epoch 14, batch 15250, loss[loss=0.1686, simple_loss=0.2542, pruned_loss=0.03081, ctc_loss=0.05331, over 16786.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2804, pruned_loss=0.06013, ctc_loss=0.1048, over 3314174.24 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:39:58,267 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.655e+02 2.982e+02 3.898e+02 5.868e+02, threshold=5.964e+02, percent-clipped=0.0
2023-10-09 15:40:02,175 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-w-ctc/checkpoint-600000.pt
2023-10-09 15:40:07,016 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2800009.3333333335, ans=0.0
2023-10-09 15:40:09,129 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2800009.3333333335, ans=0.125
2023-10-09 15:40:32,164 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2800102.6666666665, ans=0.2
2023-10-09 15:40:44,709 INFO [train.py:1031] (0/4) Epoch 14, batch 15300, loss[loss=0.1897, simple_loss=0.2586, pruned_loss=0.04487, ctc_loss=0.07745, over 16850.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2743, pruned_loss=0.05556, ctc_loss=0.09716, over 3316334.98 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:40:58,838 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2800196.0, ans=0.0
2023-10-09 15:41:07,879 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2800196.0, ans=0.125
2023-10-09 15:41:25,897 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2800289.3333333335, ans=0.0
2023-10-09 15:41:36,196 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2800336.0, ans=0.0
2023-10-09 15:41:42,897 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2800336.0, ans=10.0
2023-10-09 15:41:48,937 INFO [train.py:1031] (0/4) Epoch 14, batch 15350, loss[loss=0.2216, simple_loss=0.2891, pruned_loss=0.05736, ctc_loss=0.09855, over 16864.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2789, pruned_loss=0.05851, ctc_loss=0.1023, over 3311277.34 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:41:58,262 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2800382.6666666665, ans=0.1
2023-10-09 15:41:58,275 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2800382.6666666665, ans=0.125
2023-10-09 15:42:05,617 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2800429.3333333335, ans=0.0
2023-10-09 15:42:09,088 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.939e+02 3.401e+02 4.199e+02 7.970e+02, threshold=6.801e+02, percent-clipped=2.0
2023-10-09 15:42:22,053 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2800476.0, ans=0.125
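Besides the per-epoch checkpoints, the run writes batch-indexed checkpoints, here checkpoint-600000.pt under zipformer/exp-w-ctc, at a fixed batch interval. A minimal sketch of that pattern; the helper name, the saved fields, and the 4000-batch interval are illustrative assumptions rather than the script's actual code:

    from pathlib import Path
    import torch

    def maybe_save_batch_checkpoint(model, optimizer, batch_idx_train,
                                    exp_dir=Path("zipformer/exp-w-ctc"),
                                    save_every_n=4000):
        # Write checkpoint-<N>.pt every save_every_n training batches.
        if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
            return
        filename = exp_dir / f"checkpoint-{batch_idx_train}.pt"
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx_train,
            },
            filename,
        )
        print(f"Saving checkpoint to {filename}")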
2023-10-09 15:42:53,682 INFO [train.py:1031] (0/4) Epoch 14, batch 15400, loss[loss=0.1985, simple_loss=0.2563, pruned_loss=0.05247, ctc_loss=0.08944, over 16774.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2854, pruned_loss=0.05963, ctc_loss=0.1045, over 3310985.48 frames. ], batch size: 121, lr: 2.56e-03, grad_scale: 8.0
2023-10-09 15:43:10,959 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2800662.6666666665, ans=0.125
2023-10-09 15:43:14,278 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2800662.6666666665, ans=0.125
2023-10-09 15:43:23,428 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.26 vs. limit=10.0
2023-10-09 15:43:30,138 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2800709.3333333335, ans=0.0
2023-10-09 15:43:56,846 INFO [train.py:1031] (0/4) Epoch 14, batch 15450, loss[loss=0.1802, simple_loss=0.2331, pruned_loss=0.04812, ctc_loss=0.07752, over 16821.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2821, pruned_loss=0.05922, ctc_loss=0.1028, over 3305391.54 frames. ], batch size: 164, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:43:58,404 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2800849.3333333335, ans=0.125
2023-10-09 15:44:17,382 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+02 3.271e+02 3.987e+02 5.026e+02 8.046e+02, threshold=7.973e+02, percent-clipped=4.0
2023-10-09 15:44:19,078 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:44:38,793 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2800989.3333333335, ans=0.0
2023-10-09 15:44:42,255 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2800989.3333333335, ans=0.2
2023-10-09 15:44:44,028 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2800989.3333333335, ans=0.0
2023-10-09 15:44:47,338 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2801036.0, ans=0.1
2023-10-09 15:45:00,348 INFO [train.py:1031] (0/4) Epoch 14, batch 15500, loss[loss=0.198, simple_loss=0.2705, pruned_loss=0.04752, ctc_loss=0.0759, over 16887.00 frames. ], tot_loss[loss=0.2161, simple_loss=0.2755, pruned_loss=0.05832, ctc_loss=0.1003, over 3302932.26 frames. ], batch size: 292, lr: 2.56e-03, grad_scale: 8.0
2023-10-09 15:45:04,894 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2801082.6666666665, ans=0.1
2023-10-09 15:45:37,602 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2801222.6666666665, ans=0.125
2023-10-09 15:45:41,453 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2801222.6666666665, ans=0.0
2023-10-09 15:45:45,725 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2801222.6666666665, ans=0.0
2023-10-09 15:45:59,988 INFO [train.py:1031] (0/4) Epoch 14, batch 15550, loss[loss=0.2436, simple_loss=0.2859, pruned_loss=0.07596, ctc_loss=0.1232, over 16721.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2759, pruned_loss=0.0597, ctc_loss=0.1019, over 3293291.03 frames. ], batch size: 130, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:46:12,336 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2801362.6666666665, ans=0.125
2023-10-09 15:46:15,547 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2801362.6666666665, ans=0.1
2023-10-09 15:46:20,284 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2801362.6666666665, ans=0.125
2023-10-09 15:46:22,076 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+02 3.223e+02 3.587e+02 4.203e+02 7.757e+02, threshold=7.174e+02, percent-clipped=0.0
2023-10-09 15:46:34,979 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2801456.0, ans=0.1
2023-10-09 15:46:40,651 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=22.5
2023-10-09 15:46:42,420 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2801456.0, ans=0.125
2023-10-09 15:46:43,496 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2801456.0, ans=0.125
2023-10-09 15:46:54,854 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2801502.6666666665, ans=0.0
2023-10-09 15:46:59,412 INFO [train.py:1031] (0/4) Epoch 14, batch 15600, loss[loss=0.2113, simple_loss=0.2754, pruned_loss=0.05506, ctc_loss=0.09278, over 16735.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.2818, pruned_loss=0.06288, ctc_loss=0.1077, over 3295080.22 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:47:24,444 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.91 vs. limit=15.0
2023-10-09 15:47:28,598 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2801642.6666666665, ans=0.1
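The Whitening lines from scaling.py:979 compare a measured statistic for a named activation against its configured limit (metric=... vs. limit=...); as long as the metric stays below the limit, these lines are diagnostics only. One plausible reading of the metric, assuming it measures how uneven the eigenvalue spectrum of the activation covariance is (exactly 1.0 for fully whitened features; the precise icefall formula may differ):

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (num_frames, num_channels). Returns, averaged over channel
        # groups, mean(eig^2) / mean(eig)^2 for the covariance eigenvalues;
        # this equals 1.0 when all eigenvalues are equal (white features).
        n, c = x.shape
        g = c // num_groups
        metrics = []
        for i in range(num_groups):
            xg = x[:, i * g:(i + 1) * g]
            cov = (xg.T @ xg) / n
            eigs = torch.linalg.eigvalsh(cov)
            metrics.append((eigs ** 2).mean() / eigs.mean().clamp(min=1e-20) ** 2)
        return torch.stack(metrics).mean()

    x = torch.randn(1000, 512) @ torch.randn(512, 512)   # correlated features
    print(float(whitening_metric(x)))                    # well above 1.0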
2023-10-09 15:47:30,938 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0
2023-10-09 15:47:41,870 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2801689.3333333335, ans=0.125
2023-10-09 15:47:42,932 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2801689.3333333335, ans=0.0
2023-10-09 15:47:43,035 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2801689.3333333335, ans=0.125
2023-10-09 15:47:47,273 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2801736.0, ans=0.0
2023-10-09 15:47:53,405 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2801736.0, ans=0.125
2023-10-09 15:47:54,959 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2801736.0, ans=0.0
2023-10-09 15:48:00,434 INFO [train.py:1031] (0/4) Epoch 14, batch 15650, loss[loss=0.1954, simple_loss=0.2514, pruned_loss=0.05059, ctc_loss=0.09544, over 16780.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2826, pruned_loss=0.06064, ctc_loss=0.105, over 3293124.59 frames. ], batch size: 215, lr: 2.56e-03, grad_scale: 1.0
2023-10-09 15:48:03,015 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2801782.6666666665, ans=0.125
2023-10-09 15:48:05,246 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2801782.6666666665, ans=0.0
2023-10-09 15:48:07,263 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2801782.6666666665, ans=0.125
2023-10-09 15:48:13,769 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2801829.3333333335, ans=0.0
2023-10-09 15:48:23,318 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+02 3.056e+02 3.461e+02 4.046e+02 6.916e+02, threshold=6.921e+02, percent-clipped=0.0
2023-10-09 15:48:33,419 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2801876.0, ans=0.2
2023-10-09 15:48:34,576 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2801922.6666666665, ans=0.0
2023-10-09 15:48:35,546 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2801922.6666666665, ans=0.0
2023-10-09 15:48:40,345 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2801922.6666666665, ans=0.2
2023-10-09 15:49:00,034 INFO [train.py:1031] (0/4) Epoch 14, batch 15700, loss[loss=0.2172, simple_loss=0.2651, pruned_loss=0.06215, ctc_loss=0.1123, over 16911.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2772, pruned_loss=0.06065, ctc_loss=0.1051, over 3302907.31 frames. ], batch size: 260, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:49:22,229 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2802062.6666666665, ans=0.1
2023-10-09 15:49:23,264 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2802109.3333333335, ans=0.0
2023-10-09 15:49:41,062 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2802156.0, ans=0.125
2023-10-09 15:49:42,503 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0
2023-10-09 15:49:44,900 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=2802156.0, ans=0.1
2023-10-09 15:49:58,427 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:50:00,362 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=22.5
2023-10-09 15:50:01,940 INFO [train.py:1031] (0/4) Epoch 14, batch 15750, loss[loss=0.2197, simple_loss=0.2639, pruned_loss=0.06579, ctc_loss=0.1096, over 16030.00 frames. ], tot_loss[loss=0.2162, simple_loss=0.2711, pruned_loss=0.05985, ctc_loss=0.1037, over 3303880.51 frames. ], batch size: 70, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:50:13,421 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2802296.0, ans=0.2
2023-10-09 15:50:26,228 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 3.013e+02 3.498e+02 4.173e+02 6.687e+02, threshold=6.996e+02, percent-clipped=0.0
2023-10-09 15:50:43,238 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2802389.3333333335, ans=0.125
2023-10-09 15:50:46,288 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2802389.3333333335, ans=0.0
2023-10-09 15:50:53,816 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2802436.0, ans=0.0
2023-10-09 15:50:54,282 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0
2023-10-09 15:51:01,625 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2802436.0, ans=0.1
2023-10-09 15:51:02,620 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2802482.6666666665, ans=0.125
2023-10-09 15:51:03,854 INFO [train.py:1031] (0/4) Epoch 14, batch 15800, loss[loss=0.1928, simple_loss=0.2462, pruned_loss=0.05178, ctc_loss=0.08976, over 16800.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2684, pruned_loss=0.05888, ctc_loss=0.1026, over 3304004.94 frames. ], batch size: 95, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:51:06,357 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2802482.6666666665, ans=0.0
2023-10-09 15:51:20,642 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2802529.3333333335, ans=0.015
2023-10-09 15:51:34,990 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2802576.0, ans=0.125
2023-10-09 15:51:44,881 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0
2023-10-09 15:51:56,255 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2802669.3333333335, ans=0.0
2023-10-09 15:52:09,233 INFO [train.py:1031] (0/4) Epoch 14, batch 15850, loss[loss=0.1756, simple_loss=0.2361, pruned_loss=0.04302, ctc_loss=0.07272, over 16710.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2726, pruned_loss=0.05806, ctc_loss=0.1011, over 3301047.22 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:52:18,690 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2802716.0, ans=0.125
2023-10-09 15:52:31,836 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0
2023-10-09 15:52:36,120 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.410e+02 3.156e+02 3.985e+02 5.059e+02 1.038e+03, threshold=7.970e+02, percent-clipped=10.0
2023-10-09 15:52:53,233 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:52:56,319 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2802856.0, ans=0.0
2023-10-09 15:53:00,717 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2802902.6666666665, ans=0.1
2023-10-09 15:53:12,701 INFO [train.py:1031] (0/4) Epoch 14, batch 15900, loss[loss=0.1865, simple_loss=0.2478, pruned_loss=0.04671, ctc_loss=0.07948, over 16829.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2727, pruned_loss=0.05724, ctc_loss=0.09938, over 3296233.40 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:53:17,954 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2802949.3333333335, ans=0.0
2023-10-09 15:53:29,170 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2802996.0, ans=0.125
2023-10-09 15:53:53,118 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2023-10-09 15:53:57,399 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2803089.3333333335, ans=0.125
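The grad_scale field in the batch records drifts between 1.0 and 8.0 through this stretch (1.0 at batch 15650, 2.0 at 15700, 4.0 at 15800, back to 2.0 at 15850), the signature of dynamic fp16 loss scaling: the scale is halved when a step overflows and doubled again after a run of clean steps. A sketch using PyTorch's stock scaler, assuming the training loop uses an equivalent mechanism (requires a CUDA device):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0,      # starting grad_scale
        growth_factor=2.0,   # double after growth_interval clean steps
        backoff_factor=0.5,  # halve on inf/nan gradients
        growth_interval=2000,
    )
    model = torch.nn.Linear(80, 2000).cuda()
    opt = torch.optim.AdamW(model.parameters(), lr=2.56e-3)

    for _ in range(3):
        x, y = torch.randn(8, 80).cuda(), torch.randn(8, 2000).cuda()
        opt.zero_grad()
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.mse_loss(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(opt)     # skipped internally if gradients overflowed
        scaler.update()      # adjusts the scale reported as grad_scale
        print("grad_scale:", scaler.get_scale())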
2023-10-09 15:54:14,353 INFO [train.py:1031] (0/4) Epoch 14, batch 15950, loss[loss=0.221, simple_loss=0.2789, pruned_loss=0.05965, ctc_loss=0.1097, over 11724.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2706, pruned_loss=0.05724, ctc_loss=0.09936, over 3288145.12 frames. ], batch size: 36, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:54:32,095 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2803229.3333333335, ans=0.0
2023-10-09 15:54:36,047 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2803229.3333333335, ans=0.0
2023-10-09 15:54:41,028 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2803276.0, ans=0.95
2023-10-09 15:54:41,754 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 3.015e+02 3.467e+02 4.153e+02 6.024e+02, threshold=6.935e+02, percent-clipped=0.0
2023-10-09 15:54:51,307 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=12.0
2023-10-09 15:54:59,808 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2803322.6666666665, ans=0.1
2023-10-09 15:55:10,670 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=22.5
2023-10-09 15:55:16,117 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2803416.0, ans=0.2
2023-10-09 15:55:16,519 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.61 vs. limit=22.5
2023-10-09 15:55:16,768 INFO [train.py:1031] (0/4) Epoch 14, batch 16000, loss[loss=0.306, simple_loss=0.369, pruned_loss=0.0885, ctc_loss=0.1652, over 16758.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.278, pruned_loss=0.06129, ctc_loss=0.1061, over 3291757.16 frames. ], batch size: 328, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:55:25,834 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0
2023-10-09 15:55:26,460 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2803416.0, ans=0.0
2023-10-09 15:55:26,936 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.44 vs. limit=15.0
2023-10-09 15:55:27,505 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2803416.0, ans=0.0
2023-10-09 15:55:28,652 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2803462.6666666665, ans=0.125
2023-10-09 15:55:30,332 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2803462.6666666665, ans=0.125
2023-10-09 15:55:50,353 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2803509.3333333335, ans=0.1
2023-10-09 15:56:10,957 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2803602.6666666665, ans=0.0
2023-10-09 15:56:19,094 INFO [train.py:1031] (0/4) Epoch 14, batch 16050, loss[loss=0.2882, simple_loss=0.3534, pruned_loss=0.07995, ctc_loss=0.1576, over 16729.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2892, pruned_loss=0.06318, ctc_loss=0.1114, over 3295482.73 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:56:19,479 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2803649.3333333335, ans=0.125
2023-10-09 15:56:48,623 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+02 3.378e+02 4.238e+02 4.995e+02 7.928e+02, threshold=8.476e+02, percent-clipped=3.0
2023-10-09 15:56:52,198 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2803742.6666666665, ans=0.125
2023-10-09 15:57:02,180 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.23 vs. limit=22.5
2023-10-09 15:57:06,024 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2803789.3333333335, ans=0.125
2023-10-09 15:57:08,393 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.32 vs. limit=10.0
2023-10-09 15:57:14,775 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2803836.0, ans=0.125
2023-10-09 15:57:19,666 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2803836.0, ans=0.015
2023-10-09 15:57:21,613 INFO [train.py:1031] (0/4) Epoch 14, batch 16100, loss[loss=0.2521, simple_loss=0.3108, pruned_loss=0.07145, ctc_loss=0.126, over 16918.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2914, pruned_loss=0.06325, ctc_loss=0.1118, over 3290231.67 frames. ], batch size: 292, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:57:39,907 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0
2023-10-09 15:58:03,047 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2804022.6666666665, ans=0.0
2023-10-09 15:58:23,846 INFO [train.py:1031] (0/4) Epoch 14, batch 16150, loss[loss=0.2094, simple_loss=0.2819, pruned_loss=0.05098, ctc_loss=0.08735, over 16849.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2943, pruned_loss=0.06552, ctc_loss=0.1156, over 3296703.11 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:58:25,225 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2804116.0, ans=0.125
2023-10-09 15:58:28,685 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2804116.0, ans=0.125
2023-10-09 15:58:29,772 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2804116.0, ans=0.09899494936611666
2023-10-09 15:58:29,816 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2804116.0, ans=0.2
2023-10-09 15:58:31,874 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2804116.0, ans=0.125
2023-10-09 15:58:49,455 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0
2023-10-09 15:58:52,188 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0
2023-10-09 15:58:53,759 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+02 3.176e+02 3.660e+02 4.435e+02 1.361e+03, threshold=7.321e+02, percent-clipped=1.0
2023-10-09 15:59:15,878 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2804302.6666666665, ans=0.125
2023-10-09 15:59:16,030 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2804302.6666666665, ans=0.0
2023-10-09 15:59:19,159 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2804302.6666666665, ans=0.0
2023-10-09 15:59:24,938 INFO [train.py:1031] (0/4) Epoch 14, batch 16200, loss[loss=0.2164, simple_loss=0.2619, pruned_loss=0.06229, ctc_loss=0.1157, over 16720.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2883, pruned_loss=0.06394, ctc_loss=0.1127, over 3297665.88 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:59:34,032 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2804349.3333333335, ans=0.1
2023-10-09 15:59:34,086 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2804349.3333333335, ans=0.0
2023-10-09 15:59:49,639 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2804442.6666666665, ans=0.0
2023-10-09 15:59:57,368 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2804442.6666666665, ans=0.0
2023-10-09 16:00:00,416 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0
2023-10-09 16:00:08,119 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2804489.3333333335, ans=0.0
2023-10-09 16:00:27,731 INFO [train.py:1031] (0/4) Epoch 14, batch 16250, loss[loss=0.2083, simple_loss=0.2601, pruned_loss=0.05538, ctc_loss=0.1142, over 15301.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.282, pruned_loss=0.0624, ctc_loss=0.1099, over 3288292.01 frames. ], batch size: 526, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:00:43,228 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.16 vs. limit=15.0
2023-10-09 16:00:58,613 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+02 3.037e+02 3.428e+02 4.095e+02 1.009e+03, threshold=6.855e+02, percent-clipped=2.0
2023-10-09 16:01:12,270 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2804722.6666666665, ans=0.07
2023-10-09 16:01:13,640 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=15.0
2023-10-09 16:01:16,071 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2804722.6666666665, ans=0.2
2023-10-09 16:01:30,634 INFO [train.py:1031] (0/4) Epoch 14, batch 16300, loss[loss=0.1941, simple_loss=0.2586, pruned_loss=0.0472, ctc_loss=0.08776, over 16790.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2813, pruned_loss=0.06024, ctc_loss=0.1066, over 3295869.84 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:01:32,079 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2804816.0, ans=0.125
2023-10-09 16:01:34,806 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2804816.0, ans=0.2
2023-10-09 16:02:13,299 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0
2023-10-09 16:02:13,790 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2804956.0, ans=0.125
2023-10-09 16:02:22,117 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2805002.6666666665, ans=0.1
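Each train.py:1031 record splits the objective into simple_loss and pruned_loss (the two stages of pruned-transducer training) plus the auxiliary ctc_loss, all normalized per frame. The records in this section satisfy loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss to within rounding (batch 16250: 0.5 * 0.2601 + 0.05538 + 0.2 * 0.1142 is about 0.2083), so the sketch below uses those weights; whether the script applies extra warm-up scaling on top is not visible from the log:

    import torch

    def combine_losses(simple_loss, pruned_loss, ctc_loss,
                       simple_scale=0.5, ctc_scale=0.2):
        # Weighted total of the three logged components (weights inferred
        # from the records above, not taken from the training script).
        return simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss

    # The batch 16300 components reproduce the logged loss=0.1941:
    print(float(combine_losses(torch.tensor(0.2586),     # simple_loss
                               torch.tensor(0.0472),     # pruned_loss
                               torch.tensor(0.08776))))  # ctc_loss -> ~0.1941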
2023-10-09 16:02:31,522 INFO [train.py:1031] (0/4) Epoch 14, batch 16350, loss[loss=0.2091, simple_loss=0.2616, pruned_loss=0.05835, ctc_loss=0.09969, over 16800.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2754, pruned_loss=0.05939, ctc_loss=0.1051, over 3298437.97 frames. ], batch size: 164, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:02:39,100 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2805049.3333333335, ans=0.125
2023-10-09 16:02:41,203 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2805049.3333333335, ans=0.2
2023-10-09 16:02:54,094 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2805096.0, ans=0.0
2023-10-09 16:02:55,077 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2805142.6666666665, ans=0.0
2023-10-09 16:03:01,014 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2805142.6666666665, ans=0.125
2023-10-09 16:03:01,698 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+02 3.121e+02 3.548e+02 4.178e+02 8.324e+02, threshold=7.096e+02, percent-clipped=2.0
2023-10-09 16:03:28,563 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2805236.0, ans=0.125
2023-10-09 16:03:32,989 INFO [train.py:1031] (0/4) Epoch 14, batch 16400, loss[loss=0.2778, simple_loss=0.3065, pruned_loss=0.09257, ctc_loss=0.1598, over 16837.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.2761, pruned_loss=0.0615, ctc_loss=0.1084, over 3292911.54 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:03:55,615 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2805329.3333333335, ans=0.2
2023-10-09 16:03:58,250 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0
2023-10-09 16:04:16,652 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2805422.6666666665, ans=0.0
2023-10-09 16:04:19,359 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2805422.6666666665, ans=0.0
2023-10-09 16:04:34,782 INFO [train.py:1031] (0/4) Epoch 14, batch 16450, loss[loss=0.2112, simple_loss=0.2655, pruned_loss=0.05844, ctc_loss=0.09999, over 16925.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2758, pruned_loss=0.06329, ctc_loss=0.1112, over 3295146.30 frames. ], batch size: 78, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:04:41,518 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2805516.0, ans=0.2
2023-10-09 16:04:48,631 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2805562.6666666665, ans=0.125
2023-10-09 16:04:58,327 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:05:06,536 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.659e+02 3.324e+02 3.650e+02 4.238e+02 1.011e+03, threshold=7.299e+02, percent-clipped=2.0
2023-10-09 16:05:28,681 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2805702.6666666665, ans=0.0
2023-10-09 16:05:35,697 INFO [train.py:1031] (0/4) Epoch 14, batch 16500, loss[loss=0.2666, simple_loss=0.3026, pruned_loss=0.08528, ctc_loss=0.1501, over 16477.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2712, pruned_loss=0.06321, ctc_loss=0.1111, over 3289437.34 frames. ], batch size: 351, lr: 2.56e-03, grad_scale: 8.0
2023-10-09 16:05:47,614 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2805796.0, ans=0.125
2023-10-09 16:05:47,907 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0
2023-10-09 16:06:04,145 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2805842.6666666665, ans=0.5
2023-10-09 16:06:08,466 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2805842.6666666665, ans=0.0
2023-10-09 16:06:09,591 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2805842.6666666665, ans=0.125
2023-10-09 16:06:15,102 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=22.5
2023-10-09 16:06:17,098 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2805889.3333333335, ans=10.0
2023-10-09 16:06:26,370 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.67 vs. limit=6.0
2023-10-09 16:06:28,632 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2805936.0, ans=0.0
2023-10-09 16:06:37,061 INFO [train.py:1031] (0/4) Epoch 14, batch 16550, loss[loss=0.2038, simple_loss=0.2598, pruned_loss=0.05401, ctc_loss=0.09953, over 16771.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.2727, pruned_loss=0.06272, ctc_loss=0.1104, over 3296930.29 frames. ], batch size: 164, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:06:52,482 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2806029.3333333335, ans=0.1
2023-10-09 16:07:09,905 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.507e+02 3.011e+02 3.365e+02 4.120e+02 6.132e+02, threshold=6.730e+02, percent-clipped=0.0
2023-10-09 16:07:18,697 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2806122.6666666665, ans=0.1
2023-10-09 16:07:29,514 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2806169.3333333335, ans=0.2
2023-10-09 16:07:32,627 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2806169.3333333335, ans=0.0
2023-10-09 16:07:34,864 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=15.0
2023-10-09 16:07:37,225 INFO [train.py:1031] (0/4) Epoch 14, batch 16600, loss[loss=0.2081, simple_loss=0.2401, pruned_loss=0.06579, ctc_loss=0.1113, over 16643.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.269, pruned_loss=0.06221, ctc_loss=0.1092, over 3294379.99 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 8.0
2023-10-09 16:07:46,253 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2806216.0, ans=0.125
2023-10-09 16:07:59,492 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.92 vs. limit=10.0
2023-10-09 16:07:59,613 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0
2023-10-09 16:08:11,782 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2806309.3333333335, ans=0.1
2023-10-09 16:08:39,073 INFO [train.py:1031] (0/4) Epoch 14, batch 16650, loss[loss=0.205, simple_loss=0.2628, pruned_loss=0.05269, ctc_loss=0.1044, over 15331.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2718, pruned_loss=0.062, ctc_loss=0.1095, over 3288897.40 frames. ], batch size: 526, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:08:43,633 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2806449.3333333335, ans=0.125
2023-10-09 16:08:46,844 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0
2023-10-09 16:08:59,712 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2806496.0, ans=0.0
2023-10-09 16:09:15,076 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.506e+02 2.878e+02 3.292e+02 3.921e+02 8.519e+02, threshold=6.584e+02, percent-clipped=3.0
2023-10-09 16:09:23,011 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2806589.3333333335, ans=0.05
2023-10-09 16:09:40,523 INFO [train.py:1031] (0/4) Epoch 14, batch 16700, loss[loss=0.2227, simple_loss=0.2541, pruned_loss=0.07006, ctc_loss=0.128, over 16335.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2694, pruned_loss=0.06275, ctc_loss=0.1102, over 3291301.21 frames. ], batch size: 416, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:09:43,607 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2806682.6666666665, ans=0.0
2023-10-09 16:09:46,955 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2806682.6666666665, ans=0.05
2023-10-09 16:09:48,641 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2806682.6666666665, ans=0.07
2023-10-09 16:09:49,804 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2806682.6666666665, ans=0.125
2023-10-09 16:10:08,500 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2806776.0, ans=0.0
2023-10-09 16:10:11,475 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2806776.0, ans=0.125
2023-10-09 16:10:16,049 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2806776.0, ans=0.125
2023-10-09 16:10:23,413 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0
2023-10-09 16:10:39,318 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2806869.3333333335, ans=0.125
2023-10-09 16:10:42,246 INFO [train.py:1031] (0/4) Epoch 14, batch 16750, loss[loss=0.2312, simple_loss=0.2899, pruned_loss=0.06663, ctc_loss=0.09811, over 16916.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.2693, pruned_loss=0.06258, ctc_loss=0.1097, over 3296890.43 frames. ], batch size: 82, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:11:18,573 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.049e+02 3.546e+02 4.303e+02 6.611e+02, threshold=7.093e+02, percent-clipped=1.0
2023-10-09 16:11:23,007 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2807056.0, ans=0.1
2023-10-09 16:11:30,871 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0
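The WithLoss lines from scaling.py:1069 attach a diagnostic penalty to a module's attention weights and log its accumulated value; loss-sum is 0.000e+00 throughout this section, so the penalty is inactive here. The usual mechanism for attaching such a term without altering the forward output is an identity autograd function that routes a unit gradient into the auxiliary loss during backward; a hypothetical sketch, not icefall's actual code:

    import torch

    class AttachLoss(torch.autograd.Function):
        # Identity on x in forward; in backward, the auxiliary loss receives
        # gradient 1.0, so it is effectively added to the training objective.
        @staticmethod
        def forward(ctx, x, aux_loss):
            ctx.aux_shape = aux_loss.shape
            print(f"WithLoss: loss-sum={float(aux_loss.sum()):.3e}")
            return x

        @staticmethod
        def backward(ctx, grad_out):
            return grad_out, torch.ones(ctx.aux_shape, device=grad_out.device)

    attn = torch.rand(4, 10, 10, requires_grad=True)
    penalty = (attn.clamp(min=0.99) - 0.99).sum()  # zero unless weights saturate
    out = AttachLoss.apply(attn, penalty)
    out.sum().backward()                           # penalty's grad flows into attn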
2023-10-09 16:11:42,937 INFO [train.py:1031] (0/4) Epoch 14, batch 16800, loss[loss=0.2042, simple_loss=0.263, pruned_loss=0.05407, ctc_loss=0.09313, over 16788.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2722, pruned_loss=0.06174, ctc_loss=0.1084, over 3304157.82 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:11:56,714 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2807196.0, ans=0.125
2023-10-09 16:12:23,701 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2807289.3333333335, ans=0.0
2023-10-09 16:12:37,778 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2807336.0, ans=0.95
2023-10-09 16:12:42,697 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2807336.0, ans=0.1
2023-10-09 16:12:45,119 INFO [train.py:1031] (0/4) Epoch 14, batch 16850, loss[loss=0.2134, simple_loss=0.2857, pruned_loss=0.05083, ctc_loss=0.09865, over 16313.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.2729, pruned_loss=0.06221, ctc_loss=0.1094, over 3312357.29 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:12:57,981 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2807429.3333333335, ans=0.125
2023-10-09 16:13:03,373 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2807429.3333333335, ans=0.125
2023-10-09 16:13:11,369 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2807476.0, ans=0.1
2023-10-09 16:13:24,924 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+02 3.198e+02 3.748e+02 4.342e+02 8.032e+02, threshold=7.496e+02, percent-clipped=3.0
2023-10-09 16:13:28,437 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0
2023-10-09 16:13:39,491 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2807569.3333333335, ans=0.125
2023-10-09 16:13:48,642 INFO [train.py:1031] (0/4) Epoch 14, batch 16900, loss[loss=0.2655, simple_loss=0.3338, pruned_loss=0.07147, ctc_loss=0.1354, over 15218.00 frames. ], tot_loss[loss=0.222, simple_loss=0.2763, pruned_loss=0.062, ctc_loss=0.1095, over 3307348.41 frames. ], batch size: 527, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:13:55,291 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0
2023-10-09 16:14:04,479 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2807662.6666666665, ans=0.0
2023-10-09 16:14:17,559 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2807709.3333333335, ans=0.125
2023-10-09 16:14:18,581 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2807709.3333333335, ans=0.0
2023-10-09 16:14:18,653 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2807709.3333333335, ans=0.125
2023-10-09 16:14:42,371 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.30 vs. limit=15.0
2023-10-09 16:14:50,838 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2807849.3333333335, ans=0.95
2023-10-09 16:14:51,571 INFO [train.py:1031] (0/4) Epoch 14, batch 16950, loss[loss=0.2664, simple_loss=0.3056, pruned_loss=0.08433, ctc_loss=0.1461, over 16714.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2832, pruned_loss=0.06489, ctc_loss=0.1144, over 3299038.53 frames. ], batch size: 111, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:15:01,110 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2807849.3333333335, ans=0.1
2023-10-09 16:15:04,998 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2807896.0, ans=0.1
2023-10-09 16:15:33,201 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.515e+02 3.296e+02 3.627e+02 4.465e+02 8.431e+02, threshold=7.254e+02, percent-clipped=3.0
2023-10-09 16:15:35,725 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2807989.3333333335, ans=0.0
2023-10-09 16:15:36,804 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2807989.3333333335, ans=0.2
2023-10-09 16:15:55,743 INFO [train.py:1031] (0/4) Epoch 14, batch 17000, loss[loss=0.2726, simple_loss=0.3336, pruned_loss=0.07886, ctc_loss=0.1346, over 16491.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2872, pruned_loss=0.06574, ctc_loss=0.1157, over 3281645.44 frames. ], batch size: 416, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:16:19,673 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2808129.3333333335, ans=0.2
2023-10-09 16:16:19,752 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2808129.3333333335, ans=0.125
2023-10-09 16:16:28,762 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2808176.0, ans=0.0
2023-10-09 16:16:38,991 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2808222.6666666665, ans=0.125
2023-10-09 16:16:40,006 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2808222.6666666665, ans=0.125
2023-10-09 16:16:53,132 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2808269.3333333335, ans=0.0
2023-10-09 16:16:53,161 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2808269.3333333335, ans=0.0
2023-10-09 16:16:59,297 INFO [train.py:1031] (0/4) Epoch 14, batch 17050, loss[loss=0.2861, simple_loss=0.3338, pruned_loss=0.0855, ctc_loss=0.1686, over 16725.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2866, pruned_loss=0.06348, ctc_loss=0.1121, over 3278825.37 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:17:01,450 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=2808316.0, ans=0.1
2023-10-09 16:17:41,824 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.562e+02 3.300e+02 3.832e+02 4.647e+02 9.893e+02, threshold=7.664e+02, percent-clipped=3.0
2023-10-09 16:18:02,361 INFO [train.py:1031] (0/4) Epoch 14, batch 17100, loss[loss=0.2588, simple_loss=0.3084, pruned_loss=0.07643, ctc_loss=0.141, over 16734.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.289, pruned_loss=0.0653, ctc_loss=0.1147, over 3281323.88 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:18:03,714 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2808549.3333333335, ans=0.125
2023-10-09 16:18:20,906 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2808596.0, ans=0.125
2023-10-09 16:18:33,714 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2808642.6666666665, ans=0.2
2023-10-09 16:18:43,540 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2808689.3333333335, ans=0.0
2023-10-09 16:19:03,700 INFO [train.py:1031] (0/4) Epoch 14, batch 17150, loss[loss=0.2214, simple_loss=0.2987, pruned_loss=0.05188, ctc_loss=0.1011, over 16964.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.2891, pruned_loss=0.06357, ctc_loss=0.1117, over 3281660.72 frames. ], batch size: 258, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:19:38,883 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:19:40,067 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2808922.6666666665, ans=0.025
2023-10-09 16:19:40,291 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=22.5
2023-10-09 16:19:46,139 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+02 3.091e+02 3.589e+02 4.240e+02 6.885e+02, threshold=7.178e+02, percent-clipped=0.0
2023-10-09 16:19:53,315 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2808969.3333333335, ans=0.04949747468305833
2023-10-09 16:19:59,104 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0
2023-10-09 16:20:05,575 INFO [train.py:1031] (0/4) Epoch 14, batch 17200, loss[loss=0.2244, simple_loss=0.3104, pruned_loss=0.04997, ctc_loss=0.09603, over 16845.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.2957, pruned_loss=0.06438, ctc_loss=0.114, over 3277966.52 frames. ], batch size: 215, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:20:09,687 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2809016.0, ans=0.0
2023-10-09 16:20:30,697 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2809062.6666666665, ans=0.2
2023-10-09 16:20:33,458 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2809109.3333333335, ans=0.125
2023-10-09 16:21:07,060 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2809202.6666666665, ans=0.2
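Within each batch record, loss[...] is the current batch alone (normalized over its frames) while tot_loss[...] is a running aggregate weighted by frame count, which is why its "over N frames" figure hovers around 3.3 million and its values move slowly. A minimal sketch of decayed, frame-weighted tracking; the decay constant and update rule are assumptions rather than something read from train.py:

    class RunningLoss:
        # Frame-weighted running average with exponential decay, so the
        # effective window stays at a few million frames.
        def __init__(self, decay=0.999):
            self.decay = decay
            self.loss_sum = 0.0   # decayed sum of (per-frame loss * frames)
            self.frames = 0.0     # decayed frame count

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self):
            return self.loss_sum / max(self.frames, 1.0)

    tot = RunningLoss()
    for loss, frames in [(0.2214, 16964.0), (0.2244, 16845.0)]:  # batches 17150, 17200
        tot.update(loss, frames)
    print(f"tot_loss[loss={tot.value:.4f}, over {tot.frames:.2f} frames. ]")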
], batch size: 463, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:21:21,472 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2809249.3333333335, ans=0.125 2023-10-09 16:21:22,471 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2809249.3333333335, ans=0.0 2023-10-09 16:21:22,489 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=2809249.3333333335, ans=0.02 2023-10-09 16:21:40,981 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2809342.6666666665, ans=0.1 2023-10-09 16:21:46,490 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2809342.6666666665, ans=0.125 2023-10-09 16:21:57,826 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.919e+02 4.626e+02 5.820e+02 9.725e+02, threshold=9.252e+02, percent-clipped=7.0 2023-10-09 16:22:10,556 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2809436.0, ans=0.125 2023-10-09 16:22:13,701 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2809436.0, ans=0.125 2023-10-09 16:22:16,149 INFO [train.py:1031] (0/4) Epoch 14, batch 17300, loss[loss=0.2246, simple_loss=0.2936, pruned_loss=0.05807, ctc_loss=0.09863, over 16838.00 frames. ], tot_loss[loss=0.2481, simple_loss=0.3168, pruned_loss=0.066, ctc_loss=0.1187, over 3279640.58 frames. ], batch size: 121, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:22:27,822 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2809529.3333333335, ans=0.125 2023-10-09 16:22:53,987 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2809622.6666666665, ans=0.125 2023-10-09 16:22:56,792 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2809622.6666666665, ans=0.1 2023-10-09 16:23:02,432 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2809622.6666666665, ans=0.125 2023-10-09 16:23:06,661 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2809669.3333333335, ans=0.2 2023-10-09 16:23:16,217 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2809716.0, ans=0.125 2023-10-09 16:23:17,642 INFO [train.py:1031] (0/4) Epoch 14, batch 17350, loss[loss=0.2777, simple_loss=0.3388, pruned_loss=0.07958, ctc_loss=0.1435, over 16763.00 frames. ], tot_loss[loss=0.2511, simple_loss=0.3209, pruned_loss=0.0668, ctc_loss=0.1194, over 3289778.03 frames. 
2023-10-09 16:22:10,556 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2809436.0, ans=0.125 2023-10-09 16:22:13,701 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2809436.0, ans=0.125 2023-10-09 16:22:16,149 INFO [train.py:1031] (0/4) Epoch 14, batch 17300, loss[loss=0.2246, simple_loss=0.2936, pruned_loss=0.05807, ctc_loss=0.09863, over 16838.00 frames. ], tot_loss[loss=0.2481, simple_loss=0.3168, pruned_loss=0.066, ctc_loss=0.1187, over 3279640.58 frames. ], batch size: 121, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:22:27,822 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2809529.3333333335, ans=0.125 2023-10-09 16:22:53,987 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2809622.6666666665, ans=0.125 2023-10-09 16:22:56,792 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2809622.6666666665, ans=0.1 2023-10-09 16:23:02,432 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2809622.6666666665, ans=0.125 2023-10-09 16:23:06,661 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2809669.3333333335, ans=0.2 2023-10-09 16:23:16,217 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2809716.0, ans=0.125 2023-10-09 16:23:17,642 INFO [train.py:1031] (0/4) Epoch 14, batch 17350, loss[loss=0.2777, simple_loss=0.3388, pruned_loss=0.07958, ctc_loss=0.1435, over 16763.00 frames. ], tot_loss[loss=0.2511, simple_loss=0.3209, pruned_loss=0.0668, ctc_loss=0.1194, over 3289778.03 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:23:27,737 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2809716.0, ans=0.125 2023-10-09 16:23:30,896 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2809762.6666666665, ans=0.0
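
The scaling.py:199 entries that dominate this log each print a named hyperparameter scheduled as a function of the global batch count, together with its current value (ans=...). By batch_count ~2.8e6 every schedule has long since flattened out, which is why the same ans values (0.0, 0.025, 0.1, 0.125, 0.2, ...) repeat indefinitely. A plausible minimal version of such a scheduled float, interpolating piecewise-linearly between (batch_count, value) breakpoints, is sketched below; it is an illustration, not the ScheduledFloat actually defined in icefall's scaling.py.

import bisect

class PiecewiseLinearFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.x = [p[0] for p in points]
        self.y = [p[1] for p in points]

    def value(self, batch_count):
        if batch_count <= self.x[0]:
            return self.y[0]
        if batch_count >= self.x[-1]:
            return self.y[-1]  # clamp: flat after the last breakpoint
        i = bisect.bisect_right(self.x, batch_count)
        t = (batch_count - self.x[i - 1]) / (self.x[i] - self.x[i - 1])
        return self.y[i - 1] + t * (self.y[i] - self.y[i - 1])

# For example, a skip rate annealed from 0.5 to 0.0 over the first 4000 batches:
ff2_skip_rate = PiecewiseLinearFloat((0.0, 0.5), (4000.0, 0.0))
print(ff2_skip_rate.value(2000.0))      # 0.25, mid-anneal
print(ff2_skip_rate.value(2809762.67))  # 0.0, matching ans=0.0 above
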
2023-10-09 16:23:54,732 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2809856.0, ans=0.125 2023-10-09 16:24:01,203 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+02 3.229e+02 3.810e+02 5.005e+02 1.294e+03, threshold=7.619e+02, percent-clipped=1.0 2023-10-09 16:24:06,901 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2809902.6666666665, ans=0.1 2023-10-09 16:24:09,067 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2809902.6666666665, ans=0.0 2023-10-09 16:24:10,718 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:24:18,376 INFO [train.py:1031] (0/4) Epoch 14, batch 17400, loss[loss=0.2165, simple_loss=0.2626, pruned_loss=0.0629, ctc_loss=0.1112, over 16212.00 frames. ], tot_loss[loss=0.2468, simple_loss=0.3126, pruned_loss=0.06675, ctc_loss=0.1189, over 3288784.88 frames. ], batch size: 71, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:24:22,620 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=22.5 2023-10-09 16:24:34,210 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:25:13,327 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2810136.0, ans=0.0 2023-10-09 16:25:18,381 INFO [train.py:1031] (0/4) Epoch 14, batch 17450, loss[loss=0.2017, simple_loss=0.2499, pruned_loss=0.0567, ctc_loss=0.1001, over 16655.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.2998, pruned_loss=0.06558, ctc_loss=0.1164, over 3288531.78 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:25:18,755 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2810182.6666666665, ans=0.125 2023-10-09 16:25:47,196 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2810276.0, ans=0.125 2023-10-09 16:25:49,689 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2023-10-09 16:26:05,312 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+02 3.049e+02 3.427e+02 3.970e+02 9.337e+02, threshold=6.853e+02, percent-clipped=1.0 2023-10-09 16:26:06,537 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2810322.6666666665, ans=0.125 2023-10-09 16:26:11,111 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2810369.3333333335, ans=0.2 2023-10-09 16:26:14,214 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.66 vs. limit=15.0 2023-10-09 16:26:20,087 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2810416.0, ans=0.0 2023-10-09 16:26:20,868 INFO [train.py:1031] (0/4) Epoch 14, batch 17500, loss[loss=0.2438, simple_loss=0.2797, pruned_loss=0.07656, ctc_loss=0.1369, over 16499.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.2904, pruned_loss=0.06513, ctc_loss=0.1149, over 3285029.99 frames. ], batch size: 418, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:26:48,698 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2810509.3333333335, ans=0.1 2023-10-09 16:26:57,538 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.08 vs. limit=15.0 2023-10-09 16:27:00,332 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2023-10-09 16:27:10,218 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2810602.6666666665, ans=0.2 2023-10-09 16:27:11,807 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2810602.6666666665, ans=0.1 2023-10-09 16:27:12,827 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2810602.6666666665, ans=0.125 2023-10-09 16:27:22,359 INFO [train.py:1031] (0/4) Epoch 14, batch 17550, loss[loss=0.238, simple_loss=0.2918, pruned_loss=0.06931, ctc_loss=0.1138, over 16765.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2903, pruned_loss=0.06639, ctc_loss=0.117, over 3292157.14 frames.
], batch size: 111, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:27:28,224 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2810649.3333333335, ans=0.0 2023-10-09 16:27:52,439 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2810742.6666666665, ans=0.025 2023-10-09 16:27:52,543 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2810742.6666666665, ans=0.125 2023-10-09 16:28:07,907 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2810789.3333333335, ans=0.09899494936611666 2023-10-09 16:28:08,989 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2810789.3333333335, ans=0.125 2023-10-09 16:28:10,419 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2023-10-09 16:28:12,373 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.113e+02 3.532e+02 4.348e+02 7.721e+02, threshold=7.063e+02, percent-clipped=2.0 2023-10-09 16:28:25,640 INFO [train.py:1031] (0/4) Epoch 14, batch 17600, loss[loss=0.2345, simple_loss=0.2828, pruned_loss=0.07048, ctc_loss=0.1132, over 16577.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2934, pruned_loss=0.06585, ctc_loss=0.1162, over 3299003.19 frames. ], batch size: 110, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:28:48,467 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.75 vs. limit=15.0 2023-10-09 16:28:58,118 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=15.0 2023-10-09 16:29:02,452 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2811022.6666666665, ans=0.125 2023-10-09 16:29:08,793 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2811022.6666666665, ans=0.0 2023-10-09 16:29:25,213 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2811069.3333333335, ans=0.1 2023-10-09 16:29:27,532 INFO [train.py:1031] (0/4) Epoch 14, batch 17650, loss[loss=0.204, simple_loss=0.2837, pruned_loss=0.04611, ctc_loss=0.08005, over 16903.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.293, pruned_loss=0.06489, ctc_loss=0.1141, over 3303996.80 frames. 
], batch size: 215, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:29:29,034 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2811116.0, ans=0.125 2023-10-09 16:29:47,024 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2811162.6666666665, ans=0.2 2023-10-09 16:29:48,050 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:29:50,151 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2811162.6666666665, ans=0.0 2023-10-09 16:29:55,236 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2811209.3333333335, ans=0.2 2023-10-09 16:29:57,188 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-10-09 16:29:57,948 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2811209.3333333335, ans=0.125 2023-10-09 16:29:58,955 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2811209.3333333335, ans=0.05 2023-10-09 16:30:01,005 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2811209.3333333335, ans=0.2 2023-10-09 16:30:17,730 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2023-10-09 16:30:17,961 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.137e+02 2.983e+02 3.277e+02 4.147e+02 6.506e+02, threshold=6.554e+02, percent-clipped=0.0 2023-10-09 16:30:27,421 INFO [scaling.py:979] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. limit=5.0 2023-10-09 16:30:31,456 INFO [train.py:1031] (0/4) Epoch 14, batch 17700, loss[loss=0.2083, simple_loss=0.2672, pruned_loss=0.05583, ctc_loss=0.09454, over 16825.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2946, pruned_loss=0.06176, ctc_loss=0.1098, over 3304589.22 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:30:35,020 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.02 vs. limit=15.0 2023-10-09 16:30:47,009 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2811396.0, ans=0.125 2023-10-09 16:31:00,921 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.93 vs. limit=15.0 2023-10-09 16:31:07,889 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2023-10-09 16:31:35,981 INFO [train.py:1031] (0/4) Epoch 14, batch 17750, loss[loss=0.2611, simple_loss=0.3395, pruned_loss=0.06621, ctc_loss=0.1257, over 16445.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2919, pruned_loss=0.0598, ctc_loss=0.1067, over 3306666.25 frames. 
], batch size: 415, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:31:43,235 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2811582.6666666665, ans=0.0 2023-10-09 16:31:43,291 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2811582.6666666665, ans=0.125 2023-10-09 16:31:53,631 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2811629.3333333335, ans=0.0 2023-10-09 16:32:02,758 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2811676.0, ans=0.1 2023-10-09 16:32:05,925 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2811676.0, ans=0.0 2023-10-09 16:32:26,656 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.329e+02 3.107e+02 3.479e+02 4.054e+02 7.691e+02, threshold=6.958e+02, percent-clipped=4.0 2023-10-09 16:32:28,412 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-10-09 16:32:34,620 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2811769.3333333335, ans=0.0 2023-10-09 16:32:39,786 INFO [train.py:1031] (0/4) Epoch 14, batch 17800, loss[loss=0.179, simple_loss=0.2548, pruned_loss=0.03728, ctc_loss=0.07158, over 16844.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2938, pruned_loss=0.05803, ctc_loss=0.1048, over 3302622.84 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:32:40,170 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2811816.0, ans=0.07 2023-10-09 16:32:59,975 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2811862.6666666665, ans=0.0 2023-10-09 16:33:21,900 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2811956.0, ans=0.125 2023-10-09 16:33:29,296 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2812002.6666666665, ans=0.0 2023-10-09 16:33:41,468 INFO [train.py:1031] (0/4) Epoch 14, batch 17850, loss[loss=0.2114, simple_loss=0.2726, pruned_loss=0.05637, ctc_loss=0.0935, over 16897.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2884, pruned_loss=0.05665, ctc_loss=0.1021, over 3308190.93 frames. ], batch size: 78, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:33:54,135 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2812096.0, ans=0.125 2023-10-09 16:33:54,142 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2812096.0, ans=0.2 2023-10-09 16:33:55,600 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.91 vs. 
limit=15.0 2023-10-09 16:34:01,734 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2812096.0, ans=0.07 2023-10-09 16:34:04,259 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2812096.0, ans=0.125 2023-10-09 16:34:25,795 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=15.0 2023-10-09 16:34:32,669 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.983e+02 3.516e+02 4.147e+02 7.275e+02, threshold=7.033e+02, percent-clipped=1.0 2023-10-09 16:34:43,849 INFO [train.py:1031] (0/4) Epoch 14, batch 17900, loss[loss=0.2202, simple_loss=0.2734, pruned_loss=0.06289, ctc_loss=0.1032, over 16107.00 frames. ], tot_loss[loss=0.2197, simple_loss=0.2818, pruned_loss=0.058, ctc_loss=0.1039, over 3297012.43 frames. ], batch size: 70, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:34:50,966 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2812282.6666666665, ans=0.2 2023-10-09 16:35:43,156 INFO [train.py:1031] (0/4) Epoch 14, batch 17950, loss[loss=0.2075, simple_loss=0.2706, pruned_loss=0.05429, ctc_loss=0.08941, over 16771.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.2811, pruned_loss=0.06039, ctc_loss=0.1073, over 3303077.48 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:35:52,935 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2812516.0, ans=0.5 2023-10-09 16:35:56,140 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.87 vs. limit=22.5 2023-10-09 16:36:02,544 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=2812562.6666666665, ans=0.02 2023-10-09 16:36:37,264 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.051e+02 3.955e+02 4.561e+02 5.519e+02 1.023e+03, threshold=9.123e+02, percent-clipped=10.0 2023-10-09 16:36:39,301 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2812702.6666666665, ans=0.2 2023-10-09 16:36:42,450 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2812702.6666666665, ans=0.2 2023-10-09 16:36:44,353 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2023-10-09 16:36:47,004 INFO [train.py:1031] (0/4) Epoch 14, batch 18000, loss[loss=0.2761, simple_loss=0.3203, pruned_loss=0.08647, ctc_loss=0.1472, over 16554.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2854, pruned_loss=0.06354, ctc_loss=0.1122, over 3308313.61 frames. 
], batch size: 110, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:36:47,005 INFO [train.py:1054] (0/4) Computing validation loss 2023-10-09 16:37:00,427 INFO [zipformer.py:1853] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.4108, 4.5989, 5.1604, 4.8756], device='cuda:0') 2023-10-09 16:37:03,486 INFO [zipformer.py:1853] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0393, 3.2078, 3.2861, 3.3635], device='cuda:0') 2023-10-09 16:37:05,084 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2359, simple_loss=0.3042, pruned_loss=0.06468, ctc_loss=0.09589, over 1796401.00 frames. 2023-10-09 16:37:05,084 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14587MB
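
During this validation pass, zipformer.py:1853 logs one attn_weights_entropy tensor per self-attention module; the four numbers per tensor line up with four-head attention layers, and their scale is what you would expect for distributions over a few hundred key positions (a uniform distribution over 256 keys has entropy log 256 ≈ 5.55 nats, while lower values such as 2.0393 indicate more sharply peaked heads). A sketch of per-head attention-weight entropy; the exact reduction over batch and query positions here is an assumption.

import torch

def attn_weights_entropy(attn, eps=1e-20):
    # attn: (batch, heads, query, key) softmax weights; the weights along
    # the key axis sum to 1 for each query position.
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (batch, heads, query)
    return ent.mean(dim=(0, 2))                     # one entropy per head

attn = torch.softmax(torch.randn(8, 4, 50, 200), dim=-1)
print(attn_weights_entropy(attn))  # near log(200) ~ 5.3 for near-uniform weights
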
2023-10-09 16:37:09,595 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2023-10-09 16:37:16,061 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2812749.3333333335, ans=0.95 2023-10-09 16:37:24,904 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2812796.0, ans=0.125 2023-10-09 16:38:03,147 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:38:10,374 INFO [train.py:1031] (0/4) Epoch 14, batch 18050, loss[loss=0.2101, simple_loss=0.2685, pruned_loss=0.05666, ctc_loss=0.09581, over 16786.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2895, pruned_loss=0.06462, ctc_loss=0.1142, over 3305053.77 frames. ], batch size: 121, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:38:11,542 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2812982.6666666665, ans=0.0 2023-10-09 16:38:15,818 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2812982.6666666665, ans=0.0 2023-10-09 16:38:20,631 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2812982.6666666665, ans=0.125 2023-10-09 16:38:32,830 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2813029.3333333335, ans=0.125 2023-10-09 16:38:56,946 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2813122.6666666665, ans=0.125 2023-10-09 16:38:59,745 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=22.5 2023-10-09 16:39:06,174 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+02 3.447e+02 3.987e+02 5.015e+02 1.069e+03, threshold=7.973e+02, percent-clipped=1.0 2023-10-09 16:39:14,516 INFO [train.py:1031] (0/4) Epoch 14, batch 18100, loss[loss=0.2401, simple_loss=0.3199, pruned_loss=0.0596, ctc_loss=0.1026, over 16778.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2928, pruned_loss=0.06329, ctc_loss=0.1123, over 3294900.82 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:39:31,956 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-10-09 16:40:16,838 INFO [train.py:1031] (0/4) Epoch 14, batch 18150, loss[loss=0.1993, simple_loss=0.2426, pruned_loss=0.05791, ctc_loss=0.1004, over 16722.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2884, pruned_loss=0.06232, ctc_loss=0.1101, over 3299448.43 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 1.0 2023-10-09 16:40:31,699 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2813496.0, ans=0.125 2023-10-09 16:40:39,442 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=22.5 2023-10-09 16:40:44,911 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2813542.6666666665, ans=0.0 2023-10-09 16:40:51,639 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2813542.6666666665, ans=0.125 2023-10-09 16:41:11,038 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0 2023-10-09 16:41:12,478 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+02 3.202e+02 3.701e+02 4.396e+02 8.361e+02, threshold=7.403e+02, percent-clipped=2.0 2023-10-09 16:41:16,340 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2813636.0, ans=0.2 2023-10-09 16:41:18,390 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2813682.6666666665, ans=0.125 2023-10-09 16:41:19,064 INFO [train.py:1031] (0/4) Epoch 14, batch 18200, loss[loss=0.2025, simple_loss=0.2682, pruned_loss=0.04969, ctc_loss=0.09333, over 16789.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2815, pruned_loss=0.06142, ctc_loss=0.108, over 3305108.22 frames. ], batch size: 292, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:41:46,615 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2813776.0, ans=10.0 2023-10-09 16:41:54,286 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2813776.0, ans=0.0 2023-10-09 16:41:56,649 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.94 vs. limit=10.0 2023-10-09 16:41:58,073 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2813822.6666666665, ans=0.1 2023-10-09 16:42:02,809 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2813822.6666666665, ans=0.125 2023-10-09 16:42:12,624 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2813869.3333333335, ans=0.0 2023-10-09 16:42:13,594 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2813869.3333333335, ans=0.125 2023-10-09 16:42:21,165 INFO [train.py:1031] (0/4) Epoch 14, batch 18250, loss[loss=0.1922, simple_loss=0.2546, pruned_loss=0.04804, ctc_loss=0.08465, over 16686.00 frames.
], tot_loss[loss=0.215, simple_loss=0.2744, pruned_loss=0.0575, ctc_loss=0.1016, over 3314722.05 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:42:29,057 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2813916.0, ans=0.0 2023-10-09 16:42:32,872 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2813962.6666666665, ans=0.125 2023-10-09 16:42:49,551 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2814009.3333333335, ans=0.1 2023-10-09 16:42:51,170 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2814009.3333333335, ans=0.0 2023-10-09 16:42:51,213 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2814009.3333333335, ans=0.125 2023-10-09 16:42:56,117 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2814009.3333333335, ans=0.2 2023-10-09 16:43:08,069 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-10-09 16:43:12,238 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2814102.6666666665, ans=0.0 2023-10-09 16:43:13,165 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2814102.6666666665, ans=0.1 2023-10-09 16:43:16,733 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.801e+02 3.276e+02 4.033e+02 6.396e+02, threshold=6.552e+02, percent-clipped=0.0 2023-10-09 16:43:22,463 INFO [train.py:1031] (0/4) Epoch 14, batch 18300, loss[loss=0.1787, simple_loss=0.2179, pruned_loss=0.05377, ctc_loss=0.07999, over 11747.00 frames. ], tot_loss[loss=0.2057, simple_loss=0.2675, pruned_loss=0.05312, ctc_loss=0.09394, over 3316475.94 frames. ], batch size: 41, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:43:37,012 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2814196.0, ans=0.0 2023-10-09 16:43:50,503 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.58 vs. limit=15.0 2023-10-09 16:43:57,997 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2814242.6666666665, ans=0.125 2023-10-09 16:44:00,696 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2814289.3333333335, ans=0.125 2023-10-09 16:44:11,181 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.63 vs. limit=10.0 2023-10-09 16:44:20,406 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2814336.0, ans=0.1 2023-10-09 16:44:25,852 INFO [train.py:1031] (0/4) Epoch 14, batch 18350, loss[loss=0.2742, simple_loss=0.3716, pruned_loss=0.0644, ctc_loss=0.12, over 15041.00 frames. ], tot_loss[loss=0.2107, simple_loss=0.2752, pruned_loss=0.05394, ctc_loss=0.0959, over 3307882.31 frames. 
], batch size: 526, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:44:41,914 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2814429.3333333335, ans=0.1 2023-10-09 16:44:57,376 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2814476.0, ans=0.0 2023-10-09 16:45:09,302 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2814522.6666666665, ans=0.125 2023-10-09 16:45:14,321 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0 2023-10-09 16:45:22,232 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+02 3.059e+02 3.585e+02 4.224e+02 7.359e+02, threshold=7.170e+02, percent-clipped=2.0 2023-10-09 16:45:26,918 INFO [train.py:1031] (0/4) Epoch 14, batch 18400, loss[loss=0.2861, simple_loss=0.3186, pruned_loss=0.0938, ctc_loss=0.1649, over 16596.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2818, pruned_loss=0.05735, ctc_loss=0.1018, over 3305071.43 frames. ], batch size: 416, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:45:28,277 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:45:29,315 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2814616.0, ans=0.125 2023-10-09 16:45:57,247 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2814709.3333333335, ans=0.125 2023-10-09 16:46:04,155 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0 2023-10-09 16:46:09,541 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2814756.0, ans=0.0 2023-10-09 16:46:27,823 INFO [train.py:1031] (0/4) Epoch 14, batch 18450, loss[loss=0.2679, simple_loss=0.3135, pruned_loss=0.08233, ctc_loss=0.1442, over 16838.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2833, pruned_loss=0.06082, ctc_loss=0.1074, over 3313166.72 frames. ], batch size: 328, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:46:40,401 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2814896.0, ans=0.0 2023-10-09 16:46:51,813 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2814942.6666666665, ans=0.5 2023-10-09 16:46:56,610 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2814942.6666666665, ans=0.05 2023-10-09 16:47:09,751 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2023-10-09 16:47:10,980 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.61 vs. 
limit=22.5 2023-10-09 16:47:26,502 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.678e+02 3.308e+02 3.613e+02 4.264e+02 6.985e+02, threshold=7.226e+02, percent-clipped=0.0 2023-10-09 16:47:30,871 INFO [train.py:1031] (0/4) Epoch 14, batch 18500, loss[loss=0.2174, simple_loss=0.272, pruned_loss=0.06043, ctc_loss=0.1049, over 16759.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2848, pruned_loss=0.06279, ctc_loss=0.1105, over 3312696.45 frames. ], batch size: 140, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:47:32,292 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2815082.6666666665, ans=0.125 2023-10-09 16:47:40,074 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2815082.6666666665, ans=0.1 2023-10-09 16:47:43,364 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.61 vs. limit=10.0 2023-10-09 16:47:59,318 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2815176.0, ans=0.125 2023-10-09 16:48:11,831 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0 2023-10-09 16:48:16,385 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2815222.6666666665, ans=0.125 2023-10-09 16:48:23,415 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2023-10-09 16:48:32,664 INFO [train.py:1031] (0/4) Epoch 14, batch 18550, loss[loss=0.2552, simple_loss=0.3135, pruned_loss=0.07424, ctc_loss=0.121, over 16738.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.2897, pruned_loss=0.0654, ctc_loss=0.1147, over 3318874.23 frames. 
], batch size: 151, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:48:38,584 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2815316.0, ans=0.125 2023-10-09 16:48:52,886 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2815362.6666666665, ans=0.125 2023-10-09 16:48:54,672 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2815362.6666666665, ans=0.0 2023-10-09 16:48:55,810 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2815362.6666666665, ans=0.1 2023-10-09 16:49:01,472 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:49:21,381 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2815456.0, ans=0.0 2023-10-09 16:49:30,405 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2815502.6666666665, ans=0.1 2023-10-09 16:49:32,662 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2815502.6666666665, ans=0.0 2023-10-09 16:49:34,435 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.785e+02 3.368e+02 3.936e+02 4.731e+02 1.128e+03, threshold=7.872e+02, percent-clipped=2.0 2023-10-09 16:49:36,562 INFO [train.py:1031] (0/4) Epoch 14, batch 18600, loss[loss=0.3122, simple_loss=0.394, pruned_loss=0.08385, ctc_loss=0.1567, over 16594.00 frames. ], tot_loss[loss=0.2402, simple_loss=0.299, pruned_loss=0.06712, ctc_loss=0.1178, over 3323409.51 frames. ], batch size: 351, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:49:38,501 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2815549.3333333335, ans=0.125 2023-10-09 16:49:48,235 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2815549.3333333335, ans=0.025 2023-10-09 16:50:16,287 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2815689.3333333335, ans=0.2 2023-10-09 16:50:22,250 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2815689.3333333335, ans=0.125 2023-10-09 16:50:24,278 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2023-10-09 16:50:25,656 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2815689.3333333335, ans=0.0 2023-10-09 16:50:37,973 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=15.0 2023-10-09 16:50:41,207 INFO [train.py:1031] (0/4) Epoch 14, batch 18650, loss[loss=0.3235, simple_loss=0.3511, pruned_loss=0.1102, ctc_loss=0.1884, over 16696.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.3051, pruned_loss=0.06875, ctc_loss=0.1209, over 3321013.34 frames. 
], batch size: 384, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:50:48,668 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2815782.6666666665, ans=0.125 2023-10-09 16:50:59,120 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2815829.3333333335, ans=0.1 2023-10-09 16:51:10,601 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2815876.0, ans=0.125 2023-10-09 16:51:41,576 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.792e+02 3.349e+02 3.828e+02 4.485e+02 8.259e+02, threshold=7.655e+02, percent-clipped=2.0 2023-10-09 16:51:43,746 INFO [train.py:1031] (0/4) Epoch 14, batch 18700, loss[loss=0.2367, simple_loss=0.316, pruned_loss=0.05668, ctc_loss=0.1098, over 16797.00 frames. ], tot_loss[loss=0.2447, simple_loss=0.304, pruned_loss=0.06854, ctc_loss=0.1208, over 3315218.19 frames. ], batch size: 328, lr: 2.56e-03, grad_scale: 8.0 2023-10-09 16:51:50,227 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2816016.0, ans=0.125 2023-10-09 16:52:09,659 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2816109.3333333335, ans=0.125 2023-10-09 16:52:16,605 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2816109.3333333335, ans=0.1 2023-10-09 16:52:40,320 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2816202.6666666665, ans=0.125 2023-10-09 16:52:46,841 INFO [train.py:1031] (0/4) Epoch 14, batch 18750, loss[loss=0.2068, simple_loss=0.2817, pruned_loss=0.04914, ctc_loss=0.08408, over 16844.00 frames. ], tot_loss[loss=0.2404, simple_loss=0.3027, pruned_loss=0.06581, ctc_loss=0.1164, over 3314829.17 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:52:58,202 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2816296.0, ans=0.125 2023-10-09 16:53:16,480 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2816342.6666666665, ans=0.125 2023-10-09 16:53:22,817 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=22.5 2023-10-09 16:53:43,685 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2816436.0, ans=0.125 2023-10-09 16:53:48,759 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 2.936e+02 3.595e+02 4.298e+02 1.016e+03, threshold=7.191e+02, percent-clipped=2.0 2023-10-09 16:53:48,785 INFO [train.py:1031] (0/4) Epoch 14, batch 18800, loss[loss=0.1871, simple_loss=0.2803, pruned_loss=0.03412, ctc_loss=0.06393, over 16436.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2947, pruned_loss=0.06174, ctc_loss=0.1095, over 3323606.78 frames. 
], batch size: 466, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:53:56,125 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2816482.6666666665, ans=0.1 2023-10-09 16:54:21,158 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.30 vs. limit=10.0 2023-10-09 16:54:35,070 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2816622.6666666665, ans=0.125 2023-10-09 16:54:48,903 INFO [train.py:1031] (0/4) Epoch 14, batch 18850, loss[loss=0.2769, simple_loss=0.3034, pruned_loss=0.09278, ctc_loss=0.162, over 16847.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2896, pruned_loss=0.0613, ctc_loss=0.1085, over 3323429.71 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:54:55,914 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2816716.0, ans=0.125 2023-10-09 16:54:58,101 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2816716.0, ans=0.125 2023-10-09 16:54:59,230 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2816716.0, ans=0.125 2023-10-09 16:55:06,187 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2816762.6666666665, ans=0.125 2023-10-09 16:55:13,453 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2816809.3333333335, ans=0.125 2023-10-09 16:55:15,604 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2816809.3333333335, ans=0.125 2023-10-09 16:55:30,030 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2816856.0, ans=0.1 2023-10-09 16:55:35,643 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2023-10-09 16:55:44,607 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2816902.6666666665, ans=0.125 2023-10-09 16:55:49,882 INFO [train.py:1031] (0/4) Epoch 14, batch 18900, loss[loss=0.2413, simple_loss=0.2884, pruned_loss=0.07218, ctc_loss=0.1247, over 16771.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.2894, pruned_loss=0.06353, ctc_loss=0.1116, over 3319497.25 frames. 
], batch size: 130, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:55:53,225 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+02 3.158e+02 3.575e+02 4.091e+02 5.831e+02, threshold=7.150e+02, percent-clipped=0.0 2023-10-09 16:55:53,580 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2816949.3333333335, ans=0.125 2023-10-09 16:56:11,136 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2816996.0, ans=0.2 2023-10-09 16:56:24,356 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2817042.6666666665, ans=0.125 2023-10-09 16:56:30,804 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2817089.3333333335, ans=0.125 2023-10-09 16:56:54,183 INFO [train.py:1031] (0/4) Epoch 14, batch 18950, loss[loss=0.2279, simple_loss=0.3007, pruned_loss=0.05851, ctc_loss=0.09543, over 16756.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2902, pruned_loss=0.06454, ctc_loss=0.1133, over 3305521.22 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:57:03,427 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2817182.6666666665, ans=0.1 2023-10-09 16:57:11,088 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2817229.3333333335, ans=0.1 2023-10-09 16:57:20,192 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2817276.0, ans=0.2 2023-10-09 16:57:20,239 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2817276.0, ans=0.125 2023-10-09 16:57:37,131 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2817322.6666666665, ans=0.125 2023-10-09 16:57:55,557 INFO [train.py:1031] (0/4) Epoch 14, batch 19000, loss[loss=0.1848, simple_loss=0.2594, pruned_loss=0.04031, ctc_loss=0.07385, over 16661.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2889, pruned_loss=0.06305, ctc_loss=0.1108, over 3309223.80 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:57:58,302 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+02 3.278e+02 3.626e+02 4.352e+02 8.941e+02, threshold=7.252e+02, percent-clipped=2.0 2023-10-09 16:58:05,790 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2817462.6666666665, ans=0.125 2023-10-09 16:58:23,665 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2817509.3333333335, ans=0.125 2023-10-09 16:58:26,308 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2817509.3333333335, ans=0.125 2023-10-09 16:58:34,457 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2817556.0, ans=0.0 2023-10-09 16:58:44,579 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.29 vs. 
limit=22.5 2023-10-09 16:58:57,889 INFO [train.py:1031] (0/4) Epoch 14, batch 19050, loss[loss=0.2301, simple_loss=0.2878, pruned_loss=0.06453, ctc_loss=0.1081, over 16928.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2877, pruned_loss=0.06392, ctc_loss=0.1119, over 3311888.23 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:59:29,872 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2817742.6666666665, ans=22.5 2023-10-09 16:59:55,089 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0 2023-10-09 17:00:00,788 INFO [train.py:1031] (0/4) Epoch 14, batch 19100, loss[loss=0.2105, simple_loss=0.2633, pruned_loss=0.05873, ctc_loss=0.1005, over 16865.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.2901, pruned_loss=0.06684, ctc_loss=0.117, over 3314681.19 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:00:04,650 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.758e+02 3.450e+02 4.008e+02 4.699e+02 1.096e+03, threshold=8.015e+02, percent-clipped=2.0 2023-10-09 17:00:09,539 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.52 vs. limit=6.0 2023-10-09 17:00:15,700 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2817929.3333333335, ans=0.0 2023-10-09 17:00:32,342 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2817976.0, ans=0.0 2023-10-09 17:00:35,482 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.04 vs. limit=15.0 2023-10-09 17:00:37,369 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2023-10-09 17:00:43,901 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=2818022.6666666665, ans=10.0 2023-10-09 17:01:02,244 INFO [train.py:1031] (0/4) Epoch 14, batch 19150, loss[loss=0.189, simple_loss=0.26, pruned_loss=0.04401, ctc_loss=0.07505, over 16815.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2903, pruned_loss=0.06464, ctc_loss=0.1138, over 3311781.90 frames. ], batch size: 164, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:01:31,699 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2818209.3333333335, ans=0.05 2023-10-09 17:01:49,254 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.32 vs. limit=10.0 2023-10-09 17:02:06,623 INFO [train.py:1031] (0/4) Epoch 14, batch 19200, loss[loss=0.1779, simple_loss=0.2428, pruned_loss=0.04094, ctc_loss=0.07794, over 16778.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.2891, pruned_loss=0.06147, ctc_loss=0.1088, over 3311182.63 frames. 
], batch size: 164, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:02:11,653 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2818349.3333333335, ans=0.125 2023-10-09 17:02:12,383 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+02 3.095e+02 3.707e+02 4.645e+02 1.379e+03, threshold=7.414e+02, percent-clipped=4.0 2023-10-09 17:02:13,761 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2818349.3333333335, ans=0.0 2023-10-09 17:02:22,413 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2818396.0, ans=0.125 2023-10-09 17:02:32,383 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0 2023-10-09 17:02:33,431 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2023-10-09 17:02:34,464 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=22.5 2023-10-09 17:02:47,183 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2818489.3333333335, ans=0.125 2023-10-09 17:03:03,579 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:03:09,813 INFO [train.py:1031] (0/4) Epoch 14, batch 19250, loss[loss=0.1446, simple_loss=0.1894, pruned_loss=0.03682, ctc_loss=0.0655, over 12752.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.29, pruned_loss=0.06135, ctc_loss=0.1095, over 3310018.11 frames. ], batch size: 44, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:03:15,088 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2818582.6666666665, ans=0.0 2023-10-09 17:03:15,599 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.72 vs. limit=15.0 2023-10-09 17:03:18,881 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2818582.6666666665, ans=0.125 2023-10-09 17:03:19,265 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=22.5 2023-10-09 17:03:31,082 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-w-ctc/checkpoint-604000.pt 2023-10-09 17:04:01,685 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=22.5 2023-10-09 17:04:02,875 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.92 vs. 
limit=15.0 2023-10-09 17:04:03,512 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2818769.3333333335, ans=0.2 2023-10-09 17:04:04,399 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2818769.3333333335, ans=0.0 2023-10-09 17:04:15,645 INFO [train.py:1031] (0/4) Epoch 14, batch 19300, loss[loss=0.2306, simple_loss=0.285, pruned_loss=0.06577, ctc_loss=0.1116, over 16835.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2896, pruned_loss=0.06175, ctc_loss=0.11, over 3312774.65 frames. ], batch size: 121, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:04:16,747 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2818816.0, ans=0.04949747468305833 2023-10-09 17:04:18,208 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.11 vs. limit=10.0 2023-10-09 17:04:24,136 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.286e+02 3.972e+02 4.950e+02 6.905e+02, threshold=7.944e+02, percent-clipped=0.0 2023-10-09 17:04:33,310 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2818862.6666666665, ans=0.0 2023-10-09 17:04:36,049 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2818862.6666666665, ans=0.125 2023-10-09 17:04:36,343 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.15 vs. limit=10.0 2023-10-09 17:04:45,741 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2818909.3333333335, ans=0.125 2023-10-09 17:04:55,766 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2818956.0, ans=0.125 2023-10-09 17:04:57,827 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2818956.0, ans=0.125 2023-10-09 17:05:05,271 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2819002.6666666665, ans=0.125 2023-10-09 17:05:14,098 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2819002.6666666665, ans=0.125 2023-10-09 17:05:18,463 INFO [train.py:1031] (0/4) Epoch 14, batch 19350, loss[loss=0.1448, simple_loss=0.2046, pruned_loss=0.03172, ctc_loss=0.05405, over 16851.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2884, pruned_loss=0.06163, ctc_loss=0.1097, over 3317158.94 frames. 
], batch size: 121, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:05:22,595 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:05:29,172 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2819049.3333333335, ans=0.125 2023-10-09 17:05:39,801 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2819096.0, ans=0.0 2023-10-09 17:05:50,419 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2819142.6666666665, ans=0.125 2023-10-09 17:06:10,653 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2819236.0, ans=0.125 2023-10-09 17:06:10,777 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=22.5 2023-10-09 17:06:18,199 INFO [train.py:1031] (0/4) Epoch 14, batch 19400, loss[loss=0.2524, simple_loss=0.2951, pruned_loss=0.0774, ctc_loss=0.1374, over 16581.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2818, pruned_loss=0.05858, ctc_loss=0.1046, over 3304484.61 frames. ], batch size: 416, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:06:25,712 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.971e+02 3.609e+02 4.450e+02 6.456e+02, threshold=7.218e+02, percent-clipped=0.0 2023-10-09 17:06:32,583 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2819329.3333333335, ans=0.0 2023-10-09 17:07:01,949 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2819422.6666666665, ans=0.2 2023-10-09 17:07:16,803 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.60 vs. limit=6.0 2023-10-09 17:07:19,287 INFO [train.py:1031] (0/4) Epoch 14, batch 19450, loss[loss=0.2547, simple_loss=0.3014, pruned_loss=0.07943, ctc_loss=0.1229, over 16832.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2811, pruned_loss=0.06018, ctc_loss=0.1067, over 3303763.75 frames. ], batch size: 121, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:07:49,079 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819609.3333333335, ans=0.1 2023-10-09 17:07:56,025 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819656.0, ans=0.1 2023-10-09 17:08:04,690 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2819656.0, ans=0.0 2023-10-09 17:08:17,212 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2819702.6666666665, ans=0.025 2023-10-09 17:08:21,519 INFO [train.py:1031] (0/4) Epoch 14, batch 19500, loss[loss=0.2621, simple_loss=0.3146, pruned_loss=0.07758, ctc_loss=0.1361, over 16536.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2837, pruned_loss=0.05993, ctc_loss=0.1065, over 3311608.79 frames. 
2023-10-09 17:08:24,032 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2819749.3333333335, ans=0.0
2023-10-09 17:08:31,246 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 3.015e+02 3.593e+02 4.173e+02 8.054e+02, threshold=7.186e+02, percent-clipped=2.0
2023-10-09 17:08:43,729 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2819796.0, ans=15.0
2023-10-09 17:09:08,707 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0
2023-10-09 17:09:21,218 INFO [train.py:1031] (0/4) Epoch 14, batch 19550, loss[loss=0.2456, simple_loss=0.3076, pruned_loss=0.06924, ctc_loss=0.113, over 16796.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2878, pruned_loss=0.06287, ctc_loss=0.1112, over 3312758.38 frames. ], batch size: 95, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:09:23,694 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2819982.6666666665, ans=0.125
2023-10-09 17:09:33,492 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0
2023-10-09 17:09:34,573 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0
2023-10-09 17:09:36,358 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2820029.3333333335, ans=0.2
2023-10-09 17:09:48,630 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0
2023-10-09 17:10:00,401 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2820122.6666666665, ans=0.0
2023-10-09 17:10:03,828 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2820122.6666666665, ans=0.0
2023-10-09 17:10:24,124 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2820216.0, ans=0.07
2023-10-09 17:10:24,801 INFO [train.py:1031] (0/4) Epoch 14, batch 19600, loss[loss=0.1879, simple_loss=0.2698, pruned_loss=0.03793, ctc_loss=0.07543, over 16236.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2835, pruned_loss=0.06226, ctc_loss=0.11, over 3305881.66 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:10:32,189 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5
2023-10-09 17:10:35,226 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.075e+02 3.430e+02 4.007e+02 6.363e+02, threshold=6.860e+02, percent-clipped=0.0
2023-10-09 17:10:41,780 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2820262.6666666665, ans=0.2
2023-10-09 17:10:43,334 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=2820262.6666666665, ans=0.025
2023-10-09 17:10:44,662 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.92 vs. limit=15.0
2023-10-09 17:10:52,295 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=22.5
2023-10-09 17:11:28,189 INFO [train.py:1031] (0/4) Epoch 14, batch 19650, loss[loss=0.2591, simple_loss=0.3052, pruned_loss=0.08036, ctc_loss=0.131, over 16700.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2839, pruned_loss=0.06328, ctc_loss=0.1114, over 3302962.51 frames. ], batch size: 130, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:11:34,373 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2820449.3333333335, ans=0.1
2023-10-09 17:12:27,739 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2820636.0, ans=0.125
2023-10-09 17:12:30,816 INFO [train.py:1031] (0/4) Epoch 14, batch 19700, loss[loss=0.2294, simple_loss=0.2824, pruned_loss=0.066, ctc_loss=0.1109, over 16779.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2853, pruned_loss=0.06549, ctc_loss=0.1142, over 3301032.06 frames. ], batch size: 95, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:12:31,450 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=15.0
2023-10-09 17:12:32,229 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2820682.6666666665, ans=0.125
2023-10-09 17:12:42,622 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.670e+02 3.372e+02 3.843e+02 4.494e+02 9.285e+02, threshold=7.687e+02, percent-clipped=3.0
2023-10-09 17:13:03,612 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2820776.0, ans=0.125
2023-10-09 17:13:18,377 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:13:31,668 INFO [train.py:1031] (0/4) Epoch 14, batch 19750, loss[loss=0.2644, simple_loss=0.3326, pruned_loss=0.07171, ctc_loss=0.1319, over 16517.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2869, pruned_loss=0.06427, ctc_loss=0.1128, over 3299902.75 frames. ], batch size: 416, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:13:36,878 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=15.0
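
Editor's sketch: the optim.py:471 lines summarize gradient-norm statistics over a recent window. The five numbers after "grad-norm quartiles" read as min / 25% / median / 75% / max, and in every entry in this log the clipping threshold equals Clipping_scale times the median (e.g. 2.0 x 3.972e+02 = 7.944e+02); percent-clipped gives the share of recent batches whose norm exceeded the threshold. The class below is a hedged sketch of that bookkeeping, not the actual ScaledAdam implementation; the names are invented.

    # Hedged sketch: derive a clipping threshold as clip_scale * median of
    # recent gradient norms and track how often clipping fires.
    from collections import deque

    class GradNormClipper:
        def __init__(self, clip_scale=2.0, window=128):
            self.clip_scale = clip_scale
            self.history = deque(maxlen=window)
            self.num_clipped = 0
            self.num_seen = 0

        def threshold(self) -> float:
            q = sorted(self.history)
            return self.clip_scale * q[len(q) // 2]  # scale * median

        def update(self, grad_norm: float) -> float:
            """Returns the factor (<= 1.0) to multiply gradients by."""
            self.history.append(grad_norm)
            self.num_seen += 1
            t = self.threshold()
            if grad_norm > t:
                self.num_clipped += 1
                return t / grad_norm
            return 1.0

        def percent_clipped(self) -> float:
            return 100.0 * self.num_clipped / max(1, self.num_seen)
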
2023-10-09 17:13:39,626 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2820916.0, ans=0.035
2023-10-09 17:13:43,499 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2820962.6666666665, ans=0.1
2023-10-09 17:13:52,379 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2820962.6666666665, ans=0.125
2023-10-09 17:14:00,940 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0
2023-10-09 17:14:20,464 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.33 vs. limit=15.0
2023-10-09 17:14:32,785 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.74 vs. limit=12.0
2023-10-09 17:14:32,916 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2821102.6666666665, ans=10.0
2023-10-09 17:14:34,793 INFO [train.py:1031] (0/4) Epoch 14, batch 19800, loss[loss=0.2413, simple_loss=0.296, pruned_loss=0.06941, ctc_loss=0.1196, over 16747.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.292, pruned_loss=0.06579, ctc_loss=0.1156, over 3308910.26 frames. ], batch size: 111, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:14:39,113 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2821149.3333333335, ans=0.1
2023-10-09 17:14:39,138 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2821149.3333333335, ans=0.0
2023-10-09 17:14:46,188 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2821196.0, ans=0.0
2023-10-09 17:14:47,455 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.293e+02 3.299e+02 3.756e+02 4.593e+02 7.524e+02, threshold=7.512e+02, percent-clipped=0.0
2023-10-09 17:14:50,519 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2821196.0, ans=0.0
2023-10-09 17:14:54,425 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2821196.0, ans=0.125
2023-10-09 17:15:17,424 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2821289.3333333335, ans=0.125
2023-10-09 17:15:38,927 INFO [train.py:1031] (0/4) Epoch 14, batch 19850, loss[loss=0.2948, simple_loss=0.3315, pruned_loss=0.09439, ctc_loss=0.1733, over 16543.00 frames. ], tot_loss[loss=0.2412, simple_loss=0.2967, pruned_loss=0.06881, ctc_loss=0.1204, over 3306955.19 frames. ], batch size: 416, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:16:00,789 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2821429.3333333335, ans=0.125
2023-10-09 17:16:04,565 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2821476.0, ans=0.0
2023-10-09 17:16:07,706 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.14 vs. limit=10.0
2023-10-09 17:16:33,063 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2821569.3333333335, ans=0.015
2023-10-09 17:16:37,252 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0
2023-10-09 17:16:39,918 INFO [train.py:1031] (0/4) Epoch 14, batch 19900, loss[loss=0.2468, simple_loss=0.3053, pruned_loss=0.06862, ctc_loss=0.1278, over 16888.00 frames. ], tot_loss[loss=0.2428, simple_loss=0.2977, pruned_loss=0.06968, ctc_loss=0.1214, over 3315419.40 frames. ], batch size: 309, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:16:54,365 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.744e+02 3.692e+02 4.204e+02 4.980e+02 8.655e+02, threshold=8.408e+02, percent-clipped=2.0
2023-10-09 17:17:21,387 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2821756.0, ans=0.125
2023-10-09 17:17:35,144 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2821802.6666666665, ans=0.125
2023-10-09 17:17:41,865 INFO [train.py:1031] (0/4) Epoch 14, batch 19950, loss[loss=0.3055, simple_loss=0.3305, pruned_loss=0.103, ctc_loss=0.1863, over 16765.00 frames. ], tot_loss[loss=0.2432, simple_loss=0.2963, pruned_loss=0.07045, ctc_loss=0.1229, over 3314988.41 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:17:55,092 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2821896.0, ans=0.125
2023-10-09 17:17:58,466 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2821896.0, ans=0.5
2023-10-09 17:18:27,199 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.98 vs. limit=12.0
2023-10-09 17:18:33,205 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2822036.0, ans=0.125
2023-10-09 17:18:42,975 INFO [train.py:1031] (0/4) Epoch 14, batch 20000, loss[loss=0.214, simple_loss=0.2775, pruned_loss=0.05547, ctc_loss=0.09875, over 16854.00 frames. ], tot_loss[loss=0.246, simple_loss=0.298, pruned_loss=0.07189, ctc_loss=0.1254, over 3303800.64 frames. ], batch size: 242, lr: 2.56e-03, grad_scale: 4.0
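
Editor's sketch: the scaling.py:979 "Whitening" lines compare a measured statistic of a module's output covariance against a limit (metric=... vs. limit=...). A plausible reading, sketched below, is a measure of how anisotropic the per-group feature covariance is: 1.0 for perfectly "white" (isotropic) features, growing as variance concentrates in a few directions, with the limit presumably the point past which a corrective penalty pushes back. The exact formula icefall uses may differ; this is illustrative only.

    # Hedged sketch of an anisotropy ("whitening") metric over grouped
    # channels: ratio of mean squared covariance eigenvalue to squared mean
    # eigenvalue. Equals 1.0 for isotropic features; larger is less "white".
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        num_frames, num_channels = x.shape
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x - x.mean(dim=0, keepdim=True)
        total = 0.0
        for g in range(num_groups):
            cov = x[:, g, :].T @ x[:, g, :] / num_frames
            eigs = torch.linalg.eigvalsh(cov)
            total += ((eigs ** 2).mean() / eigs.mean().clamp(min=1e-20) ** 2).item()
        return total / num_groups

On this reading, an entry such as "metric=6.14 vs. limit=10.0" records a measured value still below its configured limit.
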
2023-10-09 17:18:52,643 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2822082.6666666665, ans=0.0
2023-10-09 17:18:58,233 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.512e+02 3.390e+02 3.727e+02 4.517e+02 8.491e+02, threshold=7.454e+02, percent-clipped=1.0
2023-10-09 17:19:14,506 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2822176.0, ans=0.125
2023-10-09 17:19:23,298 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2822222.6666666665, ans=10.0
2023-10-09 17:19:34,965 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=12.0
2023-10-09 17:19:46,372 INFO [train.py:1031] (0/4) Epoch 14, batch 20050, loss[loss=0.1712, simple_loss=0.2199, pruned_loss=0.04621, ctc_loss=0.07527, over 16694.00 frames. ], tot_loss[loss=0.2399, simple_loss=0.2918, pruned_loss=0.06973, ctc_loss=0.1213, over 3305682.68 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:20:23,951 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2822456.0, ans=0.09899494936611666
2023-10-09 17:20:25,378 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2822456.0, ans=15.0
2023-10-09 17:20:25,930 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2822456.0, ans=0.1
2023-10-09 17:20:35,161 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0
2023-10-09 17:20:50,089 INFO [train.py:1031] (0/4) Epoch 14, batch 20100, loss[loss=0.248, simple_loss=0.3025, pruned_loss=0.07158, ctc_loss=0.126, over 16691.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2831, pruned_loss=0.06669, ctc_loss=0.1153, over 3296223.14 frames. ], batch size: 271, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:21:03,608 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=15.0
2023-10-09 17:21:07,168 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2822596.0, ans=0.125
2023-10-09 17:21:07,906 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+02 3.352e+02 3.979e+02 4.568e+02 7.750e+02, threshold=7.958e+02, percent-clipped=1.0
2023-10-09 17:21:33,576 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.62 vs. limit=10.0
2023-10-09 17:21:41,391 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.70 vs. limit=15.0
2023-10-09 17:21:54,779 INFO [train.py:1031] (0/4) Epoch 14, batch 20150, loss[loss=0.2718, simple_loss=0.342, pruned_loss=0.07325, ctc_loss=0.1378, over 16914.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.289, pruned_loss=0.06585, ctc_loss=0.1148, over 3300478.62 frames. ], batch size: 292, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:21:55,629 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0
2023-10-09 17:21:59,566 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2822782.6666666665, ans=0.0
2023-10-09 17:22:13,652 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.71 vs. limit=15.0
2023-10-09 17:22:19,630 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2822876.0, ans=0.1
2023-10-09 17:22:55,974 INFO [train.py:1031] (0/4) Epoch 14, batch 20200, loss[loss=0.244, simple_loss=0.2967, pruned_loss=0.07014, ctc_loss=0.1277, over 16987.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.2941, pruned_loss=0.06609, ctc_loss=0.1158, over 3308358.44 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:22:58,492 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2823016.0, ans=0.0
2023-10-09 17:23:12,688 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+02 3.410e+02 4.005e+02 4.580e+02 8.040e+02, threshold=8.011e+02, percent-clipped=1.0
2023-10-09 17:23:21,551 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2823109.3333333335, ans=0.125
2023-10-09 17:23:30,531 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2823109.3333333335, ans=0.0
2023-10-09 17:23:42,218 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2823156.0, ans=0.0
2023-10-09 17:23:55,824 INFO [train.py:1031] (0/4) Epoch 14, batch 20250, loss[loss=0.2164, simple_loss=0.2926, pruned_loss=0.05186, ctc_loss=0.09133, over 16871.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.2914, pruned_loss=0.06547, ctc_loss=0.115, over 3313076.29 frames. ], batch size: 242, lr: 2.55e-03, grad_scale: 1.0
2023-10-09 17:24:57,439 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2823482.6666666665, ans=0.125
2023-10-09 17:24:58,235 INFO [train.py:1031] (0/4) Epoch 14, batch 20300, loss[loss=0.1629, simple_loss=0.2129, pruned_loss=0.04376, ctc_loss=0.06353, over 11865.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2859, pruned_loss=0.06213, ctc_loss=0.1094, over 3305071.47 frames. ], batch size: 37, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:25:18,641 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.404e+02 3.144e+02 3.729e+02 4.448e+02 8.440e+02, threshold=7.458e+02, percent-clipped=1.0
2023-10-09 17:25:26,571 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2823576.0, ans=0.2
2023-10-09 17:25:40,044 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2823622.6666666665, ans=0.125
2023-10-09 17:25:42,542 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2823622.6666666665, ans=0.0
2023-10-09 17:25:56,053 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2823669.3333333335, ans=0.125
2023-10-09 17:26:00,521 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=15.0
2023-10-09 17:26:00,741 INFO [train.py:1031] (0/4) Epoch 14, batch 20350, loss[loss=0.2103, simple_loss=0.2631, pruned_loss=0.05835, ctc_loss=0.1021, over 16914.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.2792, pruned_loss=0.06127, ctc_loss=0.1077, over 3308605.77 frames. ], batch size: 82, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:26:16,068 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0
2023-10-09 17:26:17,920 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2823762.6666666665, ans=0.1
2023-10-09 17:26:24,062 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2823809.3333333335, ans=0.125
2023-10-09 17:26:28,322 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2823809.3333333335, ans=0.0
2023-10-09 17:26:36,812 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0
2023-10-09 17:26:37,887 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.39 vs. limit=10.0
2023-10-09 17:27:02,818 INFO [train.py:1031] (0/4) Epoch 14, batch 20400, loss[loss=0.2764, simple_loss=0.3268, pruned_loss=0.08514, ctc_loss=0.1391, over 16575.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.2783, pruned_loss=0.0619, ctc_loss=0.1075, over 3299936.88 frames. ], batch size: 351, lr: 2.55e-03, grad_scale: 4.0
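
Editor's note: across all of the train.py:1031 lines in this log, the leading loss is consistent with a fixed weighted sum of the three reported components, loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss. For instance, at batch 19300, 0.5 * 0.285 + 0.06577 + 0.2 * 0.1116 = 0.2306, and the running tot_loss obeys the same identity. The helper below just restates that arithmetic; the 0.5 and 0.2 weights are inferred from the log numbers, not read out of the training code.

    # The printed losses satisfy loss = 0.5*simple + pruned + 0.2*ctc; the
    # weights here are inferred from the numbers in this log.
    def combined_loss(simple_loss, pruned_loss, ctc_loss,
                      simple_scale=0.5, ctc_scale=0.2):
        return simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss

    # batch 19300: 0.5*0.285 + 0.06577 + 0.2*0.1116 = 0.2306
    assert abs(combined_loss(0.285, 0.06577, 0.1116) - 0.2306) < 5e-4
    # tot_loss at batch 19300:
    assert abs(combined_loss(0.2896, 0.06175, 0.11) - 0.2286) < 5e-4
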
2023-10-09 17:27:23,293 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.551e+02 3.333e+02 4.109e+02 4.919e+02 1.143e+03, threshold=8.217e+02, percent-clipped=3.0
2023-10-09 17:27:40,929 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2824089.3333333335, ans=0.125
2023-10-09 17:27:43,033 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2824089.3333333335, ans=0.1
2023-10-09 17:27:50,488 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2824089.3333333335, ans=0.125
2023-10-09 17:27:50,491 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2824089.3333333335, ans=0.1
2023-10-09 17:28:05,983 INFO [train.py:1031] (0/4) Epoch 14, batch 20450, loss[loss=0.1835, simple_loss=0.2374, pruned_loss=0.04762, ctc_loss=0.08589, over 16862.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2764, pruned_loss=0.06157, ctc_loss=0.1059, over 3288758.83 frames. ], batch size: 164, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:28:23,552 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=12.0
2023-10-09 17:28:25,689 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=15.0
2023-10-09 17:28:29,501 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2824229.3333333335, ans=0.1
2023-10-09 17:28:39,208 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2824276.0, ans=0.125
2023-10-09 17:28:55,487 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:28:57,605 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.50 vs. limit=15.0
2023-10-09 17:28:59,333 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2824369.3333333335, ans=0.2
2023-10-09 17:29:01,610 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2824369.3333333335, ans=0.125
2023-10-09 17:29:11,359 INFO [train.py:1031] (0/4) Epoch 14, batch 20500, loss[loss=0.2217, simple_loss=0.3361, pruned_loss=0.03832, ctc_loss=0.07681, over 16246.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2784, pruned_loss=0.05964, ctc_loss=0.103, over 3291605.31 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:29:27,203 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2824462.6666666665, ans=0.1
2023-10-09 17:29:32,927 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+02 3.206e+02 4.100e+02 5.464e+02 8.452e+02, threshold=8.200e+02, percent-clipped=1.0
2023-10-09 17:29:34,628 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.33 vs. limit=10.0
2023-10-09 17:29:37,966 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2824509.3333333335, ans=0.125
2023-10-09 17:29:38,244 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=15.0
2023-10-09 17:29:40,169 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2824509.3333333335, ans=0.2
2023-10-09 17:29:41,660 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0
2023-10-09 17:30:04,529 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=12.0
2023-10-09 17:30:15,068 INFO [train.py:1031] (0/4) Epoch 14, batch 20550, loss[loss=0.2141, simple_loss=0.2918, pruned_loss=0.04957, ctc_loss=0.09306, over 16777.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2905, pruned_loss=0.06071, ctc_loss=0.1062, over 3291123.38 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:30:15,443 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2824649.3333333335, ans=0.125
2023-10-09 17:30:15,819 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.55 vs. limit=12.0
2023-10-09 17:30:16,016 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=12.0
2023-10-09 17:30:16,602 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2824649.3333333335, ans=0.0
2023-10-09 17:30:27,346 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2824696.0, ans=0.1
2023-10-09 17:30:36,406 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2824696.0, ans=0.125
2023-10-09 17:30:40,219 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2824742.6666666665, ans=0.1
2023-10-09 17:31:01,918 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2824789.3333333335, ans=0.125
2023-10-09 17:31:11,412 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2824836.0, ans=0.125
2023-10-09 17:31:17,511 INFO [train.py:1031] (0/4) Epoch 14, batch 20600, loss[loss=0.2787, simple_loss=0.328, pruned_loss=0.08473, ctc_loss=0.1496, over 16718.00 frames. ], tot_loss[loss=0.2298, simple_loss=0.2942, pruned_loss=0.06118, ctc_loss=0.1076, over 3301613.00 frames. ], batch size: 271, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:31:24,700 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:31:24,945 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.92 vs. limit=10.0
2023-10-09 17:31:38,526 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2824929.3333333335, ans=0.125
2023-10-09 17:31:40,864 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.711e+02 4.411e+02 5.380e+02 7.131e+02, threshold=8.823e+02, percent-clipped=0.0
2023-10-09 17:31:42,330 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2824976.0, ans=0.0
2023-10-09 17:31:43,458 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2824976.0, ans=0.125
2023-10-09 17:31:49,427 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2824976.0, ans=0.05
2023-10-09 17:32:04,170 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2825022.6666666665, ans=0.1
2023-10-09 17:32:13,428 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2825069.3333333335, ans=0.05
2023-10-09 17:32:20,106 INFO [train.py:1031] (0/4) Epoch 14, batch 20650, loss[loss=0.3517, simple_loss=0.3668, pruned_loss=0.1237, ctc_loss=0.2229, over 16685.00 frames. ], tot_loss[loss=0.2379, simple_loss=0.2992, pruned_loss=0.06535, ctc_loss=0.1146, over 3302363.87 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:32:22,197 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2825116.0, ans=0.125
2023-10-09 17:32:26,016 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2825116.0, ans=0.0
2023-10-09 17:32:30,231 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2825116.0, ans=0.125
2023-10-09 17:32:30,492 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=22.5
2023-10-09 17:32:32,039 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2825162.6666666665, ans=0.125
2023-10-09 17:32:41,423 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=15.0
2023-10-09 17:32:42,511 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2825162.6666666665, ans=0.125
2023-10-09 17:32:54,896 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:33:03,373 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2825256.0, ans=0.0
2023-10-09 17:33:21,886 INFO [train.py:1031] (0/4) Epoch 14, batch 20700, loss[loss=0.2443, simple_loss=0.2916, pruned_loss=0.07395, ctc_loss=0.1227, over 16764.00 frames. ], tot_loss[loss=0.2384, simple_loss=0.2972, pruned_loss=0.06647, ctc_loss=0.1164, over 3307580.07 frames. ], batch size: 271, lr: 2.55e-03, grad_scale: 8.0
2023-10-09 17:33:30,812 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2825349.3333333335, ans=0.0
2023-10-09 17:33:45,255 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+02 3.306e+02 3.691e+02 4.281e+02 9.878e+02, threshold=7.382e+02, percent-clipped=2.0
2023-10-09 17:33:45,625 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2825442.6666666665, ans=0.0
2023-10-09 17:33:54,270 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2825442.6666666665, ans=0.0
2023-10-09 17:34:06,977 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2825489.3333333335, ans=0.125
2023-10-09 17:34:07,047 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2825489.3333333335, ans=0.1
2023-10-09 17:34:08,660 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2825489.3333333335, ans=0.125
2023-10-09 17:34:08,707 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2825489.3333333335, ans=0.125
2023-10-09 17:34:15,581 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2825536.0, ans=0.0
2023-10-09 17:34:22,950 INFO [train.py:1031] (0/4) Epoch 14, batch 20750, loss[loss=0.2777, simple_loss=0.3183, pruned_loss=0.08742, ctc_loss=0.156, over 16857.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2952, pruned_loss=0.06742, ctc_loss=0.1181, over 3313789.07 frames. ], batch size: 328, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:34:33,981 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2825629.3333333335, ans=0.125
2023-10-09 17:34:54,474 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2825676.0, ans=0.125
2023-10-09 17:35:02,542 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2825722.6666666665, ans=0.125
2023-10-09 17:35:18,143 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2825769.3333333335, ans=0.0
2023-10-09 17:35:23,262 INFO [train.py:1031] (0/4) Epoch 14, batch 20800, loss[loss=0.1971, simple_loss=0.2655, pruned_loss=0.04665, ctc_loss=0.0886, over 16893.00 frames. ], tot_loss[loss=0.2407, simple_loss=0.2968, pruned_loss=0.06823, ctc_loss=0.1205, over 3318578.62 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:35:34,143 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.33 vs. limit=15.0
2023-10-09 17:35:45,599 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2825909.3333333335, ans=0.125
2023-10-09 17:35:46,255 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.527e+02 3.235e+02 3.640e+02 4.210e+02 8.474e+02, threshold=7.280e+02, percent-clipped=1.0
2023-10-09 17:35:54,154 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2825909.3333333335, ans=0.0
2023-10-09 17:35:58,036 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2825956.0, ans=0.0
2023-10-09 17:36:02,106 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2825956.0, ans=0.125
2023-10-09 17:36:03,700 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2825956.0, ans=0.125
2023-10-09 17:36:05,802 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2825956.0, ans=0.2
2023-10-09 17:36:22,181 INFO [train.py:1031] (0/4) Epoch 14, batch 20850, loss[loss=0.201, simple_loss=0.2625, pruned_loss=0.05154, ctc_loss=0.09121, over 16795.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.291, pruned_loss=0.06397, ctc_loss=0.1139, over 3322860.88 frames. ], batch size: 121, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:36:50,523 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2826142.6666666665, ans=0.125
2023-10-09 17:36:56,278 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2826142.6666666665, ans=15.0
2023-10-09 17:37:06,729 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2826189.3333333335, ans=0.1
2023-10-09 17:37:11,350 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2826236.0, ans=0.125
2023-10-09 17:37:14,066 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2826236.0, ans=0.125
2023-10-09 17:37:22,230 INFO [train.py:1031] (0/4) Epoch 14, batch 20900, loss[loss=0.1909, simple_loss=0.2471, pruned_loss=0.0494, ctc_loss=0.08991, over 16801.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2868, pruned_loss=0.06135, ctc_loss=0.1095, over 3321733.22 frames. ], batch size: 164, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:37:48,711 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.791e+02 3.163e+02 3.693e+02 7.251e+02, threshold=6.327e+02, percent-clipped=0.0
2023-10-09 17:38:18,435 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2826469.3333333335, ans=0.125
2023-10-09 17:38:22,288 INFO [train.py:1031] (0/4) Epoch 14, batch 20950, loss[loss=0.2479, simple_loss=0.2675, pruned_loss=0.08496, ctc_loss=0.1459, over 16529.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2798, pruned_loss=0.06103, ctc_loss=0.1082, over 3320903.92 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 2.0
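
Editor's sketch: the grad_scale value in each batch line (1.0, 2.0, 4.0, 8.0, ...) moves in powers of two, which is characteristic of dynamic fp16 loss scaling. Assuming it is torch.cuda.amp-style scaling (an assumption; this log does not show the scaler itself), the mechanics look like this:

    # Hedged sketch of dynamic loss scaling with torch.cuda.amp.GradScaler,
    # which the power-of-two grad_scale values in the batch lines resemble.
    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()  # gradients are scaled by grad_scale
        scaler.step(optimizer)         # skips the step on inf/nan gradients
        scaler.update()                # halves the scale on overflow, grows it after stable steps
        return scaler.get_scale()      # the value reported as grad_scale
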
2023-10-09 17:38:23,628 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2826516.0, ans=0.125
2023-10-09 17:38:32,310 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2826516.0, ans=0.2
2023-10-09 17:38:40,580 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2826562.6666666665, ans=0.125
2023-10-09 17:38:40,972 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0
2023-10-09 17:38:53,483 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0
2023-10-09 17:39:01,845 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2826656.0, ans=0.0
2023-10-09 17:39:01,920 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2826656.0, ans=0.125
2023-10-09 17:39:23,275 INFO [train.py:1031] (0/4) Epoch 14, batch 21000, loss[loss=0.2769, simple_loss=0.3149, pruned_loss=0.08811, ctc_loss=0.1565, over 16680.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2806, pruned_loss=0.0634, ctc_loss=0.1119, over 3321917.52 frames. ], batch size: 353, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 17:39:23,276 INFO [train.py:1054] (0/4) Computing validation loss
2023-10-09 17:39:38,689 INFO [zipformer.py:1853] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3028, 3.6221, 3.7341, 3.8422], device='cuda:0')
2023-10-09 17:39:41,355 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2348, simple_loss=0.3049, pruned_loss=0.06333, ctc_loss=0.09533, over 1796401.00 frames.
2023-10-09 17:39:41,356 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14587MB
2023-10-09 17:39:49,119 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2826749.3333333335, ans=0.0
2023-10-09 17:40:03,085 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2826842.6666666665, ans=0.0
2023-10-09 17:40:06,245 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2826842.6666666665, ans=0.125
2023-10-09 17:40:07,007 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.638e+02 3.238e+02 3.624e+02 4.210e+02 7.239e+02, threshold=7.249e+02, percent-clipped=3.0
2023-10-09 17:40:11,202 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2826842.6666666665, ans=0.1
2023-10-09 17:40:19,123 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2826889.3333333335, ans=0.0
2023-10-09 17:40:25,421 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2826936.0, ans=0.0
2023-10-09 17:40:30,338 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2826936.0, ans=0.125
2023-10-09 17:40:32,379 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2826936.0, ans=0.125
2023-10-09 17:40:39,034 INFO [train.py:1031] (0/4) Epoch 14, batch 21050, loss[loss=0.2142, simple_loss=0.2869, pruned_loss=0.0543, ctc_loss=0.08211, over 16395.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2854, pruned_loss=0.06321, ctc_loss=0.1114, over 3316112.38 frames. ], batch size: 70, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 17:41:16,393 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2827122.6666666665, ans=0.125
2023-10-09 17:41:26,137 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2827169.3333333335, ans=0.125
2023-10-09 17:41:36,264 INFO [train.py:1031] (0/4) Epoch 14, batch 21100, loss[loss=0.2238, simple_loss=0.2789, pruned_loss=0.06146, ctc_loss=0.1144, over 16370.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2844, pruned_loss=0.06131, ctc_loss=0.1068, over 3313894.76 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 4.0
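
Editor's sketch: at batch 21000 the loop pauses to compute a validation loss ("Computing validation loss" ... "validation: loss=0.2348 ... over 1796401.00 frames."), logs an attention-entropy diagnostic from zipformer.py, and reports peak GPU memory before resuming training. Below is a hedged sketch of such a frame-weighted validation pass; compute_loss and the batch interface are invented for illustration, not icefall's actual signatures.

    # Hedged sketch of a frame-weighted validation pass.
    import torch

    def validate(model, valid_loader, compute_loss):
        model.eval()
        loss_sum, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch)
                loss_sum += loss.item()  # loss assumed summed over frames
                frames += num_frames
        model.train()                    # resume training mode afterwards
        return loss_sum / frames         # e.g. 0.2348 over 1796401 frames
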
], batch size: 463, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:41:47,538 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2827262.6666666665, ans=0.125 2023-10-09 17:41:57,624 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2827262.6666666665, ans=0.1 2023-10-09 17:42:00,374 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2827309.3333333335, ans=0.2 2023-10-09 17:42:05,336 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.720e+02 3.068e+02 3.590e+02 8.081e+02, threshold=6.137e+02, percent-clipped=1.0 2023-10-09 17:42:20,595 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2827356.0, ans=0.1 2023-10-09 17:42:37,596 INFO [train.py:1031] (0/4) Epoch 14, batch 21150, loss[loss=0.2229, simple_loss=0.2667, pruned_loss=0.06607, ctc_loss=0.1172, over 16803.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2809, pruned_loss=0.06138, ctc_loss=0.1064, over 3314681.59 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:42:42,586 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2827449.3333333335, ans=0.125 2023-10-09 17:42:44,738 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2827449.3333333335, ans=0.125 2023-10-09 17:42:52,623 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2827496.0, ans=0.2 2023-10-09 17:43:02,284 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2827542.6666666665, ans=0.125 2023-10-09 17:43:05,690 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=12.0 2023-10-09 17:43:13,896 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2827589.3333333335, ans=0.125 2023-10-09 17:43:13,935 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2827589.3333333335, ans=0.125 2023-10-09 17:43:36,527 INFO [train.py:1031] (0/4) Epoch 14, batch 21200, loss[loss=0.2015, simple_loss=0.2635, pruned_loss=0.05222, ctc_loss=0.08785, over 16816.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2758, pruned_loss=0.06112, ctc_loss=0.1055, over 3319849.93 frames. ], batch size: 329, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:43:36,836 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2827682.6666666665, ans=0.125 2023-10-09 17:43:40,088 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.85 vs. 
limit=22.5 2023-10-09 17:43:42,779 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2827682.6666666665, ans=0.0 2023-10-09 17:43:46,732 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2827682.6666666665, ans=0.125 2023-10-09 17:43:52,198 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2827729.3333333335, ans=0.2 2023-10-09 17:43:53,850 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2827729.3333333335, ans=0.0 2023-10-09 17:43:56,528 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2827729.3333333335, ans=0.025 2023-10-09 17:44:07,100 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.239e+02 3.845e+02 5.038e+02 8.843e+02, threshold=7.690e+02, percent-clipped=9.0 2023-10-09 17:44:35,699 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2827869.3333333335, ans=0.1 2023-10-09 17:44:35,817 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2827869.3333333335, ans=0.0 2023-10-09 17:44:39,124 INFO [train.py:1031] (0/4) Epoch 14, batch 21250, loss[loss=0.3891, simple_loss=0.4285, pruned_loss=0.13, ctc_loss=0.2242, over 16661.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2769, pruned_loss=0.05932, ctc_loss=0.1032, over 3318984.03 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:44:46,350 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2827916.0, ans=0.95 2023-10-09 17:44:48,512 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:44:48,620 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2827916.0, ans=0.2 2023-10-09 17:44:58,477 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2827962.6666666665, ans=0.125 2023-10-09 17:45:17,027 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2828056.0, ans=0.0 2023-10-09 17:45:29,748 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0 2023-10-09 17:45:42,990 INFO [train.py:1031] (0/4) Epoch 14, batch 21300, loss[loss=0.222, simple_loss=0.2959, pruned_loss=0.0547, ctc_loss=0.09654, over 16837.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2905, pruned_loss=0.06376, ctc_loss=0.1111, over 3303434.99 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:45:46,473 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.63 vs. 
limit=15.0 2023-10-09 17:45:47,199 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2828149.3333333335, ans=0.0 2023-10-09 17:45:55,721 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2828196.0, ans=0.04949747468305833 2023-10-09 17:46:01,293 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2023-10-09 17:46:14,300 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.457e+02 3.461e+02 4.159e+02 5.409e+02 1.290e+03, threshold=8.318e+02, percent-clipped=7.0 2023-10-09 17:46:20,965 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:46:30,968 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2828289.3333333335, ans=0.125 2023-10-09 17:46:39,809 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.95 vs. limit=15.0 2023-10-09 17:46:41,650 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2828336.0, ans=0.125 2023-10-09 17:46:43,188 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2828336.0, ans=0.2 2023-10-09 17:46:45,010 INFO [train.py:1031] (0/4) Epoch 14, batch 21350, loss[loss=0.2263, simple_loss=0.2888, pruned_loss=0.06137, ctc_loss=0.1028, over 16840.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2894, pruned_loss=0.06205, ctc_loss=0.1087, over 3300489.35 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:46:52,139 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=12.0 2023-10-09 17:46:56,844 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=22.5 2023-10-09 17:46:58,471 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2828429.3333333335, ans=0.0 2023-10-09 17:47:07,093 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2828429.3333333335, ans=0.125 2023-10-09 17:47:10,588 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2828476.0, ans=0.0 2023-10-09 17:47:23,652 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2828522.6666666665, ans=0.0 2023-10-09 17:47:24,594 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2828522.6666666665, ans=0.0 2023-10-09 17:47:41,908 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2828569.3333333335, ans=0.2 2023-10-09 17:47:47,070 INFO [train.py:1031] (0/4) Epoch 14, batch 21400, loss[loss=0.2218, simple_loss=0.2829, pruned_loss=0.05992, ctc_loss=0.102, over 16851.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2874, pruned_loss=0.06308, ctc_loss=0.1107, over 3309448.80 frames. 
], batch size: 202, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 17:47:53,196 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2828616.0, ans=0.1 2023-10-09 17:47:58,585 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2828662.6666666665, ans=0.0 2023-10-09 17:48:00,857 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2828662.6666666665, ans=0.125 2023-10-09 17:48:03,531 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2828662.6666666665, ans=0.0 2023-10-09 17:48:07,433 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2828662.6666666665, ans=0.05 2023-10-09 17:48:09,510 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2828662.6666666665, ans=0.125 2023-10-09 17:48:11,730 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2828709.3333333335, ans=0.2 2023-10-09 17:48:19,464 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+02 3.090e+02 3.533e+02 3.983e+02 1.095e+03, threshold=7.067e+02, percent-clipped=1.0 2023-10-09 17:48:24,632 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2828756.0, ans=0.0 2023-10-09 17:48:35,697 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2828802.6666666665, ans=0.0 2023-10-09 17:48:42,993 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.52 vs. limit=22.5 2023-10-09 17:48:48,679 INFO [train.py:1031] (0/4) Epoch 14, batch 21450, loss[loss=0.2032, simple_loss=0.2514, pruned_loss=0.05797, ctc_loss=0.09784, over 16734.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2817, pruned_loss=0.06323, ctc_loss=0.1108, over 3309011.53 frames. ], batch size: 272, lr: 2.55e-03, grad_scale: 1.0 2023-10-09 17:48:52,598 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2828849.3333333335, ans=0.1 2023-10-09 17:48:56,440 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2828849.3333333335, ans=0.2 2023-10-09 17:49:00,978 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2828896.0, ans=6.0 2023-10-09 17:49:36,120 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2829036.0, ans=0.125 2023-10-09 17:49:46,898 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2829036.0, ans=0.035 2023-10-09 17:49:49,290 INFO [train.py:1031] (0/4) Epoch 14, batch 21500, loss[loss=0.2058, simple_loss=0.2557, pruned_loss=0.05869, ctc_loss=0.09618, over 16849.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2772, pruned_loss=0.063, ctc_loss=0.11, over 3306903.06 frames. 
], batch size: 176, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:50:01,200 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2829129.3333333335, ans=0.125 2023-10-09 17:50:22,938 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+02 3.091e+02 3.543e+02 4.001e+02 7.738e+02, threshold=7.086e+02, percent-clipped=2.0 2023-10-09 17:50:49,220 INFO [train.py:1031] (0/4) Epoch 14, batch 21550, loss[loss=0.2352, simple_loss=0.2992, pruned_loss=0.06309, ctc_loss=0.1123, over 16764.00 frames. ], tot_loss[loss=0.2214, simple_loss=0.2738, pruned_loss=0.06266, ctc_loss=0.1093, over 3313533.28 frames. ], batch size: 164, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:50:55,077 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2829316.0, ans=0.125 2023-10-09 17:50:55,161 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2829316.0, ans=0.2 2023-10-09 17:51:03,328 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.47 vs. limit=15.0 2023-10-09 17:51:29,169 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2829456.0, ans=0.1 2023-10-09 17:51:44,644 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2829502.6666666665, ans=0.125 2023-10-09 17:51:45,654 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2829502.6666666665, ans=0.125 2023-10-09 17:51:50,022 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:51:52,385 INFO [train.py:1031] (0/4) Epoch 14, batch 21600, loss[loss=0.2259, simple_loss=0.2808, pruned_loss=0.06285, ctc_loss=0.1134, over 15332.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2777, pruned_loss=0.06229, ctc_loss=0.1095, over 3313946.98 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:51:55,476 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2829549.3333333335, ans=0.0 2023-10-09 17:51:57,446 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2829549.3333333335, ans=0.125 2023-10-09 17:52:29,888 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+02 3.318e+02 3.916e+02 4.621e+02 6.071e+02, threshold=7.833e+02, percent-clipped=0.0 2023-10-09 17:52:47,859 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.86 vs. limit=12.0 2023-10-09 17:52:50,163 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2829736.0, ans=0.125 2023-10-09 17:52:53,995 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2829736.0, ans=0.125 2023-10-09 17:52:55,781 INFO [train.py:1031] (0/4) Epoch 14, batch 21650, loss[loss=0.2363, simple_loss=0.3029, pruned_loss=0.06215, ctc_loss=0.1134, over 16906.00 frames. 
], tot_loss[loss=0.2318, simple_loss=0.2855, pruned_loss=0.06598, ctc_loss=0.1155, over 3308552.84 frames. ], batch size: 215, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:53:07,095 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2829782.6666666665, ans=0.0 2023-10-09 17:53:17,562 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2829829.3333333335, ans=0.2 2023-10-09 17:53:58,717 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2830016.0, ans=0.0 2023-10-09 17:53:59,381 INFO [train.py:1031] (0/4) Epoch 14, batch 21700, loss[loss=0.2611, simple_loss=0.315, pruned_loss=0.07721, ctc_loss=0.1317, over 16859.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.2902, pruned_loss=0.06869, ctc_loss=0.1197, over 3298139.88 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:54:06,238 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2830016.0, ans=0.0 2023-10-09 17:54:18,627 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2830062.6666666665, ans=0.125 2023-10-09 17:54:19,774 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2830062.6666666665, ans=0.0 2023-10-09 17:54:20,831 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2830062.6666666665, ans=0.125 2023-10-09 17:54:25,731 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2830109.3333333335, ans=0.125 2023-10-09 17:54:32,684 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2023-10-09 17:54:34,707 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.610e+02 3.454e+02 3.938e+02 4.640e+02 9.291e+02, threshold=7.877e+02, percent-clipped=1.0 2023-10-09 17:54:36,428 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=22.5 2023-10-09 17:54:51,278 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2830202.6666666665, ans=0.125 2023-10-09 17:54:58,957 INFO [train.py:1031] (0/4) Epoch 14, batch 21750, loss[loss=0.2194, simple_loss=0.2839, pruned_loss=0.05898, ctc_loss=0.09242, over 16715.00 frames. ], tot_loss[loss=0.2381, simple_loss=0.2929, pruned_loss=0.06795, ctc_loss=0.1184, over 3281551.19 frames. 
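
The loss fields in these batch summaries are internally consistent with a weighted sum, tot_loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss, the usual pruned-transducer-plus-CTC combination. For the batch 21700 summary above: 0.5 * 0.2902 + 0.06869 + 0.2 * 0.1197 = 0.1451 + 0.0687 + 0.0239 = 0.2377, exactly the reported tot_loss. The two scales are inferred from the logged numbers, not quoted from the recipe:

def total_loss(simple_loss, pruned_loss, ctc_loss,
               simple_scale=0.5, ctc_scale=0.2):
    # weights inferred by fitting the logged batch summaries
    return simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss

print(round(total_loss(0.2902, 0.06869, 0.1197), 4))  # -> 0.2377
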
], batch size: 130, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:55:25,229 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2830342.6666666665, ans=0.125 2023-10-09 17:55:27,362 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2830342.6666666665, ans=0.2 2023-10-09 17:55:35,265 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:55:36,877 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2830389.3333333335, ans=0.125 2023-10-09 17:55:40,054 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.97 vs. limit=15.0 2023-10-09 17:55:42,622 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2830389.3333333335, ans=0.0 2023-10-09 17:55:50,631 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2830436.0, ans=0.0 2023-10-09 17:56:00,621 INFO [train.py:1031] (0/4) Epoch 14, batch 21800, loss[loss=0.1368, simple_loss=0.2036, pruned_loss=0.02591, ctc_loss=0.04545, over 16683.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2893, pruned_loss=0.06398, ctc_loss=0.1118, over 3272933.86 frames. ], batch size: 140, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 17:56:04,789 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2830482.6666666665, ans=0.125 2023-10-09 17:56:37,329 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.581e+02 3.060e+02 4.394e+02 8.007e+02, threshold=6.120e+02, percent-clipped=1.0 2023-10-09 17:56:52,765 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2830669.3333333335, ans=0.125 2023-10-09 17:56:58,495 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.34 vs. limit=10.0 2023-10-09 17:57:03,733 INFO [train.py:1031] (0/4) Epoch 14, batch 21850, loss[loss=0.2466, simple_loss=0.3307, pruned_loss=0.05924, ctc_loss=0.1103, over 16827.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2857, pruned_loss=0.05959, ctc_loss=0.1047, over 3281460.00 frames. ], batch size: 328, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 17:57:08,215 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.91 vs. limit=15.0 2023-10-09 17:57:22,862 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2830762.6666666665, ans=0.125 2023-10-09 17:57:24,584 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2830762.6666666665, ans=0.2 2023-10-09 17:57:25,118 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.31 vs. 
limit=15.0 2023-10-09 17:57:39,716 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2830809.3333333335, ans=0.2 2023-10-09 17:57:47,184 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:58:06,347 INFO [train.py:1031] (0/4) Epoch 14, batch 21900, loss[loss=0.2572, simple_loss=0.3202, pruned_loss=0.07113, ctc_loss=0.1298, over 16712.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2911, pruned_loss=0.06148, ctc_loss=0.1082, over 3282604.10 frames. ], batch size: 272, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:58:17,677 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.96 vs. limit=22.5 2023-10-09 17:58:22,010 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2830996.0, ans=0.0 2023-10-09 17:58:24,456 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=22.5 2023-10-09 17:58:29,341 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2830996.0, ans=0.1 2023-10-09 17:58:35,619 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2023-10-09 17:58:39,812 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2831042.6666666665, ans=0.1 2023-10-09 17:58:40,945 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2831042.6666666665, ans=0.1 2023-10-09 17:58:41,031 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2831042.6666666665, ans=0.1 2023-10-09 17:58:42,666 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2831042.6666666665, ans=0.05 2023-10-09 17:58:47,464 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+02 3.180e+02 3.668e+02 4.478e+02 7.065e+02, threshold=7.335e+02, percent-clipped=3.0 2023-10-09 17:58:47,791 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2831089.3333333335, ans=0.125 2023-10-09 17:58:54,440 WARNING [train.py:1204] (0/4) Exclude cut with ID X0000003684_17524832_S00712_sp1.1 from training. Number of frames (before subsampling): 130. Number of frames (after subsampling): 31. Text: 哒哒哒哒哒哒哒哒哒哒哒哒. Tokens: ['▁', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>']. 
Number of tokens: 37 2023-10-09 17:59:06,532 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2831136.0, ans=0.0 2023-10-09 17:59:10,949 INFO [train.py:1031] (0/4) Epoch 14, batch 21950, loss[loss=0.2935, simple_loss=0.3548, pruned_loss=0.08667, ctc_loss=0.1469, over 16849.00 frames. ], tot_loss[loss=0.242, simple_loss=0.301, pruned_loss=0.0678, ctc_loss=0.1187, over 3289569.63 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:59:11,368 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=2831182.6666666665, ans=0.02 2023-10-09 17:59:16,210 INFO [scaling.py:979] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=8.0 2023-10-09 17:59:21,744 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2831182.6666666665, ans=0.1 2023-10-09 17:59:40,555 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2831276.0, ans=0.125 2023-10-09 17:59:45,523 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:59:54,840 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2831322.6666666665, ans=0.125 2023-10-09 18:00:01,444 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2831369.3333333335, ans=0.0 2023-10-09 18:00:14,567 INFO [train.py:1031] (0/4) Epoch 14, batch 22000, loss[loss=0.3181, simple_loss=0.3316, pruned_loss=0.1128, ctc_loss=0.1973, over 16527.00 frames. ], tot_loss[loss=0.2515, simple_loss=0.3099, pruned_loss=0.07157, ctc_loss=0.1251, over 3287089.66 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:00:37,893 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:00:40,122 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2831509.3333333335, ans=0.125 2023-10-09 18:00:46,101 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2023-10-09 18:00:55,417 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.792e+02 3.995e+02 5.154e+02 7.072e+02 9.807e+02, threshold=1.031e+03, percent-clipped=19.0 2023-10-09 18:01:01,997 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.54 vs. limit=10.0 2023-10-09 18:01:10,943 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2831602.6666666665, ans=0.125 2023-10-09 18:01:17,525 INFO [train.py:1031] (0/4) Epoch 14, batch 22050, loss[loss=0.1936, simple_loss=0.2427, pruned_loss=0.05445, ctc_loss=0.08925, over 16669.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.3004, pruned_loss=0.06973, ctc_loss=0.1216, over 3291482.93 frames. 
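
The WARNING above is the trainer's length sanity check. With two successive stride-2 subsampling stages, 130 input frames shrink to ((130 - 1) // 2 - 1) // 2 = 31 encoder frames, but the transcript (an onomatopoeic character repeated twelve times, each falling back to three byte-level BPE tokens, plus the leading '▁') needs 37 tokens, and neither CTC nor a transducer can align more labels than there are frames, so the cut is dropped. A sketch of such a filter (the subsampling arithmetic matches the logged numbers; the function names are illustrative):

def frames_after_subsampling(num_frames: int) -> int:
    # two stride-2 convolutions: 130 -> ((130 - 1) // 2 - 1) // 2 = 31
    return ((num_frames - 1) // 2 - 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # a valid alignment needs at least one frame per output token
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(130))  # -> 31
print(keep_cut(130, 37))              # -> False: excluded, as warned above
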
], batch size: 151, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:01:20,628 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2831649.3333333335, ans=0.1 2023-10-09 18:01:27,374 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2831649.3333333335, ans=0.0 2023-10-09 18:01:29,760 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2831696.0, ans=0.125 2023-10-09 18:01:43,426 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2831742.6666666665, ans=0.2 2023-10-09 18:01:43,473 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2831742.6666666665, ans=0.0 2023-10-09 18:01:59,997 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.43 vs. limit=6.0 2023-10-09 18:02:08,619 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2831836.0, ans=0.125 2023-10-09 18:02:20,446 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2831836.0, ans=0.035 2023-10-09 18:02:22,388 INFO [train.py:1031] (0/4) Epoch 14, batch 22100, loss[loss=0.2234, simple_loss=0.2763, pruned_loss=0.0646, ctc_loss=0.1033, over 16815.00 frames. ], tot_loss[loss=0.2417, simple_loss=0.2974, pruned_loss=0.06908, ctc_loss=0.1195, over 3277084.89 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:02:31,650 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2831882.6666666665, ans=0.125 2023-10-09 18:02:48,270 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2831976.0, ans=0.125 2023-10-09 18:02:52,495 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2831976.0, ans=0.2 2023-10-09 18:02:56,332 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2831976.0, ans=0.04949747468305833 2023-10-09 18:03:03,878 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2832022.6666666665, ans=0.0 2023-10-09 18:03:04,527 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+02 3.384e+02 3.750e+02 4.334e+02 8.202e+02, threshold=7.499e+02, percent-clipped=0.0 2023-10-09 18:03:19,135 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2832069.3333333335, ans=0.125 2023-10-09 18:03:21,254 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2832069.3333333335, ans=0.04949747468305833 2023-10-09 18:03:22,962 INFO [train.py:1031] (0/4) Epoch 14, batch 22150, loss[loss=0.2775, simple_loss=0.3226, pruned_loss=0.08652, ctc_loss=0.1486, over 15330.00 frames. ], tot_loss[loss=0.2424, simple_loss=0.2975, pruned_loss=0.06966, ctc_loss=0.1199, over 3276330.50 frames. 
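
Batch sizes in these summaries range from a few dozen to over five hundred because batches are assembled under a total-duration budget rather than a fixed sentence count: a batch of short cuts packs many utterances, while a batch of long cuts holds few. A greedy sketch of the budget idea, assuming for illustration a 700-second cap (the real sampler additionally buckets cuts by duration and shuffles):

from typing import Iterable, List, Tuple

def duration_batches(cuts: Iterable[Tuple[str, float]],
                     max_duration: float = 700.0) -> List[List[str]]:
    batches, cur, cur_dur = [], [], 0.0
    for cut_id, dur in cuts:
        if cur and cur_dur + dur > max_duration:
            batches.append(cur)       # budget full: start a new batch
            cur, cur_dur = [], 0.0
        cur.append(cut_id)
        cur_dur += dur
    if cur:
        batches.append(cur)
    return batches

# 1.3 s cuts pack ~538 to a batch; 25 s cuts only 28, which is the kind of
# spread in 'batch size' visible across these summaries.
print(len(duration_batches((f"short{i}", 1.3) for i in range(1000))[0]))  # 538
print(len(duration_batches((f"long{i}", 25.0) for i in range(100))[0]))   # 28
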
], batch size: 527, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:03:32,358 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2832116.0, ans=0.2 2023-10-09 18:03:33,375 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2832116.0, ans=0.125 2023-10-09 18:03:33,549 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.85 vs. limit=10.0 2023-10-09 18:03:36,861 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=22.5 2023-10-09 18:03:37,780 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=2832162.6666666665, ans=0.1 2023-10-09 18:03:40,588 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2832162.6666666665, ans=0.0 2023-10-09 18:03:44,532 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2832162.6666666665, ans=0.125 2023-10-09 18:03:56,293 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2832209.3333333335, ans=0.125 2023-10-09 18:04:12,235 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2832302.6666666665, ans=0.0 2023-10-09 18:04:18,045 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0 2023-10-09 18:04:25,075 INFO [train.py:1031] (0/4) Epoch 14, batch 22200, loss[loss=0.2418, simple_loss=0.3139, pruned_loss=0.06139, ctc_loss=0.1175, over 15380.00 frames. ], tot_loss[loss=0.2448, simple_loss=0.2996, pruned_loss=0.07063, ctc_loss=0.122, over 3285461.62 frames. ], batch size: 527, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:05:03,203 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=22.5 2023-10-09 18:05:06,090 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.358e+02 3.127e+02 3.515e+02 4.166e+02 8.841e+02, threshold=7.030e+02, percent-clipped=1.0 2023-10-09 18:05:17,558 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=2832536.0, ans=15.0 2023-10-09 18:05:24,124 INFO [train.py:1031] (0/4) Epoch 14, batch 22250, loss[loss=0.2787, simple_loss=0.3282, pruned_loss=0.08493, ctc_loss=0.148, over 16825.00 frames. ], tot_loss[loss=0.2433, simple_loss=0.2995, pruned_loss=0.06945, ctc_loss=0.1205, over 3295016.86 frames. ], batch size: 309, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:05:28,798 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2832582.6666666665, ans=0.0 2023-10-09 18:05:31,249 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.26 vs. limit=15.0 2023-10-09 18:05:42,429 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.06 vs. 
limit=22.5 2023-10-09 18:06:06,763 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2832722.6666666665, ans=0.0 2023-10-09 18:06:07,135 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2023-10-09 18:06:20,697 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2832769.3333333335, ans=0.0 2023-10-09 18:06:25,140 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2832816.0, ans=0.125 2023-10-09 18:06:25,778 INFO [train.py:1031] (0/4) Epoch 14, batch 22300, loss[loss=0.2586, simple_loss=0.3015, pruned_loss=0.07974, ctc_loss=0.1405, over 16930.00 frames. ], tot_loss[loss=0.2456, simple_loss=0.3008, pruned_loss=0.07068, ctc_loss=0.1229, over 3303642.62 frames. ], batch size: 309, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:06:27,164 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2832816.0, ans=0.1 2023-10-09 18:06:51,298 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2832909.3333333335, ans=0.2 2023-10-09 18:06:52,371 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2832909.3333333335, ans=0.1 2023-10-09 18:07:07,706 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.458e+02 3.881e+02 4.380e+02 7.162e+02, threshold=7.762e+02, percent-clipped=2.0 2023-10-09 18:07:17,191 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2833002.6666666665, ans=0.125 2023-10-09 18:07:25,987 INFO [train.py:1031] (0/4) Epoch 14, batch 22350, loss[loss=0.2146, simple_loss=0.2514, pruned_loss=0.0671, ctc_loss=0.1089, over 16772.00 frames. ], tot_loss[loss=0.2456, simple_loss=0.2999, pruned_loss=0.07095, ctc_loss=0.1233, over 3302704.12 frames. ], batch size: 130, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:07:30,144 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2833049.3333333335, ans=0.2 2023-10-09 18:07:53,505 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2833142.6666666665, ans=0.125 2023-10-09 18:07:54,009 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=12.0 2023-10-09 18:08:16,937 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0 2023-10-09 18:08:27,606 INFO [train.py:1031] (0/4) Epoch 14, batch 22400, loss[loss=0.2868, simple_loss=0.354, pruned_loss=0.0797, ctc_loss=0.1506, over 16638.00 frames. ], tot_loss[loss=0.246, simple_loss=0.3029, pruned_loss=0.07006, ctc_loss=0.1225, over 3295858.72 frames. 
], batch size: 351, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:08:37,614 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2833282.6666666665, ans=0.125 2023-10-09 18:08:39,402 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2833329.3333333335, ans=0.125 2023-10-09 18:08:49,979 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2833329.3333333335, ans=0.0 2023-10-09 18:09:07,880 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2833422.6666666665, ans=0.035 2023-10-09 18:09:08,997 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2833422.6666666665, ans=0.125 2023-10-09 18:09:11,653 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.558e+02 3.421e+02 3.975e+02 5.211e+02 8.186e+02, threshold=7.949e+02, percent-clipped=2.0 2023-10-09 18:09:29,996 INFO [train.py:1031] (0/4) Epoch 14, batch 22450, loss[loss=0.3169, simple_loss=0.3523, pruned_loss=0.1031, ctc_loss=0.1883, over 16676.00 frames. ], tot_loss[loss=0.2473, simple_loss=0.3057, pruned_loss=0.06994, ctc_loss=0.1225, over 3301372.82 frames. ], batch size: 351, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:09:42,137 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2833562.6666666665, ans=0.125 2023-10-09 18:09:54,337 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2833609.3333333335, ans=0.1 2023-10-09 18:10:16,608 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2833656.0, ans=0.125 2023-10-09 18:10:31,932 INFO [train.py:1031] (0/4) Epoch 14, batch 22500, loss[loss=0.2387, simple_loss=0.2681, pruned_loss=0.0769, ctc_loss=0.1386, over 16465.00 frames. ], tot_loss[loss=0.2458, simple_loss=0.3019, pruned_loss=0.07029, ctc_loss=0.123, over 3299105.72 frames. ], batch size: 418, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:10:50,488 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2833796.0, ans=0.125 2023-10-09 18:10:57,568 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:11:01,285 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2833842.6666666665, ans=0.2 2023-10-09 18:11:17,898 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+02 3.228e+02 3.590e+02 3.967e+02 7.433e+02, threshold=7.180e+02, percent-clipped=0.0 2023-10-09 18:11:23,098 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2833936.0, ans=0.125 2023-10-09 18:11:32,583 INFO [train.py:1031] (0/4) Epoch 14, batch 22550, loss[loss=0.1819, simple_loss=0.2376, pruned_loss=0.04679, ctc_loss=0.08146, over 16655.00 frames. ], tot_loss[loss=0.2392, simple_loss=0.2929, pruned_loss=0.06873, ctc_loss=0.12, over 3306367.77 frames. 
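
The grad_scale field that moves between 1.0 and 8.0 across these summaries is the dynamic loss scale of fp16 mixed-precision training: the scaler halves it whenever scaled gradients overflow to inf/nan (skipping that optimizer step) and raises it again after a run of clean steps, so persistently low values flag numerically rough stretches. A minimal sketch of the standard torch.cuda.amp step (generic PyTorch usage, not recipe-specific code; requires a GPU):

import torch

model = torch.nn.Linear(80, 2000).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=2.55e-3)
scaler = torch.cuda.amp.GradScaler()  # its current scale is the logged grad_scale

def train_step(feats, targets):
    opt.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():           # fp16 forward pass
        loss = torch.nn.functional.cross_entropy(model(feats), targets)
    scaler.scale(loss).backward()             # backward on the scaled loss
    scaler.step(opt)                          # skipped if gradients overflowed
    scaler.update()                           # halve on overflow, else creep up
    return loss.detach(), scaler.get_scale()

# feats = torch.randn(8, 80, device="cuda")
# targets = torch.randint(0, 2000, (8,), device="cuda")
# loss, scale = train_step(feats, targets)
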
], batch size: 102, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:11:57,814 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2834076.0, ans=0.1 2023-10-09 18:12:02,866 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2023-10-09 18:12:33,387 INFO [train.py:1031] (0/4) Epoch 14, batch 22600, loss[loss=0.2017, simple_loss=0.268, pruned_loss=0.0499, ctc_loss=0.08895, over 16786.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2859, pruned_loss=0.06457, ctc_loss=0.1134, over 3309539.40 frames. ], batch size: 329, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:12:37,679 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2834216.0, ans=0.0 2023-10-09 18:12:42,649 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2834216.0, ans=0.0 2023-10-09 18:13:08,938 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2834356.0, ans=0.2 2023-10-09 18:13:20,265 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+02 2.912e+02 3.400e+02 4.128e+02 6.956e+02, threshold=6.801e+02, percent-clipped=0.0 2023-10-09 18:13:21,781 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2834402.6666666665, ans=0.125 2023-10-09 18:13:34,045 INFO [train.py:1031] (0/4) Epoch 14, batch 22650, loss[loss=0.196, simple_loss=0.2439, pruned_loss=0.05518, ctc_loss=0.09455, over 16769.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2806, pruned_loss=0.06351, ctc_loss=0.1118, over 3306543.80 frames. ], batch size: 141, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:13:34,451 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2834449.3333333335, ans=0.0 2023-10-09 18:13:38,607 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2834449.3333333335, ans=0.125 2023-10-09 18:13:40,629 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2834449.3333333335, ans=0.2 2023-10-09 18:13:45,485 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:13:54,512 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2834496.0, ans=0.0 2023-10-09 18:13:55,818 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2023-10-09 18:14:35,171 INFO [train.py:1031] (0/4) Epoch 14, batch 22700, loss[loss=0.2442, simple_loss=0.2958, pruned_loss=0.07097, ctc_loss=0.1267, over 16863.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2809, pruned_loss=0.0655, ctc_loss=0.1145, over 3310435.44 frames. 
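
The WithLoss entries report the accumulated value of an auxiliary penalty attached to a module's attention weights; loss-sum=0.000e+00, as in every such entry here, means the penalty never fired in the logged window. The mechanics below are a generic pattern for this kind of logged auxiliary loss, not icefall's actual WithLoss implementation:

import torch

class AttnWeightsPenalty(torch.nn.Module):
    """Pass-through module that computes a penalty on attention weights and
    keeps a running sum for periodic logging."""
    def __init__(self, name: str, limit: float = 0.95):
        super().__init__()
        self.name, self.limit = name, limit
        self.loss_sum = 0.0   # printed as loss-sum=...
        self.pending = None   # the caller adds this to the batch loss

    def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
        # penalize only probability mass above `limit`; zero when well-behaved
        excess = (attn_weights - self.limit).clamp(min=0.0)
        self.pending = excess.sum()
        self.loss_sum += self.pending.detach().item()
        return attn_weights   # the activations pass through unchanged

A training loop would add module.pending into the batch loss; a loss-sum of exactly zero, as above, says the weights stayed inside the allowed region throughout.
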
], batch size: 292, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:14:56,135 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2834729.3333333335, ans=0.125 2023-10-09 18:14:59,251 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.08 vs. limit=15.0 2023-10-09 18:14:59,308 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2023-10-09 18:15:01,859 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2834776.0, ans=0.0 2023-10-09 18:15:05,567 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2834776.0, ans=0.1 2023-10-09 18:15:06,682 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2834776.0, ans=0.125 2023-10-09 18:15:14,222 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2834822.6666666665, ans=0.1 2023-10-09 18:15:21,722 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2834822.6666666665, ans=0.125 2023-10-09 18:15:24,588 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+02 3.390e+02 4.032e+02 4.588e+02 8.428e+02, threshold=8.064e+02, percent-clipped=2.0 2023-10-09 18:15:35,907 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2834869.3333333335, ans=0.0 2023-10-09 18:15:37,582 INFO [train.py:1031] (0/4) Epoch 14, batch 22750, loss[loss=0.2344, simple_loss=0.2905, pruned_loss=0.0659, ctc_loss=0.1165, over 16866.00 frames. ], tot_loss[loss=0.2346, simple_loss=0.2851, pruned_loss=0.06825, ctc_loss=0.1189, over 3301513.40 frames. ], batch size: 141, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:15:37,909 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2834916.0, ans=0.125 2023-10-09 18:15:46,714 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2834916.0, ans=0.0 2023-10-09 18:15:48,457 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2834916.0, ans=0.2 2023-10-09 18:15:49,451 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2834962.6666666665, ans=0.125 2023-10-09 18:15:51,032 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2834962.6666666665, ans=0.2 2023-10-09 18:16:04,222 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.00 vs. 
limit=22.5 2023-10-09 18:16:12,043 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:16:25,588 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2835102.6666666665, ans=0.1 2023-10-09 18:16:34,928 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2835102.6666666665, ans=0.125 2023-10-09 18:16:39,368 INFO [train.py:1031] (0/4) Epoch 14, batch 22800, loss[loss=0.2776, simple_loss=0.3247, pruned_loss=0.08386, ctc_loss=0.157, over 16835.00 frames. ], tot_loss[loss=0.2394, simple_loss=0.2895, pruned_loss=0.07025, ctc_loss=0.1222, over 3300248.49 frames. ], batch size: 309, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:16:40,722 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2835149.3333333335, ans=0.2 2023-10-09 18:16:44,449 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2835149.3333333335, ans=0.125 2023-10-09 18:16:55,106 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.51 vs. limit=22.5 2023-10-09 18:17:02,442 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2835242.6666666665, ans=0.125 2023-10-09 18:17:10,533 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2835242.6666666665, ans=0.0 2023-10-09 18:17:28,722 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.550e+02 3.223e+02 3.755e+02 4.885e+02 7.657e+02, threshold=7.509e+02, percent-clipped=0.0 2023-10-09 18:17:30,128 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2835336.0, ans=0.125 2023-10-09 18:17:31,282 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2835336.0, ans=0.2 2023-10-09 18:17:38,877 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2835382.6666666665, ans=0.125 2023-10-09 18:17:39,519 INFO [train.py:1031] (0/4) Epoch 14, batch 22850, loss[loss=0.2467, simple_loss=0.3007, pruned_loss=0.07201, ctc_loss=0.1216, over 16962.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.2915, pruned_loss=0.06824, ctc_loss=0.1189, over 3301419.30 frames. ], batch size: 243, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:17:41,494 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2835382.6666666665, ans=0.125 2023-10-09 18:17:48,116 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2023-10-09 18:17:57,336 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.67 vs. 
limit=10.0 2023-10-09 18:18:09,773 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2835476.0, ans=0.125 2023-10-09 18:18:13,354 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2835476.0, ans=0.1 2023-10-09 18:18:33,290 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2835569.3333333335, ans=0.0 2023-10-09 18:18:33,310 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:18:37,425 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0 2023-10-09 18:18:38,787 INFO [train.py:1031] (0/4) Epoch 14, batch 22900, loss[loss=0.245, simple_loss=0.2845, pruned_loss=0.07644, ctc_loss=0.1316, over 16580.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.2908, pruned_loss=0.0674, ctc_loss=0.1176, over 3298408.57 frames. ], batch size: 416, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:18:40,998 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2023-10-09 18:18:49,074 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:18:53,986 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2835662.6666666665, ans=0.1 2023-10-09 18:18:54,918 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2835662.6666666665, ans=0.0 2023-10-09 18:18:54,976 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2835662.6666666665, ans=0.125 2023-10-09 18:19:07,366 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2835709.3333333335, ans=0.0 2023-10-09 18:19:22,072 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2835756.0, ans=0.2 2023-10-09 18:19:29,093 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.399e+02 3.037e+02 3.390e+02 3.855e+02 5.718e+02, threshold=6.781e+02, percent-clipped=0.0 2023-10-09 18:19:29,495 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2835802.6666666665, ans=0.1 2023-10-09 18:19:40,763 INFO [train.py:1031] (0/4) Epoch 14, batch 22950, loss[loss=0.2321, simple_loss=0.2856, pruned_loss=0.06727, ctc_loss=0.1104, over 16997.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.2877, pruned_loss=0.06697, ctc_loss=0.1168, over 3298623.93 frames. 
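
The Whitening entries compare a per-module whitening metric against a limit (metric=9.75 vs. limit=15.0 and so on above). The metric measures how far the feature covariance is from a scaled identity: it is 1.0 for perfectly decorrelated, equal-variance channels and grows as the eigenvalue spread widens, and a corrective gradient is applied only when the limit is exceeded, which happens just once in this stretch (metric=22.96 vs. limit=22.5, in an earlier entry). One reasonable definition of such a metric (icefall's exact formula may differ):

import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns 1.0 for perfectly 'white'
    features, larger as the covariance eigenvalues spread out."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]   # (C, C) channel covariance
    d = cov.shape[0]
    return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

white = torch.randn(10000, 192)                  # near-identity covariance
skewed = white * torch.linspace(0.1, 3.0, 192)   # uneven channel scales
print(whitening_metric(white))   # ~1.0, far below a limit like 15.0
print(whitening_metric(skewed))  # noticeably larger
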
], batch size: 90, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:19:57,542 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2835896.0, ans=0.125 2023-10-09 18:20:19,174 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2835989.3333333335, ans=0.1 2023-10-09 18:20:41,171 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2836036.0, ans=0.0 2023-10-09 18:20:42,886 INFO [train.py:1031] (0/4) Epoch 14, batch 23000, loss[loss=0.2348, simple_loss=0.2957, pruned_loss=0.06535, ctc_loss=0.1083, over 16802.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2889, pruned_loss=0.0648, ctc_loss=0.1137, over 3289717.03 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:21:12,219 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2836176.0, ans=0.1 2023-10-09 18:21:12,244 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2836176.0, ans=0.125 2023-10-09 18:21:34,592 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2836269.3333333335, ans=0.125 2023-10-09 18:21:36,007 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+02 3.336e+02 3.961e+02 4.906e+02 8.428e+02, threshold=7.922e+02, percent-clipped=4.0 2023-10-09 18:21:40,199 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2836269.3333333335, ans=0.0 2023-10-09 18:21:42,392 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2836269.3333333335, ans=0.125 2023-10-09 18:21:45,238 INFO [train.py:1031] (0/4) Epoch 14, batch 23050, loss[loss=0.2314, simple_loss=0.2928, pruned_loss=0.06287, ctc_loss=0.1107, over 16752.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2944, pruned_loss=0.06776, ctc_loss=0.1186, over 3285502.34 frames. ], batch size: 130, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:22:14,551 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2836409.3333333335, ans=0.125 2023-10-09 18:22:15,623 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2836409.3333333335, ans=0.125 2023-10-09 18:22:33,231 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2836456.0, ans=0.0 2023-10-09 18:22:40,679 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2836502.6666666665, ans=0.2 2023-10-09 18:22:47,971 INFO [train.py:1031] (0/4) Epoch 14, batch 23100, loss[loss=0.1902, simple_loss=0.2529, pruned_loss=0.04705, ctc_loss=0.08332, over 16536.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2896, pruned_loss=0.06398, ctc_loss=0.1128, over 3292815.46 frames. 
], batch size: 110, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:23:09,351 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2836596.0, ans=0.1 2023-10-09 18:23:12,559 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2836642.6666666665, ans=0.95 2023-10-09 18:23:41,038 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.931e+02 3.346e+02 4.278e+02 6.701e+02, threshold=6.692e+02, percent-clipped=0.0 2023-10-09 18:23:43,445 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2836736.0, ans=0.1 2023-10-09 18:23:50,131 INFO [train.py:1031] (0/4) Epoch 14, batch 23150, loss[loss=0.2207, simple_loss=0.257, pruned_loss=0.06641, ctc_loss=0.129, over 15251.00 frames. ], tot_loss[loss=0.226, simple_loss=0.284, pruned_loss=0.06202, ctc_loss=0.1099, over 3284682.30 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:24:12,463 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2836829.3333333335, ans=0.0 2023-10-09 18:24:29,160 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2836922.6666666665, ans=0.0 2023-10-09 18:24:50,638 INFO [train.py:1031] (0/4) Epoch 14, batch 23200, loss[loss=0.1562, simple_loss=0.2043, pruned_loss=0.03974, ctc_loss=0.07158, over 10538.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.28, pruned_loss=0.06149, ctc_loss=0.1088, over 3279088.39 frames. ], batch size: 35, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 18:25:13,044 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2837062.6666666665, ans=0.125 2023-10-09 18:25:41,291 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2837202.6666666665, ans=0.1 2023-10-09 18:25:47,045 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.426e+02 3.050e+02 3.396e+02 3.920e+02 6.096e+02, threshold=6.792e+02, percent-clipped=0.0 2023-10-09 18:25:53,634 INFO [train.py:1031] (0/4) Epoch 14, batch 23250, loss[loss=0.2152, simple_loss=0.2607, pruned_loss=0.06366, ctc_loss=0.1058, over 16659.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2795, pruned_loss=0.06119, ctc_loss=0.1083, over 3285429.00 frames. ], batch size: 130, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:26:14,835 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-w-ctc/checkpoint-608000.pt 2023-10-09 18:26:20,911 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2837342.6666666665, ans=0.125 2023-10-09 18:26:30,340 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2837342.6666666665, ans=0.2 2023-10-09 18:26:39,318 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. 
limit=15.0 2023-10-09 18:26:54,487 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2837436.0, ans=0.1 2023-10-09 18:26:59,152 INFO [train.py:1031] (0/4) Epoch 14, batch 23300, loss[loss=0.1849, simple_loss=0.255, pruned_loss=0.04263, ctc_loss=0.07363, over 16787.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.276, pruned_loss=0.06085, ctc_loss=0.1077, over 3291861.82 frames. ], batch size: 121, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:27:06,699 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.35 vs. limit=22.5 2023-10-09 18:27:31,223 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2837576.0, ans=0.1 2023-10-09 18:27:32,268 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2837576.0, ans=0.0 2023-10-09 18:27:33,528 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=22.5 2023-10-09 18:27:36,539 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2837622.6666666665, ans=0.0 2023-10-09 18:27:40,730 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2837622.6666666665, ans=0.125 2023-10-09 18:27:47,141 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2837622.6666666665, ans=0.125 2023-10-09 18:27:57,269 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+02 3.134e+02 3.806e+02 4.608e+02 8.711e+02, threshold=7.613e+02, percent-clipped=4.0 2023-10-09 18:28:01,937 INFO [train.py:1031] (0/4) Epoch 14, batch 23350, loss[loss=0.2552, simple_loss=0.2739, pruned_loss=0.08753, ctc_loss=0.1538, over 16567.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2748, pruned_loss=0.06054, ctc_loss=0.1071, over 3289056.90 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:28:28,502 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2837809.3333333335, ans=0.0 2023-10-09 18:28:40,939 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2837856.0, ans=0.2 2023-10-09 18:28:55,566 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2837902.6666666665, ans=0.2 2023-10-09 18:28:57,581 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.20 vs. limit=22.5 2023-10-09 18:29:02,957 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2837949.3333333335, ans=0.125 2023-10-09 18:29:03,743 INFO [train.py:1031] (0/4) Epoch 14, batch 23400, loss[loss=0.2053, simple_loss=0.2559, pruned_loss=0.05776, ctc_loss=0.09789, over 16895.00 frames. ], tot_loss[loss=0.2179, simple_loss=0.2713, pruned_loss=0.06079, ctc_loss=0.1073, over 3286942.40 frames. 
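
The checkpoint.py entry a little above (Saving checkpoint to zipformer/exp-w-ctc/checkpoint-608000.pt) shows batch-indexed checkpoints being written mid-epoch, interleaved with training, alongside the per-epoch epoch-N.pt files the run was resumed from. A sketch of that pattern; the every-4000-batches interval divides 608000 evenly but is an assumption here, as is the exact set of state dicts saved:

from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, scheduler, batch_idx_train: int,
                          exp_dir: Path = Path("zipformer/exp-w-ctc"),
                          save_every_n: int = 4000) -> None:
    # batch-indexed checkpoints interleaved with training, as in the log
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        exp_dir / f"checkpoint-{batch_idx_train}.pt",
    )

# at batch_idx_train=608000 this writes checkpoint-608000.pt, as logged.
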
], batch size: 82, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:29:12,918 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2837949.3333333335, ans=0.125 2023-10-09 18:29:32,324 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0 2023-10-09 18:29:57,951 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2838136.0, ans=0.125 2023-10-09 18:30:00,414 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 3.093e+02 3.637e+02 4.189e+02 1.057e+03, threshold=7.274e+02, percent-clipped=1.0 2023-10-09 18:30:00,820 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2838136.0, ans=0.125 2023-10-09 18:30:04,495 INFO [train.py:1031] (0/4) Epoch 14, batch 23450, loss[loss=0.204, simple_loss=0.2604, pruned_loss=0.05533, ctc_loss=0.09215, over 16901.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.2677, pruned_loss=0.06043, ctc_loss=0.1066, over 3291270.05 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:30:06,349 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2838182.6666666665, ans=0.0 2023-10-09 18:30:19,842 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2838229.3333333335, ans=0.0 2023-10-09 18:30:30,655 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2023-10-09 18:30:31,871 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0 2023-10-09 18:30:34,559 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2023-10-09 18:30:34,617 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=22.5 2023-10-09 18:30:44,994 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2838322.6666666665, ans=0.0 2023-10-09 18:30:51,099 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2838322.6666666665, ans=0.1 2023-10-09 18:31:03,218 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2838369.3333333335, ans=0.0 2023-10-09 18:31:04,183 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2838369.3333333335, ans=0.04949747468305833 2023-10-09 18:31:06,585 INFO [train.py:1031] (0/4) Epoch 14, batch 23500, loss[loss=0.256, simple_loss=0.2889, pruned_loss=0.0813, ctc_loss=0.1514, over 16580.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2674, pruned_loss=0.06153, ctc_loss=0.1085, over 3295344.26 frames. 
], batch size: 418, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 18:31:08,983 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2838416.0, ans=0.2 2023-10-09 18:31:11,658 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2838416.0, ans=0.125 2023-10-09 18:31:21,454 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2838462.6666666665, ans=0.125 2023-10-09 18:32:05,664 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.576e+02 3.415e+02 3.721e+02 4.306e+02 1.300e+03, threshold=7.442e+02, percent-clipped=1.0 2023-10-09 18:32:08,406 INFO [train.py:1031] (0/4) Epoch 14, batch 23550, loss[loss=0.2627, simple_loss=0.3064, pruned_loss=0.0806, ctc_loss=0.1446, over 16669.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2739, pruned_loss=0.06474, ctc_loss=0.1138, over 3299065.98 frames. ], batch size: 271, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:32:16,975 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2838649.3333333335, ans=0.125 2023-10-09 18:32:23,181 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2838696.0, ans=0.1 2023-10-09 18:32:25,307 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=2838696.0, ans=0.2 2023-10-09 18:32:25,484 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=2838696.0, ans=6.0 2023-10-09 18:32:30,576 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2838696.0, ans=0.0 2023-10-09 18:33:08,874 INFO [train.py:1031] (0/4) Epoch 14, batch 23600, loss[loss=0.2045, simple_loss=0.2601, pruned_loss=0.05548, ctc_loss=0.09481, over 16509.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.271, pruned_loss=0.06419, ctc_loss=0.1127, over 3288952.37 frames. ], batch size: 110, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:33:10,298 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2838882.6666666665, ans=0.0 2023-10-09 18:33:19,925 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2838882.6666666665, ans=0.125 2023-10-09 18:33:19,950 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2838882.6666666665, ans=0.0 2023-10-09 18:33:21,144 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2838929.3333333335, ans=0.2 2023-10-09 18:33:32,198 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.50 vs. 
limit=15.0 2023-10-09 18:33:38,947 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2838976.0, ans=0.125 2023-10-09 18:33:45,977 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2839022.6666666665, ans=0.125 2023-10-09 18:34:09,458 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+02 2.978e+02 3.333e+02 3.973e+02 8.640e+02, threshold=6.667e+02, percent-clipped=1.0 2023-10-09 18:34:10,552 INFO [train.py:1031] (0/4) Epoch 14, batch 23650, loss[loss=0.2562, simple_loss=0.3179, pruned_loss=0.07219, ctc_loss=0.1251, over 16859.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.2728, pruned_loss=0.06326, ctc_loss=0.111, over 3286327.64 frames. ], batch size: 242, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:34:10,970 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2839116.0, ans=0.1 2023-10-09 18:34:14,029 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2839116.0, ans=0.1 2023-10-09 18:34:16,946 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2023-10-09 18:34:29,694 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2839162.6666666665, ans=0.125 2023-10-09 18:34:36,543 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2839209.3333333335, ans=0.0 2023-10-09 18:34:48,810 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2839256.0, ans=0.125 2023-10-09 18:34:52,494 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=2839256.0, ans=0.2 2023-10-09 18:34:57,278 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2839256.0, ans=0.0 2023-10-09 18:35:04,254 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2839302.6666666665, ans=0.0 2023-10-09 18:35:11,977 INFO [train.py:1031] (0/4) Epoch 14, batch 23700, loss[loss=0.1938, simple_loss=0.2602, pruned_loss=0.04747, ctc_loss=0.08093, over 16860.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.2733, pruned_loss=0.0582, ctc_loss=0.1027, over 3291712.11 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:35:21,712 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.61 vs. 
limit=22.5 2023-10-09 18:35:27,182 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2839396.0, ans=15.0 2023-10-09 18:35:31,167 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2839396.0, ans=0.2 2023-10-09 18:35:36,002 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:35:39,689 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2839442.6666666665, ans=0.025 2023-10-09 18:35:52,628 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-10-09 18:35:53,310 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2839489.3333333335, ans=0.015 2023-10-09 18:36:11,208 INFO [train.py:1031] (0/4) Epoch 14, batch 23750, loss[loss=0.2381, simple_loss=0.3134, pruned_loss=0.05936, ctc_loss=0.1101, over 16954.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2772, pruned_loss=0.05798, ctc_loss=0.1031, over 3290325.62 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 1.0 2023-10-09 18:36:12,959 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+02 2.785e+02 3.356e+02 4.379e+02 6.615e+02, threshold=6.712e+02, percent-clipped=0.0 2023-10-09 18:36:13,281 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2839582.6666666665, ans=0.125 2023-10-09 18:36:15,389 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2839582.6666666665, ans=0.125 2023-10-09 18:36:18,751 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2023-10-09 18:36:28,572 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2839629.3333333335, ans=0.125 2023-10-09 18:36:41,099 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=12.0 2023-10-09 18:37:00,764 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2839769.3333333335, ans=0.125 2023-10-09 18:37:11,737 INFO [train.py:1031] (0/4) Epoch 14, batch 23800, loss[loss=0.2097, simple_loss=0.2846, pruned_loss=0.04877, ctc_loss=0.0932, over 16866.00 frames. ], tot_loss[loss=0.2141, simple_loss=0.277, pruned_loss=0.05568, ctc_loss=0.09961, over 3294478.61 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:37:16,229 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.76 vs. 
limit=15.0 2023-10-09 18:37:23,719 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2839862.6666666665, ans=0.015 2023-10-09 18:37:30,305 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2839862.6666666665, ans=0.125 2023-10-09 18:37:32,747 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.77 vs. limit=15.0 2023-10-09 18:37:47,014 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2839909.3333333335, ans=0.125 2023-10-09 18:37:53,973 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2839956.0, ans=0.1 2023-10-09 18:37:57,611 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2839956.0, ans=0.0 2023-10-09 18:38:02,693 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2840002.6666666665, ans=0.0 2023-10-09 18:38:03,655 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2840002.6666666665, ans=0.125 2023-10-09 18:38:12,935 INFO [train.py:1031] (0/4) Epoch 14, batch 23850, loss[loss=0.2547, simple_loss=0.3298, pruned_loss=0.0656, ctc_loss=0.1211, over 16899.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2843, pruned_loss=0.05649, ctc_loss=0.1013, over 3288557.00 frames. ], batch size: 292, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:38:14,610 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 3.207e+02 4.081e+02 4.991e+02 8.849e+02, threshold=8.163e+02, percent-clipped=8.0 2023-10-09 18:38:14,970 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2840049.3333333335, ans=0.1 2023-10-09 18:38:40,106 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2840142.6666666665, ans=0.0 2023-10-09 18:38:52,439 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2840189.3333333335, ans=0.125 2023-10-09 18:39:01,638 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2840236.0, ans=0.125 2023-10-09 18:39:13,692 INFO [train.py:1031] (0/4) Epoch 14, batch 23900, loss[loss=0.2262, simple_loss=0.2613, pruned_loss=0.06902, ctc_loss=0.1327, over 15516.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.2859, pruned_loss=0.05862, ctc_loss=0.1046, over 3286147.45 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:39:22,431 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.53 vs. 
limit=22.5 2023-10-09 18:39:31,248 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2840329.3333333335, ans=0.2 2023-10-09 18:39:32,713 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2840329.3333333335, ans=0.07 2023-10-09 18:39:43,885 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2840376.0, ans=0.125 2023-10-09 18:39:48,727 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2840376.0, ans=0.0 2023-10-09 18:40:00,172 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2840422.6666666665, ans=0.0 2023-10-09 18:40:11,455 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2840469.3333333335, ans=0.95 2023-10-09 18:40:15,805 INFO [train.py:1031] (0/4) Epoch 14, batch 23950, loss[loss=0.2189, simple_loss=0.2706, pruned_loss=0.06008, ctc_loss=0.1178, over 16802.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.284, pruned_loss=0.06077, ctc_loss=0.1082, over 3301334.48 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:40:16,838 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.526e+02 3.283e+02 3.829e+02 4.670e+02 8.731e+02, threshold=7.659e+02, percent-clipped=1.0 2023-10-09 18:40:32,723 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=22.5 2023-10-09 18:40:45,029 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2840609.3333333335, ans=0.0 2023-10-09 18:41:01,722 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2840656.0, ans=0.0 2023-10-09 18:41:12,405 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2840702.6666666665, ans=0.0 2023-10-09 18:41:15,727 INFO [train.py:1031] (0/4) Epoch 14, batch 24000, loss[loss=0.2039, simple_loss=0.267, pruned_loss=0.05103, ctc_loss=0.09684, over 16954.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2821, pruned_loss=0.06163, ctc_loss=0.109, over 3290635.34 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:41:15,728 INFO [train.py:1054] (0/4) Computing validation loss 2023-10-09 18:41:26,894 INFO [zipformer.py:1853] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0557, 3.0442, 3.1216, 3.2066], device='cuda:0') 2023-10-09 18:41:29,844 INFO [zipformer.py:1853] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4517, 4.6374, 3.7507, 4.2628], device='cuda:0') 2023-10-09 18:41:33,418 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2354, simple_loss=0.3014, pruned_loss=0.06541, ctc_loss=0.09632, over 1796401.00 frames. 
2023-10-09 18:41:33,419 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14587MB 2023-10-09 18:41:37,552 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2840749.3333333335, ans=0.2 2023-10-09 18:41:39,732 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2840749.3333333335, ans=0.0 2023-10-09 18:42:30,530 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0 2023-10-09 18:42:35,974 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.24 vs. limit=15.0 2023-10-09 18:42:36,288 INFO [train.py:1031] (0/4) Epoch 14, batch 24050, loss[loss=0.2245, simple_loss=0.277, pruned_loss=0.06222, ctc_loss=0.119, over 15269.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2862, pruned_loss=0.06218, ctc_loss=0.11, over 3291178.49 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:42:40,008 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.406e+02 3.196e+02 3.829e+02 4.589e+02 8.519e+02, threshold=7.658e+02, percent-clipped=2.0 2023-10-09 18:43:14,885 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2841122.6666666665, ans=0.1 2023-10-09 18:43:24,182 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2841122.6666666665, ans=0.025 2023-10-09 18:43:24,209 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2841122.6666666665, ans=0.125 2023-10-09 18:43:31,236 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2841169.3333333335, ans=0.125 2023-10-09 18:43:37,901 INFO [train.py:1031] (0/4) Epoch 14, batch 24100, loss[loss=0.1962, simple_loss=0.2696, pruned_loss=0.04415, ctc_loss=0.0862, over 16816.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2896, pruned_loss=0.06448, ctc_loss=0.1137, over 3294180.18 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:43:39,972 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2841216.0, ans=0.125 2023-10-09 18:43:40,406 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=15.0 2023-10-09 18:43:43,177 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2841216.0, ans=0.0 2023-10-09 18:43:46,210 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.32 vs. 
limit=22.5 2023-10-09 18:44:00,315 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2841262.6666666665, ans=0.035 2023-10-09 18:44:17,397 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2841356.0, ans=0.125 2023-10-09 18:44:21,172 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2841356.0, ans=10.0 2023-10-09 18:44:30,938 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=12.0 2023-10-09 18:44:33,719 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2841402.6666666665, ans=0.0 2023-10-09 18:44:38,669 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2841449.3333333335, ans=0.1 2023-10-09 18:44:39,359 INFO [train.py:1031] (0/4) Epoch 14, batch 24150, loss[loss=0.2077, simple_loss=0.2506, pruned_loss=0.06154, ctc_loss=0.1044, over 16550.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2818, pruned_loss=0.06164, ctc_loss=0.1088, over 3296073.13 frames. ], batch size: 110, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:44:39,791 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2841449.3333333335, ans=0.0 2023-10-09 18:44:43,200 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.029e+02 3.494e+02 3.950e+02 7.485e+02, threshold=6.988e+02, percent-clipped=0.0 2023-10-09 18:44:57,403 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2841496.0, ans=0.0 2023-10-09 18:45:19,897 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2841589.3333333335, ans=0.2 2023-10-09 18:45:24,021 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2841589.3333333335, ans=0.125 2023-10-09 18:45:42,160 INFO [train.py:1031] (0/4) Epoch 14, batch 24200, loss[loss=0.2017, simple_loss=0.2695, pruned_loss=0.04884, ctc_loss=0.09039, over 16776.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2787, pruned_loss=0.05817, ctc_loss=0.1035, over 3296010.94 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:45:44,716 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2841682.6666666665, ans=0.05 2023-10-09 18:45:47,248 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.26 vs. 
limit=10.0 2023-10-09 18:45:52,697 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2841729.3333333335, ans=0.2 2023-10-09 18:46:15,347 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2841776.0, ans=0.1 2023-10-09 18:46:31,831 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=2841869.3333333335, ans=22.5 2023-10-09 18:46:34,507 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2841869.3333333335, ans=0.125 2023-10-09 18:46:43,490 INFO [train.py:1031] (0/4) Epoch 14, batch 24250, loss[loss=0.2431, simple_loss=0.2858, pruned_loss=0.07546, ctc_loss=0.1239, over 16648.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2758, pruned_loss=0.05735, ctc_loss=0.1019, over 3292414.43 frames. ], batch size: 111, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:46:48,042 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2841916.0, ans=0.1 2023-10-09 18:46:49,469 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+02 2.981e+02 3.499e+02 4.269e+02 8.354e+02, threshold=6.999e+02, percent-clipped=3.0 2023-10-09 18:47:05,443 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:47:06,930 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.99 vs. limit=22.5 2023-10-09 18:47:10,232 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2842009.3333333335, ans=0.1 2023-10-09 18:47:23,456 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2842056.0, ans=0.1 2023-10-09 18:47:46,789 INFO [train.py:1031] (0/4) Epoch 14, batch 24300, loss[loss=0.2536, simple_loss=0.319, pruned_loss=0.0701, ctc_loss=0.1199, over 16813.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2825, pruned_loss=0.06116, ctc_loss=0.1082, over 3297405.34 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:48:06,247 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2842196.0, ans=0.125 2023-10-09 18:48:42,731 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2842336.0, ans=0.125 2023-10-09 18:48:42,770 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2842336.0, ans=0.2 2023-10-09 18:48:48,965 INFO [train.py:1031] (0/4) Epoch 14, batch 24350, loss[loss=0.2159, simple_loss=0.2762, pruned_loss=0.05746, ctc_loss=0.102, over 16933.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2845, pruned_loss=0.06155, ctc_loss=0.1089, over 3299488.40 frames. 
], batch size: 202, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:48:55,800 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.629e+02 3.451e+02 4.035e+02 4.756e+02 1.145e+03, threshold=8.070e+02, percent-clipped=2.0 2023-10-09 18:49:00,011 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2023-10-09 18:49:09,117 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2842429.3333333335, ans=0.125 2023-10-09 18:49:09,128 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2842429.3333333335, ans=0.05 2023-10-09 18:49:14,990 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2842476.0, ans=0.125 2023-10-09 18:49:18,988 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=12.0 2023-10-09 18:49:24,492 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2842522.6666666665, ans=0.125 2023-10-09 18:49:49,954 INFO [train.py:1031] (0/4) Epoch 14, batch 24400, loss[loss=0.199, simple_loss=0.2499, pruned_loss=0.05583, ctc_loss=0.09123, over 16738.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2838, pruned_loss=0.06242, ctc_loss=0.1101, over 3298119.00 frames. ], batch size: 151, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:49:50,251 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2842616.0, ans=0.125 2023-10-09 18:49:50,299 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2842616.0, ans=0.125 2023-10-09 18:49:55,527 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2842616.0, ans=0.125 2023-10-09 18:49:55,564 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:50:27,718 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2842756.0, ans=0.0 2023-10-09 18:50:32,487 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2842756.0, ans=0.0 2023-10-09 18:50:47,368 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2023-10-09 18:50:50,579 INFO [train.py:1031] (0/4) Epoch 14, batch 24450, loss[loss=0.2105, simple_loss=0.2658, pruned_loss=0.05705, ctc_loss=0.1028, over 16883.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2835, pruned_loss=0.06322, ctc_loss=0.1114, over 3306228.56 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:50:57,484 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.474e+02 3.798e+02 4.507e+02 6.680e+02, threshold=7.596e+02, percent-clipped=0.0 2023-10-09 18:51:04,554 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.87 vs. 
limit=15.0 2023-10-09 18:51:09,038 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2842896.0, ans=0.0 2023-10-09 18:51:12,442 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2842896.0, ans=0.0 2023-10-09 18:51:16,137 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2842942.6666666665, ans=0.125 2023-10-09 18:51:16,648 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2023-10-09 18:51:34,780 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2842989.3333333335, ans=0.125 2023-10-09 18:51:51,709 INFO [train.py:1031] (0/4) Epoch 14, batch 24500, loss[loss=0.2852, simple_loss=0.3326, pruned_loss=0.08819, ctc_loss=0.1534, over 16700.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2818, pruned_loss=0.06249, ctc_loss=0.1091, over 3302468.00 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:51:57,882 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2843082.6666666665, ans=0.0 2023-10-09 18:52:05,542 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2843129.3333333335, ans=0.0 2023-10-09 18:52:17,571 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2843176.0, ans=0.0 2023-10-09 18:52:20,237 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2843176.0, ans=0.1 2023-10-09 18:52:46,828 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2843269.3333333335, ans=0.125 2023-10-09 18:52:54,587 INFO [train.py:1031] (0/4) Epoch 14, batch 24550, loss[loss=0.1843, simple_loss=0.254, pruned_loss=0.04232, ctc_loss=0.07475, over 16767.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2782, pruned_loss=0.06069, ctc_loss=0.1052, over 3301928.41 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:53:03,099 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.233e+02 3.410e+02 4.193e+02 5.169e+02 8.028e+02, threshold=8.385e+02, percent-clipped=3.0 2023-10-09 18:53:08,691 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:53:27,607 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0 2023-10-09 18:53:27,629 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-10-09 18:53:49,797 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2023-10-09 18:53:57,854 INFO [train.py:1031] (0/4) Epoch 14, batch 24600, loss[loss=0.2414, simple_loss=0.3103, pruned_loss=0.06359, ctc_loss=0.1132, over 16830.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2829, pruned_loss=0.06037, ctc_loss=0.1053, over 3288800.81 frames. 
], batch size: 95, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:54:10,488 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2843596.0, ans=0.09899494936611666 2023-10-09 18:54:10,799 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0 2023-10-09 18:54:36,750 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2843689.3333333335, ans=0.07 2023-10-09 18:54:48,395 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2843736.0, ans=0.125 2023-10-09 18:55:02,708 INFO [train.py:1031] (0/4) Epoch 14, batch 24650, loss[loss=0.2603, simple_loss=0.3289, pruned_loss=0.07107, ctc_loss=0.1238, over 16923.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2889, pruned_loss=0.06287, ctc_loss=0.1099, over 3294263.89 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:55:03,081 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2843782.6666666665, ans=0.125 2023-10-09 18:55:03,101 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2843782.6666666665, ans=0.1 2023-10-09 18:55:08,285 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2843782.6666666665, ans=0.125 2023-10-09 18:55:08,378 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2843782.6666666665, ans=0.2 2023-10-09 18:55:13,665 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+02 3.365e+02 3.995e+02 4.722e+02 9.808e+02, threshold=7.989e+02, percent-clipped=0.0 2023-10-09 18:55:14,709 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2843829.3333333335, ans=0.125 2023-10-09 18:55:27,454 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2843876.0, ans=10.0 2023-10-09 18:55:45,122 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2843922.6666666665, ans=0.125 2023-10-09 18:55:45,198 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2843922.6666666665, ans=0.07 2023-10-09 18:55:54,448 INFO [scaling.py:979] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.65 vs. limit=8.0 2023-10-09 18:56:06,106 INFO [train.py:1031] (0/4) Epoch 14, batch 24700, loss[loss=0.2593, simple_loss=0.3194, pruned_loss=0.07397, ctc_loss=0.128, over 16905.00 frames. ], tot_loss[loss=0.236, simple_loss=0.2972, pruned_loss=0.06476, ctc_loss=0.1131, over 3298281.09 frames. ], batch size: 243, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:56:11,237 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.99 vs. 
limit=15.0 2023-10-09 18:56:15,866 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2844016.0, ans=0.125 2023-10-09 18:56:21,529 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2844062.6666666665, ans=0.125 2023-10-09 18:56:43,085 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2844109.3333333335, ans=0.125 2023-10-09 18:56:45,393 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2844156.0, ans=0.125 2023-10-09 18:56:47,187 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2844156.0, ans=0.0 2023-10-09 18:56:51,902 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2844156.0, ans=0.125 2023-10-09 18:56:52,174 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=22.5 2023-10-09 18:56:58,778 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2023-10-09 18:57:08,066 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2844202.6666666665, ans=0.1 2023-10-09 18:57:10,471 INFO [train.py:1031] (0/4) Epoch 14, batch 24750, loss[loss=0.2823, simple_loss=0.3338, pruned_loss=0.08567, ctc_loss=0.1483, over 16847.00 frames. ], tot_loss[loss=0.2432, simple_loss=0.3022, pruned_loss=0.06825, ctc_loss=0.1191, over 3305967.08 frames. ], batch size: 309, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:57:14,831 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2844249.3333333335, ans=0.1 2023-10-09 18:57:23,591 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.844e+02 3.622e+02 4.141e+02 4.992e+02 1.091e+03, threshold=8.281e+02, percent-clipped=4.0 2023-10-09 18:57:24,296 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.25 vs. limit=22.5 2023-10-09 18:57:26,759 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:57:31,938 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2844296.0, ans=0.0 2023-10-09 18:58:01,931 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:58:07,621 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2844436.0, ans=0.125 2023-10-09 18:58:12,350 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=12.0 2023-10-09 18:58:17,283 INFO [train.py:1031] (0/4) Epoch 14, batch 24800, loss[loss=0.228, simple_loss=0.3271, pruned_loss=0.04724, ctc_loss=0.08622, over 15069.00 frames. ], tot_loss[loss=0.2417, simple_loss=0.3013, pruned_loss=0.06758, ctc_loss=0.1175, over 3294001.17 frames. 
], batch size: 527, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:58:19,873 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2844482.6666666665, ans=0.0 2023-10-09 18:59:03,226 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2844622.6666666665, ans=0.125 2023-10-09 18:59:03,555 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.35 vs. limit=10.0 2023-10-09 18:59:06,563 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2844669.3333333335, ans=0.0 2023-10-09 18:59:20,852 INFO [train.py:1031] (0/4) Epoch 14, batch 24850, loss[loss=0.2522, simple_loss=0.2956, pruned_loss=0.07893, ctc_loss=0.1274, over 16748.00 frames. ], tot_loss[loss=0.2428, simple_loss=0.3012, pruned_loss=0.06848, ctc_loss=0.1184, over 3289562.35 frames. ], batch size: 121, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 18:59:21,157 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2844716.0, ans=0.125 2023-10-09 18:59:22,391 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2844716.0, ans=0.0 2023-10-09 18:59:31,207 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.98 vs. limit=15.0 2023-10-09 18:59:35,027 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.490e+02 3.266e+02 3.931e+02 4.617e+02 8.041e+02, threshold=7.862e+02, percent-clipped=0.0 2023-10-09 18:59:41,577 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.67 vs. limit=15.0 2023-10-09 18:59:51,222 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2844809.3333333335, ans=0.2 2023-10-09 19:00:04,446 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0 2023-10-09 19:00:04,540 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2023-10-09 19:00:12,965 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2844902.6666666665, ans=0.0 2023-10-09 19:00:19,623 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2844902.6666666665, ans=0.0 2023-10-09 19:00:27,329 INFO [train.py:1031] (0/4) Epoch 14, batch 24900, loss[loss=0.2478, simple_loss=0.3592, pruned_loss=0.05022, ctc_loss=0.09006, over 15081.00 frames. ], tot_loss[loss=0.2458, simple_loss=0.304, pruned_loss=0.06969, ctc_loss=0.1204, over 3294682.45 frames. ], batch size: 527, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:00:33,421 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=22.5 2023-10-09 19:01:30,778 INFO [train.py:1031] (0/4) Epoch 14, batch 24950, loss[loss=0.2031, simple_loss=0.2539, pruned_loss=0.05735, ctc_loss=0.09397, over 16750.00 frames. 
], tot_loss[loss=0.2458, simple_loss=0.3076, pruned_loss=0.06829, ctc_loss=0.1184, over 3290500.17 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 19:01:38,015 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2845182.6666666665, ans=0.0 2023-10-09 19:01:46,518 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+02 3.532e+02 4.120e+02 4.965e+02 9.701e+02, threshold=8.240e+02, percent-clipped=4.0 2023-10-09 19:01:51,375 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2845229.3333333335, ans=0.125 2023-10-09 19:02:32,820 INFO [train.py:1031] (0/4) Epoch 14, batch 25000, loss[loss=0.2363, simple_loss=0.2913, pruned_loss=0.06782, ctc_loss=0.114, over 16872.00 frames. ], tot_loss[loss=0.2434, simple_loss=0.303, pruned_loss=0.06828, ctc_loss=0.1183, over 3286381.74 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:02:34,079 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2845416.0, ans=0.05 2023-10-09 19:02:47,051 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.09 vs. limit=12.0 2023-10-09 19:02:47,804 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2845462.6666666665, ans=0.125 2023-10-09 19:02:50,476 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2845462.6666666665, ans=0.125 2023-10-09 19:02:50,560 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2845462.6666666665, ans=0.1 2023-10-09 19:03:26,783 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2845602.6666666665, ans=0.5 2023-10-09 19:03:33,087 INFO [train.py:1031] (0/4) Epoch 14, batch 25050, loss[loss=0.2414, simple_loss=0.2947, pruned_loss=0.06936, ctc_loss=0.1235, over 16858.00 frames. ], tot_loss[loss=0.2403, simple_loss=0.2977, pruned_loss=0.06788, ctc_loss=0.1177, over 3300239.50 frames. ], batch size: 272, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 19:03:36,600 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2845649.3333333335, ans=0.125 2023-10-09 19:03:36,896 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=12.0 2023-10-09 19:03:50,005 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.415e+02 3.393e+02 3.859e+02 4.552e+02 1.527e+03, threshold=7.717e+02, percent-clipped=2.0 2023-10-09 19:04:27,790 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2845836.0, ans=0.2 2023-10-09 19:04:34,838 INFO [train.py:1031] (0/4) Epoch 14, batch 25100, loss[loss=0.1883, simple_loss=0.244, pruned_loss=0.04906, ctc_loss=0.08616, over 16693.00 frames. ], tot_loss[loss=0.236, simple_loss=0.2939, pruned_loss=0.06598, ctc_loss=0.1152, over 3305752.72 frames. 
], batch size: 151, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:04:37,129 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:04:43,520 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2845882.6666666665, ans=0.2 2023-10-09 19:04:51,188 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2845929.3333333335, ans=0.125 2023-10-09 19:05:10,993 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2846022.6666666665, ans=0.125 2023-10-09 19:05:13,153 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:05:29,199 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2846069.3333333335, ans=0.125 2023-10-09 19:05:36,319 INFO [train.py:1031] (0/4) Epoch 14, batch 25150, loss[loss=0.2065, simple_loss=0.2632, pruned_loss=0.05553, ctc_loss=0.09661, over 16734.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2867, pruned_loss=0.06411, ctc_loss=0.1123, over 3311231.64 frames. ], batch size: 140, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:05:52,020 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.349e+02 2.975e+02 3.474e+02 4.105e+02 7.010e+02, threshold=6.948e+02, percent-clipped=0.0 2023-10-09 19:06:15,777 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2846256.0, ans=0.125 2023-10-09 19:06:34,253 INFO [scaling.py:979] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.01 vs. limit=5.0 2023-10-09 19:06:34,851 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2846349.3333333335, ans=0.2 2023-10-09 19:06:36,059 INFO [train.py:1031] (0/4) Epoch 14, batch 25200, loss[loss=0.2396, simple_loss=0.2921, pruned_loss=0.06932, ctc_loss=0.121, over 16890.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2829, pruned_loss=0.06408, ctc_loss=0.1121, over 3317691.38 frames. ], batch size: 328, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:06:42,571 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.83 vs. 
limit=15.0 2023-10-09 19:06:42,596 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2846349.3333333335, ans=6.0 2023-10-09 19:06:50,205 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2846396.0, ans=0.0 2023-10-09 19:06:54,640 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2846396.0, ans=0.2 2023-10-09 19:06:55,679 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2846396.0, ans=0.125 2023-10-09 19:07:27,734 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2846536.0, ans=0.125 2023-10-09 19:07:29,214 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-10-09 19:07:35,930 INFO [train.py:1031] (0/4) Epoch 14, batch 25250, loss[loss=0.2044, simple_loss=0.2352, pruned_loss=0.06371, ctc_loss=0.1155, over 15412.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.281, pruned_loss=0.06497, ctc_loss=0.1136, over 3316960.01 frames. ], batch size: 526, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 19:07:37,409 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2846582.6666666665, ans=0.125 2023-10-09 19:07:52,452 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=2846629.3333333335, ans=0.2 2023-10-09 19:07:55,975 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2846629.3333333335, ans=0.1 2023-10-09 19:07:56,503 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.603e+02 3.269e+02 3.734e+02 4.463e+02 8.122e+02, threshold=7.469e+02, percent-clipped=1.0 2023-10-09 19:08:12,503 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=2846676.0, ans=0.02 2023-10-09 19:08:21,608 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2846722.6666666665, ans=0.1 2023-10-09 19:08:23,113 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:08:39,563 INFO [train.py:1031] (0/4) Epoch 14, batch 25300, loss[loss=0.328, simple_loss=0.3836, pruned_loss=0.09911, ctc_loss=0.1856, over 16611.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2864, pruned_loss=0.06686, ctc_loss=0.1172, over 3321056.27 frames. 
], batch size: 351, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:09:07,059 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2846909.3333333335, ans=0.0 2023-10-09 19:09:09,192 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2846909.3333333335, ans=0.0 2023-10-09 19:09:12,916 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2846909.3333333335, ans=0.0 2023-10-09 19:09:15,869 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2846956.0, ans=0.125 2023-10-09 19:09:18,568 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2846956.0, ans=0.025 2023-10-09 19:09:27,691 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2847002.6666666665, ans=0.1 2023-10-09 19:09:35,779 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2847002.6666666665, ans=0.125 2023-10-09 19:09:38,600 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-10-09 19:09:40,558 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2847049.3333333335, ans=0.125 2023-10-09 19:09:41,238 INFO [train.py:1031] (0/4) Epoch 14, batch 25350, loss[loss=0.2791, simple_loss=0.3154, pruned_loss=0.09159, ctc_loss=0.1491, over 16891.00 frames. ], tot_loss[loss=0.2384, simple_loss=0.2933, pruned_loss=0.06784, ctc_loss=0.1195, over 3323931.31 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:10:01,553 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.479e+02 4.151e+02 5.048e+02 8.470e+02, threshold=8.302e+02, percent-clipped=4.0 2023-10-09 19:10:12,052 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2847142.6666666665, ans=0.125 2023-10-09 19:10:12,295 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=12.0 2023-10-09 19:10:19,749 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0 2023-10-09 19:10:29,072 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=22.5 2023-10-09 19:10:30,244 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=15.0 2023-10-09 19:10:38,220 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2847236.0, ans=0.2 2023-10-09 19:10:41,385 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-10-09 19:10:41,702 INFO [train.py:1031] (0/4) Epoch 14, batch 25400, loss[loss=0.2198, simple_loss=0.2425, pruned_loss=0.07185, ctc_loss=0.1334, over 15490.00 frames. 
], tot_loss[loss=0.2374, simple_loss=0.2905, pruned_loss=0.0682, ctc_loss=0.1199, over 3321100.44 frames. ], batch size: 526, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:11:01,140 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2847329.3333333335, ans=0.1 2023-10-09 19:11:13,392 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2847376.0, ans=0.125 2023-10-09 19:11:25,289 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2847422.6666666665, ans=0.0 2023-10-09 19:11:26,539 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2023-10-09 19:11:40,830 INFO [train.py:1031] (0/4) Epoch 14, batch 25450, loss[loss=0.2295, simple_loss=0.2859, pruned_loss=0.0641, ctc_loss=0.1122, over 16859.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.2882, pruned_loss=0.06839, ctc_loss=0.1198, over 3326185.77 frames. ], batch size: 164, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:11:45,949 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2847516.0, ans=0.09899494936611666 2023-10-09 19:12:01,164 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.595e+02 3.142e+02 3.636e+02 4.300e+02 1.054e+03, threshold=7.273e+02, percent-clipped=3.0 2023-10-09 19:12:08,777 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2847609.3333333335, ans=0.125 2023-10-09 19:12:10,742 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=22.5 2023-10-09 19:12:15,334 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2847609.3333333335, ans=0.0 2023-10-09 19:12:41,752 INFO [train.py:1031] (0/4) Epoch 14, batch 25500, loss[loss=0.2344, simple_loss=0.2798, pruned_loss=0.06938, ctc_loss=0.1257, over 15202.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.2837, pruned_loss=0.06647, ctc_loss=0.1165, over 3313041.46 frames. ], batch size: 526, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:12:51,910 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2847749.3333333335, ans=0.0 2023-10-09 19:13:16,478 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2847842.6666666665, ans=0.0 2023-10-09 19:13:17,536 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2847842.6666666665, ans=0.0 2023-10-09 19:13:19,769 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2847889.3333333335, ans=0.0 2023-10-09 19:13:44,904 INFO [train.py:1031] (0/4) Epoch 14, batch 25550, loss[loss=0.2346, simple_loss=0.2788, pruned_loss=0.06994, ctc_loss=0.1261, over 15294.00 frames. ], tot_loss[loss=0.236, simple_loss=0.2873, pruned_loss=0.06839, ctc_loss=0.1197, over 3311870.80 frames. 
], batch size: 527, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:13:50,378 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2847982.6666666665, ans=0.0 2023-10-09 19:14:07,199 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+02 3.271e+02 3.768e+02 4.486e+02 1.096e+03, threshold=7.537e+02, percent-clipped=1.0 2023-10-09 19:14:08,637 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2848076.0, ans=0.1 2023-10-09 19:14:15,498 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0 2023-10-09 19:14:18,558 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2848076.0, ans=0.0 2023-10-09 19:14:28,948 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2848122.6666666665, ans=0.125 2023-10-09 19:14:45,705 INFO [train.py:1031] (0/4) Epoch 14, batch 25600, loss[loss=0.2738, simple_loss=0.3222, pruned_loss=0.08375, ctc_loss=0.1447, over 16624.00 frames. ], tot_loss[loss=0.2399, simple_loss=0.2907, pruned_loss=0.06997, ctc_loss=0.1228, over 3315477.42 frames. ], batch size: 271, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:15:03,185 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=22.5 2023-10-09 19:15:07,455 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2848262.6666666665, ans=0.125 2023-10-09 19:15:47,651 INFO [train.py:1031] (0/4) Epoch 14, batch 25650, loss[loss=0.3361, simple_loss=0.3757, pruned_loss=0.108, ctc_loss=0.2014, over 16821.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.2964, pruned_loss=0.07118, ctc_loss=0.1247, over 3317926.15 frames. ], batch size: 329, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:16:11,373 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.452e+02 3.570e+02 3.954e+02 4.505e+02 1.083e+03, threshold=7.908e+02, percent-clipped=2.0 2023-10-09 19:16:31,319 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2848589.3333333335, ans=0.1 2023-10-09 19:16:31,358 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2848589.3333333335, ans=0.125 2023-10-09 19:16:35,852 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2848589.3333333335, ans=0.125 2023-10-09 19:16:36,880 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2848636.0, ans=0.0 2023-10-09 19:16:42,403 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2848636.0, ans=0.0 2023-10-09 19:16:50,497 INFO [train.py:1031] (0/4) Epoch 14, batch 25700, loss[loss=0.2492, simple_loss=0.3038, pruned_loss=0.0726, ctc_loss=0.1234, over 16864.00 frames. ], tot_loss[loss=0.251, simple_loss=0.3028, pruned_loss=0.07372, ctc_loss=0.1291, over 3319555.41 frames. 
], batch size: 228, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:17:03,732 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2848729.3333333335, ans=0.035 2023-10-09 19:17:29,051 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2848822.6666666665, ans=0.125 2023-10-09 19:17:48,359 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=22.5 2023-10-09 19:17:51,124 INFO [train.py:1031] (0/4) Epoch 14, batch 25750, loss[loss=0.1923, simple_loss=0.2448, pruned_loss=0.0528, ctc_loss=0.08549, over 11430.00 frames. ], tot_loss[loss=0.2513, simple_loss=0.3033, pruned_loss=0.07377, ctc_loss=0.1293, over 3311402.84 frames. ], batch size: 38, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:17:53,650 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2848916.0, ans=0.0 2023-10-09 19:17:57,391 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2848916.0, ans=0.1 2023-10-09 19:18:01,611 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2848916.0, ans=0.125 2023-10-09 19:18:17,301 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+02 3.581e+02 3.886e+02 4.426e+02 7.686e+02, threshold=7.772e+02, percent-clipped=0.0 2023-10-09 19:18:38,814 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2849056.0, ans=0.0 2023-10-09 19:18:56,346 INFO [train.py:1031] (0/4) Epoch 14, batch 25800, loss[loss=0.2189, simple_loss=0.2635, pruned_loss=0.06552, ctc_loss=0.108, over 16701.00 frames. ], tot_loss[loss=0.2419, simple_loss=0.2984, pruned_loss=0.0686, ctc_loss=0.1207, over 3301107.31 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:19:02,655 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2849149.3333333335, ans=0.07 2023-10-09 19:19:22,324 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2023-10-09 19:19:36,107 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2849289.3333333335, ans=0.125 2023-10-09 19:19:59,395 INFO [train.py:1031] (0/4) Epoch 14, batch 25850, loss[loss=0.1761, simple_loss=0.231, pruned_loss=0.0454, ctc_loss=0.07595, over 16847.00 frames. ], tot_loss[loss=0.2359, simple_loss=0.2928, pruned_loss=0.06625, ctc_loss=0.1162, over 3294614.48 frames. 
], batch size: 164, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:20:15,549 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2849429.3333333335, ans=0.0 2023-10-09 19:20:24,802 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.590e+02 3.413e+02 3.966e+02 4.957e+02 9.645e+02, threshold=7.933e+02, percent-clipped=3.0 2023-10-09 19:20:38,873 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2849522.6666666665, ans=0.1 2023-10-09 19:20:40,252 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.07 vs. limit=10.0 2023-10-09 19:20:42,594 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2849522.6666666665, ans=10.0 2023-10-09 19:21:00,847 INFO [train.py:1031] (0/4) Epoch 14, batch 25900, loss[loss=0.2393, simple_loss=0.3291, pruned_loss=0.05591, ctc_loss=0.09421, over 15161.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.2931, pruned_loss=0.06657, ctc_loss=0.1157, over 3290887.46 frames. ], batch size: 526, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:21:07,127 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2849616.0, ans=0.125 2023-10-09 19:21:22,872 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2849662.6666666665, ans=0.125 2023-10-09 19:21:24,040 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2849709.3333333335, ans=0.0 2023-10-09 19:21:26,186 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2849709.3333333335, ans=0.2 2023-10-09 19:21:27,853 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2849709.3333333335, ans=0.125 2023-10-09 19:21:41,697 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2849756.0, ans=10.0 2023-10-09 19:21:44,522 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2023-10-09 19:22:01,809 INFO [train.py:1031] (0/4) Epoch 14, batch 25950, loss[loss=0.2167, simple_loss=0.2658, pruned_loss=0.06296, ctc_loss=0.1045, over 16735.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2883, pruned_loss=0.06261, ctc_loss=0.1091, over 3298628.07 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:22:03,248 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2849849.3333333335, ans=0.1 2023-10-09 19:22:08,213 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2849849.3333333335, ans=0.0 2023-10-09 19:22:28,772 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+02 2.824e+02 3.535e+02 4.166e+02 1.027e+03, threshold=7.071e+02, percent-clipped=2.0 2023-10-09 19:22:41,841 INFO [scaling.py:979] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.28 vs. 
limit=8.0 2023-10-09 19:22:42,392 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2849989.3333333335, ans=0.0 2023-10-09 19:22:43,513 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:22:51,460 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2850036.0, ans=0.05 2023-10-09 19:23:02,804 INFO [train.py:1031] (0/4) Epoch 14, batch 26000, loss[loss=0.2113, simple_loss=0.2649, pruned_loss=0.059, ctc_loss=0.09922, over 16687.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2857, pruned_loss=0.06354, ctc_loss=0.1103, over 3293056.31 frames. ], batch size: 111, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:23:17,565 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2850129.3333333335, ans=0.1 2023-10-09 19:23:20,716 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=22.5 2023-10-09 19:23:38,450 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2850176.0, ans=0.125 2023-10-09 19:23:42,949 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2023-10-09 19:23:53,196 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2850269.3333333335, ans=0.125 2023-10-09 19:23:53,243 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2850269.3333333335, ans=0.125 2023-10-09 19:24:04,541 INFO [train.py:1031] (0/4) Epoch 14, batch 26050, loss[loss=0.2302, simple_loss=0.3071, pruned_loss=0.05835, ctc_loss=0.09157, over 16703.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2855, pruned_loss=0.06199, ctc_loss=0.1078, over 3303260.14 frames. ], batch size: 102, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:24:07,084 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2850316.0, ans=0.1 2023-10-09 19:24:15,578 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2850362.6666666665, ans=0.125 2023-10-09 19:24:15,587 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2850362.6666666665, ans=0.125 2023-10-09 19:24:19,268 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2850362.6666666665, ans=0.125 2023-10-09 19:24:31,163 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.011e+02 3.542e+02 4.270e+02 6.836e+02, threshold=7.085e+02, percent-clipped=0.0 2023-10-09 19:24:34,058 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.05 vs. 
limit=15.0 2023-10-09 19:24:38,062 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2850409.3333333335, ans=0.2 2023-10-09 19:24:38,189 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=22.5 2023-10-09 19:24:39,122 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2850456.0, ans=0.1 2023-10-09 19:24:46,618 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2850456.0, ans=0.125 2023-10-09 19:24:46,818 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0 2023-10-09 19:25:04,427 INFO [train.py:1031] (0/4) Epoch 14, batch 26100, loss[loss=0.2204, simple_loss=0.2897, pruned_loss=0.05667, ctc_loss=0.09414, over 16949.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2878, pruned_loss=0.06143, ctc_loss=0.1057, over 3286529.69 frames. ], batch size: 86, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:25:24,880 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2850596.0, ans=0.125 2023-10-09 19:25:45,232 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2850689.3333333335, ans=0.125 2023-10-09 19:25:46,282 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2850689.3333333335, ans=0.1 2023-10-09 19:25:56,526 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-10-09 19:26:00,509 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2850736.0, ans=0.0 2023-10-09 19:26:06,160 INFO [train.py:1031] (0/4) Epoch 14, batch 26150, loss[loss=0.2377, simple_loss=0.2896, pruned_loss=0.06918, ctc_loss=0.1185, over 16861.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2904, pruned_loss=0.06294, ctc_loss=0.1078, over 3294872.59 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:26:10,605 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2850782.6666666665, ans=10.0 2023-10-09 19:26:23,783 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2850829.3333333335, ans=0.125 2023-10-09 19:26:32,047 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.14 vs. limit=15.0 2023-10-09 19:26:36,017 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+02 3.218e+02 3.789e+02 4.435e+02 6.214e+02, threshold=7.579e+02, percent-clipped=0.0 2023-10-09 19:26:46,783 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.68 vs. 
limit=15.0 2023-10-09 19:26:48,662 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:27:07,870 INFO [train.py:1031] (0/4) Epoch 14, batch 26200, loss[loss=0.179, simple_loss=0.2357, pruned_loss=0.04574, ctc_loss=0.07717, over 16815.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2872, pruned_loss=0.06253, ctc_loss=0.1068, over 3301080.66 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:27:10,258 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2851016.0, ans=0.125 2023-10-09 19:27:15,132 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2023-10-09 19:27:15,909 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2851016.0, ans=0.125 2023-10-09 19:27:33,991 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=15.0 2023-10-09 19:27:42,488 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2851109.3333333335, ans=0.125 2023-10-09 19:28:09,472 INFO [train.py:1031] (0/4) Epoch 14, batch 26250, loss[loss=0.1606, simple_loss=0.2016, pruned_loss=0.04494, ctc_loss=0.07432, over 16628.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2769, pruned_loss=0.05947, ctc_loss=0.1012, over 3295247.51 frames. ], batch size: 110, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:28:40,504 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851342.6666666665, ans=0.1 2023-10-09 19:28:43,499 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 3.073e+02 4.030e+02 5.136e+02 8.779e+02, threshold=8.059e+02, percent-clipped=2.0 2023-10-09 19:29:13,858 INFO [train.py:1031] (0/4) Epoch 14, batch 26300, loss[loss=0.2419, simple_loss=0.3022, pruned_loss=0.06753, ctc_loss=0.1161, over 16848.00 frames. ], tot_loss[loss=0.2179, simple_loss=0.2775, pruned_loss=0.05902, ctc_loss=0.1008, over 3283068.74 frames. ], batch size: 228, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:29:36,204 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2851529.3333333335, ans=0.125 2023-10-09 19:29:36,290 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2851529.3333333335, ans=0.125 2023-10-09 19:29:36,560 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.17 vs. 
limit=15.0 2023-10-09 19:29:42,665 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2851576.0, ans=0.125 2023-10-09 19:29:42,759 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2851576.0, ans=0.07 2023-10-09 19:30:10,771 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:30:14,345 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.76 vs. limit=10.0 2023-10-09 19:30:18,137 INFO [train.py:1031] (0/4) Epoch 14, batch 26350, loss[loss=0.2601, simple_loss=0.3291, pruned_loss=0.07076, ctc_loss=0.1239, over 16775.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.287, pruned_loss=0.06293, ctc_loss=0.1089, over 3290687.06 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:30:21,749 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2851716.0, ans=0.125 2023-10-09 19:30:23,658 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.32 vs. limit=10.0 2023-10-09 19:30:25,564 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2851716.0, ans=0.1 2023-10-09 19:30:30,529 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2851762.6666666665, ans=0.125 2023-10-09 19:30:46,271 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=15.0 2023-10-09 19:30:49,813 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+02 3.520e+02 4.150e+02 4.845e+02 1.370e+03, threshold=8.299e+02, percent-clipped=2.0 2023-10-09 19:30:53,629 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2851809.3333333335, ans=0.125 2023-10-09 19:31:20,349 INFO [train.py:1031] (0/4) Epoch 14, batch 26400, loss[loss=0.2111, simple_loss=0.2775, pruned_loss=0.05403, ctc_loss=0.09135, over 16789.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2908, pruned_loss=0.06485, ctc_loss=0.1127, over 3297505.55 frames. ], batch size: 102, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:31:39,302 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0 2023-10-09 19:31:45,830 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2852042.6666666665, ans=0.125 2023-10-09 19:32:05,675 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.37 vs. limit=10.0 2023-10-09 19:32:24,384 INFO [train.py:1031] (0/4) Epoch 14, batch 26450, loss[loss=0.2448, simple_loss=0.3389, pruned_loss=0.05659, ctc_loss=0.09361, over 15012.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2898, pruned_loss=0.06334, ctc_loss=0.1104, over 3298129.57 frames. 
], batch size: 527, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:32:27,477 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2852182.6666666665, ans=0.1 2023-10-09 19:32:37,551 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2023-10-09 19:32:53,613 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2852276.0, ans=0.125 2023-10-09 19:32:56,196 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.02 vs. limit=15.0 2023-10-09 19:32:58,050 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+02 3.048e+02 3.586e+02 4.298e+02 7.757e+02, threshold=7.171e+02, percent-clipped=0.0 2023-10-09 19:33:17,931 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.26 vs. limit=10.0 2023-10-09 19:33:25,446 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.30 vs. limit=15.0 2023-10-09 19:33:28,750 INFO [train.py:1031] (0/4) Epoch 14, batch 26500, loss[loss=0.2726, simple_loss=0.313, pruned_loss=0.08688, ctc_loss=0.146, over 16730.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.2897, pruned_loss=0.06364, ctc_loss=0.1101, over 3297744.44 frames. ], batch size: 140, lr: 2.54e-03, grad_scale: 8.0 2023-10-09 19:34:03,716 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2852509.3333333335, ans=0.04949747468305833 2023-10-09 19:34:10,401 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2852556.0, ans=0.5 2023-10-09 19:34:15,901 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2852556.0, ans=0.125 2023-10-09 19:34:16,114 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0 2023-10-09 19:34:26,398 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2852602.6666666665, ans=0.0 2023-10-09 19:34:30,313 INFO [train.py:1031] (0/4) Epoch 14, batch 26550, loss[loss=0.2294, simple_loss=0.3051, pruned_loss=0.05594, ctc_loss=0.1045, over 16206.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.2925, pruned_loss=0.06602, ctc_loss=0.1144, over 3302158.89 frames. ], batch size: 463, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:34:32,158 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-10-09 19:34:37,953 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.48 vs. 
limit=15.0 2023-10-09 19:34:41,455 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2852649.3333333335, ans=0.125 2023-10-09 19:35:06,510 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+02 3.537e+02 4.198e+02 5.222e+02 9.143e+02, threshold=8.395e+02, percent-clipped=3.0 2023-10-09 19:35:11,908 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2852789.3333333335, ans=0.125 2023-10-09 19:35:14,274 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0 2023-10-09 19:35:17,739 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2852789.3333333335, ans=0.0 2023-10-09 19:35:32,025 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2023-10-09 19:35:32,288 INFO [train.py:1031] (0/4) Epoch 14, batch 26600, loss[loss=0.2404, simple_loss=0.3436, pruned_loss=0.04977, ctc_loss=0.09413, over 15152.00 frames. ], tot_loss[loss=0.2359, simple_loss=0.2961, pruned_loss=0.06512, ctc_loss=0.1136, over 3281017.70 frames. ], batch size: 526, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:35:42,143 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2852882.6666666665, ans=0.125 2023-10-09 19:35:44,645 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=22.5 2023-10-09 19:35:45,931 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2852929.3333333335, ans=0.2 2023-10-09 19:35:48,696 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2852929.3333333335, ans=0.05 2023-10-09 19:36:19,856 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2853022.6666666665, ans=0.2 2023-10-09 19:36:20,834 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2853069.3333333335, ans=0.125 2023-10-09 19:36:30,564 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=15.0 2023-10-09 19:36:32,195 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2853069.3333333335, ans=0.0 2023-10-09 19:36:34,517 INFO [train.py:1031] (0/4) Epoch 14, batch 26650, loss[loss=0.1829, simple_loss=0.2543, pruned_loss=0.04085, ctc_loss=0.07465, over 16785.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.294, pruned_loss=0.06097, ctc_loss=0.1075, over 3294530.06 frames. 
], batch size: 164, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 19:36:43,637 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2853116.0, ans=0.0 2023-10-09 19:36:45,264 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2853116.0, ans=0.125 2023-10-09 19:36:47,542 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2853162.6666666665, ans=0.0 2023-10-09 19:36:48,584 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2853162.6666666665, ans=0.0 2023-10-09 19:36:58,031 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2853209.3333333335, ans=0.1 2023-10-09 19:36:58,069 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2853209.3333333335, ans=0.2 2023-10-09 19:37:10,950 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 3.027e+02 3.490e+02 4.414e+02 7.979e+02, threshold=6.980e+02, percent-clipped=0.0 2023-10-09 19:37:18,295 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2853256.0, ans=0.125 2023-10-09 19:37:35,088 INFO [train.py:1031] (0/4) Epoch 14, batch 26700, loss[loss=0.2026, simple_loss=0.2597, pruned_loss=0.05368, ctc_loss=0.09566, over 16797.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2887, pruned_loss=0.05898, ctc_loss=0.1049, over 3296069.67 frames. ], batch size: 164, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:37:43,329 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=12.0 2023-10-09 19:37:43,995 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2853349.3333333335, ans=0.05 2023-10-09 19:38:03,205 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2853442.6666666665, ans=0.125 2023-10-09 19:38:13,914 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.82 vs. limit=15.0 2023-10-09 19:38:36,854 INFO [train.py:1031] (0/4) Epoch 14, batch 26750, loss[loss=0.1762, simple_loss=0.2154, pruned_loss=0.05105, ctc_loss=0.08724, over 9972.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.2817, pruned_loss=0.05923, ctc_loss=0.1051, over 3278887.89 frames. 
], batch size: 35, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:38:42,145 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2853582.6666666665, ans=0.1 2023-10-09 19:39:05,778 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2853676.0, ans=0.0 2023-10-09 19:39:14,602 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.317e+02 3.221e+02 3.735e+02 4.264e+02 6.455e+02, threshold=7.471e+02, percent-clipped=0.0 2023-10-09 19:39:36,517 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2853769.3333333335, ans=0.0 2023-10-09 19:39:38,944 INFO [train.py:1031] (0/4) Epoch 14, batch 26800, loss[loss=0.2103, simple_loss=0.2634, pruned_loss=0.05838, ctc_loss=0.1012, over 16589.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.277, pruned_loss=0.0585, ctc_loss=0.1036, over 3288720.22 frames. ], batch size: 151, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:39:55,575 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2853862.6666666665, ans=0.125 2023-10-09 19:40:09,770 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2853909.3333333335, ans=0.2 2023-10-09 19:40:17,888 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2853956.0, ans=0.125 2023-10-09 19:40:32,821 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2854002.6666666665, ans=0.125 2023-10-09 19:40:34,544 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2854002.6666666665, ans=0.125 2023-10-09 19:40:40,122 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2854002.6666666665, ans=0.95 2023-10-09 19:40:41,948 INFO [train.py:1031] (0/4) Epoch 14, batch 26850, loss[loss=0.242, simple_loss=0.3145, pruned_loss=0.0625, ctc_loss=0.1111, over 16719.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2805, pruned_loss=0.06135, ctc_loss=0.1084, over 3294490.47 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:41:21,410 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+02 3.526e+02 4.022e+02 4.797e+02 9.323e+02, threshold=8.043e+02, percent-clipped=3.0 2023-10-09 19:41:23,457 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2854189.3333333335, ans=0.0 2023-10-09 19:41:45,194 INFO [train.py:1031] (0/4) Epoch 14, batch 26900, loss[loss=0.3064, simple_loss=0.3565, pruned_loss=0.09335, ctc_loss=0.1743, over 16613.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2864, pruned_loss=0.06178, ctc_loss=0.1093, over 3296218.39 frames. ], batch size: 351, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:42:09,634 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.23 vs. 
limit=15.0 2023-10-09 19:42:15,403 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2854376.0, ans=0.125 2023-10-09 19:42:22,080 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2854422.6666666665, ans=0.0 2023-10-09 19:42:27,025 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2854422.6666666665, ans=0.125 2023-10-09 19:42:28,014 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2854422.6666666665, ans=0.0 2023-10-09 19:42:40,432 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2854469.3333333335, ans=0.0 2023-10-09 19:42:45,271 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2854469.3333333335, ans=0.2 2023-10-09 19:42:47,764 INFO [train.py:1031] (0/4) Epoch 14, batch 26950, loss[loss=0.2409, simple_loss=0.2955, pruned_loss=0.06699, ctc_loss=0.131, over 15243.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2855, pruned_loss=0.06129, ctc_loss=0.1088, over 3300376.87 frames. ], batch size: 527, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:42:51,811 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2854516.0, ans=0.125 2023-10-09 19:43:26,398 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.573e+02 3.110e+02 3.559e+02 4.212e+02 9.939e+02, threshold=7.118e+02, percent-clipped=2.0 2023-10-09 19:43:46,068 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2854702.6666666665, ans=0.2 2023-10-09 19:43:48,325 INFO [train.py:1031] (0/4) Epoch 14, batch 27000, loss[loss=0.2277, simple_loss=0.2701, pruned_loss=0.06992, ctc_loss=0.1138, over 16956.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2796, pruned_loss=0.06131, ctc_loss=0.1083, over 3292073.81 frames. ], batch size: 86, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:43:48,326 INFO [train.py:1054] (0/4) Computing validation loss 2023-10-09 19:43:59,253 INFO [zipformer.py:1853] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9950, 2.3785, 3.9469, 1.7886], device='cuda:0') 2023-10-09 19:44:04,051 INFO [zipformer.py:1853] (0/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.8930, 2.9232, 1.8095, 2.2159], device='cuda:0') 2023-10-09 19:44:06,712 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2336, simple_loss=0.3018, pruned_loss=0.06376, ctc_loss=0.09459, over 1796401.00 frames. 2023-10-09 19:44:06,713 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14587MB 2023-10-09 19:44:29,572 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.79 vs. limit=15.0 2023-10-09 19:44:59,732 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. 
limit=15.0 2023-10-09 19:45:02,103 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2854936.0, ans=0.125 2023-10-09 19:45:05,639 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2854982.6666666665, ans=0.125 2023-10-09 19:45:06,468 INFO [train.py:1031] (0/4) Epoch 14, batch 27050, loss[loss=0.1968, simple_loss=0.2533, pruned_loss=0.05363, ctc_loss=0.08259, over 16749.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2742, pruned_loss=0.05994, ctc_loss=0.1051, over 3294465.48 frames. ], batch size: 188, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:45:09,788 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2854982.6666666665, ans=0.05 2023-10-09 19:45:14,422 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2854982.6666666665, ans=0.125 2023-10-09 19:45:16,573 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2855029.3333333335, ans=0.1 2023-10-09 19:45:20,469 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2855029.3333333335, ans=0.125 2023-10-09 19:45:22,418 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2855029.3333333335, ans=0.125 2023-10-09 19:45:44,970 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.824e+02 3.206e+02 4.209e+02 1.336e+03, threshold=6.413e+02, percent-clipped=5.0 2023-10-09 19:46:05,132 INFO [train.py:1031] (0/4) Epoch 14, batch 27100, loss[loss=0.2025, simple_loss=0.2695, pruned_loss=0.05165, ctc_loss=0.08053, over 16762.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2695, pruned_loss=0.05824, ctc_loss=0.1014, over 3298171.51 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:46:30,801 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2855309.3333333335, ans=0.0 2023-10-09 19:46:34,504 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2855309.3333333335, ans=0.1 2023-10-09 19:46:43,261 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2023-10-09 19:46:45,215 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2855356.0, ans=0.125 2023-10-09 19:46:58,339 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2855402.6666666665, ans=0.1 2023-10-09 19:46:58,771 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.45 vs. limit=15.0 2023-10-09 19:47:04,173 INFO [train.py:1031] (0/4) Epoch 14, batch 27150, loss[loss=0.2242, simple_loss=0.2812, pruned_loss=0.06339, ctc_loss=0.1009, over 16777.00 frames. ], tot_loss[loss=0.2158, simple_loss=0.2705, pruned_loss=0.0597, ctc_loss=0.1039, over 3300443.39 frames. 
], batch size: 111, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:47:17,294 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2855496.0, ans=0.125 2023-10-09 19:47:22,717 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.59 vs. limit=22.5 2023-10-09 19:47:28,841 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.70 vs. limit=12.0 2023-10-09 19:47:39,683 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2855542.6666666665, ans=0.0 2023-10-09 19:47:46,372 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.370e+02 3.038e+02 3.523e+02 4.275e+02 1.319e+03, threshold=7.047e+02, percent-clipped=7.0 2023-10-09 19:47:52,700 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2855636.0, ans=0.125 2023-10-09 19:47:54,382 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2855636.0, ans=0.125 2023-10-09 19:47:58,247 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2855636.0, ans=0.125 2023-10-09 19:48:05,720 INFO [train.py:1031] (0/4) Epoch 14, batch 27200, loss[loss=0.2106, simple_loss=0.2739, pruned_loss=0.05445, ctc_loss=0.09592, over 16788.00 frames. ], tot_loss[loss=0.2214, simple_loss=0.2791, pruned_loss=0.06062, ctc_loss=0.1062, over 3303647.13 frames. ], batch size: 121, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:48:36,080 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2855776.0, ans=0.025 2023-10-09 19:49:06,308 INFO [train.py:1031] (0/4) Epoch 14, batch 27250, loss[loss=0.1893, simple_loss=0.2509, pruned_loss=0.04635, ctc_loss=0.08765, over 16721.00 frames. ], tot_loss[loss=0.2214, simple_loss=0.2804, pruned_loss=0.06014, ctc_loss=0.1054, over 3280729.30 frames. ], batch size: 102, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:49:23,712 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0 2023-10-09 19:49:27,233 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-w-ctc/checkpoint-612000.pt 2023-10-09 19:49:29,941 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2855962.6666666665, ans=0.1 2023-10-09 19:49:31,646 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:49:34,262 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2023-10-09 19:49:34,271 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.44 vs. 
limit=10.0 2023-10-09 19:49:43,018 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2856009.3333333335, ans=22.5 2023-10-09 19:49:49,656 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2856056.0, ans=0.125 2023-10-09 19:49:50,617 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2856056.0, ans=0.125 2023-10-09 19:49:51,926 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 3.213e+02 3.949e+02 4.744e+02 1.249e+03, threshold=7.899e+02, percent-clipped=6.0 2023-10-09 19:50:00,280 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2023-10-09 19:50:10,445 INFO [train.py:1031] (0/4) Epoch 14, batch 27300, loss[loss=0.1798, simple_loss=0.2408, pruned_loss=0.04473, ctc_loss=0.07323, over 16895.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2756, pruned_loss=0.05966, ctc_loss=0.1051, over 3288237.44 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:51:02,085 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2856336.0, ans=0.0 2023-10-09 19:51:13,358 INFO [train.py:1031] (0/4) Epoch 14, batch 27350, loss[loss=0.2015, simple_loss=0.2816, pruned_loss=0.04404, ctc_loss=0.08323, over 16778.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2733, pruned_loss=0.0569, ctc_loss=0.1006, over 3287863.13 frames. ], batch size: 272, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:51:14,737 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2856382.6666666665, ans=0.0 2023-10-09 19:51:32,374 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2856429.3333333335, ans=0.125 2023-10-09 19:51:36,488 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2023-10-09 19:51:51,809 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2856522.6666666665, ans=0.2 2023-10-09 19:51:55,507 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2856522.6666666665, ans=0.0 2023-10-09 19:51:58,975 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.714e+02 3.156e+02 4.138e+02 1.229e+03, threshold=6.312e+02, percent-clipped=2.0 2023-10-09 19:52:15,404 INFO [train.py:1031] (0/4) Epoch 14, batch 27400, loss[loss=0.1852, simple_loss=0.2515, pruned_loss=0.04408, ctc_loss=0.07673, over 16880.00 frames. ], tot_loss[loss=0.2079, simple_loss=0.2702, pruned_loss=0.05376, ctc_loss=0.09553, over 3277630.80 frames. 
], batch size: 216, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:52:17,739 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2856616.0, ans=0.1 2023-10-09 19:52:24,576 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2856616.0, ans=0.1 2023-10-09 19:52:25,404 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2856616.0, ans=0.125 2023-10-09 19:52:35,819 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2856662.6666666665, ans=0.05 2023-10-09 19:52:37,871 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2856662.6666666665, ans=0.125 2023-10-09 19:52:46,509 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2856709.3333333335, ans=0.0 2023-10-09 19:52:51,416 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2856756.0, ans=0.0 2023-10-09 19:53:15,201 INFO [train.py:1031] (0/4) Epoch 14, batch 27450, loss[loss=0.236, simple_loss=0.2814, pruned_loss=0.06981, ctc_loss=0.1277, over 16956.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2658, pruned_loss=0.05399, ctc_loss=0.09555, over 3274866.72 frames. ], batch size: 78, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:54:00,072 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.825e+02 3.198e+02 4.029e+02 6.832e+02, threshold=6.397e+02, percent-clipped=4.0 2023-10-09 19:54:14,411 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2857036.0, ans=0.0 2023-10-09 19:54:16,224 INFO [train.py:1031] (0/4) Epoch 14, batch 27500, loss[loss=0.2108, simple_loss=0.2582, pruned_loss=0.06116, ctc_loss=0.1025, over 16691.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2675, pruned_loss=0.05398, ctc_loss=0.09578, over 3267061.90 frames. ], batch size: 151, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:54:19,140 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2857082.6666666665, ans=0.125 2023-10-09 19:54:24,595 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2857082.6666666665, ans=0.125 2023-10-09 19:54:45,048 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2857176.0, ans=0.0 2023-10-09 19:54:58,537 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2857222.6666666665, ans=0.1 2023-10-09 19:55:15,394 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2857269.3333333335, ans=0.0 2023-10-09 19:55:16,501 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2857316.0, ans=0.125 2023-10-09 19:55:17,198 INFO [train.py:1031] (0/4) Epoch 14, batch 27550, loss[loss=0.2252, simple_loss=0.2649, pruned_loss=0.06964, ctc_loss=0.1153, over 16804.00 frames. 
], tot_loss[loss=0.2076, simple_loss=0.2661, pruned_loss=0.0551, ctc_loss=0.09738, over 3271177.58 frames. ], batch size: 141, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:55:19,639 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2857316.0, ans=0.1 2023-10-09 19:55:27,872 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2857316.0, ans=0.125 2023-10-09 19:55:50,136 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:55:53,451 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2857409.3333333335, ans=0.0 2023-10-09 19:56:04,490 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2023-10-09 19:56:06,684 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.165e+02 3.747e+02 4.293e+02 1.170e+03, threshold=7.493e+02, percent-clipped=3.0 2023-10-09 19:56:20,205 INFO [train.py:1031] (0/4) Epoch 14, batch 27600, loss[loss=0.2476, simple_loss=0.3151, pruned_loss=0.06643, ctc_loss=0.118, over 16163.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2662, pruned_loss=0.05673, ctc_loss=0.09999, over 3275896.71 frames. ], batch size: 463, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:56:21,700 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2857549.3333333335, ans=0.125 2023-10-09 19:56:21,935 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.76 vs. limit=10.0 2023-10-09 19:56:23,393 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:56:29,853 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2857549.3333333335, ans=0.0 2023-10-09 19:56:32,032 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2857596.0, ans=0.125 2023-10-09 19:56:45,691 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2857642.6666666665, ans=0.0 2023-10-09 19:57:02,611 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2857689.3333333335, ans=0.125 2023-10-09 19:57:02,617 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2857689.3333333335, ans=0.125 2023-10-09 19:57:08,210 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2857689.3333333335, ans=0.04949747468305833 2023-10-09 19:57:18,622 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2023-10-09 19:57:22,123 INFO [train.py:1031] (0/4) Epoch 14, batch 27650, loss[loss=0.2451, simple_loss=0.2905, pruned_loss=0.07292, ctc_loss=0.1349, over 16384.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.2697, pruned_loss=0.0572, ctc_loss=0.1011, over 3287703.65 frames. 
], batch size: 417, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:57:29,653 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2857782.6666666665, ans=0.125 2023-10-09 19:57:45,699 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2857829.3333333335, ans=0.0 2023-10-09 19:57:59,227 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2857922.6666666665, ans=0.0 2023-10-09 19:58:02,433 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2857922.6666666665, ans=0.125 2023-10-09 19:58:10,218 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+02 3.178e+02 3.688e+02 4.513e+02 1.131e+03, threshold=7.375e+02, percent-clipped=1.0 2023-10-09 19:58:24,465 INFO [train.py:1031] (0/4) Epoch 14, batch 27700, loss[loss=0.178, simple_loss=0.2297, pruned_loss=0.04703, ctc_loss=0.08061, over 16873.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2674, pruned_loss=0.05798, ctc_loss=0.1023, over 3292434.19 frames. ], batch size: 189, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:58:24,869 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2858016.0, ans=0.125 2023-10-09 19:58:25,190 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2023-10-09 19:58:40,320 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2858062.6666666665, ans=0.125 2023-10-09 19:58:49,040 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2858109.3333333335, ans=0.125 2023-10-09 19:58:51,970 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2858109.3333333335, ans=15.0 2023-10-09 19:58:59,499 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2858156.0, ans=0.1 2023-10-09 19:59:06,755 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2858156.0, ans=0.0 2023-10-09 19:59:13,723 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2858202.6666666665, ans=0.125 2023-10-09 19:59:15,739 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2858202.6666666665, ans=0.2 2023-10-09 19:59:24,059 INFO [train.py:1031] (0/4) Epoch 14, batch 27750, loss[loss=0.2212, simple_loss=0.2677, pruned_loss=0.06555, ctc_loss=0.1087, over 16840.00 frames. ], tot_loss[loss=0.2125, simple_loss=0.2655, pruned_loss=0.05898, ctc_loss=0.1036, over 3290754.41 frames. 
], batch size: 176, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:59:25,473 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2858249.3333333335, ans=0.0 2023-10-09 19:59:27,176 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2858249.3333333335, ans=0.125 2023-10-09 19:59:29,796 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.47 vs. limit=15.0 2023-10-09 20:00:04,967 WARNING [train.py:1204] (0/4) Exclude cut with ID R0014_M0086-0174-157 from training. Number of frames (before subsampling): 147. Number of frames (after subsampling): 35. Text: 你买多少东西一会儿他就送你这么多东西啊啊三大桶那三大桶得用多少时间就啊. Tokens: ['▁你', '买', '多', '少', '东', '西', '一', '会', '儿', '他', '就', '送', '你', '这', '么', '多', '东', '西', '啊', '啊', '三', '大', '<0xE6>', '<0xA1>', '<0xB6>', '那', '三', '大', '<0xE6>', '<0xA1>', '<0xB6>', '得', '用', '多', '少', '时', '间', '就', '啊']. Number of tokens: 39 2023-10-09 20:00:12,260 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2858436.0, ans=0.1 2023-10-09 20:00:14,002 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+02 3.399e+02 3.890e+02 4.499e+02 8.877e+02, threshold=7.779e+02, percent-clipped=2.0 2023-10-09 20:00:24,188 INFO [train.py:1031] (0/4) Epoch 14, batch 27800, loss[loss=0.238, simple_loss=0.2833, pruned_loss=0.07153, ctc_loss=0.124, over 16935.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2679, pruned_loss=0.06118, ctc_loss=0.1075, over 3305574.78 frames. ], batch size: 292, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:00:36,574 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2858529.3333333335, ans=0.125 2023-10-09 20:00:40,038 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=12.0 2023-10-09 20:00:51,721 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2858576.0, ans=0.125 2023-10-09 20:00:53,389 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2858576.0, ans=0.125 2023-10-09 20:01:13,647 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=22.5 2023-10-09 20:01:22,775 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-10-09 20:01:25,645 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2023-10-09 20:01:27,836 INFO [train.py:1031] (0/4) Epoch 14, batch 27850, loss[loss=0.2151, simple_loss=0.2846, pruned_loss=0.05239, ctc_loss=0.1024, over 16875.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2762, pruned_loss=0.06373, ctc_loss=0.1128, over 3302456.04 frames. 
], batch size: 228, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 20:01:31,430 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2858716.0, ans=0.1 2023-10-09 20:01:37,132 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:02:18,208 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.539e+02 3.601e+02 4.394e+02 5.369e+02 1.444e+03, threshold=8.787e+02, percent-clipped=3.0 2023-10-09 20:02:19,673 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2858902.6666666665, ans=0.125 2023-10-09 20:02:27,369 INFO [train.py:1031] (0/4) Epoch 14, batch 27900, loss[loss=0.2042, simple_loss=0.2755, pruned_loss=0.04804, ctc_loss=0.0919, over 16837.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2803, pruned_loss=0.06276, ctc_loss=0.1125, over 3304959.95 frames. ], batch size: 215, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:02:27,721 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2858949.3333333335, ans=0.0 2023-10-09 20:02:39,327 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.70 vs. limit=6.0 2023-10-09 20:02:49,564 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-10-09 20:02:52,132 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2859042.6666666665, ans=0.0 2023-10-09 20:03:20,606 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2859136.0, ans=0.125 2023-10-09 20:03:22,714 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2859136.0, ans=0.0 2023-10-09 20:03:26,749 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0 2023-10-09 20:03:29,819 INFO [train.py:1031] (0/4) Epoch 14, batch 27950, loss[loss=0.1629, simple_loss=0.2518, pruned_loss=0.02646, ctc_loss=0.05259, over 16894.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.277, pruned_loss=0.05812, ctc_loss=0.1051, over 3299698.37 frames. ], batch size: 243, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:03:37,646 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2859182.6666666665, ans=0.125 2023-10-09 20:03:43,719 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2859229.3333333335, ans=0.2 2023-10-09 20:04:03,459 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. 
2023-10-09 20:04:04,273 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2859276.0, ans=0.0 2023-10-09 20:04:04,281 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2859276.0, ans=0.125 2023-10-09 20:04:21,888 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.805e+02 3.200e+02 4.012e+02 8.186e+02, threshold=6.399e+02, percent-clipped=0.0 2023-10-09 20:04:24,387 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=12.0 2023-10-09 20:04:30,019 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=12.0 2023-10-09 20:04:30,197 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0 2023-10-09 20:04:31,540 INFO [train.py:1031] (0/4) Epoch 14, batch 28000, loss[loss=0.2305, simple_loss=0.265, pruned_loss=0.07217, ctc_loss=0.1291, over 16554.00 frames. ], tot_loss[loss=0.2131, simple_loss=0.2721, pruned_loss=0.05664, ctc_loss=0.1022, over 3302139.01 frames. ], batch size: 384, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:04:50,612 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2859462.6666666665, ans=0.0 2023-10-09 20:05:00,853 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2859509.3333333335, ans=0.0 2023-10-09 20:05:16,378 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2023-10-09 20:05:18,164 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0 2023-10-09 20:05:31,052 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2859602.6666666665, ans=0.125 2023-10-09 20:05:32,261 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2859602.6666666665, ans=0.1 2023-10-09 20:05:33,910 INFO [train.py:1031] (0/4) Epoch 14, batch 28050, loss[loss=0.2138, simple_loss=0.261, pruned_loss=0.0619, ctc_loss=0.1068, over 16702.00 frames. ], tot_loss[loss=0.2142, simple_loss=0.2697, pruned_loss=0.05839, ctc_loss=0.1045, over 3302848.50 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:05:40,318 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.29 vs. limit=10.0 2023-10-09 20:05:42,793 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2859649.3333333335, ans=0.05 2023-10-09 20:06:08,045 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2859742.6666666665, ans=0.125 2023-10-09 20:06:09,382 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.96 vs.
limit=15.0 2023-10-09 20:06:10,232 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2859789.3333333335, ans=0.125 2023-10-09 20:06:21,720 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2859836.0, ans=0.0 2023-10-09 20:06:25,713 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.244e+02 3.661e+02 4.395e+02 6.655e+02, threshold=7.321e+02, percent-clipped=2.0 2023-10-09 20:06:32,325 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2859836.0, ans=0.1 2023-10-09 20:06:34,744 INFO [train.py:1031] (0/4) Epoch 14, batch 28100, loss[loss=0.2514, simple_loss=0.3002, pruned_loss=0.07471, ctc_loss=0.1333, over 16972.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2704, pruned_loss=0.06048, ctc_loss=0.1074, over 3309386.93 frames. ], batch size: 309, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:06:35,756 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2859882.6666666665, ans=0.0 2023-10-09 20:06:54,254 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2859929.3333333335, ans=0.125 2023-10-09 20:07:22,976 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2860022.6666666665, ans=0.1 2023-10-09 20:07:27,842 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2860069.3333333335, ans=0.0 2023-10-09 20:07:28,078 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0 2023-10-09 20:07:39,091 INFO [train.py:1031] (0/4) Epoch 14, batch 28150, loss[loss=0.2436, simple_loss=0.3404, pruned_loss=0.05239, ctc_loss=0.105, over 16872.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2791, pruned_loss=0.06096, ctc_loss=0.1093, over 3312782.96 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:08:34,456 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.258e+02 3.630e+02 4.315e+02 7.484e+02, threshold=7.260e+02, percent-clipped=1.0 2023-10-09 20:08:41,525 INFO [train.py:1031] (0/4) Epoch 14, batch 28200, loss[loss=0.2511, simple_loss=0.3044, pruned_loss=0.07302, ctc_loss=0.1295, over 16728.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.288, pruned_loss=0.06349, ctc_loss=0.114, over 3312829.68 frames. ], batch size: 151, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:09:02,409 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2860396.0, ans=0.1 2023-10-09 20:09:07,195 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2860442.6666666665, ans=0.125 2023-10-09 20:09:10,622 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=12.0 2023-10-09 20:09:43,224 INFO [train.py:1031] (0/4) Epoch 14, batch 28250, loss[loss=0.2876, simple_loss=0.3035, pruned_loss=0.09977, ctc_loss=0.1804, over 16552.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.2906, pruned_loss=0.06728, ctc_loss=0.1193, over 3316749.59 frames. ], batch size: 416, lr: 2.54e-03, grad_scale: 2.0
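Each loss[...] / tot_loss[...] tuple in these train.py:1031 entries is consistent with a fixed weighting of the three components, loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss (evidently the run's simple_loss and ctc_loss scales). For the batch-28250 entry just above this gives 0.5 * 0.3035 + 0.09977 + 0.2 * 0.1804 = 0.2876, exactly the logged value. A quick self-check of that relationship:

def combined_loss(simple_loss: float, pruned_loss: float, ctc_loss: float,
                  simple_scale: float = 0.5, ctc_scale: float = 0.2) -> float:
    # weighting that reproduces the loss[...] tuples in this log
    return simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss

# batch 28250: per-batch loss and running tot_loss from the entry above
assert abs(combined_loss(0.3035, 0.09977, 0.1804) - 0.2876) < 1e-4
assert abs(combined_loss(0.2906, 0.06728, 0.1193) - 0.2364) < 1e-4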
2023-10-09 20:10:41,218 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+02 3.506e+02 4.003e+02 4.873e+02 1.007e+03, threshold=8.006e+02, percent-clipped=4.0 2023-10-09 20:10:46,101 INFO [train.py:1031] (0/4) Epoch 14, batch 28300, loss[loss=0.2011, simple_loss=0.2605, pruned_loss=0.05202, ctc_loss=0.09412, over 16859.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.2903, pruned_loss=0.06846, ctc_loss=0.121, over 3319606.90 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:11:00,732 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2023-10-09 20:11:09,655 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2860909.3333333335, ans=0.125 2023-10-09 20:11:23,617 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2860956.0, ans=0.2 2023-10-09 20:11:23,659 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2860956.0, ans=0.0 2023-10-09 20:11:27,827 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2860956.0, ans=0.125 2023-10-09 20:11:44,905 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2861002.6666666665, ans=0.125 2023-10-09 20:11:48,188 INFO [train.py:1031] (0/4) Epoch 14, batch 28350, loss[loss=0.2235, simple_loss=0.2668, pruned_loss=0.06703, ctc_loss=0.1153, over 16663.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2837, pruned_loss=0.06721, ctc_loss=0.1184, over 3315950.05 frames. ], batch size: 271, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:11:49,591 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:12:14,248 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2861142.6666666665, ans=10.0 2023-10-09 20:12:26,096 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2861189.3333333335, ans=0.2 2023-10-09 20:12:33,417 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2861189.3333333335, ans=0.125 2023-10-09 20:12:46,030 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.563e+02 3.327e+02 3.829e+02 4.439e+02 7.732e+02, threshold=7.659e+02, percent-clipped=0.0 2023-10-09 20:12:46,457 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2861236.0, ans=0.0 2023-10-09 20:12:46,508 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2861236.0, ans=0.1 2023-10-09 20:12:49,600 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2861282.6666666665, ans=0.09899494936611666 2023-10-09 20:12:49,950 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=12.0
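The scaling.py:979 Whitening entries (such as metric=4.71 vs. limit=12.0 immediately above) compare a per-module statistic against a limit; the penalty that keeps activations "white" only engages when the metric exceeds it. Below is a sketch of one way such a metric could be computed, assuming it measures how far the within-group covariance of the activations is from a multiple of the identity (exactly 1.0 for perfectly white features, larger otherwise); this illustrates the idea, not necessarily icefall's exact formula.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels). Returns a value >= 1.0; the larger,
    # the less 'white' (isotropic) the features are within each group.
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups                   # channels per group
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)                # center each group
    cov = x.transpose(1, 2) @ x                        # (groups, cpg, cpg) covariance
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean()     # mean eigenvalue, via tr(C)/d
    mean_eig_sq = (cov ** 2).sum() / (num_groups * cpg)  # mean squared eigenvalue, via tr(C @ C)/d
    # E[lambda^2] / E[lambda]^2 equals 1 iff all eigenvalues are equal
    return float(mean_eig_sq / (mean_eig ** 2 + 1e-20))

print(whitening_metric(torch.randn(10000, 384), num_groups=1))  # ~1, nearly white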
2023-10-09 20:12:50,326 INFO [train.py:1031] (0/4) Epoch 14, batch 28400, loss[loss=0.3285, simple_loss=0.3791, pruned_loss=0.1023, ctc_loss=0.1829, over 16624.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2864, pruned_loss=0.06737, ctc_loss=0.1186, over 3312432.41 frames. ], batch size: 351, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:13:19,336 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2861376.0, ans=0.0 2023-10-09 20:13:28,789 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2861376.0, ans=0.125 2023-10-09 20:13:32,619 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2861422.6666666665, ans=0.1 2023-10-09 20:13:37,588 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2861422.6666666665, ans=0.125 2023-10-09 20:13:51,493 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0 2023-10-09 20:13:56,934 INFO [train.py:1031] (0/4) Epoch 14, batch 28450, loss[loss=0.2592, simple_loss=0.3312, pruned_loss=0.06828, ctc_loss=0.1269, over 16825.00 frames. ], tot_loss[loss=0.2414, simple_loss=0.2977, pruned_loss=0.06832, ctc_loss=0.1211, over 3302316.48 frames. ], batch size: 291, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:14:15,675 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2861562.6666666665, ans=0.2 2023-10-09 20:14:25,193 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2023-10-09 20:14:28,357 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2023-10-09 20:14:35,076 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2861656.0, ans=0.125 2023-10-09 20:14:39,370 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2023-10-09 20:14:43,451 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=15.0 2023-10-09 20:14:58,087 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.469e+02 3.582e+02 4.557e+02 5.514e+02 1.079e+03, threshold=9.115e+02, percent-clipped=9.0 2023-10-09 20:14:58,551 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2861702.6666666665, ans=0.0 2023-10-09 20:14:59,573 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2861702.6666666665, ans=10.0 2023-10-09 20:15:01,355 INFO [train.py:1031] (0/4) Epoch 14, batch 28500, loss[loss=0.2573, simple_loss=0.3315, pruned_loss=0.06584, ctc_loss=0.1285, over 16799.00 frames. ], tot_loss[loss=0.2456, simple_loss=0.3051, pruned_loss=0.06864, ctc_loss=0.1221, over 3306702.92 frames.
], batch size: 328, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:15:15,800 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2861796.0, ans=0.0 2023-10-09 20:16:03,042 INFO [train.py:1031] (0/4) Epoch 14, batch 28550, loss[loss=0.1802, simple_loss=0.2905, pruned_loss=0.02458, ctc_loss=0.05168, over 15217.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.3001, pruned_loss=0.06298, ctc_loss=0.1125, over 3302391.75 frames. ], batch size: 526, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:16:31,996 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2862076.0, ans=0.0 2023-10-09 20:16:32,291 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=22.5 2023-10-09 20:16:51,388 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2862169.3333333335, ans=0.125 2023-10-09 20:16:54,635 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2862169.3333333335, ans=0.125 2023-10-09 20:16:57,238 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2862169.3333333335, ans=0.0 2023-10-09 20:17:00,496 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.790e+02 3.333e+02 3.902e+02 5.980e+02, threshold=6.666e+02, percent-clipped=0.0 2023-10-09 20:17:03,187 INFO [train.py:1031] (0/4) Epoch 14, batch 28600, loss[loss=0.2315, simple_loss=0.2833, pruned_loss=0.06765, ctc_loss=0.1109, over 16792.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2944, pruned_loss=0.06101, ctc_loss=0.1087, over 3304300.87 frames. ], batch size: 140, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:17:31,918 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2862309.3333333335, ans=0.1 2023-10-09 20:17:37,565 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0 2023-10-09 20:17:38,240 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2862309.3333333335, ans=0.125 2023-10-09 20:17:49,023 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2862356.0, ans=10.0 2023-10-09 20:17:49,502 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. 
limit=15.0 2023-10-09 20:17:52,402 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2862402.6666666665, ans=0.125 2023-10-09 20:17:56,477 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2862402.6666666665, ans=0.0 2023-10-09 20:17:58,140 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2862402.6666666665, ans=10.0 2023-10-09 20:18:00,396 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2862402.6666666665, ans=0.1 2023-10-09 20:18:05,180 INFO [train.py:1031] (0/4) Epoch 14, batch 28650, loss[loss=0.2485, simple_loss=0.3075, pruned_loss=0.06896, ctc_loss=0.1289, over 16737.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2908, pruned_loss=0.06146, ctc_loss=0.1091, over 3313628.54 frames. ], batch size: 384, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:18:27,628 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2862496.0, ans=0.125 2023-10-09 20:18:30,969 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2862542.6666666665, ans=0.0 2023-10-09 20:18:34,199 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.95 vs. limit=12.0 2023-10-09 20:18:34,836 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2862542.6666666665, ans=0.2 2023-10-09 20:18:48,626 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2862589.3333333335, ans=0.0 2023-10-09 20:18:54,554 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2862636.0, ans=0.1 2023-10-09 20:18:57,370 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2862636.0, ans=0.1 2023-10-09 20:19:05,968 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+02 2.995e+02 3.402e+02 4.215e+02 9.672e+02, threshold=6.804e+02, percent-clipped=2.0 2023-10-09 20:19:06,674 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.20 vs. limit=10.0 2023-10-09 20:19:07,092 INFO [train.py:1031] (0/4) Epoch 14, batch 28700, loss[loss=0.2635, simple_loss=0.3191, pruned_loss=0.07476, ctc_loss=0.146, over 16566.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2865, pruned_loss=0.05845, ctc_loss=0.1043, over 3314631.55 frames. 
], batch size: 351, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:19:33,031 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2862776.0, ans=0.125 2023-10-09 20:19:45,409 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2862822.6666666665, ans=0.0 2023-10-09 20:19:53,965 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2862822.6666666665, ans=0.125 2023-10-09 20:19:57,718 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2862869.3333333335, ans=0.0 2023-10-09 20:20:07,321 INFO [train.py:1031] (0/4) Epoch 14, batch 28750, loss[loss=0.3, simple_loss=0.3373, pruned_loss=0.09609, ctc_loss=0.1764, over 16716.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2833, pruned_loss=0.05698, ctc_loss=0.1019, over 3309999.72 frames. ], batch size: 384, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:20:13,068 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2862916.0, ans=0.2 2023-10-09 20:20:16,323 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2862916.0, ans=0.125 2023-10-09 20:20:20,598 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2862962.6666666665, ans=0.125 2023-10-09 20:20:27,565 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2862962.6666666665, ans=0.125 2023-10-09 20:20:39,522 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=2863009.3333333335, ans=0.1 2023-10-09 20:21:09,039 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 3.101e+02 3.665e+02 4.221e+02 6.562e+02, threshold=7.330e+02, percent-clipped=0.0 2023-10-09 20:21:09,066 INFO [train.py:1031] (0/4) Epoch 14, batch 28800, loss[loss=0.2287, simple_loss=0.2933, pruned_loss=0.06146, ctc_loss=0.103, over 16838.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.2831, pruned_loss=0.05833, ctc_loss=0.1039, over 3320467.77 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:21:21,709 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2863196.0, ans=0.07 2023-10-09 20:21:21,788 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2863196.0, ans=0.2 2023-10-09 20:21:35,193 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.62 vs. 
limit=10.0 2023-10-09 20:21:40,377 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2863242.6666666665, ans=0.0 2023-10-09 20:21:42,558 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2863242.6666666665, ans=0.0 2023-10-09 20:21:48,516 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2863289.3333333335, ans=0.125 2023-10-09 20:21:49,587 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2863289.3333333335, ans=0.0 2023-10-09 20:21:59,108 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2863336.0, ans=0.05 2023-10-09 20:21:59,409 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2023-10-09 20:22:03,666 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2863336.0, ans=0.05 2023-10-09 20:22:10,820 INFO [train.py:1031] (0/4) Epoch 14, batch 28850, loss[loss=0.2059, simple_loss=0.2521, pruned_loss=0.05967, ctc_loss=0.1012, over 16873.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2797, pruned_loss=0.05972, ctc_loss=0.1055, over 3315751.86 frames. ], batch size: 141, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:22:39,872 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2863476.0, ans=0.0 2023-10-09 20:22:41,816 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2863476.0, ans=0.2 2023-10-09 20:23:07,447 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2863569.3333333335, ans=0.04949747468305833 2023-10-09 20:23:12,084 INFO [train.py:1031] (0/4) Epoch 14, batch 28900, loss[loss=0.2144, simple_loss=0.2747, pruned_loss=0.0569, ctc_loss=0.101, over 16799.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2755, pruned_loss=0.0607, ctc_loss=0.1067, over 3310206.49 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:23:13,123 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.719e+02 3.415e+02 3.744e+02 4.568e+02 8.890e+02, threshold=7.488e+02, percent-clipped=1.0 2023-10-09 20:23:13,457 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2863616.0, ans=0.125 2023-10-09 20:23:17,946 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:23:34,776 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2863662.6666666665, ans=0.0 2023-10-09 20:23:52,574 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. limit=6.0 2023-10-09 20:23:53,844 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.60 vs. 
limit=15.0 2023-10-09 20:24:13,689 INFO [train.py:1031] (0/4) Epoch 14, batch 28950, loss[loss=0.196, simple_loss=0.224, pruned_loss=0.06405, ctc_loss=0.09971, over 16798.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2735, pruned_loss=0.06091, ctc_loss=0.1056, over 3304827.13 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:24:35,672 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2863896.0, ans=0.0 2023-10-09 20:24:37,594 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2863942.6666666665, ans=0.125 2023-10-09 20:24:52,819 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2023-10-09 20:25:13,505 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.58 vs. limit=15.0 2023-10-09 20:25:15,070 INFO [train.py:1031] (0/4) Epoch 14, batch 29000, loss[loss=0.1618, simple_loss=0.2096, pruned_loss=0.04322, ctc_loss=0.06874, over 16303.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.2708, pruned_loss=0.05899, ctc_loss=0.1019, over 3301317.82 frames. ], batch size: 70, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:25:17,201 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+02 3.225e+02 3.785e+02 4.643e+02 9.976e+02, threshold=7.570e+02, percent-clipped=3.0 2023-10-09 20:25:46,592 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2864176.0, ans=0.125 2023-10-09 20:25:54,114 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2864222.6666666665, ans=0.0 2023-10-09 20:25:55,057 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2864222.6666666665, ans=0.125 2023-10-09 20:26:01,112 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2864222.6666666665, ans=0.1 2023-10-09 20:26:06,940 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2864269.3333333335, ans=0.2 2023-10-09 20:26:15,142 INFO [train.py:1031] (0/4) Epoch 14, batch 29050, loss[loss=0.2046, simple_loss=0.2691, pruned_loss=0.05126, ctc_loss=0.09364, over 16893.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.2736, pruned_loss=0.0594, ctc_loss=0.1029, over 3310056.94 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:26:16,903 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0 2023-10-09 20:26:30,836 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2864362.6666666665, ans=0.125 2023-10-09 20:26:52,708 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.76 vs. 
limit=15.0 2023-10-09 20:26:58,686 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2864456.0, ans=0.125 2023-10-09 20:26:59,214 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-10-09 20:27:16,985 INFO [train.py:1031] (0/4) Epoch 14, batch 29100, loss[loss=0.227, simple_loss=0.2865, pruned_loss=0.06338, ctc_loss=0.102, over 16891.00 frames. ], tot_loss[loss=0.2214, simple_loss=0.2764, pruned_loss=0.06174, ctc_loss=0.107, over 3311423.62 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:27:20,251 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+02 3.447e+02 3.769e+02 4.635e+02 6.729e+02, threshold=7.539e+02, percent-clipped=0.0 2023-10-09 20:27:22,273 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0 2023-10-09 20:27:41,219 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2864642.6666666665, ans=0.0 2023-10-09 20:27:45,528 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2864642.6666666665, ans=0.0 2023-10-09 20:27:58,462 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2864689.3333333335, ans=0.125 2023-10-09 20:27:58,657 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.87 vs. limit=10.0 2023-10-09 20:28:18,188 INFO [train.py:1031] (0/4) Epoch 14, batch 29150, loss[loss=0.2211, simple_loss=0.2939, pruned_loss=0.05576, ctc_loss=0.09204, over 16899.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2812, pruned_loss=0.06471, ctc_loss=0.1125, over 3320567.41 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:28:22,343 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2864782.6666666665, ans=0.0 2023-10-09 20:28:31,854 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2864829.3333333335, ans=0.125 2023-10-09 20:28:54,241 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2864876.0, ans=0.125 2023-10-09 20:29:22,662 INFO [train.py:1031] (0/4) Epoch 14, batch 29200, loss[loss=0.2208, simple_loss=0.3008, pruned_loss=0.05147, ctc_loss=0.09473, over 16895.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2825, pruned_loss=0.06499, ctc_loss=0.1133, over 3304970.50 frames. ], batch size: 292, lr: 2.54e-03, grad_scale: 4.0
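The optim.py:471 entries throughout this log report Clipping_scale, a five-number summary (min, Q1, median, Q3, max) of recent gradient norms, the clipping threshold in effect, and the fraction of batches clipped. In each entry the threshold tracks Clipping_scale times the logged median, up to print rounding; e.g. in the entry that follows, 7.628e+02 = 2.0 * 3.814e+02. In other words, the clipping threshold adapts to the gradient-norm distribution rather than being fixed. A standalone sketch of that behaviour is below; icefall's ScaledAdam folds this into its own update, and the history length here is an arbitrary assumption.

from collections import deque
import torch

class MedianGradClipper:
    # Clip gradients to clipping_scale * median of recently seen grad norms,
    # the relationship visible in the log's threshold values (a sketch).
    def __init__(self, clipping_scale: float = 2.0, history: int = 400):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)

    def __call__(self, parameters):
        parameters = [p for p in parameters if p.grad is not None]
        total_norm = torch.norm(torch.stack([p.grad.norm() for p in parameters]))
        self.norms.append(float(total_norm))
        threshold = self.clipping_scale * float(torch.tensor(list(self.norms)).median())
        if float(total_norm) > threshold:  # such batches count toward percent-clipped
            for p in parameters:
                p.grad.mul_(threshold / (float(total_norm) + 1e-20))
        return float(total_norm), threshold

# usage in the training loop, between loss.backward() and optimizer.step():
# grad_norm, threshold = clipper(model.parameters())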
2023-10-09 20:29:28,273 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.765e+02 3.299e+02 3.814e+02 4.330e+02 6.435e+02, threshold=7.628e+02, percent-clipped=0.0 2023-10-09 20:29:49,501 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2865109.3333333335, ans=0.2 2023-10-09 20:30:05,677 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2865156.0, ans=0.125 2023-10-09 20:30:27,647 INFO [train.py:1031] (0/4) Epoch 14, batch 29250, loss[loss=0.2355, simple_loss=0.3064, pruned_loss=0.06086, ctc_loss=0.107, over 16812.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2822, pruned_loss=0.06281, ctc_loss=0.1099, over 3294622.30 frames. ], batch size: 215, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:30:31,282 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2023-10-09 20:31:07,185 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2865389.3333333335, ans=0.0 2023-10-09 20:31:08,308 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2865389.3333333335, ans=0.1 2023-10-09 20:31:09,392 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=2865389.3333333335, ans=0.2 2023-10-09 20:31:32,754 INFO [train.py:1031] (0/4) Epoch 14, batch 29300, loss[loss=0.3044, simple_loss=0.4004, pruned_loss=0.07538, ctc_loss=0.1439, over 15034.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2928, pruned_loss=0.06496, ctc_loss=0.114, over 3300297.36 frames. ], batch size: 527, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:31:38,516 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.349e+02 3.153e+02 3.767e+02 4.679e+02 9.052e+02, threshold=7.535e+02, percent-clipped=4.0 2023-10-09 20:31:47,916 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2865529.3333333335, ans=0.0 2023-10-09 20:32:11,237 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2865622.6666666665, ans=0.0 2023-10-09 20:32:33,854 INFO [train.py:1031] (0/4) Epoch 14, batch 29350, loss[loss=0.2084, simple_loss=0.2686, pruned_loss=0.05276, ctc_loss=0.1066, over 16932.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2912, pruned_loss=0.0654, ctc_loss=0.1149, over 3308512.04 frames. ], batch size: 292, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:32:41,929 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2865716.0, ans=0.125 2023-10-09 20:32:42,024 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2865716.0, ans=0.125 2023-10-09 20:32:44,240 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.22 vs.
limit=15.0 2023-10-09 20:32:48,656 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2865762.6666666665, ans=0.1 2023-10-09 20:32:53,479 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2865762.6666666665, ans=0.0 2023-10-09 20:33:02,042 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2865809.3333333335, ans=0.125 2023-10-09 20:33:08,901 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2865809.3333333335, ans=0.125 2023-10-09 20:33:13,850 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2865856.0, ans=0.0 2023-10-09 20:33:17,015 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.14 vs. limit=10.0 2023-10-09 20:33:36,259 INFO [train.py:1031] (0/4) Epoch 14, batch 29400, loss[loss=0.1983, simple_loss=0.2892, pruned_loss=0.03853, ctc_loss=0.07556, over 16421.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2848, pruned_loss=0.06166, ctc_loss=0.1091, over 3308273.05 frames. ], batch size: 416, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:33:44,078 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 2.925e+02 3.429e+02 4.063e+02 7.311e+02, threshold=6.858e+02, percent-clipped=0.0 2023-10-09 20:34:18,927 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2866089.3333333335, ans=0.0 2023-10-09 20:34:19,142 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2023-10-09 20:34:21,062 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2866089.3333333335, ans=0.125 2023-10-09 20:34:31,112 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2023-10-09 20:34:39,999 INFO [train.py:1031] (0/4) Epoch 14, batch 29450, loss[loss=0.213, simple_loss=0.2823, pruned_loss=0.05283, ctc_loss=0.09531, over 16833.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2789, pruned_loss=0.05772, ctc_loss=0.1026, over 3301286.57 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:34:48,048 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2866182.6666666665, ans=0.125 2023-10-09 20:35:43,431 INFO [train.py:1031] (0/4) Epoch 14, batch 29500, loss[loss=0.2081, simple_loss=0.2899, pruned_loss=0.0439, ctc_loss=0.09619, over 16875.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2821, pruned_loss=0.05637, ctc_loss=0.1012, over 3308353.07 frames. 
], batch size: 309, lr: 2.54e-03, grad_scale: 8.0 2023-10-09 20:35:51,253 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+02 2.902e+02 3.659e+02 4.459e+02 8.520e+02, threshold=7.319e+02, percent-clipped=6.0 2023-10-09 20:36:31,776 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2866602.6666666665, ans=0.125 2023-10-09 20:36:38,689 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2866602.6666666665, ans=0.025 2023-10-09 20:36:38,713 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2866602.6666666665, ans=0.125 2023-10-09 20:36:44,251 INFO [train.py:1031] (0/4) Epoch 14, batch 29550, loss[loss=0.1867, simple_loss=0.2228, pruned_loss=0.0554, ctc_loss=0.09955, over 16133.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.2771, pruned_loss=0.05612, ctc_loss=0.1006, over 3312119.97 frames. ], batch size: 466, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:36:52,457 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2866649.3333333335, ans=0.0 2023-10-09 20:37:18,795 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2866742.6666666665, ans=0.125 2023-10-09 20:37:28,829 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2023-10-09 20:37:32,846 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2866836.0, ans=0.1 2023-10-09 20:37:32,860 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2866836.0, ans=0.125 2023-10-09 20:37:34,993 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2866836.0, ans=0.0 2023-10-09 20:37:44,954 INFO [train.py:1031] (0/4) Epoch 14, batch 29600, loss[loss=0.2028, simple_loss=0.2572, pruned_loss=0.05357, ctc_loss=0.103, over 16710.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2728, pruned_loss=0.05641, ctc_loss=0.1009, over 3316235.63 frames. ], batch size: 272, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:37:54,762 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 3.047e+02 3.582e+02 4.028e+02 6.950e+02, threshold=7.163e+02, percent-clipped=0.0 2023-10-09 20:37:57,204 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2866929.3333333335, ans=0.0 2023-10-09 20:38:03,178 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2866929.3333333335, ans=0.125 2023-10-09 20:38:19,996 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.93 vs. limit=15.0 2023-10-09 20:38:46,693 INFO [train.py:1031] (0/4) Epoch 14, batch 29650, loss[loss=0.2193, simple_loss=0.2799, pruned_loss=0.05876, ctc_loss=0.1028, over 16876.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2768, pruned_loss=0.05749, ctc_loss=0.1028, over 3313862.01 frames. 
], batch size: 202, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:39:16,914 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2023-10-09 20:39:20,213 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2867209.3333333335, ans=0.1 2023-10-09 20:39:22,346 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2867256.0, ans=10.0 2023-10-09 20:39:35,180 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2867302.6666666665, ans=0.07 2023-10-09 20:39:36,599 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=12.0 2023-10-09 20:39:39,595 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2867302.6666666665, ans=0.125 2023-10-09 20:39:48,383 INFO [train.py:1031] (0/4) Epoch 14, batch 29700, loss[loss=0.3017, simple_loss=0.3316, pruned_loss=0.1008, ctc_loss=0.1754, over 16648.00 frames. ], tot_loss[loss=0.22, simple_loss=0.2786, pruned_loss=0.0595, ctc_loss=0.1058, over 3315339.28 frames. ], batch size: 351, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:39:59,295 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+02 3.266e+02 3.794e+02 4.396e+02 1.319e+03, threshold=7.588e+02, percent-clipped=2.0 2023-10-09 20:40:03,404 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2867396.0, ans=0.125 2023-10-09 20:40:08,040 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2867396.0, ans=0.0 2023-10-09 20:40:20,325 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2867442.6666666665, ans=0.125 2023-10-09 20:40:34,643 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.05 vs. limit=22.5 2023-10-09 20:40:35,472 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2867489.3333333335, ans=0.125 2023-10-09 20:40:38,145 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2867536.0, ans=0.2 2023-10-09 20:40:50,196 INFO [train.py:1031] (0/4) Epoch 14, batch 29750, loss[loss=0.2222, simple_loss=0.2656, pruned_loss=0.06576, ctc_loss=0.1183, over 16961.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2792, pruned_loss=0.0608, ctc_loss=0.1077, over 3320425.35 frames. 
], batch size: 202, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:41:26,769 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2867722.6666666665, ans=0.125 2023-10-09 20:41:26,900 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2867722.6666666665, ans=0.2 2023-10-09 20:41:31,655 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2867722.6666666665, ans=0.05 2023-10-09 20:41:39,770 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0 2023-10-09 20:41:47,105 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2867769.3333333335, ans=0.1 2023-10-09 20:41:53,582 INFO [train.py:1031] (0/4) Epoch 14, batch 29800, loss[loss=0.2228, simple_loss=0.2691, pruned_loss=0.06604, ctc_loss=0.1109, over 16900.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2799, pruned_loss=0.06217, ctc_loss=0.1099, over 3316892.96 frames. ], batch size: 82, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:41:56,163 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2867816.0, ans=0.0 2023-10-09 20:42:05,752 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.683e+02 3.252e+02 3.750e+02 4.690e+02 1.156e+03, threshold=7.500e+02, percent-clipped=2.0 2023-10-09 20:42:06,112 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2867862.6666666665, ans=0.125 2023-10-09 20:42:11,573 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.83 vs. limit=12.0 2023-10-09 20:42:29,788 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2867956.0, ans=0.2 2023-10-09 20:42:35,738 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.64 vs. limit=6.0 2023-10-09 20:42:56,236 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2868049.3333333335, ans=0.0 2023-10-09 20:42:56,951 INFO [train.py:1031] (0/4) Epoch 14, batch 29850, loss[loss=0.2888, simple_loss=0.3401, pruned_loss=0.08666, ctc_loss=0.1603, over 16862.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2908, pruned_loss=0.0644, ctc_loss=0.1143, over 3312279.95 frames. 
], batch size: 309, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:43:26,098 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2868142.6666666665, ans=0.1 2023-10-09 20:43:32,657 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2868142.6666666665, ans=0.0 2023-10-09 20:43:45,191 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2868189.3333333335, ans=0.05 2023-10-09 20:43:49,765 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2868236.0, ans=0.125 2023-10-09 20:44:02,051 INFO [train.py:1031] (0/4) Epoch 14, batch 29900, loss[loss=0.234, simple_loss=0.2873, pruned_loss=0.06818, ctc_loss=0.1107, over 16668.00 frames. ], tot_loss[loss=0.2403, simple_loss=0.2966, pruned_loss=0.06793, ctc_loss=0.1201, over 3314963.81 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:44:15,803 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+02 3.520e+02 3.961e+02 4.963e+02 1.132e+03, threshold=7.922e+02, percent-clipped=8.0 2023-10-09 20:44:22,961 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2868329.3333333335, ans=0.125 2023-10-09 20:44:38,319 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2868422.6666666665, ans=0.0 2023-10-09 20:44:54,635 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2868469.3333333335, ans=0.125 2023-10-09 20:45:04,794 INFO [train.py:1031] (0/4) Epoch 14, batch 29950, loss[loss=0.2819, simple_loss=0.3397, pruned_loss=0.0842, ctc_loss=0.1391, over 16774.00 frames. ], tot_loss[loss=0.2454, simple_loss=0.3015, pruned_loss=0.07016, ctc_loss=0.1224, over 3308512.48 frames. ], batch size: 272, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:45:06,724 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2023-10-09 20:45:12,382 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2868516.0, ans=0.125 2023-10-09 20:45:13,978 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2868516.0, ans=0.1 2023-10-09 20:45:20,077 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.35 vs. limit=10.0 2023-10-09 20:45:32,495 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2868609.3333333335, ans=0.035 2023-10-09 20:45:51,276 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2868656.0, ans=0.2 2023-10-09 20:46:00,943 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0 2023-10-09 20:46:05,484 INFO [train.py:1031] (0/4) Epoch 14, batch 30000, loss[loss=0.2908, simple_loss=0.3296, pruned_loss=0.09237, ctc_loss=0.1681, over 16687.00 frames. 
], tot_loss[loss=0.243, simple_loss=0.3025, pruned_loss=0.06792, ctc_loss=0.1189, over 3308038.60 frames. ], batch size: 384, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 20:46:05,485 INFO [train.py:1054] (0/4) Computing validation loss 2023-10-09 20:46:21,901 INFO [zipformer.py:1853] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3620, 5.2118, 4.9421, 5.4826], device='cuda:0') 2023-10-09 20:46:22,666 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2308, simple_loss=0.3022, pruned_loss=0.06118, ctc_loss=0.09249, over 1796401.00 frames. 2023-10-09 20:46:22,667 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14587MB 2023-10-09 20:46:24,073 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2868749.3333333335, ans=0.125 2023-10-09 20:46:25,207 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2868749.3333333335, ans=0.125 2023-10-09 20:46:26,323 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2868749.3333333335, ans=0.1 2023-10-09 20:46:34,275 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2868796.0, ans=0.1 2023-10-09 20:46:36,770 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.188e+02 3.941e+02 4.902e+02 7.309e+02, threshold=7.881e+02, percent-clipped=0.0 2023-10-09 20:46:38,115 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2868796.0, ans=0.125 2023-10-09 20:46:39,239 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2868796.0, ans=0.125 2023-10-09 20:46:59,785 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-10-09 20:47:15,373 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0 2023-10-09 20:47:21,715 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2023-10-09 20:47:24,068 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2868982.6666666665, ans=0.0 2023-10-09 20:47:24,788 INFO [train.py:1031] (0/4) Epoch 14, batch 30050, loss[loss=0.2247, simple_loss=0.3086, pruned_loss=0.05234, ctc_loss=0.09051, over 16876.00 frames. ], tot_loss[loss=0.24, simple_loss=0.2995, pruned_loss=0.06681, ctc_loss=0.1174, over 3302787.83 frames. 
], batch size: 292, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:47:27,350 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2868982.6666666665, ans=0.0 2023-10-09 20:47:28,420 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2868982.6666666665, ans=0.125 2023-10-09 20:47:30,343 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2868982.6666666665, ans=0.0 2023-10-09 20:47:33,050 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2868982.6666666665, ans=0.1 2023-10-09 20:48:00,458 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:48:19,035 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=22.5 2023-10-09 20:48:25,620 INFO [train.py:1031] (0/4) Epoch 14, batch 30100, loss[loss=0.1916, simple_loss=0.2622, pruned_loss=0.04496, ctc_loss=0.07752, over 16705.00 frames. ], tot_loss[loss=0.2347, simple_loss=0.2964, pruned_loss=0.06393, ctc_loss=0.1129, over 3304987.24 frames. ], batch size: 140, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:48:43,029 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.464e+02 3.136e+02 3.710e+02 4.656e+02 9.667e+02, threshold=7.419e+02, percent-clipped=2.0 2023-10-09 20:48:48,274 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2869262.6666666665, ans=0.0 2023-10-09 20:48:59,556 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2869309.3333333335, ans=0.125 2023-10-09 20:49:02,562 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2869356.0, ans=0.1 2023-10-09 20:49:11,661 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=10.0 2023-10-09 20:49:27,212 INFO [train.py:1031] (0/4) Epoch 14, batch 30150, loss[loss=0.2112, simple_loss=0.266, pruned_loss=0.05768, ctc_loss=0.1025, over 16732.00 frames. ], tot_loss[loss=0.2321, simple_loss=0.2947, pruned_loss=0.06258, ctc_loss=0.1107, over 3289700.82 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:49:27,545 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2869449.3333333335, ans=0.125 2023-10-09 20:49:42,221 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2869496.0, ans=0.0 2023-10-09 20:49:48,722 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. 
limit=15.0 2023-10-09 20:49:57,509 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2869542.6666666665, ans=0.125 2023-10-09 20:50:24,557 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2869636.0, ans=0.125 2023-10-09 20:50:26,672 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2869682.6666666665, ans=0.1 2023-10-09 20:50:27,445 INFO [train.py:1031] (0/4) Epoch 14, batch 30200, loss[loss=0.2729, simple_loss=0.3211, pruned_loss=0.08526, ctc_loss=0.1355, over 16895.00 frames. ], tot_loss[loss=0.2352, simple_loss=0.2956, pruned_loss=0.06462, ctc_loss=0.1138, over 3283963.55 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:50:45,496 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.358e+02 3.166e+02 3.687e+02 4.321e+02 7.960e+02, threshold=7.375e+02, percent-clipped=2.0 2023-10-09 20:51:25,582 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2869869.3333333335, ans=0.0 2023-10-09 20:51:28,009 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=2869916.0, ans=0.1 2023-10-09 20:51:28,750 INFO [train.py:1031] (0/4) Epoch 14, batch 30250, loss[loss=0.2505, simple_loss=0.3068, pruned_loss=0.07142, ctc_loss=0.1284, over 16881.00 frames. ], tot_loss[loss=0.2373, simple_loss=0.2966, pruned_loss=0.06579, ctc_loss=0.1159, over 3276867.69 frames. ], batch size: 228, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:51:30,654 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2869916.0, ans=0.125 2023-10-09 20:51:43,658 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=22.5 2023-10-09 20:51:45,953 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2869962.6666666665, ans=0.125 2023-10-09 20:51:54,291 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2870009.3333333335, ans=0.125 2023-10-09 20:52:24,638 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2870102.6666666665, ans=0.0 2023-10-09 20:52:28,921 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2870102.6666666665, ans=0.125 2023-10-09 20:52:32,397 INFO [train.py:1031] (0/4) Epoch 14, batch 30300, loss[loss=0.2441, simple_loss=0.3024, pruned_loss=0.06926, ctc_loss=0.1182, over 16839.00 frames. ], tot_loss[loss=0.2425, simple_loss=0.3001, pruned_loss=0.0684, ctc_loss=0.1201, over 3260048.75 frames. 
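], batch size: 242, lr: 2.53e-03, grad_scale: 4.0

In every "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." line in this segment, the printed threshold equals the clipping scale times the middle quartile, up to rounding in the last digit: for the entry above, 2.0 x 3.687e+02 = 7.374e+02 against the printed threshold=7.375e+02. The sketch below reproduces that relation from a history of recent gradient norms; it is an illustration consistent with the printouts, not a transcription of optim.py.

import numpy as np

def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
    # Quartiles of the recent gradient norms: min, 25%, median, 75%, max.
    q = np.quantile(recent_grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * q[2]  # scale times the median
    pct = 100.0 * np.mean(np.asarray(recent_grad_norms) > threshold)
    print(f"grad-norm quartiles {' '.join(f'{v:.3e}' for v in q)}, "
          f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")
    return threshold
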
2023-10-09 20:52:51,951 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+02 3.410e+02 3.951e+02 4.930e+02 7.071e+02, threshold=7.902e+02, percent-clipped=0.0 2023-10-09 20:53:06,553 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2870242.6666666665, ans=0.125 2023-10-09 20:53:33,832 INFO [train.py:1031] (0/4) Epoch 14, batch 30350, loss[loss=0.2094, simple_loss=0.2699, pruned_loss=0.05551, ctc_loss=0.09463, over 16811.00 frames. ], tot_loss[loss=0.2431, simple_loss=0.2986, pruned_loss=0.06945, ctc_loss=0.1217, over 3271076.23 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:53:34,179 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2870382.6666666665, ans=0.125 2023-10-09 20:53:57,797 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2870476.0, ans=0.125 2023-10-09 20:53:58,872 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2870476.0, ans=0.035 2023-10-09 20:54:15,163 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.63 vs. limit=6.0 2023-10-09 20:54:19,234 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2870522.6666666665, ans=0.0 2023-10-09 20:54:20,336 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2870522.6666666665, ans=0.0 2023-10-09 20:54:35,112 INFO [train.py:1031] (0/4) Epoch 14, batch 30400, loss[loss=0.2074, simple_loss=0.2677, pruned_loss=0.05446, ctc_loss=0.09564, over 16955.00 frames. ], tot_loss[loss=0.2411, simple_loss=0.2948, pruned_loss=0.06935, ctc_loss=0.1215, over 3286551.64 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:54:44,110 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2870616.0, ans=0.0 2023-10-09 20:54:54,415 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.508e+02 3.255e+02 4.087e+02 4.759e+02 9.430e+02, threshold=8.174e+02, percent-clipped=1.0 2023-10-09 20:55:15,477 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2870756.0, ans=0.125 2023-10-09 20:55:22,862 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2870802.6666666665, ans=0.1 2023-10-09 20:55:25,049 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2870802.6666666665, ans=0.2 2023-10-09 20:55:27,801 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2870802.6666666665, ans=0.125 2023-10-09 20:55:35,537 INFO [train.py:1031] (0/4) Epoch 14, batch 30450, loss[loss=0.1945, simple_loss=0.2551, pruned_loss=0.04976, ctc_loss=0.08587, over 16945.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.2883, pruned_loss=0.06853, ctc_loss=0.1201, over 3280494.20 frames.
], batch size: 86, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:56:14,499 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2870989.3333333335, ans=0.0 2023-10-09 20:56:18,137 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2023-10-09 20:56:24,265 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2870989.3333333335, ans=0.0 2023-10-09 20:56:33,502 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2871036.0, ans=0.125 2023-10-09 20:56:33,535 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2871036.0, ans=0.125 2023-10-09 20:56:36,616 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2871036.0, ans=0.0 2023-10-09 20:56:38,905 INFO [train.py:1031] (0/4) Epoch 14, batch 30500, loss[loss=0.2626, simple_loss=0.3374, pruned_loss=0.06948, ctc_loss=0.1223, over 16437.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2854, pruned_loss=0.06663, ctc_loss=0.1168, over 3282067.21 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:56:43,440 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2871082.6666666665, ans=0.0 2023-10-09 20:56:53,289 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2871129.3333333335, ans=0.125 2023-10-09 20:57:00,012 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+02 3.155e+02 3.681e+02 4.532e+02 7.008e+02, threshold=7.361e+02, percent-clipped=0.0 2023-10-09 20:57:03,826 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2871176.0, ans=0.125 2023-10-09 20:57:07,691 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.80 vs. limit=10.0 2023-10-09 20:57:14,233 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2871176.0, ans=0.2 2023-10-09 20:57:33,833 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2871269.3333333335, ans=0.0 2023-10-09 20:57:41,471 INFO [train.py:1031] (0/4) Epoch 14, batch 30550, loss[loss=0.2629, simple_loss=0.3166, pruned_loss=0.07847, ctc_loss=0.1303, over 16978.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.29, pruned_loss=0.06532, ctc_loss=0.1149, over 3280535.93 frames. ], batch size: 86, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:58:13,023 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2871409.3333333335, ans=0.125 2023-10-09 20:58:22,430 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.68 vs. 
limit=15.0 2023-10-09 20:58:40,229 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2871549.3333333335, ans=0.125 2023-10-09 20:58:40,238 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2871549.3333333335, ans=0.125 2023-10-09 20:58:40,297 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:58:41,007 INFO [train.py:1031] (0/4) Epoch 14, batch 30600, loss[loss=0.2012, simple_loss=0.2517, pruned_loss=0.05516, ctc_loss=0.101, over 16863.00 frames. ], tot_loss[loss=0.2344, simple_loss=0.2891, pruned_loss=0.0665, ctc_loss=0.1165, over 3286051.70 frames. ], batch size: 189, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:58:57,285 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2871596.0, ans=0.125 2023-10-09 20:59:01,072 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+02 3.194e+02 3.624e+02 4.233e+02 1.074e+03, threshold=7.249e+02, percent-clipped=2.0 2023-10-09 20:59:02,420 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2871642.6666666665, ans=0.035 2023-10-09 20:59:02,920 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=22.5 2023-10-09 20:59:09,250 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2871642.6666666665, ans=0.125 2023-10-09 20:59:10,261 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2871642.6666666665, ans=0.125 2023-10-09 20:59:40,083 INFO [train.py:1031] (0/4) Epoch 14, batch 30650, loss[loss=0.1941, simple_loss=0.2467, pruned_loss=0.05276, ctc_loss=0.08972, over 16931.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2823, pruned_loss=0.06471, ctc_loss=0.1132, over 3297985.67 frames. ], batch size: 82, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:59:46,743 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2871782.6666666665, ans=0.04949747468305833 2023-10-09 20:59:47,065 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0 2023-10-09 21:00:12,066 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.82 vs. 
limit=15.0 2023-10-09 21:00:18,411 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2871922.6666666665, ans=0.125 2023-10-09 21:00:21,618 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:00:23,804 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2871922.6666666665, ans=0.125 2023-10-09 21:00:29,325 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:00:41,981 INFO [train.py:1031] (0/4) Epoch 14, batch 30700, loss[loss=0.2101, simple_loss=0.2646, pruned_loss=0.05944, ctc_loss=0.09189, over 16796.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.2763, pruned_loss=0.06138, ctc_loss=0.1076, over 3298289.48 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:00:44,013 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2872016.0, ans=0.0 2023-10-09 21:00:44,035 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2872016.0, ans=0.125 2023-10-09 21:00:46,742 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2872016.0, ans=0.1 2023-10-09 21:00:51,645 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2872016.0, ans=0.0 2023-10-09 21:01:06,275 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+02 3.086e+02 3.703e+02 4.369e+02 9.445e+02, threshold=7.405e+02, percent-clipped=1.0 2023-10-09 21:01:21,982 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2872156.0, ans=0.125 2023-10-09 21:01:24,130 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2872156.0, ans=0.2 2023-10-09 21:01:42,005 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2872202.6666666665, ans=0.0 2023-10-09 21:01:46,062 INFO [train.py:1031] (0/4) Epoch 14, batch 30750, loss[loss=0.206, simple_loss=0.2574, pruned_loss=0.05854, ctc_loss=0.09365, over 16747.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2792, pruned_loss=0.06105, ctc_loss=0.106, over 3300114.12 frames. ], batch size: 140, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:02:11,051 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2872342.6666666665, ans=0.125 2023-10-09 21:02:30,001 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2872389.3333333335, ans=0.125 2023-10-09 21:02:37,889 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2872436.0, ans=0.0 2023-10-09 21:02:50,792 INFO [train.py:1031] (0/4) Epoch 14, batch 30800, loss[loss=0.2405, simple_loss=0.3038, pruned_loss=0.06614, ctc_loss=0.1125, over 16812.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2874, pruned_loss=0.06182, ctc_loss=0.1073, over 3299193.32 frames. 
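], batch size: 176, lr: 2.53e-03, grad_scale: 4.0

The summary entries in this segment are consistent with the total being a weighted sum of the three parts, loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss: for the batch 30800 summary just above, 0.5 * 0.2874 + 0.06182 + 0.2 * 0.1073 = 0.2270, matching the printed loss=0.227. The 0.5 and 0.2 weights are inferred from the printed numbers in this segment only; a small consistency check:

def combined_loss(simple_loss, pruned_loss, ctc_loss,
                  simple_scale=0.5, ctc_scale=0.2):
    # Weighted combination inferred from the printed totals in this log.
    return simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss

# Batch 30800 summary above prints loss=0.227:
assert abs(combined_loss(0.2874, 0.06182, 0.1073) - 0.227) < 5e-4
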
2023-10-09 21:03:16,580 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.601e+02 3.887e+02 4.535e+02 5.921e+02 9.056e+02, threshold=9.070e+02, percent-clipped=5.0 2023-10-09 21:03:27,409 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2023-10-09 21:03:35,797 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2872622.6666666665, ans=0.125 2023-10-09 21:03:47,209 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2872669.3333333335, ans=0.125 2023-10-09 21:03:54,406 INFO [train.py:1031] (0/4) Epoch 14, batch 30850, loss[loss=0.215, simple_loss=0.265, pruned_loss=0.05982, ctc_loss=0.1133, over 16729.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2869, pruned_loss=0.06229, ctc_loss=0.1088, over 3286464.49 frames. ], batch size: 309, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:03:54,731 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2872716.0, ans=0.125 2023-10-09 21:04:04,195 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2872716.0, ans=0.0 2023-10-09 21:04:04,212 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2872716.0, ans=0.1 2023-10-09 21:04:12,799 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2872762.6666666665, ans=0.1 2023-10-09 21:04:30,627 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.40 vs. limit=10.0 2023-10-09 21:04:49,022 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2872902.6666666665, ans=0.95 2023-10-09 21:04:56,202 INFO [train.py:1031] (0/4) Epoch 14, batch 30900, loss[loss=0.1986, simple_loss=0.2477, pruned_loss=0.05688, ctc_loss=0.08928, over 16559.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.2792, pruned_loss=0.06086, ctc_loss=0.1064, over 3294544.18 frames.
], batch size: 110, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:05:02,440 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2872949.3333333335, ans=0.0 2023-10-09 21:05:20,047 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+02 3.130e+02 3.635e+02 4.208e+02 6.076e+02, threshold=7.270e+02, percent-clipped=0.0 2023-10-09 21:05:21,706 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2873042.6666666665, ans=22.5 2023-10-09 21:05:39,003 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:05:50,695 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2873136.0, ans=0.1 2023-10-09 21:05:54,378 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2873136.0, ans=0.0 2023-10-09 21:05:54,667 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2023-10-09 21:05:56,102 INFO [train.py:1031] (0/4) Epoch 14, batch 30950, loss[loss=0.2371, simple_loss=0.2832, pruned_loss=0.07114, ctc_loss=0.1218, over 16860.00 frames. ], tot_loss[loss=0.2197, simple_loss=0.2761, pruned_loss=0.06054, ctc_loss=0.1057, over 3297614.64 frames. ], batch size: 121, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:06:37,984 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2873322.6666666665, ans=0.125 2023-10-09 21:06:56,311 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2873369.3333333335, ans=0.1 2023-10-09 21:06:58,772 INFO [train.py:1031] (0/4) Epoch 14, batch 31000, loss[loss=0.201, simple_loss=0.2678, pruned_loss=0.04926, ctc_loss=0.08915, over 16790.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2773, pruned_loss=0.06171, ctc_loss=0.1078, over 3303488.18 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:07:09,405 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2873416.0, ans=0.0 2023-10-09 21:07:13,563 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2873462.6666666665, ans=0.0 2023-10-09 21:07:23,144 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2873509.3333333335, ans=0.2 2023-10-09 21:07:25,527 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.598e+02 3.246e+02 3.902e+02 4.981e+02 7.271e+02, threshold=7.805e+02, percent-clipped=1.0 2023-10-09 21:07:35,799 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=22.5 2023-10-09 21:07:51,299 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2873602.6666666665, ans=0.2 2023-10-09 21:07:58,590 INFO [train.py:1031] (0/4) Epoch 14, batch 31050, loss[loss=0.2218, simple_loss=0.2871, pruned_loss=0.05678, ctc_loss=0.1074, over 16617.00 frames. 
], tot_loss[loss=0.2174, simple_loss=0.2754, pruned_loss=0.05903, ctc_loss=0.1035, over 3308256.73 frames. ], batch size: 351, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:08:11,206 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0 2023-10-09 21:08:16,685 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.07 vs. limit=15.0 2023-10-09 21:08:24,181 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2873742.6666666665, ans=0.125 2023-10-09 21:08:31,841 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2873742.6666666665, ans=0.0 2023-10-09 21:08:36,278 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2023-10-09 21:08:50,202 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2873836.0, ans=0.125 2023-10-09 21:08:50,242 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2873836.0, ans=0.125 2023-10-09 21:08:51,920 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:08:58,998 INFO [train.py:1031] (0/4) Epoch 14, batch 31100, loss[loss=0.2413, simple_loss=0.2924, pruned_loss=0.07189, ctc_loss=0.1158, over 16979.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.2727, pruned_loss=0.05817, ctc_loss=0.1018, over 3314722.04 frames. ], batch size: 86, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:09:01,653 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2873882.6666666665, ans=0.125 2023-10-09 21:09:07,924 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2873882.6666666665, ans=0.125 2023-10-09 21:09:12,705 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2873929.3333333335, ans=0.05 2023-10-09 21:09:14,800 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2873929.3333333335, ans=0.0 2023-10-09 21:09:19,556 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2873929.3333333335, ans=0.0 2023-10-09 21:09:26,091 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.900e+02 3.232e+02 3.684e+02 6.119e+02, threshold=6.464e+02, percent-clipped=0.0 2023-10-09 21:09:44,979 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2874069.3333333335, ans=0.0 2023-10-09 21:09:54,432 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=15.0 2023-10-09 21:09:57,961 INFO [train.py:1031] (0/4) Epoch 14, batch 31150, loss[loss=0.2439, simple_loss=0.2971, pruned_loss=0.07054, ctc_loss=0.1241, over 16848.00 frames. 
], tot_loss[loss=0.218, simple_loss=0.2736, pruned_loss=0.06015, ctc_loss=0.105, over 3309056.23 frames. ], batch size: 328, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:10:09,519 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2874162.6666666665, ans=0.0 2023-10-09 21:10:22,897 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2023-10-09 21:10:34,240 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2874256.0, ans=0.125 2023-10-09 21:10:45,611 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2874302.6666666665, ans=0.0 2023-10-09 21:10:55,269 INFO [scaling.py:979] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.86 vs. limit=5.0 2023-10-09 21:10:57,603 INFO [train.py:1031] (0/4) Epoch 14, batch 31200, loss[loss=0.2252, simple_loss=0.2697, pruned_loss=0.06787, ctc_loss=0.1123, over 16701.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2716, pruned_loss=0.06093, ctc_loss=0.1064, over 3288404.28 frames. ], batch size: 140, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:11:05,365 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2874349.3333333335, ans=0.2 2023-10-09 21:11:13,757 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2874396.0, ans=0.0 2023-10-09 21:11:27,522 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+02 3.224e+02 3.763e+02 4.515e+02 7.909e+02, threshold=7.526e+02, percent-clipped=5.0 2023-10-09 21:11:37,607 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.31 vs. limit=10.0 2023-10-09 21:11:49,603 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2874536.0, ans=0.125 2023-10-09 21:11:52,414 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2874536.0, ans=0.0 2023-10-09 21:11:57,375 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2874582.6666666665, ans=0.125 2023-10-09 21:11:58,148 INFO [train.py:1031] (0/4) Epoch 14, batch 31250, loss[loss=0.1984, simple_loss=0.2608, pruned_loss=0.05013, ctc_loss=0.08946, over 16817.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.2686, pruned_loss=0.06009, ctc_loss=0.1047, over 3291927.55 frames. 
], batch size: 202, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:12:02,628 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2874582.6666666665, ans=0.2 2023-10-09 21:12:07,069 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2874582.6666666665, ans=0.1 2023-10-09 21:12:15,208 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2874629.3333333335, ans=0.125 2023-10-09 21:12:18,326 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-w-ctc/checkpoint-616000.pt 2023-10-09 21:12:22,798 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.80 vs. limit=10.0 2023-10-09 21:12:25,585 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2874676.0, ans=0.125 2023-10-09 21:12:27,411 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:12:40,540 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2874722.6666666665, ans=0.125 2023-10-09 21:12:49,259 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2874769.3333333335, ans=0.1 2023-10-09 21:12:52,513 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2874769.3333333335, ans=0.125 2023-10-09 21:12:54,556 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2874769.3333333335, ans=0.0 2023-10-09 21:13:01,890 INFO [train.py:1031] (0/4) Epoch 14, batch 31300, loss[loss=0.2384, simple_loss=0.2635, pruned_loss=0.07953, ctc_loss=0.1356, over 16625.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.266, pruned_loss=0.06031, ctc_loss=0.105, over 3294565.32 frames. ], batch size: 386, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:13:07,960 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.95 vs. limit=12.0 2023-10-09 21:13:08,093 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. 
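limit=15.0

The "Whitening: ... metric=M vs. limit=L" lines compare a per-module statistic of the activation covariance against a limit; the entry completed just above reports metric=11.15 against limit=15.0. The exact statistic is defined in scaling.py and may differ from what follows; the sketch below is only an illustrative anisotropy measure (largest covariance eigenvalue over the mean eigenvalue) with a penalty that turns on above the limit, to show the shape of such a check.

import torch

def whitening_penalty(x: torch.Tensor, limit: float) -> torch.Tensor:
    # x: (num_frames, num_channels) activations of one module (one group).
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]                       # channel covariance
    eigs = torch.linalg.eigvalsh(cov)                  # ascending eigenvalues
    metric = eigs[-1] / eigs.mean().clamp(min=1e-20)   # anisotropy, >= 1
    print(f"metric={float(metric):.2f} vs. limit={limit}")
    return (metric - limit).clamp(min=0.0)             # zero unless metric > limit
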
2023-10-09 21:13:20,125 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2874862.6666666665, ans=0.1 2023-10-09 21:13:32,264 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2874909.3333333335, ans=0.125 2023-10-09 21:13:32,852 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.069e+02 3.507e+02 3.932e+02 8.166e+02, threshold=7.015e+02, percent-clipped=1.0 2023-10-09 21:13:38,704 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2874956.0, ans=0.2 2023-10-09 21:13:42,505 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2874956.0, ans=0.0 2023-10-09 21:13:48,548 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2874956.0, ans=0.125 2023-10-09 21:13:53,685 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2875002.6666666665, ans=0.2 2023-10-09 21:14:03,886 INFO [train.py:1031] (0/4) Epoch 14, batch 31350, loss[loss=0.2414, simple_loss=0.2752, pruned_loss=0.07686, ctc_loss=0.1348, over 16791.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2634, pruned_loss=0.06081, ctc_loss=0.1061, over 3295824.44 frames. ], batch size: 329, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:14:08,452 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2875049.3333333335, ans=0.0 2023-10-09 21:14:22,555 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2875096.0, ans=0.025 2023-10-09 21:14:46,193 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2023-10-09 21:14:50,211 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2875236.0, ans=0.125 2023-10-09 21:14:58,176 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2875236.0, ans=0.2 2023-10-09 21:15:02,121 INFO [train.py:1031] (0/4) Epoch 14, batch 31400, loss[loss=0.179, simple_loss=0.2465, pruned_loss=0.04071, ctc_loss=0.07516, over 16836.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2615, pruned_loss=0.06092, ctc_loss=0.1063, over 3293561.16 frames. ], batch size: 202, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:15:34,911 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+02 3.143e+02 3.682e+02 4.471e+02 1.037e+03, threshold=7.364e+02, percent-clipped=4.0 2023-10-09 21:15:39,922 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2875422.6666666665, ans=0.125 2023-10-09 21:15:48,536 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2875422.6666666665, ans=0.0 2023-10-09 21:16:03,132 INFO [train.py:1031] (0/4) Epoch 14, batch 31450, loss[loss=0.1947, simple_loss=0.2435, pruned_loss=0.0547, ctc_loss=0.09155, over 16676.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2596, pruned_loss=0.05929, ctc_loss=0.1038, over 3298103.20 frames.
], batch size: 151, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:16:04,601 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2875516.0, ans=0.0 2023-10-09 21:16:15,940 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2875562.6666666665, ans=0.125 2023-10-09 21:16:31,433 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2875609.3333333335, ans=0.0 2023-10-09 21:16:32,693 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2023-10-09 21:16:36,501 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.75 vs. limit=10.0 2023-10-09 21:16:48,950 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2875656.0, ans=0.125 2023-10-09 21:17:06,119 INFO [train.py:1031] (0/4) Epoch 14, batch 31500, loss[loss=0.183, simple_loss=0.2353, pruned_loss=0.04888, ctc_loss=0.08204, over 16699.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.2603, pruned_loss=0.06005, ctc_loss=0.1048, over 3297570.49 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:17:09,192 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2875749.3333333335, ans=0.0 2023-10-09 21:17:23,953 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2875796.0, ans=0.125 2023-10-09 21:17:40,594 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.158e+02 3.690e+02 4.602e+02 7.979e+02, threshold=7.380e+02, percent-clipped=2.0 2023-10-09 21:17:59,257 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2875936.0, ans=0.0 2023-10-09 21:18:09,325 INFO [train.py:1031] (0/4) Epoch 14, batch 31550, loss[loss=0.2096, simple_loss=0.2761, pruned_loss=0.05132, ctc_loss=0.1014, over 16842.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.268, pruned_loss=0.0621, ctc_loss=0.108, over 3300967.23 frames. ], batch size: 189, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:18:25,513 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=22.5 2023-10-09 21:18:28,944 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2876029.3333333335, ans=0.0 2023-10-09 21:18:39,237 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2023-10-09 21:18:42,263 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.89 vs. 
limit=12.0 2023-10-09 21:18:43,054 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2876076.0, ans=0.125 2023-10-09 21:18:49,551 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2876122.6666666665, ans=0.0 2023-10-09 21:18:54,087 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2876122.6666666665, ans=0.125 2023-10-09 21:18:58,143 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2876169.3333333335, ans=0.125 2023-10-09 21:19:01,815 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2876169.3333333335, ans=0.125 2023-10-09 21:19:01,830 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2876169.3333333335, ans=0.125 2023-10-09 21:19:02,929 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2876169.3333333335, ans=0.0 2023-10-09 21:19:09,419 INFO [train.py:1031] (0/4) Epoch 14, batch 31600, loss[loss=0.2031, simple_loss=0.2504, pruned_loss=0.05757, ctc_loss=0.1017, over 16527.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2734, pruned_loss=0.06401, ctc_loss=0.1111, over 3302253.47 frames. ], batch size: 466, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:19:24,515 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.99 vs. limit=10.0 2023-10-09 21:19:40,628 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2876309.3333333335, ans=0.0 2023-10-09 21:19:45,052 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.637e+02 3.252e+02 3.692e+02 4.282e+02 8.692e+02, threshold=7.384e+02, percent-clipped=4.0 2023-10-09 21:19:55,694 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2876356.0, ans=0.0 2023-10-09 21:20:11,052 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2876402.6666666665, ans=0.125 2023-10-09 21:20:13,364 INFO [train.py:1031] (0/4) Epoch 14, batch 31650, loss[loss=0.264, simple_loss=0.3191, pruned_loss=0.07662, ctc_loss=0.1394, over 16745.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2747, pruned_loss=0.06358, ctc_loss=0.1105, over 3300852.60 frames. ], batch size: 384, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:20:27,103 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2023-10-09 21:21:15,701 INFO [train.py:1031] (0/4) Epoch 14, batch 31700, loss[loss=0.2403, simple_loss=0.2912, pruned_loss=0.06909, ctc_loss=0.1283, over 15199.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.2754, pruned_loss=0.06181, ctc_loss=0.1079, over 3295932.70 frames. 
], batch size: 527, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:21:33,725 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2876729.3333333335, ans=0.125 2023-10-09 21:21:36,926 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2876729.3333333335, ans=0.0 2023-10-09 21:21:52,723 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+02 3.115e+02 3.904e+02 4.739e+02 1.536e+03, threshold=7.807e+02, percent-clipped=3.0 2023-10-09 21:22:18,294 INFO [train.py:1031] (0/4) Epoch 14, batch 31750, loss[loss=0.2356, simple_loss=0.2855, pruned_loss=0.07009, ctc_loss=0.1136, over 16770.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2813, pruned_loss=0.06434, ctc_loss=0.1125, over 3293190.72 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:22:26,898 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2876916.0, ans=0.0 2023-10-09 21:22:40,007 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.70 vs. limit=12.0 2023-10-09 21:23:15,929 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2877102.6666666665, ans=0.025 2023-10-09 21:23:16,352 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=2877102.6666666665, ans=22.5 2023-10-09 21:23:20,607 INFO [train.py:1031] (0/4) Epoch 14, batch 31800, loss[loss=0.2055, simple_loss=0.2632, pruned_loss=0.05425, ctc_loss=0.09814, over 16973.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2822, pruned_loss=0.06469, ctc_loss=0.1134, over 3299716.53 frames. ], batch size: 228, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:23:26,744 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:23:55,582 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2877242.6666666665, ans=0.125 2023-10-09 21:23:57,867 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+02 3.284e+02 3.680e+02 4.274e+02 9.032e+02, threshold=7.360e+02, percent-clipped=1.0 2023-10-09 21:23:58,211 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2877289.3333333335, ans=0.0 2023-10-09 21:24:12,642 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2877336.0, ans=0.0 2023-10-09 21:24:13,763 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2877336.0, ans=0.125 2023-10-09 21:24:22,031 INFO [train.py:1031] (0/4) Epoch 14, batch 31850, loss[loss=0.2209, simple_loss=0.2605, pruned_loss=0.06731, ctc_loss=0.1169, over 16786.00 frames. ], tot_loss[loss=0.228, simple_loss=0.28, pruned_loss=0.06518, ctc_loss=0.1141, over 3305362.79 frames. 
], batch size: 215, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:24:35,783 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2877429.3333333335, ans=0.125 2023-10-09 21:24:41,186 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2877429.3333333335, ans=10.0 2023-10-09 21:25:23,223 INFO [train.py:1031] (0/4) Epoch 14, batch 31900, loss[loss=0.1765, simple_loss=0.2435, pruned_loss=0.03954, ctc_loss=0.07619, over 16805.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2747, pruned_loss=0.06418, ctc_loss=0.1122, over 3291881.38 frames. ], batch size: 228, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:25:28,490 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2877616.0, ans=0.035 2023-10-09 21:25:37,878 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2877662.6666666665, ans=0.125 2023-10-09 21:25:54,018 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2877709.3333333335, ans=0.0 2023-10-09 21:25:57,100 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.27 vs. limit=10.0 2023-10-09 21:25:57,708 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2877709.3333333335, ans=0.125 2023-10-09 21:26:03,167 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+02 3.194e+02 3.623e+02 4.182e+02 7.324e+02, threshold=7.246e+02, percent-clipped=0.0 2023-10-09 21:26:13,871 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2877802.6666666665, ans=0.125 2023-10-09 21:26:25,780 INFO [train.py:1031] (0/4) Epoch 14, batch 31950, loss[loss=0.1918, simple_loss=0.2383, pruned_loss=0.05481, ctc_loss=0.08911, over 16666.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.2672, pruned_loss=0.0605, ctc_loss=0.1063, over 3293942.94 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:26:34,259 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2877849.3333333335, ans=0.05 2023-10-09 21:26:48,401 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2877896.0, ans=10.0 2023-10-09 21:26:53,029 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2877942.6666666665, ans=0.125 2023-10-09 21:26:54,826 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2877942.6666666665, ans=0.125 2023-10-09 21:27:10,856 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0 2023-10-09 21:27:26,899 INFO [train.py:1031] (0/4) Epoch 14, batch 32000, loss[loss=0.2215, simple_loss=0.2736, pruned_loss=0.06387, ctc_loss=0.104, over 16952.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2627, pruned_loss=0.05972, ctc_loss=0.1046, over 3293483.50 frames. 
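], batch size: 78, lr: 2.53e-03, grad_scale: 4.0

Each "ScheduledFloat: name=..., batch_count=..., ans=..." line pairs a module parameter (dropout probabilities, skip rates, balancer targets, bypass scales) with the current batch_count, which reads as a value that follows a schedule over training progress and has long since settled at its endpoint this late in the run, hence the constant ans readouts. A plausible minimal version is piecewise-linear interpolation over (batch_count, value) breakpoints; the real ScheduledFloat in scaling.py may differ in detail, and the breakpoints below are made up for illustration.

import bisect

class ScheduledFloatSketch:
    """Piecewise-linear value of batch_count, e.g. (0, 0.3) -> (20000, 0.1)."""

    def __init__(self, *points):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical breakpoints: by batch_count ~2.88e6 the value has settled
# at its final endpoint, matching the constant ans=0.1 readouts nearby.
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(2878129.33))  # -> 0.1
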
2023-10-09 21:27:43,012 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=12.0 2023-10-09 21:27:44,372 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2878129.3333333335, ans=0.125 2023-10-09 21:27:45,409 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2878129.3333333335, ans=0.2 2023-10-09 21:28:06,913 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+02 3.047e+02 3.544e+02 4.263e+02 6.076e+02, threshold=7.087e+02, percent-clipped=0.0 2023-10-09 21:28:18,281 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=12.0 2023-10-09 21:28:22,816 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2878269.3333333335, ans=0.0 2023-10-09 21:28:30,825 INFO [train.py:1031] (0/4) Epoch 14, batch 32050, loss[loss=0.203, simple_loss=0.2836, pruned_loss=0.04547, ctc_loss=0.07881, over 16812.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.267, pruned_loss=0.05786, ctc_loss=0.102, over 3292578.18 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:28:55,501 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2878409.3333333335, ans=0.2 2023-10-09 21:29:26,719 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2878502.6666666665, ans=0.1 2023-10-09 21:29:33,753 INFO [train.py:1031] (0/4) Epoch 14, batch 32100, loss[loss=0.2234, simple_loss=0.2776, pruned_loss=0.06314, ctc_loss=0.1073, over 9680.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2735, pruned_loss=0.0563, ctc_loss=0.09966, over 3282914.30 frames. ], batch size: 36, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:29:55,429 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=15.0 2023-10-09 21:30:13,059 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 2.944e+02 3.415e+02 4.141e+02 9.202e+02, threshold=6.830e+02, percent-clipped=4.0 2023-10-09 21:30:16,630 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2878689.3333333335, ans=0.0 2023-10-09 21:30:32,512 INFO [train.py:1031] (0/4) Epoch 14, batch 32150, loss[loss=0.1911, simple_loss=0.2511, pruned_loss=0.04975, ctc_loss=0.07914, over 16791.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2743, pruned_loss=0.05562, ctc_loss=0.098, over 3289181.91 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:30:32,780 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2878782.6666666665, ans=0.07 2023-10-09 21:30:34,712 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2878782.6666666665, ans=0.0 2023-10-09 21:30:37,688 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.63 vs.
limit=22.5 2023-10-09 21:31:06,370 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2878876.0, ans=0.2 2023-10-09 21:31:13,939 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2878922.6666666665, ans=0.125 2023-10-09 21:31:15,986 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2878922.6666666665, ans=0.0 2023-10-09 21:31:22,129 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-10-09 21:31:33,112 INFO [train.py:1031] (0/4) Epoch 14, batch 32200, loss[loss=0.1972, simple_loss=0.2469, pruned_loss=0.05417, ctc_loss=0.09759, over 16798.00 frames. ], tot_loss[loss=0.211, simple_loss=0.2698, pruned_loss=0.0563, ctc_loss=0.09887, over 3283128.71 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:31:47,170 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2023-10-09 21:31:56,773 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2879109.3333333335, ans=0.125 2023-10-09 21:32:14,183 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+02 3.053e+02 3.349e+02 3.952e+02 6.213e+02, threshold=6.698e+02, percent-clipped=0.0 2023-10-09 21:32:22,938 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=22.5 2023-10-09 21:32:23,613 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2879202.6666666665, ans=0.0 2023-10-09 21:32:27,890 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2023-10-09 21:32:32,673 INFO [train.py:1031] (0/4) Epoch 14, batch 32250, loss[loss=0.2101, simple_loss=0.2591, pruned_loss=0.05985, ctc_loss=0.1034, over 16855.00 frames. ], tot_loss[loss=0.2109, simple_loss=0.2662, pruned_loss=0.05761, ctc_loss=0.1009, over 3292916.78 frames. ], batch size: 259, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:33:28,777 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2879436.0, ans=0.2 2023-10-09 21:33:33,810 INFO [train.py:1031] (0/4) Epoch 14, batch 32300, loss[loss=0.2186, simple_loss=0.2707, pruned_loss=0.06162, ctc_loss=0.1082, over 16813.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.2651, pruned_loss=0.05865, ctc_loss=0.1021, over 3277938.50 frames. 
], batch size: 310, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:33:35,935 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2879482.6666666665, ans=0.1 2023-10-09 21:34:16,064 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:34:19,600 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.554e+02 3.404e+02 3.971e+02 4.753e+02 7.959e+02, threshold=7.942e+02, percent-clipped=3.0 2023-10-09 21:34:39,151 INFO [train.py:1031] (0/4) Epoch 14, batch 32350, loss[loss=0.2904, simple_loss=0.3639, pruned_loss=0.07778, ctc_loss=0.1532, over 16584.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2723, pruned_loss=0.05937, ctc_loss=0.1045, over 3271054.08 frames. ], batch size: 351, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:34:40,518 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2879716.0, ans=0.2 2023-10-09 21:34:50,224 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2879762.6666666665, ans=0.0 2023-10-09 21:34:52,995 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2879762.6666666665, ans=0.0 2023-10-09 21:35:14,220 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2879856.0, ans=0.0 2023-10-09 21:35:30,132 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2879902.6666666665, ans=0.125 2023-10-09 21:35:40,797 INFO [train.py:1031] (0/4) Epoch 14, batch 32400, loss[loss=0.2138, simple_loss=0.2647, pruned_loss=0.0607, ctc_loss=0.1036, over 16875.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2764, pruned_loss=0.05953, ctc_loss=0.1056, over 3289829.07 frames. ], batch size: 189, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:35:50,994 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=22.5 2023-10-09 21:36:03,867 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2879996.0, ans=0.1 2023-10-09 21:36:06,043 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2880042.6666666665, ans=0.125 2023-10-09 21:36:14,008 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=12.0 2023-10-09 21:36:24,509 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2880089.3333333335, ans=0.2 2023-10-09 21:36:26,219 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.219e+02 3.562e+02 4.144e+02 6.944e+02, threshold=7.124e+02, percent-clipped=0.0 2023-10-09 21:36:37,894 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2880136.0, ans=0.125 2023-10-09 21:36:43,387 INFO [train.py:1031] (0/4) Epoch 14, batch 32450, loss[loss=0.1893, simple_loss=0.258, pruned_loss=0.04502, ctc_loss=0.07646, over 16898.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2741, pruned_loss=0.06051, ctc_loss=0.107, over 3292813.92 frames. 
], batch size: 90, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:37:10,977 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2880276.0, ans=0.1 2023-10-09 21:37:11,905 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2880276.0, ans=0.125 2023-10-09 21:37:13,695 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2880276.0, ans=0.125 2023-10-09 21:37:14,596 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2880276.0, ans=0.2 2023-10-09 21:37:25,419 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2880322.6666666665, ans=0.125 2023-10-09 21:37:31,277 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2880369.3333333335, ans=0.125 2023-10-09 21:37:41,810 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2880369.3333333335, ans=0.125 2023-10-09 21:37:44,360 INFO [train.py:1031] (0/4) Epoch 14, batch 32500, loss[loss=0.1969, simple_loss=0.2454, pruned_loss=0.05461, ctc_loss=0.09788, over 16008.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.2686, pruned_loss=0.05995, ctc_loss=0.106, over 3296545.80 frames. ], batch size: 70, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:37:47,365 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2880416.0, ans=0.125 2023-10-09 21:38:04,850 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2880462.6666666665, ans=0.0 2023-10-09 21:38:31,979 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.513e+02 2.966e+02 3.455e+02 3.936e+02 8.435e+02, threshold=6.910e+02, percent-clipped=1.0 2023-10-09 21:38:38,212 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2880602.6666666665, ans=0.125 2023-10-09 21:38:46,476 INFO [train.py:1031] (0/4) Epoch 14, batch 32550, loss[loss=0.1535, simple_loss=0.2275, pruned_loss=0.02942, ctc_loss=0.05136, over 16765.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2612, pruned_loss=0.0554, ctc_loss=0.09805, over 3299306.42 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:39:04,548 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2880696.0, ans=0.125 2023-10-09 21:39:26,268 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2880789.3333333335, ans=0.09899494936611666 2023-10-09 21:39:42,713 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:39:47,249 INFO [train.py:1031] (0/4) Epoch 14, batch 32600, loss[loss=0.2034, simple_loss=0.2612, pruned_loss=0.05479, ctc_loss=0.08977, over 16960.00 frames. ], tot_loss[loss=0.203, simple_loss=0.2585, pruned_loss=0.05452, ctc_loss=0.09612, over 3307687.05 frames. 
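Each optim.py line above reports five order statistics (min, 25%, median, 75%, max) of recent gradient norms together with a clipping threshold and the share of clipped batches; in these entries the threshold consistently equals Clipping_scale times the logged median (for example 2.0 * 3.455e+02 = 6.910e+02 just above). A sketch of that bookkeeping, under the assumption that the threshold really is a scaled median over a recent window; the window size and interfaces are illustrative, not the optimizer's actual code:

    import torch
    from collections import deque

    class GradNormClipper:
        # Track recent per-batch gradient norms; clip against
        # clipping_scale * median, and count how often clipping fires.
        def __init__(self, window: int = 200, clipping_scale: float = 2.0):
            self.norms = deque(maxlen=window)
            self.clipping_scale = clipping_scale
            self.clipped = 0
            self.total = 0

        def __call__(self, model: torch.nn.Module) -> None:
            grads = [p.grad.norm() for p in model.parameters()
                     if p.grad is not None]
            norm = torch.norm(torch.stack(grads)).item()
            self.norms.append(norm)
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()
            self.total += 1
            if norm > threshold:
                self.clipped += 1
            torch.nn.utils.clip_grad_norm_(model.parameters(), threshold)

        def percent_clipped(self) -> float:
            return 100.0 * self.clipped / max(self.total, 1)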
], batch size: 86, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:39:48,014 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.31 vs. limit=22.5 2023-10-09 21:39:51,632 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2023-10-09 21:39:57,724 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2880929.3333333335, ans=0.125 2023-10-09 21:40:15,450 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2880976.0, ans=0.2 2023-10-09 21:40:27,890 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2881022.6666666665, ans=0.125 2023-10-09 21:40:32,212 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2881022.6666666665, ans=0.1 2023-10-09 21:40:33,686 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=22.5 2023-10-09 21:40:34,041 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.902e+02 3.409e+02 5.024e+02 1.088e+03, threshold=6.817e+02, percent-clipped=5.0 2023-10-09 21:40:34,348 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2881022.6666666665, ans=0.125 2023-10-09 21:40:47,954 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2881116.0, ans=0.2 2023-10-09 21:40:48,729 INFO [train.py:1031] (0/4) Epoch 14, batch 32650, loss[loss=0.2313, simple_loss=0.2801, pruned_loss=0.06908, ctc_loss=0.1107, over 16733.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2636, pruned_loss=0.05624, ctc_loss=0.09807, over 3308412.83 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:40:55,746 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2881116.0, ans=0.2 2023-10-09 21:40:56,837 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2881116.0, ans=0.0 2023-10-09 21:41:29,934 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2881256.0, ans=0.125 2023-10-09 21:41:31,101 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2881256.0, ans=0.0 2023-10-09 21:41:51,978 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2881349.3333333335, ans=0.2 2023-10-09 21:41:52,681 INFO [train.py:1031] (0/4) Epoch 14, batch 32700, loss[loss=0.337, simple_loss=0.3602, pruned_loss=0.1152, ctc_loss=0.2081, over 16603.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2746, pruned_loss=0.05973, ctc_loss=0.1039, over 3297887.43 frames. 
], batch size: 350, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:42:41,824 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.809e+02 3.539e+02 4.014e+02 5.290e+02 1.076e+03, threshold=8.028e+02, percent-clipped=8.0 2023-10-09 21:42:55,733 INFO [train.py:1031] (0/4) Epoch 14, batch 32750, loss[loss=0.2689, simple_loss=0.312, pruned_loss=0.08255, ctc_loss=0.1517, over 16642.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.281, pruned_loss=0.06312, ctc_loss=0.1099, over 3304702.76 frames. ], batch size: 351, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:43:03,820 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.58 vs. limit=6.0 2023-10-09 21:43:06,047 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2881582.6666666665, ans=0.0 2023-10-09 21:43:10,045 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2881629.3333333335, ans=0.04949747468305833 2023-10-09 21:43:40,312 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2881722.6666666665, ans=0.0 2023-10-09 21:43:57,093 INFO [train.py:1031] (0/4) Epoch 14, batch 32800, loss[loss=0.267, simple_loss=0.3046, pruned_loss=0.0845, ctc_loss=0.1512, over 16807.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2809, pruned_loss=0.06422, ctc_loss=0.1118, over 3303939.87 frames. ], batch size: 384, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:44:11,383 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2881862.6666666665, ans=0.0 2023-10-09 21:44:17,821 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2881862.6666666665, ans=0.125 2023-10-09 21:44:20,482 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2881909.3333333335, ans=0.2 2023-10-09 21:44:38,730 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2881956.0, ans=0.0 2023-10-09 21:44:39,139 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=15.0 2023-10-09 21:44:39,761 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2881956.0, ans=0.0 2023-10-09 21:44:42,856 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2881956.0, ans=0.125 2023-10-09 21:44:42,889 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2881956.0, ans=0.125 2023-10-09 21:44:46,104 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.217e+02 3.697e+02 4.305e+02 8.023e+02, threshold=7.395e+02, percent-clipped=0.0 2023-10-09 21:44:57,218 INFO [train.py:1031] (0/4) Epoch 14, batch 32850, loss[loss=0.2156, simple_loss=0.2704, pruned_loss=0.06079, ctc_loss=0.09798, over 16785.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2798, pruned_loss=0.06376, ctc_loss=0.111, over 3310136.86 frames. 
], batch size: 140, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:45:07,109 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2882049.3333333335, ans=0.125 2023-10-09 21:45:44,449 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:45:51,138 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2882236.0, ans=0.2 2023-10-09 21:45:53,244 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2882236.0, ans=0.025 2023-10-09 21:45:59,362 INFO [train.py:1031] (0/4) Epoch 14, batch 32900, loss[loss=0.2224, simple_loss=0.3007, pruned_loss=0.05235, ctc_loss=0.09857, over 16879.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2812, pruned_loss=0.06406, ctc_loss=0.1116, over 3314558.08 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:46:21,663 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=22.5 2023-10-09 21:46:25,126 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2882376.0, ans=0.0 2023-10-09 21:46:27,101 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.57 vs. limit=6.0 2023-10-09 21:46:37,287 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:46:40,684 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2882422.6666666665, ans=0.1 2023-10-09 21:46:51,628 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.541e+02 3.233e+02 3.650e+02 4.547e+02 8.623e+02, threshold=7.299e+02, percent-clipped=2.0 2023-10-09 21:47:02,670 INFO [train.py:1031] (0/4) Epoch 14, batch 32950, loss[loss=0.2497, simple_loss=0.3027, pruned_loss=0.07384, ctc_loss=0.1228, over 16583.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2865, pruned_loss=0.06504, ctc_loss=0.1134, over 3315567.64 frames. ], batch size: 110, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:47:04,565 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2882516.0, ans=0.125 2023-10-09 21:47:13,705 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.68 vs. limit=10.0 2023-10-09 21:47:23,230 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2023-10-09 21:47:23,832 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2882562.6666666665, ans=0.0 2023-10-09 21:47:55,036 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:48:05,298 INFO [train.py:1031] (0/4) Epoch 14, batch 33000, loss[loss=0.274, simple_loss=0.2997, pruned_loss=0.09261, ctc_loss=0.1577, over 16848.00 frames. ], tot_loss[loss=0.2352, simple_loss=0.2891, pruned_loss=0.06719, ctc_loss=0.1173, over 3308828.57 frames. 
], batch size: 384, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:48:05,299 INFO [train.py:1054] (0/4) Computing validation loss 2023-10-09 21:48:23,064 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2327, simple_loss=0.3031, pruned_loss=0.06268, ctc_loss=0.09218, over 1796401.00 frames. 2023-10-09 21:48:23,065 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14587MB 2023-10-09 21:48:37,890 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2882796.0, ans=0.125 2023-10-09 21:48:47,352 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2882842.6666666665, ans=0.125 2023-10-09 21:48:52,097 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2882842.6666666665, ans=0.0 2023-10-09 21:48:54,880 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2882842.6666666665, ans=0.125 2023-10-09 21:49:02,242 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2882889.3333333335, ans=0.125 2023-10-09 21:49:11,005 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2882936.0, ans=0.125 2023-10-09 21:49:13,414 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.435e+02 3.950e+02 5.096e+02 8.924e+02, threshold=7.899e+02, percent-clipped=1.0 2023-10-09 21:49:14,800 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2882936.0, ans=0.0 2023-10-09 21:49:24,082 INFO [train.py:1031] (0/4) Epoch 14, batch 33050, loss[loss=0.2662, simple_loss=0.2906, pruned_loss=0.08869, ctc_loss=0.161, over 16660.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2867, pruned_loss=0.0672, ctc_loss=0.1172, over 3293604.72 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:49:46,170 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2883029.3333333335, ans=0.125 2023-10-09 21:49:51,327 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2883076.0, ans=0.125 2023-10-09 21:49:54,697 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2883076.0, ans=0.0 2023-10-09 21:50:08,074 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2883122.6666666665, ans=0.2 2023-10-09 21:50:08,274 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2023-10-09 21:50:18,414 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2883169.3333333335, ans=0.125 2023-10-09 21:50:19,512 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2883169.3333333335, ans=0.07 2023-10-09 21:50:25,683 INFO [train.py:1031] (0/4) Epoch 14, batch 33100, loss[loss=0.2413, simple_loss=0.2735, pruned_loss=0.07729, ctc_loss=0.1364, over 16589.00 frames. 
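The validation block above ("Computing validation loss", the frame-weighted result over 1796401.00 frames, and the peak-memory line) corresponds to periodically switching the model to eval mode and aggregating loss over the held-out set. A minimal sketch, assuming a dataloader of feature batches and a model that returns a per-frame-normalized loss; these interfaces are assumptions for illustration, not train.py's actual signatures:

    import torch

    def compute_validation_loss(model, valid_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                feats = batch["features"].to(device)  # assumed batch layout
                loss, num_frames = model(feats)       # assumed interface
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.4f}, "
              f"over {tot_frames:.2f} frames; max memory {mem_mb}MB")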
], tot_loss[loss=0.2329, simple_loss=0.2842, pruned_loss=0.0673, ctc_loss=0.1172, over 3307373.96 frames. ], batch size: 415, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:50:45,273 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2023-10-09 21:51:03,642 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.60 vs. limit=6.0 2023-10-09 21:51:05,349 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2883356.0, ans=0.95 2023-10-09 21:51:18,564 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.087e+02 3.637e+02 4.211e+02 8.906e+02, threshold=7.275e+02, percent-clipped=1.0 2023-10-09 21:51:22,664 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2023-10-09 21:51:24,315 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=22.5 2023-10-09 21:51:28,124 INFO [train.py:1031] (0/4) Epoch 14, batch 33150, loss[loss=0.1916, simple_loss=0.2616, pruned_loss=0.04452, ctc_loss=0.08108, over 16857.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2805, pruned_loss=0.06432, ctc_loss=0.1123, over 3312984.84 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:51:30,587 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2883449.3333333335, ans=0.0 2023-10-09 21:51:32,463 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2883449.3333333335, ans=0.0 2023-10-09 21:51:42,756 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2883496.0, ans=0.125 2023-10-09 21:51:43,854 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2883496.0, ans=0.0 2023-10-09 21:51:44,917 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2883496.0, ans=10.0 2023-10-09 21:51:52,057 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2883496.0, ans=0.0 2023-10-09 21:51:54,759 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2883542.6666666665, ans=0.1 2023-10-09 21:51:58,011 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2883542.6666666665, ans=0.125 2023-10-09 21:52:01,847 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2883542.6666666665, ans=0.125 2023-10-09 21:52:07,351 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2883589.3333333335, ans=0.125 2023-10-09 21:52:12,301 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2883589.3333333335, ans=0.1 2023-10-09 21:52:31,979 INFO [train.py:1031] (0/4) Epoch 14, 
batch 33200, loss[loss=0.1847, simple_loss=0.2423, pruned_loss=0.04667, ctc_loss=0.0846, over 16714.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2798, pruned_loss=0.06226, ctc_loss=0.1097, over 3307365.73 frames. ], batch size: 130, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:52:37,366 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0 2023-10-09 21:52:42,070 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2883682.6666666665, ans=0.2 2023-10-09 21:52:46,798 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2023-10-09 21:52:58,857 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2883776.0, ans=0.0 2023-10-09 21:52:59,846 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2883776.0, ans=0.125 2023-10-09 21:53:25,120 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+02 3.107e+02 3.465e+02 4.067e+02 6.400e+02, threshold=6.930e+02, percent-clipped=0.0 2023-10-09 21:53:32,624 INFO [train.py:1031] (0/4) Epoch 14, batch 33250, loss[loss=0.1908, simple_loss=0.2479, pruned_loss=0.04891, ctc_loss=0.08962, over 16819.00 frames. ], tot_loss[loss=0.221, simple_loss=0.275, pruned_loss=0.06173, ctc_loss=0.1087, over 3302359.20 frames. ], batch size: 202, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:54:18,192 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2884056.0, ans=0.125 2023-10-09 21:54:21,454 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2884102.6666666665, ans=0.2 2023-10-09 21:54:25,326 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2884102.6666666665, ans=0.125 2023-10-09 21:54:31,558 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2884102.6666666665, ans=0.125 2023-10-09 21:54:35,048 INFO [train.py:1031] (0/4) Epoch 14, batch 33300, loss[loss=0.2278, simple_loss=0.2603, pruned_loss=0.07319, ctc_loss=0.122, over 16715.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2698, pruned_loss=0.06184, ctc_loss=0.1085, over 3304235.24 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:54:41,276 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2884149.3333333335, ans=0.125 2023-10-09 21:55:16,188 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2884289.3333333335, ans=0.2 2023-10-09 21:55:20,629 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2884289.3333333335, ans=0.0 2023-10-09 21:55:27,239 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2884336.0, ans=0.125 2023-10-09 21:55:31,594 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. 
limit=12.0 2023-10-09 21:55:32,062 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.141e+02 3.663e+02 4.502e+02 8.687e+02, threshold=7.326e+02, percent-clipped=2.0 2023-10-09 21:55:38,479 INFO [train.py:1031] (0/4) Epoch 14, batch 33350, loss[loss=0.1901, simple_loss=0.2502, pruned_loss=0.04893, ctc_loss=0.08036, over 16619.00 frames. ], tot_loss[loss=0.2203, simple_loss=0.2736, pruned_loss=0.06175, ctc_loss=0.1089, over 3309255.16 frames. ], batch size: 111, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:55:53,204 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2023-10-09 21:55:59,807 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=2884429.3333333335, ans=0.1 2023-10-09 21:56:18,495 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2884522.6666666665, ans=0.125 2023-10-09 21:56:23,843 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2884522.6666666665, ans=0.2 2023-10-09 21:56:37,713 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2884569.3333333335, ans=0.125 2023-10-09 21:56:39,477 INFO [train.py:1031] (0/4) Epoch 14, batch 33400, loss[loss=0.2378, simple_loss=0.306, pruned_loss=0.06394, ctc_loss=0.1043, over 17061.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2788, pruned_loss=0.06235, ctc_loss=0.1102, over 3305938.79 frames. ], batch size: 86, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:56:55,390 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2884662.6666666665, ans=0.1 2023-10-09 21:56:58,639 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2884662.6666666665, ans=0.125 2023-10-09 21:56:58,719 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2884662.6666666665, ans=0.125 2023-10-09 21:57:01,796 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2884662.6666666665, ans=0.125 2023-10-09 21:57:06,284 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2884709.3333333335, ans=0.125 2023-10-09 21:57:08,433 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2884709.3333333335, ans=0.125 2023-10-09 21:57:16,748 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2884756.0, ans=0.0 2023-10-09 21:57:28,340 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=22.5 2023-10-09 21:57:36,674 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.586e+02 3.309e+02 3.821e+02 4.723e+02 1.099e+03, threshold=7.641e+02, percent-clipped=5.0 2023-10-09 21:57:42,140 INFO [train.py:1031] (0/4) Epoch 14, batch 33450, loss[loss=0.2277, simple_loss=0.2899, pruned_loss=0.06239, ctc_loss=0.1021, over 16788.00 frames. 
], tot_loss[loss=0.2249, simple_loss=0.28, pruned_loss=0.06277, ctc_loss=0.1107, over 3298786.63 frames. ], batch size: 121, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:58:47,411 INFO [train.py:1031] (0/4) Epoch 14, batch 33500, loss[loss=0.2056, simple_loss=0.2516, pruned_loss=0.05942, ctc_loss=0.1019, over 16826.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.282, pruned_loss=0.06306, ctc_loss=0.11, over 3300916.04 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:58:59,367 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2885129.3333333335, ans=0.125 2023-10-09 21:59:05,843 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2885129.3333333335, ans=0.125 2023-10-09 21:59:15,964 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=12.0 2023-10-09 21:59:32,946 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2885222.6666666665, ans=0.2 2023-10-09 21:59:44,293 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2885269.3333333335, ans=0.1 2023-10-09 21:59:46,064 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+02 3.525e+02 4.202e+02 5.122e+02 8.777e+02, threshold=8.403e+02, percent-clipped=5.0 2023-10-09 21:59:48,888 INFO [train.py:1031] (0/4) Epoch 14, batch 33550, loss[loss=0.2209, simple_loss=0.274, pruned_loss=0.06345, ctc_loss=0.1021, over 15976.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2771, pruned_loss=0.06242, ctc_loss=0.1086, over 3307427.85 frames. ], batch size: 70, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:00:11,095 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2885362.6666666665, ans=0.125 2023-10-09 22:00:21,650 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2885409.3333333335, ans=0.125 2023-10-09 22:00:48,808 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2885549.3333333335, ans=0.1 2023-10-09 22:00:49,677 INFO [train.py:1031] (0/4) Epoch 14, batch 33600, loss[loss=0.2191, simple_loss=0.2598, pruned_loss=0.0669, ctc_loss=0.1115, over 16725.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.2712, pruned_loss=0.06207, ctc_loss=0.1079, over 3312320.44 frames. 
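The grad_scale value in each batch summary fluctuates among powers of two (1.0 to 8.0 in this stretch); that is the signature of dynamic loss scaling in mixed-precision training, where the scale grows after a run of successful steps and halves whenever scaled gradients overflow. Standard PyTorch AMP usage showing the mechanism (generic torch.cuda.amp code, not the training script itself):

    import torch

    scaler = torch.cuda.amp.GradScaler()  # dynamic loss scale

    def train_step(model, batch, optimizer, device):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch.to(device))  # assumed: returns a scalar loss
        scaler.scale(loss).backward()  # backprop through the scaled loss
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # grow or halve the scale factor
        return scaler.get_scale()      # the value reported as grad_scale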
], batch size: 328, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:00:55,165 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2885549.3333333335, ans=0.125 2023-10-09 22:00:59,844 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2885596.0, ans=0.1 2023-10-09 22:01:04,794 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2885596.0, ans=0.0 2023-10-09 22:01:20,351 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 22:01:21,325 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2885642.6666666665, ans=0.0 2023-10-09 22:01:27,284 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2885689.3333333335, ans=0.125 2023-10-09 22:01:32,468 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0 2023-10-09 22:01:36,034 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2885736.0, ans=0.125 2023-10-09 22:01:48,017 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 3.170e+02 3.773e+02 4.552e+02 1.576e+03, threshold=7.545e+02, percent-clipped=1.0 2023-10-09 22:01:49,717 INFO [train.py:1031] (0/4) Epoch 14, batch 33650, loss[loss=0.2007, simple_loss=0.2627, pruned_loss=0.05211, ctc_loss=0.0862, over 16972.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.2673, pruned_loss=0.06196, ctc_loss=0.1076, over 3318388.22 frames. ], batch size: 86, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:02:22,798 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2885876.0, ans=0.0 2023-10-09 22:02:35,330 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2885922.6666666665, ans=0.1 2023-10-09 22:02:37,186 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2885922.6666666665, ans=0.2 2023-10-09 22:02:47,502 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2885969.3333333335, ans=0.125 2023-10-09 22:02:49,259 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2885969.3333333335, ans=0.0 2023-10-09 22:02:52,472 INFO [train.py:1031] (0/4) Epoch 14, batch 33700, loss[loss=0.2325, simple_loss=0.283, pruned_loss=0.06703, ctc_loss=0.1198, over 16924.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2719, pruned_loss=0.06438, ctc_loss=0.1119, over 3313419.88 frames. 
], batch size: 242, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:03:05,828 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2886062.6666666665, ans=0.125 2023-10-09 22:03:09,184 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2886062.6666666665, ans=0.0 2023-10-09 22:03:09,285 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 22:03:24,084 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2886109.3333333335, ans=0.0 2023-10-09 22:03:25,504 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2886109.3333333335, ans=0.125 2023-10-09 22:03:25,656 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2886109.3333333335, ans=0.125 2023-10-09 22:03:42,611 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2886202.6666666665, ans=0.125 2023-10-09 22:03:52,829 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+02 3.271e+02 3.899e+02 4.405e+02 9.865e+02, threshold=7.797e+02, percent-clipped=1.0 2023-10-09 22:03:52,856 INFO [train.py:1031] (0/4) Epoch 14, batch 33750, loss[loss=0.2791, simple_loss=0.3135, pruned_loss=0.08992, ctc_loss=0.1624, over 16739.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2774, pruned_loss=0.06718, ctc_loss=0.1165, over 3315458.54 frames. ], batch size: 352, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:04:11,273 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2886296.0, ans=0.125 2023-10-09 22:04:17,249 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2886342.6666666665, ans=0.125 2023-10-09 22:04:22,933 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2886342.6666666665, ans=0.2 2023-10-09 22:04:38,378 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2886389.3333333335, ans=0.1 2023-10-09 22:04:47,943 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2886436.0, ans=10.0 2023-10-09 22:04:51,797 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2886436.0, ans=0.125 2023-10-09 22:04:54,306 INFO [train.py:1031] (0/4) Epoch 14, batch 33800, loss[loss=0.229, simple_loss=0.2693, pruned_loss=0.07032, ctc_loss=0.1201, over 16930.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2773, pruned_loss=0.06754, ctc_loss=0.1173, over 3316265.39 frames. 
], batch size: 259, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:04:57,394 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2886482.6666666665, ans=0.125 2023-10-09 22:04:58,546 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2886482.6666666665, ans=0.1 2023-10-09 22:04:58,985 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0 2023-10-09 22:05:08,152 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2886529.3333333335, ans=0.0 2023-10-09 22:05:30,918 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2886622.6666666665, ans=0.0 2023-10-09 22:05:43,172 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2023-10-09 22:05:55,361 INFO [train.py:1031] (0/4) Epoch 14, batch 33850, loss[loss=0.2639, simple_loss=0.2751, pruned_loss=0.09387, ctc_loss=0.1626, over 16602.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2731, pruned_loss=0.0671, ctc_loss=0.1167, over 3309539.50 frames. ], batch size: 386, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:05:56,413 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+02 3.178e+02 3.599e+02 4.092e+02 7.716e+02, threshold=7.198e+02, percent-clipped=0.0 2023-10-09 22:06:11,964 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2886762.6666666665, ans=0.0 2023-10-09 22:06:24,756 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2886809.3333333335, ans=0.125 2023-10-09 22:06:30,905 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2023-10-09 22:06:50,838 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.09 vs. limit=22.5 2023-10-09 22:06:56,614 INFO [train.py:1031] (0/4) Epoch 14, batch 33900, loss[loss=0.2448, simple_loss=0.3179, pruned_loss=0.06382, ctc_loss=0.1101, over 16753.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2734, pruned_loss=0.06629, ctc_loss=0.1148, over 3311242.01 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:07:07,925 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2886949.3333333335, ans=0.1 2023-10-09 22:07:34,342 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2887089.3333333335, ans=0.125 2023-10-09 22:07:40,112 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2887089.3333333335, ans=0.015 2023-10-09 22:07:47,587 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. 
limit=6.0 2023-10-09 22:07:52,937 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2887136.0, ans=0.125 2023-10-09 22:07:57,079 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2887136.0, ans=0.1 2023-10-09 22:07:58,888 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2887182.6666666665, ans=0.0 2023-10-09 22:07:59,536 INFO [train.py:1031] (0/4) Epoch 14, batch 33950, loss[loss=0.2813, simple_loss=0.3611, pruned_loss=0.07342, ctc_loss=0.1367, over 16474.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2814, pruned_loss=0.06442, ctc_loss=0.1121, over 3296462.99 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 1.0 2023-10-09 22:08:03,404 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.763e+02 3.464e+02 4.205e+02 4.959e+02 7.578e+02, threshold=8.409e+02, percent-clipped=4.0 2023-10-09 22:08:05,682 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2023-10-09 22:08:08,792 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2887182.6666666665, ans=0.125 2023-10-09 22:08:45,751 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2887322.6666666665, ans=0.125 2023-10-09 22:08:46,849 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2887322.6666666665, ans=0.0 2023-10-09 22:09:02,846 INFO [train.py:1031] (0/4) Epoch 14, batch 34000, loss[loss=0.2178, simple_loss=0.3181, pruned_loss=0.04202, ctc_loss=0.08356, over 16845.00 frames. ], tot_loss[loss=0.235, simple_loss=0.297, pruned_loss=0.06389, ctc_loss=0.1132, over 3295083.56 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:09:20,703 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2887462.6666666665, ans=0.125 2023-10-09 22:09:25,108 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2887462.6666666665, ans=0.125 2023-10-09 22:09:39,729 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=22.5 2023-10-09 22:10:03,845 INFO [train.py:1031] (0/4) Epoch 14, batch 34050, loss[loss=0.1948, simple_loss=0.2462, pruned_loss=0.05431, ctc_loss=0.08696, over 16378.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2957, pruned_loss=0.06105, ctc_loss=0.1091, over 3302387.37 frames. 
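The Whitening lines compare a per-module statistic against a limit (for example metric=2.17 vs. limit=6.0 above), implying a penalty is applied only when a module's activation covariance drifts too far from isotropic ("white"). One plausible dispersion statistic with those properties, equal to 1.0 when all covariance eigenvalues match and growing as the spectrum becomes uneven; whether scaling.py computes exactly this ratio is an assumption:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]     # channel covariance
        eigs = torch.linalg.eigvalsh(cov)  # real, non-negative for PSD cov
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    x = torch.randn(1000, 128)             # nearly white input
    print(whitening_metric(x))             # close to 1.0
    # a penalty would apply only when the metric exceeds the limit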
], batch size: 70, lr: 2.53e-03, grad_scale: 1.0 2023-10-09 22:10:08,601 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+02 3.098e+02 3.845e+02 4.884e+02 8.519e+02, threshold=7.690e+02, percent-clipped=1.0 2023-10-09 22:10:21,587 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2887696.0, ans=0.125 2023-10-09 22:10:36,959 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2887742.6666666665, ans=0.125 2023-10-09 22:10:40,773 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2887789.3333333335, ans=0.0 2023-10-09 22:10:45,680 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2887789.3333333335, ans=0.125 2023-10-09 22:11:04,698 INFO [train.py:1031] (0/4) Epoch 14, batch 34100, loss[loss=0.2339, simple_loss=0.2879, pruned_loss=0.06781, ctc_loss=0.1106, over 16765.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2923, pruned_loss=0.0619, ctc_loss=0.11, over 3298106.82 frames. ], batch size: 130, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:11:08,781 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2887882.6666666665, ans=0.07 2023-10-09 22:11:19,765 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.29 vs. limit=15.0 2023-10-09 22:11:28,212 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2887976.0, ans=0.2 2023-10-09 22:11:30,633 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.85 vs. limit=22.5 2023-10-09 22:12:01,407 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2888069.3333333335, ans=0.125 2023-10-09 22:12:05,997 INFO [train.py:1031] (0/4) Epoch 14, batch 34150, loss[loss=0.2123, simple_loss=0.3023, pruned_loss=0.04528, ctc_loss=0.07967, over 13543.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.2934, pruned_loss=0.06383, ctc_loss=0.1129, over 3301186.46 frames. ], batch size: 40, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:12:11,412 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+02 3.257e+02 3.702e+02 4.193e+02 7.598e+02, threshold=7.404e+02, percent-clipped=0.0 2023-10-09 22:12:18,916 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2888162.6666666665, ans=0.125 2023-10-09 22:12:22,721 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2888162.6666666665, ans=0.0 2023-10-09 22:12:27,651 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2888162.6666666665, ans=0.0 2023-10-09 22:12:49,950 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2888256.0, ans=0.125 2023-10-09 22:13:08,600 INFO [train.py:1031] (0/4) Epoch 14, batch 34200, loss[loss=0.1923, simple_loss=0.247, pruned_loss=0.05134, ctc_loss=0.0873, over 16773.00 frames. 
], tot_loss[loss=0.232, simple_loss=0.2896, pruned_loss=0.06442, ctc_loss=0.1137, over 3306015.81 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:13:10,470 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2888349.3333333335, ans=0.1 2023-10-09 22:13:13,078 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=15.0 2023-10-09 22:13:31,926 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2888442.6666666665, ans=0.05 2023-10-09 22:13:42,639 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2023-10-09 22:13:46,079 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2888489.3333333335, ans=0.125 2023-10-09 22:13:48,305 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2888489.3333333335, ans=0.0 2023-10-09 22:13:53,034 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2888489.3333333335, ans=0.2 2023-10-09 22:14:02,509 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2888536.0, ans=0.2 2023-10-09 22:14:09,170 INFO [train.py:1031] (0/4) Epoch 14, batch 34250, loss[loss=0.199, simple_loss=0.2508, pruned_loss=0.05556, ctc_loss=0.09041, over 16971.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2814, pruned_loss=0.06312, ctc_loss=0.1113, over 3301369.99 frames. ], batch size: 86, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:14:15,724 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.386e+02 3.191e+02 3.616e+02 4.129e+02 7.013e+02, threshold=7.231e+02, percent-clipped=0.0 2023-10-09 22:14:17,983 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2023-10-09 22:14:23,524 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2888629.3333333335, ans=0.1 2023-10-09 22:15:06,672 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2023-10-09 22:15:10,737 INFO [train.py:1031] (0/4) Epoch 14, batch 34300, loss[loss=0.2337, simple_loss=0.2709, pruned_loss=0.07218, ctc_loss=0.1306, over 16641.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2786, pruned_loss=0.06322, ctc_loss=0.1113, over 3304312.11 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:15:11,381 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. 
limit=6.0 2023-10-09 22:15:22,653 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2888862.6666666665, ans=0.1 2023-10-09 22:15:28,529 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2888862.6666666665, ans=0.0 2023-10-09 22:15:29,964 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=15.0 2023-10-09 22:15:32,256 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2888862.6666666665, ans=0.125 2023-10-09 22:15:42,664 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2023-10-09 22:15:55,497 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2888956.0, ans=0.1 2023-10-09 22:16:07,707 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2889002.6666666665, ans=0.1 2023-10-09 22:16:09,865 INFO [train.py:1031] (0/4) Epoch 14, batch 34350, loss[loss=0.2259, simple_loss=0.2741, pruned_loss=0.06801, ctc_loss=0.1042, over 16688.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.2782, pruned_loss=0.06379, ctc_loss=0.1119, over 3297678.85 frames. ], batch size: 111, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:16:11,370 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2889049.3333333335, ans=0.0 2023-10-09 22:16:16,842 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.449e+02 3.283e+02 3.799e+02 4.453e+02 1.021e+03, threshold=7.599e+02, percent-clipped=4.0 2023-10-09 22:16:22,790 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2889096.0, ans=0.2 2023-10-09 22:16:34,026 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2889142.6666666665, ans=0.0 2023-10-09 22:16:45,349 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2889189.3333333335, ans=0.125 2023-10-09 22:16:50,088 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2889189.3333333335, ans=0.125 2023-10-09 22:16:52,704 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-10-09 22:17:10,490 INFO [train.py:1031] (0/4) Epoch 14, batch 34400, loss[loss=0.2159, simple_loss=0.2665, pruned_loss=0.06143, ctc_loss=0.1064, over 16749.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2768, pruned_loss=0.064, ctc_loss=0.112, over 3293170.94 frames. 
], batch size: 151, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:17:23,968 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2889329.3333333335, ans=0.125 2023-10-09 22:17:57,347 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2889469.3333333335, ans=0.0 2023-10-09 22:18:04,880 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2889469.3333333335, ans=0.0 2023-10-09 22:18:11,067 INFO [train.py:1031] (0/4) Epoch 14, batch 34450, loss[loss=0.2235, simple_loss=0.2761, pruned_loss=0.06356, ctc_loss=0.1093, over 16943.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2765, pruned_loss=0.06486, ctc_loss=0.1133, over 3300978.11 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:18:19,264 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.547e+02 3.186e+02 3.591e+02 4.331e+02 7.838e+02, threshold=7.182e+02, percent-clipped=2.0 2023-10-09 22:18:21,257 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2889516.0, ans=0.125 2023-10-09 22:19:13,372 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2889749.3333333335, ans=0.0 2023-10-09 22:19:14,166 INFO [train.py:1031] (0/4) Epoch 14, batch 34500, loss[loss=0.2171, simple_loss=0.29, pruned_loss=0.05413, ctc_loss=0.09017, over 16713.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.285, pruned_loss=0.0672, ctc_loss=0.1172, over 3308421.42 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:19:14,416 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2889749.3333333335, ans=0.0 2023-10-09 22:19:23,042 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2889749.3333333335, ans=0.125 2023-10-09 22:19:27,032 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.17 vs. limit=6.0 2023-10-09 22:19:38,707 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2889796.0, ans=0.0 2023-10-09 22:19:45,112 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2889842.6666666665, ans=0.125 2023-10-09 22:19:46,052 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2889842.6666666665, ans=0.1 2023-10-09 22:20:20,487 INFO [train.py:1031] (0/4) Epoch 14, batch 34550, loss[loss=0.2548, simple_loss=0.3202, pruned_loss=0.06953, ctc_loss=0.1255, over 16864.00 frames. ], tot_loss[loss=0.2396, simple_loss=0.2964, pruned_loss=0.06759, ctc_loss=0.1192, over 3313152.52 frames. 
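Each batch summary pairs the current batch's loss[...] with tot_loss[...], whose frame count hovers around 3.3e6 while individual batches contribute only ~17k frames; tot_loss is therefore a frame-weighted running aggregate, which is why it drifts slowly while per-batch losses jump between roughly 0.15 and 0.34 in this stretch. A minimal sketch of such smoothing; the decay constant is an assumption (0.995 gives a steady-state window of about 17k / 0.005 = 3.4e6 frames, consistent with the logged totals):

    class SmoothedLoss:
        # Frame-weighted exponential smoothing: the effective window is
        # roughly frames_per_batch / (1 - decay) frames.
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, num_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
            self.frames = self.decay * self.frames + num_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)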
], batch size: 242, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:20:22,994 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2889982.6666666665, ans=0.125 2023-10-09 22:20:30,359 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.650e+02 3.661e+02 4.529e+02 6.004e+02 9.470e+02, threshold=9.059e+02, percent-clipped=10.0 2023-10-09 22:20:51,165 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2890076.0, ans=0.125 2023-10-09 22:20:57,330 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2890122.6666666665, ans=0.125 2023-10-09 22:21:24,119 INFO [train.py:1031] (0/4) Epoch 14, batch 34600, loss[loss=0.1575, simple_loss=0.2163, pruned_loss=0.03611, ctc_loss=0.06619, over 10936.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2944, pruned_loss=0.06569, ctc_loss=0.1162, over 3313356.28 frames. ], batch size: 35, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:21:24,457 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2890216.0, ans=0.0 2023-10-09 22:21:30,593 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2023-10-09 22:21:33,235 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2023-10-09 22:21:45,745 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2890262.6666666665, ans=0.1 2023-10-09 22:21:52,304 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2890309.3333333335, ans=0.0 2023-10-09 22:22:03,242 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2890356.0, ans=0.09899494936611666 2023-10-09 22:22:04,186 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2890356.0, ans=10.0 2023-10-09 22:22:25,920 INFO [train.py:1031] (0/4) Epoch 14, batch 34650, loss[loss=0.2186, simple_loss=0.2735, pruned_loss=0.05902, ctc_loss=0.1141, over 16815.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2866, pruned_loss=0.06217, ctc_loss=0.1103, over 3297671.27 frames. 
], batch size: 201, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:22:37,036 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+02 2.928e+02 3.445e+02 4.113e+02 6.666e+02, threshold=6.890e+02, percent-clipped=0.0 2023-10-09 22:22:38,407 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2890496.0, ans=0.1 2023-10-09 22:22:51,761 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2890542.6666666665, ans=0.1 2023-10-09 22:22:58,228 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2890542.6666666665, ans=0.0 2023-10-09 22:23:11,590 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2023-10-09 22:23:23,493 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2023-10-09 22:23:24,330 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2890636.0, ans=0.125 2023-10-09 22:23:27,776 INFO [train.py:1031] (0/4) Epoch 14, batch 34700, loss[loss=0.2774, simple_loss=0.3111, pruned_loss=0.08997, ctc_loss=0.1592, over 16731.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2853, pruned_loss=0.06341, ctc_loss=0.1119, over 3301472.22 frames. ], batch size: 353, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:24:00,369 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2890776.0, ans=0.125 2023-10-09 22:24:14,633 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2890822.6666666665, ans=0.125 2023-10-09 22:24:31,577 INFO [train.py:1031] (0/4) Epoch 14, batch 34750, loss[loss=0.2394, simple_loss=0.2877, pruned_loss=0.07141, ctc_loss=0.1207, over 16938.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.288, pruned_loss=0.06632, ctc_loss=0.1168, over 3308084.07 frames. ], batch size: 215, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:24:33,961 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2890916.0, ans=0.125 2023-10-09 22:24:42,695 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.997e+02 3.549e+02 4.003e+02 4.772e+02 8.039e+02, threshold=8.005e+02, percent-clipped=2.0 2023-10-09 22:24:47,822 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0 2023-10-09 22:24:51,313 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2890962.6666666665, ans=0.0 2023-10-09 22:24:56,614 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2891009.3333333335, ans=0.0 2023-10-09 22:25:21,038 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2891102.6666666665, ans=0.125 2023-10-09 22:25:31,229 INFO [train.py:1031] (0/4) Epoch 14, batch 34800, loss[loss=0.2127, simple_loss=0.2814, pruned_loss=0.05472, ctc_loss=0.08642, over 16902.00 frames. 
], tot_loss[loss=0.2345, simple_loss=0.2877, pruned_loss=0.06708, ctc_loss=0.1177, over 3304183.58 frames. ], batch size: 82, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:25:37,092 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.72 vs. limit=22.5 2023-10-09 22:25:56,764 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2891242.6666666665, ans=0.125 2023-10-09 22:26:07,174 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2891242.6666666665, ans=0.125 2023-10-09 22:26:18,397 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2891289.3333333335, ans=0.125 2023-10-09 22:26:33,344 INFO [train.py:1031] (0/4) Epoch 14, batch 34850, loss[loss=0.2622, simple_loss=0.2841, pruned_loss=0.09046, ctc_loss=0.1486, over 10814.00 frames. ], tot_loss[loss=0.2324, simple_loss=0.284, pruned_loss=0.067, ctc_loss=0.1171, over 3293145.67 frames. ], batch size: 36, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:26:36,328 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2891382.6666666665, ans=0.125 2023-10-09 22:26:46,832 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+02 3.209e+02 3.596e+02 4.244e+02 8.793e+02, threshold=7.192e+02, percent-clipped=1.0 2023-10-09 22:26:59,393 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2891476.0, ans=0.1 2023-10-09 22:27:16,983 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2891522.6666666665, ans=15.0 2023-10-09 22:27:19,997 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2891522.6666666665, ans=0.125 2023-10-09 22:27:23,722 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2891569.3333333335, ans=0.2 2023-10-09 22:27:29,161 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2891569.3333333335, ans=0.1 2023-10-09 22:27:35,831 INFO [train.py:1031] (0/4) Epoch 14, batch 34900, loss[loss=0.2405, simple_loss=0.2858, pruned_loss=0.07389, ctc_loss=0.1184, over 16604.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2788, pruned_loss=0.06653, ctc_loss=0.116, over 3294181.91 frames. ], batch size: 110, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:27:52,592 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.27 vs. 
limit=12.0 2023-10-09 22:28:13,531 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2891756.0, ans=0.0 2023-10-09 22:28:30,907 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2891802.6666666665, ans=0.04949747468305833 2023-10-09 22:28:34,016 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2891802.6666666665, ans=0.0 2023-10-09 22:28:38,935 INFO [train.py:1031] (0/4) Epoch 14, batch 34950, loss[loss=0.2656, simple_loss=0.3197, pruned_loss=0.07939, ctc_loss=0.132, over 16654.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2817, pruned_loss=0.06746, ctc_loss=0.1177, over 3292996.51 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:28:46,585 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2891849.3333333335, ans=0.125 2023-10-09 22:28:54,409 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+02 3.348e+02 3.779e+02 4.801e+02 1.162e+03, threshold=7.559e+02, percent-clipped=3.0 2023-10-09 22:29:24,369 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.77 vs. limit=22.5 2023-10-09 22:29:26,504 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2891989.3333333335, ans=0.125 2023-10-09 22:29:42,588 INFO [train.py:1031] (0/4) Epoch 14, batch 35000, loss[loss=0.1946, simple_loss=0.255, pruned_loss=0.05088, ctc_loss=0.08098, over 16700.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2821, pruned_loss=0.06602, ctc_loss=0.1159, over 3295354.34 frames. ], batch size: 111, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:29:52,111 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2892082.6666666665, ans=0.04949747468305833 2023-10-09 22:30:08,019 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2892176.0, ans=0.0 2023-10-09 22:30:16,968 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2892176.0, ans=0.0 2023-10-09 22:30:37,795 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2892269.3333333335, ans=0.5 2023-10-09 22:30:43,781 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=22.5 2023-10-09 22:30:48,027 INFO [train.py:1031] (0/4) Epoch 14, batch 35050, loss[loss=0.2285, simple_loss=0.2924, pruned_loss=0.06246, ctc_loss=0.09932, over 16822.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2852, pruned_loss=0.06493, ctc_loss=0.1144, over 3299012.54 frames. 
], batch size: 95, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:31:04,384 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+02 3.170e+02 3.753e+02 4.510e+02 9.970e+02, threshold=7.506e+02, percent-clipped=2.0 2023-10-09 22:31:08,564 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2892362.6666666665, ans=0.035 2023-10-09 22:31:18,156 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2892409.3333333335, ans=0.125 2023-10-09 22:31:31,536 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2892456.0, ans=0.2 2023-10-09 22:31:40,675 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2892502.6666666665, ans=0.125 2023-10-09 22:31:45,512 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2892502.6666666665, ans=0.125 2023-10-09 22:31:51,700 INFO [train.py:1031] (0/4) Epoch 14, batch 35100, loss[loss=0.2266, simple_loss=0.275, pruned_loss=0.06699, ctc_loss=0.1103, over 16517.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2859, pruned_loss=0.064, ctc_loss=0.1133, over 3296200.28 frames. ], batch size: 110, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:32:04,613 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2892596.0, ans=0.1 2023-10-09 22:32:54,776 INFO [train.py:1031] (0/4) Epoch 14, batch 35150, loss[loss=0.2767, simple_loss=0.3273, pruned_loss=0.08572, ctc_loss=0.1365, over 16952.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2883, pruned_loss=0.06507, ctc_loss=0.1155, over 3300635.63 frames. ], batch size: 91, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:33:00,543 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2892782.6666666665, ans=0.0 2023-10-09 22:33:12,902 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.271e+02 3.877e+02 4.489e+02 9.044e+02, threshold=7.754e+02, percent-clipped=1.0 2023-10-09 22:33:21,639 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2892876.0, ans=0.125 2023-10-09 22:33:23,688 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2892876.0, ans=0.0 2023-10-09 22:33:34,633 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2892922.6666666665, ans=0.125 2023-10-09 22:33:48,004 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2023-10-09 22:33:56,335 INFO [train.py:1031] (0/4) Epoch 14, batch 35200, loss[loss=0.212, simple_loss=0.2829, pruned_loss=0.05041, ctc_loss=0.1005, over 16819.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2883, pruned_loss=0.06337, ctc_loss=0.1128, over 3296187.07 frames. 
], batch size: 309, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:34:04,071 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2893016.0, ans=0.0 2023-10-09 22:34:05,109 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2893016.0, ans=0.5 2023-10-09 22:34:44,327 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2893156.0, ans=0.125 2023-10-09 22:34:57,555 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2893202.6666666665, ans=0.125 2023-10-09 22:34:59,205 INFO [train.py:1031] (0/4) Epoch 14, batch 35250, loss[loss=0.2292, simple_loss=0.2825, pruned_loss=0.06577, ctc_loss=0.1109, over 16619.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2858, pruned_loss=0.06214, ctc_loss=0.1102, over 3290723.13 frames. ], batch size: 110, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:35:12,066 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2893296.0, ans=0.125 2023-10-09 22:35:19,503 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+02 2.987e+02 3.599e+02 4.398e+02 6.579e+02, threshold=7.198e+02, percent-clipped=0.0 2023-10-09 22:35:21,950 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-w-ctc/checkpoint-620000.pt 2023-10-09 22:35:25,549 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2893296.0, ans=0.125 2023-10-09 22:35:26,633 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2893342.6666666665, ans=0.2 2023-10-09 22:35:47,438 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2893389.3333333335, ans=0.125 2023-10-09 22:35:58,589 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2893436.0, ans=0.125 2023-10-09 22:36:05,210 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2893482.6666666665, ans=0.0 2023-10-09 22:36:05,973 INFO [train.py:1031] (0/4) Epoch 14, batch 35300, loss[loss=0.2306, simple_loss=0.2875, pruned_loss=0.06478, ctc_loss=0.1104, over 16857.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2968, pruned_loss=0.0643, ctc_loss=0.114, over 3286784.92 frames. 
], batch size: 121, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:36:24,354 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2893529.3333333335, ans=0.0 2023-10-09 22:36:28,211 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2893529.3333333335, ans=0.1 2023-10-09 22:36:30,897 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2893576.0, ans=0.0 2023-10-09 22:36:34,818 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2893576.0, ans=0.0 2023-10-09 22:36:38,467 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2893576.0, ans=0.125 2023-10-09 22:36:59,596 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2893669.3333333335, ans=0.0 2023-10-09 22:37:07,847 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2893669.3333333335, ans=0.2 2023-10-09 22:37:10,999 INFO [train.py:1031] (0/4) Epoch 14, batch 35350, loss[loss=0.251, simple_loss=0.3108, pruned_loss=0.07077, ctc_loss=0.1242, over 16706.00 frames. ], tot_loss[loss=0.2398, simple_loss=0.2994, pruned_loss=0.06646, ctc_loss=0.118, over 3291918.52 frames. ], batch size: 272, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:37:30,904 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2023-10-09 22:37:31,482 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+02 3.418e+02 3.862e+02 4.842e+02 9.244e+02, threshold=7.725e+02, percent-clipped=2.0 2023-10-09 22:37:45,620 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2893809.3333333335, ans=0.0 2023-10-09 22:37:54,049 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2893856.0, ans=0.125 2023-10-09 22:38:03,663 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2893902.6666666665, ans=0.1 2023-10-09 22:38:14,132 INFO [train.py:1031] (0/4) Epoch 14, batch 35400, loss[loss=0.2215, simple_loss=0.2873, pruned_loss=0.05762, ctc_loss=0.1012, over 16826.00 frames. ], tot_loss[loss=0.2441, simple_loss=0.3055, pruned_loss=0.06736, ctc_loss=0.1197, over 3290802.37 frames. 
], batch size: 188, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:38:37,376 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2894042.6666666665, ans=0.125 2023-10-09 22:38:47,874 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2894042.6666666665, ans=0.0 2023-10-09 22:38:54,922 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2894089.3333333335, ans=0.125 2023-10-09 22:38:58,329 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2894089.3333333335, ans=0.125 2023-10-09 22:39:14,643 INFO [train.py:1031] (0/4) Epoch 14, batch 35450, loss[loss=0.1986, simple_loss=0.2529, pruned_loss=0.05266, ctc_loss=0.09743, over 16733.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2976, pruned_loss=0.06609, ctc_loss=0.1171, over 3296446.74 frames. ], batch size: 272, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:39:15,721 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2894182.6666666665, ans=0.125 2023-10-09 22:39:21,655 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2894182.6666666665, ans=0.125 2023-10-09 22:39:25,631 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2023-10-09 22:39:34,079 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2894229.3333333335, ans=0.1 2023-10-09 22:39:36,533 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.322e+02 3.230e+02 3.810e+02 4.860e+02 8.869e+02, threshold=7.620e+02, percent-clipped=1.0 2023-10-09 22:39:37,533 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=12.0 2023-10-09 22:40:17,596 INFO [train.py:1031] (0/4) Epoch 14, batch 35500, loss[loss=0.2074, simple_loss=0.2687, pruned_loss=0.0545, ctc_loss=0.09255, over 16834.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2941, pruned_loss=0.06763, ctc_loss=0.1194, over 3302909.21 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:40:27,997 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2023-10-09 22:41:01,204 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2894556.0, ans=0.95 2023-10-09 22:41:03,646 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2894556.0, ans=0.09899494936611666 2023-10-09 22:41:20,537 INFO [train.py:1031] (0/4) Epoch 14, batch 35550, loss[loss=0.2502, simple_loss=0.3039, pruned_loss=0.07256, ctc_loss=0.1284, over 16880.00 frames. ], tot_loss[loss=0.2432, simple_loss=0.2965, pruned_loss=0.07023, ctc_loss=0.1236, over 3301631.45 frames. 
], batch size: 228, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:41:37,020 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=2894696.0, ans=10.0 2023-10-09 22:41:38,730 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2894696.0, ans=0.125 2023-10-09 22:41:42,147 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.738e+02 3.683e+02 4.220e+02 5.051e+02 8.035e+02, threshold=8.441e+02, percent-clipped=1.0 2023-10-09 22:42:22,017 INFO [train.py:1031] (0/4) Epoch 14, batch 35600, loss[loss=0.2557, simple_loss=0.314, pruned_loss=0.07277, ctc_loss=0.1297, over 16527.00 frames. ], tot_loss[loss=0.2447, simple_loss=0.2979, pruned_loss=0.0708, ctc_loss=0.1246, over 3299927.24 frames. ], batch size: 416, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:42:23,421 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2894882.6666666665, ans=0.125 2023-10-09 22:42:44,096 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2023-10-09 22:42:45,352 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2894976.0, ans=0.125 2023-10-09 22:42:48,469 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2894976.0, ans=0.0 2023-10-09 22:42:53,373 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2894976.0, ans=0.1 2023-10-09 22:43:04,783 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2895022.6666666665, ans=0.2 2023-10-09 22:43:23,152 INFO [train.py:1031] (0/4) Epoch 14, batch 35650, loss[loss=0.2299, simple_loss=0.2978, pruned_loss=0.05809, ctc_loss=0.1143, over 16245.00 frames. ], tot_loss[loss=0.2368, simple_loss=0.2937, pruned_loss=0.0664, ctc_loss=0.1175, over 3302711.06 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:43:47,003 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.989e+02 3.692e+02 4.285e+02 1.206e+03, threshold=7.384e+02, percent-clipped=2.0 2023-10-09 22:43:47,634 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2023-10-09 22:43:50,185 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2895209.3333333335, ans=0.09899494936611666 2023-10-09 22:43:50,219 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2895209.3333333335, ans=0.1 2023-10-09 22:44:02,795 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2895256.0, ans=0.07 2023-10-09 22:44:19,602 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.13 vs. limit=10.0 2023-10-09 22:44:26,165 INFO [train.py:1031] (0/4) Epoch 14, batch 35700, loss[loss=0.2518, simple_loss=0.309, pruned_loss=0.07169, ctc_loss=0.1279, over 16826.00 frames. 
], tot_loss[loss=0.2384, simple_loss=0.2964, pruned_loss=0.0666, ctc_loss=0.1177, over 3302152.22 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:44:48,182 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2895396.0, ans=0.125 2023-10-09 22:45:12,981 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=2895489.3333333335, ans=15.0 2023-10-09 22:45:27,082 INFO [train.py:1031] (0/4) Epoch 14, batch 35750, loss[loss=0.2483, simple_loss=0.2823, pruned_loss=0.07872, ctc_loss=0.1422, over 16602.00 frames. ], tot_loss[loss=0.2399, simple_loss=0.2958, pruned_loss=0.06798, ctc_loss=0.1198, over 3299917.10 frames. ], batch size: 415, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:45:53,022 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.764e+02 4.390e+02 5.354e+02 1.212e+03, threshold=8.781e+02, percent-clipped=8.0 2023-10-09 22:45:56,022 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2895676.0, ans=0.125 2023-10-09 22:46:29,802 INFO [train.py:1031] (0/4) Epoch 14, batch 35800, loss[loss=0.2975, simple_loss=0.3501, pruned_loss=0.08998, ctc_loss=0.1622, over 16620.00 frames. ], tot_loss[loss=0.2413, simple_loss=0.2957, pruned_loss=0.06916, ctc_loss=0.1214, over 3306985.20 frames. ], batch size: 351, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:47:05,499 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2895956.0, ans=0.2 2023-10-09 22:47:28,368 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2896002.6666666665, ans=0.125 2023-10-09 22:47:31,695 INFO [train.py:1031] (0/4) Epoch 14, batch 35850, loss[loss=0.2431, simple_loss=0.3436, pruned_loss=0.05208, ctc_loss=0.09603, over 15077.00 frames. ], tot_loss[loss=0.2437, simple_loss=0.2993, pruned_loss=0.06967, ctc_loss=0.1218, over 3313326.94 frames. ], batch size: 526, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:47:56,952 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2896142.6666666665, ans=0.125 2023-10-09 22:47:57,701 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 3.376e+02 4.105e+02 5.188e+02 8.758e+02, threshold=8.210e+02, percent-clipped=0.0 2023-10-09 22:48:32,288 INFO [train.py:1031] (0/4) Epoch 14, batch 35900, loss[loss=0.1911, simple_loss=0.2685, pruned_loss=0.04173, ctc_loss=0.07525, over 16845.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2953, pruned_loss=0.06385, ctc_loss=0.1124, over 3314894.35 frames. ], batch size: 176, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:48:36,623 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2896282.6666666665, ans=0.125 2023-10-09 22:48:58,411 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2896376.0, ans=0.125 2023-10-09 22:49:00,806 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2896376.0, ans=0.125 2023-10-09 22:49:02,710 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.32 vs. 
limit=12.0 2023-10-09 22:49:11,071 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2023-10-09 22:49:13,749 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2896422.6666666665, ans=0.125 2023-10-09 22:49:25,228 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2896469.3333333335, ans=0.2 2023-10-09 22:49:35,580 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2896516.0, ans=0.125 2023-10-09 22:49:36,315 INFO [train.py:1031] (0/4) Epoch 14, batch 35950, loss[loss=0.1656, simple_loss=0.2315, pruned_loss=0.03714, ctc_loss=0.06337, over 16715.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2935, pruned_loss=0.05989, ctc_loss=0.1061, over 3312552.99 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:49:42,555 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2896516.0, ans=0.125 2023-10-09 22:50:04,036 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.666e+02 3.384e+02 4.357e+02 7.839e+02, threshold=6.768e+02, percent-clipped=0.0 2023-10-09 22:50:28,494 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-10-09 22:50:30,743 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2896702.6666666665, ans=0.2 2023-10-09 22:50:38,131 INFO [train.py:1031] (0/4) Epoch 14, batch 36000, loss[loss=0.1921, simple_loss=0.2908, pruned_loss=0.03357, ctc_loss=0.06539, over 15058.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.284, pruned_loss=0.05491, ctc_loss=0.09772, over 3298003.36 frames. ], batch size: 527, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:50:38,131 INFO [train.py:1054] (0/4) Computing validation loss 2023-10-09 22:50:58,876 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2335, simple_loss=0.304, pruned_loss=0.06295, ctc_loss=0.09275, over 1796401.00 frames. 2023-10-09 22:50:58,876 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14587MB 2023-10-09 22:51:04,614 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2896749.3333333335, ans=0.2 2023-10-09 22:51:35,243 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2896889.3333333335, ans=0.125 2023-10-09 22:51:59,924 INFO [train.py:1031] (0/4) Epoch 14, batch 36050, loss[loss=0.2128, simple_loss=0.2635, pruned_loss=0.06164, ctc_loss=0.09714, over 16859.00 frames. ], tot_loss[loss=0.2162, simple_loss=0.2806, pruned_loss=0.05608, ctc_loss=0.09893, over 3295394.99 frames. 
], batch size: 189, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:52:01,429 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2896982.6666666665, ans=0.125 2023-10-09 22:52:11,189 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2896982.6666666665, ans=0.125 2023-10-09 22:52:14,554 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2897029.3333333335, ans=0.125 2023-10-09 22:52:29,191 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.810e+02 3.555e+02 4.396e+02 7.920e+02, threshold=7.110e+02, percent-clipped=1.0 2023-10-09 22:52:34,969 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2897076.0, ans=0.0 2023-10-09 22:52:35,026 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2897076.0, ans=0.1 2023-10-09 22:52:42,221 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2897122.6666666665, ans=0.125 2023-10-09 22:52:59,572 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2897169.3333333335, ans=0.2 2023-10-09 22:53:02,272 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2897216.0, ans=0.0 2023-10-09 22:53:02,989 INFO [train.py:1031] (0/4) Epoch 14, batch 36100, loss[loss=0.2663, simple_loss=0.3143, pruned_loss=0.08034, ctc_loss=0.1442, over 16842.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2846, pruned_loss=0.06045, ctc_loss=0.1059, over 3307125.13 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:53:21,759 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 22:53:22,223 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=22.5 2023-10-09 22:53:25,661 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2897262.6666666665, ans=0.2 2023-10-09 22:54:05,787 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2897449.3333333335, ans=0.125 2023-10-09 22:54:06,392 INFO [train.py:1031] (0/4) Epoch 14, batch 36150, loss[loss=0.2081, simple_loss=0.2591, pruned_loss=0.05877, ctc_loss=0.09888, over 16779.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.287, pruned_loss=0.0626, ctc_loss=0.1099, over 3296575.60 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:54:24,581 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. 
limit=15.0 2023-10-09 22:54:36,549 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.766e+02 3.412e+02 4.167e+02 5.128e+02 1.236e+03, threshold=8.334e+02, percent-clipped=3.0 2023-10-09 22:55:01,906 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2897636.0, ans=0.125 2023-10-09 22:55:09,643 INFO [train.py:1031] (0/4) Epoch 14, batch 36200, loss[loss=0.1921, simple_loss=0.2541, pruned_loss=0.0477, ctc_loss=0.08666, over 16750.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2882, pruned_loss=0.06369, ctc_loss=0.1122, over 3294347.02 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:55:27,813 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0 2023-10-09 22:55:59,242 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2023-10-09 22:56:11,655 INFO [train.py:1031] (0/4) Epoch 14, batch 36250, loss[loss=0.2322, simple_loss=0.279, pruned_loss=0.06935, ctc_loss=0.1168, over 16954.00 frames. ], tot_loss[loss=0.2321, simple_loss=0.2926, pruned_loss=0.06318, ctc_loss=0.1131, over 3302779.44 frames. ], batch size: 86, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:56:42,283 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.544e+02 3.450e+02 4.069e+02 4.879e+02 1.069e+03, threshold=8.138e+02, percent-clipped=4.0 2023-10-09 22:56:56,442 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2898056.0, ans=0.125 2023-10-09 22:57:13,562 INFO [train.py:1031] (0/4) Epoch 14, batch 36300, loss[loss=0.23, simple_loss=0.2918, pruned_loss=0.06221, ctc_loss=0.1095, over 16816.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2909, pruned_loss=0.0637, ctc_loss=0.114, over 3303399.81 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:57:17,721 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2898149.3333333335, ans=0.2 2023-10-09 22:57:20,765 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2023-10-09 22:57:24,594 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2898196.0, ans=0.05 2023-10-09 22:57:36,219 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2898196.0, ans=15.0 2023-10-09 22:57:49,184 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2023-10-09 22:58:12,698 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2898336.0, ans=0.025 2023-10-09 22:58:16,204 INFO [train.py:1031] (0/4) Epoch 14, batch 36350, loss[loss=0.2576, simple_loss=0.3014, pruned_loss=0.08007, ctc_loss=0.134, over 16860.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.2945, pruned_loss=0.066, ctc_loss=0.1173, over 3311466.33 frames. 
], batch size: 176, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:58:34,688 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2898429.3333333335, ans=0.0 2023-10-09 22:58:48,784 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.608e+02 3.450e+02 4.170e+02 4.968e+02 1.204e+03, threshold=8.340e+02, percent-clipped=3.0 2023-10-09 22:59:00,365 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2898522.6666666665, ans=0.125 2023-10-09 22:59:10,975 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2898569.3333333335, ans=0.1 2023-10-09 22:59:19,336 INFO [train.py:1031] (0/4) Epoch 14, batch 36400, loss[loss=0.2183, simple_loss=0.2647, pruned_loss=0.06348, ctc_loss=0.1122, over 16715.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.2933, pruned_loss=0.06737, ctc_loss=0.119, over 3313603.37 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:59:21,652 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2898616.0, ans=0.5 2023-10-09 22:59:32,811 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2898662.6666666665, ans=0.125 2023-10-09 22:59:42,514 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2898662.6666666665, ans=0.125 2023-10-09 22:59:51,311 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2898709.3333333335, ans=0.125 2023-10-09 22:59:52,160 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2898709.3333333335, ans=0.0 2023-10-09 22:59:59,736 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2898756.0, ans=0.125 2023-10-09 23:00:06,265 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2898756.0, ans=0.125 2023-10-09 23:00:07,233 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2898756.0, ans=0.125 2023-10-09 23:00:07,723 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.35 vs. limit=15.0 2023-10-09 23:00:15,172 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2898802.6666666665, ans=0.125 2023-10-09 23:00:21,499 INFO [train.py:1031] (0/4) Epoch 14, batch 36450, loss[loss=0.1894, simple_loss=0.2432, pruned_loss=0.05002, ctc_loss=0.08892, over 16675.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.2862, pruned_loss=0.06641, ctc_loss=0.1167, over 3298941.41 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:00:25,057 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2898849.3333333335, ans=0.2 2023-10-09 23:00:45,903 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.79 vs. 
limit=15.0 2023-10-09 23:00:52,503 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2898942.6666666665, ans=0.125 2023-10-09 23:00:54,966 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+02 3.085e+02 3.494e+02 4.091e+02 1.458e+03, threshold=6.988e+02, percent-clipped=1.0 2023-10-09 23:01:04,981 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2898989.3333333335, ans=0.0 2023-10-09 23:01:22,399 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2899036.0, ans=0.035 2023-10-09 23:01:24,212 INFO [train.py:1031] (0/4) Epoch 14, batch 36500, loss[loss=0.193, simple_loss=0.2386, pruned_loss=0.05432, ctc_loss=0.09708, over 16823.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2788, pruned_loss=0.06513, ctc_loss=0.1147, over 3304727.77 frames. ], batch size: 189, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:01:27,420 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:01:33,357 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2899082.6666666665, ans=0.125 2023-10-09 23:01:43,531 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2023-10-09 23:01:50,786 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2899176.0, ans=0.125 2023-10-09 23:02:02,809 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2899222.6666666665, ans=0.0 2023-10-09 23:02:17,054 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2899269.3333333335, ans=0.125 2023-10-09 23:02:18,562 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=12.0 2023-10-09 23:02:27,714 INFO [train.py:1031] (0/4) Epoch 14, batch 36550, loss[loss=0.2075, simple_loss=0.2798, pruned_loss=0.04969, ctc_loss=0.08955, over 16687.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2784, pruned_loss=0.0644, ctc_loss=0.1137, over 3309822.15 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:02:45,077 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2899362.6666666665, ans=0.1 2023-10-09 23:02:56,702 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2899409.3333333335, ans=0.05 2023-10-09 23:03:01,170 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.250e+02 3.665e+02 4.225e+02 1.129e+03, threshold=7.330e+02, percent-clipped=1.0 2023-10-09 23:03:26,772 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2023-10-09 23:03:28,795 INFO [train.py:1031] (0/4) Epoch 14, batch 36600, loss[loss=0.186, simple_loss=0.2391, pruned_loss=0.04941, ctc_loss=0.08533, over 16644.00 frames. 
], tot_loss[loss=0.2227, simple_loss=0.2758, pruned_loss=0.06265, ctc_loss=0.1106, over 3308699.19 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:03:37,706 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2899549.3333333335, ans=0.125 2023-10-09 23:03:51,223 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2899596.0, ans=0.1 2023-10-09 23:03:56,677 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2899642.6666666665, ans=0.025 2023-10-09 23:04:02,551 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2899642.6666666665, ans=0.2 2023-10-09 23:04:28,594 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2023-10-09 23:04:30,819 INFO [train.py:1031] (0/4) Epoch 14, batch 36650, loss[loss=0.1872, simple_loss=0.2426, pruned_loss=0.04857, ctc_loss=0.08691, over 16755.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.2694, pruned_loss=0.0607, ctc_loss=0.1074, over 3309563.50 frames. ], batch size: 202, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:04:46,514 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:04:59,758 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=12.0 2023-10-09 23:05:04,757 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2899876.0, ans=0.0 2023-10-09 23:05:04,797 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2899876.0, ans=0.025 2023-10-09 23:05:06,012 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+02 3.036e+02 3.411e+02 4.060e+02 1.638e+03, threshold=6.823e+02, percent-clipped=3.0 2023-10-09 23:05:33,284 INFO [train.py:1031] (0/4) Epoch 14, batch 36700, loss[loss=0.2189, simple_loss=0.2327, pruned_loss=0.07441, ctc_loss=0.1407, over 15476.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2644, pruned_loss=0.06013, ctc_loss=0.1063, over 3307618.98 frames. ], batch size: 526, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:05:48,802 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2900062.6666666665, ans=0.025 2023-10-09 23:06:07,399 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2900109.3333333335, ans=0.0 2023-10-09 23:06:14,338 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:06:33,394 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.32 vs. limit=10.0 2023-10-09 23:06:34,441 INFO [train.py:1031] (0/4) Epoch 14, batch 36750, loss[loss=0.2382, simple_loss=0.2913, pruned_loss=0.06946, ctc_loss=0.1157, over 16758.00 frames. ], tot_loss[loss=0.2179, simple_loss=0.2673, pruned_loss=0.06243, ctc_loss=0.1094, over 3313847.74 frames. 
], batch size: 95, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:06:45,806 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2900296.0, ans=0.0 2023-10-09 23:07:06,819 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2900342.6666666665, ans=0.0 2023-10-09 23:07:09,708 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+02 3.153e+02 3.511e+02 4.063e+02 5.415e+02, threshold=7.022e+02, percent-clipped=0.0 2023-10-09 23:07:10,095 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2900389.3333333335, ans=0.125 2023-10-09 23:07:18,658 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2900389.3333333335, ans=0.125 2023-10-09 23:07:34,281 INFO [train.py:1031] (0/4) Epoch 14, batch 36800, loss[loss=0.2371, simple_loss=0.2831, pruned_loss=0.07145, ctc_loss=0.1206, over 16937.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2679, pruned_loss=0.0626, ctc_loss=0.109, over 3307627.91 frames. ], batch size: 258, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:07:38,284 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2900482.6666666665, ans=0.1 2023-10-09 23:07:38,293 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2900482.6666666665, ans=0.125 2023-10-09 23:07:38,328 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2900482.6666666665, ans=0.125 2023-10-09 23:07:50,596 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2900529.3333333335, ans=0.125 2023-10-09 23:08:18,173 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2900622.6666666665, ans=0.0 2023-10-09 23:08:23,573 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2900669.3333333335, ans=0.0 2023-10-09 23:08:35,593 INFO [train.py:1031] (0/4) Epoch 14, batch 36850, loss[loss=0.2455, simple_loss=0.3019, pruned_loss=0.06973, ctc_loss=0.124, over 16754.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2706, pruned_loss=0.06251, ctc_loss=0.1084, over 3304795.54 frames. ], batch size: 272, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:08:37,024 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2900716.0, ans=0.125 2023-10-09 23:08:50,335 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.68 vs. 
limit=15.0 2023-10-09 23:08:51,162 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:09:11,568 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2900809.3333333335, ans=0.125 2023-10-09 23:09:16,103 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.524e+02 3.452e+02 4.218e+02 5.067e+02 9.154e+02, threshold=8.437e+02, percent-clipped=6.0 2023-10-09 23:09:20,947 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2900856.0, ans=0.125 2023-10-09 23:09:28,070 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:09:29,206 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2900902.6666666665, ans=0.035 2023-10-09 23:09:38,547 INFO [train.py:1031] (0/4) Epoch 14, batch 36900, loss[loss=0.2514, simple_loss=0.3003, pruned_loss=0.07461, ctc_loss=0.1331, over 16940.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2779, pruned_loss=0.06545, ctc_loss=0.1136, over 3301501.66 frames. ], batch size: 292, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:09:40,560 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:09:58,581 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.29 vs. limit=10.0 2023-10-09 23:09:59,847 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:10:03,713 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2901042.6666666665, ans=0.0 2023-10-09 23:10:09,340 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2901042.6666666665, ans=0.125 2023-10-09 23:10:40,510 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2901136.0, ans=0.0 2023-10-09 23:10:43,338 INFO [train.py:1031] (0/4) Epoch 14, batch 36950, loss[loss=0.2589, simple_loss=0.3061, pruned_loss=0.07819, ctc_loss=0.1385, over 16208.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2851, pruned_loss=0.06858, ctc_loss=0.1191, over 3302369.92 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:11:04,204 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2901229.3333333335, ans=0.125 2023-10-09 23:11:18,763 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0 2023-10-09 23:11:25,051 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.810e+02 3.608e+02 4.056e+02 4.983e+02 1.030e+03, threshold=8.112e+02, percent-clipped=3.0 2023-10-09 23:11:30,342 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2901322.6666666665, ans=0.125 2023-10-09 23:11:34,406 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.23 vs. 
limit=15.0 2023-10-09 23:11:40,237 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2901369.3333333335, ans=0.125 2023-10-09 23:11:45,391 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2023-10-09 23:11:46,104 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2901416.0, ans=0.125 2023-10-09 23:11:46,858 INFO [train.py:1031] (0/4) Epoch 14, batch 37000, loss[loss=0.231, simple_loss=0.3012, pruned_loss=0.05927, ctc_loss=0.1056, over 16838.00 frames. ], tot_loss[loss=0.2396, simple_loss=0.2921, pruned_loss=0.06938, ctc_loss=0.1209, over 3303148.36 frames. ], batch size: 202, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:12:01,220 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=22.5 2023-10-09 23:12:08,209 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.50 vs. limit=12.0 2023-10-09 23:12:16,147 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2901509.3333333335, ans=0.125 2023-10-09 23:12:31,322 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2901556.0, ans=0.2 2023-10-09 23:12:49,889 INFO [train.py:1031] (0/4) Epoch 14, batch 37050, loss[loss=0.2133, simple_loss=0.2637, pruned_loss=0.06001, ctc_loss=0.1072, over 16754.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2878, pruned_loss=0.06782, ctc_loss=0.1184, over 3306361.04 frames. ], batch size: 310, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:13:18,666 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2901742.6666666665, ans=0.125 2023-10-09 23:13:24,262 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2901742.6666666665, ans=0.0 2023-10-09 23:13:26,107 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2901742.6666666665, ans=0.0 2023-10-09 23:13:31,556 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+02 3.263e+02 3.806e+02 4.315e+02 8.340e+02, threshold=7.611e+02, percent-clipped=1.0 2023-10-09 23:13:44,293 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2901836.0, ans=0.125 2023-10-09 23:13:51,986 INFO [train.py:1031] (0/4) Epoch 14, batch 37100, loss[loss=0.2099, simple_loss=0.2559, pruned_loss=0.06085, ctc_loss=0.1055, over 16830.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.28, pruned_loss=0.06605, ctc_loss=0.1155, over 3307639.00 frames. 
2023-10-09 23:14:05,049 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2901929.3333333335, ans=0.125
2023-10-09 23:14:26,716 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2902022.6666666665, ans=0.125
2023-10-09 23:14:39,837 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2902069.3333333335, ans=0.125
2023-10-09 23:14:53,065 INFO [train.py:1031] (0/4) Epoch 14, batch 37150, loss[loss=0.2174, simple_loss=0.2619, pruned_loss=0.06434, ctc_loss=0.1102, over 16509.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2729, pruned_loss=0.06462, ctc_loss=0.113, over 3308121.42 frames. ], batch size: 110, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:15:03,574 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2902116.0, ans=0.0
2023-10-09 23:15:08,881 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2902162.6666666665, ans=0.0
2023-10-09 23:15:10,589 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2902162.6666666665, ans=0.07
2023-10-09 23:15:11,674 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 23:15:13,257 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2902162.6666666665, ans=0.5
2023-10-09 23:15:23,480 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2902209.3333333335, ans=0.2
2023-10-09 23:15:24,714 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2902209.3333333335, ans=0.0
2023-10-09 23:15:34,534 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.232e+02 3.053e+02 3.584e+02 4.083e+02 7.481e+02, threshold=7.169e+02, percent-clipped=0.0
2023-10-09 23:15:38,977 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0
2023-10-09 23:15:54,258 INFO [train.py:1031] (0/4) Epoch 14, batch 37200, loss[loss=0.2387, simple_loss=0.3166, pruned_loss=0.06065, ctc_loss=0.09858, over 16965.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2751, pruned_loss=0.06298, ctc_loss=0.1107, over 3304774.92 frames. ], batch size: 82, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:16:02,231 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=12.0
2023-10-09 23:16:27,953 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2902442.6666666665, ans=0.0
2023-10-09 23:16:31,522 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2902489.3333333335, ans=0.125
2023-10-09 23:16:43,503 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2902536.0, ans=0.0
2023-10-09 23:16:45,693 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2902536.0, ans=0.0
2023-10-09 23:16:49,479 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2902536.0, ans=0.09899494936611666
2023-10-09 23:16:53,169 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2902582.6666666665, ans=0.0
2023-10-09 23:16:53,688 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0
2023-10-09 23:16:53,922 INFO [train.py:1031] (0/4) Epoch 14, batch 37250, loss[loss=0.2417, simple_loss=0.2869, pruned_loss=0.07256, ctc_loss=0.1285, over 16781.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2797, pruned_loss=0.06175, ctc_loss=0.1086, over 3293973.15 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:17:02,379 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2902582.6666666665, ans=0.0
2023-10-09 23:17:02,388 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2902582.6666666665, ans=0.0
2023-10-09 23:17:28,186 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2902676.0, ans=0.1
2023-10-09 23:17:33,654 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2902722.6666666665, ans=0.125
2023-10-09 23:17:36,500 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 2.936e+02 3.384e+02 3.917e+02 6.225e+02, threshold=6.767e+02, percent-clipped=0.0
2023-10-09 23:17:42,054 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.41 vs. limit=10.0
2023-10-09 23:17:54,220 INFO [train.py:1031] (0/4) Epoch 14, batch 37300, loss[loss=0.2064, simple_loss=0.2902, pruned_loss=0.04514, ctc_loss=0.08088, over 16930.00 frames. ], tot_loss[loss=0.2205, simple_loss=0.2777, pruned_loss=0.06041, ctc_loss=0.1062, over 3289340.46 frames. ], batch size: 258, lr: 2.52e-03, grad_scale: 4.0
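Note: the optim.py:471 lines summarize the distribution of recent gradient norms (min/25%/50%/75%/max) together with the clipping threshold. In every record in this section the threshold equals Clipping_scale times the median quartile (e.g. 2.0 x 3.584e+02 = 7.169e+02 just above), so a plausible reconstruction is median-based clipping over a sliding window of recent norms; the window size and bookkeeping below are assumptions, not the recipe's actual optimizer code.

```python
# Hedged reconstruction of the "Clipping_scale=2.0, grad-norm quartiles ...,
# threshold=..., percent-clipped=..." records: keep a window of recent
# gradient norms, clip to clipping_scale * median of that window, and count
# how often clipping fires. Window size and reporting cadence are invented.
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms: list[float] = []
        self.clipped = 0
        self.seen = 0

    def clip_(self, params) -> float:
        """Scale gradients in place if their total norm exceeds the threshold."""
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.detach().norm() for p in params])
        ).item()
        self.norms = (self.norms + [norm])[-self.window:]
        threshold = self.clipping_scale * float(
            torch.tensor(self.norms).quantile(0.5)
        )
        self.seen += 1
        if norm > threshold:
            self.clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold

    def summary(self) -> str:
        """Format a line resembling the optim.py records above."""
        qs = torch.tensor(self.norms).quantile(
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
        ).tolist()
        pct = 100.0 * self.clipped / max(self.seen, 1)
        quartiles = " ".join(f"{v:.3e}" for v in qs)
        return (f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                f"{quartiles}, threshold={self.clipping_scale * qs[2]:.3e}, "
                f"percent-clipped={pct:.1f}")
```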
2023-10-09 23:18:08,466 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2902862.6666666665, ans=0.0
2023-10-09 23:18:10,722 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2902862.6666666665, ans=0.0
2023-10-09 23:18:17,342 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2902862.6666666665, ans=0.125
2023-10-09 23:18:19,987 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2902909.3333333335, ans=0.95
2023-10-09 23:18:22,142 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2902909.3333333335, ans=0.035
2023-10-09 23:18:44,277 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2903002.6666666665, ans=0.09899494936611666
2023-10-09 23:18:52,829 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2903002.6666666665, ans=10.0
2023-10-09 23:18:55,640 INFO [train.py:1031] (0/4) Epoch 14, batch 37350, loss[loss=0.2849, simple_loss=0.3177, pruned_loss=0.09344, ctc_loss=0.1629, over 16610.00 frames. ], tot_loss[loss=0.2179, simple_loss=0.2776, pruned_loss=0.05856, ctc_loss=0.1029, over 3283972.20 frames. ], batch size: 384, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:19:06,904 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2903096.0, ans=0.2
2023-10-09 23:19:23,438 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2903142.6666666665, ans=0.125
2023-10-09 23:19:38,026 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+02 2.949e+02 3.528e+02 4.105e+02 1.147e+03, threshold=7.057e+02, percent-clipped=0.0
2023-10-09 23:19:39,505 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2903189.3333333335, ans=0.125
2023-10-09 23:19:50,717 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2903236.0, ans=0.125
2023-10-09 23:19:54,563 INFO [train.py:1031] (0/4) Epoch 14, batch 37400, loss[loss=0.2318, simple_loss=0.2797, pruned_loss=0.06986, ctc_loss=0.1102, over 16938.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2744, pruned_loss=0.05863, ctc_loss=0.1025, over 3288707.45 frames. ], batch size: 82, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:19:57,061 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2903282.6666666665, ans=0.125
2023-10-09 23:20:01,826 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2903282.6666666665, ans=0.125
2023-10-09 23:20:11,650 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2903329.3333333335, ans=0.125
2023-10-09 23:20:26,083 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2903376.0, ans=0.0
2023-10-09 23:20:48,709 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=15.0
2023-10-09 23:20:55,595 INFO [train.py:1031] (0/4) Epoch 14, batch 37450, loss[loss=0.228, simple_loss=0.2975, pruned_loss=0.05866, ctc_loss=0.1028, over 16842.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2737, pruned_loss=0.058, ctc_loss=0.1018, over 3285569.29 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:20:57,315 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=15.0
2023-10-09 23:21:05,124 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2903516.0, ans=0.0
2023-10-09 23:21:24,362 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0
2023-10-09 23:21:41,546 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+02 2.982e+02 3.878e+02 4.493e+02 7.805e+02, threshold=7.755e+02, percent-clipped=2.0
2023-10-09 23:21:45,730 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2903702.6666666665, ans=0.04949747468305833
2023-10-09 23:21:58,570 INFO [train.py:1031] (0/4) Epoch 14, batch 37500, loss[loss=0.2471, simple_loss=0.3258, pruned_loss=0.06154, ctc_loss=0.1134, over 16279.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2783, pruned_loss=0.05975, ctc_loss=0.1049, over 3281613.86 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 4.0
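Note: each train.py:1031 record reports the combined loss next to its components. The logged totals are consistent with loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss; the 0.5 and 0.2 weights are read off the numbers in this section and assumed to correspond to the recipe's simple-loss and CTC-loss scales, so treat the exact formula as an inference rather than a quote from train.py.

```python
# Worked check of the per-utterance loss in the batch 37500 record above
# (loss=0.2471, simple_loss=0.3258, pruned_loss=0.06154, ctc_loss=0.1134),
# using the assumed weights 0.5 and 0.2:
def total_loss(simple: float, pruned: float, ctc: float,
               simple_scale: float = 0.5, ctc_scale: float = 0.2) -> float:
    return simple_scale * simple + pruned + ctc_scale * ctc

print(round(total_loss(0.3258, 0.06154, 0.1134), 4))  # -> 0.2471
```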
2023-10-09 23:22:09,577 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2903796.0, ans=0.0
2023-10-09 23:22:12,609 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2903796.0, ans=0.125
2023-10-09 23:22:18,869 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2903796.0, ans=0.025
2023-10-09 23:22:28,196 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2903842.6666666665, ans=0.2
2023-10-09 23:22:29,407 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2903842.6666666665, ans=0.0
2023-10-09 23:22:32,660 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2903842.6666666665, ans=0.125
2023-10-09 23:22:59,333 INFO [train.py:1031] (0/4) Epoch 14, batch 37550, loss[loss=0.1608, simple_loss=0.2368, pruned_loss=0.03126, ctc_loss=0.05538, over 16785.00 frames. ], tot_loss[loss=0.2161, simple_loss=0.2768, pruned_loss=0.05744, ctc_loss=0.1013, over 3263847.86 frames. ], batch size: 176, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:23:02,814 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2903982.6666666665, ans=0.0
2023-10-09 23:23:09,376 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2903982.6666666665, ans=0.07
2023-10-09 23:23:28,116 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.96 vs. limit=22.5
2023-10-09 23:23:34,631 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2904122.6666666665, ans=0.2
2023-10-09 23:23:46,184 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 2.897e+02 3.320e+02 4.036e+02 7.809e+02, threshold=6.640e+02, percent-clipped=1.0
2023-10-09 23:23:55,413 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2904169.3333333335, ans=0.125
2023-10-09 23:23:58,313 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2904169.3333333335, ans=0.125
2023-10-09 23:24:00,649 INFO [train.py:1031] (0/4) Epoch 14, batch 37600, loss[loss=0.2007, simple_loss=0.241, pruned_loss=0.05981, ctc_loss=0.1017, over 16671.00 frames. ], tot_loss[loss=0.2142, simple_loss=0.2725, pruned_loss=0.05774, ctc_loss=0.1013, over 3268166.38 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:24:05,207 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2904216.0, ans=0.125
2023-10-09 23:24:19,458 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2904262.6666666665, ans=0.125
2023-10-09 23:24:23,427 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2904309.3333333335, ans=0.2
2023-10-09 23:24:24,401 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2904309.3333333335, ans=0.125
2023-10-09 23:24:31,372 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2904309.3333333335, ans=0.1
2023-10-09 23:24:50,850 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0
2023-10-09 23:24:59,294 INFO [train.py:1031] (0/4) Epoch 14, batch 37650, loss[loss=0.2074, simple_loss=0.2419, pruned_loss=0.06269, ctc_loss=0.1189, over 15457.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2722, pruned_loss=0.05971, ctc_loss=0.1043, over 3265970.50 frames. ], batch size: 526, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:25:00,107 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0
2023-10-09 23:25:02,709 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.14 vs. limit=15.0
2023-10-09 23:25:48,254 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.460e+02 3.454e+02 4.118e+02 4.727e+02 1.151e+03, threshold=8.236e+02, percent-clipped=7.0
2023-10-09 23:25:58,524 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2904636.0, ans=0.5
2023-10-09 23:26:00,432 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2904682.6666666665, ans=0.2
2023-10-09 23:26:01,812 INFO [train.py:1031] (0/4) Epoch 14, batch 37700, loss[loss=0.1882, simple_loss=0.2651, pruned_loss=0.04073, ctc_loss=0.07469, over 16835.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2749, pruned_loss=0.05975, ctc_loss=0.1045, over 3273423.83 frames. ], batch size: 202, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:26:29,073 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0
2023-10-09 23:26:46,587 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2904822.6666666665, ans=0.0
2023-10-09 23:26:48,045 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0
2023-10-09 23:27:03,709 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2904916.0, ans=0.2
2023-10-09 23:27:04,243 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.10 vs. limit=22.5
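Note: the scaling.py:979 `Whitening` lines compare a per-module statistic against a scheduled limit; the metric grows as the module's output covariance becomes less isotropic, and the whitening penalty only activates when the metric exceeds the limit. The formula below is one standard way to measure this (it equals 1.0 for a perfectly "white" covariance) and is offered as an illustration, not as the exact expression in scaling.py.

```python
# Illustrative whitening metric (an assumption, not necessarily scaling.py's
# exact formula): for a covariance C with eigenvalues l_i,
#   metric = n * sum(l_i^2) / (sum(l_i))^2
# equals 1.0 when C is a multiple of the identity and grows as the feature
# covariance becomes anisotropic.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations for one group."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]              # (C, C) covariance estimate
    n = cov.shape[0]
    return (n * (cov * cov).sum() / cov.trace() ** 2).item()

white = torch.randn(1000, 192)                # roughly isotropic features
print(whitening_metric(white))                # near 1.0
collapsed = white * torch.tensor([10.0] + [0.1] * 191)
print(whitening_metric(collapsed))            # large; would trip limit=10.0
```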
2023-10-09 23:27:05,173 INFO [train.py:1031] (0/4) Epoch 14, batch 37750, loss[loss=0.228, simple_loss=0.3254, pruned_loss=0.04731, ctc_loss=0.08993, over 15072.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2748, pruned_loss=0.05619, ctc_loss=0.09922, over 3281602.65 frames. ], batch size: 527, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:27:18,501 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2904962.6666666665, ans=0.5
2023-10-09 23:27:37,037 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2905009.3333333335, ans=0.1
2023-10-09 23:27:56,212 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+02 2.858e+02 3.601e+02 4.405e+02 1.102e+03, threshold=7.202e+02, percent-clipped=1.0
2023-10-09 23:28:07,549 INFO [train.py:1031] (0/4) Epoch 14, batch 37800, loss[loss=0.2346, simple_loss=0.295, pruned_loss=0.06393, ctc_loss=0.1156, over 16758.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.284, pruned_loss=0.05889, ctc_loss=0.1048, over 3283687.62 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:28:10,782 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2905149.3333333335, ans=0.125
2023-10-09 23:28:31,173 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0
2023-10-09 23:28:51,624 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.63 vs. limit=15.0
2023-10-09 23:29:05,320 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0
2023-10-09 23:29:08,610 INFO [train.py:1031] (0/4) Epoch 14, batch 37850, loss[loss=0.1947, simple_loss=0.2847, pruned_loss=0.03838, ctc_loss=0.0695, over 16279.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2869, pruned_loss=0.05761, ctc_loss=0.1027, over 3284313.68 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:29:23,336 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2905429.3333333335, ans=0.0
2023-10-09 23:29:57,549 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2905522.6666666665, ans=0.125
2023-10-09 23:30:00,320 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2905569.3333333335, ans=0.0
2023-10-09 23:30:00,964 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+02 3.190e+02 3.752e+02 4.348e+02 7.334e+02, threshold=7.503e+02, percent-clipped=1.0
2023-10-09 23:30:13,341 INFO [train.py:1031] (0/4) Epoch 14, batch 37900, loss[loss=0.2695, simple_loss=0.3104, pruned_loss=0.0841, ctc_loss=0.151, over 16533.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2892, pruned_loss=0.05941, ctc_loss=0.1054, over 3281577.26 frames. ], batch size: 416, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:30:17,939 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2905616.0, ans=0.2
2023-10-09 23:30:35,336 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2905709.3333333335, ans=0.125
2023-10-09 23:30:47,248 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2905756.0, ans=0.0
2023-10-09 23:31:01,832 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2905802.6666666665, ans=0.125
2023-10-09 23:31:05,085 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2905802.6666666665, ans=0.125
2023-10-09 23:31:13,598 INFO [train.py:1031] (0/4) Epoch 14, batch 37950, loss[loss=0.2223, simple_loss=0.2668, pruned_loss=0.06551, ctc_loss=0.117, over 15193.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2903, pruned_loss=0.06228, ctc_loss=0.1098, over 3281455.95 frames. ], batch size: 527, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:31:17,173 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2905849.3333333335, ans=0.125
2023-10-09 23:31:19,234 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2905849.3333333335, ans=0.125
2023-10-09 23:31:26,308 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2905896.0, ans=0.125
2023-10-09 23:31:29,449 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2905896.0, ans=0.2
2023-10-09 23:31:48,016 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2905942.6666666665, ans=0.125
2023-10-09 23:31:55,785 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2905989.3333333335, ans=0.125
2023-10-09 23:32:00,202 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2905989.3333333335, ans=0.09899494936611666
2023-10-09 23:32:05,288 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.613e+02 3.279e+02 3.863e+02 4.623e+02 8.979e+02, threshold=7.726e+02, percent-clipped=3.0
2023-10-09 23:32:11,454 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2906036.0, ans=0.125
2023-10-09 23:32:15,505 INFO [train.py:1031] (0/4) Epoch 14, batch 38000, loss[loss=0.2216, simple_loss=0.2664, pruned_loss=0.06593, ctc_loss=0.1124, over 16796.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2834, pruned_loss=0.06208, ctc_loss=0.1091, over 3284122.67 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:32:20,612 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2906082.6666666665, ans=0.0
2023-10-09 23:32:20,634 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2906082.6666666665, ans=0.0
2023-10-09 23:32:45,924 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2906176.0, ans=0.2
2023-10-09 23:33:16,626 INFO [train.py:1031] (0/4) Epoch 14, batch 38050, loss[loss=0.2604, simple_loss=0.3085, pruned_loss=0.08042, ctc_loss=0.1288, over 16709.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2813, pruned_loss=0.06322, ctc_loss=0.1107, over 3289797.20 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:33:34,246 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2906362.6666666665, ans=0.0
2023-10-09 23:33:36,671 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0
2023-10-09 23:33:51,539 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2906409.3333333335, ans=0.0
2023-10-09 23:33:54,275 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2906456.0, ans=0.125
2023-10-09 23:34:01,844 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2906456.0, ans=0.025
2023-10-09 23:34:05,601 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2906502.6666666665, ans=0.125
2023-10-09 23:34:10,171 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+02 3.275e+02 3.696e+02 4.420e+02 6.425e+02, threshold=7.391e+02, percent-clipped=0.0
2023-10-09 23:34:18,441 INFO [train.py:1031] (0/4) Epoch 14, batch 38100, loss[loss=0.2593, simple_loss=0.3087, pruned_loss=0.07771, ctc_loss=0.1361, over 16836.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.286, pruned_loss=0.06562, ctc_loss=0.1147, over 3288856.08 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:34:34,086 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2906596.0, ans=0.125
2023-10-09 23:34:35,103 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2906596.0, ans=0.0
2023-10-09 23:35:12,919 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0
2023-10-09 23:35:23,913 INFO [train.py:1031] (0/4) Epoch 14, batch 38150, loss[loss=0.2629, simple_loss=0.3674, pruned_loss=0.05758, ctc_loss=0.1082, over 16266.00 frames. ], tot_loss[loss=0.2397, simple_loss=0.2957, pruned_loss=0.06785, ctc_loss=0.1197, over 3289078.83 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 1.0
2023-10-09 23:35:39,993 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2906829.3333333335, ans=0.07
2023-10-09 23:35:45,574 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2906829.3333333335, ans=0.2
2023-10-09 23:36:05,403 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.36 vs. limit=15.0
2023-10-09 23:36:18,304 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2906969.3333333335, ans=0.0
2023-10-09 23:36:22,910 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.043e+02 3.850e+02 4.496e+02 5.551e+02 1.259e+03, threshold=8.992e+02, percent-clipped=8.0
2023-10-09 23:36:24,332 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2906969.3333333335, ans=0.2
2023-10-09 23:36:29,637 INFO [train.py:1031] (0/4) Epoch 14, batch 38200, loss[loss=0.3371, simple_loss=0.3611, pruned_loss=0.115, ctc_loss=0.2074, over 16670.00 frames. ], tot_loss[loss=0.2454, simple_loss=0.301, pruned_loss=0.07012, ctc_loss=0.124, over 3293430.56 frames. ], batch size: 351, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:37:03,930 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2907109.3333333335, ans=0.1
2023-10-09 23:37:33,239 INFO [train.py:1031] (0/4) Epoch 14, batch 38250, loss[loss=0.2273, simple_loss=0.2708, pruned_loss=0.06759, ctc_loss=0.1212, over 16806.00 frames. ], tot_loss[loss=0.2429, simple_loss=0.301, pruned_loss=0.06817, ctc_loss=0.1209, over 3290727.59 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:38:29,118 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+02 3.325e+02 3.787e+02 4.498e+02 1.020e+03, threshold=7.574e+02, percent-clipped=1.0
2023-10-09 23:38:34,793 INFO [train.py:1031] (0/4) Epoch 14, batch 38300, loss[loss=0.2085, simple_loss=0.265, pruned_loss=0.0555, ctc_loss=0.1023, over 16964.00 frames. ], tot_loss[loss=0.239, simple_loss=0.2963, pruned_loss=0.06708, ctc_loss=0.1188, over 3291381.27 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:38:44,392 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2907482.6666666665, ans=0.0
2023-10-09 23:38:58,477 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.96 vs. limit=15.0
2023-10-09 23:38:59,234 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2907576.0, ans=0.125
2023-10-09 23:39:11,493 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2907622.6666666665, ans=0.1
2023-10-09 23:39:13,721 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 23:39:14,834 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2907622.6666666665, ans=0.125
2023-10-09 23:39:15,927 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2907622.6666666665, ans=0.1
2023-10-09 23:39:36,990 INFO [train.py:1031] (0/4) Epoch 14, batch 38350, loss[loss=0.2934, simple_loss=0.3608, pruned_loss=0.08338, ctc_loss=0.1482, over 16801.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.302, pruned_loss=0.06882, ctc_loss=0.1223, over 3298611.93 frames. ], batch size: 272, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:39:41,723 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2907716.0, ans=0.125
2023-10-09 23:39:42,881 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2907716.0, ans=0.1
2023-10-09 23:40:12,058 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2907809.3333333335, ans=0.1
2023-10-09 23:40:15,873 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 23:40:19,712 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2907856.0, ans=0.125
2023-10-09 23:40:21,477 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2907856.0, ans=0.07
2023-10-09 23:40:21,730 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0
2023-10-09 23:40:21,828 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=22.5
2023-10-09 23:40:26,116 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2907856.0, ans=0.07
2023-10-09 23:40:27,083 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2907902.6666666665, ans=0.125
2023-10-09 23:40:35,688 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+02 3.571e+02 4.561e+02 5.514e+02 1.040e+03, threshold=9.121e+02, percent-clipped=3.0
2023-10-09 23:40:41,315 INFO [train.py:1031] (0/4) Epoch 14, batch 38400, loss[loss=0.2513, simple_loss=0.2962, pruned_loss=0.07669, ctc_loss=0.1323, over 16839.00 frames. ], tot_loss[loss=0.2498, simple_loss=0.3072, pruned_loss=0.07104, ctc_loss=0.1257, over 3300307.81 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:41:00,487 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2907996.0, ans=0.0
2023-10-09 23:41:18,643 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2908089.3333333335, ans=0.5
2023-10-09 23:41:22,216 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.46 vs. limit=10.0
2023-10-09 23:41:29,486 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2908089.3333333335, ans=0.125
2023-10-09 23:41:36,539 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.53 vs. limit=10.0
2023-10-09 23:41:43,999 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2908182.6666666665, ans=0.2
2023-10-09 23:41:44,675 INFO [train.py:1031] (0/4) Epoch 14, batch 38450, loss[loss=0.2141, simple_loss=0.2785, pruned_loss=0.05627, ctc_loss=0.09287, over 16718.00 frames. ], tot_loss[loss=0.2462, simple_loss=0.3044, pruned_loss=0.06952, ctc_loss=0.1227, over 3305429.60 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:42:11,990 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2908276.0, ans=0.125
2023-10-09 23:42:16,937 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2908276.0, ans=0.125
2023-10-09 23:42:37,457 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2908369.3333333335, ans=0.1
2023-10-09 23:42:41,092 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2908369.3333333335, ans=0.125
2023-10-09 23:42:42,962 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.189e+02 3.714e+02 4.508e+02 1.225e+03, threshold=7.428e+02, percent-clipped=2.0
2023-10-09 23:42:47,042 INFO [train.py:1031] (0/4) Epoch 14, batch 38500, loss[loss=0.2636, simple_loss=0.3071, pruned_loss=0.08211, ctc_loss=0.1395, over 16600.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.3049, pruned_loss=0.06873, ctc_loss=0.1215, over 3310126.23 frames. ], batch size: 110, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:42:48,713 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0
2023-10-09 23:43:08,995 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2908462.6666666665, ans=0.04949747468305833
2023-10-09 23:43:18,820 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2908509.3333333335, ans=0.2
2023-10-09 23:43:24,777 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2908556.0, ans=0.125
2023-10-09 23:43:31,319 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2908556.0, ans=0.0
2023-10-09 23:43:49,235 INFO [train.py:1031] (0/4) Epoch 14, batch 38550, loss[loss=0.2341, simple_loss=0.2836, pruned_loss=0.06877, ctc_loss=0.1177, over 17069.00 frames. ], tot_loss[loss=0.2459, simple_loss=0.3034, pruned_loss=0.06959, ctc_loss=0.1233, over 3312502.52 frames. ], batch size: 216, lr: 2.52e-03, grad_scale: 1.0
2023-10-09 23:43:57,399 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2908649.3333333335, ans=0.125
2023-10-09 23:44:36,009 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2908836.0, ans=0.0
2023-10-09 23:44:48,216 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.444e+02 3.231e+02 3.757e+02 4.445e+02 8.253e+02, threshold=7.513e+02, percent-clipped=2.0
2023-10-09 23:44:49,800 INFO [train.py:1031] (0/4) Epoch 14, batch 38600, loss[loss=0.2243, simple_loss=0.2691, pruned_loss=0.06541, ctc_loss=0.122, over 16449.00 frames. ], tot_loss[loss=0.2425, simple_loss=0.2977, pruned_loss=0.06925, ctc_loss=0.1219, over 3306798.75 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:45:07,620 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2908929.3333333335, ans=0.125
2023-10-09 23:45:07,691 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2908929.3333333335, ans=0.125
2023-10-09 23:45:10,808 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2908929.3333333335, ans=0.125
2023-10-09 23:45:10,819 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2908929.3333333335, ans=0.125
2023-10-09 23:45:25,947 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=12.0
2023-10-09 23:45:27,885 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2909022.6666666665, ans=0.125
2023-10-09 23:45:35,843 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2909022.6666666665, ans=0.035
2023-10-09 23:45:51,535 INFO [train.py:1031] (0/4) Epoch 14, batch 38650, loss[loss=0.2055, simple_loss=0.2456, pruned_loss=0.06039, ctc_loss=0.1116, over 16525.00 frames. ], tot_loss[loss=0.24, simple_loss=0.2933, pruned_loss=0.06913, ctc_loss=0.1213, over 3313576.48 frames. ], batch size: 466, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:46:19,906 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0
2023-10-09 23:46:21,065 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.84 vs. limit=6.0
2023-10-09 23:46:21,746 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2909209.3333333335, ans=0.0
2023-10-09 23:46:45,485 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2909302.6666666665, ans=0.125
2023-10-09 23:46:47,047 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.31 vs. limit=15.0
2023-10-09 23:46:54,843 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+02 3.287e+02 3.697e+02 4.587e+02 9.347e+02, threshold=7.394e+02, percent-clipped=1.0
2023-10-09 23:46:54,871 INFO [train.py:1031] (0/4) Epoch 14, batch 38700, loss[loss=0.2625, simple_loss=0.3284, pruned_loss=0.07304, ctc_loss=0.1262, over 16829.00 frames. ], tot_loss[loss=0.2385, simple_loss=0.291, pruned_loss=0.06888, ctc_loss=0.1209, over 3310922.56 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:46:55,086 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2909349.3333333335, ans=0.0
2023-10-09 23:47:08,226 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2909396.0, ans=0.125
2023-10-09 23:47:18,014 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2909396.0, ans=0.2
2023-10-09 23:47:20,986 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2909442.6666666665, ans=0.1
2023-10-09 23:47:26,066 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2909442.6666666665, ans=0.0
2023-10-09 23:47:28,874 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2909442.6666666665, ans=0.0
2023-10-09 23:47:36,224 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2909489.3333333335, ans=0.05
2023-10-09 23:47:54,639 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0
2023-10-09 23:47:58,604 INFO [train.py:1031] (0/4) Epoch 14, batch 38750, loss[loss=0.1414, simple_loss=0.1849, pruned_loss=0.03681, ctc_loss=0.06051, over 16693.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.2924, pruned_loss=0.06902, ctc_loss=0.1213, over 3309676.59 frames. ], batch size: 70, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:48:07,715 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2909582.6666666665, ans=0.125
2023-10-09 23:48:21,811 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2909629.3333333335, ans=0.125
2023-10-09 23:48:52,171 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2909769.3333333335, ans=0.125
2023-10-09 23:49:02,385 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+02 3.418e+02 4.112e+02 5.352e+02 1.046e+03, threshold=8.225e+02, percent-clipped=4.0
2023-10-09 23:49:02,412 INFO [train.py:1031] (0/4) Epoch 14, batch 38800, loss[loss=0.2364, simple_loss=0.3154, pruned_loss=0.05639, ctc_loss=0.1113, over 16856.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.2959, pruned_loss=0.06623, ctc_loss=0.117, over 3308711.22 frames. ], batch size: 215, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:49:03,118 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0
2023-10-09 23:49:15,349 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0
2023-10-09 23:49:16,792 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2909862.6666666665, ans=0.0
2023-10-09 23:50:04,805 INFO [train.py:1031] (0/4) Epoch 14, batch 38850, loss[loss=0.226, simple_loss=0.2896, pruned_loss=0.05911, ctc_loss=0.1104, over 16406.00 frames. ], tot_loss[loss=0.2394, simple_loss=0.3008, pruned_loss=0.06563, ctc_loss=0.1172, over 3310976.94 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:50:13,414 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2910049.3333333335, ans=0.035
2023-10-09 23:50:13,511 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2910049.3333333335, ans=0.0
2023-10-09 23:50:22,648 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2910096.0, ans=0.2
2023-10-09 23:51:06,340 INFO [train.py:1031] (0/4) Epoch 14, batch 38900, loss[loss=0.2606, simple_loss=0.3092, pruned_loss=0.07958, ctc_loss=0.1319, over 16878.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2972, pruned_loss=0.06612, ctc_loss=0.1177, over 3302782.24 frames. ], batch size: 82, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:51:07,979 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.806e+02 3.456e+02 4.311e+02 5.586e+02 1.002e+03, threshold=8.621e+02, percent-clipped=2.0
2023-10-09 23:51:10,866 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2910282.6666666665, ans=0.035
2023-10-09 23:51:14,079 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0
2023-10-09 23:51:20,885 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=2910329.3333333335, ans=22.5
2023-10-09 23:51:22,614 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2910329.3333333335, ans=0.0
2023-10-09 23:51:38,519 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0
2023-10-09 23:51:52,237 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2910422.6666666665, ans=0.125
2023-10-09 23:52:09,369 INFO [train.py:1031] (0/4) Epoch 14, batch 38950, loss[loss=0.24, simple_loss=0.28, pruned_loss=0.07322, ctc_loss=0.1337, over 16607.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2921, pruned_loss=0.06603, ctc_loss=0.1168, over 3308137.78 frames. ], batch size: 416, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:52:12,530 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2910516.0, ans=0.125
2023-10-09 23:52:13,683 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2910516.0, ans=0.2
2023-10-09 23:52:33,586 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2910562.6666666665, ans=0.5
2023-10-09 23:52:40,341 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0
2023-10-09 23:52:41,016 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2910609.3333333335, ans=0.125
2023-10-09 23:53:14,604 INFO [train.py:1031] (0/4) Epoch 14, batch 39000, loss[loss=0.2632, simple_loss=0.3242, pruned_loss=0.07663, ctc_loss=0.1224, over 16761.00 frames. ], tot_loss[loss=0.2366, simple_loss=0.2922, pruned_loss=0.06686, ctc_loss=0.1182, over 3308833.87 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:53:14,604 INFO [train.py:1054] (0/4) Computing validation loss
2023-10-09 23:53:33,419 INFO [train.py:1063] (0/4) Epoch 14, validation: loss=0.2363, simple_loss=0.3035, pruned_loss=0.06558, ctc_loss=0.09478, over 1796401.00 frames.
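Note: at batch 39000 the loop pauses to compute validation loss and, in the next record, logs the peak GPU memory. Below is a minimal sketch of such a validation pass; `model`, `valid_loader`, and `compute_loss` stand in for the recipe's real objects, and the aggregation is deliberately simplified (the actual train.py also normalizes each loss component over frames).

```python
# Minimal sketch (assumed names, not the actual train.py code) of the
# validation pass logged above, plus the "Maximum memory allocated" report
# that follows it. torch.cuda.max_memory_allocated is the standard PyTorch
# API for the peak-memory figure.
import logging
import torch

def validate(model, valid_loader, compute_loss, device) -> None:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            # compute_loss is assumed to return (summed loss, num frames)
            loss, num_frames = compute_loss(model, batch, device)
            tot_loss += loss.item()
            tot_frames += num_frames
    logging.info(
        f"validation: loss={tot_loss / tot_frames:.4f}, "
        f"over {tot_frames:.2f} frames."
    )
    logging.info(
        "Maximum memory allocated so far is "
        f"{torch.cuda.max_memory_allocated(device) // 1024 // 1024}MB"
    )
    model.train()  # resume training mode afterwards
```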
2023-10-09 23:53:33,420 INFO [train.py:1064] (0/4) Maximum memory allocated so far is 14587MB
2023-10-09 23:53:35,525 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.616e+02 3.267e+02 3.662e+02 4.475e+02 7.642e+02, threshold=7.323e+02, percent-clipped=0.0
2023-10-09 23:53:36,611 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2910749.3333333335, ans=0.125
2023-10-09 23:53:47,607 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2910796.0, ans=0.125
2023-10-09 23:53:52,094 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2910796.0, ans=0.1
2023-10-09 23:54:04,713 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2910842.6666666665, ans=0.125
2023-10-09 23:54:09,749 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2910889.3333333335, ans=0.0
2023-10-09 23:54:23,279 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2910936.0, ans=0.0
2023-10-09 23:54:32,254 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2910936.0, ans=0.1
2023-10-09 23:54:34,749 INFO [train.py:1031] (0/4) Epoch 14, batch 39050, loss[loss=0.2246, simple_loss=0.2792, pruned_loss=0.0631, ctc_loss=0.1097, over 16953.00 frames. ], tot_loss[loss=0.2379, simple_loss=0.2913, pruned_loss=0.06824, ctc_loss=0.1201, over 3302557.70 frames. ], batch size: 82, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:54:44,136 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2910982.6666666665, ans=0.04949747468305833
2023-10-09 23:54:49,467 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.81 vs. limit=22.5
2023-10-09 23:54:56,183 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2911029.3333333335, ans=0.125
2023-10-09 23:55:21,896 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2911169.3333333335, ans=0.2
2023-10-09 23:55:33,231 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2911169.3333333335, ans=0.1
2023-10-09 23:55:35,625 INFO [train.py:1031] (0/4) Epoch 14, batch 39100, loss[loss=0.2081, simple_loss=0.2494, pruned_loss=0.06168, ctc_loss=0.1086, over 16779.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2826, pruned_loss=0.06658, ctc_loss=0.117, over 3303520.59 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:55:39,883 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+02 3.243e+02 3.619e+02 4.225e+02 8.592e+02, threshold=7.239e+02, percent-clipped=2.0
2023-10-09 23:56:16,491 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.84 vs. limit=22.5
2023-10-09 23:56:26,384 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2911402.6666666665, ans=0.125
2023-10-09 23:56:33,228 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2911402.6666666665, ans=0.04949747468305833
2023-10-09 23:56:37,957 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2911402.6666666665, ans=0.1
2023-10-09 23:56:39,875 INFO [train.py:1031] (0/4) Epoch 14, batch 39150, loss[loss=0.2167, simple_loss=0.2561, pruned_loss=0.06545, ctc_loss=0.1158, over 16949.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2846, pruned_loss=0.06586, ctc_loss=0.1157, over 3288342.53 frames. ], batch size: 82, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:57:44,726 INFO [train.py:1031] (0/4) Epoch 14, batch 39200, loss[loss=0.2672, simple_loss=0.3576, pruned_loss=0.06567, ctc_loss=0.1138, over 15057.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2893, pruned_loss=0.06504, ctc_loss=0.1145, over 3282788.40 frames. ], batch size: 527, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:57:49,007 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.698e+02 3.956e+02 5.075e+02 6.707e+02 1.311e+03, threshold=1.015e+03, percent-clipped=19.0
2023-10-09 23:58:06,793 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2911729.3333333335, ans=0.05
2023-10-09 23:58:16,152 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2911776.0, ans=0.125
2023-10-09 23:58:18,314 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2911776.0, ans=0.2
2023-10-09 23:58:28,553 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2911822.6666666665, ans=0.125
2023-10-09 23:58:47,254 INFO [train.py:1031] (0/4) Epoch 14, batch 39250, loss[loss=0.2263, simple_loss=0.2784, pruned_loss=0.06663, ctc_loss=0.1023, over 16910.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.287, pruned_loss=0.06434, ctc_loss=0.1119, over 3270388.40 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:58:52,363 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.85 vs. limit=22.5
2023-10-09 23:58:55,050 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2911916.0, ans=0.0
2023-10-09 23:59:09,371 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-w-ctc/checkpoint-624000.pt
2023-10-09 23:59:38,587 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2912056.0, ans=0.0
2023-10-09 23:59:51,394 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2912102.6666666665, ans=0.1
2023-10-09 23:59:51,829 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.36 vs. limit=22.5
2023-10-09 23:59:53,304 INFO [train.py:1031] (0/4) Epoch 14, batch 39300, loss[loss=0.1595, simple_loss=0.2032, pruned_loss=0.04313, ctc_loss=0.07367, over 16724.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2875, pruned_loss=0.06307, ctc_loss=0.1091, over 3261378.78 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 4.0
], tot_loss[loss=0.2286, simple_loss=0.2875, pruned_loss=0.06307, ctc_loss=0.1091, over 3261378.78 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:59:54,375 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2912149.3333333335, ans=0.0 2023-10-10 00:00:00,604 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.735e+02 3.228e+02 3.774e+02 4.947e+02 8.395e+02, threshold=7.547e+02, percent-clipped=0.0 2023-10-10 00:00:06,700 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2912196.0, ans=0.05 2023-10-10 00:00:30,751 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2912242.6666666665, ans=0.1 2023-10-10 00:00:36,183 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2912289.3333333335, ans=0.0 2023-10-10 00:00:43,305 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2912289.3333333335, ans=0.125 2023-10-10 00:00:57,813 INFO [train.py:1031] (0/4) Epoch 14, batch 39350, loss[loss=0.1919, simple_loss=0.2493, pruned_loss=0.05046, ctc_loss=0.0841, over 16742.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2849, pruned_loss=0.05957, ctc_loss=0.1035, over 3258380.02 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 2.0 2023-10-10 00:01:00,705 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2912382.6666666665, ans=0.125 2023-10-10 00:01:31,826 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-10 00:01:59,507 INFO [train.py:1031] (0/4) Epoch 14, batch 39400, loss[loss=0.2084, simple_loss=0.2666, pruned_loss=0.05474, ctc_loss=0.1018, over 16783.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2855, pruned_loss=0.05963, ctc_loss=0.1041, over 3273027.47 frames. ], batch size: 215, lr: 2.52e-03, grad_scale: 2.0 2023-10-10 00:02:06,854 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.092e+02 4.162e+02 5.159e+02 1.181e+03, threshold=8.323e+02, percent-clipped=5.0 2023-10-10 00:02:08,006 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2912616.0, ans=0.025 2023-10-10 00:02:24,562 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2912709.3333333335, ans=0.125 2023-10-10 00:02:32,546 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2912709.3333333335, ans=0.95 2023-10-10 00:02:48,231 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2912802.6666666665, ans=0.0 2023-10-10 00:02:56,531 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2912802.6666666665, ans=0.2 2023-10-10 00:02:59,394 INFO [train.py:1031] (0/4) Epoch 14, batch 39450, loss[loss=0.1678, simple_loss=0.2485, pruned_loss=0.03214, ctc_loss=0.057, over 16795.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2796, pruned_loss=0.0597, ctc_loss=0.1043, over 3279894.22 frames. 
2023-10-10 00:03:04,808 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=12.0
2023-10-10 00:03:08,890 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2912849.3333333335, ans=0.035
2023-10-10 00:03:13,906 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2912896.0, ans=0.1
2023-10-10 00:03:16,175 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2912896.0, ans=0.5
2023-10-10 00:03:21,286 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2912896.0, ans=0.05
2023-10-10 00:03:36,165 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2912989.3333333335, ans=0.0
2023-10-10 00:03:52,267 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2913036.0, ans=0.125
2023-10-10 00:04:00,454 INFO [train.py:1031] (0/4) Epoch 14, batch 39500, loss[loss=0.1989, simple_loss=0.2607, pruned_loss=0.05057, ctc_loss=0.09, over 16929.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.271, pruned_loss=0.05535, ctc_loss=0.09689, over 3274513.48 frames. ], batch size: 243, lr: 2.51e-03, grad_scale: 2.0
2023-10-10 00:04:10,626 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.656e+02 3.168e+02 3.988e+02 1.383e+03, threshold=6.335e+02, percent-clipped=1.0
2023-10-10 00:04:14,368 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=22.5
2023-10-10 00:04:14,373 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.29 vs. limit=15.0
2023-10-10 00:04:26,912 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2913176.0, ans=0.125
2023-10-10 00:04:28,054 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2913176.0, ans=0.0
2023-10-10 00:04:38,330 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2913222.6666666665, ans=0.0
2023-10-10 00:04:41,803 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2913222.6666666665, ans=0.015
2023-10-10 00:04:58,838 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2913269.3333333335, ans=0.125
2023-10-10 00:05:01,683 INFO [train.py:1031] (0/4) Epoch 14, batch 39550, loss[loss=0.2215, simple_loss=0.28, pruned_loss=0.05988, ctc_loss=0.1082, over 16976.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2716, pruned_loss=0.05717, ctc_loss=0.1001, over 3281012.13 frames. ], batch size: 215, lr: 2.51e-03, grad_scale: 2.0
2023-10-10 00:05:22,155 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2913362.6666666665, ans=0.1
2023-10-10 00:05:25,931 INFO [scaling.py:1069] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-10 00:05:36,272 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=22.5
2023-10-10 00:05:54,041 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0
2023-10-10 00:06:03,880 INFO [train.py:1031] (0/4) Epoch 14, batch 39600, loss[loss=0.2195, simple_loss=0.2952, pruned_loss=0.05224, ctc_loss=0.0982, over 15251.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2738, pruned_loss=0.05583, ctc_loss=0.09828, over 3288143.26 frames. ], batch size: 527, lr: 2.51e-03, grad_scale: 4.0
2023-10-10 00:06:13,821 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+02 3.076e+02 3.392e+02 3.895e+02 1.156e+03, threshold=6.785e+02, percent-clipped=2.0
2023-10-10 00:06:49,307 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2913689.3333333335, ans=0.125
2023-10-10 00:06:59,075 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2913736.0, ans=0.125
2023-10-10 00:07:02,924 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2913736.0, ans=0.125
2023-10-10 00:07:06,375 INFO [train.py:1031] (0/4) Epoch 14, batch 39650, loss[loss=0.248, simple_loss=0.3, pruned_loss=0.07207, ctc_loss=0.1299, over 16213.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2805, pruned_loss=0.05967, ctc_loss=0.1048, over 3296388.14 frames. ], batch size: 463, lr: 2.51e-03, grad_scale: 2.0
2023-10-10 00:07:16,131 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2913782.6666666665, ans=0.2
2023-10-10 00:07:17,350 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0
2023-10-10 00:07:25,937 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2913829.3333333335, ans=0.1
2023-10-10 00:07:35,517 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2913876.0, ans=0.2
2023-10-10 00:07:39,932 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0
2023-10-10 00:07:59,653 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=12.0
2023-10-10 00:08:04,553 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=12.0
2023-10-10 00:08:09,824 INFO [train.py:1031] (0/4) Epoch 14, batch 39700, loss[loss=0.2405, simple_loss=0.2898, pruned_loss=0.07137, ctc_loss=0.1213, over 16774.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2873, pruned_loss=0.06351, ctc_loss=0.111, over 3298325.97 frames. ], batch size: 151, lr: 2.51e-03, grad_scale: 4.0
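Note: the [optim.py:471] lines above report the quartiles (min/25%/median/75%/max) of recent gradient norms, the resulting clipping threshold, and the percentage of recent batches that were clipped. The logged numbers are consistent with threshold = Clipping_scale x median, e.g. 2.0 x 3.392e+02 = 6.784e+02 against the logged threshold=6.785e+02. A sketch of that idea follows; the class MedianGradClipper and its history length are hypothetical, not the actual icefall optimizer code.

# Sketch of median-based gradient clipping: keep a window of recent total
# grad norms, set threshold = clipping_scale * median, and scale gradients
# down whenever the current norm exceeds it.
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms = []  # recent total grad norms

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        # total norm = sqrt(sum of squared per-tensor norms)
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms = (self.norms + [norm])[-self.history:]
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)  # clip to the threshold in-place
        return norm

# Usage sketch: after loss.backward(), before optimizer.step():
#   clipper.clip_(model.parameters())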
2023-10-10 00:08:14,614 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2914016.0, ans=0.1
2023-10-10 00:08:17,290 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2914016.0, ans=0.0
2023-10-10 00:08:19,931 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0
2023-10-10 00:08:21,353 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.803e+02 3.745e+02 4.250e+02 5.439e+02 1.201e+03, threshold=8.500e+02, percent-clipped=8.0
2023-10-10 00:08:26,298 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2914062.6666666665, ans=0.0
2023-10-10 00:08:32,665 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2914062.6666666665, ans=0.125
2023-10-10 00:08:58,328 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2914156.0, ans=0.125
2023-10-10 00:09:08,431 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2914202.6666666665, ans=0.1
2023-10-10 00:09:13,571 INFO [train.py:1031] (0/4) Epoch 14, batch 39750, loss[loss=0.2554, simple_loss=0.2766, pruned_loss=0.08603, ctc_loss=0.1556, over 16486.00 frames. ], tot_loss[loss=0.2317, simple_loss=0.287, pruned_loss=0.06539, ctc_loss=0.114, over 3298946.55 frames. ], batch size: 418, lr: 2.51e-03, grad_scale: 2.0
2023-10-10 00:09:31,308 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2914296.0, ans=0.0
2023-10-10 00:09:55,445 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2914389.3333333335, ans=0.0
2023-10-10 00:10:03,132 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2914436.0, ans=0.025
2023-10-10 00:10:13,895 INFO [train.py:1031] (0/4) Epoch 14, batch 39800, loss[loss=0.2146, simple_loss=0.2187, pruned_loss=0.07536, ctc_loss=0.1496, over 15530.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2798, pruned_loss=0.06535, ctc_loss=0.114, over 3304400.40 frames. ], batch size: 533, lr: 2.51e-03, grad_scale: 4.0
2023-10-10 00:10:15,545 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2023-10-10 00:10:17,896 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2914482.6666666665, ans=0.125
2023-10-10 00:10:19,629 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2914482.6666666665, ans=0.2
2023-10-10 00:10:20,671 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2914482.6666666665, ans=0.125
2023-10-10 00:10:21,810 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2914482.6666666665, ans=0.125
2023-10-10 00:10:26,705 INFO [optim.py:471] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.361e+02 3.167e+02 3.567e+02 4.087e+02 1.118e+03, threshold=7.135e+02, percent-clipped=1.0
2023-10-10 00:10:27,119 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2914529.3333333335, ans=0.125
2023-10-10 00:10:33,497 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2914529.3333333335, ans=0.0
2023-10-10 00:10:39,911 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2914576.0, ans=0.125
2023-10-10 00:10:42,992 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=12.0
2023-10-10 00:10:44,829 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2914576.0, ans=0.1
2023-10-10 00:11:15,154 INFO [train.py:1031] (0/4) Epoch 14, batch 39850, loss[loss=0.1978, simple_loss=0.241, pruned_loss=0.05738, ctc_loss=0.09928, over 16808.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2727, pruned_loss=0.06417, ctc_loss=0.112, over 3312531.30 frames. ], batch size: 176, lr: 2.51e-03, grad_scale: 4.0
2023-10-10 00:11:19,635 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2914716.0, ans=0.2
2023-10-10 00:11:38,879 INFO [scaling.py:979] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0
2023-10-10 00:11:42,939 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2914809.3333333335, ans=0.1
2023-10-10 00:11:42,966 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2914809.3333333335, ans=0.125
2023-10-10 00:11:52,583 INFO [scaling.py:199] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2914856.0, ans=0.125
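Note: the [scaling.py:979] Whitening lines above compare a per-module diagnostic metric with a limit; when the metric exceeds the limit, training applies a penalty that pushes the module's activations toward a whiter (better-conditioned) covariance. A sketch of one plausible such metric follows, assuming it measures the eigenvalue spread of the activation covariance as d * trace(C^2) / trace(C)^2, which equals 1 for perfectly white features and grows as the covariance becomes ill-conditioned; this formula and the function name whitening_metric are assumptions for illustration, not necessarily what scaling.py computes.

# Sketch of a whitening metric for a feature matrix x of shape (frames, channels).
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    x = x - x.mean(dim=0, keepdim=True)   # zero-mean per channel
    cov = (x.T @ x) / x.shape[0]          # (d, d) covariance estimate
    d = cov.shape[0]
    # trace(C @ C) = sum of squared eigenvalues; trace(C)^2 / d is that sum
    # when all eigenvalues are equal, so the ratio measures non-whiteness.
    return (cov @ cov).diagonal().sum() * d / cov.diagonal().sum() ** 2

x = torch.randn(1000, 192)                # nearly white features -> metric near 1
print(whitening_metric(x))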