2023-10-09 10:20:55,469 INFO [train.py:1099] (3/4) Training started
2023-10-09 10:20:55,469 INFO [train.py:1109] (3/4) Device: cuda:3
2023-10-09 10:20:55,473 INFO [train.py:1121] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '821ebc378e7fb99b8adc81950227963332821e01', 'k2-git-date': 'Wed Jul 19 15:38:25 2023', 'lhotse-version': '1.16.0.dev+git.1db4d97a.clean', 'torch-version': '1.11.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'dev_multi_zh-hans', 'icefall-git-sha1': '919793d-dirty', 'icefall-git-date': 'Thu Sep 7 21:06:37 2023', 'icefall-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/icefall-1.0-py3.9.egg', 'k2-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/k2-1.24.3.dev20230721+cuda10.2.torch1.11.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/lhotse-1.16.0.dev0+git.1db4d97a.clean-py3.9.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-1-1220091118-57c4d55446-mvd6x', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 30, 'start_epoch': 14, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp-w-ctc'), 'bpe_model': 'data/lang_bpe_2000/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 700, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'vocab_size': 2000}
2023-10-09 10:20:55,473 INFO [train.py:1123] (3/4) About to create model
2023-10-09 10:20:56,059 INFO [train.py:1127] (3/4) Number of model parameters: 69651511
2023-10-09 10:20:56,059 INFO [checkpoint.py:112] (3/4) Loading checkpoint from zipformer/exp-w-ctc/epoch-13.pt
2023-10-09 10:21:02,356 INFO [train.py:1142] (3/4) Using DDP
2023-10-09 10:21:05,311 INFO [train.py:1154] (3/4) Loading optimizer state dict
2023-10-09 10:21:06,007 INFO [train.py:1162] (3/4) Loading scheduler state dict
2023-10-09 10:21:06,007 INFO [multi_dataset.py:52] (3/4) About to get multidataset train cuts
2023-10-09 10:21:06,007 INFO [multi_dataset.py:55] (3/4) Loading THCHS-30 in lazy mode
2023-10-09 10:21:06,010 INFO [multi_dataset.py:61] (3/4) Loading Aishell-1 in lazy mode
2023-10-09 10:21:06,012 INFO [multi_dataset.py:67] (3/4) Loading Aishell-2 in lazy mode
2023-10-09 10:21:06,013 INFO [multi_dataset.py:73] (3/4) Loading Aishell-4 in lazy mode
2023-10-09 10:21:06,016 INFO [multi_dataset.py:85] (3/4) Loading ST-CMDS in lazy mode
2023-10-09 10:21:06,017 INFO [multi_dataset.py:89] (3/4) Loading Primewords in lazy mode
2023-10-09 10:21:06,018 INFO [multi_dataset.py:95] (3/4) Loading MagicData in lazy mode
2023-10-09 10:21:06,019 INFO [multi_dataset.py:101] (3/4) Loading Aidatatang_200zh in lazy mode
2023-10-09 10:21:06,020 INFO [multi_dataset.py:107] (3/4) Loading Ali-Meeting in lazy mode
2023-10-09 10:21:06,021 INFO [multi_dataset.py:113] (3/4) Loading WeNetSpeech in lazy mode
2023-10-09 10:21:06,022 INFO [multi_dataset.py:119] (3/4) Loading KeSpeech in lazy mode
2023-10-09 10:22:53,148 INFO [asr_datamodule.py:218] (3/4) Enable MUSAN
2023-10-09 10:22:53,148 INFO [asr_datamodule.py:219] (3/4) About to get Musan cuts
2023-10-09 10:22:55,403 INFO [asr_datamodule.py:243] (3/4) Enable SpecAugment
2023-10-09 10:22:55,404 INFO [asr_datamodule.py:244] (3/4) Time warp factor: 80
2023-10-09 10:22:55,404 INFO [asr_datamodule.py:254] (3/4) Num frame mask: 10
2023-10-09 10:22:55,404 INFO [asr_datamodule.py:267] (3/4) About to create train dataset
2023-10-09 10:22:55,404 INFO [asr_datamodule.py:294] (3/4) Using DynamicBucketingSampler.
2023-10-09 10:22:59,044 INFO [asr_datamodule.py:309] (3/4) About to create train dataloader
2023-10-09 10:22:59,044 INFO [multi_dataset.py:161] (3/4) About to get multidataset dev cuts
2023-10-09 10:22:59,044 INFO [multi_dataset.py:164] (3/4) Loading Aidatatang_200zh DEV set in lazy mode
2023-10-09 10:22:59,059 INFO [multi_dataset.py:170] (3/4) Loading Aishell DEV set in lazy mode
2023-10-09 10:22:59,060 INFO [multi_dataset.py:176] (3/4) Loading Aishell-2 DEV set in lazy mode
2023-10-09 10:22:59,062 INFO [multi_dataset.py:182] (3/4) Loading Ali-Meeting DEV set in lazy mode
2023-10-09 10:22:59,063 INFO [multi_dataset.py:188] (3/4) Loading MagicData DEV set in lazy mode
2023-10-09 10:22:59,065 INFO [multi_dataset.py:194] (3/4) Loading KeSpeech DEV set in lazy mode
2023-10-09 10:22:59,068 INFO [multi_dataset.py:203] (3/4) Loading WeNetSpeech DEV set in lazy mode
2023-10-09 10:22:59,072 INFO [asr_datamodule.py:340] (3/4) About to create dev dataset
2023-10-09 10:22:59,555 INFO [asr_datamodule.py:357] (3/4) About to create dev dataloader
2023-10-09 10:22:59,555 INFO [train.py:1243] (3/4) Loading grad scaler state dict
2023-10-09 10:23:19,457 INFO [train.py:1031] (3/4) Epoch 14, batch 0, loss[loss=0.195, simple_loss=0.2618, pruned_loss=0.04753, ctc_loss=0.08269, over 16760.00 frames. ], tot_loss[loss=0.195, simple_loss=0.2618, pruned_loss=0.04753, ctc_loss=0.08269, over 16760.00 frames. ], batch size: 95, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:23:19,457 INFO [train.py:1054] (3/4) Computing validation loss
2023-10-09 10:23:33,190 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2325, simple_loss=0.3081, pruned_loss=0.06029, ctc_loss=0.09091, over 1796401.00 frames.
2023-10-09 10:23:33,190 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 13038MB
2023-10-09 10:23:48,865 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.610e+02 3.348e+02 4.018e+02 4.917e+02 9.056e+02, threshold=8.035e+02, percent-clipped=7.0
2023-10-09 10:24:06,958 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2728842.6666666665, ans=0.0
2023-10-09 10:24:12,931 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=15.0
2023-10-09 10:24:30,520 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.65 vs. limit=6.0
2023-10-09 10:24:33,526 INFO [train.py:1031] (3/4) Epoch 14, batch 50, loss[loss=0.2228, simple_loss=0.3025, pruned_loss=0.05292, ctc_loss=0.09321, over 16783.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2833, pruned_loss=0.06256, ctc_loss=0.1094, over 736032.63 frames. ], batch size: 272, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:24:41,357 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2728982.6666666665, ans=0.2
2023-10-09 10:24:49,453 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2729029.3333333335, ans=0.125
2023-10-09 10:24:58,065 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2729076.0, ans=0.0
2023-10-09 10:25:04,704 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2729076.0, ans=0.2
2023-10-09 10:25:10,078 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2729122.6666666665, ans=0.125
2023-10-09 10:25:10,123 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2729122.6666666665, ans=0.2
2023-10-09 10:25:25,478 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2729169.3333333335, ans=0.0
2023-10-09 10:25:29,863 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2729169.3333333335, ans=0.2
2023-10-09 10:25:34,390 INFO [train.py:1031] (3/4) Epoch 14, batch 100, loss[loss=0.3003, simple_loss=0.3573, pruned_loss=0.08804, ctc_loss=0.1682, over 16512.00 frames. ], tot_loss[loss=0.243, simple_loss=0.3016, pruned_loss=0.06828, ctc_loss=0.1197, over 1302782.87 frames. ], batch size: 416, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:25:49,380 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.490e+02 3.274e+02 3.784e+02 4.390e+02 8.009e+02, threshold=7.568e+02, percent-clipped=0.0
2023-10-09 10:25:59,126 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2729309.3333333335, ans=0.125
2023-10-09 10:26:15,412 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2729356.0, ans=0.0
2023-10-09 10:26:21,992 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=12.0
2023-10-09 10:26:33,804 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2729449.3333333335, ans=0.125
2023-10-09 10:26:34,425 INFO [train.py:1031] (3/4) Epoch 14, batch 150, loss[loss=0.2299, simple_loss=0.2947, pruned_loss=0.06097, ctc_loss=0.1077, over 16826.00 frames. ], tot_loss[loss=0.2467, simple_loss=0.3118, pruned_loss=0.06704, ctc_loss=0.1188, over 1751277.09 frames. ], batch size: 141, lr: 2.60e-03, grad_scale: 1.0
2023-10-09 10:26:43,961 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2729449.3333333335, ans=0.125
2023-10-09 10:26:44,872 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2729449.3333333335, ans=0.1
2023-10-09 10:26:59,206 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2729542.6666666665, ans=0.0
2023-10-09 10:27:02,851 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2729542.6666666665, ans=0.125
2023-10-09 10:27:28,456 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2729636.0, ans=0.0
2023-10-09 10:27:31,093 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2729636.0, ans=0.125
2023-10-09 10:27:36,038 INFO [train.py:1031] (3/4) Epoch 14, batch 200, loss[loss=0.2128, simple_loss=0.2711, pruned_loss=0.05863, ctc_loss=0.09311, over 16501.00 frames. ], tot_loss[loss=0.2491, simple_loss=0.3132, pruned_loss=0.06827, ctc_loss=0.1211, over 2093311.79 frames. ], batch size: 110, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:27:48,304 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2729729.3333333335, ans=0.125
2023-10-09 10:27:52,560 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2729729.3333333335, ans=0.2
2023-10-09 10:27:54,280 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.044e+02 3.577e+02 4.251e+02 7.739e+02, threshold=7.154e+02, percent-clipped=1.0
2023-10-09 10:27:56,342 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2729729.3333333335, ans=0.125
2023-10-09 10:28:35,919 INFO [train.py:1031] (3/4) Epoch 14, batch 250, loss[loss=0.212, simple_loss=0.286, pruned_loss=0.05088, ctc_loss=0.09084, over 16860.00 frames. ], tot_loss[loss=0.2459, simple_loss=0.3111, pruned_loss=0.06668, ctc_loss=0.1182, over 2345993.31 frames. ], batch size: 215, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:28:38,986 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2729916.0, ans=0.0
2023-10-09 10:29:37,208 INFO [train.py:1031] (3/4) Epoch 14, batch 300, loss[loss=0.2468, simple_loss=0.3072, pruned_loss=0.06897, ctc_loss=0.1213, over 16881.00 frames. ], tot_loss[loss=0.2402, simple_loss=0.3055, pruned_loss=0.06454, ctc_loss=0.1145, over 2546423.46 frames. ], batch size: 215, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:29:45,214 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2730149.3333333335, ans=0.0
2023-10-09 10:29:54,689 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2730196.0, ans=0.5
2023-10-09 10:29:56,420 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.367e+02 3.126e+02 3.650e+02 4.282e+02 7.513e+02, threshold=7.299e+02, percent-clipped=1.0
2023-10-09 10:30:28,634 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0
2023-10-09 10:30:34,229 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2730336.0, ans=0.125
2023-10-09 10:30:38,034 INFO [train.py:1031] (3/4) Epoch 14, batch 350, loss[loss=0.2288, simple_loss=0.2821, pruned_loss=0.06457, ctc_loss=0.1156, over 16907.00 frames. ], tot_loss[loss=0.2444, simple_loss=0.3052, pruned_loss=0.06781, ctc_loss=0.12, over 2712979.01 frames. ], batch size: 228, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:30:38,348 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2730382.6666666665, ans=0.0
2023-10-09 10:30:42,163 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2730382.6666666665, ans=0.125
2023-10-09 10:30:46,863 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.61 vs. limit=8.0
2023-10-09 10:31:01,118 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2730476.0, ans=0.0
2023-10-09 10:31:02,514 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=22.5
2023-10-09 10:31:11,725 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=22.5
2023-10-09 10:31:19,711 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0
2023-10-09 10:31:38,088 INFO [train.py:1031] (3/4) Epoch 14, batch 400, loss[loss=0.2378, simple_loss=0.2874, pruned_loss=0.06918, ctc_loss=0.1246, over 16873.00 frames. ], tot_loss[loss=0.244, simple_loss=0.301, pruned_loss=0.06919, ctc_loss=0.1217, over 2848160.96 frames. ], batch size: 328, lr: 2.60e-03, grad_scale: 8.0
2023-10-09 10:31:57,468 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.680e+02 3.285e+02 3.968e+02 4.685e+02 8.332e+02, threshold=7.936e+02, percent-clipped=1.0
2023-10-09 10:32:02,677 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2730709.3333333335, ans=0.125
2023-10-09 10:32:10,708 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2730709.3333333335, ans=0.04949747468305833
2023-10-09 10:32:13,977 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:32:24,334 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2730756.0, ans=0.025
2023-10-09 10:32:24,348 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2730756.0, ans=0.125
2023-10-09 10:32:39,448 INFO [train.py:1031] (3/4) Epoch 14, batch 450, loss[loss=0.2188, simple_loss=0.2697, pruned_loss=0.06223, ctc_loss=0.1083, over 17019.00 frames. ], tot_loss[loss=0.2419, simple_loss=0.2989, pruned_loss=0.06841, ctc_loss=0.1201, over 2949818.14 frames. ], batch size: 86, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:32:43,673 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2730849.3333333335, ans=0.1
2023-10-09 10:32:49,792 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=22.5
2023-10-09 10:32:53,610 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2730896.0, ans=0.0
2023-10-09 10:33:06,649 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2730942.6666666665, ans=0.125
2023-10-09 10:33:10,106 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=22.5
2023-10-09 10:33:40,962 INFO [train.py:1031] (3/4) Epoch 14, batch 500, loss[loss=0.1881, simple_loss=0.2367, pruned_loss=0.05157, ctc_loss=0.09098, over 16856.00 frames. ], tot_loss[loss=0.2359, simple_loss=0.2932, pruned_loss=0.06612, ctc_loss=0.116, over 3026828.96 frames. ], batch size: 189, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:33:45,383 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2731082.6666666665, ans=0.1
2023-10-09 10:33:52,740 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2731129.3333333335, ans=0.125
2023-10-09 10:34:00,456 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+02 3.135e+02 3.674e+02 4.514e+02 8.848e+02, threshold=7.348e+02, percent-clipped=4.0
2023-10-09 10:34:01,802 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2731129.3333333335, ans=0.0
2023-10-09 10:34:41,171 INFO [train.py:1031] (3/4) Epoch 14, batch 550, loss[loss=0.2128, simple_loss=0.2693, pruned_loss=0.05891, ctc_loss=0.09615, over 16864.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2849, pruned_loss=0.06555, ctc_loss=0.1147, over 3081563.54 frames. ], batch size: 243, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:34:48,625 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2731316.0, ans=0.1
2023-10-09 10:35:17,539 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2731456.0, ans=0.125
2023-10-09 10:35:17,573 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2731456.0, ans=0.0
2023-10-09 10:35:19,350 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=2731456.0, ans=0.02
2023-10-09 10:35:31,229 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:35:42,187 INFO [train.py:1031] (3/4) Epoch 14, batch 600, loss[loss=0.209, simple_loss=0.2631, pruned_loss=0.05752, ctc_loss=0.09961, over 16743.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2786, pruned_loss=0.06395, ctc_loss=0.1119, over 3120161.49 frames. ], batch size: 272, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:36:02,847 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+02 3.049e+02 3.429e+02 4.091e+02 7.448e+02, threshold=6.859e+02, percent-clipped=1.0
2023-10-09 10:36:13,030 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2731642.6666666665, ans=0.0
2023-10-09 10:36:16,225 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2731642.6666666665, ans=0.125
2023-10-09 10:36:22,279 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2731689.3333333335, ans=0.05
2023-10-09 10:36:39,624 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2731736.0, ans=0.125
2023-10-09 10:36:39,674 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2731736.0, ans=0.125
2023-10-09 10:36:43,669 INFO [train.py:1031] (3/4) Epoch 14, batch 650, loss[loss=0.1751, simple_loss=0.2286, pruned_loss=0.04519, ctc_loss=0.07813, over 16529.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2734, pruned_loss=0.06336, ctc_loss=0.1111, over 3164586.21 frames. ], batch size: 110, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:36:54,868 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:36:55,833 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2731829.3333333335, ans=0.125
2023-10-09 10:37:17,113 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0
2023-10-09 10:37:43,523 INFO [train.py:1031] (3/4) Epoch 14, batch 700, loss[loss=0.184, simple_loss=0.2664, pruned_loss=0.03759, ctc_loss=0.06583, over 16826.00 frames. ], tot_loss[loss=0.2162, simple_loss=0.2701, pruned_loss=0.06002, ctc_loss=0.1055, over 3180918.02 frames. ], batch size: 228, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:37:59,230 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2732062.6666666665, ans=0.125
2023-10-09 10:38:01,154 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2732062.6666666665, ans=0.09899494936611666
2023-10-09 10:38:06,355 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.869e+02 3.199e+02 3.835e+02 8.884e+02, threshold=6.398e+02, percent-clipped=1.0
2023-10-09 10:38:12,855 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2732109.3333333335, ans=0.1
2023-10-09 10:38:14,038 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2732109.3333333335, ans=0.2
2023-10-09 10:38:30,073 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2732156.0, ans=0.125
2023-10-09 10:38:38,758 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2732202.6666666665, ans=0.125
2023-10-09 10:38:44,813 INFO [train.py:1031] (3/4) Epoch 14, batch 750, loss[loss=0.2474, simple_loss=0.3362, pruned_loss=0.0557, ctc_loss=0.1179, over 16777.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2813, pruned_loss=0.05904, ctc_loss=0.1053, over 3210280.88 frames. ], batch size: 272, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:38:45,378 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=22.5
2023-10-09 10:38:49,699 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:38:53,570 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2732249.3333333335, ans=0.1
2023-10-09 10:38:56,305 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2732249.3333333335, ans=0.0
2023-10-09 10:39:06,738 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2732296.0, ans=0.125
2023-10-09 10:39:10,654 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2732342.6666666665, ans=0.2
2023-10-09 10:39:48,675 INFO [train.py:1031] (3/4) Epoch 14, batch 800, loss[loss=0.2548, simple_loss=0.3633, pruned_loss=0.05376, ctc_loss=0.09708, over 15017.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2952, pruned_loss=0.06154, ctc_loss=0.1101, over 3230019.75 frames. ], batch size: 526, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:40:07,789 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2732529.3333333335, ans=0.0
2023-10-09 10:40:08,792 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2732529.3333333335, ans=0.125
2023-10-09 10:40:12,733 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.375e+02 4.289e+02 5.326e+02 8.856e+02, threshold=8.578e+02, percent-clipped=11.0
2023-10-09 10:40:22,504 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2732576.0, ans=0.125
2023-10-09 10:40:50,027 INFO [train.py:1031] (3/4) Epoch 14, batch 850, loss[loss=0.2184, simple_loss=0.3162, pruned_loss=0.04382, ctc_loss=0.0825, over 16921.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2985, pruned_loss=0.06092, ctc_loss=0.1089, over 3234505.45 frames. ], batch size: 258, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:40:50,367 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2732716.0, ans=0.125
2023-10-09 10:41:25,454 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2732856.0, ans=0.0
2023-10-09 10:41:25,836 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.09 vs. limit=22.5
2023-10-09 10:41:31,322 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2732856.0, ans=0.125
2023-10-09 10:41:45,420 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2732902.6666666665, ans=0.0
2023-10-09 10:41:49,387 INFO [train.py:1031] (3/4) Epoch 14, batch 900, loss[loss=0.2451, simple_loss=0.288, pruned_loss=0.07463, ctc_loss=0.1326, over 16829.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2979, pruned_loss=0.06169, ctc_loss=0.1098, over 3244214.65 frames. ], batch size: 164, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:41:49,789 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2732949.3333333335, ans=0.0
2023-10-09 10:42:02,525 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2732996.0, ans=10.0
2023-10-09 10:42:17,044 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.380e+02 3.289e+02 4.047e+02 4.926e+02 9.646e+02, threshold=8.093e+02, percent-clipped=3.0
2023-10-09 10:42:18,453 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2733042.6666666665, ans=0.125
2023-10-09 10:42:20,773 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2733042.6666666665, ans=0.125
2023-10-09 10:42:35,447 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2733089.3333333335, ans=0.0
2023-10-09 10:42:51,360 INFO [train.py:1031] (3/4) Epoch 14, batch 950, loss[loss=0.2166, simple_loss=0.2772, pruned_loss=0.05781, ctc_loss=0.1012, over 16912.00 frames. ], tot_loss[loss=0.2344, simple_loss=0.2965, pruned_loss=0.06356, ctc_loss=0.1127, over 3261618.15 frames. ], batch size: 90, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:43:07,731 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2733229.3333333335, ans=0.09899494936611666
2023-10-09 10:43:19,959 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2733276.0, ans=0.125
2023-10-09 10:43:24,338 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2733276.0, ans=0.125
2023-10-09 10:43:36,352 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2733322.6666666665, ans=0.015
2023-10-09 10:43:39,502 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2733369.3333333335, ans=0.025
2023-10-09 10:43:51,556 INFO [train.py:1031] (3/4) Epoch 14, batch 1000, loss[loss=0.2435, simple_loss=0.2947, pruned_loss=0.07237, ctc_loss=0.1192, over 16795.00 frames. ], tot_loss[loss=0.2392, simple_loss=0.3003, pruned_loss=0.06584, ctc_loss=0.1159, over 3275616.03 frames. ], batch size: 164, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:43:51,864 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2733416.0, ans=0.125
2023-10-09 10:44:03,075 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2733462.6666666665, ans=0.2
2023-10-09 10:44:03,082 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2733462.6666666665, ans=0.2
2023-10-09 10:44:05,135 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2733462.6666666665, ans=0.0
2023-10-09 10:44:15,608 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2733509.3333333335, ans=0.125
2023-10-09 10:44:18,369 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.732e+02 3.383e+02 4.196e+02 5.191e+02 1.287e+03, threshold=8.392e+02, percent-clipped=5.0
2023-10-09 10:44:26,242 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2733509.3333333335, ans=0.125
2023-10-09 10:44:45,137 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2733602.6666666665, ans=0.0
2023-10-09 10:44:52,765 INFO [train.py:1031] (3/4) Epoch 14, batch 1050, loss[loss=0.2228, simple_loss=0.2737, pruned_loss=0.0644, ctc_loss=0.1077, over 16826.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2955, pruned_loss=0.06499, ctc_loss=0.1143, over 3278614.16 frames. ], batch size: 121, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:45:18,499 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2733742.6666666665, ans=0.1
2023-10-09 10:45:19,489 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2733742.6666666665, ans=0.125
2023-10-09 10:45:46,109 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2733836.0, ans=0.125
2023-10-09 10:45:52,761 INFO [train.py:1031] (3/4) Epoch 14, batch 1100, loss[loss=0.2156, simple_loss=0.2655, pruned_loss=0.06129, ctc_loss=0.1081, over 16845.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2899, pruned_loss=0.0646, ctc_loss=0.1134, over 3287284.13 frames. ], batch size: 176, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:46:03,935 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:46:05,536 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2733929.3333333335, ans=0.0
2023-10-09 10:46:18,218 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2733976.0, ans=0.0
2023-10-09 10:46:21,063 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.487e+02 3.252e+02 3.598e+02 4.155e+02 7.430e+02, threshold=7.195e+02, percent-clipped=0.0
2023-10-09 10:46:22,449 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2733976.0, ans=0.0
2023-10-09 10:46:39,459 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2734069.3333333335, ans=0.0
2023-10-09 10:46:50,729 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2734069.3333333335, ans=0.1
2023-10-09 10:46:51,269 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=22.5
2023-10-09 10:46:52,572 INFO [train.py:1031] (3/4) Epoch 14, batch 1150, loss[loss=0.2265, simple_loss=0.2661, pruned_loss=0.07011, ctc_loss=0.1167, over 16896.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2849, pruned_loss=0.06439, ctc_loss=0.1128, over 3287120.17 frames. ], batch size: 82, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:47:32,016 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2734256.0, ans=0.0
2023-10-09 10:47:47,854 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.27 vs. limit=15.0
2023-10-09 10:47:49,378 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2734302.6666666665, ans=0.125
2023-10-09 10:47:51,161 INFO [train.py:1031] (3/4) Epoch 14, batch 1200, loss[loss=0.2161, simple_loss=0.2646, pruned_loss=0.06217, ctc_loss=0.1082, over 16794.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.278, pruned_loss=0.06317, ctc_loss=0.1107, over 3277543.96 frames. ], batch size: 121, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:47:51,504 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:47:53,612 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2734349.3333333335, ans=0.05
2023-10-09 10:47:56,600 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2734349.3333333335, ans=0.125
2023-10-09 10:47:56,894 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=22.5
2023-10-09 10:48:05,204 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2734396.0, ans=0.0
2023-10-09 10:48:06,197 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2734396.0, ans=0.2
2023-10-09 10:48:14,867 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2734442.6666666665, ans=0.125
2023-10-09 10:48:20,270 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+02 2.978e+02 3.440e+02 3.913e+02 6.490e+02, threshold=6.880e+02, percent-clipped=0.0
2023-10-09 10:48:51,819 INFO [train.py:1031] (3/4) Epoch 14, batch 1250, loss[loss=0.1905, simple_loss=0.2525, pruned_loss=0.04799, ctc_loss=0.08161, over 16880.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2772, pruned_loss=0.06386, ctc_loss=0.1117, over 3288485.32 frames. ], batch size: 95, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:49:05,222 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2734629.3333333335, ans=0.1
2023-10-09 10:49:18,986 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2734676.0, ans=0.125
2023-10-09 10:49:26,199 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2734676.0, ans=0.2
2023-10-09 10:49:27,939 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2734722.6666666665, ans=0.125
2023-10-09 10:49:32,576 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2734722.6666666665, ans=0.125
2023-10-09 10:49:53,654 INFO [train.py:1031] (3/4) Epoch 14, batch 1300, loss[loss=0.2187, simple_loss=0.2615, pruned_loss=0.0662, ctc_loss=0.1087, over 16729.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2774, pruned_loss=0.06493, ctc_loss=0.1134, over 3296918.51 frames. ], batch size: 151, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:50:09,623 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2734862.6666666665, ans=0.125
2023-10-09 10:50:25,186 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.632e+02 3.518e+02 3.904e+02 4.606e+02 8.060e+02, threshold=7.809e+02, percent-clipped=2.0
2023-10-09 10:50:39,316 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2734956.0, ans=0.2
2023-10-09 10:50:43,108 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2735002.6666666665, ans=0.0
2023-10-09 10:50:45,785 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2735002.6666666665, ans=0.0
2023-10-09 10:50:54,948 INFO [train.py:1031] (3/4) Epoch 14, batch 1350, loss[loss=0.2121, simple_loss=0.2671, pruned_loss=0.05769, ctc_loss=0.1045, over 16734.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2765, pruned_loss=0.06589, ctc_loss=0.1153, over 3297237.97 frames. ], batch size: 291, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:51:09,607 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2735096.0, ans=0.1
2023-10-09 10:51:21,942 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2735142.6666666665, ans=0.0
2023-10-09 10:51:22,975 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2735142.6666666665, ans=0.2
2023-10-09 10:51:29,416 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2735142.6666666665, ans=0.125
2023-10-09 10:51:36,170 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2735189.3333333335, ans=0.1
2023-10-09 10:51:48,674 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=22.5
2023-10-09 10:51:54,158 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2735236.0, ans=0.125
2023-10-09 10:51:55,997 INFO [train.py:1031] (3/4) Epoch 14, batch 1400, loss[loss=0.2033, simple_loss=0.2637, pruned_loss=0.05309, ctc_loss=0.09151, over 16724.00 frames. ], tot_loss[loss=0.2257, simple_loss=0.2727, pruned_loss=0.0662, ctc_loss=0.1158, over 3301941.11 frames. ], batch size: 95, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:52:07,128 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2735329.3333333335, ans=0.0
2023-10-09 10:52:07,214 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2735329.3333333335, ans=0.05
2023-10-09 10:52:15,341 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2735329.3333333335, ans=0.0
2023-10-09 10:52:27,996 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+02 3.259e+02 3.795e+02 4.545e+02 1.175e+03, threshold=7.590e+02, percent-clipped=1.0
2023-10-09 10:52:38,179 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2735422.6666666665, ans=0.125
2023-10-09 10:52:41,925 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2735422.6666666665, ans=0.125
2023-10-09 10:52:55,878 INFO [train.py:1031] (3/4) Epoch 14, batch 1450, loss[loss=0.2095, simple_loss=0.2846, pruned_loss=0.04953, ctc_loss=0.0886, over 16958.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2737, pruned_loss=0.06382, ctc_loss=0.1116, over 3291139.28 frames. ], batch size: 216, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:53:11,195 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2735562.6666666665, ans=0.2
2023-10-09 10:53:23,012 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2735609.3333333335, ans=0.1
2023-10-09 10:53:44,449 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5
2023-10-09 10:53:57,017 INFO [train.py:1031] (3/4) Epoch 14, batch 1500, loss[loss=0.2431, simple_loss=0.2817, pruned_loss=0.07547, ctc_loss=0.1338, over 16932.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2758, pruned_loss=0.06378, ctc_loss=0.1117, over 3300351.81 frames. ], batch size: 202, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:53:59,856 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2735749.3333333335, ans=0.125
2023-10-09 10:54:00,945 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2735749.3333333335, ans=0.125
2023-10-09 10:54:05,762 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2735749.3333333335, ans=0.125
2023-10-09 10:54:32,303 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+02 3.308e+02 3.843e+02 4.778e+02 1.080e+03, threshold=7.686e+02, percent-clipped=1.0
2023-10-09 10:54:35,895 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2735889.3333333335, ans=0.2
2023-10-09 10:54:56,526 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2735936.0, ans=0.0
2023-10-09 10:55:00,100 INFO [train.py:1031] (3/4) Epoch 14, batch 1550, loss[loss=0.206, simple_loss=0.2824, pruned_loss=0.0465, ctc_loss=0.09159, over 16780.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2756, pruned_loss=0.06273, ctc_loss=0.1104, over 3307181.91 frames. ], batch size: 272, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:55:13,796 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2736029.3333333335, ans=0.0
2023-10-09 10:55:19,601 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2736029.3333333335, ans=0.0
2023-10-09 10:55:20,665 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2736029.3333333335, ans=0.1
2023-10-09 10:55:22,827 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2736029.3333333335, ans=0.1
2023-10-09 10:55:36,171 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2736122.6666666665, ans=0.125
2023-10-09 10:55:55,835 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2736169.3333333335, ans=0.0
2023-10-09 10:55:58,570 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2736169.3333333335, ans=0.2
2023-10-09 10:56:01,582 INFO [train.py:1031] (3/4) Epoch 14, batch 1600, loss[loss=0.2258, simple_loss=0.2877, pruned_loss=0.06157, ctc_loss=0.102, over 16944.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2732, pruned_loss=0.05913, ctc_loss=0.1045, over 3300513.09 frames. ], batch size: 90, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 10:56:04,572 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2736216.0, ans=0.04949747468305833
2023-10-09 10:56:07,718 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2736216.0, ans=0.2
2023-10-09 10:56:17,633 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2736262.6666666665, ans=0.0
2023-10-09 10:56:34,081 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2736309.3333333335, ans=0.1
2023-10-09 10:56:36,432 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+02 2.648e+02 3.123e+02 3.834e+02 1.151e+03, threshold=6.247e+02, percent-clipped=2.0
2023-10-09 10:56:59,798 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2736402.6666666665, ans=0.125
2023-10-09 10:57:01,586 INFO [train.py:1031] (3/4) Epoch 14, batch 1650, loss[loss=0.239, simple_loss=0.3048, pruned_loss=0.06386, ctc_loss=0.1135, over 16943.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2752, pruned_loss=0.06012, ctc_loss=0.1059, over 3304536.63 frames. ], batch size: 243, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:57:04,126 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2736449.3333333335, ans=0.125
2023-10-09 10:57:09,114 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2736449.3333333335, ans=0.1
2023-10-09 10:57:38,543 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2736589.3333333335, ans=0.125
2023-10-09 10:57:44,560 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2736589.3333333335, ans=0.0
2023-10-09 10:57:52,122 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2736636.0, ans=0.125
2023-10-09 10:58:03,242 INFO [train.py:1031] (3/4) Epoch 14, batch 1700, loss[loss=0.2407, simple_loss=0.2775, pruned_loss=0.07591, ctc_loss=0.1302, over 10607.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.282, pruned_loss=0.06374, ctc_loss=0.112, over 3303260.69 frames. ], batch size: 36, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 10:58:20,199 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2736729.3333333335, ans=0.1
2023-10-09 10:58:27,711 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.88 vs. limit=15.0
2023-10-09 10:58:38,605 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.15 vs. limit=15.0
2023-10-09 10:58:38,773 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+02 3.292e+02 3.848e+02 4.651e+02 1.016e+03, threshold=7.697e+02, percent-clipped=4.0
2023-10-09 10:58:58,457 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=15.0
2023-10-09 10:59:00,118 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2736869.3333333335, ans=0.125
2023-10-09 10:59:04,570 INFO [train.py:1031] (3/4) Epoch 14, batch 1750, loss[loss=0.2211, simple_loss=0.2626, pruned_loss=0.06445, ctc_loss=0.1265, over 15379.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2852, pruned_loss=0.0656, ctc_loss=0.1154, over 3303405.10 frames. ], batch size: 526, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:59:08,256 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.70 vs. limit=10.0
2023-10-09 10:59:24,351 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2736962.6666666665, ans=0.125
2023-10-09 10:59:49,400 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2737056.0, ans=0.1
2023-10-09 10:59:55,093 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2737102.6666666665, ans=0.1
2023-10-09 11:00:05,529 INFO [train.py:1031] (3/4) Epoch 14, batch 1800, loss[loss=0.2016, simple_loss=0.273, pruned_loss=0.04887, ctc_loss=0.08109, over 16718.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2845, pruned_loss=0.06392, ctc_loss=0.1126, over 3309721.50 frames. ], batch size: 140, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:00:10,704 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2737149.3333333335, ans=0.1
2023-10-09 11:00:23,447 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=22.5
2023-10-09 11:00:26,824 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2737196.0, ans=0.125
2023-10-09 11:00:33,331 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2737242.6666666665, ans=0.0
2023-10-09 11:00:39,172 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2737242.6666666665, ans=0.0
2023-10-09 11:00:43,557 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 2.917e+02 3.383e+02 3.800e+02 1.043e+03, threshold=6.767e+02, percent-clipped=1.0
2023-10-09 11:01:06,589 INFO [train.py:1031] (3/4) Epoch 14, batch 1850, loss[loss=0.1912, simple_loss=0.2538, pruned_loss=0.04814, ctc_loss=0.08062, over 16733.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2856, pruned_loss=0.06172, ctc_loss=0.1091, over 3301217.92 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:01:18,181 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2737429.3333333335, ans=0.09899494936611666
2023-10-09 11:01:22,311 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2737429.3333333335, ans=0.1
2023-10-09 11:01:22,345 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2737429.3333333335, ans=0.0
2023-10-09 11:01:49,175 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2737522.6666666665, ans=0.125
2023-10-09 11:01:56,194 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=15.0
2023-10-09 11:02:06,433 INFO [train.py:1031] (3/4) Epoch 14, batch 1900, loss[loss=0.2249, simple_loss=0.2914, pruned_loss=0.05812, ctc_loss=0.1051, over 16899.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2859, pruned_loss=0.0623, ctc_loss=0.1097, over 3295834.02 frames. ], batch size: 243, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:02:12,915 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0
2023-10-09 11:02:16,587 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2737616.0, ans=0.0
2023-10-09 11:02:31,131 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2737709.3333333335, ans=0.1
2023-10-09 11:02:31,140 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2737709.3333333335, ans=0.125
2023-10-09 11:02:41,353 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2737756.0, ans=0.125
2023-10-09 11:02:43,659 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.102e+02 3.624e+02 4.440e+02 7.780e+02, threshold=7.248e+02, percent-clipped=1.0
2023-10-09 11:02:47,218 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2737756.0, ans=0.0
2023-10-09 11:03:06,284 INFO [train.py:1031] (3/4) Epoch 14, batch 1950, loss[loss=0.1922, simple_loss=0.232, pruned_loss=0.0575, ctc_loss=0.09352, over 10111.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2874, pruned_loss=0.06244, ctc_loss=0.11, over 3288026.35 frames. ], batch size: 35, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:03:32,189 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.14 vs. limit=10.0
2023-10-09 11:03:54,761 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2738036.0, ans=0.125
2023-10-09 11:03:56,316 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2738036.0, ans=0.035
2023-10-09 11:04:08,694 INFO [train.py:1031] (3/4) Epoch 14, batch 2000, loss[loss=0.235, simple_loss=0.2925, pruned_loss=0.06591, ctc_loss=0.1144, over 16940.00 frames. ], tot_loss[loss=0.2317, simple_loss=0.2901, pruned_loss=0.06403, ctc_loss=0.1129, over 3297744.01 frames. ], batch size: 243, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:04:10,305 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0
2023-10-09 11:04:10,409 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.41 vs. limit=15.0
2023-10-09 11:04:23,738 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2738129.3333333335, ans=0.2
2023-10-09 11:04:25,826 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2738129.3333333335, ans=0.0
2023-10-09 11:04:31,836 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2738176.0, ans=0.125
2023-10-09 11:04:48,440 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.567e+02 3.379e+02 3.817e+02 4.663e+02 9.562e+02, threshold=7.635e+02, percent-clipped=5.0
2023-10-09 11:05:09,402 INFO [train.py:1031] (3/4) Epoch 14, batch 2050, loss[loss=0.2205, simple_loss=0.2783, pruned_loss=0.05988, ctc_loss=0.1077, over 16882.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.2948, pruned_loss=0.06636, ctc_loss=0.1172, over 3302042.42 frames. ], batch size: 141, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:05:09,723 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2738316.0, ans=0.1
2023-10-09 11:05:12,794 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2738316.0, ans=0.125
2023-10-09 11:05:22,456 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2738362.6666666665, ans=0.2
2023-10-09 11:05:40,699 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2738409.3333333335, ans=0.125
2023-10-09 11:05:42,746 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2738409.3333333335, ans=0.125
2023-10-09 11:06:00,821 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2738502.6666666665, ans=0.0
2023-10-09 11:06:10,928 INFO [train.py:1031] (3/4) Epoch 14, batch 2100, loss[loss=0.2082, simple_loss=0.2711, pruned_loss=0.05337, ctc_loss=0.09642, over 16790.00 frames. ], tot_loss[loss=0.2366, simple_loss=0.2932, pruned_loss=0.06651, ctc_loss=0.1173, over 3292594.42 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:06:26,656 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2738596.0, ans=0.0
2023-10-09 11:06:46,363 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2738642.6666666665, ans=0.125
2023-10-09 11:06:53,246 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+02 3.150e+02 3.651e+02 4.552e+02 6.884e+02, threshold=7.301e+02, percent-clipped=0.0
2023-10-09 11:06:56,924 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2738689.3333333335, ans=0.09899494936611666
2023-10-09 11:07:13,876 INFO [train.py:1031] (3/4) Epoch 14, batch 2150, loss[loss=0.2005, simple_loss=0.2581, pruned_loss=0.05319, ctc_loss=0.09142, over 16768.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.2946, pruned_loss=0.06434, ctc_loss=0.1143, over 3293733.24 frames. ], batch size: 140, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:07:24,450 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2738782.6666666665, ans=0.1
2023-10-09 11:08:14,605 INFO [train.py:1031] (3/4) Epoch 14, batch 2200, loss[loss=0.234, simple_loss=0.2733, pruned_loss=0.07237, ctc_loss=0.1247, over 16889.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2929, pruned_loss=0.06437, ctc_loss=0.1142, over 3300813.51 frames. ], batch size: 176, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 11:08:15,397 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.24 vs. limit=22.5
2023-10-09 11:08:19,179 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2739016.0, ans=0.125
2023-10-09 11:08:46,240 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2739109.3333333335, ans=10.0
2023-10-09 11:08:58,373 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 3.214e+02 3.681e+02 4.721e+02 1.015e+03, threshold=7.363e+02, percent-clipped=4.0
2023-10-09 11:08:58,668 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2739156.0, ans=0.125
2023-10-09 11:09:05,659 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.46 vs. limit=10.0
2023-10-09 11:09:08,907 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2739202.6666666665, ans=0.0
2023-10-09 11:09:16,600 INFO [train.py:1031] (3/4) Epoch 14, batch 2250, loss[loss=0.2408, simple_loss=0.2753, pruned_loss=0.0752, ctc_loss=0.1398, over 16601.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2879, pruned_loss=0.06451, ctc_loss=0.1142, over 3310229.69 frames. ], batch size: 351, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 11:09:22,264 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2739249.3333333335, ans=0.0
2023-10-09 11:09:44,730 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2739342.6666666665, ans=0.125
2023-10-09 11:09:52,559 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0
2023-10-09 11:10:05,541 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.89 vs. limit=15.0
2023-10-09 11:10:06,777 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2739436.0, ans=0.0
2023-10-09 11:10:11,145 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2739436.0, ans=0.125
2023-10-09 11:10:12,792 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2739436.0, ans=0.1
2023-10-09 11:10:13,138 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0
2023-10-09 11:10:15,015 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2739436.0, ans=0.125
2023-10-09 11:10:18,410 INFO [train.py:1031] (3/4) Epoch 14, batch 2300, loss[loss=0.2042, simple_loss=0.2716, pruned_loss=0.05135, ctc_loss=0.08525, over 16940.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2804, pruned_loss=0.06357, ctc_loss=0.1125, over 3309810.81 frames.
], batch size: 78, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:10:19,740 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2739482.6666666665, ans=0.125 2023-10-09 11:10:40,266 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2739529.3333333335, ans=0.125 2023-10-09 11:10:43,063 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2739576.0, ans=0.0 2023-10-09 11:11:04,760 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.458e+02 3.279e+02 3.727e+02 4.728e+02 7.971e+02, threshold=7.454e+02, percent-clipped=1.0 2023-10-09 11:11:06,925 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2739622.6666666665, ans=0.125 2023-10-09 11:11:15,440 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2739669.3333333335, ans=0.0 2023-10-09 11:11:20,437 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2739716.0, ans=0.125 2023-10-09 11:11:21,220 INFO [train.py:1031] (3/4) Epoch 14, batch 2350, loss[loss=0.2217, simple_loss=0.2837, pruned_loss=0.05999, ctc_loss=0.09921, over 16317.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2832, pruned_loss=0.06508, ctc_loss=0.1148, over 3303362.85 frames. ], batch size: 70, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 11:11:26,600 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=15.0 2023-10-09 11:11:42,980 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-10-09 11:11:53,202 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2739809.3333333335, ans=0.0 2023-10-09 11:12:04,941 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2739856.0, ans=0.125 2023-10-09 11:12:07,087 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2739856.0, ans=0.125 2023-10-09 11:12:22,659 INFO [train.py:1031] (3/4) Epoch 14, batch 2400, loss[loss=0.2399, simple_loss=0.2895, pruned_loss=0.06886, ctc_loss=0.1318, over 16938.00 frames. ], tot_loss[loss=0.2321, simple_loss=0.2846, pruned_loss=0.06637, ctc_loss=0.1171, over 3303372.98 frames. ], batch size: 292, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:12:28,257 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.86 vs. 
limit=10.0 2023-10-09 11:12:29,444 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2739949.3333333335, ans=0.125 2023-10-09 11:13:09,496 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+02 3.337e+02 3.917e+02 4.663e+02 1.051e+03, threshold=7.833e+02, percent-clipped=2.0 2023-10-09 11:13:11,948 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2740136.0, ans=0.125 2023-10-09 11:13:15,896 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2023-10-09 11:13:25,551 INFO [train.py:1031] (3/4) Epoch 14, batch 2450, loss[loss=0.174, simple_loss=0.2592, pruned_loss=0.03192, ctc_loss=0.06224, over 16876.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2827, pruned_loss=0.06605, ctc_loss=0.1161, over 3309072.95 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:13:26,866 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2740182.6666666665, ans=0.125 2023-10-09 11:14:12,058 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.51 vs. limit=15.0 2023-10-09 11:14:28,353 INFO [train.py:1031] (3/4) Epoch 14, batch 2500, loss[loss=0.1996, simple_loss=0.2683, pruned_loss=0.04779, ctc_loss=0.08822, over 16822.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2793, pruned_loss=0.0613, ctc_loss=0.1087, over 3310813.94 frames. ], batch size: 176, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:14:38,188 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2740416.0, ans=0.0 2023-10-09 11:14:41,998 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2740462.6666666665, ans=0.0 2023-10-09 11:14:51,642 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2740462.6666666665, ans=0.125 2023-10-09 11:15:14,279 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.95 vs. limit=15.0 2023-10-09 11:15:17,468 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.820e+02 3.218e+02 3.802e+02 1.081e+03, threshold=6.436e+02, percent-clipped=2.0 2023-10-09 11:15:33,267 INFO [train.py:1031] (3/4) Epoch 14, batch 2550, loss[loss=0.1876, simple_loss=0.2438, pruned_loss=0.04879, ctc_loss=0.08445, over 16736.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2827, pruned_loss=0.06105, ctc_loss=0.1071, over 3315099.12 frames. 
], batch size: 130, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:15:43,909 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2740649.3333333335, ans=0.0 2023-10-09 11:15:46,085 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2740696.0, ans=0.125 2023-10-09 11:15:55,562 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2740696.0, ans=0.2 2023-10-09 11:16:35,624 INFO [train.py:1031] (3/4) Epoch 14, batch 2600, loss[loss=0.1991, simple_loss=0.2539, pruned_loss=0.05355, ctc_loss=0.09293, over 16786.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2805, pruned_loss=0.06077, ctc_loss=0.1062, over 3307244.77 frames. ], batch size: 95, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:16:52,043 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.39 vs. limit=15.0 2023-10-09 11:17:09,942 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2740976.0, ans=0.125 2023-10-09 11:17:16,295 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-10-09 11:17:23,703 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.364e+02 2.979e+02 3.641e+02 4.453e+02 7.344e+02, threshold=7.282e+02, percent-clipped=4.0 2023-10-09 11:17:29,112 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2741069.3333333335, ans=0.1 2023-10-09 11:17:37,765 INFO [train.py:1031] (3/4) Epoch 14, batch 2650, loss[loss=0.244, simple_loss=0.2887, pruned_loss=0.0754, ctc_loss=0.1216, over 16788.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2832, pruned_loss=0.05992, ctc_loss=0.1055, over 3304231.22 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 11:17:38,582 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.75 vs. 
limit=15.0 2023-10-09 11:17:50,306 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2741162.6666666665, ans=10.0 2023-10-09 11:17:50,398 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=2741162.6666666665, ans=22.5 2023-10-09 11:17:53,253 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2741162.6666666665, ans=0.125 2023-10-09 11:17:57,321 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2741162.6666666665, ans=0.0 2023-10-09 11:18:12,032 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2741209.3333333335, ans=0.0 2023-10-09 11:18:31,820 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2741302.6666666665, ans=0.0 2023-10-09 11:18:35,127 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741302.6666666665, ans=0.1 2023-10-09 11:18:39,292 INFO [train.py:1031] (3/4) Epoch 14, batch 2700, loss[loss=0.2219, simple_loss=0.285, pruned_loss=0.05873, ctc_loss=0.1032, over 16772.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2892, pruned_loss=0.06381, ctc_loss=0.1122, over 3299458.10 frames. ], batch size: 164, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:18:48,031 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2741349.3333333335, ans=0.0 2023-10-09 11:18:49,030 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2741349.3333333335, ans=0.125 2023-10-09 11:18:51,215 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2741396.0, ans=0.125 2023-10-09 11:18:56,278 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2741396.0, ans=0.025 2023-10-09 11:19:01,874 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2741396.0, ans=0.2 2023-10-09 11:19:19,900 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741489.3333333335, ans=0.1 2023-10-09 11:19:29,377 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5 2023-10-09 11:19:31,015 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+02 3.579e+02 4.156e+02 4.960e+02 1.400e+03, threshold=8.312e+02, percent-clipped=4.0 2023-10-09 11:19:42,378 INFO [train.py:1031] (3/4) Epoch 14, batch 2750, loss[loss=0.2064, simple_loss=0.2644, pruned_loss=0.05554, ctc_loss=0.09331, over 16737.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2931, pruned_loss=0.06336, ctc_loss=0.1114, over 3305113.09 frames. 
], batch size: 140, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 11:20:04,834 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2741629.3333333335, ans=0.0 2023-10-09 11:20:32,502 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2741769.3333333335, ans=0.125 2023-10-09 11:20:44,739 INFO [train.py:1031] (3/4) Epoch 14, batch 2800, loss[loss=0.17, simple_loss=0.2136, pruned_loss=0.04685, ctc_loss=0.08167, over 16920.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2889, pruned_loss=0.05933, ctc_loss=0.105, over 3295604.11 frames. ], batch size: 78, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:21:25,297 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=22.5 2023-10-09 11:21:32,351 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=22.5 2023-10-09 11:21:35,708 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+02 3.033e+02 3.735e+02 4.727e+02 1.179e+03, threshold=7.471e+02, percent-clipped=1.0 2023-10-09 11:21:38,750 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2742002.6666666665, ans=0.125 2023-10-09 11:21:47,212 INFO [train.py:1031] (3/4) Epoch 14, batch 2850, loss[loss=0.2602, simple_loss=0.3237, pruned_loss=0.07205, ctc_loss=0.1316, over 16508.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2844, pruned_loss=0.05673, ctc_loss=0.1006, over 3294909.44 frames. ], batch size: 350, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:22:06,885 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2742096.0, ans=0.125 2023-10-09 11:22:18,893 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2742142.6666666665, ans=0.1 2023-10-09 11:22:22,353 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2742142.6666666665, ans=0.1 2023-10-09 11:22:51,980 INFO [train.py:1031] (3/4) Epoch 14, batch 2900, loss[loss=0.2879, simple_loss=0.3486, pruned_loss=0.08212, ctc_loss=0.1575, over 16693.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2864, pruned_loss=0.05462, ctc_loss=0.09759, over 3293269.08 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:22:52,297 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2742282.6666666665, ans=0.125 2023-10-09 11:23:16,258 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2742376.0, ans=0.2 2023-10-09 11:23:29,161 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2742422.6666666665, ans=0.125 2023-10-09 11:23:43,139 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+02 3.117e+02 3.737e+02 4.874e+02 8.025e+02, threshold=7.473e+02, percent-clipped=2.0 2023-10-09 11:23:50,613 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.38 vs. 
limit=15.0 2023-10-09 11:23:52,691 INFO [train.py:1031] (3/4) Epoch 14, batch 2950, loss[loss=0.201, simple_loss=0.2649, pruned_loss=0.05011, ctc_loss=0.09206, over 16911.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.2875, pruned_loss=0.05561, ctc_loss=0.09922, over 3297164.04 frames. ], batch size: 202, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:23:53,009 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2742516.0, ans=0.0 2023-10-09 11:24:00,490 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2742516.0, ans=0.125 2023-10-09 11:24:12,929 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2742562.6666666665, ans=0.125 2023-10-09 11:24:30,583 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2742656.0, ans=0.125 2023-10-09 11:24:47,880 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2742702.6666666665, ans=0.125 2023-10-09 11:24:55,799 INFO [train.py:1031] (3/4) Epoch 14, batch 3000, loss[loss=0.2021, simple_loss=0.2491, pruned_loss=0.05664, ctc_loss=0.1043, over 16693.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2861, pruned_loss=0.05832, ctc_loss=0.1035, over 3307021.36 frames. ], batch size: 111, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:24:55,799 INFO [train.py:1054] (3/4) Computing validation loss 2023-10-09 11:25:13,599 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2392, simple_loss=0.3062, pruned_loss=0.06637, ctc_loss=0.09863, over 1796401.00 frames. 2023-10-09 11:25:13,599 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14429MB 2023-10-09 11:25:22,436 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2742749.3333333335, ans=0.125 2023-10-09 11:25:39,324 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=12.0 2023-10-09 11:25:44,909 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2742842.6666666665, ans=0.125 2023-10-09 11:25:50,147 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2742889.3333333335, ans=0.0 2023-10-09 11:26:05,143 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+02 3.037e+02 3.527e+02 4.152e+02 6.631e+02, threshold=7.054e+02, percent-clipped=0.0 2023-10-09 11:26:14,948 INFO [train.py:1031] (3/4) Epoch 14, batch 3050, loss[loss=0.2256, simple_loss=0.2621, pruned_loss=0.0705, ctc_loss=0.1202, over 16705.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.28, pruned_loss=0.05814, ctc_loss=0.103, over 3301688.54 frames. 
], batch size: 140, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:26:21,594 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2742982.6666666665, ans=0.125 2023-10-09 11:26:28,560 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2743029.3333333335, ans=0.125 2023-10-09 11:26:46,227 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2023-10-09 11:27:12,742 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2743169.3333333335, ans=0.05 2023-10-09 11:27:15,157 INFO [train.py:1031] (3/4) Epoch 14, batch 3100, loss[loss=0.1814, simple_loss=0.2432, pruned_loss=0.04455, ctc_loss=0.07649, over 16655.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.2742, pruned_loss=0.05888, ctc_loss=0.1042, over 3305519.21 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 11:27:23,052 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2743216.0, ans=0.125 2023-10-09 11:27:39,700 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2743309.3333333335, ans=0.125 2023-10-09 11:27:44,036 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=22.5 2023-10-09 11:27:46,591 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2743309.3333333335, ans=0.1 2023-10-09 11:28:07,263 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2743402.6666666665, ans=0.1 2023-10-09 11:28:08,004 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+02 2.923e+02 3.339e+02 4.092e+02 6.355e+02, threshold=6.678e+02, percent-clipped=0.0 2023-10-09 11:28:09,502 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2743402.6666666665, ans=0.2 2023-10-09 11:28:14,868 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2023-10-09 11:28:15,791 INFO [train.py:1031] (3/4) Epoch 14, batch 3150, loss[loss=0.1622, simple_loss=0.2, pruned_loss=0.04738, ctc_loss=0.07432, over 12216.00 frames. ], tot_loss[loss=0.2127, simple_loss=0.2701, pruned_loss=0.05737, ctc_loss=0.1015, over 3303637.34 frames. ], batch size: 46, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:28:32,225 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2743496.0, ans=0.0 2023-10-09 11:28:44,981 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2743542.6666666665, ans=0.5 2023-10-09 11:29:17,392 INFO [train.py:1031] (3/4) Epoch 14, batch 3200, loss[loss=0.2511, simple_loss=0.3029, pruned_loss=0.07529, ctc_loss=0.1215, over 12263.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.277, pruned_loss=0.05825, ctc_loss=0.1037, over 3294432.22 frames. 
], batch size: 35, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:29:25,333 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2023-10-09 11:29:43,264 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2743776.0, ans=0.125 2023-10-09 11:30:11,827 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=22.5 2023-10-09 11:30:12,209 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.450e+02 3.362e+02 3.915e+02 4.671e+02 1.064e+03, threshold=7.829e+02, percent-clipped=5.0 2023-10-09 11:30:17,846 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2743916.0, ans=0.0 2023-10-09 11:30:18,341 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0 2023-10-09 11:30:18,612 INFO [train.py:1031] (3/4) Epoch 14, batch 3250, loss[loss=0.2178, simple_loss=0.2777, pruned_loss=0.05887, ctc_loss=0.1002, over 16768.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2791, pruned_loss=0.06003, ctc_loss=0.1062, over 3302958.17 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:30:27,132 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2023-10-09 11:30:28,319 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2743916.0, ans=0.0 2023-10-09 11:30:40,368 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2743962.6666666665, ans=0.125 2023-10-09 11:31:06,994 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2744056.0, ans=0.125 2023-10-09 11:31:23,915 INFO [train.py:1031] (3/4) Epoch 14, batch 3300, loss[loss=0.2968, simple_loss=0.3457, pruned_loss=0.09131, ctc_loss=0.1633, over 16846.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2864, pruned_loss=0.06344, ctc_loss=0.1121, over 3305911.50 frames. ], batch size: 309, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:31:25,265 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:31:45,704 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2744196.0, ans=0.0 2023-10-09 11:31:52,117 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.29 vs. 
limit=15.0 2023-10-09 11:31:54,778 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2744242.6666666665, ans=0.125 2023-10-09 11:31:58,553 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2744242.6666666665, ans=0.0 2023-10-09 11:32:00,317 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2744242.6666666665, ans=0.1 2023-10-09 11:32:11,041 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2744289.3333333335, ans=0.125 2023-10-09 11:32:16,032 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-10-09 11:32:20,913 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+02 3.186e+02 3.803e+02 4.439e+02 1.060e+03, threshold=7.606e+02, percent-clipped=1.0 2023-10-09 11:32:26,287 INFO [train.py:1031] (3/4) Epoch 14, batch 3350, loss[loss=0.2341, simple_loss=0.2726, pruned_loss=0.07254, ctc_loss=0.1264, over 16764.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2843, pruned_loss=0.06396, ctc_loss=0.1128, over 3298563.30 frames. ], batch size: 310, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:32:29,258 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2744382.6666666665, ans=0.0 2023-10-09 11:32:33,512 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2744382.6666666665, ans=0.125 2023-10-09 11:32:49,833 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:33:04,532 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2744522.6666666665, ans=0.0 2023-10-09 11:33:06,357 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=15.0 2023-10-09 11:33:07,221 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2744522.6666666665, ans=0.95 2023-10-09 11:33:26,642 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2744569.3333333335, ans=0.125 2023-10-09 11:33:29,578 INFO [train.py:1031] (3/4) Epoch 14, batch 3400, loss[loss=0.2885, simple_loss=0.3501, pruned_loss=0.08185, ctc_loss=0.1581, over 16813.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2832, pruned_loss=0.06383, ctc_loss=0.1127, over 3295650.31 frames. 
], batch size: 309, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 11:33:55,075 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2744709.3333333335, ans=0.125 2023-10-09 11:34:08,684 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2744756.0, ans=0.2 2023-10-09 11:34:15,552 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2744756.0, ans=0.0 2023-10-09 11:34:26,691 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.487e+02 3.089e+02 3.600e+02 4.217e+02 8.048e+02, threshold=7.200e+02, percent-clipped=1.0 2023-10-09 11:34:30,973 INFO [train.py:1031] (3/4) Epoch 14, batch 3450, loss[loss=0.2475, simple_loss=0.327, pruned_loss=0.05971, ctc_loss=0.1217, over 15223.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2881, pruned_loss=0.0644, ctc_loss=0.1137, over 3300927.24 frames. ], batch size: 526, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:34:36,775 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2744849.3333333335, ans=0.0 2023-10-09 11:34:38,746 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2744849.3333333335, ans=0.0 2023-10-09 11:34:48,035 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2023-10-09 11:34:58,681 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2744942.6666666665, ans=0.1 2023-10-09 11:35:12,647 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2744989.3333333335, ans=0.05 2023-10-09 11:35:17,692 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2744989.3333333335, ans=0.125 2023-10-09 11:35:30,765 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2745036.0, ans=0.0 2023-10-09 11:35:31,908 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=22.5 2023-10-09 11:35:32,418 INFO [train.py:1031] (3/4) Epoch 14, batch 3500, loss[loss=0.249, simple_loss=0.3012, pruned_loss=0.07258, ctc_loss=0.129, over 16782.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2856, pruned_loss=0.0621, ctc_loss=0.11, over 3306567.31 frames. ], batch size: 328, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 11:35:48,770 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2745129.3333333335, ans=0.2 2023-10-09 11:36:18,652 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.97 vs. 
limit=15.0 2023-10-09 11:36:25,446 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2745269.3333333335, ans=0.125 2023-10-09 11:36:28,667 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 2.946e+02 3.396e+02 4.316e+02 6.919e+02, threshold=6.792e+02, percent-clipped=0.0 2023-10-09 11:36:31,831 INFO [train.py:1031] (3/4) Epoch 14, batch 3550, loss[loss=0.1991, simple_loss=0.2527, pruned_loss=0.05386, ctc_loss=0.09445, over 16688.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2782, pruned_loss=0.06094, ctc_loss=0.1076, over 3305742.82 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:36:38,336 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-10-09 11:36:47,126 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2745362.6666666665, ans=0.1 2023-10-09 11:37:07,502 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2745456.0, ans=0.0 2023-10-09 11:37:08,504 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2745456.0, ans=0.2 2023-10-09 11:37:21,965 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2745502.6666666665, ans=0.0 2023-10-09 11:37:22,988 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2745502.6666666665, ans=0.0 2023-10-09 11:37:32,889 INFO [train.py:1031] (3/4) Epoch 14, batch 3600, loss[loss=0.2025, simple_loss=0.2474, pruned_loss=0.05951, ctc_loss=0.09656, over 16916.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.272, pruned_loss=0.06026, ctc_loss=0.1059, over 3312076.64 frames. ], batch size: 90, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:38:28,592 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2745736.0, ans=0.125 2023-10-09 11:38:29,618 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2745736.0, ans=0.0 2023-10-09 11:38:31,934 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+02 3.169e+02 3.614e+02 4.285e+02 9.204e+02, threshold=7.228e+02, percent-clipped=2.0 2023-10-09 11:38:33,036 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2745782.6666666665, ans=0.125 2023-10-09 11:38:33,631 INFO [train.py:1031] (3/4) Epoch 14, batch 3650, loss[loss=0.2016, simple_loss=0.2461, pruned_loss=0.05802, ctc_loss=0.1024, over 16770.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2692, pruned_loss=0.06114, ctc_loss=0.1073, over 3309612.25 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:38:52,420 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2745829.3333333335, ans=0.125 2023-10-09 11:39:11,250 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. 
limit=15.0 2023-10-09 11:39:34,296 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=2745969.3333333335, ans=22.5 2023-10-09 11:39:36,594 INFO [train.py:1031] (3/4) Epoch 14, batch 3700, loss[loss=0.2784, simple_loss=0.323, pruned_loss=0.08653, ctc_loss=0.1519, over 16866.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2747, pruned_loss=0.06463, ctc_loss=0.1132, over 3310231.73 frames. ], batch size: 292, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:39:36,947 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2746016.0, ans=0.125 2023-10-09 11:39:38,028 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=2746016.0, ans=0.2 2023-10-09 11:39:39,102 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2746016.0, ans=0.0 2023-10-09 11:39:48,868 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2746062.6666666665, ans=0.125 2023-10-09 11:39:51,658 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2746062.6666666665, ans=0.0 2023-10-09 11:40:15,541 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2746156.0, ans=0.0 2023-10-09 11:40:40,093 INFO [train.py:1031] (3/4) Epoch 14, batch 3750, loss[loss=0.3106, simple_loss=0.336, pruned_loss=0.1062, ctc_loss=0.1821, over 16540.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2813, pruned_loss=0.06761, ctc_loss=0.1183, over 3303660.33 frames. ], batch size: 416, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 11:40:40,335 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2746249.3333333335, ans=0.1 2023-10-09 11:40:40,355 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2746249.3333333335, ans=0.125 2023-10-09 11:40:40,422 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2746249.3333333335, ans=0.0 2023-10-09 11:40:41,103 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+02 3.323e+02 3.693e+02 4.050e+02 7.078e+02, threshold=7.386e+02, percent-clipped=0.0 2023-10-09 11:41:17,037 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2746389.3333333335, ans=0.125 2023-10-09 11:41:28,306 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2746389.3333333335, ans=0.0 2023-10-09 11:41:38,120 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2746436.0, ans=0.1 2023-10-09 11:41:43,228 INFO [train.py:1031] (3/4) Epoch 14, batch 3800, loss[loss=0.2264, simple_loss=0.2606, pruned_loss=0.07281, ctc_loss=0.1164, over 10758.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.2877, pruned_loss=0.06961, ctc_loss=0.1219, over 3305032.99 frames. 
], batch size: 38, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:42:07,663 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=22.5 2023-10-09 11:42:09,448 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2746576.0, ans=0.125 2023-10-09 11:42:36,405 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2746669.3333333335, ans=0.1 2023-10-09 11:42:44,577 INFO [train.py:1031] (3/4) Epoch 14, batch 3850, loss[loss=0.2401, simple_loss=0.2623, pruned_loss=0.08003, ctc_loss=0.1447, over 16404.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2825, pruned_loss=0.06776, ctc_loss=0.1187, over 3304155.33 frames. ], batch size: 416, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 11:42:47,253 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.380e+02 3.112e+02 3.545e+02 3.960e+02 7.617e+02, threshold=7.089e+02, percent-clipped=1.0 2023-10-09 11:43:15,851 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=2746809.3333333335, ans=15.0 2023-10-09 11:43:22,014 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.25 vs. limit=15.0 2023-10-09 11:43:29,577 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2746856.0, ans=0.125 2023-10-09 11:43:46,339 INFO [train.py:1031] (3/4) Epoch 14, batch 3900, loss[loss=0.2254, simple_loss=0.2821, pruned_loss=0.06179, ctc_loss=0.1127, over 16781.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2809, pruned_loss=0.06532, ctc_loss=0.1147, over 3306287.70 frames. ], batch size: 309, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:43:54,533 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. limit=10.0 2023-10-09 11:43:55,349 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-10-09 11:43:56,151 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2746949.3333333335, ans=0.09899494936611666 2023-10-09 11:44:02,578 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2746996.0, ans=0.125 2023-10-09 11:44:31,452 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.83 vs. limit=15.0 2023-10-09 11:44:39,597 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2747136.0, ans=0.1 2023-10-09 11:44:47,987 INFO [train.py:1031] (3/4) Epoch 14, batch 3950, loss[loss=0.1979, simple_loss=0.2437, pruned_loss=0.0565, ctc_loss=0.0978, over 16688.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.2781, pruned_loss=0.06381, ctc_loss=0.112, over 3313781.65 frames. 
], batch size: 130, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:44:48,325 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2747182.6666666665, ans=0.1 2023-10-09 11:44:50,713 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+02 3.084e+02 3.430e+02 4.061e+02 1.180e+03, threshold=6.860e+02, percent-clipped=1.0 2023-10-09 11:45:11,901 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=15.0 2023-10-09 11:45:21,919 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2747276.0, ans=0.2 2023-10-09 11:45:24,959 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.63 vs. limit=10.0 2023-10-09 11:45:29,610 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2747322.6666666665, ans=0.0 2023-10-09 11:45:37,695 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2747369.3333333335, ans=0.125 2023-10-09 11:45:50,017 INFO [train.py:1031] (3/4) Epoch 14, batch 4000, loss[loss=0.2685, simple_loss=0.317, pruned_loss=0.08217, ctc_loss=0.139, over 16765.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2789, pruned_loss=0.06542, ctc_loss=0.1151, over 3316602.37 frames. ], batch size: 111, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:46:00,521 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2747416.0, ans=0.015 2023-10-09 11:46:01,640 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2747462.6666666665, ans=0.125 2023-10-09 11:46:19,982 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2747509.3333333335, ans=0.1 2023-10-09 11:46:20,289 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2747509.3333333335, ans=6.0 2023-10-09 11:46:33,672 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2747556.0, ans=0.125 2023-10-09 11:46:51,637 INFO [train.py:1031] (3/4) Epoch 14, batch 4050, loss[loss=0.2295, simple_loss=0.2803, pruned_loss=0.06647, ctc_loss=0.1142, over 16843.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.2838, pruned_loss=0.06768, ctc_loss=0.1187, over 3317333.94 frames. ], batch size: 141, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:46:54,422 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.332e+02 3.986e+02 4.544e+02 6.934e+02, threshold=7.972e+02, percent-clipped=1.0 2023-10-09 11:47:00,268 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.87 vs. 
limit=15.0 2023-10-09 11:47:03,699 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2747696.0, ans=0.125 2023-10-09 11:47:06,752 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2747696.0, ans=0.0 2023-10-09 11:47:34,755 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2747789.3333333335, ans=0.125 2023-10-09 11:47:47,139 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2747836.0, ans=0.125 2023-10-09 11:47:52,885 INFO [train.py:1031] (3/4) Epoch 14, batch 4100, loss[loss=0.2295, simple_loss=0.2707, pruned_loss=0.07054, ctc_loss=0.1181, over 16839.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2834, pruned_loss=0.06798, ctc_loss=0.1192, over 3324139.26 frames. ], batch size: 121, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:48:04,595 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2747929.3333333335, ans=0.1 2023-10-09 11:48:33,089 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.01 vs. limit=15.0 2023-10-09 11:48:54,163 INFO [train.py:1031] (3/4) Epoch 14, batch 4150, loss[loss=0.1799, simple_loss=0.2387, pruned_loss=0.04467, ctc_loss=0.07928, over 16747.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2821, pruned_loss=0.06625, ctc_loss=0.1155, over 3320749.24 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:48:57,302 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2748116.0, ans=0.125 2023-10-09 11:48:57,754 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.71 vs. limit=15.0 2023-10-09 11:48:57,968 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+02 3.212e+02 3.640e+02 4.119e+02 7.384e+02, threshold=7.280e+02, percent-clipped=0.0 2023-10-09 11:49:15,536 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=12.0 2023-10-09 11:49:19,040 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2748209.3333333335, ans=0.0 2023-10-09 11:49:22,120 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2748209.3333333335, ans=0.125 2023-10-09 11:49:22,154 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2748209.3333333335, ans=0.95 2023-10-09 11:49:30,172 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2748209.3333333335, ans=0.05 2023-10-09 11:49:56,388 INFO [train.py:1031] (3/4) Epoch 14, batch 4200, loss[loss=0.2187, simple_loss=0.256, pruned_loss=0.06768, ctc_loss=0.1149, over 11362.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2779, pruned_loss=0.06431, ctc_loss=0.1114, over 3307186.03 frames. 
], batch size: 35, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:49:58,054 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.88 vs. limit=15.0 2023-10-09 11:49:58,750 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2748349.3333333335, ans=0.0 2023-10-09 11:50:07,894 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2748396.0, ans=0.09899494936611666 2023-10-09 11:50:31,494 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2748489.3333333335, ans=0.0 2023-10-09 11:50:38,675 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2748489.3333333335, ans=0.125 2023-10-09 11:50:49,085 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2748536.0, ans=0.125 2023-10-09 11:50:56,838 INFO [train.py:1031] (3/4) Epoch 14, batch 4250, loss[loss=0.2443, simple_loss=0.2942, pruned_loss=0.07288, ctc_loss=0.1217, over 16861.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2789, pruned_loss=0.06533, ctc_loss=0.1131, over 3307426.85 frames. ], batch size: 141, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:51:03,482 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+02 3.251e+02 3.808e+02 4.615e+02 8.624e+02, threshold=7.616e+02, percent-clipped=2.0 2023-10-09 11:51:35,397 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2748722.6666666665, ans=0.125 2023-10-09 11:51:50,092 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2748769.3333333335, ans=0.2 2023-10-09 11:51:55,998 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2748769.3333333335, ans=0.0 2023-10-09 11:51:58,943 INFO [train.py:1031] (3/4) Epoch 14, batch 4300, loss[loss=0.2419, simple_loss=0.3202, pruned_loss=0.06022, ctc_loss=0.1078, over 16870.00 frames. ], tot_loss[loss=0.2318, simple_loss=0.2862, pruned_loss=0.06582, ctc_loss=0.1144, over 3313145.12 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:52:09,834 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2748816.0, ans=0.125 2023-10-09 11:52:22,164 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2748862.6666666665, ans=0.05 2023-10-09 11:52:28,338 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.17 vs. 
limit=22.5 2023-10-09 11:52:35,518 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2748909.3333333335, ans=0.125 2023-10-09 11:52:37,646 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2748956.0, ans=0.1 2023-10-09 11:52:57,366 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2749002.6666666665, ans=0.0 2023-10-09 11:52:57,408 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2749002.6666666665, ans=0.125 2023-10-09 11:53:04,756 INFO [train.py:1031] (3/4) Epoch 14, batch 4350, loss[loss=0.2747, simple_loss=0.3532, pruned_loss=0.07306, ctc_loss=0.1251, over 15358.00 frames. ], tot_loss[loss=0.2417, simple_loss=0.2948, pruned_loss=0.06996, ctc_loss=0.1215, over 3311185.41 frames. ], batch size: 527, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:53:11,361 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+02 3.477e+02 4.126e+02 5.175e+02 8.890e+02, threshold=8.251e+02, percent-clipped=2.0 2023-10-09 11:53:12,774 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2749049.3333333335, ans=0.0 2023-10-09 11:53:42,722 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2749189.3333333335, ans=0.0 2023-10-09 11:53:44,003 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2023-10-09 11:54:06,553 INFO [train.py:1031] (3/4) Epoch 14, batch 4400, loss[loss=0.2444, simple_loss=0.3104, pruned_loss=0.06626, ctc_loss=0.115, over 16474.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.2904, pruned_loss=0.06882, ctc_loss=0.1185, over 3315289.97 frames. ], batch size: 417, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:54:22,333 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2749329.3333333335, ans=0.125 2023-10-09 11:54:30,899 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2749376.0, ans=0.125 2023-10-09 11:54:32,561 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2749376.0, ans=0.2 2023-10-09 11:54:45,649 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2749422.6666666665, ans=0.0 2023-10-09 11:54:57,262 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.86 vs. limit=10.0 2023-10-09 11:55:02,791 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2749469.3333333335, ans=0.125 2023-10-09 11:55:08,713 INFO [train.py:1031] (3/4) Epoch 14, batch 4450, loss[loss=0.2211, simple_loss=0.2687, pruned_loss=0.06591, ctc_loss=0.1042, over 16809.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.2865, pruned_loss=0.067, ctc_loss=0.1154, over 3314034.54 frames. 
], batch size: 176, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:55:16,085 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.485e+02 3.136e+02 3.606e+02 4.303e+02 6.153e+02, threshold=7.211e+02, percent-clipped=0.0 2023-10-09 11:55:45,521 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2749656.0, ans=0.125 2023-10-09 11:55:46,528 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2749656.0, ans=0.0 2023-10-09 11:55:48,634 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2749656.0, ans=0.0 2023-10-09 11:56:10,349 INFO [train.py:1031] (3/4) Epoch 14, batch 4500, loss[loss=0.2319, simple_loss=0.2883, pruned_loss=0.06546, ctc_loss=0.1115, over 16963.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.2851, pruned_loss=0.06736, ctc_loss=0.116, over 3323173.67 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:56:37,447 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2023-10-09 11:56:57,027 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.39 vs. limit=15.0 2023-10-09 11:57:03,852 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2749936.0, ans=0.125 2023-10-09 11:57:09,968 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2749936.0, ans=0.125 2023-10-09 11:57:11,022 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=2749982.6666666665, ans=0.02 2023-10-09 11:57:12,367 INFO [train.py:1031] (3/4) Epoch 14, batch 4550, loss[loss=0.2118, simple_loss=0.2667, pruned_loss=0.05848, ctc_loss=0.09967, over 16728.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.289, pruned_loss=0.06687, ctc_loss=0.1155, over 3326894.36 frames. ], batch size: 121, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:57:15,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2749982.6666666665, ans=0.0 2023-10-09 11:57:20,843 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+02 3.133e+02 3.571e+02 4.090e+02 7.081e+02, threshold=7.142e+02, percent-clipped=0.0 2023-10-09 11:57:31,605 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2750029.3333333335, ans=0.0 2023-10-09 11:57:49,674 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=12.0 2023-10-09 11:57:59,648 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-10-09 11:58:14,436 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2750216.0, ans=0.125 2023-10-09 11:58:15,115 INFO [train.py:1031] (3/4) Epoch 14, batch 4600, loss[loss=0.2545, simple_loss=0.2973, pruned_loss=0.07913, ctc_loss=0.1334, over 16803.00 frames. 
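The optim.py:471 lines print five order statistics of recent gradient norms (min, 25%, median, 75%, max), a clipping threshold, and the percentage of recent steps that were clipped. Throughout this log the threshold is exactly Clipping_scale times the printed median (here 7.211e+02 = 2.0 * 3.606e+02), so a plausible reading is a median-based adaptive clip. A sketch under that assumption; the norm buffer below is a toy stand-in, not the optimizer's real bookkeeping:

import torch

CLIPPING_SCALE = 2.0  # matches the logged Clipping_scale

def clip_threshold(recent_norms: torch.Tensor, new_norm: float):
    """Return (threshold, clipped) for the current step, with
    threshold = CLIPPING_SCALE * median of recent gradient norms --
    the relationship that holds for every optim.py line in this log."""
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = CLIPPING_SCALE * q[2].item()  # 2.0 * median
    return threshold, new_norm > threshold

# Toy buffer whose quartiles match the 11:55:16 line above:
norms = torch.tensor([248.5, 313.6, 360.6, 430.3, 615.3])
thr, clipped = clip_threshold(norms, new_norm=500.0)
print(f"threshold={thr:.1f}")  # ~721.2, vs. the logged 7.211e+02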
], tot_loss[loss=0.2378, simple_loss=0.2934, pruned_loss=0.06765, ctc_loss=0.1173, over 3322543.39 frames. ], batch size: 140, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 11:58:23,304 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2750216.0, ans=0.125 2023-10-09 11:58:34,607 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2750262.6666666665, ans=0.125 2023-10-09 11:59:11,338 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0 2023-10-09 11:59:18,163 INFO [train.py:1031] (3/4) Epoch 14, batch 4650, loss[loss=0.1975, simple_loss=0.2784, pruned_loss=0.04221, ctc_loss=0.08027, over 16815.00 frames. ], tot_loss[loss=0.2392, simple_loss=0.2957, pruned_loss=0.06785, ctc_loss=0.1177, over 3320066.98 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:59:19,988 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2750449.3333333335, ans=0.125 2023-10-09 11:59:26,031 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2750449.3333333335, ans=0.125 2023-10-09 11:59:28,967 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.652e+02 3.249e+02 3.763e+02 4.381e+02 6.611e+02, threshold=7.526e+02, percent-clipped=0.0 2023-10-09 11:59:34,700 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2750496.0, ans=0.125 2023-10-09 12:00:17,144 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2750636.0, ans=0.0 2023-10-09 12:00:19,218 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2750682.6666666665, ans=0.1 2023-10-09 12:00:19,912 INFO [train.py:1031] (3/4) Epoch 14, batch 4700, loss[loss=0.251, simple_loss=0.2977, pruned_loss=0.0761, ctc_loss=0.1303, over 16831.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.2931, pruned_loss=0.06429, ctc_loss=0.1127, over 3317780.91 frames. ], batch size: 95, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:00:21,327 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:00:23,959 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2750682.6666666665, ans=0.125 2023-10-09 12:01:00,972 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2750822.6666666665, ans=0.0 2023-10-09 12:01:10,626 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2750869.3333333335, ans=0.07 2023-10-09 12:01:16,340 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2023-10-09 12:01:22,368 INFO [train.py:1031] (3/4) Epoch 14, batch 4750, loss[loss=0.1897, simple_loss=0.2463, pruned_loss=0.0497, ctc_loss=0.08408, over 16603.00 frames. ], tot_loss[loss=0.2339, simple_loss=0.2936, pruned_loss=0.06447, ctc_loss=0.1132, over 3324805.64 frames. 
], batch size: 110, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 12:01:31,299 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2750916.0, ans=0.1 2023-10-09 12:01:32,393 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2750916.0, ans=0.0 2023-10-09 12:01:35,189 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.335e+02 3.139e+02 3.749e+02 4.382e+02 2.421e+03, threshold=7.497e+02, percent-clipped=2.0 2023-10-09 12:01:37,277 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2750962.6666666665, ans=0.0 2023-10-09 12:01:50,841 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2751009.3333333335, ans=0.125 2023-10-09 12:01:54,587 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2751009.3333333335, ans=0.0 2023-10-09 12:01:55,638 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2751009.3333333335, ans=0.125 2023-10-09 12:02:24,706 INFO [train.py:1031] (3/4) Epoch 14, batch 4800, loss[loss=0.2934, simple_loss=0.3372, pruned_loss=0.09395, ctc_loss=0.154, over 16744.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.294, pruned_loss=0.06331, ctc_loss=0.1115, over 3323685.65 frames. ], batch size: 328, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:02:34,161 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2023-10-09 12:02:37,306 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2751196.0, ans=0.125 2023-10-09 12:02:43,115 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2751196.0, ans=0.0 2023-10-09 12:02:49,084 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2751242.6666666665, ans=0.0 2023-10-09 12:02:57,143 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2751242.6666666665, ans=0.0 2023-10-09 12:03:26,786 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2751336.0, ans=0.2 2023-10-09 12:03:28,562 INFO [train.py:1031] (3/4) Epoch 14, batch 4850, loss[loss=0.2662, simple_loss=0.3283, pruned_loss=0.07452, ctc_loss=0.1378, over 16912.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.2968, pruned_loss=0.06614, ctc_loss=0.1161, over 3302406.46 frames. ], batch size: 258, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:03:29,144 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.66 vs. 
limit=15.0 2023-10-09 12:03:31,122 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2751382.6666666665, ans=0.0 2023-10-09 12:03:31,152 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2751382.6666666665, ans=0.0 2023-10-09 12:03:39,865 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2751429.3333333335, ans=0.0 2023-10-09 12:03:42,635 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.280e+02 3.688e+02 4.479e+02 9.310e+02, threshold=7.375e+02, percent-clipped=2.0 2023-10-09 12:03:49,784 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=22.5 2023-10-09 12:03:54,984 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2751476.0, ans=0.2 2023-10-09 12:03:58,448 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=22.5 2023-10-09 12:04:15,317 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2751522.6666666665, ans=0.125 2023-10-09 12:04:31,382 INFO [train.py:1031] (3/4) Epoch 14, batch 4900, loss[loss=0.2172, simple_loss=0.2877, pruned_loss=0.05454, ctc_loss=0.0938, over 16841.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.2971, pruned_loss=0.06503, ctc_loss=0.1144, over 3299799.69 frames. ], batch size: 242, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:04:33,555 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2751616.0, ans=0.2 2023-10-09 12:04:38,236 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2023-10-09 12:04:58,197 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2023-10-09 12:05:36,550 INFO [train.py:1031] (3/4) Epoch 14, batch 4950, loss[loss=0.2401, simple_loss=0.2974, pruned_loss=0.06829, ctc_loss=0.1157, over 16712.00 frames. ], tot_loss[loss=0.2404, simple_loss=0.2994, pruned_loss=0.06717, ctc_loss=0.1177, over 3295704.32 frames. ], batch size: 272, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:05:40,600 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2751849.3333333335, ans=0.125 2023-10-09 12:05:49,648 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.69 vs. 
limit=15.0 2023-10-09 12:05:51,600 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.251e+02 3.637e+02 4.222e+02 8.685e+02, threshold=7.275e+02, percent-clipped=2.0 2023-10-09 12:06:05,588 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2751942.6666666665, ans=0.2 2023-10-09 12:06:17,322 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2751989.3333333335, ans=0.125 2023-10-09 12:06:23,058 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.59 vs. limit=10.0 2023-10-09 12:06:25,899 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.17 vs. limit=15.0 2023-10-09 12:06:33,870 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.89 vs. limit=15.0 2023-10-09 12:06:39,357 INFO [train.py:1031] (3/4) Epoch 14, batch 5000, loss[loss=0.222, simple_loss=0.2718, pruned_loss=0.0639, ctc_loss=0.1113, over 16809.00 frames. ], tot_loss[loss=0.2397, simple_loss=0.295, pruned_loss=0.06832, ctc_loss=0.1192, over 3300990.29 frames. ], batch size: 273, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:06:57,836 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.06 vs. limit=12.0 2023-10-09 12:07:06,334 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2752176.0, ans=0.07 2023-10-09 12:07:16,945 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2752222.6666666665, ans=0.0 2023-10-09 12:07:27,620 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2752269.3333333335, ans=0.2 2023-10-09 12:07:32,869 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2752269.3333333335, ans=0.2 2023-10-09 12:07:37,430 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2752269.3333333335, ans=0.0 2023-10-09 12:07:41,578 INFO [train.py:1031] (3/4) Epoch 14, batch 5050, loss[loss=0.2018, simple_loss=0.305, pruned_loss=0.03566, ctc_loss=0.06849, over 16183.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.2889, pruned_loss=0.06653, ctc_loss=0.1161, over 3303780.52 frames. ], batch size: 463, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:07:49,482 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2752316.0, ans=10.0 2023-10-09 12:07:56,489 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.463e+02 3.308e+02 3.761e+02 4.513e+02 1.207e+03, threshold=7.522e+02, percent-clipped=1.0 2023-10-09 12:08:00,988 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2752362.6666666665, ans=0.0 2023-10-09 12:08:07,048 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.23 vs. 
limit=15.0 2023-10-09 12:08:22,378 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2752456.0, ans=10.0 2023-10-09 12:08:30,501 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2752502.6666666665, ans=0.125 2023-10-09 12:08:32,236 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2752502.6666666665, ans=0.07 2023-10-09 12:08:33,363 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2752502.6666666665, ans=0.0 2023-10-09 12:08:42,501 INFO [train.py:1031] (3/4) Epoch 14, batch 5100, loss[loss=0.2838, simple_loss=0.3212, pruned_loss=0.0932, ctc_loss=0.15, over 16697.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2926, pruned_loss=0.066, ctc_loss=0.1154, over 3299869.24 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:09:21,283 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2752689.3333333335, ans=0.125 2023-10-09 12:09:35,636 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2752736.0, ans=0.0 2023-10-09 12:09:43,374 INFO [train.py:1031] (3/4) Epoch 14, batch 5150, loss[loss=0.221, simple_loss=0.2756, pruned_loss=0.06079, ctc_loss=0.112, over 16955.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.2919, pruned_loss=0.0671, ctc_loss=0.1171, over 3302625.51 frames. ], batch size: 90, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:09:44,794 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2752782.6666666665, ans=0.125 2023-10-09 12:09:55,744 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2023-10-09 12:10:00,362 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.450e+02 3.267e+02 3.761e+02 4.649e+02 7.424e+02, threshold=7.522e+02, percent-clipped=0.0 2023-10-09 12:10:13,032 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2752876.0, ans=0.1 2023-10-09 12:10:24,002 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2752922.6666666665, ans=0.125 2023-10-09 12:10:35,183 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2752969.3333333335, ans=0.0 2023-10-09 12:10:45,163 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2023-10-09 12:10:45,589 INFO [train.py:1031] (3/4) Epoch 14, batch 5200, loss[loss=0.2261, simple_loss=0.292, pruned_loss=0.05813, ctc_loss=0.11, over 16464.00 frames. ], tot_loss[loss=0.2344, simple_loss=0.2907, pruned_loss=0.066, ctc_loss=0.1153, over 3310754.09 frames. 
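The scaling.py:199 lines, by far the most common in this log, print the current value (ans) of a named ScheduledFloat hyperparameter, a scalar that varies with batch_count; at this depth of training the skip rates have decayed to 0.0 and the balancer probabilities sit at 0.125. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are hypothetical, chosen only to match the logged behavior:

from bisect import bisect_right

class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count: a sketch of the kind of
    object whose current value the scaling.py:199 lines print."""

    def __init__(self, *points: tuple):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Hypothetical skip-rate schedule: 0.1 at the start, 0.0 after 1e6 batches.
conv_skip_rate = ScheduledFloatSketch((0.0, 0.1), (1.0e6, 0.0))
print(conv_skip_rate.value(2752456.0))  # -> 0.0, as in the lines above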
], batch size: 416, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:10:51,451 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2753016.0, ans=0.0 2023-10-09 12:11:04,621 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0 2023-10-09 12:11:06,904 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2023-10-09 12:11:14,516 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0 2023-10-09 12:11:19,088 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2753109.3333333335, ans=0.1 2023-10-09 12:11:26,824 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2753156.0, ans=0.2 2023-10-09 12:11:30,007 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2753156.0, ans=0.2 2023-10-09 12:11:30,076 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2753156.0, ans=0.125 2023-10-09 12:11:47,488 INFO [train.py:1031] (3/4) Epoch 14, batch 5250, loss[loss=0.227, simple_loss=0.2661, pruned_loss=0.07063, ctc_loss=0.1164, over 16613.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2847, pruned_loss=0.0654, ctc_loss=0.1141, over 3300890.32 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:11:51,221 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2753249.3333333335, ans=0.125 2023-10-09 12:12:05,605 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+02 2.914e+02 3.261e+02 3.772e+02 6.960e+02, threshold=6.522e+02, percent-clipped=0.0 2023-10-09 12:12:09,421 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2753296.0, ans=0.0 2023-10-09 12:12:10,690 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2753296.0, ans=0.125 2023-10-09 12:12:12,755 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2753342.6666666665, ans=0.0 2023-10-09 12:12:12,827 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2753342.6666666665, ans=0.125 2023-10-09 12:12:19,968 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2753342.6666666665, ans=0.125 2023-10-09 12:12:29,941 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2753389.3333333335, ans=0.125 2023-10-09 12:12:37,896 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=22.5 2023-10-09 12:12:49,309 INFO [train.py:1031] (3/4) Epoch 14, batch 5300, loss[loss=0.24, simple_loss=0.2977, pruned_loss=0.06817, ctc_loss=0.115, over 16779.00 frames. 
], tot_loss[loss=0.2378, simple_loss=0.2918, pruned_loss=0.06817, ctc_loss=0.1186, over 3294455.66 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:12:54,386 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2023-10-09 12:13:10,095 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2753529.3333333335, ans=0.0 2023-10-09 12:13:12,373 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2753529.3333333335, ans=0.0 2023-10-09 12:13:17,912 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2753576.0, ans=0.125 2023-10-09 12:13:23,325 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2753576.0, ans=0.125 2023-10-09 12:13:24,356 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2753576.0, ans=0.125 2023-10-09 12:13:26,512 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2753622.6666666665, ans=10.0 2023-10-09 12:13:35,298 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2753622.6666666665, ans=0.1 2023-10-09 12:13:38,072 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2753622.6666666665, ans=0.0 2023-10-09 12:13:39,175 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2753669.3333333335, ans=0.0 2023-10-09 12:13:46,713 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2753669.3333333335, ans=0.125 2023-10-09 12:13:51,898 INFO [train.py:1031] (3/4) Epoch 14, batch 5350, loss[loss=0.2519, simple_loss=0.3195, pruned_loss=0.06816, ctc_loss=0.1201, over 16785.00 frames. ], tot_loss[loss=0.2472, simple_loss=0.3021, pruned_loss=0.07133, ctc_loss=0.1241, over 3294340.66 frames. ], batch size: 95, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:14:10,571 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2753762.6666666665, ans=0.07 2023-10-09 12:14:12,278 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.795e+02 3.646e+02 4.307e+02 5.553e+02 1.031e+03, threshold=8.614e+02, percent-clipped=13.0 2023-10-09 12:14:46,028 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2753902.6666666665, ans=0.125 2023-10-09 12:14:46,125 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2753902.6666666665, ans=0.1 2023-10-09 12:14:54,878 INFO [train.py:1031] (3/4) Epoch 14, batch 5400, loss[loss=0.2786, simple_loss=0.3085, pruned_loss=0.09223, ctc_loss=0.1605, over 16631.00 frames. ], tot_loss[loss=0.2485, simple_loss=0.3028, pruned_loss=0.072, ctc_loss=0.1257, over 3297958.69 frames. 
], batch size: 384, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:15:02,216 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2753949.3333333335, ans=0.125 2023-10-09 12:15:17,861 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2754042.6666666665, ans=0.125 2023-10-09 12:15:19,408 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2754042.6666666665, ans=0.125 2023-10-09 12:15:20,517 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2754042.6666666665, ans=0.5 2023-10-09 12:15:39,669 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2754089.3333333335, ans=0.0 2023-10-09 12:15:47,744 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2754136.0, ans=0.05 2023-10-09 12:15:55,502 INFO [train.py:1031] (3/4) Epoch 14, batch 5450, loss[loss=0.1882, simple_loss=0.2376, pruned_loss=0.05119, ctc_loss=0.09075, over 16847.00 frames. ], tot_loss[loss=0.2425, simple_loss=0.2945, pruned_loss=0.07054, ctc_loss=0.1233, over 3291012.21 frames. ], batch size: 121, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:15:56,845 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2754182.6666666665, ans=0.125 2023-10-09 12:16:03,480 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2754182.6666666665, ans=0.0 2023-10-09 12:16:10,215 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-10-09 12:16:16,183 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+02 3.013e+02 3.420e+02 3.920e+02 8.304e+02, threshold=6.840e+02, percent-clipped=0.0 2023-10-09 12:16:41,685 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2754322.6666666665, ans=0.125 2023-10-09 12:16:47,783 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2754369.3333333335, ans=0.2 2023-10-09 12:16:54,738 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2754369.3333333335, ans=0.2 2023-10-09 12:16:55,882 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2754369.3333333335, ans=0.0 2023-10-09 12:16:57,621 INFO [train.py:1031] (3/4) Epoch 14, batch 5500, loss[loss=0.2386, simple_loss=0.2876, pruned_loss=0.06904, ctc_loss=0.129, over 16199.00 frames. ], tot_loss[loss=0.2404, simple_loss=0.2906, pruned_loss=0.07045, ctc_loss=0.1232, over 3288942.89 frames. 
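The scaling.py:979 lines track how close each layer's activations are to being white, logged as "metric=... vs. limit=...", where the limit is itself schedule-controlled. One standard whiteness measure with the right behavior is mean(eig^2) / mean(eig)^2 over the eigenvalues of the feature covariance: it equals 1.0 for an isotropic (white) covariance and grows as variance concentrates in a few directions. A sketch under that assumption, which may differ in detail from what scaling.py actually computes:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Returns mean(eig^2) / mean(eig)^2 of
    the per-group feature covariance: 1.0 when features are white, larger
    otherwise. Computed as C * trace(cov @ cov) / trace(cov)**2 per group,
    so no explicit eigendecomposition is needed."""
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (g, n, c/g)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                      # (g, c/g, c/g)
    cg = cov.shape[-1]
    num = (cov * cov).sum(dim=(1, 2))                    # trace(cov @ cov), cov symmetric
    den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2   # trace(cov) ** 2
    return (cg * num / den).mean().item()

# White noise scores ~1.0; strongly correlated channels score much higher,
# which is what lines like "metric=10.55 vs. limit=15.0" are watching.
print(whitening_metric(torch.randn(10000, 512)))  # ~1.05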
], batch size: 463, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:17:28,529 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2754509.3333333335, ans=0.2 2023-10-09 12:17:30,509 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2754509.3333333335, ans=0.0 2023-10-09 12:17:58,544 INFO [train.py:1031] (3/4) Epoch 14, batch 5550, loss[loss=0.2205, simple_loss=0.2874, pruned_loss=0.05715, ctc_loss=0.09838, over 16891.00 frames. ], tot_loss[loss=0.2421, simple_loss=0.2932, pruned_loss=0.07074, ctc_loss=0.1239, over 3297556.64 frames. ], batch size: 215, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:18:18,622 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.568e+02 3.038e+02 3.521e+02 4.365e+02 6.662e+02, threshold=7.043e+02, percent-clipped=0.0 2023-10-09 12:18:20,571 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2754696.0, ans=0.2 2023-10-09 12:18:26,910 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2754742.6666666665, ans=0.125 2023-10-09 12:18:30,225 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2754742.6666666665, ans=0.125 2023-10-09 12:18:46,354 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.41 vs. limit=6.0 2023-10-09 12:18:48,234 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2754836.0, ans=0.125 2023-10-09 12:18:52,505 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2754836.0, ans=0.1 2023-10-09 12:18:58,079 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2754836.0, ans=0.125 2023-10-09 12:18:58,965 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2754882.6666666665, ans=0.1 2023-10-09 12:18:59,765 INFO [train.py:1031] (3/4) Epoch 14, batch 5600, loss[loss=0.2281, simple_loss=0.3013, pruned_loss=0.05807, ctc_loss=0.09668, over 16866.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.2912, pruned_loss=0.06812, ctc_loss=0.1197, over 3300595.41 frames. ], batch size: 95, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 12:19:00,053 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2754882.6666666665, ans=0.125 2023-10-09 12:19:00,506 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2023-10-09 12:19:16,248 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.11 vs. 
limit=15.0 2023-10-09 12:19:26,677 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2754976.0, ans=0.125 2023-10-09 12:19:30,419 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2754976.0, ans=0.04949747468305833 2023-10-09 12:19:31,376 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2754976.0, ans=0.1 2023-10-09 12:19:34,589 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.93 vs. limit=15.0 2023-10-09 12:19:36,834 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2755022.6666666665, ans=0.09899494936611666 2023-10-09 12:19:53,654 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2755069.3333333335, ans=0.125 2023-10-09 12:20:00,859 INFO [train.py:1031] (3/4) Epoch 14, batch 5650, loss[loss=0.2103, simple_loss=0.2705, pruned_loss=0.05486, ctc_loss=0.101, over 16927.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.2901, pruned_loss=0.06786, ctc_loss=0.1191, over 3308903.63 frames. ], batch size: 215, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:20:22,587 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+02 3.073e+02 3.464e+02 4.035e+02 6.010e+02, threshold=6.928e+02, percent-clipped=0.0 2023-10-09 12:20:27,613 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2755209.3333333335, ans=0.125 2023-10-09 12:20:30,208 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2755209.3333333335, ans=0.09899494936611666 2023-10-09 12:20:31,636 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=12.0 2023-10-09 12:20:43,642 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2755256.0, ans=0.2 2023-10-09 12:20:44,608 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2755256.0, ans=0.0 2023-10-09 12:20:53,182 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2755302.6666666665, ans=0.1 2023-10-09 12:20:58,543 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2755302.6666666665, ans=10.0 2023-10-09 12:21:01,266 INFO [train.py:1031] (3/4) Epoch 14, batch 5700, loss[loss=0.1806, simple_loss=0.2447, pruned_loss=0.04314, ctc_loss=0.07559, over 16681.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2869, pruned_loss=0.06721, ctc_loss=0.118, over 3312794.00 frames. 
], batch size: 151, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:21:09,776 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2755349.3333333335, ans=0.0 2023-10-09 12:21:25,214 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2755396.0, ans=0.125 2023-10-09 12:21:28,875 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2755442.6666666665, ans=0.1 2023-10-09 12:21:37,033 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2755442.6666666665, ans=0.0 2023-10-09 12:22:00,182 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=2755536.0, ans=12.0 2023-10-09 12:22:01,196 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2023-10-09 12:22:04,416 INFO [train.py:1031] (3/4) Epoch 14, batch 5750, loss[loss=0.2444, simple_loss=0.3064, pruned_loss=0.06698, ctc_loss=0.1211, over 16884.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2835, pruned_loss=0.06484, ctc_loss=0.1141, over 3304205.60 frames. ], batch size: 310, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:22:16,668 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2755629.3333333335, ans=0.95 2023-10-09 12:22:28,327 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+02 3.061e+02 3.546e+02 4.294e+02 7.342e+02, threshold=7.092e+02, percent-clipped=2.0 2023-10-09 12:22:48,667 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=22.5 2023-10-09 12:22:58,797 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2755769.3333333335, ans=0.0 2023-10-09 12:23:07,365 INFO [train.py:1031] (3/4) Epoch 14, batch 5800, loss[loss=0.2461, simple_loss=0.2843, pruned_loss=0.07737, ctc_loss=0.1327, over 16665.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.2872, pruned_loss=0.06724, ctc_loss=0.1184, over 3301787.62 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:23:08,638 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2755816.0, ans=0.2 2023-10-09 12:23:08,654 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2755816.0, ans=0.1 2023-10-09 12:23:21,606 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2755862.6666666665, ans=0.2 2023-10-09 12:23:25,146 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.78 vs. 
limit=10.0 2023-10-09 12:23:25,240 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2755862.6666666665, ans=15.0 2023-10-09 12:23:29,348 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2755909.3333333335, ans=0.0 2023-10-09 12:23:44,082 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0 2023-10-09 12:23:48,534 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2755956.0, ans=0.0 2023-10-09 12:23:51,260 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2755956.0, ans=0.2 2023-10-09 12:23:58,826 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2023-10-09 12:24:04,583 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2756002.6666666665, ans=0.04949747468305833 2023-10-09 12:24:06,259 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.07 vs. limit=22.5 2023-10-09 12:24:06,438 INFO [train.py:1031] (3/4) Epoch 14, batch 5850, loss[loss=0.2153, simple_loss=0.2581, pruned_loss=0.06494, ctc_loss=0.1064, over 16883.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2827, pruned_loss=0.06645, ctc_loss=0.1169, over 3309969.11 frames. ], batch size: 78, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:24:27,541 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2756096.0, ans=0.0 2023-10-09 12:24:31,818 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 3.138e+02 3.574e+02 4.171e+02 9.183e+02, threshold=7.147e+02, percent-clipped=2.0 2023-10-09 12:24:48,754 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2756189.3333333335, ans=0.125 2023-10-09 12:24:53,574 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2756236.0, ans=0.125 2023-10-09 12:25:05,812 INFO [train.py:1031] (3/4) Epoch 14, batch 5900, loss[loss=0.269, simple_loss=0.2863, pruned_loss=0.09354, ctc_loss=0.1614, over 16784.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2765, pruned_loss=0.06562, ctc_loss=0.1155, over 3310045.74 frames. 
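The grad_scale value in the batch headers bounces between 1.0, 2.0, 4.0 and 8.0 across this stretch, the signature of dynamic loss scaling in mixed-precision training: the scale is cut when a step overflows and grown back after a run of clean steps. A minimal sketch of that loop with PyTorch's stock GradScaler; model, optimizer and batch are placeholders, and the real training step surely differs in detail:

import torch

scaler = torch.cuda.amp.GradScaler()  # dynamic loss scale

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # fp16 forward pass
        loss = model(batch)
    scaler.scale(loss).backward()      # scale up before backward
    scaler.step(optimizer)             # unscale; skip the step on inf/nan
    scaler.update()                    # halve on overflow, grow otherwise
    return scaler.get_scale()          # a value like the logged grad_scale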
], batch size: 384, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:25:20,020 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2756329.3333333335, ans=0.125 2023-10-09 12:25:36,862 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2756376.0, ans=0.1 2023-10-09 12:25:55,646 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2756469.3333333335, ans=0.1 2023-10-09 12:26:06,036 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2756516.0, ans=0.0 2023-10-09 12:26:06,822 INFO [train.py:1031] (3/4) Epoch 14, batch 5950, loss[loss=0.1785, simple_loss=0.2247, pruned_loss=0.05143, ctc_loss=0.07338, over 10641.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2765, pruned_loss=0.06612, ctc_loss=0.1161, over 3302600.34 frames. ], batch size: 35, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:26:15,144 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2756516.0, ans=0.0 2023-10-09 12:26:31,791 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+02 3.192e+02 3.462e+02 4.093e+02 6.652e+02, threshold=6.925e+02, percent-clipped=0.0 2023-10-09 12:26:43,035 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2756656.0, ans=0.125 2023-10-09 12:27:00,255 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2756702.6666666665, ans=0.125 2023-10-09 12:27:00,484 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-10-09 12:27:06,848 INFO [train.py:1031] (3/4) Epoch 14, batch 6000, loss[loss=0.1875, simple_loss=0.2654, pruned_loss=0.0403, ctc_loss=0.07266, over 16892.00 frames. ], tot_loss[loss=0.2298, simple_loss=0.2827, pruned_loss=0.06544, ctc_loss=0.1153, over 3310412.32 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 12:27:06,848 INFO [train.py:1054] (3/4) Computing validation loss 2023-10-09 12:27:23,520 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2297, simple_loss=0.3012, pruned_loss=0.0607, ctc_loss=0.09172, over 1796401.00 frames. 2023-10-09 12:27:23,521 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB 2023-10-09 12:27:24,236 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0 2023-10-09 12:27:32,560 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2756749.3333333335, ans=0.0 2023-10-09 12:27:47,790 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.29 vs. 
limit=15.0 2023-10-09 12:28:09,944 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2756889.3333333335, ans=0.0 2023-10-09 12:28:15,348 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=2756936.0, ans=0.02 2023-10-09 12:28:23,965 INFO [train.py:1031] (3/4) Epoch 14, batch 6050, loss[loss=0.1833, simple_loss=0.2422, pruned_loss=0.04602, ctc_loss=0.08074, over 16778.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.2762, pruned_loss=0.06179, ctc_loss=0.1091, over 3313490.08 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:28:41,579 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2757029.3333333335, ans=0.125 2023-10-09 12:28:45,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2757029.3333333335, ans=0.0 2023-10-09 12:28:52,473 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.832e+02 3.391e+02 4.153e+02 6.756e+02, threshold=6.782e+02, percent-clipped=0.0 2023-10-09 12:28:55,041 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2757076.0, ans=0.125 2023-10-09 12:29:04,236 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:29:04,266 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2757122.6666666665, ans=0.0 2023-10-09 12:29:12,187 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2757169.3333333335, ans=0.0 2023-10-09 12:29:24,168 INFO [train.py:1031] (3/4) Epoch 14, batch 6100, loss[loss=0.2544, simple_loss=0.3189, pruned_loss=0.06853, ctc_loss=0.1318, over 16744.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2727, pruned_loss=0.06188, ctc_loss=0.1093, over 3312554.70 frames. 
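Just above, training pauses for a validation pass: train.py logs "Computing validation loss", reports the result frame-weighted over the full validation set (1796401.00 frames), and prints the peak GPU memory (14573MB). A sketch of that pattern assuming a standard eval loop; compute_loss and valid_dl are placeholder names, not the script's actual API:

import torch

def validate(model, valid_dl, device) -> float:
    """One pass over the validation set with a frame-weighted loss,
    in the spirit of the 'Computing validation loss' block above."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = compute_loss(model, batch, device)  # placeholder
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.4f}, "
          f"max memory so far: {mem_mb}MB")
    return tot_loss / tot_frames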
], batch size: 328, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:29:34,759 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2757216.0, ans=0.2 2023-10-09 12:29:51,921 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2757309.3333333335, ans=0.125 2023-10-09 12:29:53,990 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2757309.3333333335, ans=0.1 2023-10-09 12:30:00,442 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2757356.0, ans=0.125 2023-10-09 12:30:01,546 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2757356.0, ans=0.0 2023-10-09 12:30:01,650 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2757356.0, ans=0.0 2023-10-09 12:30:05,317 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2757356.0, ans=0.125 2023-10-09 12:30:16,420 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2757402.6666666665, ans=0.0 2023-10-09 12:30:20,114 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2757402.6666666665, ans=0.1 2023-10-09 12:30:22,274 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2757402.6666666665, ans=0.125 2023-10-09 12:30:22,362 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2757402.6666666665, ans=0.125 2023-10-09 12:30:25,861 INFO [train.py:1031] (3/4) Epoch 14, batch 6150, loss[loss=0.2431, simple_loss=0.3056, pruned_loss=0.06489, ctc_loss=0.1268, over 16635.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2775, pruned_loss=0.06056, ctc_loss=0.1077, over 3304027.51 frames. ], batch size: 351, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:30:29,048 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2757449.3333333335, ans=0.0 2023-10-09 12:30:45,352 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.99 vs. limit=22.5 2023-10-09 12:30:56,530 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+02 2.946e+02 3.372e+02 3.986e+02 9.785e+02, threshold=6.744e+02, percent-clipped=2.0 2023-10-09 12:31:26,111 INFO [train.py:1031] (3/4) Epoch 14, batch 6200, loss[loss=0.2406, simple_loss=0.2858, pruned_loss=0.07191, ctc_loss=0.1288, over 16853.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2797, pruned_loss=0.06196, ctc_loss=0.1102, over 3309396.58 frames. 
], batch size: 328, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:31:29,214 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2757682.6666666665, ans=0.125 2023-10-09 12:31:33,008 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2757682.6666666665, ans=0.05 2023-10-09 12:31:40,482 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-10-09 12:31:53,339 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2757776.0, ans=0.1 2023-10-09 12:32:01,998 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2757822.6666666665, ans=0.125 2023-10-09 12:32:03,700 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2757822.6666666665, ans=0.0 2023-10-09 12:32:04,809 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2757822.6666666665, ans=0.125 2023-10-09 12:32:24,171 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2757869.3333333335, ans=0.1 2023-10-09 12:32:26,032 INFO [train.py:1031] (3/4) Epoch 14, batch 6250, loss[loss=0.2053, simple_loss=0.2682, pruned_loss=0.05281, ctc_loss=0.09226, over 16739.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2791, pruned_loss=0.06106, ctc_loss=0.1083, over 3313392.15 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:32:27,559 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2757916.0, ans=0.125 2023-10-09 12:32:43,486 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2757962.6666666665, ans=0.5 2023-10-09 12:32:43,886 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0 2023-10-09 12:32:49,209 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-10-09 12:32:55,892 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+02 2.973e+02 3.416e+02 4.008e+02 8.801e+02, threshold=6.831e+02, percent-clipped=1.0 2023-10-09 12:33:22,332 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2758102.6666666665, ans=0.0 2023-10-09 12:33:26,295 INFO [train.py:1031] (3/4) Epoch 14, batch 6300, loss[loss=0.212, simple_loss=0.2652, pruned_loss=0.05925, ctc_loss=0.1008, over 16680.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2778, pruned_loss=0.05723, ctc_loss=0.1017, over 3309996.27 frames. 
], batch size: 130, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:33:33,314 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2758149.3333333335, ans=0.0 2023-10-09 12:33:33,579 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=22.5 2023-10-09 12:33:57,839 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=22.5 2023-10-09 12:34:02,501 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2758242.6666666665, ans=0.125 2023-10-09 12:34:11,476 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2758289.3333333335, ans=0.125 2023-10-09 12:34:16,286 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2758336.0, ans=0.0 2023-10-09 12:34:19,887 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2023-10-09 12:34:24,618 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2758336.0, ans=0.0 2023-10-09 12:34:25,760 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2758336.0, ans=0.0 2023-10-09 12:34:28,496 INFO [train.py:1031] (3/4) Epoch 14, batch 6350, loss[loss=0.2535, simple_loss=0.318, pruned_loss=0.07002, ctc_loss=0.1221, over 15136.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2779, pruned_loss=0.0578, ctc_loss=0.1026, over 3312647.45 frames. ], batch size: 527, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:34:39,803 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:34:50,806 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2758429.3333333335, ans=0.2 2023-10-09 12:35:00,733 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 3.028e+02 3.570e+02 4.973e+02 1.101e+03, threshold=7.141e+02, percent-clipped=8.0 2023-10-09 12:35:11,259 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2758522.6666666665, ans=0.125 2023-10-09 12:35:32,010 INFO [train.py:1031] (3/4) Epoch 14, batch 6400, loss[loss=0.2365, simple_loss=0.3093, pruned_loss=0.06163, ctc_loss=0.1012, over 16798.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2856, pruned_loss=0.06003, ctc_loss=0.1065, over 3310655.87 frames. 
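Note that the frame counts under tot_loss drift rather than grow monotonically and are fractional (e.g. "over 3310655.87 frames"), which a plain cumulative counter could not produce; one way such numbers arise is an exponentially decayed, frame-weighted running sum. A sketch of that interpretation only; the decay constant and update rule are assumptions:

class RunningLoss:
    """Frame-weighted running average with exponential forgetting: a sketch
    of one way the fractional 'over N frames' totals could arise."""

    def __init__(self, decay: float = 0.999):  # assumed per-batch decay
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames  # stays fractional
        return self.loss_sum / self.frames  # the reported tot_loss

tracker = RunningLoss()
for loss, frames in [(0.24, 4500), (0.22, 5200), (0.23, 4800)]:
    tot = tracker.update(loss, frames)
print(f"tot_loss={tot:.4f}, over {tracker.frames:.2f} frames")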
], batch size: 201, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:35:34,237 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2758616.0, ans=0.0 2023-10-09 12:35:51,816 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2758662.6666666665, ans=0.0 2023-10-09 12:36:09,388 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2758756.0, ans=0.1 2023-10-09 12:36:28,462 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2758802.6666666665, ans=0.125 2023-10-09 12:36:29,466 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2758802.6666666665, ans=0.1 2023-10-09 12:36:33,708 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:36:34,417 INFO [train.py:1031] (3/4) Epoch 14, batch 6450, loss[loss=0.2578, simple_loss=0.3143, pruned_loss=0.07403, ctc_loss=0.1331, over 16878.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2969, pruned_loss=0.06324, ctc_loss=0.112, over 3304744.31 frames. ], batch size: 242, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:36:46,442 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:36:51,270 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2758896.0, ans=0.09899494936611666 2023-10-09 12:36:51,296 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2758896.0, ans=0.0 2023-10-09 12:37:09,123 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.527e+02 4.142e+02 5.294e+02 1.315e+03, threshold=8.284e+02, percent-clipped=10.0 2023-10-09 12:37:37,501 INFO [train.py:1031] (3/4) Epoch 14, batch 6500, loss[loss=0.2227, simple_loss=0.3009, pruned_loss=0.05338, ctc_loss=0.09427, over 16848.00 frames. ], tot_loss[loss=0.2385, simple_loss=0.3006, pruned_loss=0.06517, ctc_loss=0.1153, over 3304064.46 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 12:37:39,561 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2759082.6666666665, ans=0.0 2023-10-09 12:37:40,678 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2759082.6666666665, ans=0.0 2023-10-09 12:37:58,358 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.79 vs. limit=15.0 2023-10-09 12:38:02,160 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2759176.0, ans=0.1 2023-10-09 12:38:09,912 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2759176.0, ans=0.125 2023-10-09 12:38:28,647 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2759269.3333333335, ans=0.0 2023-10-09 12:38:39,314 INFO [train.py:1031] (3/4) Epoch 14, batch 6550, loss[loss=0.2206, simple_loss=0.2991, pruned_loss=0.05218, ctc_loss=0.09403, over 16897.00 frames. 
2023-10-09 12:38:50,025 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2759316.0, ans=0.0
2023-10-09 12:39:12,900 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0
2023-10-09 12:39:13,986 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.392e+02 3.110e+02 3.501e+02 4.783e+02 9.305e+02, threshold=7.003e+02, percent-clipped=1.0
2023-10-09 12:39:22,362 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2759456.0, ans=0.125
2023-10-09 12:39:37,526 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2759502.6666666665, ans=0.125
2023-10-09 12:39:37,613 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2759502.6666666665, ans=0.125
2023-10-09 12:39:41,348 INFO [train.py:1031] (3/4) Epoch 14, batch 6600, loss[loss=0.239, simple_loss=0.2847, pruned_loss=0.07045, ctc_loss=0.1313, over 16722.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2984, pruned_loss=0.06241, ctc_loss=0.1106, over 3294420.79 frames. ], batch size: 328, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:39:44,363 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2759549.3333333335, ans=0.0
2023-10-09 12:39:51,116 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2759549.3333333335, ans=0.0
2023-10-09 12:40:39,817 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2759736.0, ans=0.125
2023-10-09 12:40:43,319 INFO [train.py:1031] (3/4) Epoch 14, batch 6650, loss[loss=0.2554, simple_loss=0.2997, pruned_loss=0.07808, ctc_loss=0.1374, over 15272.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2912, pruned_loss=0.0628, ctc_loss=0.1109, over 3297573.41 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:40:44,592 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2759782.6666666665, ans=0.125
2023-10-09 12:41:15,942 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2759876.0, ans=0.5
2023-10-09 12:41:18,760 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.057e+02 3.351e+02 3.889e+02 6.888e+02, threshold=6.703e+02, percent-clipped=0.0
2023-10-09 12:41:19,218 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2759922.6666666665, ans=0.0
2023-10-09 12:41:40,317 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2759969.3333333335, ans=0.1
2023-10-09 12:41:45,286 INFO [train.py:1031] (3/4) Epoch 14, batch 6700, loss[loss=0.2457, simple_loss=0.3261, pruned_loss=0.0601, ctc_loss=0.1128, over 16852.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2931, pruned_loss=0.06242, ctc_loss=0.1107, over 3301622.51 frames. ], batch size: 242, lr: 2.58e-03, grad_scale: 4.0
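
[Editor's note] The optim.py:471 lines summarize gradient norms since the last report. Across the entries here, threshold is consistently twice the middle quantile (e.g. 2 x 3.501e+02 ~= 7.003e+02), matching Clipping_scale=2.0; percent-clipped is the share of steps whose norm exceeded that threshold. A sketch of how such a report could be produced; treating the five numbers as the 0/25/50/75/100 percentiles is an assumption:

    # Sketch: summarize recent gradient norms the way the optim.py:471
    # lines do. Assumptions: the five numbers are the 0/25/50/75/100
    # percentiles, and threshold = clipping_scale * median.
    import torch

    def grad_norm_report(norms, clipping_scale=2.0):
        t = torch.tensor(norms, dtype=torch.float32)
        q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]
        pct = 100.0 * (t > threshold).float().mean()
        quart = " ".join(f"{v:.3e}" for v in q.tolist())
        return (f"Clipping_scale={clipping_scale}, grad-norm quartiles "
                f"{quart}, threshold={threshold:.3e}, percent-clipped={pct:.1f}")
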
2023-10-09 12:42:03,734 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:42:13,114 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2760109.3333333335, ans=0.0
2023-10-09 12:42:14,269 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2760109.3333333335, ans=0.0
2023-10-09 12:42:32,907 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2760156.0, ans=0.07
2023-10-09 12:42:43,766 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2760202.6666666665, ans=0.09899494936611666
2023-10-09 12:42:48,658 INFO [train.py:1031] (3/4) Epoch 14, batch 6750, loss[loss=0.2463, simple_loss=0.312, pruned_loss=0.06662, ctc_loss=0.1184, over 16937.00 frames. ], tot_loss[loss=0.2397, simple_loss=0.3034, pruned_loss=0.06481, ctc_loss=0.1158, over 3295555.53 frames. ], batch size: 215, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:43:08,110 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2760296.0, ans=0.07
2023-10-09 12:43:25,360 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+02 3.277e+02 3.937e+02 4.777e+02 6.969e+02, threshold=7.873e+02, percent-clipped=1.0
2023-10-09 12:43:26,897 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2760389.3333333335, ans=0.2
2023-10-09 12:43:28,238 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.84 vs. limit=6.0
2023-10-09 12:43:49,732 INFO [train.py:1031] (3/4) Epoch 14, batch 6800, loss[loss=0.2398, simple_loss=0.2917, pruned_loss=0.07016, ctc_loss=0.1188, over 16842.00 frames. ], tot_loss[loss=0.2407, simple_loss=0.302, pruned_loss=0.06612, ctc_loss=0.1178, over 3293152.12 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:43:55,418 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2760482.6666666665, ans=0.125
2023-10-09 12:44:24,240 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2760576.0, ans=0.125
2023-10-09 12:44:27,430 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:44:34,439 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2760622.6666666665, ans=0.125
2023-10-09 12:44:34,492 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2760622.6666666665, ans=0.0
2023-10-09 12:44:35,566 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2760622.6666666665, ans=0.125
2023-10-09 12:44:38,760 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2760669.3333333335, ans=0.125
2023-10-09 12:44:38,892 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2760669.3333333335, ans=0.125
2023-10-09 12:44:51,403 INFO [train.py:1031] (3/4) Epoch 14, batch 6850, loss[loss=0.2727, simple_loss=0.3369, pruned_loss=0.07477, ctc_loss=0.1475, over 16547.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.3, pruned_loss=0.06536, ctc_loss=0.1165, over 3297985.80 frames. ], batch size: 350, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:45:09,259 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2760762.6666666665, ans=0.5
2023-10-09 12:45:19,068 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0
2023-10-09 12:45:26,070 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2760809.3333333335, ans=0.1
2023-10-09 12:45:28,873 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 3.178e+02 3.827e+02 4.505e+02 1.079e+03, threshold=7.655e+02, percent-clipped=2.0
2023-10-09 12:45:29,451 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0
2023-10-09 12:45:35,787 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2760856.0, ans=0.125
2023-10-09 12:45:43,323 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2760902.6666666665, ans=0.125
2023-10-09 12:45:48,207 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2760902.6666666665, ans=0.125
2023-10-09 12:45:54,911 INFO [train.py:1031] (3/4) Epoch 14, batch 6900, loss[loss=0.256, simple_loss=0.3101, pruned_loss=0.07444, ctc_loss=0.1324, over 16736.00 frames. ], tot_loss[loss=0.2412, simple_loss=0.3018, pruned_loss=0.06663, ctc_loss=0.1183, over 3305768.08 frames. ], batch size: 201, lr: 2.58e-03, grad_scale: 8.0
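
[Editor's note] The scaling.py:979 Whitening lines compare a per-module whitening metric against a limit (e.g. metric=11.57 vs. limit=15.0 above). The metric measures how far a module's output covariance is from "white", i.e. from having equal eigenvalues; while it stays under the limit no corrective pressure is applied. A rough sketch of one such covariance-uniformity statistic; the exact formula used by icefall's Whiten module is related but not necessarily identical:

    # Hedged sketch of a non-whiteness metric. Assumption:
    # metric = E[lambda^2] / (E[lambda])^2 over eigenvalues of the
    # channel covariance; it equals 1.0 for perfectly white features
    # and grows as variance concentrates in a few directions.
    import torch

    def whitening_metric(x):  # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()
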
2023-10-09 12:45:57,974 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=12.0
2023-10-09 12:46:01,733 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2760949.3333333335, ans=0.125
2023-10-09 12:46:30,335 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2761089.3333333335, ans=0.125
2023-10-09 12:46:30,639 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.54 vs. limit=10.0
2023-10-09 12:46:35,224 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.72 vs. limit=15.0
2023-10-09 12:46:45,981 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2761136.0, ans=0.0
2023-10-09 12:46:55,271 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2761182.6666666665, ans=0.125
2023-10-09 12:46:55,999 INFO [train.py:1031] (3/4) Epoch 14, batch 6950, loss[loss=0.2245, simple_loss=0.2844, pruned_loss=0.06194, ctc_loss=0.1016, over 16706.00 frames. ], tot_loss[loss=0.2445, simple_loss=0.3037, pruned_loss=0.06844, ctc_loss=0.1211, over 3310279.95 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:46:58,444 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2761182.6666666665, ans=0.125
2023-10-09 12:47:34,796 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+02 3.216e+02 3.562e+02 4.288e+02 5.901e+02, threshold=7.125e+02, percent-clipped=0.0
2023-10-09 12:47:55,787 INFO [train.py:1031] (3/4) Epoch 14, batch 7000, loss[loss=0.238, simple_loss=0.2898, pruned_loss=0.06946, ctc_loss=0.118, over 16999.00 frames. ], tot_loss[loss=0.2419, simple_loss=0.3002, pruned_loss=0.06784, ctc_loss=0.1201, over 3316272.14 frames. ], batch size: 258, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:48:06,638 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=22.5
2023-10-09 12:48:09,464 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2761462.6666666665, ans=0.2
2023-10-09 12:48:47,037 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2761602.6666666665, ans=0.125
2023-10-09 12:48:48,050 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2761602.6666666665, ans=0.0
2023-10-09 12:48:56,168 INFO [train.py:1031] (3/4) Epoch 14, batch 7050, loss[loss=0.1709, simple_loss=0.2436, pruned_loss=0.03529, ctc_loss=0.06902, over 16894.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.2915, pruned_loss=0.06525, ctc_loss=0.1157, over 3319261.48 frames. ], batch size: 292, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:49:03,004 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2761649.3333333335, ans=0.0
2023-10-09 12:49:04,401 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.79 vs. limit=15.0
2023-10-09 12:49:09,545 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0
2023-10-09 12:49:37,464 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2761789.3333333335, ans=0.0
2023-10-09 12:49:38,091 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 2.795e+02 3.132e+02 3.641e+02 6.976e+02, threshold=6.264e+02, percent-clipped=0.0
2023-10-09 12:49:50,867 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2761836.0, ans=0.0
2023-10-09 12:49:54,723 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2761836.0, ans=0.0
2023-10-09 12:49:56,405 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2761836.0, ans=0.0
2023-10-09 12:49:58,342 INFO [train.py:1031] (3/4) Epoch 14, batch 7100, loss[loss=0.1866, simple_loss=0.2218, pruned_loss=0.05497, ctc_loss=0.1034, over 15277.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2824, pruned_loss=0.06332, ctc_loss=0.1124, over 3321703.81 frames. ], batch size: 529, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:50:46,517 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2762022.6666666665, ans=0.1
2023-10-09 12:51:00,104 INFO [train.py:1031] (3/4) Epoch 14, batch 7150, loss[loss=0.2084, simple_loss=0.2599, pruned_loss=0.05812, ctc_loss=0.1019, over 16855.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2748, pruned_loss=0.06237, ctc_loss=0.1104, over 3312492.89 frames. ], batch size: 243, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:51:14,775 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2023-10-09 12:51:43,559 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+02 3.162e+02 3.626e+02 4.175e+02 1.632e+03, threshold=7.251e+02, percent-clipped=2.0
2023-10-09 12:51:53,827 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2762302.6666666665, ans=0.1
2023-10-09 12:52:00,870 INFO [train.py:1031] (3/4) Epoch 14, batch 7200, loss[loss=0.2129, simple_loss=0.2648, pruned_loss=0.0597, ctc_loss=0.1041, over 16769.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.2736, pruned_loss=0.06328, ctc_loss=0.1118, over 3314010.89 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 2.0
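
[Editor's note] The frame count attached to tot_loss hovers near 3.3M frames rather than growing without bound, which suggests an exponentially decayed, frame-weighted running average whose effective window is a couple hundred batches (roughly 200 batches x ~16k frames). A sketch under that assumption; the decay constant below is illustrative:

    # Sketch of a frame-weighted running loss. Assumption: both the
    # weighted loss sum and the frame count decay by ~(1 - 1/200) per
    # batch, giving a steady-state window of a few million frames,
    # consistent with the roughly flat 'over ~3.3e6 frames' in the log.
    class RunningLoss:
        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self):
            return self.loss_sum / max(self.frames, 1.0)
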
2023-10-09 12:52:12,934 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2762396.0, ans=0.125
2023-10-09 12:52:14,482 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2762396.0, ans=0.0
2023-10-09 12:52:23,402 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2762442.6666666665, ans=0.125
2023-10-09 12:52:27,579 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=22.5
2023-10-09 12:53:03,087 INFO [train.py:1031] (3/4) Epoch 14, batch 7250, loss[loss=0.2089, simple_loss=0.2757, pruned_loss=0.053, ctc_loss=0.09032, over 16761.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2764, pruned_loss=0.06409, ctc_loss=0.1128, over 3314472.91 frames. ], batch size: 140, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:53:03,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2762582.6666666665, ans=0.125
2023-10-09 12:53:11,522 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2762582.6666666665, ans=0.0
2023-10-09 12:53:17,647 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=22.5
2023-10-09 12:53:39,270 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2762676.0, ans=0.125
2023-10-09 12:53:49,742 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.375e+02 3.091e+02 3.554e+02 4.025e+02 7.139e+02, threshold=7.107e+02, percent-clipped=0.0
2023-10-09 12:53:54,393 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:53:56,491 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2762769.3333333335, ans=0.1
2023-10-09 12:54:01,988 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.89 vs. limit=12.0
2023-10-09 12:54:07,974 INFO [train.py:1031] (3/4) Epoch 14, batch 7300, loss[loss=0.196, simple_loss=0.2468, pruned_loss=0.05427, ctc_loss=0.09162, over 16710.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2778, pruned_loss=0.06221, ctc_loss=0.11, over 3317834.12 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:54:20,015 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2762862.6666666665, ans=0.0
2023-10-09 12:55:04,685 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2763002.6666666665, ans=0.125
2023-10-09 12:55:07,496 INFO [train.py:1031] (3/4) Epoch 14, batch 7350, loss[loss=0.2529, simple_loss=0.2858, pruned_loss=0.08172, ctc_loss=0.1416, over 16617.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2782, pruned_loss=0.06306, ctc_loss=0.1113, over 3312325.49 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:55:10,874 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0
2023-10-09 12:55:43,474 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0
2023-10-09 12:55:50,540 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+02 3.022e+02 3.585e+02 4.093e+02 1.089e+03, threshold=7.169e+02, percent-clipped=4.0
2023-10-09 12:56:07,708 INFO [train.py:1031] (3/4) Epoch 14, batch 7400, loss[loss=0.2289, simple_loss=0.2972, pruned_loss=0.05797, ctc_loss=0.1115, over 16893.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.281, pruned_loss=0.06428, ctc_loss=0.1135, over 3321539.64 frames. ], batch size: 242, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:56:21,920 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2763329.3333333335, ans=0.0
2023-10-09 12:56:23,093 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2763329.3333333335, ans=0.125
2023-10-09 12:56:33,671 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2763376.0, ans=0.125
2023-10-09 12:56:48,780 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2763422.6666666665, ans=0.0
2023-10-09 12:57:07,673 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2763469.3333333335, ans=0.125
2023-10-09 12:57:09,480 INFO [train.py:1031] (3/4) Epoch 14, batch 7450, loss[loss=0.3268, simple_loss=0.367, pruned_loss=0.1046, ctc_loss=0.1937, over 16726.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2875, pruned_loss=0.0641, ctc_loss=0.1135, over 3313333.33 frames. ], batch size: 384, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:57:24,830 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2763562.6666666665, ans=0.125
2023-10-09 12:57:32,304 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.46 vs. limit=12.0
2023-10-09 12:57:36,386 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0
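
[Editor's note] grad_scale in the train.py entries moves between 1.0, 2.0, 4.0 and 8.0 over this stretch: with fp16 training, the dynamic loss scaler doubles its scale after a run of overflow-free steps and halves it on overflow. A minimal sketch of that policy, in the style of torch.cuda.amp.GradScaler (the real scaler also skips the optimizer step on overflow, which this sketch omits):

    # Minimal dynamic loss-scale policy. Assumptions: double after
    # `growth_interval` clean steps, halve immediately on overflow;
    # this is standard GradScaler behaviour and would explain the
    # grad_scale values 2.0 -> 4.0 -> 8.0 (and back down) in the log.
    class DynamicScale:
        def __init__(self, scale=2.0, growth_interval=2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self._clean_steps = 0

        def step(self, found_inf):
            if found_inf:
                self.scale *= 0.5      # back off on overflow
                self._clean_steps = 0
            else:
                self._clean_steps += 1
                if self._clean_steps == self.growth_interval:
                    self.scale *= 2.0  # grow cautiously
                    self._clean_steps = 0
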
2023-10-09 12:57:37,222 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2763609.3333333335, ans=0.0
2023-10-09 12:57:46,566 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2763609.3333333335, ans=10.0
2023-10-09 12:57:47,546 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2763656.0, ans=0.125
2023-10-09 12:57:56,151 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2763656.0, ans=0.1
2023-10-09 12:57:58,526 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 3.056e+02 3.585e+02 4.525e+02 9.951e+02, threshold=7.170e+02, percent-clipped=3.0
2023-10-09 12:58:12,808 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2763749.3333333335, ans=0.125
2023-10-09 12:58:13,553 INFO [train.py:1031] (3/4) Epoch 14, batch 7500, loss[loss=0.213, simple_loss=0.2868, pruned_loss=0.05184, ctc_loss=0.08854, over 16858.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.286, pruned_loss=0.06077, ctc_loss=0.1079, over 3314464.77 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:58:33,717 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2763796.0, ans=0.0
2023-10-09 12:58:42,294 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2763842.6666666665, ans=0.125
2023-10-09 12:58:42,483 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0
2023-10-09 12:58:45,776 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2763842.6666666665, ans=0.125
2023-10-09 12:59:12,000 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2763936.0, ans=0.0
2023-10-09 12:59:13,832 INFO [train.py:1031] (3/4) Epoch 14, batch 7550, loss[loss=0.197, simple_loss=0.271, pruned_loss=0.04468, ctc_loss=0.08392, over 16723.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.285, pruned_loss=0.05839, ctc_loss=0.1038, over 3304338.85 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:59:15,860 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2763982.6666666665, ans=0.125
2023-10-09 12:59:40,473 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0
2023-10-09 12:59:59,786 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.575e+02 3.477e+02 4.054e+02 5.168e+02 9.952e+02, threshold=8.108e+02, percent-clipped=5.0
2023-10-09 13:00:08,141 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2764169.3333333335, ans=0.2
2023-10-09 13:00:13,013 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2764169.3333333335, ans=0.125
2023-10-09 13:00:14,751 INFO [train.py:1031] (3/4) Epoch 14, batch 7600, loss[loss=0.2227, simple_loss=0.2828, pruned_loss=0.05924, ctc_loss=0.1103, over 16915.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2849, pruned_loss=0.0609, ctc_loss=0.1079, over 3308722.61 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:00:29,919 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:00:32,057 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2764262.6666666665, ans=0.04949747468305833
2023-10-09 13:00:42,284 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0
2023-10-09 13:01:01,687 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2764356.0, ans=0.2
2023-10-09 13:01:01,913 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.35 vs. limit=15.0
2023-10-09 13:01:04,885 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2764402.6666666665, ans=0.125
2023-10-09 13:01:16,410 INFO [train.py:1031] (3/4) Epoch 14, batch 7650, loss[loss=0.2042, simple_loss=0.242, pruned_loss=0.0631, ctc_loss=0.1004, over 10450.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2843, pruned_loss=0.06308, ctc_loss=0.1116, over 3309224.97 frames. ], batch size: 35, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:01:20,962 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2764449.3333333335, ans=0.1
2023-10-09 13:02:03,962 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+02 3.219e+02 3.717e+02 4.421e+02 1.818e+03, threshold=7.434e+02, percent-clipped=3.0
2023-10-09 13:02:07,426 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2764636.0, ans=0.125
2023-10-09 13:02:12,711 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2764636.0, ans=0.125
2023-10-09 13:02:16,474 INFO [train.py:1031] (3/4) Epoch 14, batch 7700, loss[loss=0.2049, simple_loss=0.2686, pruned_loss=0.05202, ctc_loss=0.09307, over 16836.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.283, pruned_loss=0.06346, ctc_loss=0.1119, over 3307475.35 frames. ], batch size: 272, lr: 2.58e-03, grad_scale: 4.0
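
[Editor's note] The scaling.py:1069 WithLoss lines attach an auxiliary penalty to the attention-weight tensors; loss-sum=0.000e+00 simply means the accumulated penalty since the last report is zero, i.e. the constraint is currently satisfied. The usual pattern is a module that is the identity in the forward pass but records an extra loss on the side, roughly as below (a sketch only; icefall wires its penalty through the backward pass differently):

    # Sketch: identity module that accumulates an auxiliary loss.
    # Assumption: the penalty is collected per module and periodically
    # logged as 'loss-sum'; penalty_fn is a hypothetical tensor->scalar.
    import torch.nn as nn

    class WithAuxLoss(nn.Module):
        def __init__(self, name, penalty_fn):
            super().__init__()
            self.name = name
            self.penalty_fn = penalty_fn
            self.loss_sum = 0.0

        def forward(self, x):
            if self.training:
                self.loss_sum += float(self.penalty_fn(x).detach())
            return x  # identity in the forward pass
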
2023-10-09 13:02:19,474 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2764682.6666666665, ans=0.1
2023-10-09 13:02:37,598 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2764729.3333333335, ans=0.015
2023-10-09 13:02:45,129 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2764776.0, ans=0.125
2023-10-09 13:02:56,409 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2764822.6666666665, ans=0.125
2023-10-09 13:03:05,105 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:03:17,589 INFO [train.py:1031] (3/4) Epoch 14, batch 7750, loss[loss=0.2845, simple_loss=0.2996, pruned_loss=0.1034, ctc_loss=0.1565, over 16376.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2807, pruned_loss=0.06392, ctc_loss=0.1126, over 3308171.98 frames. ], batch size: 70, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:03:18,575 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2764916.0, ans=0.1
2023-10-09 13:03:19,426 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2764916.0, ans=0.1
2023-10-09 13:03:44,048 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2765009.3333333335, ans=0.125
2023-10-09 13:03:55,245 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2765056.0, ans=0.0
2023-10-09 13:04:08,224 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.438e+02 3.190e+02 3.453e+02 4.126e+02 8.582e+02, threshold=6.905e+02, percent-clipped=1.0
2023-10-09 13:04:11,999 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.84 vs. limit=22.5
2023-10-09 13:04:14,262 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2765102.6666666665, ans=0.125
2023-10-09 13:04:15,362 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2765102.6666666665, ans=0.0
2023-10-09 13:04:20,485 INFO [train.py:1031] (3/4) Epoch 14, batch 7800, loss[loss=0.2455, simple_loss=0.327, pruned_loss=0.06268, ctc_loss=0.09645, over 15114.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.2809, pruned_loss=0.06466, ctc_loss=0.1127, over 3306874.80 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:04:29,465 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2765149.3333333335, ans=0.125
2023-10-09 13:04:50,289 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2765242.6666666665, ans=0.0
2023-10-09 13:05:17,003 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.42 vs. limit=15.0
2023-10-09 13:05:19,458 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2765336.0, ans=0.1
2023-10-09 13:05:22,025 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2765336.0, ans=15.0
2023-10-09 13:05:23,388 INFO [train.py:1031] (3/4) Epoch 14, batch 7850, loss[loss=0.2317, simple_loss=0.2816, pruned_loss=0.06959, ctc_loss=0.1062, over 16911.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2848, pruned_loss=0.06449, ctc_loss=0.1118, over 3311561.51 frames. ], batch size: 90, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:05:30,173 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2765382.6666666665, ans=0.125
2023-10-09 13:05:32,837 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:05:42,313 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2765429.3333333335, ans=0.125
2023-10-09 13:05:45,031 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2765429.3333333335, ans=0.125
2023-10-09 13:05:49,398 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:06:10,643 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0
2023-10-09 13:06:14,907 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+02 3.051e+02 3.784e+02 4.491e+02 1.708e+03, threshold=7.568e+02, percent-clipped=4.0
2023-10-09 13:06:26,263 INFO [train.py:1031] (3/4) Epoch 14, batch 7900, loss[loss=0.1995, simple_loss=0.2673, pruned_loss=0.04945, ctc_loss=0.08196, over 16833.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2879, pruned_loss=0.06248, ctc_loss=0.1087, over 3312380.32 frames. ], batch size: 141, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:07:00,374 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2765709.3333333335, ans=0.2
2023-10-09 13:07:07,991 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2765756.0, ans=0.0
2023-10-09 13:07:27,540 INFO [train.py:1031] (3/4) Epoch 14, batch 7950, loss[loss=0.2153, simple_loss=0.2786, pruned_loss=0.0564, ctc_loss=0.09813, over 16841.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2881, pruned_loss=0.06241, ctc_loss=0.1085, over 3311437.34 frames. ], batch size: 164, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:07:33,412 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=15.0
2023-10-09 13:08:19,020 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 2.992e+02 3.305e+02 3.981e+02 8.015e+02, threshold=6.609e+02, percent-clipped=1.0
2023-10-09 13:08:28,687 INFO [train.py:1031] (3/4) Epoch 14, batch 8000, loss[loss=0.2222, simple_loss=0.2872, pruned_loss=0.05974, ctc_loss=0.09447, over 16782.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2852, pruned_loss=0.06338, ctc_loss=0.1101, over 3299577.86 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:08:37,993 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2766082.6666666665, ans=0.1
2023-10-09 13:08:40,008 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2766129.3333333335, ans=0.2
2023-10-09 13:08:40,087 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2766129.3333333335, ans=0.05
2023-10-09 13:09:13,109 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=2766222.6666666665, ans=15.0
2023-10-09 13:09:29,361 INFO [train.py:1031] (3/4) Epoch 14, batch 8050, loss[loss=0.2146, simple_loss=0.2721, pruned_loss=0.05935, ctc_loss=0.09624, over 17046.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2838, pruned_loss=0.0646, ctc_loss=0.1124, over 3308515.01 frames. ], batch size: 216, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:09:40,095 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.30 vs. limit=10.0
2023-10-09 13:10:15,196 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2766456.0, ans=0.1
2023-10-09 13:10:17,252 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2766502.6666666665, ans=0.125
2023-10-09 13:10:22,705 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+02 3.190e+02 3.616e+02 4.250e+02 6.056e+02, threshold=7.233e+02, percent-clipped=0.0
2023-10-09 13:10:30,819 INFO [train.py:1031] (3/4) Epoch 14, batch 8100, loss[loss=0.2251, simple_loss=0.2761, pruned_loss=0.06432, ctc_loss=0.1136, over 16953.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2822, pruned_loss=0.06541, ctc_loss=0.1141, over 3320090.56 frames. ], batch size: 309, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:10:32,433 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.97 vs. limit=22.5
2023-10-09 13:10:50,525 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2766596.0, ans=15.0
2023-10-09 13:11:10,053 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0
2023-10-09 13:11:31,971 INFO [train.py:1031] (3/4) Epoch 14, batch 8150, loss[loss=0.2107, simple_loss=0.2767, pruned_loss=0.05302, ctc_loss=0.09633, over 16811.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2804, pruned_loss=0.0657, ctc_loss=0.1151, over 3319196.98 frames. ], batch size: 272, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:12:18,364 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2766922.6666666665, ans=0.2
2023-10-09 13:12:26,044 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 3.114e+02 3.645e+02 4.244e+02 8.238e+02, threshold=7.291e+02, percent-clipped=3.0
2023-10-09 13:12:33,449 INFO [train.py:1031] (3/4) Epoch 14, batch 8200, loss[loss=0.2628, simple_loss=0.3498, pruned_loss=0.06412, ctc_loss=0.1189, over 15173.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2809, pruned_loss=0.06321, ctc_loss=0.1112, over 3317837.03 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 4.0
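
[Editor's note] Batch sizes in these entries swing from 35 to over 500 utterances. That is consistent with a sampler that packs batches by total audio duration rather than by count (the run appears to use a bucketing sampler with a duration budget on the order of 700 seconds): buckets of short cuts yield many utterances per batch, buckets of long cuts few. A hedged sketch of duration-capped batching; lhotse's DynamicBucketingSampler is the real implementation:

    # Sketch: greedily fill a batch until adding the next cut would
    # exceed max_duration seconds. Assumption: cuts arrive roughly
    # length-sorted within a bucket, as a bucketing sampler provides.
    def duration_batches(cut_durations, max_duration=700.0):
        batch, total = [], 0.0
        for i, dur in enumerate(cut_durations):
            if batch and total + dur > max_duration:
                yield batch
                batch, total = [], 0.0
            batch.append(i)
            total += dur
        if batch:
            yield batch
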
2023-10-09 13:12:49,316 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.71 vs. limit=22.5
2023-10-09 13:13:02,579 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2767109.3333333335, ans=0.0
2023-10-09 13:13:28,569 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0
2023-10-09 13:13:31,983 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2767202.6666666665, ans=0.125
2023-10-09 13:13:37,132 INFO [train.py:1031] (3/4) Epoch 14, batch 8250, loss[loss=0.2101, simple_loss=0.3076, pruned_loss=0.04088, ctc_loss=0.07728, over 16428.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2898, pruned_loss=0.06205, ctc_loss=0.1101, over 3302671.88 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:13:43,470 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2767249.3333333335, ans=0.125
2023-10-09 13:13:43,710 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=12.0
2023-10-09 13:13:47,717 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2767249.3333333335, ans=6.0
2023-10-09 13:14:01,213 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2767342.6666666665, ans=0.2
2023-10-09 13:14:13,439 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2767342.6666666665, ans=0.1
2023-10-09 13:14:32,778 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.653e+02 3.032e+02 3.705e+02 6.938e+02, threshold=6.064e+02, percent-clipped=0.0
2023-10-09 13:14:37,110 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2767436.0, ans=10.0
2023-10-09 13:14:40,098 INFO [train.py:1031] (3/4) Epoch 14, batch 8300, loss[loss=0.2911, simple_loss=0.3567, pruned_loss=0.08192, ctc_loss=0.1539, over 16690.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2894, pruned_loss=0.05866, ctc_loss=0.1048, over 3299293.94 frames. ], batch size: 384, lr: 2.58e-03, grad_scale: 8.0
2023-10-09 13:14:51,968 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2767529.3333333335, ans=0.025
2023-10-09 13:14:56,847 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2767529.3333333335, ans=0.035
2023-10-09 13:15:05,036 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.11 vs. limit=22.5
2023-10-09 13:15:23,098 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2767622.6666666665, ans=0.1
2023-10-09 13:15:23,114 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2767622.6666666665, ans=0.0
2023-10-09 13:15:42,545 INFO [train.py:1031] (3/4) Epoch 14, batch 8350, loss[loss=0.197, simple_loss=0.2669, pruned_loss=0.04713, ctc_loss=0.08203, over 16793.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2887, pruned_loss=0.05728, ctc_loss=0.1024, over 3299035.27 frames. ], batch size: 176, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:16:01,256 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2767762.6666666665, ans=0.125
2023-10-09 13:16:23,264 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2767856.0, ans=0.125
2023-10-09 13:16:38,122 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.846e+02 3.361e+02 4.142e+02 6.688e+02, threshold=6.722e+02, percent-clipped=2.0
2023-10-09 13:16:44,684 INFO [train.py:1031] (3/4) Epoch 14, batch 8400, loss[loss=0.1655, simple_loss=0.2357, pruned_loss=0.03542, ctc_loss=0.06096, over 16705.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.2827, pruned_loss=0.05328, ctc_loss=0.09595, over 3303259.42 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 8.0
2023-10-09 13:16:46,345 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0
2023-10-09 13:16:58,661 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=22.5
2023-10-09 13:17:06,553 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0
2023-10-09 13:17:16,209 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2768042.6666666665, ans=0.125
2023-10-09 13:17:26,954 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.76 vs. limit=10.0
2023-10-09 13:17:33,651 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.12 vs. limit=15.0
2023-10-09 13:17:46,092 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2768136.0, ans=0.125
2023-10-09 13:17:46,328 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0
2023-10-09 13:17:48,487 INFO [train.py:1031] (3/4) Epoch 14, batch 8450, loss[loss=0.2584, simple_loss=0.3364, pruned_loss=0.06608, ctc_loss=0.1203, over 16903.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.2912, pruned_loss=0.05589, ctc_loss=0.1013, over 3309837.84 frames. ], batch size: 292, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:17:59,423 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.95 vs. limit=15.0
2023-10-09 13:18:45,222 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 3.438e+02 4.157e+02 5.474e+02 9.633e+02, threshold=8.314e+02, percent-clipped=10.0
2023-10-09 13:18:48,294 INFO [train.py:1031] (3/4) Epoch 14, batch 8500, loss[loss=0.2479, simple_loss=0.2987, pruned_loss=0.07249, ctc_loss=0.1302, over 16846.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2924, pruned_loss=0.05794, ctc_loss=0.1046, over 3299538.28 frames. ], batch size: 328, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:18:53,452 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2768416.0, ans=0.125
2023-10-09 13:18:55,718 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0
2023-10-09 13:19:07,806 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2768462.6666666665, ans=0.125
2023-10-09 13:19:10,834 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2768509.3333333335, ans=0.125
2023-10-09 13:19:30,579 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2768556.0, ans=0.125
2023-10-09 13:19:30,644 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2768556.0, ans=0.125
2023-10-09 13:19:33,140 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=2768556.0, ans=0.1
2023-10-09 13:19:46,533 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2768602.6666666665, ans=0.125
2023-10-09 13:19:48,998 INFO [train.py:1031] (3/4) Epoch 14, batch 8550, loss[loss=0.2111, simple_loss=0.2647, pruned_loss=0.05861, ctc_loss=0.1005, over 16944.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2901, pruned_loss=0.06036, ctc_loss=0.1084, over 3298526.91 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:20:07,474 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2768696.0, ans=0.125
2023-10-09 13:20:13,956 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2768742.6666666665, ans=0.0
2023-10-09 13:20:25,536 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2768742.6666666665, ans=0.125
2023-10-09 13:20:34,856 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2768789.3333333335, ans=0.0
2023-10-09 13:20:35,877 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2768789.3333333335, ans=0.0
2023-10-09 13:20:37,214 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5
2023-10-09 13:20:40,184 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0
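
[Editor's note] Many ScheduledFloat names above end in balancer parameters (prob, min_positive, min_abs, max_abs): a Balancer nudges each channel's statistics back into a target range, and prob is the probability that the correction is applied on a given step, annealed down (to 0.125 here) as training matures. A rough sketch of the probabilistic application; the actual Balancer in icefall applies its correction through a custom gradient rather than by rescaling activations as done below:

    # Rough sketch: apply a statistics constraint only with probability
    # `prob`. Assumption: the correction clamps per-channel RMS into
    # [min_abs, max_abs]; parameter names mirror the log, the mechanism
    # is simplified.
    import random
    import torch

    def balance(x, prob=0.125, min_abs=0.2, max_abs=10.0):
        if random.random() >= prob:
            return x
        rms = x.pow(2).mean(dim=0, keepdim=True).sqrt().clamp(min=1e-8)
        target = rms.clamp(min_abs, max_abs)
        return x * (target / rms)
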
2023-10-09 13:20:42,581 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2768836.0, ans=0.125
2023-10-09 13:20:44,812 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2768836.0, ans=0.5
2023-10-09 13:20:50,430 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2768836.0, ans=0.0
2023-10-09 13:20:51,139 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.120e+02 3.650e+02 4.330e+02 6.615e+02, threshold=7.300e+02, percent-clipped=0.0
2023-10-09 13:20:53,245 INFO [train.py:1031] (3/4) Epoch 14, batch 8600, loss[loss=0.2386, simple_loss=0.317, pruned_loss=0.05825, ctc_loss=0.1094, over 16543.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2883, pruned_loss=0.05985, ctc_loss=0.1072, over 3293773.36 frames. ], batch size: 351, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:21:11,909 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2768929.3333333335, ans=0.04949747468305833
2023-10-09 13:21:20,962 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2768976.0, ans=0.1
2023-10-09 13:21:29,286 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0
2023-10-09 13:21:30,917 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2769022.6666666665, ans=0.1
2023-10-09 13:21:39,563 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2769022.6666666665, ans=6.0
2023-10-09 13:21:48,053 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.37 vs. limit=15.0
2023-10-09 13:21:56,025 INFO [train.py:1031] (3/4) Epoch 14, batch 8650, loss[loss=0.2023, simple_loss=0.2996, pruned_loss=0.03775, ctc_loss=0.07371, over 15075.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2855, pruned_loss=0.05679, ctc_loss=0.1018, over 3292198.24 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:22:22,332 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2769209.3333333335, ans=0.0
2023-10-09 13:22:26,152 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2769209.3333333335, ans=0.0
2023-10-09 13:22:29,986 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2769209.3333333335, ans=0.125
2023-10-09 13:22:59,313 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+02 2.781e+02 3.323e+02 4.118e+02 1.274e+03, threshold=6.646e+02, percent-clipped=1.0
2023-10-09 13:23:00,363 INFO [train.py:1031] (3/4) Epoch 14, batch 8700, loss[loss=0.2168, simple_loss=0.2837, pruned_loss=0.05426, ctc_loss=0.1032, over 16868.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2844, pruned_loss=0.05525, ctc_loss=0.09911, over 3302879.98 frames. ], batch size: 189, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:23:19,706 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2769396.0, ans=0.07
2023-10-09 13:23:25,158 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2769442.6666666665, ans=0.0
2023-10-09 13:23:27,258 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:23:37,979 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2769489.3333333335, ans=0.2
2023-10-09 13:23:47,048 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2769489.3333333335, ans=0.0
2023-10-09 13:23:50,233 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2769536.0, ans=0.125
2023-10-09 13:24:00,642 INFO [train.py:1031] (3/4) Epoch 14, batch 8750, loss[loss=0.2202, simple_loss=0.3035, pruned_loss=0.04949, ctc_loss=0.09489, over 16964.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2874, pruned_loss=0.05522, ctc_loss=0.09953, over 3301422.51 frames. ], batch size: 258, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:24:05,902 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2769582.6666666665, ans=0.09899494936611666
2023-10-09 13:24:06,991 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2769582.6666666665, ans=0.2
2023-10-09 13:24:26,641 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2769676.0, ans=0.2
2023-10-09 13:24:29,773 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2769676.0, ans=10.0
2023-10-09 13:24:32,361 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2769676.0, ans=0.0
2023-10-09 13:24:33,369 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2769676.0, ans=0.125
2023-10-09 13:24:37,752 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2769722.6666666665, ans=0.125
2023-10-09 13:24:40,101 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2769722.6666666665, ans=0.125
2023-10-09 13:24:52,292 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2769769.3333333335, ans=0.125
2023-10-09 13:25:02,703 INFO [train.py:1031] (3/4) Epoch 14, batch 8800, loss[loss=0.1793, simple_loss=0.2654, pruned_loss=0.03492, ctc_loss=0.05847, over 16787.00 frames. ], tot_loss[loss=0.2125, simple_loss=0.2851, pruned_loss=0.05131, ctc_loss=0.09318, over 3306801.37 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:25:03,717 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.673e+02 3.234e+02 4.604e+02 9.306e+02, threshold=6.469e+02, percent-clipped=8.0
2023-10-09 13:25:04,994 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=22.5
2023-10-09 13:25:11,528 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2769816.0, ans=0.2
2023-10-09 13:25:21,440 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2769862.6666666665, ans=0.125
2023-10-09 13:25:27,434 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2769909.3333333335, ans=0.125
2023-10-09 13:25:28,598 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2769909.3333333335, ans=0.0
2023-10-09 13:26:05,139 INFO [train.py:1031] (3/4) Epoch 14, batch 8850, loss[loss=0.1656, simple_loss=0.2469, pruned_loss=0.03027, ctc_loss=0.05937, over 16799.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2786, pruned_loss=0.04713, ctc_loss=0.08588, over 3309434.31 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:26:13,728 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=2770049.3333333335, ans=15.0
2023-10-09 13:26:14,434 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2770049.3333333335, ans=0.1
2023-10-09 13:26:21,882 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2770096.0, ans=0.125
2023-10-09 13:26:44,586 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=12.0
2023-10-09 13:27:05,689 INFO [train.py:1031] (3/4) Epoch 14, batch 8900, loss[loss=0.2259, simple_loss=0.2845, pruned_loss=0.06211, ctc_loss=0.1078, over 16865.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2738, pruned_loss=0.04618, ctc_loss=0.08362, over 3298149.29 frames. ], batch size: 328, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:27:05,911 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2770282.6666666665, ans=0.125
2023-10-09 13:27:08,454 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.336e+02 2.693e+02 3.509e+02 6.659e+02, threshold=5.387e+02, percent-clipped=1.0
2023-10-09 13:27:26,431 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2770329.3333333335, ans=0.0
2023-10-09 13:27:29,535 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2770376.0, ans=0.125
2023-10-09 13:27:29,574 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2770376.0, ans=0.0
2023-10-09 13:27:46,867 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.90 vs. limit=10.0
limit=10.0 2023-10-09 13:27:53,500 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2770422.6666666665, ans=0.0 2023-10-09 13:28:02,678 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2770469.3333333335, ans=0.035 2023-10-09 13:28:08,427 INFO [train.py:1031] (3/4) Epoch 14, batch 8950, loss[loss=0.236, simple_loss=0.2816, pruned_loss=0.06985, ctc_loss=0.1269, over 16801.00 frames. ], tot_loss[loss=0.2062, simple_loss=0.2747, pruned_loss=0.0507, ctc_loss=0.09068, over 3301699.54 frames. ], batch size: 272, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:28:21,615 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-10-09 13:28:24,751 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2023-10-09 13:28:33,553 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2770609.3333333335, ans=0.125 2023-10-09 13:28:33,616 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2770609.3333333335, ans=0.1 2023-10-09 13:29:06,192 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2770702.6666666665, ans=0.07 2023-10-09 13:29:10,860 INFO [train.py:1031] (3/4) Epoch 14, batch 9000, loss[loss=0.2057, simple_loss=0.2532, pruned_loss=0.05884, ctc_loss=0.1014, over 16726.00 frames. ], tot_loss[loss=0.2084, simple_loss=0.2712, pruned_loss=0.0537, ctc_loss=0.09537, over 3299731.36 frames. ], batch size: 140, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:29:10,860 INFO [train.py:1054] (3/4) Computing validation loss 2023-10-09 13:29:26,448 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2412, simple_loss=0.3097, pruned_loss=0.06635, ctc_loss=0.1001, over 1796401.00 frames. 2023-10-09 13:29:26,449 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB 2023-10-09 13:29:29,126 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.435e+02 3.467e+02 3.875e+02 4.625e+02 8.873e+02, threshold=7.750e+02, percent-clipped=12.0 2023-10-09 13:30:02,353 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2770889.3333333335, ans=0.0 2023-10-09 13:30:10,210 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2770889.3333333335, ans=0.125 2023-10-09 13:30:13,867 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2770889.3333333335, ans=0.035 2023-10-09 13:30:28,024 INFO [train.py:1031] (3/4) Epoch 14, batch 9050, loss[loss=0.1966, simple_loss=0.2494, pruned_loss=0.05352, ctc_loss=0.09193, over 16804.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2666, pruned_loss=0.05507, ctc_loss=0.0973, over 3300420.27 frames. 
], batch size: 176, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:30:39,097 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2771029.3333333335, ans=0.0 2023-10-09 13:30:42,867 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2771029.3333333335, ans=0.125 2023-10-09 13:30:45,553 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2771029.3333333335, ans=0.1 2023-10-09 13:31:27,916 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2771216.0, ans=0.09899494936611666 2023-10-09 13:31:29,169 INFO [train.py:1031] (3/4) Epoch 14, batch 9100, loss[loss=0.1946, simple_loss=0.245, pruned_loss=0.05355, ctc_loss=0.09271, over 16904.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2632, pruned_loss=0.05545, ctc_loss=0.09771, over 3289969.48 frames. ], batch size: 86, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:31:34,401 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+02 2.919e+02 3.286e+02 3.918e+02 6.845e+02, threshold=6.573e+02, percent-clipped=0.0 2023-10-09 13:31:49,591 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2771262.6666666665, ans=0.1 2023-10-09 13:31:59,263 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2771309.3333333335, ans=0.125 2023-10-09 13:32:11,605 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=2771356.0, ans=0.2 2023-10-09 13:32:18,137 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2771402.6666666665, ans=0.125 2023-10-09 13:32:19,217 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2771402.6666666665, ans=0.125 2023-10-09 13:32:21,318 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2771402.6666666665, ans=0.125 2023-10-09 13:32:30,943 INFO [train.py:1031] (3/4) Epoch 14, batch 9150, loss[loss=0.2044, simple_loss=0.2717, pruned_loss=0.04927, ctc_loss=0.09634, over 16948.00 frames. ], tot_loss[loss=0.2034, simple_loss=0.2634, pruned_loss=0.05291, ctc_loss=0.09415, over 3278006.38 frames. ], batch size: 216, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:32:37,817 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2771449.3333333335, ans=0.2 2023-10-09 13:32:43,943 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2023-10-09 13:32:48,308 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.81 vs. limit=8.0 2023-10-09 13:33:30,898 INFO [train.py:1031] (3/4) Epoch 14, batch 9200, loss[loss=0.2794, simple_loss=0.3046, pruned_loss=0.09421, ctc_loss=0.1645, over 16638.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2677, pruned_loss=0.05548, ctc_loss=0.09827, over 3276220.49 frames. 
], batch size: 416, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:33:34,527 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2771682.6666666665, ans=0.1 2023-10-09 13:33:37,286 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.803e+02 3.406e+02 4.271e+02 8.869e+02, threshold=6.811e+02, percent-clipped=4.0 2023-10-09 13:33:55,611 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2771776.0, ans=0.125 2023-10-09 13:34:13,220 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2771822.6666666665, ans=0.2 2023-10-09 13:34:32,045 INFO [train.py:1031] (3/4) Epoch 14, batch 9250, loss[loss=0.1879, simple_loss=0.2561, pruned_loss=0.04426, ctc_loss=0.07791, over 16795.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.2692, pruned_loss=0.05696, ctc_loss=0.1006, over 3280479.91 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:34:37,634 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2771916.0, ans=0.0 2023-10-09 13:34:41,374 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2771916.0, ans=0.2 2023-10-09 13:35:03,813 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2772009.3333333335, ans=0.1 2023-10-09 13:35:10,769 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:35:17,334 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2772056.0, ans=0.125 2023-10-09 13:35:24,704 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2772102.6666666665, ans=0.125 2023-10-09 13:35:33,973 INFO [train.py:1031] (3/4) Epoch 14, batch 9300, loss[loss=0.2403, simple_loss=0.3288, pruned_loss=0.05499, ctc_loss=0.1046, over 16241.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2694, pruned_loss=0.05588, ctc_loss=0.09918, over 3283643.84 frames. ], batch size: 463, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:35:38,086 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:35:41,376 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 2.952e+02 3.315e+02 3.911e+02 8.519e+02, threshold=6.629e+02, percent-clipped=4.0 2023-10-09 13:35:49,030 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2772196.0, ans=0.125 2023-10-09 13:35:56,747 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2772196.0, ans=0.0 2023-10-09 13:35:59,501 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2772242.6666666665, ans=0.125 2023-10-09 13:36:07,274 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.31 vs. 
limit=10.0 2023-10-09 13:36:15,091 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.64 vs. limit=15.0 2023-10-09 13:36:31,712 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2772336.0, ans=0.07 2023-10-09 13:36:34,005 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2772336.0, ans=0.125 2023-10-09 13:36:35,794 INFO [train.py:1031] (3/4) Epoch 14, batch 9350, loss[loss=0.1951, simple_loss=0.265, pruned_loss=0.04619, ctc_loss=0.08177, over 16748.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.2757, pruned_loss=0.05728, ctc_loss=0.1021, over 3286264.15 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:36:36,054 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2772382.6666666665, ans=0.0 2023-10-09 13:36:42,582 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2772382.6666666665, ans=0.035 2023-10-09 13:37:15,829 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2772522.6666666665, ans=0.0 2023-10-09 13:37:17,934 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2772522.6666666665, ans=0.1 2023-10-09 13:37:29,260 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2772569.3333333335, ans=0.125 2023-10-09 13:37:30,969 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2772569.3333333335, ans=0.05 2023-10-09 13:37:39,103 INFO [train.py:1031] (3/4) Epoch 14, batch 9400, loss[loss=0.1958, simple_loss=0.2814, pruned_loss=0.03948, ctc_loss=0.07817, over 16770.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2812, pruned_loss=0.05652, ctc_loss=0.1015, over 3281433.67 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:37:39,362 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2772616.0, ans=0.125 2023-10-09 13:37:39,476 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2772616.0, ans=0.2 2023-10-09 13:37:46,565 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 3.369e+02 4.289e+02 5.603e+02 1.054e+03, threshold=8.577e+02, percent-clipped=14.0 2023-10-09 13:37:51,876 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2772662.6666666665, ans=0.0 2023-10-09 13:37:56,822 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2772662.6666666665, ans=0.1 2023-10-09 13:38:07,152 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=22.5 2023-10-09 13:38:40,996 INFO [train.py:1031] (3/4) Epoch 14, batch 9450, loss[loss=0.2265, simple_loss=0.294, pruned_loss=0.05807, ctc_loss=0.1073, over 16965.00 frames. ], tot_loss[loss=0.2159, simple_loss=0.2842, pruned_loss=0.05428, ctc_loss=0.0978, over 3277421.96 frames. 
], batch size: 309, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:38:42,758 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-10-09 13:39:04,563 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2772896.0, ans=10.0 2023-10-09 13:39:15,933 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2772942.6666666665, ans=0.125 2023-10-09 13:39:15,994 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2772942.6666666665, ans=0.125 2023-10-09 13:39:25,330 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2772989.3333333335, ans=10.0 2023-10-09 13:39:35,370 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-10-09 13:39:43,172 INFO [train.py:1031] (3/4) Epoch 14, batch 9500, loss[loss=0.2315, simple_loss=0.2792, pruned_loss=0.06961, ctc_loss=0.1115, over 16800.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2836, pruned_loss=0.05658, ctc_loss=0.1008, over 3281449.19 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:39:44,900 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=22.5 2023-10-09 13:39:51,848 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 3.046e+02 3.571e+02 4.127e+02 8.787e+02, threshold=7.141e+02, percent-clipped=1.0 2023-10-09 13:40:00,852 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2773129.3333333335, ans=0.125 2023-10-09 13:40:03,939 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.54 vs. limit=6.0 2023-10-09 13:40:09,039 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2773176.0, ans=0.0 2023-10-09 13:40:28,718 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-10-09 13:40:46,274 INFO [train.py:1031] (3/4) Epoch 14, batch 9550, loss[loss=0.2342, simple_loss=0.2861, pruned_loss=0.06652, ctc_loss=0.1231, over 16243.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2879, pruned_loss=0.06122, ctc_loss=0.1083, over 3291368.65 frames. ], batch size: 463, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:40:47,765 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2773316.0, ans=0.0 2023-10-09 13:40:53,640 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0 2023-10-09 13:40:57,536 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.00 vs. 
limit=12.0 2023-10-09 13:41:11,370 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2773409.3333333335, ans=0.125 2023-10-09 13:41:16,224 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2773409.3333333335, ans=0.125 2023-10-09 13:41:42,508 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2773502.6666666665, ans=0.125 2023-10-09 13:41:44,535 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2773502.6666666665, ans=0.0 2023-10-09 13:41:48,549 INFO [train.py:1031] (3/4) Epoch 14, batch 9600, loss[loss=0.2495, simple_loss=0.3073, pruned_loss=0.07158, ctc_loss=0.1214, over 16876.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2922, pruned_loss=0.06407, ctc_loss=0.113, over 3296584.68 frames. ], batch size: 141, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:41:56,126 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2773549.3333333335, ans=0.0 2023-10-09 13:42:00,632 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.675e+02 3.309e+02 3.670e+02 4.199e+02 1.268e+03, threshold=7.340e+02, percent-clipped=3.0 2023-10-09 13:42:35,275 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2773689.3333333335, ans=0.2 2023-10-09 13:42:39,805 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2773736.0, ans=0.125 2023-10-09 13:42:52,927 INFO [train.py:1031] (3/4) Epoch 14, batch 9650, loss[loss=0.2525, simple_loss=0.321, pruned_loss=0.06773, ctc_loss=0.1212, over 16807.00 frames. ], tot_loss[loss=0.2359, simple_loss=0.2948, pruned_loss=0.0654, ctc_loss=0.1155, over 3293995.30 frames. ], batch size: 329, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:43:09,541 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:43:09,586 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2773829.3333333335, ans=0.125 2023-10-09 13:43:13,031 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0 2023-10-09 13:43:20,480 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=22.5 2023-10-09 13:43:22,768 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2773876.0, ans=0.2 2023-10-09 13:43:46,985 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=22.5 2023-10-09 13:43:49,746 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2773969.3333333335, ans=0.125 2023-10-09 13:43:55,644 INFO [train.py:1031] (3/4) Epoch 14, batch 9700, loss[loss=0.2386, simple_loss=0.2846, pruned_loss=0.06945, ctc_loss=0.1342, over 16803.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2935, pruned_loss=0.06336, ctc_loss=0.1124, over 3300686.17 frames. 
], batch size: 329, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:43:56,181 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2023-10-09 13:44:00,261 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=22.5 2023-10-09 13:44:06,804 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.889e+02 3.444e+02 4.302e+02 1.235e+03, threshold=6.889e+02, percent-clipped=2.0 2023-10-09 13:44:13,048 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2774062.6666666665, ans=0.0 2023-10-09 13:44:14,085 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2774062.6666666665, ans=0.0 2023-10-09 13:44:30,306 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2774109.3333333335, ans=0.1 2023-10-09 13:44:44,443 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2774202.6666666665, ans=0.2 2023-10-09 13:44:54,288 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2774202.6666666665, ans=0.0 2023-10-09 13:44:56,792 INFO [train.py:1031] (3/4) Epoch 14, batch 9750, loss[loss=0.2133, simple_loss=0.2525, pruned_loss=0.06501, ctc_loss=0.1101, over 16732.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2865, pruned_loss=0.06313, ctc_loss=0.1117, over 3298261.61 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:45:11,559 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2774296.0, ans=0.05 2023-10-09 13:45:11,948 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.24 vs. limit=10.0 2023-10-09 13:45:42,793 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2774389.3333333335, ans=0.0 2023-10-09 13:45:45,514 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2774389.3333333335, ans=0.2 2023-10-09 13:45:45,968 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2023-10-09 13:45:47,640 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2774436.0, ans=0.04949747468305833 2023-10-09 13:45:59,207 INFO [train.py:1031] (3/4) Epoch 14, batch 9800, loss[loss=0.2018, simple_loss=0.3055, pruned_loss=0.03554, ctc_loss=0.06753, over 16254.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2836, pruned_loss=0.06097, ctc_loss=0.1082, over 3304215.62 frames. 
], batch size: 463, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:46:11,661 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+02 3.062e+02 3.512e+02 4.119e+02 7.038e+02, threshold=7.024e+02, percent-clipped=1.0 2023-10-09 13:46:20,964 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2774529.3333333335, ans=0.1 2023-10-09 13:46:34,798 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2774576.0, ans=0.125 2023-10-09 13:46:34,839 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2774576.0, ans=0.125 2023-10-09 13:46:49,068 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2774669.3333333335, ans=0.1 2023-10-09 13:46:53,854 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2774669.3333333335, ans=0.1 2023-10-09 13:47:01,110 INFO [train.py:1031] (3/4) Epoch 14, batch 9850, loss[loss=0.2323, simple_loss=0.2722, pruned_loss=0.0726, ctc_loss=0.118, over 16735.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2851, pruned_loss=0.06202, ctc_loss=0.1098, over 3308866.57 frames. ], batch size: 140, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:47:38,761 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2774856.0, ans=0.125 2023-10-09 13:48:02,738 INFO [train.py:1031] (3/4) Epoch 14, batch 9900, loss[loss=0.1648, simple_loss=0.2223, pruned_loss=0.03938, ctc_loss=0.0715, over 16692.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2792, pruned_loss=0.06136, ctc_loss=0.1083, over 3308802.63 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:48:04,717 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2774949.3333333335, ans=0.2 2023-10-09 13:48:10,499 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2774949.3333333335, ans=0.125 2023-10-09 13:48:16,792 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+02 2.878e+02 3.183e+02 3.713e+02 1.156e+03, threshold=6.367e+02, percent-clipped=1.0 2023-10-09 13:48:29,367 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2775042.6666666665, ans=0.125 2023-10-09 13:48:30,450 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2775042.6666666665, ans=0.0 2023-10-09 13:48:39,660 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2775089.3333333335, ans=0.125 2023-10-09 13:48:54,276 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2775136.0, ans=0.0 2023-10-09 13:49:00,193 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:49:05,472 INFO [train.py:1031] (3/4) Epoch 14, batch 9950, loss[loss=0.1831, simple_loss=0.2451, pruned_loss=0.04478, ctc_loss=0.07901, over 16674.00 frames. 
], tot_loss[loss=0.2177, simple_loss=0.2726, pruned_loss=0.0601, ctc_loss=0.1063, over 3292437.60 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:49:16,558 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2775229.3333333335, ans=0.05 2023-10-09 13:49:21,016 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2775229.3333333335, ans=0.2 2023-10-09 13:49:21,510 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0 2023-10-09 13:49:24,848 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2775229.3333333335, ans=0.025 2023-10-09 13:49:32,667 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2775276.0, ans=0.1 2023-10-09 13:49:46,424 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=12.0 2023-10-09 13:49:49,235 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=22.5 2023-10-09 13:49:52,746 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2775322.6666666665, ans=0.1 2023-10-09 13:49:56,553 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2775369.3333333335, ans=0.2 2023-10-09 13:50:08,677 INFO [train.py:1031] (3/4) Epoch 14, batch 10000, loss[loss=0.2004, simple_loss=0.2373, pruned_loss=0.06037, ctc_loss=0.1071, over 16180.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2676, pruned_loss=0.05775, ctc_loss=0.102, over 3288388.63 frames. 
], batch size: 466, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:50:10,068 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:50:17,637 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2775416.0, ans=0.1 2023-10-09 13:50:24,183 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2775462.6666666665, ans=0.125 2023-10-09 13:50:24,893 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+02 2.835e+02 3.193e+02 3.667e+02 1.150e+03, threshold=6.386e+02, percent-clipped=3.0 2023-10-09 13:50:36,554 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2775509.3333333335, ans=0.0 2023-10-09 13:50:38,549 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:50:38,616 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2775509.3333333335, ans=0.125 2023-10-09 13:50:45,414 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2775556.0, ans=6.0 2023-10-09 13:50:47,809 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2775556.0, ans=0.1 2023-10-09 13:50:49,902 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2775556.0, ans=0.125 2023-10-09 13:51:10,632 INFO [train.py:1031] (3/4) Epoch 14, batch 10050, loss[loss=0.2025, simple_loss=0.2587, pruned_loss=0.05332, ctc_loss=0.09941, over 16806.00 frames. ], tot_loss[loss=0.2105, simple_loss=0.2641, pruned_loss=0.05805, ctc_loss=0.1022, over 3299524.01 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:51:21,338 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2775649.3333333335, ans=0.0 2023-10-09 13:51:26,912 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2775696.0, ans=0.125 2023-10-09 13:51:29,877 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2775696.0, ans=0.125 2023-10-09 13:51:29,957 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2775696.0, ans=0.2 2023-10-09 13:51:47,126 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2775742.6666666665, ans=0.125 2023-10-09 13:51:54,055 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2775789.3333333335, ans=0.0 2023-10-09 13:52:13,733 INFO [train.py:1031] (3/4) Epoch 14, batch 10100, loss[loss=0.2153, simple_loss=0.2548, pruned_loss=0.0654, ctc_loss=0.1122, over 16663.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.2623, pruned_loss=0.05784, ctc_loss=0.1016, over 3293853.55 frames. 
], batch size: 130, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:52:22,173 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2775882.6666666665, ans=0.125 2023-10-09 13:52:30,295 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.211e+02 2.838e+02 3.162e+02 3.584e+02 6.355e+02, threshold=6.323e+02, percent-clipped=0.0 2023-10-09 13:52:58,547 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2776022.6666666665, ans=0.125 2023-10-09 13:53:05,281 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2776069.3333333335, ans=0.125 2023-10-09 13:53:08,445 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2776069.3333333335, ans=0.125 2023-10-09 13:53:12,915 INFO [train.py:1031] (3/4) Epoch 14, batch 10150, loss[loss=0.2538, simple_loss=0.2876, pruned_loss=0.08199, ctc_loss=0.14, over 16587.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.263, pruned_loss=0.05933, ctc_loss=0.104, over 3297347.71 frames. ], batch size: 351, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:53:46,242 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2776209.3333333335, ans=0.125 2023-10-09 13:53:52,158 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2776256.0, ans=0.1 2023-10-09 13:54:12,026 INFO [train.py:1031] (3/4) Epoch 14, batch 10200, loss[loss=0.2121, simple_loss=0.2673, pruned_loss=0.05895, ctc_loss=0.09753, over 16827.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.2655, pruned_loss=0.06115, ctc_loss=0.107, over 3295591.01 frames. ], batch size: 215, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:54:23,049 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2776396.0, ans=0.125 2023-10-09 13:54:28,915 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.256e+02 3.649e+02 4.243e+02 9.669e+02, threshold=7.298e+02, percent-clipped=6.0 2023-10-09 13:54:52,912 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2776489.3333333335, ans=0.125 2023-10-09 13:55:12,756 INFO [train.py:1031] (3/4) Epoch 14, batch 10250, loss[loss=0.2274, simple_loss=0.2667, pruned_loss=0.07011, ctc_loss=0.1198, over 16804.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2644, pruned_loss=0.06232, ctc_loss=0.1089, over 3304411.29 frames. ], batch size: 329, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:56:00,559 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2776769.3333333335, ans=0.125 2023-10-09 13:56:00,935 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.36 vs. 
limit=10.0 2023-10-09 13:56:03,291 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2776769.3333333335, ans=0.0 2023-10-09 13:56:14,047 INFO [train.py:1031] (3/4) Epoch 14, batch 10300, loss[loss=0.2245, simple_loss=0.2679, pruned_loss=0.06829, ctc_loss=0.1111, over 16795.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2658, pruned_loss=0.06377, ctc_loss=0.1116, over 3313966.77 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:56:19,680 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2776816.0, ans=0.1 2023-10-09 13:56:24,850 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2776862.6666666665, ans=0.125 2023-10-09 13:56:32,957 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2776862.6666666665, ans=0.125 2023-10-09 13:56:33,609 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.662e+02 3.333e+02 3.833e+02 4.530e+02 9.139e+02, threshold=7.666e+02, percent-clipped=3.0 2023-10-09 13:57:06,876 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2777002.6666666665, ans=0.125 2023-10-09 13:57:13,236 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2777002.6666666665, ans=0.0 2023-10-09 13:57:16,378 INFO [train.py:1031] (3/4) Epoch 14, batch 10350, loss[loss=0.2178, simple_loss=0.29, pruned_loss=0.05373, ctc_loss=0.09539, over 16880.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2671, pruned_loss=0.06256, ctc_loss=0.1098, over 3311293.66 frames. ], batch size: 215, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:57:19,418 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2777049.3333333335, ans=0.0 2023-10-09 13:57:19,872 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=15.0 2023-10-09 13:57:38,095 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2777096.0, ans=0.0 2023-10-09 13:57:42,928 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:57:50,477 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2777142.6666666665, ans=0.0 2023-10-09 13:58:07,064 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2777236.0, ans=0.125 2023-10-09 13:58:08,717 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2777236.0, ans=0.0 2023-10-09 13:58:17,994 INFO [train.py:1031] (3/4) Epoch 14, batch 10400, loss[loss=0.2101, simple_loss=0.2772, pruned_loss=0.0535, ctc_loss=0.08989, over 16843.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2679, pruned_loss=0.0576, ctc_loss=0.102, over 3302389.62 frames. 
], batch size: 202, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:58:24,350 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2777282.6666666665, ans=0.0 2023-10-09 13:58:25,582 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2777282.6666666665, ans=0.125 2023-10-09 13:58:37,078 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.958e+02 3.554e+02 4.330e+02 8.227e+02, threshold=7.107e+02, percent-clipped=1.0 2023-10-09 13:58:47,606 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2777376.0, ans=0.125 2023-10-09 13:59:01,181 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2023-10-09 13:59:04,687 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2777422.6666666665, ans=0.0 2023-10-09 13:59:09,906 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2023-10-09 13:59:18,217 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2777469.3333333335, ans=0.07 2023-10-09 13:59:20,075 INFO [train.py:1031] (3/4) Epoch 14, batch 10450, loss[loss=0.2918, simple_loss=0.3268, pruned_loss=0.09351, ctc_loss=0.1743, over 16701.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2745, pruned_loss=0.05984, ctc_loss=0.1057, over 3309272.83 frames. ], batch size: 328, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:59:23,440 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2777516.0, ans=10.0 2023-10-09 13:59:27,756 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2777516.0, ans=0.125 2023-10-09 13:59:43,336 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.47 vs. limit=12.0 2023-10-09 13:59:55,789 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2777656.0, ans=10.0 2023-10-09 13:59:59,485 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2777656.0, ans=0.0 2023-10-09 14:00:21,485 INFO [train.py:1031] (3/4) Epoch 14, batch 10500, loss[loss=0.209, simple_loss=0.2579, pruned_loss=0.05932, ctc_loss=0.1035, over 16760.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2747, pruned_loss=0.06207, ctc_loss=0.1093, over 3317007.98 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 14:00:42,237 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2777796.0, ans=0.2 2023-10-09 14:00:43,439 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.453e+02 3.496e+02 3.857e+02 4.755e+02 1.181e+03, threshold=7.715e+02, percent-clipped=1.0 2023-10-09 14:00:53,276 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.74 vs. 
limit=6.0 2023-10-09 14:01:04,190 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2777889.3333333335, ans=0.0 2023-10-09 14:01:14,793 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2777936.0, ans=0.125 2023-10-09 14:01:22,069 INFO [train.py:1031] (3/4) Epoch 14, batch 10550, loss[loss=0.2088, simple_loss=0.2655, pruned_loss=0.05683, ctc_loss=0.09597, over 11746.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2703, pruned_loss=0.06111, ctc_loss=0.1073, over 3315692.76 frames. ], batch size: 35, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 14:01:29,769 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2777982.6666666665, ans=0.125 2023-10-09 14:01:44,646 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2778029.3333333335, ans=0.125 2023-10-09 14:01:48,546 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.32 vs. limit=10.0 2023-10-09 14:01:49,821 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2778076.0, ans=0.0 2023-10-09 14:02:05,113 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2778122.6666666665, ans=0.125 2023-10-09 14:02:06,588 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=12.0 2023-10-09 14:02:12,476 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2778169.3333333335, ans=0.1 2023-10-09 14:02:22,375 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2778169.3333333335, ans=0.05 2023-10-09 14:02:24,162 INFO [train.py:1031] (3/4) Epoch 14, batch 10600, loss[loss=0.2339, simple_loss=0.3004, pruned_loss=0.06302, ctc_loss=0.1034, over 16719.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2735, pruned_loss=0.06053, ctc_loss=0.1068, over 3316477.35 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 14:02:41,994 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:02:47,791 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+02 3.164e+02 3.650e+02 4.243e+02 8.211e+02, threshold=7.299e+02, percent-clipped=2.0 2023-10-09 14:03:00,379 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.96 vs. limit=15.0 2023-10-09 14:03:26,248 INFO [train.py:1031] (3/4) Epoch 14, batch 10650, loss[loss=0.1867, simple_loss=0.2601, pruned_loss=0.04073, ctc_loss=0.07966, over 16876.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2786, pruned_loss=0.06244, ctc_loss=0.1098, over 3323002.63 frames. 
], batch size: 228, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 14:03:46,215 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2778496.0, ans=0.125 2023-10-09 14:04:19,949 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2778636.0, ans=0.0 2023-10-09 14:04:28,565 INFO [train.py:1031] (3/4) Epoch 14, batch 10700, loss[loss=0.205, simple_loss=0.2689, pruned_loss=0.0532, ctc_loss=0.08696, over 16705.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.275, pruned_loss=0.05962, ctc_loss=0.105, over 3320591.05 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:04:39,734 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2778729.3333333335, ans=0.0 2023-10-09 14:04:51,544 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-10-09 14:04:52,616 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+02 3.058e+02 3.576e+02 4.175e+02 9.953e+02, threshold=7.153e+02, percent-clipped=1.0 2023-10-09 14:05:18,737 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2778869.3333333335, ans=0.125 2023-10-09 14:05:21,824 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2023-10-09 14:05:26,409 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2778869.3333333335, ans=0.125 2023-10-09 14:05:27,710 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2023-10-09 14:05:32,626 INFO [train.py:1031] (3/4) Epoch 14, batch 10750, loss[loss=0.2522, simple_loss=0.2934, pruned_loss=0.07752, ctc_loss=0.1398, over 16189.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2802, pruned_loss=0.06216, ctc_loss=0.1088, over 3315226.99 frames. ], batch size: 463, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:05:43,889 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2778962.6666666665, ans=0.2 2023-10-09 14:05:44,087 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-10-09 14:05:48,075 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2778962.6666666665, ans=0.0 2023-10-09 14:05:49,203 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2778962.6666666665, ans=0.125 2023-10-09 14:05:54,176 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2778962.6666666665, ans=0.125 2023-10-09 14:05:57,385 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.40 vs. 
limit=15.0 2023-10-09 14:05:58,033 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2779009.3333333335, ans=0.125 2023-10-09 14:06:04,501 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2779009.3333333335, ans=0.2 2023-10-09 14:06:10,786 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.21 vs. limit=10.0 2023-10-09 14:06:23,394 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2779102.6666666665, ans=0.125 2023-10-09 14:06:34,932 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-10-09 14:06:35,718 INFO [train.py:1031] (3/4) Epoch 14, batch 10800, loss[loss=0.2021, simple_loss=0.2535, pruned_loss=0.05556, ctc_loss=0.09889, over 16624.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2784, pruned_loss=0.06335, ctc_loss=0.1107, over 3307597.96 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:06:43,492 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:06:55,111 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:07:01,261 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.600e+02 3.349e+02 3.657e+02 4.515e+02 8.469e+02, threshold=7.313e+02, percent-clipped=4.0 2023-10-09 14:07:26,239 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.87 vs. limit=6.0 2023-10-09 14:07:32,736 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2779336.0, ans=0.09899494936611666 2023-10-09 14:07:36,229 INFO [train.py:1031] (3/4) Epoch 14, batch 10850, loss[loss=0.2069, simple_loss=0.267, pruned_loss=0.05585, ctc_loss=0.08743, over 16919.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2742, pruned_loss=0.06303, ctc_loss=0.11, over 3302978.05 frames. ], batch size: 78, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:08:32,852 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2779569.3333333335, ans=0.0 2023-10-09 14:08:33,082 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2779569.3333333335, ans=15.0 2023-10-09 14:08:38,606 INFO [train.py:1031] (3/4) Epoch 14, batch 10900, loss[loss=0.1987, simple_loss=0.253, pruned_loss=0.0541, ctc_loss=0.09021, over 16767.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2688, pruned_loss=0.06239, ctc_loss=0.1089, over 3298376.36 frames. ], batch size: 95, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:08:39,159 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.41 vs. 
limit=22.5 2023-10-09 14:09:05,442 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.210e+02 3.855e+02 4.821e+02 1.226e+03, threshold=7.710e+02, percent-clipped=2.0 2023-10-09 14:09:16,187 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2779756.0, ans=0.125 2023-10-09 14:09:39,057 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=12.0 2023-10-09 14:09:39,568 INFO [train.py:1031] (3/4) Epoch 14, batch 10950, loss[loss=0.2033, simple_loss=0.2439, pruned_loss=0.0604, ctc_loss=0.1049, over 16711.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.2648, pruned_loss=0.06148, ctc_loss=0.1076, over 3308048.58 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:09:40,085 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5 2023-10-09 14:09:47,371 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2779849.3333333335, ans=0.025 2023-10-09 14:09:52,853 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2779896.0, ans=0.125 2023-10-09 14:10:00,802 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2779896.0, ans=0.1 2023-10-09 14:10:14,986 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2779942.6666666665, ans=0.025 2023-10-09 14:10:42,344 INFO [train.py:1031] (3/4) Epoch 14, batch 11000, loss[loss=0.2394, simple_loss=0.2835, pruned_loss=0.07204, ctc_loss=0.128, over 16855.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.265, pruned_loss=0.06275, ctc_loss=0.1098, over 3305574.04 frames. ], batch size: 292, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:10:57,761 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2780129.3333333335, ans=0.0 2023-10-09 14:11:11,554 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.328e+02 3.883e+02 5.018e+02 9.874e+02, threshold=7.766e+02, percent-clipped=3.0 2023-10-09 14:11:18,879 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=22.5 2023-10-09 14:11:24,359 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2780222.6666666665, ans=0.1 2023-10-09 14:11:46,308 INFO [train.py:1031] (3/4) Epoch 14, batch 11050, loss[loss=0.2316, simple_loss=0.309, pruned_loss=0.0549, ctc_loss=0.1111, over 15183.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2725, pruned_loss=0.0656, ctc_loss=0.1147, over 3304262.73 frames. 
], batch size: 527, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:11:52,317 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2780316.0, ans=0.0 2023-10-09 14:12:01,083 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2780362.6666666665, ans=0.125 2023-10-09 14:12:25,170 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2780456.0, ans=0.125 2023-10-09 14:12:27,179 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2780456.0, ans=0.125 2023-10-09 14:12:36,911 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2780502.6666666665, ans=0.0 2023-10-09 14:12:36,957 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2780502.6666666665, ans=0.125 2023-10-09 14:12:38,066 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2780502.6666666665, ans=0.1 2023-10-09 14:12:49,760 INFO [train.py:1031] (3/4) Epoch 14, batch 11100, loss[loss=0.214, simple_loss=0.3093, pruned_loss=0.04352, ctc_loss=0.0791, over 15213.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2742, pruned_loss=0.06424, ctc_loss=0.1122, over 3291774.46 frames. ], batch size: 529, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:13:01,678 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2780596.0, ans=0.125 2023-10-09 14:13:03,061 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-10-09 14:13:18,549 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.716e+02 3.611e+02 4.307e+02 5.885e+02 1.880e+03, threshold=8.614e+02, percent-clipped=7.0 2023-10-09 14:13:21,084 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2780642.6666666665, ans=0.0 2023-10-09 14:13:46,560 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2780736.0, ans=0.125 2023-10-09 14:13:51,664 INFO [train.py:1031] (3/4) Epoch 14, batch 11150, loss[loss=0.2085, simple_loss=0.2593, pruned_loss=0.0586, ctc_loss=0.1012, over 16816.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2718, pruned_loss=0.06353, ctc_loss=0.1104, over 3290037.80 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:13:59,305 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2023-10-09 14:14:02,775 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2780782.6666666665, ans=0.125 2023-10-09 14:14:05,855 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2780829.3333333335, ans=0.1 2023-10-09 14:14:23,309 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. 
limit=15.0 2023-10-09 14:14:32,284 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2780922.6666666665, ans=0.125 2023-10-09 14:14:40,978 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0 2023-10-09 14:14:42,598 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2780969.3333333335, ans=0.0 2023-10-09 14:14:51,791 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0 2023-10-09 14:14:53,093 INFO [train.py:1031] (3/4) Epoch 14, batch 11200, loss[loss=0.2775, simple_loss=0.3232, pruned_loss=0.08425, ctc_loss=0.1584, over 15271.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2731, pruned_loss=0.06489, ctc_loss=0.1127, over 3290809.62 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:14:56,421 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=22.5 2023-10-09 14:15:00,292 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0 2023-10-09 14:15:08,217 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2781062.6666666665, ans=0.0 2023-10-09 14:15:20,071 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.26 vs. limit=10.0 2023-10-09 14:15:25,072 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+02 3.168e+02 3.497e+02 4.095e+02 1.585e+03, threshold=6.993e+02, percent-clipped=3.0 2023-10-09 14:15:40,064 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2781156.0, ans=0.0 2023-10-09 14:15:41,514 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=22.5 2023-10-09 14:15:45,423 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.37 vs. limit=6.0 2023-10-09 14:15:47,423 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2781202.6666666665, ans=0.125 2023-10-09 14:15:49,457 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2781202.6666666665, ans=0.125 2023-10-09 14:15:49,547 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2781202.6666666665, ans=0.2 2023-10-09 14:15:50,489 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2781202.6666666665, ans=0.125 2023-10-09 14:15:55,972 INFO [train.py:1031] (3/4) Epoch 14, batch 11250, loss[loss=0.2271, simple_loss=0.3041, pruned_loss=0.05471, ctc_loss=0.1018, over 16815.00 frames. ], tot_loss[loss=0.2318, simple_loss=0.2849, pruned_loss=0.06628, ctc_loss=0.1155, over 3292865.64 frames. 
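
Most of the traffic in this log is scaling.py's "ScheduledFloat: name=..., batch_count=..., ans=..." entries. Each names a scalar hyperparameter inside the Zipformer (dropout rates, balancer probabilities, skip rates, bypass scale bounds) whose current value "ans" is a function of the global batch count. A plausible minimal version, assuming a piecewise-linear schedule over (batch_count, value) breakpoints; the class below is a sketch, not scaling.py's actual ScheduledFloat:

class ScheduledFloat:
    # Sketch: linear interpolation between (batch_count, value) breakpoints,
    # clamped to the end values outside the breakpoint range.
    def __init__(self, *points):
        self.points = sorted(points)  # e.g. ((0.0, 0.3), (20000.0, 0.1))

    def __call__(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

At a batch_count around 2.78e6, any such schedule has long since flattened out, which is consistent with the logged "ans" values (0.125, 0.1, 0.025, 0.2, ...) repeating unchanged from entry to entry.
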
], batch size: 188, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:16:11,874 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2781296.0, ans=0.2 2023-10-09 14:16:13,592 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2781296.0, ans=0.1 2023-10-09 14:16:22,609 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0 2023-10-09 14:16:55,374 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0 2023-10-09 14:17:03,087 INFO [train.py:1031] (3/4) Epoch 14, batch 11300, loss[loss=0.2194, simple_loss=0.2766, pruned_loss=0.06152, ctc_loss=0.09813, over 16717.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2905, pruned_loss=0.06445, ctc_loss=0.1127, over 3293613.20 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:17:05,052 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2781482.6666666665, ans=0.0 2023-10-09 14:17:14,528 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2781529.3333333335, ans=0.2 2023-10-09 14:17:21,896 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=22.5 2023-10-09 14:17:23,706 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2781529.3333333335, ans=0.125 2023-10-09 14:17:25,979 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=12.0 2023-10-09 14:17:28,190 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.92 vs. limit=15.0 2023-10-09 14:17:33,912 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+02 3.091e+02 3.848e+02 4.979e+02 9.254e+02, threshold=7.696e+02, percent-clipped=6.0 2023-10-09 14:17:50,273 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2781622.6666666665, ans=0.125 2023-10-09 14:17:54,392 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2781669.3333333335, ans=0.07 2023-10-09 14:18:04,196 INFO [train.py:1031] (3/4) Epoch 14, batch 11350, loss[loss=0.2039, simple_loss=0.2599, pruned_loss=0.05468, ctc_loss=0.09641, over 16912.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2879, pruned_loss=0.06172, ctc_loss=0.1084, over 3299380.33 frames. 
], batch size: 202, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:18:07,053 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2781716.0, ans=0.125 2023-10-09 14:18:08,653 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2781716.0, ans=0.0 2023-10-09 14:18:52,780 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2781902.6666666665, ans=0.04949747468305833 2023-10-09 14:19:01,320 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2781902.6666666665, ans=0.125 2023-10-09 14:19:05,834 INFO [train.py:1031] (3/4) Epoch 14, batch 11400, loss[loss=0.2693, simple_loss=0.3136, pruned_loss=0.08571, ctc_loss=0.1338, over 17033.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2866, pruned_loss=0.0632, ctc_loss=0.1104, over 3307504.82 frames. ], batch size: 86, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:19:24,860 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2781996.0, ans=0.125 2023-10-09 14:19:25,876 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2781996.0, ans=0.125 2023-10-09 14:19:34,135 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2782042.6666666665, ans=0.125 2023-10-09 14:19:36,216 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2782042.6666666665, ans=0.04949747468305833 2023-10-09 14:19:37,884 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.191e+02 3.489e+02 4.256e+02 5.952e+02, threshold=6.979e+02, percent-clipped=0.0 2023-10-09 14:19:58,763 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:20:01,910 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=12.0 2023-10-09 14:20:07,767 INFO [train.py:1031] (3/4) Epoch 14, batch 11450, loss[loss=0.2089, simple_loss=0.2679, pruned_loss=0.05568, ctc_loss=0.09623, over 16925.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2843, pruned_loss=0.06432, ctc_loss=0.1122, over 3316662.45 frames. ], batch size: 242, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:20:18,194 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.32 vs. limit=10.0 2023-10-09 14:20:21,041 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2782229.3333333335, ans=0.125 2023-10-09 14:20:34,204 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2782276.0, ans=0.125 2023-10-09 14:21:08,870 INFO [train.py:1031] (3/4) Epoch 14, batch 11500, loss[loss=0.2405, simple_loss=0.294, pruned_loss=0.07007, ctc_loss=0.1175, over 16758.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2858, pruned_loss=0.06641, ctc_loss=0.1157, over 3313500.69 frames. 
], batch size: 140, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:21:09,888 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2782416.0, ans=0.125 2023-10-09 14:21:22,471 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2782462.6666666665, ans=0.1 2023-10-09 14:21:32,168 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2782462.6666666665, ans=0.0 2023-10-09 14:21:38,626 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2782509.3333333335, ans=0.125 2023-10-09 14:21:39,762 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2782509.3333333335, ans=0.125 2023-10-09 14:21:43,330 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2782509.3333333335, ans=0.125 2023-10-09 14:21:44,090 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.359e+02 3.820e+02 4.365e+02 7.019e+02, threshold=7.640e+02, percent-clipped=1.0 2023-10-09 14:21:46,664 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=15.0 2023-10-09 14:21:48,445 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2782556.0, ans=0.05 2023-10-09 14:22:05,094 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2782602.6666666665, ans=0.04949747468305833 2023-10-09 14:22:11,655 INFO [train.py:1031] (3/4) Epoch 14, batch 11550, loss[loss=0.1889, simple_loss=0.2684, pruned_loss=0.04057, ctc_loss=0.07097, over 16832.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2888, pruned_loss=0.06798, ctc_loss=0.1185, over 3313237.39 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:22:26,915 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=22.5 2023-10-09 14:22:32,672 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2782696.0, ans=0.125 2023-10-09 14:22:56,734 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2782789.3333333335, ans=0.125 2023-10-09 14:23:08,090 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2782836.0, ans=0.0 2023-10-09 14:23:11,164 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2782836.0, ans=0.0 2023-10-09 14:23:15,897 INFO [train.py:1031] (3/4) Epoch 14, batch 11600, loss[loss=0.2699, simple_loss=0.3776, pruned_loss=0.05775, ctc_loss=0.1167, over 16291.00 frames. ], tot_loss[loss=0.238, simple_loss=0.2949, pruned_loss=0.06709, ctc_loss=0.1174, over 3308094.62 frames. 
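
The balancer entries above (balancer1.prob, min_positive, max_positive, min_abs, max_abs) belong to modules that keep per-channel activation statistics inside a target range, with prob the scheduled probability that the constraint is enforced on a given batch. A rough sketch of the statistics being policed is below, under the assumption that a violation is turned into an additive penalty; icefall's Balancer actually works by modifying gradients in the backward pass, so this is only an illustration. The default bounds are values that appear in this log:

import torch

def balancer_penalty(x, min_positive=0.05, max_positive=0.95,
                     min_abs=0.5, max_abs=10.0):
    # x: (..., num_channels); the penalty is zero when every channel's
    # fraction-of-positive-values and mean |activation| sit inside the
    # target ranges, and grows linearly with the violation otherwise.
    dims = tuple(range(x.dim() - 1))  # average over all but the channel dim
    frac_positive = (x > 0).float().mean(dim=dims)
    mean_abs = x.abs().mean(dim=dims)
    penalty = ((min_positive - frac_positive).clamp(min=0)
               + (frac_positive - max_positive).clamp(min=0)
               + (min_abs - mean_abs).clamp(min=0)
               + (mean_abs - max_abs).clamp(min=0))
    return penalty.sum()
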
], batch size: 463, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:23:32,074 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2782929.3333333335, ans=0.125 2023-10-09 14:23:52,925 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+02 3.363e+02 4.073e+02 4.861e+02 8.872e+02, threshold=8.146e+02, percent-clipped=3.0 2023-10-09 14:24:04,027 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2783022.6666666665, ans=0.2 2023-10-09 14:24:07,737 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2783069.3333333335, ans=0.2 2023-10-09 14:24:11,941 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2783069.3333333335, ans=0.1 2023-10-09 14:24:19,981 INFO [train.py:1031] (3/4) Epoch 14, batch 11650, loss[loss=0.2229, simple_loss=0.2791, pruned_loss=0.0629, ctc_loss=0.1024, over 16836.00 frames. ], tot_loss[loss=0.2425, simple_loss=0.3006, pruned_loss=0.06826, ctc_loss=0.1196, over 3304618.28 frames. ], batch size: 176, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:24:39,386 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2783162.6666666665, ans=0.125 2023-10-09 14:24:51,975 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2783209.3333333335, ans=0.125 2023-10-09 14:24:58,967 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2783256.0, ans=0.125 2023-10-09 14:25:10,891 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2783302.6666666665, ans=0.1 2023-10-09 14:25:16,952 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2783302.6666666665, ans=0.0 2023-10-09 14:25:21,985 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2783349.3333333335, ans=0.125 2023-10-09 14:25:23,312 INFO [train.py:1031] (3/4) Epoch 14, batch 11700, loss[loss=0.2188, simple_loss=0.2668, pruned_loss=0.0631, ctc_loss=0.1115, over 16706.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.2938, pruned_loss=0.06725, ctc_loss=0.1176, over 3301001.58 frames. 
], batch size: 308, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:25:29,746 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2783349.3333333335, ans=0.125 2023-10-09 14:25:42,596 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2783396.0, ans=0.0 2023-10-09 14:25:43,615 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2783396.0, ans=0.1 2023-10-09 14:25:58,213 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.451e+02 4.279e+02 5.142e+02 9.107e+02, threshold=8.558e+02, percent-clipped=4.0 2023-10-09 14:26:00,737 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2783489.3333333335, ans=0.2 2023-10-09 14:26:11,081 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2783536.0, ans=0.2 2023-10-09 14:26:23,053 INFO [train.py:1031] (3/4) Epoch 14, batch 11750, loss[loss=0.2122, simple_loss=0.2594, pruned_loss=0.0614, ctc_loss=0.1057, over 16786.00 frames. ], tot_loss[loss=0.2318, simple_loss=0.2855, pruned_loss=0.06601, ctc_loss=0.1152, over 3297864.74 frames. ], batch size: 242, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:26:30,515 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2783582.6666666665, ans=0.125 2023-10-09 14:26:36,666 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2783629.3333333335, ans=0.125 2023-10-09 14:26:36,696 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2783629.3333333335, ans=0.1 2023-10-09 14:26:46,454 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2783676.0, ans=0.125 2023-10-09 14:26:49,260 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2783676.0, ans=0.1 2023-10-09 14:27:14,186 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2783769.3333333335, ans=0.125 2023-10-09 14:27:14,249 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2783769.3333333335, ans=0.0 2023-10-09 14:27:24,342 INFO [train.py:1031] (3/4) Epoch 14, batch 11800, loss[loss=0.2416, simple_loss=0.2875, pruned_loss=0.07219, ctc_loss=0.128, over 16555.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2788, pruned_loss=0.06433, ctc_loss=0.1121, over 3307304.79 frames. 
], batch size: 351, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:27:24,760 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2783816.0, ans=0.0 2023-10-09 14:28:03,481 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.479e+02 3.032e+02 3.578e+02 4.296e+02 8.317e+02, threshold=7.156e+02, percent-clipped=0.0 2023-10-09 14:28:07,772 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2783956.0, ans=0.125 2023-10-09 14:28:15,035 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2783956.0, ans=0.0 2023-10-09 14:28:21,152 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2784002.6666666665, ans=0.0 2023-10-09 14:28:29,807 INFO [train.py:1031] (3/4) Epoch 14, batch 11850, loss[loss=0.248, simple_loss=0.3657, pruned_loss=0.04705, ctc_loss=0.09043, over 15081.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2844, pruned_loss=0.06407, ctc_loss=0.1124, over 3303564.00 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:28:59,735 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2784142.6666666665, ans=0.125 2023-10-09 14:29:15,822 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2784189.3333333335, ans=0.0 2023-10-09 14:29:16,955 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2784189.3333333335, ans=0.1 2023-10-09 14:29:27,798 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=22.5 2023-10-09 14:29:33,115 INFO [train.py:1031] (3/4) Epoch 14, batch 11900, loss[loss=0.2038, simple_loss=0.2981, pruned_loss=0.0395, ctc_loss=0.07594, over 16284.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2882, pruned_loss=0.06324, ctc_loss=0.1113, over 3302095.94 frames. ], batch size: 463, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:29:46,578 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2784329.3333333335, ans=0.125 2023-10-09 14:29:46,880 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2023-10-09 14:30:14,076 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.208e+02 3.772e+02 4.590e+02 1.035e+03, threshold=7.543e+02, percent-clipped=4.0 2023-10-09 14:30:25,837 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2784469.3333333335, ans=0.125 2023-10-09 14:30:36,486 INFO [train.py:1031] (3/4) Epoch 14, batch 11950, loss[loss=0.2402, simple_loss=0.3037, pruned_loss=0.0668, ctc_loss=0.1078, over 12666.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2908, pruned_loss=0.06537, ctc_loss=0.1144, over 3291785.59 frames. 
], batch size: 38, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:30:50,520 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2784562.6666666665, ans=0.125 2023-10-09 14:31:00,381 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2784562.6666666665, ans=0.2 2023-10-09 14:31:01,381 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2784609.3333333335, ans=0.125 2023-10-09 14:31:17,822 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0 2023-10-09 14:31:28,016 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2784702.6666666665, ans=0.2 2023-10-09 14:31:40,164 INFO [train.py:1031] (3/4) Epoch 14, batch 12000, loss[loss=0.2142, simple_loss=0.2729, pruned_loss=0.05883, ctc_loss=0.09448, over 16811.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2942, pruned_loss=0.06587, ctc_loss=0.1156, over 3291855.59 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:31:40,164 INFO [train.py:1054] (3/4) Computing validation loss 2023-10-09 14:31:54,599 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2358, simple_loss=0.3055, pruned_loss=0.064, ctc_loss=0.09509, over 1796401.00 frames. 2023-10-09 14:31:54,600 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB 2023-10-09 14:31:59,459 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2023-10-09 14:32:09,207 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2784796.0, ans=0.0 2023-10-09 14:32:13,780 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2784796.0, ans=0.125 2023-10-09 14:32:25,414 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2784842.6666666665, ans=0.1 2023-10-09 14:32:30,677 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2784842.6666666665, ans=0.1 2023-10-09 14:32:35,878 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.57 vs. limit=22.5 2023-10-09 14:32:36,798 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+02 3.474e+02 4.193e+02 5.077e+02 1.283e+03, threshold=8.386e+02, percent-clipped=9.0 2023-10-09 14:32:57,429 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2784936.0, ans=10.0 2023-10-09 14:33:00,830 INFO [train.py:1031] (3/4) Epoch 14, batch 12050, loss[loss=0.2287, simple_loss=0.2855, pruned_loss=0.06507, ctc_loss=0.1044, over 16851.00 frames. ], tot_loss[loss=0.24, simple_loss=0.2987, pruned_loss=0.06728, ctc_loss=0.117, over 3281295.68 frames. 
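
The batch-12000 entries above show the periodic validation pass: train.py interleaves "Computing validation loss", a full pass over the validation set (1,796,401 frames here), and a report of peak CUDA memory before resuming training. A compact sketch of that hook; compute_validation_loss and valid_dl are stand-ins for the real helpers in train.py, not its actual API:

import logging
import torch

def maybe_validate(model, valid_dl, batch_idx_train, valid_interval=3000,
                   device="cuda:3"):
    # Runs every `valid_interval` batches; batch 12000 above is one such point.
    if batch_idx_train % valid_interval != 0:
        return
    logging.info("Computing validation loss")
    model.eval()
    tot_loss = tot_frames = 0.0
    with torch.no_grad():
        for batch in valid_dl:
            # hypothetical helper returning (mean loss, frame count)
            loss, num_frames = compute_validation_loss(model, batch)
            tot_loss += float(loss) * num_frames
            tot_frames += num_frames
    model.train()
    logging.info("validation: loss=%.4f, over %.2f frames."
                 % (tot_loss / tot_frames, tot_frames))
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    logging.info("Maximum memory allocated so far is %dMB" % mb)
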
], batch size: 215, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:33:09,746 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2784982.6666666665, ans=0.0 2023-10-09 14:33:15,291 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2785029.3333333335, ans=0.125 2023-10-09 14:33:21,140 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2785029.3333333335, ans=0.0 2023-10-09 14:33:34,975 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.03 vs. limit=10.0 2023-10-09 14:33:50,529 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2785169.3333333335, ans=0.0 2023-10-09 14:33:56,450 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2785169.3333333335, ans=0.2 2023-10-09 14:33:57,457 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2785169.3333333335, ans=0.125 2023-10-09 14:34:03,693 INFO [train.py:1031] (3/4) Epoch 14, batch 12100, loss[loss=0.2288, simple_loss=0.28, pruned_loss=0.06536, ctc_loss=0.117, over 16405.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.2972, pruned_loss=0.06695, ctc_loss=0.1162, over 3275038.88 frames. ], batch size: 463, lr: 2.57e-03, grad_scale: 8.0 2023-10-09 14:34:04,071 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2785216.0, ans=0.0 2023-10-09 14:34:45,439 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.634e+02 3.559e+02 4.280e+02 5.187e+02 9.097e+02, threshold=8.560e+02, percent-clipped=2.0 2023-10-09 14:35:03,360 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2785402.6666666665, ans=0.2 2023-10-09 14:35:06,632 INFO [train.py:1031] (3/4) Epoch 14, batch 12150, loss[loss=0.207, simple_loss=0.2777, pruned_loss=0.05058, ctc_loss=0.08781, over 16801.00 frames. ], tot_loss[loss=0.2379, simple_loss=0.2952, pruned_loss=0.06695, ctc_loss=0.1166, over 3276203.80 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:35:11,223 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2785449.3333333335, ans=0.125 2023-10-09 14:35:22,805 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2785496.0, ans=0.1 2023-10-09 14:35:27,061 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2785496.0, ans=0.1 2023-10-09 14:35:46,968 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2785589.3333333335, ans=0.2 2023-10-09 14:36:03,049 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2785636.0, ans=0.0 2023-10-09 14:36:09,748 INFO [train.py:1031] (3/4) Epoch 14, batch 12200, loss[loss=0.3173, simple_loss=0.3821, pruned_loss=0.09187, ctc_loss=0.172, over 16623.00 frames. 
], tot_loss[loss=0.2409, simple_loss=0.3017, pruned_loss=0.06655, ctc_loss=0.1177, over 3274402.87 frames. ], batch size: 384, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:36:36,308 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2785776.0, ans=0.125 2023-10-09 14:36:41,393 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2785776.0, ans=0.0 2023-10-09 14:36:45,660 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:36:52,410 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.614e+02 4.622e+02 6.130e+02 1.347e+03, threshold=9.244e+02, percent-clipped=11.0 2023-10-09 14:37:12,094 INFO [train.py:1031] (3/4) Epoch 14, batch 12250, loss[loss=0.202, simple_loss=0.2586, pruned_loss=0.05324, ctc_loss=0.0974, over 16788.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2958, pruned_loss=0.06459, ctc_loss=0.115, over 3289362.34 frames. ], batch size: 202, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:37:28,026 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2785962.6666666665, ans=0.125 2023-10-09 14:37:34,574 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2786009.3333333335, ans=0.125 2023-10-09 14:37:51,259 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2786056.0, ans=0.125 2023-10-09 14:38:07,079 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:38:12,140 INFO [train.py:1031] (3/4) Epoch 14, batch 12300, loss[loss=0.2103, simple_loss=0.2654, pruned_loss=0.05825, ctc_loss=0.09677, over 16915.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2864, pruned_loss=0.06352, ctc_loss=0.1127, over 3287378.44 frames. ], batch size: 90, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:38:39,958 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2786242.6666666665, ans=0.1 2023-10-09 14:38:47,481 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:38:47,560 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2786289.3333333335, ans=0.125 2023-10-09 14:38:50,187 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2786289.3333333335, ans=0.1 2023-10-09 14:38:55,170 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+02 3.070e+02 3.744e+02 4.897e+02 1.313e+03, threshold=7.488e+02, percent-clipped=1.0 2023-10-09 14:38:56,143 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2786289.3333333335, ans=0.2 2023-10-09 14:39:02,445 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2786336.0, ans=0.0 2023-10-09 14:39:05,447 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.47 vs. 
limit=15.0 2023-10-09 14:39:08,282 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2786336.0, ans=0.125 2023-10-09 14:39:09,760 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=12.0 2023-10-09 14:39:13,379 INFO [train.py:1031] (3/4) Epoch 14, batch 12350, loss[loss=0.1681, simple_loss=0.2385, pruned_loss=0.03536, ctc_loss=0.06732, over 16752.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.29, pruned_loss=0.06365, ctc_loss=0.1127, over 3297154.11 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:39:22,298 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2786382.6666666665, ans=0.09899494936611666 2023-10-09 14:39:33,124 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2786429.3333333335, ans=0.07 2023-10-09 14:39:33,483 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2786429.3333333335, ans=15.0 2023-10-09 14:39:51,766 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2023-10-09 14:40:03,265 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2786569.3333333335, ans=0.2 2023-10-09 14:40:07,108 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2786569.3333333335, ans=0.125 2023-10-09 14:40:12,564 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2786569.3333333335, ans=0.07 2023-10-09 14:40:14,932 INFO [train.py:1031] (3/4) Epoch 14, batch 12400, loss[loss=0.2155, simple_loss=0.2769, pruned_loss=0.0574, ctc_loss=0.09818, over 16633.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2879, pruned_loss=0.06134, ctc_loss=0.1094, over 3291459.70 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:40:24,968 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2786616.0, ans=0.125 2023-10-09 14:40:43,850 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2786709.3333333335, ans=0.0 2023-10-09 14:41:00,376 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.233e+02 3.612e+02 4.098e+02 6.929e+02, threshold=7.223e+02, percent-clipped=0.0 2023-10-09 14:41:01,879 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2786756.0, ans=0.0 2023-10-09 14:41:16,829 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2786849.3333333335, ans=0.0 2023-10-09 14:41:17,630 INFO [train.py:1031] (3/4) Epoch 14, batch 12450, loss[loss=0.2443, simple_loss=0.3202, pruned_loss=0.06088, ctc_loss=0.1164, over 16224.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2882, pruned_loss=0.06139, ctc_loss=0.1096, over 3299643.28 frames. 
], batch size: 463, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:41:19,283 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=15.0 2023-10-09 14:41:19,320 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.64 vs. limit=6.0 2023-10-09 14:41:22,751 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2786849.3333333335, ans=0.125 2023-10-09 14:41:39,202 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2786896.0, ans=0.0 2023-10-09 14:41:48,439 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2023-10-09 14:41:59,476 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2786989.3333333335, ans=0.125 2023-10-09 14:42:09,806 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=22.5 2023-10-09 14:42:19,505 INFO [train.py:1031] (3/4) Epoch 14, batch 12500, loss[loss=0.1871, simple_loss=0.2431, pruned_loss=0.04798, ctc_loss=0.08762, over 16754.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2862, pruned_loss=0.05971, ctc_loss=0.1067, over 3306453.49 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:42:21,396 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2787082.6666666665, ans=0.125 2023-10-09 14:42:43,913 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2787176.0, ans=0.0 2023-10-09 14:42:43,977 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2787176.0, ans=0.0 2023-10-09 14:42:50,982 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2787176.0, ans=0.125 2023-10-09 14:42:52,651 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2787176.0, ans=0.0 2023-10-09 14:43:00,000 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2787222.6666666665, ans=0.125 2023-10-09 14:43:03,920 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2787222.6666666665, ans=0.125 2023-10-09 14:43:05,062 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2787222.6666666665, ans=0.125 2023-10-09 14:43:07,996 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+02 2.957e+02 3.304e+02 4.556e+02 8.176e+02, threshold=6.608e+02, percent-clipped=1.0 2023-10-09 14:43:23,454 INFO [train.py:1031] (3/4) Epoch 14, batch 12550, loss[loss=0.2262, simple_loss=0.3045, pruned_loss=0.05152, ctc_loss=0.1123, over 16578.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2845, pruned_loss=0.05792, ctc_loss=0.1041, over 3294925.21 frames. 
], batch size: 351, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:43:54,172 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2023-10-09 14:43:58,224 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2787456.0, ans=0.05 2023-10-09 14:44:00,736 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-10-09 14:44:03,601 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2787456.0, ans=0.125 2023-10-09 14:44:23,450 INFO [train.py:1031] (3/4) Epoch 14, batch 12600, loss[loss=0.213, simple_loss=0.2735, pruned_loss=0.0564, ctc_loss=0.09909, over 16778.00 frames. ], tot_loss[loss=0.2162, simple_loss=0.2818, pruned_loss=0.05535, ctc_loss=0.09977, over 3294331.09 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:44:31,433 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2787549.3333333335, ans=0.0 2023-10-09 14:44:32,371 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2787549.3333333335, ans=0.1 2023-10-09 14:44:43,843 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2023-10-09 14:44:49,980 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2787642.6666666665, ans=0.1 2023-10-09 14:44:50,015 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2787642.6666666665, ans=0.125 2023-10-09 14:45:01,723 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:45:11,285 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 3.144e+02 3.491e+02 4.128e+02 9.398e+02, threshold=6.982e+02, percent-clipped=1.0 2023-10-09 14:45:24,787 INFO [train.py:1031] (3/4) Epoch 14, batch 12650, loss[loss=0.2208, simple_loss=0.2737, pruned_loss=0.06346, ctc_loss=0.1023, over 16739.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2816, pruned_loss=0.05842, ctc_loss=0.1044, over 3297151.35 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:45:49,053 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2787876.0, ans=0.1 2023-10-09 14:46:10,863 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2787922.6666666665, ans=0.2 2023-10-09 14:46:23,865 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2787969.3333333335, ans=0.125 2023-10-09 14:46:26,368 INFO [train.py:1031] (3/4) Epoch 14, batch 12700, loss[loss=0.203, simple_loss=0.2583, pruned_loss=0.05483, ctc_loss=0.09481, over 16771.00 frames. ], tot_loss[loss=0.2194, simple_loss=0.2769, pruned_loss=0.05973, ctc_loss=0.106, over 3294454.08 frames. 
], batch size: 95, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:46:54,541 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2788109.3333333335, ans=0.125 2023-10-09 14:47:01,396 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2788109.3333333335, ans=0.1 2023-10-09 14:47:05,839 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2788156.0, ans=10.0 2023-10-09 14:47:15,779 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.438e+02 3.456e+02 3.994e+02 4.833e+02 1.526e+03, threshold=7.989e+02, percent-clipped=4.0 2023-10-09 14:47:27,068 INFO [train.py:1031] (3/4) Epoch 14, batch 12750, loss[loss=0.3757, simple_loss=0.3906, pruned_loss=0.133, ctc_loss=0.2367, over 16680.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2775, pruned_loss=0.06234, ctc_loss=0.1099, over 3298276.32 frames. ], batch size: 384, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:47:45,540 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2788296.0, ans=0.125 2023-10-09 14:47:55,825 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2788342.6666666665, ans=0.0 2023-10-09 14:48:07,750 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2788389.3333333335, ans=0.0 2023-10-09 14:48:20,230 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=22.5 2023-10-09 14:48:21,976 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2788436.0, ans=0.0 2023-10-09 14:48:29,382 INFO [train.py:1031] (3/4) Epoch 14, batch 12800, loss[loss=0.1868, simple_loss=0.2499, pruned_loss=0.0464, ctc_loss=0.07715, over 16814.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.286, pruned_loss=0.06392, ctc_loss=0.1132, over 3285389.76 frames. 
], batch size: 121, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:48:35,602 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2788482.6666666665, ans=0.1 2023-10-09 14:48:36,513 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2788482.6666666665, ans=0.125 2023-10-09 14:48:43,747 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2788529.3333333335, ans=0.2 2023-10-09 14:48:52,104 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2788576.0, ans=0.125 2023-10-09 14:49:08,996 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2788622.6666666665, ans=0.5 2023-10-09 14:49:13,073 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2788622.6666666665, ans=0.0 2023-10-09 14:49:18,037 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.556e+02 3.551e+02 3.934e+02 4.932e+02 8.018e+02, threshold=7.868e+02, percent-clipped=1.0 2023-10-09 14:49:30,712 INFO [train.py:1031] (3/4) Epoch 14, batch 12850, loss[loss=0.3011, simple_loss=0.3469, pruned_loss=0.09356, ctc_loss=0.1704, over 16750.00 frames. ], tot_loss[loss=0.2346, simple_loss=0.2914, pruned_loss=0.0657, ctc_loss=0.1159, over 3289403.23 frames. ], batch size: 328, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:49:36,873 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2788716.0, ans=0.125 2023-10-09 14:49:56,971 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2788809.3333333335, ans=0.125 2023-10-09 14:50:02,546 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0 2023-10-09 14:50:04,542 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2788809.3333333335, ans=0.125 2023-10-09 14:50:32,958 INFO [train.py:1031] (3/4) Epoch 14, batch 12900, loss[loss=0.27, simple_loss=0.3684, pruned_loss=0.06231, ctc_loss=0.1176, over 16214.00 frames. ], tot_loss[loss=0.2405, simple_loss=0.2972, pruned_loss=0.06795, ctc_loss=0.1199, over 3290667.56 frames. 
], batch size: 463, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:50:43,098 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2788949.3333333335, ans=0.1 2023-10-09 14:50:48,277 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2788996.0, ans=0.0 2023-10-09 14:50:55,932 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2788996.0, ans=0.0 2023-10-09 14:50:58,070 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2789042.6666666665, ans=0.04949747468305833 2023-10-09 14:51:02,333 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2789042.6666666665, ans=0.125 2023-10-09 14:51:03,380 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2789042.6666666665, ans=0.125 2023-10-09 14:51:03,436 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2789042.6666666665, ans=0.125 2023-10-09 14:51:05,035 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:51:26,855 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+02 3.447e+02 3.800e+02 4.409e+02 9.438e+02, threshold=7.600e+02, percent-clipped=3.0 2023-10-09 14:51:27,146 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2789136.0, ans=0.125 2023-10-09 14:51:30,439 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2789136.0, ans=0.2 2023-10-09 14:51:31,834 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2023-10-09 14:51:35,867 INFO [train.py:1031] (3/4) Epoch 14, batch 12950, loss[loss=0.162, simple_loss=0.2367, pruned_loss=0.03225, ctc_loss=0.0569, over 16788.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2961, pruned_loss=0.06456, ctc_loss=0.1145, over 3287332.64 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:51:55,894 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2789229.3333333335, ans=0.1 2023-10-09 14:52:24,012 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2023-10-09 14:52:26,815 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2789369.3333333335, ans=0.125 2023-10-09 14:52:26,867 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2789369.3333333335, ans=0.2 2023-10-09 14:52:36,320 INFO [train.py:1031] (3/4) Epoch 14, batch 13000, loss[loss=0.2043, simple_loss=0.2579, pruned_loss=0.05497, ctc_loss=0.1019, over 15336.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2879, pruned_loss=0.06169, ctc_loss=0.1093, over 3284887.92 frames. 
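
The "Whitening: name=..., num_groups=..., num_channels=..., metric=... vs. limit=..." entries monitor how close a layer's activations are to having an isotropic (white) within-group covariance; scaling.py only intervenes when the metric drifts past its limit, which is why most logged metrics sit below their limits. One reasonable metric with the logged behavior (1.0 for perfectly white activations, up to channels-per-group for rank-one ones) is sketched below; the exact normalization in scaling.py may differ:

import torch

def whitening_metric(x, num_groups=1):
    # x: (num_frames, num_channels). Returns 1.0 when each group's
    # covariance is proportional to the identity; larger values mean
    # the activations are further from white. Sketch only.
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups                 # channels per group
    x = x.reshape(num_frames, num_groups, cpg)
    metric = 0.0
    for g in range(num_groups):
        cov = x[:, g, :].t() @ x[:, g, :] / num_frames
        metric = metric + cpg * (cov ** 2).sum() / torch.diagonal(cov).sum() ** 2
    return float(metric / num_groups)
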
], batch size: 526, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:52:44,340 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2789416.0, ans=0.0 2023-10-09 14:52:59,158 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2789509.3333333335, ans=0.0 2023-10-09 14:53:25,744 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2789602.6666666665, ans=0.125 2023-10-09 14:53:28,013 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.825e+02 3.282e+02 3.972e+02 1.143e+03, threshold=6.563e+02, percent-clipped=1.0 2023-10-09 14:53:36,422 INFO [train.py:1031] (3/4) Epoch 14, batch 13050, loss[loss=0.1945, simple_loss=0.2448, pruned_loss=0.05359, ctc_loss=0.09265, over 16827.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.281, pruned_loss=0.06134, ctc_loss=0.1085, over 3289675.52 frames. ], batch size: 121, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:54:23,053 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2789789.3333333335, ans=0.0 2023-10-09 14:54:37,357 INFO [train.py:1031] (3/4) Epoch 14, batch 13100, loss[loss=0.2216, simple_loss=0.2705, pruned_loss=0.06348, ctc_loss=0.1141, over 16855.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2799, pruned_loss=0.06245, ctc_loss=0.1101, over 3295609.15 frames. ], batch size: 176, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:54:43,917 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-10-09 14:54:58,690 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2789929.3333333335, ans=0.95 2023-10-09 14:55:14,419 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=15.0 2023-10-09 14:55:19,427 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2790022.6666666665, ans=0.1 2023-10-09 14:55:32,623 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+02 3.251e+02 4.048e+02 5.157e+02 1.010e+03, threshold=8.097e+02, percent-clipped=11.0 2023-10-09 14:55:33,687 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:55:36,928 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2790069.3333333335, ans=0.0 2023-10-09 14:55:42,104 INFO [train.py:1031] (3/4) Epoch 14, batch 13150, loss[loss=0.2578, simple_loss=0.3182, pruned_loss=0.07184, ctc_loss=0.1345, over 15163.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.2913, pruned_loss=0.06463, ctc_loss=0.1146, over 3285624.95 frames. 
], batch size: 527, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:55:47,050 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2790116.0, ans=0.125 2023-10-09 14:56:19,231 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2790256.0, ans=0.125 2023-10-09 14:56:28,483 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.79 vs. limit=10.0 2023-10-09 14:56:36,742 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2790302.6666666665, ans=0.5 2023-10-09 14:56:45,813 INFO [train.py:1031] (3/4) Epoch 14, batch 13200, loss[loss=0.2431, simple_loss=0.2978, pruned_loss=0.06889, ctc_loss=0.1266, over 16231.00 frames. ], tot_loss[loss=0.2402, simple_loss=0.2971, pruned_loss=0.06769, ctc_loss=0.1196, over 3289624.22 frames. ], batch size: 463, lr: 2.57e-03, grad_scale: 8.0 2023-10-09 14:56:51,030 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2790349.3333333335, ans=0.1 2023-10-09 14:56:52,516 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.67 vs. limit=15.0 2023-10-09 14:56:57,895 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2790396.0, ans=0.1 2023-10-09 14:57:06,047 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2790396.0, ans=0.0 2023-10-09 14:57:19,813 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2790442.6666666665, ans=0.125 2023-10-09 14:57:29,941 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.57 vs. limit=22.5 2023-10-09 14:57:41,600 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+02 3.287e+02 3.760e+02 4.559e+02 7.411e+02, threshold=7.519e+02, percent-clipped=0.0 2023-10-09 14:57:48,113 INFO [train.py:1031] (3/4) Epoch 14, batch 13250, loss[loss=0.2306, simple_loss=0.2786, pruned_loss=0.06776, ctc_loss=0.1178, over 16681.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2977, pruned_loss=0.06628, ctc_loss=0.1173, over 3288018.42 frames. 
], batch size: 271, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:57:59,935 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2790629.3333333335, ans=0.0 2023-10-09 14:58:11,992 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2790676.0, ans=0.07 2023-10-09 14:58:13,038 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:58:31,462 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2790722.6666666665, ans=0.125 2023-10-09 14:58:42,808 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2790769.3333333335, ans=0.1 2023-10-09 14:58:49,203 INFO [train.py:1031] (3/4) Epoch 14, batch 13300, loss[loss=0.2097, simple_loss=0.2582, pruned_loss=0.06047, ctc_loss=0.1007, over 16690.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.289, pruned_loss=0.0652, ctc_loss=0.1151, over 3290018.32 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:58:58,243 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2790816.0, ans=0.2 2023-10-09 14:59:19,731 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2023-10-09 14:59:22,342 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2790909.3333333335, ans=0.0 2023-10-09 14:59:22,761 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2023-10-09 14:59:37,700 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2790956.0, ans=0.125 2023-10-09 14:59:47,825 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.520e+02 3.361e+02 3.791e+02 4.833e+02 1.183e+03, threshold=7.583e+02, percent-clipped=5.0 2023-10-09 14:59:52,780 INFO [train.py:1031] (3/4) Epoch 14, batch 13350, loss[loss=0.2709, simple_loss=0.3342, pruned_loss=0.07535, ctc_loss=0.1421, over 16537.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2911, pruned_loss=0.06424, ctc_loss=0.1139, over 3293432.75 frames. ], batch size: 351, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:00:02,307 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2791049.3333333335, ans=0.125 2023-10-09 15:00:03,256 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2791049.3333333335, ans=0.0 2023-10-09 15:00:19,405 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2791142.6666666665, ans=0.125 2023-10-09 15:00:20,898 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.53 vs. 
limit=22.5 2023-10-09 15:00:26,480 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2791142.6666666665, ans=0.2 2023-10-09 15:00:44,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2791236.0, ans=0.125 2023-10-09 15:00:45,377 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2791236.0, ans=0.1 2023-10-09 15:00:55,813 INFO [train.py:1031] (3/4) Epoch 14, batch 13400, loss[loss=0.267, simple_loss=0.3663, pruned_loss=0.0637, ctc_loss=0.1009, over 15108.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2954, pruned_loss=0.06548, ctc_loss=0.1149, over 3295570.55 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:00:56,195 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2791282.6666666665, ans=0.125 2023-10-09 15:01:01,812 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2791282.6666666665, ans=0.0 2023-10-09 15:01:11,384 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2791329.3333333335, ans=0.2 2023-10-09 15:01:38,875 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2791422.6666666665, ans=0.125 2023-10-09 15:01:50,763 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2791469.3333333335, ans=0.0 2023-10-09 15:01:55,324 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+02 3.467e+02 4.135e+02 5.221e+02 9.023e+02, threshold=8.270e+02, percent-clipped=2.0 2023-10-09 15:01:57,438 INFO [train.py:1031] (3/4) Epoch 14, batch 13450, loss[loss=0.2304, simple_loss=0.2603, pruned_loss=0.07413, ctc_loss=0.1306, over 16346.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2881, pruned_loss=0.06514, ctc_loss=0.1138, over 3303012.40 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:01:59,604 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2791516.0, ans=0.0 2023-10-09 15:02:12,345 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2791562.6666666665, ans=0.1 2023-10-09 15:02:37,707 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2791656.0, ans=0.125 2023-10-09 15:02:52,647 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2791702.6666666665, ans=0.125 2023-10-09 15:02:55,570 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:02:55,585 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2791702.6666666665, ans=0.0 2023-10-09 15:02:57,584 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2791702.6666666665, ans=0.125 2023-10-09 15:02:59,437 INFO [train.py:1031] (3/4) Epoch 14, batch 13500, loss[loss=0.2256, simple_loss=0.2986, pruned_loss=0.05622, ctc_loss=0.1003, over 16398.00 frames. 
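Annotation: each scaling.py:199 line reports a ScheduledFloat, a hyperparameter (dropout_p, skip rates, balancer probs, scale_min, even whitening limits) whose current value, printed as ans=..., is a function of batch_count rather than a constant. A minimal sketch, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name and breakpoints below are illustrative, not taken from scaling.py:

class ScheduledFloatSketch:
    # A float indexed by batch_count: linearly interpolate between
    # breakpoints, then hold the last value. Illustrative only.
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        (b0, v0), *rest = self.points
        if batch_count <= b0:
            return v0
        for b1, v1 in rest:
            if batch_count <= b1:
                return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
            b0, v0 = b1, v1
        return v0

dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(2790629.33))  # 0.1, as in the ans=0.1 dropout_p lines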
], tot_loss[loss=0.2263, simple_loss=0.2832, pruned_loss=0.0627, ctc_loss=0.1099, over 3306274.24 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:03:04,115 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2791749.3333333335, ans=0.0 2023-10-09 15:03:06,753 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2791749.3333333335, ans=0.07 2023-10-09 15:03:06,812 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2791749.3333333335, ans=0.0 2023-10-09 15:03:21,258 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2791796.0, ans=0.125 2023-10-09 15:03:26,414 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2791842.6666666665, ans=0.0 2023-10-09 15:03:43,332 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2791889.3333333335, ans=0.2 2023-10-09 15:03:51,399 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2791936.0, ans=0.0 2023-10-09 15:04:01,770 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.985e+02 3.501e+02 4.657e+02 8.617e+02, threshold=7.002e+02, percent-clipped=1.0 2023-10-09 15:04:01,802 INFO [train.py:1031] (3/4) Epoch 14, batch 13550, loss[loss=0.2278, simple_loss=0.2814, pruned_loss=0.06469, ctc_loss=0.112, over 16779.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2829, pruned_loss=0.06222, ctc_loss=0.1092, over 3301108.23 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 0.5 2023-10-09 15:04:40,168 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2792122.6666666665, ans=0.1 2023-10-09 15:04:54,338 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:05:02,678 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2792169.3333333335, ans=0.125 2023-10-09 15:05:05,326 INFO [train.py:1031] (3/4) Epoch 14, batch 13600, loss[loss=0.231, simple_loss=0.2952, pruned_loss=0.06115, ctc_loss=0.1113, over 15254.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2888, pruned_loss=0.06502, ctc_loss=0.1139, over 3307377.38 frames. ], batch size: 529, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:05:26,837 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-10-09 15:05:42,787 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=2792356.0, ans=15.0 2023-10-09 15:06:04,485 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2792402.6666666665, ans=0.07 2023-10-09 15:06:08,501 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 3.317e+02 4.234e+02 5.653e+02 1.556e+03, threshold=8.468e+02, percent-clipped=11.0 2023-10-09 15:06:08,528 INFO [train.py:1031] (3/4) Epoch 14, batch 13650, loss[loss=0.2363, simple_loss=0.3194, pruned_loss=0.05553, ctc_loss=0.1053, over 16844.00 frames. 
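Annotation: the optim.py:471 lines tie together once you notice the printed threshold is consistently 2.0x the printed median: the five "grad-norm quartiles" are min/25%/median/75%/max of recent gradient norms, threshold = Clipping_scale * median (here 2.0 * 3.501e+02 = 7.002e+02), and percent-clipped is the share of steps whose norm exceeded it. A sketch under that reading; the window size and exact bookkeeping in optim.py are assumptions:

from collections import deque
import torch

class MedianGradClipper:
    # Clip to clipping_scale * running median of recent grad norms.
    def __init__(self, clipping_scale=2.0, window=1000):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.clipped = 0

    def step(self, params):
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        self.norms.append(norm.item())
        # q holds the five numbers the log prints as grad-norm quartiles
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2]
        if norm > threshold:
            self.clipped += 1  # feeds the percent-clipped statistic
            for p in params:
                p.grad.mul_(threshold / norm)
        return q, threshold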
], tot_loss[loss=0.2281, simple_loss=0.2902, pruned_loss=0.06135, ctc_loss=0.1084, over 3298760.97 frames. ], batch size: 242, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:06:09,865 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2792449.3333333335, ans=0.0 2023-10-09 15:06:11,539 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2792449.3333333335, ans=0.125 2023-10-09 15:06:51,679 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2792589.3333333335, ans=0.0 2023-10-09 15:06:51,757 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2792589.3333333335, ans=0.125 2023-10-09 15:06:55,341 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2023-10-09 15:07:02,539 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2792636.0, ans=0.1 2023-10-09 15:07:07,854 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2792636.0, ans=0.125 2023-10-09 15:07:11,279 INFO [train.py:1031] (3/4) Epoch 14, batch 13700, loss[loss=0.2274, simple_loss=0.2998, pruned_loss=0.05714, ctc_loss=0.102, over 16836.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2964, pruned_loss=0.06242, ctc_loss=0.1105, over 3301140.08 frames. ], batch size: 228, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:07:12,902 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.16 vs. limit=10.0 2023-10-09 15:07:20,561 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=22.5 2023-10-09 15:07:28,933 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2792729.3333333335, ans=0.0 2023-10-09 15:07:41,094 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2792776.0, ans=0.1 2023-10-09 15:07:47,484 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2792776.0, ans=0.2 2023-10-09 15:07:55,153 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2792822.6666666665, ans=0.125 2023-10-09 15:08:15,041 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 3.067e+02 3.765e+02 4.506e+02 1.005e+03, threshold=7.530e+02, percent-clipped=2.0 2023-10-09 15:08:15,068 INFO [train.py:1031] (3/4) Epoch 14, batch 13750, loss[loss=0.2192, simple_loss=0.2913, pruned_loss=0.05403, ctc_loss=0.0975, over 16818.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.296, pruned_loss=0.06052, ctc_loss=0.1082, over 3299365.30 frames. 
], batch size: 164, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:08:21,984 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2792916.0, ans=0.125 2023-10-09 15:08:23,690 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2792916.0, ans=0.1 2023-10-09 15:08:30,676 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2792962.6666666665, ans=0.09899494936611666 2023-10-09 15:08:44,001 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2023-10-09 15:09:03,854 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2793056.0, ans=0.0 2023-10-09 15:09:08,899 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=22.5 2023-10-09 15:09:17,082 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2793149.3333333335, ans=0.2 2023-10-09 15:09:17,790 INFO [train.py:1031] (3/4) Epoch 14, batch 13800, loss[loss=0.2331, simple_loss=0.2875, pruned_loss=0.06621, ctc_loss=0.1159, over 16805.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.299, pruned_loss=0.06415, ctc_loss=0.1139, over 3292624.12 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:09:20,800 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2793149.3333333335, ans=0.0 2023-10-09 15:09:34,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2793196.0, ans=0.125 2023-10-09 15:09:37,523 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2793196.0, ans=0.0 2023-10-09 15:09:43,218 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.96 vs. limit=10.0 2023-10-09 15:09:48,267 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2793242.6666666665, ans=0.125 2023-10-09 15:10:16,887 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=15.0 2023-10-09 15:10:19,965 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2793336.0, ans=0.5 2023-10-09 15:10:21,757 INFO [train.py:1031] (3/4) Epoch 14, batch 13850, loss[loss=0.1895, simple_loss=0.2357, pruned_loss=0.05355, ctc_loss=0.09048, over 16797.00 frames. ], tot_loss[loss=0.2338, simple_loss=0.2923, pruned_loss=0.06468, ctc_loss=0.1146, over 3294207.65 frames. 
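Annotation: the bypass.scale_min / bypass_mid.scale_min / bypass.skip_rate entries describe the residual "bypass" connections: each module's output is mixed with its input through a learned per-channel scale clamped from below by a scheduled scale_min (ans=0.2 in the lines above), and the skip_rate entries suggest the module can also be skipped stochastically. A sketch assuming a straight convex combination; the exact form in zipformer.py is assumed, not copied:

import torch

class BypassSketch(torch.nn.Module):
    # y = (1 - s) * x_in + s * x_out, with the learned per-channel
    # mixing scale s clamped to [scale_min, 1]. scale_min=0.2 mirrors
    # the scheduled values logged above.
    def __init__(self, num_channels: int, scale_min: float = 0.2):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.full((num_channels,), 0.5))
        self.scale_min = scale_min

    def forward(self, x_in, x_out):
        # x_in, x_out: (..., num_channels)
        s = self.scale.clamp(self.scale_min, 1.0)
        return (1.0 - s) * x_in + s * x_out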
], batch size: 164, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:10:22,840 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.613e+02 3.299e+02 3.702e+02 4.236e+02 7.153e+02, threshold=7.404e+02, percent-clipped=0.0 2023-10-09 15:10:25,220 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2793382.6666666665, ans=0.125 2023-10-09 15:10:41,883 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2793429.3333333335, ans=0.125 2023-10-09 15:10:43,341 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.35 vs. limit=10.0 2023-10-09 15:10:45,777 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2793476.0, ans=0.0 2023-10-09 15:11:10,662 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2793522.6666666665, ans=0.1 2023-10-09 15:11:16,681 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2793569.3333333335, ans=0.125 2023-10-09 15:11:25,337 INFO [train.py:1031] (3/4) Epoch 14, batch 13900, loss[loss=0.1985, simple_loss=0.2564, pruned_loss=0.0516, ctc_loss=0.0932, over 16736.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2879, pruned_loss=0.06417, ctc_loss=0.1137, over 3292176.78 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:11:33,066 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2793616.0, ans=0.0 2023-10-09 15:11:46,175 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2023-10-09 15:12:18,083 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2793802.6666666665, ans=0.04949747468305833 2023-10-09 15:12:28,096 INFO [train.py:1031] (3/4) Epoch 14, batch 13950, loss[loss=0.2305, simple_loss=0.2845, pruned_loss=0.06652, ctc_loss=0.1088, over 16737.00 frames. ], tot_loss[loss=0.235, simple_loss=0.2953, pruned_loss=0.06446, ctc_loss=0.1145, over 3294731.36 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:12:30,204 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.564e+02 3.293e+02 3.736e+02 4.752e+02 8.901e+02, threshold=7.472e+02, percent-clipped=3.0 2023-10-09 15:13:08,574 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-10-09 15:13:11,192 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2793989.3333333335, ans=0.125 2023-10-09 15:13:17,627 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2794036.0, ans=0.1 2023-10-09 15:13:17,680 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2794036.0, ans=0.0 2023-10-09 15:13:23,131 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. 
limit=15.0 2023-10-09 15:13:31,045 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2794082.6666666665, ans=0.2 2023-10-09 15:13:31,757 INFO [train.py:1031] (3/4) Epoch 14, batch 14000, loss[loss=0.2435, simple_loss=0.2847, pruned_loss=0.0741, ctc_loss=0.135, over 15274.00 frames. ], tot_loss[loss=0.2394, simple_loss=0.299, pruned_loss=0.06643, ctc_loss=0.1176, over 3295883.67 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:13:35,452 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2794082.6666666665, ans=0.125 2023-10-09 15:13:45,329 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2794129.3333333335, ans=0.1 2023-10-09 15:13:46,409 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2794129.3333333335, ans=0.1 2023-10-09 15:14:17,326 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2794222.6666666665, ans=0.125 2023-10-09 15:14:31,533 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2794269.3333333335, ans=0.125 2023-10-09 15:14:31,558 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2794269.3333333335, ans=0.125 2023-10-09 15:14:34,471 INFO [train.py:1031] (3/4) Epoch 14, batch 14050, loss[loss=0.2007, simple_loss=0.2635, pruned_loss=0.05217, ctc_loss=0.08384, over 16943.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2954, pruned_loss=0.06513, ctc_loss=0.115, over 3300838.69 frames. ], batch size: 82, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:14:38,946 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+02 3.126e+02 3.568e+02 4.195e+02 6.339e+02, threshold=7.137e+02, percent-clipped=0.0 2023-10-09 15:14:40,780 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0 2023-10-09 15:14:45,456 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=22.5 2023-10-09 15:14:50,751 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2794362.6666666665, ans=0.0 2023-10-09 15:15:03,158 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2794409.3333333335, ans=0.125 2023-10-09 15:15:12,983 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2794456.0, ans=0.2 2023-10-09 15:15:37,072 INFO [train.py:1031] (3/4) Epoch 14, batch 14100, loss[loss=0.2144, simple_loss=0.2568, pruned_loss=0.06404, ctc_loss=0.1098, over 16757.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2863, pruned_loss=0.06415, ctc_loss=0.1131, over 3292938.49 frames. 
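Annotation: the grad_scale value in each batch line is the fp16 dynamic loss scale, and the way it bounces between 0.5 and 8.0 across this section is the usual behaviour of dynamic scaling: halve on an overflow (inf/nan gradients, with the optimizer step skipped), grow again after a run of clean steps. A generic sketch with torch.cuda.amp (requires a GPU; model and data here are stand-ins, not the zipformer trainer):

import torch

model = torch.nn.Linear(80, 2000).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    x = torch.randn(8, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped if gradients overflowed
    scaler.update()          # halves on overflow, doubles after clean runs
    print(scaler.get_scale())  # what the log prints as grad_scale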
], batch size: 242, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:15:51,190 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2794596.0, ans=0.125 2023-10-09 15:15:56,933 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2794596.0, ans=0.1 2023-10-09 15:15:57,031 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2794596.0, ans=0.125 2023-10-09 15:15:57,987 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2794596.0, ans=0.125 2023-10-09 15:16:00,090 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2794642.6666666665, ans=0.0 2023-10-09 15:16:01,040 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2794642.6666666665, ans=0.125 2023-10-09 15:16:10,767 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2794642.6666666665, ans=0.1 2023-10-09 15:16:12,001 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2023-10-09 15:16:16,170 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2794689.3333333335, ans=0.0 2023-10-09 15:16:17,907 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2794689.3333333335, ans=0.1 2023-10-09 15:16:18,924 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2794689.3333333335, ans=0.1 2023-10-09 15:16:26,344 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2023-10-09 15:16:27,229 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2794736.0, ans=0.125 2023-10-09 15:16:37,983 INFO [train.py:1031] (3/4) Epoch 14, batch 14150, loss[loss=0.2089, simple_loss=0.2606, pruned_loss=0.05934, ctc_loss=0.09639, over 16931.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2787, pruned_loss=0.06377, ctc_loss=0.1121, over 3285730.89 frames. 
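Annotation: the many balancer*.prob, min_positive, min_abs and max_abs entries belong to activation balancers: modules that are the identity in the forward pass but, with some probability (prob, commonly ans=0.125 above), adjust gradients so per-channel statistics stay inside bounds such as a minimum fraction of positive values or a minimum mean absolute value. A much-simplified sketch of the gradient-side correction for the min_abs constraint only; sign conventions and magnitudes are assumptions, and the real balancers in scaling.py enforce several bounds at once:

import random
import torch

class BalancerSketch(torch.autograd.Function):
    MIN_ABS, GRAD_EPS = 0.5, 1e-4  # illustrative constants

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x  # identity forward

    @staticmethod
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors
        # channels whose mean |activation| is below the bound get a
        # small gradient push away from zero
        too_small = x.abs().mean(dim=0, keepdim=True) < BalancerSketch.MIN_ABS
        return grad - BalancerSketch.GRAD_EPS * x.sign() * too_small

def maybe_balance(x, prob=0.125):
    # applied stochastically, as the logged prob values suggest
    return BalancerSketch.apply(x) if random.random() < prob else x

x = (0.01 * torch.randn(100, 384)).requires_grad_()
BalancerSketch.apply(x).sum().backward()  # x.grad now includes the push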
], batch size: 78, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:16:44,072 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.061e+02 3.515e+02 4.416e+02 9.283e+02, threshold=7.030e+02, percent-clipped=2.0 2023-10-09 15:16:53,496 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2794829.3333333335, ans=0.1 2023-10-09 15:17:19,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2794922.6666666665, ans=0.0 2023-10-09 15:17:20,531 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2794922.6666666665, ans=0.0 2023-10-09 15:17:23,114 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2794922.6666666665, ans=0.09899494936611666 2023-10-09 15:17:36,484 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2794969.3333333335, ans=0.0 2023-10-09 15:17:39,323 INFO [train.py:1031] (3/4) Epoch 14, batch 14200, loss[loss=0.2351, simple_loss=0.2839, pruned_loss=0.07086, ctc_loss=0.1113, over 16541.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2746, pruned_loss=0.06207, ctc_loss=0.1094, over 3288018.28 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:17:41,400 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2795016.0, ans=0.125 2023-10-09 15:18:01,045 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=12.0 2023-10-09 15:18:10,604 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:18:20,991 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2795156.0, ans=0.125 2023-10-09 15:18:33,728 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2795202.6666666665, ans=0.0 2023-10-09 15:18:43,117 INFO [train.py:1031] (3/4) Epoch 14, batch 14250, loss[loss=0.2984, simple_loss=0.3592, pruned_loss=0.08931, ctc_loss=0.1473, over 16979.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2809, pruned_loss=0.06459, ctc_loss=0.1136, over 3285942.08 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:18:46,247 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2795249.3333333335, ans=0.2 2023-10-09 15:18:49,204 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 2.880e+02 3.480e+02 3.927e+02 7.059e+02, threshold=6.960e+02, percent-clipped=1.0 2023-10-09 15:18:56,515 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2795296.0, ans=0.125 2023-10-09 15:19:05,923 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2795296.0, ans=0.125 2023-10-09 15:19:09,538 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.46 vs. 
limit=15.0 2023-10-09 15:19:13,694 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.76 vs. limit=15.0 2023-10-09 15:19:25,887 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2795389.3333333335, ans=0.125 2023-10-09 15:19:32,708 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2795436.0, ans=0.125 2023-10-09 15:19:44,954 INFO [train.py:1031] (3/4) Epoch 14, batch 14300, loss[loss=0.2516, simple_loss=0.2981, pruned_loss=0.0768, ctc_loss=0.1286, over 16770.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2855, pruned_loss=0.06632, ctc_loss=0.1164, over 3287648.17 frames. ], batch size: 140, lr: 2.57e-03, grad_scale: 8.0 2023-10-09 15:19:57,125 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2795529.3333333335, ans=0.1 2023-10-09 15:20:01,574 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0 2023-10-09 15:20:12,478 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2795576.0, ans=0.125 2023-10-09 15:20:16,407 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.68 vs. limit=15.0 2023-10-09 15:20:35,173 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2795669.3333333335, ans=0.1 2023-10-09 15:20:47,322 INFO [train.py:1031] (3/4) Epoch 14, batch 14350, loss[loss=0.206, simple_loss=0.2634, pruned_loss=0.05423, ctc_loss=0.1002, over 16848.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2837, pruned_loss=0.0662, ctc_loss=0.116, over 3295886.05 frames. ], batch size: 176, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:20:51,462 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2795716.0, ans=0.125 2023-10-09 15:20:53,862 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.121e+02 3.540e+02 4.017e+02 5.602e+02, threshold=7.080e+02, percent-clipped=0.0 2023-10-09 15:21:04,037 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2795762.6666666665, ans=0.0 2023-10-09 15:21:31,688 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2795856.0, ans=0.2 2023-10-09 15:21:46,005 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.16 vs. limit=10.0 2023-10-09 15:21:48,999 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2795949.3333333335, ans=0.0 2023-10-09 15:21:50,348 INFO [train.py:1031] (3/4) Epoch 14, batch 14400, loss[loss=0.1982, simple_loss=0.2562, pruned_loss=0.05186, ctc_loss=0.09134, over 16726.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2824, pruned_loss=0.06481, ctc_loss=0.1141, over 3299791.01 frames. 
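Annotation: the wildly varying "batch size" values in this section (from 78 up to 529 utterances) against roughly constant frame counts are what duration-capped bucketing sampling looks like: a batch is filled until a total-duration budget is reached, so it holds many short cuts or few long ones. If these are ~25 fps encoder frames (100 fps features subsampled by 4), then 527 cuts over 15108 frames is about 604 s and 82 cuts over 16943 frames is about 678 s, i.e. a budget somewhere around 600-700 s per batch. A sketch of that packing rule; the budget value and helper names are illustrative:

def duration_batches(cuts, max_duration=700.0):
    # cuts: iterable of (cut_id, duration_seconds), assumed pre-sorted
    # into similar-length buckets. Pack until the duration budget is
    # full, so batch *size* varies while total duration stays capped.
    batch, total = [], 0.0
    for cut_id, dur in cuts:
        if batch and total + dur > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(cut_id)
        total += dur
    if batch:
        yield batch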
], batch size: 102, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:22:01,616 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2795996.0, ans=0.125 2023-10-09 15:22:03,043 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0 2023-10-09 15:22:09,344 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2795996.0, ans=0.0 2023-10-09 15:22:24,763 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2796042.6666666665, ans=0.125 2023-10-09 15:22:40,080 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2796136.0, ans=0.125 2023-10-09 15:22:45,948 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2796136.0, ans=0.125 2023-10-09 15:22:53,856 INFO [train.py:1031] (3/4) Epoch 14, batch 14450, loss[loss=0.3092, simple_loss=0.3537, pruned_loss=0.09613, ctc_loss=0.1812, over 16668.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2874, pruned_loss=0.0664, ctc_loss=0.1169, over 3307469.72 frames. ], batch size: 384, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:23:00,761 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+02 3.304e+02 3.703e+02 4.462e+02 6.927e+02, threshold=7.405e+02, percent-clipped=0.0 2023-10-09 15:23:26,261 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2796276.0, ans=0.125 2023-10-09 15:23:54,675 INFO [train.py:1031] (3/4) Epoch 14, batch 14500, loss[loss=0.2225, simple_loss=0.2638, pruned_loss=0.06729, ctc_loss=0.1165, over 16731.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2859, pruned_loss=0.06502, ctc_loss=0.1144, over 3296146.82 frames. ], batch size: 140, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:24:05,633 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2796462.6666666665, ans=0.0 2023-10-09 15:24:18,813 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2796509.3333333335, ans=0.2 2023-10-09 15:24:34,004 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2796556.0, ans=0.0 2023-10-09 15:24:39,344 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2796556.0, ans=0.125 2023-10-09 15:24:56,623 INFO [train.py:1031] (3/4) Epoch 14, batch 14550, loss[loss=0.2092, simple_loss=0.2606, pruned_loss=0.05994, ctc_loss=0.09483, over 17072.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2784, pruned_loss=0.06334, ctc_loss=0.1112, over 3281708.86 frames. 
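Annotation: lr: 2.57e-03 repeats on every batch line in this stretch and only ticks down to 2.56e-03 near the end of the section, which is the signature of a learning rate that is a smooth function of batch index and fractional epoch rather than a milestone schedule; at three printed digits it moves only occasionally. A sketch of an Eden-style decay with that general shape (the shape is an assumption and the constants are free parameters, not this run's configuration):

def eden_lr(base_lr, batch, epoch, lr_batches, lr_epochs):
    # Smooth inverse-quartic-root decay in both batch and epoch: close
    # to base_lr early on, then decaying slowly and continuously.
    batch_factor = ((batch / lr_batches) ** 2 + 1) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.05, batch=10000, epoch=4.0, lr_batches=5000.0, lr_epochs=4.0))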
], batch size: 87, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:24:56,981 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2796649.3333333335, ans=0.1 2023-10-09 15:24:56,988 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2796649.3333333335, ans=0.125 2023-10-09 15:25:05,844 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.462e+02 3.180e+02 3.818e+02 4.474e+02 1.185e+03, threshold=7.637e+02, percent-clipped=2.0 2023-10-09 15:25:18,175 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:25:24,584 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2796742.6666666665, ans=0.125 2023-10-09 15:25:33,459 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2796789.3333333335, ans=0.0 2023-10-09 15:25:56,523 INFO [train.py:1031] (3/4) Epoch 14, batch 14600, loss[loss=0.2355, simple_loss=0.3028, pruned_loss=0.06134, ctc_loss=0.1138, over 16949.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2801, pruned_loss=0.0637, ctc_loss=0.1114, over 3283408.95 frames. ], batch size: 78, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:26:14,493 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2796929.3333333335, ans=0.5 2023-10-09 15:26:24,315 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2796976.0, ans=0.2 2023-10-09 15:26:56,326 INFO [train.py:1031] (3/4) Epoch 14, batch 14650, loss[loss=0.2018, simple_loss=0.24, pruned_loss=0.05839, ctc_loss=0.1171, over 15479.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2804, pruned_loss=0.06382, ctc_loss=0.1117, over 3271602.03 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:27:03,931 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2797116.0, ans=0.1 2023-10-09 15:27:05,750 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+02 3.040e+02 3.470e+02 3.930e+02 6.552e+02, threshold=6.941e+02, percent-clipped=0.0 2023-10-09 15:27:13,088 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2797162.6666666665, ans=0.025 2023-10-09 15:27:25,671 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2797209.3333333335, ans=0.125 2023-10-09 15:27:57,842 INFO [train.py:1031] (3/4) Epoch 14, batch 14700, loss[loss=0.2232, simple_loss=0.2672, pruned_loss=0.06474, ctc_loss=0.1242, over 16810.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2772, pruned_loss=0.06382, ctc_loss=0.1118, over 3275056.81 frames. ], batch size: 310, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:28:01,910 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.48 vs. 
limit=15.0 2023-10-09 15:28:25,307 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2797442.6666666665, ans=0.1 2023-10-09 15:28:25,347 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2797442.6666666665, ans=0.125 2023-10-09 15:28:31,785 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2797442.6666666665, ans=0.125 2023-10-09 15:28:56,867 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.74 vs. limit=12.0 2023-10-09 15:28:59,994 INFO [train.py:1031] (3/4) Epoch 14, batch 14750, loss[loss=0.2471, simple_loss=0.2955, pruned_loss=0.07237, ctc_loss=0.1348, over 16537.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2729, pruned_loss=0.06338, ctc_loss=0.1112, over 3282563.16 frames. ], batch size: 351, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:29:04,247 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2797582.6666666665, ans=0.1 2023-10-09 15:29:09,382 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0 2023-10-09 15:29:11,861 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+02 3.078e+02 3.394e+02 3.991e+02 6.777e+02, threshold=6.787e+02, percent-clipped=0.0 2023-10-09 15:29:16,164 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2797629.3333333335, ans=0.125 2023-10-09 15:29:38,963 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.57 vs. limit=15.0 2023-10-09 15:29:50,050 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2797769.3333333335, ans=0.125 2023-10-09 15:30:01,444 INFO [train.py:1031] (3/4) Epoch 14, batch 14800, loss[loss=0.2245, simple_loss=0.2745, pruned_loss=0.06478, ctc_loss=0.1124, over 16776.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2777, pruned_loss=0.06489, ctc_loss=0.1137, over 3286350.41 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:30:02,440 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2797816.0, ans=0.1 2023-10-09 15:30:05,416 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2023-10-09 15:30:15,985 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2797862.6666666665, ans=0.0 2023-10-09 15:30:22,366 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2797862.6666666665, ans=0.125 2023-10-09 15:30:23,722 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-10-09 15:30:39,055 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. 
limit=15.0 2023-10-09 15:31:05,099 INFO [train.py:1031] (3/4) Epoch 14, batch 14850, loss[loss=0.2061, simple_loss=0.2514, pruned_loss=0.06041, ctc_loss=0.09982, over 16758.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2785, pruned_loss=0.0657, ctc_loss=0.1146, over 3275732.18 frames. ], batch size: 202, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:31:14,449 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2798049.3333333335, ans=0.1 2023-10-09 15:31:16,850 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.615e+02 3.104e+02 3.584e+02 4.093e+02 5.889e+02, threshold=7.167e+02, percent-clipped=0.0 2023-10-09 15:31:17,416 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=22.5 2023-10-09 15:31:20,536 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2798096.0, ans=0.125 2023-10-09 15:31:26,589 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2798096.0, ans=0.125 2023-10-09 15:31:30,041 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:31:50,630 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2798189.3333333335, ans=0.2 2023-10-09 15:31:57,230 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2798236.0, ans=0.125 2023-10-09 15:31:57,662 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2798236.0, ans=15.0 2023-10-09 15:31:58,356 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2798236.0, ans=0.125 2023-10-09 15:32:04,382 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2798236.0, ans=0.125 2023-10-09 15:32:08,281 INFO [train.py:1031] (3/4) Epoch 14, batch 14900, loss[loss=0.1732, simple_loss=0.2435, pruned_loss=0.03717, ctc_loss=0.07148, over 16869.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2743, pruned_loss=0.06429, ctc_loss=0.1124, over 3284141.49 frames. ], batch size: 243, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:32:18,862 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.72 vs. limit=22.5 2023-10-09 15:32:38,687 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2023-10-09 15:32:44,233 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2798376.0, ans=0.125 2023-10-09 15:32:44,254 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2798376.0, ans=0.0 2023-10-09 15:33:11,274 INFO [train.py:1031] (3/4) Epoch 14, batch 14950, loss[loss=0.2721, simple_loss=0.3153, pruned_loss=0.08422, ctc_loss=0.151, over 16530.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2747, pruned_loss=0.06364, ctc_loss=0.1119, over 3287459.20 frames. 
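Annotation: the scaling.py:1069 "WithLoss" lines attach an auxiliary penalty to the self_attn_weights tensors and report its running sum; loss-sum=0.000e+00 throughout this stretch just means the penalty is currently contributing nothing. The attach-a-loss-without-changing-the-forward pattern can be sketched as below; the actual quantity penalised is not shown in the log, so the mean-square term here is a placeholder:

import torch

def with_loss(x: torch.Tensor, scale: float = 1e-4) -> torch.Tensor:
    # Identity in the forward pass (the correction term is exactly 0),
    # but the penalty's gradient still flows into x on backward.
    penalty = scale * (x ** 2).mean()
    print(f"loss-sum={penalty.item():.3e}")
    return x + (penalty - penalty.detach())

attn = torch.softmax(torch.randn(4, 10, 10), dim=-1).requires_grad_()
out = with_loss(attn)
out.sum().backward()  # attn.grad now includes the penalty term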
], batch size: 416, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:33:13,742 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2798516.0, ans=0.0 2023-10-09 15:33:21,951 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2798516.0, ans=0.0 2023-10-09 15:33:25,938 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+02 3.074e+02 3.344e+02 3.882e+02 5.335e+02, threshold=6.688e+02, percent-clipped=0.0 2023-10-09 15:33:48,012 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.79 vs. limit=8.0 2023-10-09 15:34:12,323 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2798749.3333333335, ans=0.125 2023-10-09 15:34:13,090 INFO [train.py:1031] (3/4) Epoch 14, batch 15000, loss[loss=0.2171, simple_loss=0.2588, pruned_loss=0.06584, ctc_loss=0.1095, over 16528.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2776, pruned_loss=0.06444, ctc_loss=0.1135, over 3289231.77 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:34:13,090 INFO [train.py:1054] (3/4) Computing validation loss 2023-10-09 15:34:26,145 INFO [zipformer.py:1853] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.2500, 2.0702, 4.3712, 2.3932], device='cuda:3') 2023-10-09 15:34:29,426 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2384, simple_loss=0.3088, pruned_loss=0.06452, ctc_loss=0.09761, over 1796401.00 frames. 2023-10-09 15:34:29,427 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB 2023-10-09 15:34:30,884 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2798749.3333333335, ans=10.0 2023-10-09 15:35:01,654 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2023-10-09 15:35:05,745 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2798842.6666666665, ans=0.125 2023-10-09 15:35:25,266 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0 2023-10-09 15:35:30,017 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2798936.0, ans=0.125 2023-10-09 15:35:32,342 INFO [train.py:1031] (3/4) Epoch 14, batch 15050, loss[loss=0.1803, simple_loss=0.2273, pruned_loss=0.04988, ctc_loss=0.08379, over 16714.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2739, pruned_loss=0.06155, ctc_loss=0.1084, over 3288938.37 frames. ], batch size: 95, lr: 2.57e-03, grad_scale: 1.0 2023-10-09 15:35:34,910 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.10 vs. 
limit=15.0 2023-10-09 15:35:49,237 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.412e+02 3.126e+02 3.487e+02 4.278e+02 6.504e+02, threshold=6.973e+02, percent-clipped=0.0 2023-10-09 15:36:03,254 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2799076.0, ans=0.125 2023-10-09 15:36:06,560 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2799076.0, ans=0.125 2023-10-09 15:36:07,542 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2799076.0, ans=0.1 2023-10-09 15:36:14,535 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2799122.6666666665, ans=0.0 2023-10-09 15:36:17,375 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2799122.6666666665, ans=0.125 2023-10-09 15:36:35,024 INFO [train.py:1031] (3/4) Epoch 14, batch 15100, loss[loss=0.2287, simple_loss=0.2812, pruned_loss=0.06725, ctc_loss=0.1041, over 16804.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.2777, pruned_loss=0.06227, ctc_loss=0.1087, over 3291394.84 frames. ], batch size: 121, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:36:47,928 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2799262.6666666665, ans=0.125 2023-10-09 15:37:23,325 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2799356.0, ans=0.125 2023-10-09 15:37:30,808 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2799402.6666666665, ans=0.125 2023-10-09 15:37:36,948 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2799449.3333333335, ans=0.0 2023-10-09 15:37:37,652 INFO [train.py:1031] (3/4) Epoch 14, batch 15150, loss[loss=0.2051, simple_loss=0.2651, pruned_loss=0.05272, ctc_loss=0.09894, over 16144.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2832, pruned_loss=0.06401, ctc_loss=0.1117, over 3292196.15 frames. 
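Annotation: the 15:34:13 to 15:34:29 stretch above is the periodic mid-epoch validation pass: training pauses, a fixed validation set (1796401 frames) is scored with the same loss breakdown, one module's attention-weight entropies are dumped as a diagnostic (one value per head; low entropy means a sharply focused head), and peak CUDA memory is printed (plausibly torch.cuda.max_memory_allocated, though the log does not say). A sketch of the entropy diagnostic, assuming it is the entropy over the key axis averaged across query positions:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), rows summing to 1 over
    # keys. Returns one entropy value per head, like the
    # tensor([4.2500, 2.0702, ...]) diagnostic above. The exact
    # reduction icefall uses is assumed.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)

attn = torch.softmax(torch.randn(4, 50, 200), dim=-1)
print(attn_weights_entropy(attn))  # a bit below log(200) ~ 5.3 for diffuse heads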
], batch size: 463, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:37:45,439 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2799449.3333333335, ans=0.0 2023-10-09 15:37:55,142 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.319e+02 4.410e+02 5.242e+02 1.151e+03, threshold=8.819e+02, percent-clipped=3.0 2023-10-09 15:38:20,742 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2799589.3333333335, ans=0.95 2023-10-09 15:38:20,794 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2799589.3333333335, ans=0.1 2023-10-09 15:38:28,040 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2799636.0, ans=0.0 2023-10-09 15:38:33,213 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2799636.0, ans=0.0 2023-10-09 15:38:38,457 INFO [train.py:1031] (3/4) Epoch 14, batch 15200, loss[loss=0.2181, simple_loss=0.2905, pruned_loss=0.05185, ctc_loss=0.1051, over 16972.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.282, pruned_loss=0.06241, ctc_loss=0.1084, over 3294184.97 frames. ], batch size: 259, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:38:40,571 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=12.0 2023-10-09 15:38:44,462 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2023-10-09 15:39:08,056 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=22.5 2023-10-09 15:39:19,853 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2799822.6666666665, ans=0.125 2023-10-09 15:39:25,760 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2799822.6666666665, ans=0.1 2023-10-09 15:39:40,021 INFO [train.py:1031] (3/4) Epoch 14, batch 15250, loss[loss=0.1874, simple_loss=0.2724, pruned_loss=0.03719, ctc_loss=0.06991, over 16966.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2817, pruned_loss=0.05984, ctc_loss=0.1043, over 3301101.50 frames. ], batch size: 259, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:39:58,267 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.655e+02 2.982e+02 3.898e+02 5.868e+02, threshold=5.964e+02, percent-clipped=0.0 2023-10-09 15:40:44,709 INFO [train.py:1031] (3/4) Epoch 14, batch 15300, loss[loss=0.2363, simple_loss=0.2929, pruned_loss=0.06789, ctc_loss=0.1098, over 16715.00 frames. ], tot_loss[loss=0.2131, simple_loss=0.2764, pruned_loss=0.05549, ctc_loss=0.09726, over 3305132.66 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:40:46,667 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2800149.3333333335, ans=0.0 2023-10-09 15:41:08,038 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. 
limit=15.0 2023-10-09 15:41:10,273 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=2800242.6666666665, ans=12.0 2023-10-09 15:41:14,289 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2800242.6666666665, ans=0.1 2023-10-09 15:41:23,853 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2800289.3333333335, ans=0.0 2023-10-09 15:41:25,919 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2800289.3333333335, ans=0.125 2023-10-09 15:41:26,998 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:41:45,403 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2800336.0, ans=0.0 2023-10-09 15:41:45,411 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2800336.0, ans=0.0 2023-10-09 15:41:45,413 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2800336.0, ans=0.125 2023-10-09 15:41:48,937 INFO [train.py:1031] (3/4) Epoch 14, batch 15350, loss[loss=0.2165, simple_loss=0.274, pruned_loss=0.05931, ctc_loss=0.1006, over 16773.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2807, pruned_loss=0.05851, ctc_loss=0.1022, over 3308930.06 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 15:41:49,394 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2800382.6666666665, ans=0.0 2023-10-09 15:42:09,089 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.939e+02 3.401e+02 4.199e+02 7.970e+02, threshold=6.801e+02, percent-clipped=2.0 2023-10-09 15:42:09,472 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2800429.3333333335, ans=0.125 2023-10-09 15:42:27,601 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=15.0 2023-10-09 15:42:39,753 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2800569.3333333335, ans=0.0 2023-10-09 15:42:53,683 INFO [train.py:1031] (3/4) Epoch 14, batch 15400, loss[loss=0.2177, simple_loss=0.2988, pruned_loss=0.04944, ctc_loss=0.09451, over 16867.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2865, pruned_loss=0.05977, ctc_loss=0.1046, over 3306633.94 frames. ], batch size: 242, lr: 2.56e-03, grad_scale: 8.0 2023-10-09 15:42:59,858 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. 
limit=15.0 2023-10-09 15:43:07,665 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2800662.6666666665, ans=0.2 2023-10-09 15:43:16,936 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2800662.6666666665, ans=0.0 2023-10-09 15:43:17,934 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2800709.3333333335, ans=0.2 2023-10-09 15:43:47,615 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2800802.6666666665, ans=0.1 2023-10-09 15:43:56,848 INFO [train.py:1031] (3/4) Epoch 14, batch 15450, loss[loss=0.2317, simple_loss=0.3079, pruned_loss=0.05816, ctc_loss=0.09824, over 15081.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.285, pruned_loss=0.05984, ctc_loss=0.1039, over 3311004.84 frames. ], batch size: 529, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 15:44:00,083 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2800849.3333333335, ans=0.125 2023-10-09 15:44:06,503 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:44:09,405 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:44:17,381 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+02 3.271e+02 3.987e+02 5.026e+02 8.046e+02, threshold=7.973e+02, percent-clipped=4.0 2023-10-09 15:44:25,863 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2800942.6666666665, ans=0.2 2023-10-09 15:44:27,563 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2800942.6666666665, ans=0.125 2023-10-09 15:44:31,633 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2800942.6666666665, ans=0.0 2023-10-09 15:44:33,837 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2800989.3333333335, ans=0.125 2023-10-09 15:44:52,562 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2801036.0, ans=0.0 2023-10-09 15:45:00,351 INFO [train.py:1031] (3/4) Epoch 14, batch 15500, loss[loss=0.2396, simple_loss=0.2989, pruned_loss=0.06786, ctc_loss=0.1114, over 16421.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2786, pruned_loss=0.05897, ctc_loss=0.1011, over 3299621.33 frames. ], batch size: 416, lr: 2.56e-03, grad_scale: 8.0 2023-10-09 15:45:11,432 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:45:25,603 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2801176.0, ans=0.125 2023-10-09 15:45:26,647 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:45:30,158 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.95 vs. 
limit=22.5 2023-10-09 15:45:36,300 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=15.0 2023-10-09 15:45:40,672 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.68 vs. limit=6.0 2023-10-09 15:45:45,745 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2801222.6666666665, ans=0.1 2023-10-09 15:45:52,865 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2801269.3333333335, ans=0.125 2023-10-09 15:45:59,993 INFO [train.py:1031] (3/4) Epoch 14, batch 15550, loss[loss=0.2083, simple_loss=0.2941, pruned_loss=0.04634, ctc_loss=0.07483, over 15197.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2769, pruned_loss=0.05967, ctc_loss=0.1014, over 3296800.76 frames. ], batch size: 527, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 15:46:01,758 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2801316.0, ans=6.0 2023-10-09 15:46:10,169 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2801316.0, ans=0.1 2023-10-09 15:46:22,076 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+02 3.223e+02 3.587e+02 4.203e+02 7.757e+02, threshold=7.174e+02, percent-clipped=0.0 2023-10-09 15:46:34,957 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2801456.0, ans=0.0 2023-10-09 15:46:35,908 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2801456.0, ans=0.125 2023-10-09 15:46:46,702 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2801502.6666666665, ans=0.125 2023-10-09 15:46:59,428 INFO [train.py:1031] (3/4) Epoch 14, batch 15600, loss[loss=0.2144, simple_loss=0.2789, pruned_loss=0.05565, ctc_loss=0.0965, over 16852.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2841, pruned_loss=0.06325, ctc_loss=0.1083, over 3301086.70 frames. ], batch size: 164, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 15:47:17,650 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2801596.0, ans=0.07 2023-10-09 15:47:19,153 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. 
limit=10.0 2023-10-09 15:47:19,817 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2801596.0, ans=0.0 2023-10-09 15:47:26,856 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2801642.6666666665, ans=0.125 2023-10-09 15:47:31,730 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2801642.6666666665, ans=0.125 2023-10-09 15:47:32,689 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2801642.6666666665, ans=0.1 2023-10-09 15:47:37,862 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2801689.3333333335, ans=0.125 2023-10-09 15:47:40,592 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2023-10-09 15:47:44,094 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2801689.3333333335, ans=0.125 2023-10-09 15:48:00,432 INFO [train.py:1031] (3/4) Epoch 14, batch 15650, loss[loss=0.227, simple_loss=0.2739, pruned_loss=0.06551, ctc_loss=0.1227, over 16750.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2827, pruned_loss=0.06083, ctc_loss=0.1048, over 3296102.87 frames. ], batch size: 309, lr: 2.56e-03, grad_scale: 1.0 2023-10-09 15:48:02,978 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2801782.6666666665, ans=0.2 2023-10-09 15:48:13,786 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2801829.3333333335, ans=0.2 2023-10-09 15:48:23,318 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+02 3.056e+02 3.461e+02 4.046e+02 6.916e+02, threshold=6.921e+02, percent-clipped=0.0 2023-10-09 15:48:23,587 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2801876.0, ans=0.125 2023-10-09 15:48:25,355 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2801876.0, ans=0.0 2023-10-09 15:48:26,325 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2801876.0, ans=0.1 2023-10-09 15:48:31,352 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2801876.0, ans=0.125 2023-10-09 15:49:00,033 INFO [train.py:1031] (3/4) Epoch 14, batch 15700, loss[loss=0.2115, simple_loss=0.2654, pruned_loss=0.05894, ctc_loss=0.09947, over 16949.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.277, pruned_loss=0.06042, ctc_loss=0.1042, over 3286104.09 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 15:49:06,218 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2802016.0, ans=0.0 2023-10-09 15:49:14,733 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.52 vs. 
limit=10.0 2023-10-09 15:49:15,394 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2802062.6666666665, ans=0.1 2023-10-09 15:49:31,481 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2802109.3333333335, ans=0.125 2023-10-09 15:49:55,767 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2802202.6666666665, ans=0.125 2023-10-09 15:49:57,443 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2802202.6666666665, ans=0.125 2023-10-09 15:50:01,939 INFO [train.py:1031] (3/4) Epoch 14, batch 15750, loss[loss=0.2155, simple_loss=0.2736, pruned_loss=0.05954, ctc_loss=0.09566, over 16891.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2723, pruned_loss=0.06023, ctc_loss=0.104, over 3293275.23 frames. ], batch size: 82, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 15:50:04,868 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2802249.3333333335, ans=0.0 2023-10-09 15:50:26,227 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 3.013e+02 3.498e+02 4.173e+02 6.687e+02, threshold=6.996e+02, percent-clipped=0.0 2023-10-09 15:50:35,745 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2802342.6666666665, ans=0.5 2023-10-09 15:50:53,876 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2802436.0, ans=0.125 2023-10-09 15:51:02,037 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.70 vs. limit=22.5 2023-10-09 15:51:03,854 INFO [train.py:1031] (3/4) Epoch 14, batch 15800, loss[loss=0.1982, simple_loss=0.2758, pruned_loss=0.04431, ctc_loss=0.07968, over 16833.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.2696, pruned_loss=0.05902, ctc_loss=0.1026, over 3291807.93 frames. ], batch size: 215, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 15:51:06,357 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2802482.6666666665, ans=0.05 2023-10-09 15:51:10,820 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2802482.6666666665, ans=0.125 2023-10-09 15:51:20,670 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2802529.3333333335, ans=0.1 2023-10-09 15:51:51,986 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2802622.6666666665, ans=0.125 2023-10-09 15:52:09,231 INFO [train.py:1031] (3/4) Epoch 14, batch 15850, loss[loss=0.1738, simple_loss=0.2305, pruned_loss=0.04436, ctc_loss=0.07078, over 16749.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2761, pruned_loss=0.05894, ctc_loss=0.1023, over 3286455.68 frames. ], batch size: 140, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 15:52:09,838 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.95 vs. 
limit=10.0 2023-10-09 15:52:11,340 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2802716.0, ans=0.1 2023-10-09 15:52:34,653 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=15.0 2023-10-09 15:52:36,124 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.410e+02 3.156e+02 3.985e+02 5.059e+02 1.038e+03, threshold=7.970e+02, percent-clipped=10.0 2023-10-09 15:52:48,361 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2802856.0, ans=0.125 2023-10-09 15:52:49,287 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2802856.0, ans=0.0 2023-10-09 15:53:00,687 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2802902.6666666665, ans=0.2 2023-10-09 15:53:06,570 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2802902.6666666665, ans=0.1 2023-10-09 15:53:11,396 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2802949.3333333335, ans=0.125 2023-10-09 15:53:12,701 INFO [train.py:1031] (3/4) Epoch 14, batch 15900, loss[loss=0.2249, simple_loss=0.2718, pruned_loss=0.06492, ctc_loss=0.1202, over 16286.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2737, pruned_loss=0.05722, ctc_loss=0.09912, over 3286989.92 frames. ], batch size: 414, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 15:53:21,615 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2802949.3333333335, ans=0.1 2023-10-09 15:53:30,777 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.28 vs. limit=15.0 2023-10-09 15:53:42,528 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2803042.6666666665, ans=0.1 2023-10-09 15:54:02,851 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.61 vs. limit=8.0 2023-10-09 15:54:14,353 INFO [train.py:1031] (3/4) Epoch 14, batch 15950, loss[loss=0.2502, simple_loss=0.2993, pruned_loss=0.07335, ctc_loss=0.1362, over 16886.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.272, pruned_loss=0.05784, ctc_loss=0.1002, over 3294075.85 frames. ], batch size: 310, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 15:54:16,707 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.66 vs. 
limit=10.0 2023-10-09 15:54:22,388 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2803182.6666666665, ans=0.125 2023-10-09 15:54:30,376 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2803229.3333333335, ans=0.5 2023-10-09 15:54:41,755 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 3.015e+02 3.467e+02 4.153e+02 6.024e+02, threshold=6.935e+02, percent-clipped=0.0 2023-10-09 15:54:49,894 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2803276.0, ans=0.125 2023-10-09 15:54:57,040 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2803322.6666666665, ans=0.0 2023-10-09 15:55:15,558 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2803416.0, ans=0.2 2023-10-09 15:55:16,767 INFO [train.py:1031] (3/4) Epoch 14, batch 16000, loss[loss=0.363, simple_loss=0.3838, pruned_loss=0.1276, ctc_loss=0.2176, over 16706.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2801, pruned_loss=0.06265, ctc_loss=0.1086, over 3299759.32 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 15:55:17,134 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2803416.0, ans=0.125 2023-10-09 15:55:19,808 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2803416.0, ans=0.125 2023-10-09 15:55:30,171 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2023-10-09 15:55:32,415 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2803462.6666666665, ans=0.07 2023-10-09 15:55:37,223 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=22.5 2023-10-09 15:56:12,048 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2803602.6666666665, ans=0.07 2023-10-09 15:56:16,757 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=2803602.6666666665, ans=0.1 2023-10-09 15:56:19,092 INFO [train.py:1031] (3/4) Epoch 14, batch 16050, loss[loss=0.2233, simple_loss=0.2838, pruned_loss=0.05952, ctc_loss=0.1093, over 16831.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2911, pruned_loss=0.06509, ctc_loss=0.1145, over 3296189.85 frames. 
], batch size: 164, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 15:56:24,187 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2803649.3333333335, ans=0.125 2023-10-09 15:56:39,345 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2803696.0, ans=0.0 2023-10-09 15:56:48,623 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+02 3.378e+02 4.238e+02 4.995e+02 7.928e+02, threshold=8.476e+02, percent-clipped=3.0 2023-10-09 15:56:54,931 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.28 vs. limit=22.5 2023-10-09 15:57:00,740 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2803789.3333333335, ans=0.125 2023-10-09 15:57:12,968 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2803836.0, ans=0.125 2023-10-09 15:57:16,499 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2803836.0, ans=0.0 2023-10-09 15:57:21,632 INFO [train.py:1031] (3/4) Epoch 14, batch 16100, loss[loss=0.2586, simple_loss=0.3111, pruned_loss=0.07547, ctc_loss=0.1378, over 16929.00 frames. ], tot_loss[loss=0.2344, simple_loss=0.2935, pruned_loss=0.06473, ctc_loss=0.1146, over 3298997.78 frames. ], batch size: 258, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 15:57:54,248 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2803976.0, ans=0.125 2023-10-09 15:57:59,728 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2804022.6666666665, ans=0.1 2023-10-09 15:58:12,919 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2804069.3333333335, ans=0.1 2023-10-09 15:58:20,382 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2804069.3333333335, ans=0.125 2023-10-09 15:58:21,500 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2804069.3333333335, ans=0.0 2023-10-09 15:58:23,846 INFO [train.py:1031] (3/4) Epoch 14, batch 16150, loss[loss=0.2033, simple_loss=0.2665, pruned_loss=0.05131, ctc_loss=0.09362, over 16690.00 frames. ], tot_loss[loss=0.236, simple_loss=0.2945, pruned_loss=0.06564, ctc_loss=0.1156, over 3299516.81 frames. 
], batch size: 151, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 15:58:24,177 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2804116.0, ans=0.2 2023-10-09 15:58:53,759 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+02 3.176e+02 3.660e+02 4.435e+02 1.361e+03, threshold=7.321e+02, percent-clipped=1.0 2023-10-09 15:58:58,814 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=2804209.3333333335, ans=10.0 2023-10-09 15:59:09,422 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2804256.0, ans=0.125 2023-10-09 15:59:10,438 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2804256.0, ans=0.1 2023-10-09 15:59:20,171 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:59:24,936 INFO [train.py:1031] (3/4) Epoch 14, batch 16200, loss[loss=0.1988, simple_loss=0.2515, pruned_loss=0.05391, ctc_loss=0.09552, over 16729.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2881, pruned_loss=0.06374, ctc_loss=0.1122, over 3295086.15 frames. ], batch size: 102, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 15:59:28,027 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2804349.3333333335, ans=0.0 2023-10-09 15:59:28,090 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2804349.3333333335, ans=0.125 2023-10-09 15:59:41,584 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2804396.0, ans=0.0 2023-10-09 15:59:42,585 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2804396.0, ans=0.1 2023-10-09 15:59:50,659 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2804442.6666666665, ans=0.0 2023-10-09 15:59:52,722 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2804442.6666666665, ans=0.09899494936611666 2023-10-09 16:00:08,001 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2804489.3333333335, ans=0.0 2023-10-09 16:00:27,729 INFO [train.py:1031] (3/4) Epoch 14, batch 16250, loss[loss=0.2111, simple_loss=0.2852, pruned_loss=0.05063, ctc_loss=0.08921, over 16826.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2814, pruned_loss=0.0622, ctc_loss=0.1094, over 3296596.45 frames. ], batch size: 242, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:00:58,613 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+02 3.037e+02 3.428e+02 4.095e+02 1.009e+03, threshold=6.855e+02, percent-clipped=2.0 2023-10-09 16:01:22,787 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2804769.3333333335, ans=0.0 2023-10-09 16:01:30,634 INFO [train.py:1031] (3/4) Epoch 14, batch 16300, loss[loss=0.2035, simple_loss=0.2505, pruned_loss=0.05734, ctc_loss=0.1046, over 15342.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2807, pruned_loss=0.05952, ctc_loss=0.1055, over 3288679.67 frames. 
], batch size: 527, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:01:57,943 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2804909.3333333335, ans=0.05 2023-10-09 16:02:14,093 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=22.5 2023-10-09 16:02:16,828 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0 2023-10-09 16:02:17,610 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2804956.0, ans=0.125 2023-10-09 16:02:21,169 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2805002.6666666665, ans=0.125 2023-10-09 16:02:22,235 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2805002.6666666665, ans=0.2 2023-10-09 16:02:28,101 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2805002.6666666665, ans=0.0 2023-10-09 16:02:31,523 INFO [train.py:1031] (3/4) Epoch 14, batch 16350, loss[loss=0.2108, simple_loss=0.2689, pruned_loss=0.05811, ctc_loss=0.09136, over 16941.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2753, pruned_loss=0.05858, ctc_loss=0.1036, over 3298949.39 frames. ], batch size: 82, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:02:32,174 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.69 vs. limit=22.5 2023-10-09 16:02:43,932 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2805096.0, ans=0.1 2023-10-09 16:02:51,381 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2805096.0, ans=0.125 2023-10-09 16:02:58,815 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2805142.6666666665, ans=0.0 2023-10-09 16:03:01,698 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+02 3.121e+02 3.548e+02 4.178e+02 8.324e+02, threshold=7.096e+02, percent-clipped=2.0 2023-10-09 16:03:16,610 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2805189.3333333335, ans=0.125 2023-10-09 16:03:17,543 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2805189.3333333335, ans=0.125 2023-10-09 16:03:31,998 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0 2023-10-09 16:03:32,989 INFO [train.py:1031] (3/4) Epoch 14, batch 16400, loss[loss=0.204, simple_loss=0.2647, pruned_loss=0.05269, ctc_loss=0.0949, over 16888.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2747, pruned_loss=0.05979, ctc_loss=0.1055, over 3302922.42 frames. 
], batch size: 189, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:03:42,755 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2805282.6666666665, ans=0.1 2023-10-09 16:03:52,674 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2805329.3333333335, ans=0.125 2023-10-09 16:04:12,678 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2805422.6666666665, ans=0.125 2023-10-09 16:04:28,819 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2805469.3333333335, ans=0.125 2023-10-09 16:04:30,893 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=8.0 2023-10-09 16:04:31,344 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2805469.3333333335, ans=0.125 2023-10-09 16:04:34,782 INFO [train.py:1031] (3/4) Epoch 14, batch 16450, loss[loss=0.1999, simple_loss=0.2566, pruned_loss=0.05307, ctc_loss=0.09282, over 16789.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2738, pruned_loss=0.06155, ctc_loss=0.1082, over 3301893.33 frames. ], batch size: 95, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:04:36,691 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2805516.0, ans=0.0 2023-10-09 16:04:52,170 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2023-10-09 16:05:06,538 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.659e+02 3.324e+02 3.650e+02 4.238e+02 1.011e+03, threshold=7.299e+02, percent-clipped=2.0 2023-10-09 16:05:14,405 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2805656.0, ans=0.0 2023-10-09 16:05:35,713 INFO [train.py:1031] (3/4) Epoch 14, batch 16500, loss[loss=0.2057, simple_loss=0.2486, pruned_loss=0.06131, ctc_loss=0.1007, over 16812.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2694, pruned_loss=0.06141, ctc_loss=0.1079, over 3311198.10 frames. ], batch size: 121, lr: 2.56e-03, grad_scale: 8.0 2023-10-09 16:05:56,222 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2805796.0, ans=0.125 2023-10-09 16:05:58,435 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2805796.0, ans=0.125 2023-10-09 16:06:03,190 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2805842.6666666665, ans=0.07 2023-10-09 16:06:07,454 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2805842.6666666665, ans=0.0 2023-10-09 16:06:15,459 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2805889.3333333335, ans=0.1 2023-10-09 16:06:20,584 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.00 vs. 
limit=10.0 2023-10-09 16:06:27,193 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2805936.0, ans=0.125 2023-10-09 16:06:29,018 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=12.0 2023-10-09 16:06:35,167 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2805936.0, ans=0.125 2023-10-09 16:06:37,059 INFO [train.py:1031] (3/4) Epoch 14, batch 16550, loss[loss=0.182, simple_loss=0.2391, pruned_loss=0.04691, ctc_loss=0.07778, over 16744.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2709, pruned_loss=0.0612, ctc_loss=0.1074, over 3309949.31 frames. ], batch size: 102, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:06:39,387 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2805982.6666666665, ans=0.125 2023-10-09 16:06:53,456 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2806029.3333333335, ans=0.125 2023-10-09 16:07:09,907 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.507e+02 3.011e+02 3.365e+02 4.120e+02 6.132e+02, threshold=6.730e+02, percent-clipped=0.0 2023-10-09 16:07:31,552 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2806169.3333333335, ans=0.0 2023-10-09 16:07:37,226 INFO [train.py:1031] (3/4) Epoch 14, batch 16600, loss[loss=0.2229, simple_loss=0.2715, pruned_loss=0.06492, ctc_loss=0.1113, over 12130.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2692, pruned_loss=0.06179, ctc_loss=0.1078, over 3304095.58 frames. ], batch size: 37, lr: 2.56e-03, grad_scale: 8.0 2023-10-09 16:07:46,266 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2806216.0, ans=0.125 2023-10-09 16:07:52,878 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2806262.6666666665, ans=0.0 2023-10-09 16:08:29,104 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2806402.6666666665, ans=0.1 2023-10-09 16:08:39,073 INFO [train.py:1031] (3/4) Epoch 14, batch 16650, loss[loss=0.1968, simple_loss=0.254, pruned_loss=0.05243, ctc_loss=0.08721, over 16705.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.2712, pruned_loss=0.06068, ctc_loss=0.1065, over 3300040.90 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:08:45,369 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2806449.3333333335, ans=0.025 2023-10-09 16:08:50,468 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. 
limit=15.0 2023-10-09 16:08:57,033 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2806496.0, ans=0.0 2023-10-09 16:09:08,433 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2806542.6666666665, ans=0.0 2023-10-09 16:09:12,590 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0 2023-10-09 16:09:13,193 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2806542.6666666665, ans=0.0 2023-10-09 16:09:15,080 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.506e+02 2.878e+02 3.292e+02 3.921e+02 8.519e+02, threshold=6.584e+02, percent-clipped=3.0 2023-10-09 16:09:16,473 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2806589.3333333335, ans=0.0 2023-10-09 16:09:40,522 INFO [train.py:1031] (3/4) Epoch 14, batch 16700, loss[loss=0.2333, simple_loss=0.2614, pruned_loss=0.07659, ctc_loss=0.1302, over 16362.00 frames. ], tot_loss[loss=0.2159, simple_loss=0.268, pruned_loss=0.06074, ctc_loss=0.1061, over 3296923.35 frames. ], batch size: 417, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:09:59,707 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2023-10-09 16:10:42,245 INFO [train.py:1031] (3/4) Epoch 14, batch 16750, loss[loss=0.2045, simple_loss=0.2809, pruned_loss=0.04794, ctc_loss=0.08074, over 16718.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2678, pruned_loss=0.06065, ctc_loss=0.1058, over 3300289.12 frames. ], batch size: 102, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:11:18,573 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.049e+02 3.546e+02 4.303e+02 6.611e+02, threshold=7.093e+02, percent-clipped=1.0 2023-10-09 16:11:20,787 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2023-10-09 16:11:42,929 INFO [train.py:1031] (3/4) Epoch 14, batch 16800, loss[loss=0.2043, simple_loss=0.2609, pruned_loss=0.05367, ctc_loss=0.1006, over 16478.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2701, pruned_loss=0.06046, ctc_loss=0.1059, over 3298589.05 frames. ], batch size: 466, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:12:24,984 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2023-10-09 16:12:30,251 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2807289.3333333335, ans=0.0 2023-10-09 16:12:45,118 INFO [train.py:1031] (3/4) Epoch 14, batch 16850, loss[loss=0.2022, simple_loss=0.2576, pruned_loss=0.05355, ctc_loss=0.09928, over 16814.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2712, pruned_loss=0.0612, ctc_loss=0.1073, over 3300968.94 frames. 
], batch size: 141, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:12:54,066 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2807382.6666666665, ans=0.125 2023-10-09 16:12:54,070 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2807382.6666666665, ans=0.125 2023-10-09 16:13:00,892 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-10-09 16:13:06,061 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2807429.3333333335, ans=0.0 2023-10-09 16:13:24,924 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+02 3.198e+02 3.748e+02 4.342e+02 8.032e+02, threshold=7.496e+02, percent-clipped=3.0 2023-10-09 16:13:27,011 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2807522.6666666665, ans=0.0 2023-10-09 16:13:27,437 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=15.0 2023-10-09 16:13:40,630 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2807569.3333333335, ans=0.09899494936611666 2023-10-09 16:13:48,646 INFO [train.py:1031] (3/4) Epoch 14, batch 16900, loss[loss=0.2436, simple_loss=0.3079, pruned_loss=0.06621, ctc_loss=0.1169, over 16831.00 frames. ], tot_loss[loss=0.2194, simple_loss=0.2745, pruned_loss=0.06077, ctc_loss=0.107, over 3311213.38 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:14:38,161 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2807802.6666666665, ans=0.125 2023-10-09 16:14:48,665 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2807802.6666666665, ans=0.1 2023-10-09 16:14:50,888 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2807849.3333333335, ans=0.125 2023-10-09 16:14:51,576 INFO [train.py:1031] (3/4) Epoch 14, batch 16950, loss[loss=0.2539, simple_loss=0.31, pruned_loss=0.07198, ctc_loss=0.1346, over 16899.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2812, pruned_loss=0.06325, ctc_loss=0.1114, over 3315062.35 frames. ], batch size: 215, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:15:19,785 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2023-10-09 16:15:26,384 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2807942.6666666665, ans=0.1 2023-10-09 16:15:31,169 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. 
limit=15.0 2023-10-09 16:15:33,196 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.515e+02 3.296e+02 3.627e+02 4.465e+02 8.431e+02, threshold=7.254e+02, percent-clipped=3.0 2023-10-09 16:15:33,630 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2807989.3333333335, ans=0.125 2023-10-09 16:15:34,571 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2807989.3333333335, ans=0.2 2023-10-09 16:15:55,750 INFO [train.py:1031] (3/4) Epoch 14, batch 17000, loss[loss=0.3069, simple_loss=0.3594, pruned_loss=0.09302, ctc_loss=0.1711, over 16663.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2858, pruned_loss=0.06457, ctc_loss=0.1137, over 3310456.79 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:16:05,379 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2808082.6666666665, ans=0.1 2023-10-09 16:16:06,427 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2808082.6666666665, ans=0.0 2023-10-09 16:16:15,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2808129.3333333335, ans=0.0 2023-10-09 16:16:25,031 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2808176.0, ans=0.04949747468305833 2023-10-09 16:16:35,342 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=22.5 2023-10-09 16:16:38,005 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2808222.6666666665, ans=0.04949747468305833 2023-10-09 16:16:41,085 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2808222.6666666665, ans=0.1 2023-10-09 16:16:43,753 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2808222.6666666665, ans=0.0 2023-10-09 16:16:55,441 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2023-10-09 16:16:59,294 INFO [train.py:1031] (3/4) Epoch 14, batch 17050, loss[loss=0.2162, simple_loss=0.2819, pruned_loss=0.05557, ctc_loss=0.09854, over 16849.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2854, pruned_loss=0.06252, ctc_loss=0.1104, over 3311005.22 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:17:05,021 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2808316.0, ans=0.0 2023-10-09 16:17:41,819 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.562e+02 3.300e+02 3.832e+02 4.647e+02 9.893e+02, threshold=7.664e+02, percent-clipped=3.0 2023-10-09 16:17:49,652 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.24 vs. limit=15.0 2023-10-09 16:17:53,973 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.49 vs. 
limit=15.0 2023-10-09 16:18:02,362 INFO [train.py:1031] (3/4) Epoch 14, batch 17100, loss[loss=0.3238, simple_loss=0.3452, pruned_loss=0.1123, ctc_loss=0.1944, over 16699.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2878, pruned_loss=0.06414, ctc_loss=0.1133, over 3306604.42 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:18:31,515 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2808642.6666666665, ans=0.125 2023-10-09 16:18:35,778 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2808642.6666666665, ans=0.0 2023-10-09 16:18:49,480 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2808689.3333333335, ans=10.0 2023-10-09 16:19:03,699 INFO [train.py:1031] (3/4) Epoch 14, batch 17150, loss[loss=0.2213, simple_loss=0.2912, pruned_loss=0.0562, ctc_loss=0.09726, over 16797.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2883, pruned_loss=0.06304, ctc_loss=0.1111, over 3301688.63 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:19:05,222 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2808782.6666666665, ans=0.2 2023-10-09 16:19:27,583 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2808876.0, ans=0.0 2023-10-09 16:19:29,259 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2808876.0, ans=0.025 2023-10-09 16:19:46,138 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+02 3.091e+02 3.589e+02 4.240e+02 6.885e+02, threshold=7.178e+02, percent-clipped=0.0 2023-10-09 16:19:51,186 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:19:57,817 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2808969.3333333335, ans=0.125 2023-10-09 16:20:05,575 INFO [train.py:1031] (3/4) Epoch 14, batch 17200, loss[loss=0.2208, simple_loss=0.2996, pruned_loss=0.05256, ctc_loss=0.0921, over 16873.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2941, pruned_loss=0.06296, ctc_loss=0.1115, over 3302964.54 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:20:18,162 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2809062.6666666665, ans=0.125 2023-10-09 16:20:21,078 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2809062.6666666665, ans=0.125 2023-10-09 16:20:22,121 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2809062.6666666665, ans=0.125 2023-10-09 16:20:40,144 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2809109.3333333335, ans=0.0 2023-10-09 16:20:41,559 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.98 vs. 
limit=22.5 2023-10-09 16:20:42,339 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2809109.3333333335, ans=0.0 2023-10-09 16:20:53,622 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2809156.0, ans=0.0 2023-10-09 16:20:56,674 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.92 vs. limit=10.0 2023-10-09 16:21:08,178 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2809202.6666666665, ans=0.0 2023-10-09 16:21:12,740 INFO [train.py:1031] (3/4) Epoch 14, batch 17250, loss[loss=0.2854, simple_loss=0.3757, pruned_loss=0.06918, ctc_loss=0.142, over 16786.00 frames. ], tot_loss[loss=0.2453, simple_loss=0.3128, pruned_loss=0.06526, ctc_loss=0.1181, over 3301228.98 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:21:21,330 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2809249.3333333335, ans=0.125 2023-10-09 16:21:57,826 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.919e+02 4.626e+02 5.820e+02 9.725e+02, threshold=9.252e+02, percent-clipped=7.0 2023-10-09 16:21:58,186 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2809389.3333333335, ans=0.125 2023-10-09 16:22:10,540 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2809436.0, ans=0.125 2023-10-09 16:22:13,717 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2809436.0, ans=0.0 2023-10-09 16:22:14,672 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2809482.6666666665, ans=0.2 2023-10-09 16:22:16,152 INFO [train.py:1031] (3/4) Epoch 14, batch 17300, loss[loss=0.2078, simple_loss=0.2423, pruned_loss=0.06535, ctc_loss=0.1065, over 13642.00 frames. ], tot_loss[loss=0.2489, simple_loss=0.3179, pruned_loss=0.06605, ctc_loss=0.1193, over 3296313.46 frames. ], batch size: 51, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:22:41,413 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2809576.0, ans=0.1 2023-10-09 16:22:45,886 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2809576.0, ans=0.125 2023-10-09 16:22:46,909 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2809576.0, ans=0.2 2023-10-09 16:22:48,664 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2809576.0, ans=0.125 2023-10-09 16:22:59,902 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2809622.6666666665, ans=0.0 2023-10-09 16:23:04,565 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2809669.3333333335, ans=0.0 2023-10-09 16:23:12,765 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.57 vs. 
limit=15.0 2023-10-09 16:23:14,506 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2809669.3333333335, ans=0.125 2023-10-09 16:23:17,640 INFO [train.py:1031] (3/4) Epoch 14, batch 17350, loss[loss=0.271, simple_loss=0.3394, pruned_loss=0.07368, ctc_loss=0.1383, over 16815.00 frames. ], tot_loss[loss=0.2517, simple_loss=0.3219, pruned_loss=0.06679, ctc_loss=0.1199, over 3296187.15 frames. ], batch size: 309, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:23:19,611 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2809716.0, ans=0.2 2023-10-09 16:23:42,228 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2809809.3333333335, ans=0.125 2023-10-09 16:23:56,878 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2809856.0, ans=0.0 2023-10-09 16:24:01,203 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+02 3.229e+02 3.810e+02 5.005e+02 1.294e+03, threshold=7.619e+02, percent-clipped=1.0 2023-10-09 16:24:13,830 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2809902.6666666665, ans=0.125 2023-10-09 16:24:14,867 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2809902.6666666665, ans=0.0 2023-10-09 16:24:18,377 INFO [train.py:1031] (3/4) Epoch 14, batch 17400, loss[loss=0.2067, simple_loss=0.2526, pruned_loss=0.06002, ctc_loss=0.1023, over 16704.00 frames. ], tot_loss[loss=0.2448, simple_loss=0.3116, pruned_loss=0.06564, ctc_loss=0.1171, over 3286542.02 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:24:18,687 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2809949.3333333335, ans=0.1 2023-10-09 16:24:40,076 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2809996.0, ans=0.2 2023-10-09 16:25:18,381 INFO [train.py:1031] (3/4) Epoch 14, batch 17450, loss[loss=0.2204, simple_loss=0.2596, pruned_loss=0.06739, ctc_loss=0.116, over 16711.00 frames. ], tot_loss[loss=0.2371, simple_loss=0.2994, pruned_loss=0.06457, ctc_loss=0.1144, over 3295331.43 frames. ], batch size: 309, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:25:24,787 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2810182.6666666665, ans=0.5 2023-10-09 16:26:03,499 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2810322.6666666665, ans=0.125 2023-10-09 16:26:05,311 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+02 3.049e+02 3.427e+02 3.970e+02 9.337e+02, threshold=6.853e+02, percent-clipped=1.0 2023-10-09 16:26:20,868 INFO [train.py:1031] (3/4) Epoch 14, batch 17500, loss[loss=0.2527, simple_loss=0.2975, pruned_loss=0.07595, ctc_loss=0.1399, over 16772.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2901, pruned_loss=0.06445, ctc_loss=0.1139, over 3302536.38 frames. 
], batch size: 309, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:26:35,150 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2810462.6666666665, ans=0.125 2023-10-09 16:26:44,420 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2810462.6666666665, ans=0.125 2023-10-09 16:26:57,184 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2810556.0, ans=0.035 2023-10-09 16:26:58,224 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2810556.0, ans=0.125 2023-10-09 16:27:12,921 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2810602.6666666665, ans=0.0 2023-10-09 16:27:22,359 INFO [train.py:1031] (3/4) Epoch 14, batch 17550, loss[loss=0.2641, simple_loss=0.3156, pruned_loss=0.07722, ctc_loss=0.1451, over 16721.00 frames. ], tot_loss[loss=0.2357, simple_loss=0.2906, pruned_loss=0.06683, ctc_loss=0.1178, over 3301609.35 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:27:39,287 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2810696.0, ans=0.125 2023-10-09 16:27:53,508 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2810742.6666666665, ans=0.0 2023-10-09 16:27:53,524 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2810742.6666666665, ans=0.125 2023-10-09 16:28:12,034 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2023-10-09 16:28:12,372 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.113e+02 3.532e+02 4.348e+02 7.721e+02, threshold=7.063e+02, percent-clipped=2.0 2023-10-09 16:28:25,650 INFO [train.py:1031] (3/4) Epoch 14, batch 17600, loss[loss=0.2145, simple_loss=0.2809, pruned_loss=0.05501, ctc_loss=0.09521, over 16924.00 frames. ], tot_loss[loss=0.2352, simple_loss=0.2939, pruned_loss=0.06518, ctc_loss=0.1154, over 3301366.28 frames. 
], batch size: 228, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:28:30,690 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2810882.6666666665, ans=0.0 2023-10-09 16:28:31,741 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2810882.6666666665, ans=0.1 2023-10-09 16:28:47,950 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2810929.3333333335, ans=0.125 2023-10-09 16:28:47,986 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2810929.3333333335, ans=0.1 2023-10-09 16:28:49,038 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2810976.0, ans=0.0 2023-10-09 16:28:49,089 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2810976.0, ans=0.0 2023-10-09 16:28:50,199 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2810976.0, ans=0.2 2023-10-09 16:28:56,520 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2810976.0, ans=0.125 2023-10-09 16:29:03,427 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2811022.6666666665, ans=0.125 2023-10-09 16:29:27,533 INFO [train.py:1031] (3/4) Epoch 14, batch 17650, loss[loss=0.2233, simple_loss=0.2917, pruned_loss=0.0568, ctc_loss=0.1034, over 16782.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.2933, pruned_loss=0.06406, ctc_loss=0.1134, over 3307757.59 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:29:35,562 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2811116.0, ans=0.125 2023-10-09 16:29:37,069 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2811116.0, ans=0.1 2023-10-09 16:29:45,815 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2023-10-09 16:29:48,109 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2811162.6666666665, ans=0.125 2023-10-09 16:29:55,785 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2811209.3333333335, ans=0.125 2023-10-09 16:30:17,961 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.137e+02 2.983e+02 3.277e+02 4.147e+02 6.506e+02, threshold=6.554e+02, percent-clipped=0.0 2023-10-09 16:30:31,456 INFO [train.py:1031] (3/4) Epoch 14, batch 17700, loss[loss=0.2026, simple_loss=0.2597, pruned_loss=0.05386, ctc_loss=0.0945, over 16700.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2932, pruned_loss=0.06159, ctc_loss=0.1096, over 3307164.57 frames. 
2023-10-09 16:30:33,607 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2811349.3333333335, ans=0.125
2023-10-09 16:30:34,575 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2811349.3333333335, ans=0.2
2023-10-09 16:30:34,831 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2023-10-09 16:30:38,913 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.92 vs. limit=12.0
2023-10-09 16:30:51,911 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0
2023-10-09 16:31:35,730 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. limit=10.0
2023-10-09 16:31:35,982 INFO [train.py:1031] (3/4) Epoch 14, batch 17750, loss[loss=0.223, simple_loss=0.2852, pruned_loss=0.0593, ctc_loss=0.1057, over 16703.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2905, pruned_loss=0.05961, ctc_loss=0.1064, over 3295690.17 frames. ], batch size: 130, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:31:44,447 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2811582.6666666665, ans=0.125
2023-10-09 16:31:51,396 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2811629.3333333335, ans=0.125
2023-10-09 16:31:53,670 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2811629.3333333335, ans=0.0
2023-10-09 16:32:21,433 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.70 vs. limit=15.0
2023-10-09 16:32:26,655 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.329e+02 3.107e+02 3.479e+02 4.054e+02 7.691e+02, threshold=6.958e+02, percent-clipped=4.0
2023-10-09 16:32:39,088 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:32:39,789 INFO [train.py:1031] (3/4) Epoch 14, batch 17800, loss[loss=0.1478, simple_loss=0.1859, pruned_loss=0.03937, ctc_loss=0.07723, over 16566.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2925, pruned_loss=0.05784, ctc_loss=0.1045, over 3297387.20 frames. ], batch size: 70, lr: 2.56e-03, grad_scale: 4.0
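The scaling.py:199 lines simply print the current value (ans) of named schedule parameters at the current batch_count: dropout p, skip rates, balancer probabilities and so on. A plausible reconstruction is a float whose value is piecewise-linear in the global batch count; the breakpoints in the example below are invented for illustration, and the actual scaling.py implementation may differ:

    class ScheduledFloat:
        # Sketch consistent with log lines like
        # "ScheduledFloat: name=..., batch_count=2810976.0, ans=0.125".
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) breakpoints

        def value_at(self, batch_count):
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    # e.g. a skip rate that decays from 0.1 to 0.035 over the first 20k batches
    skip_rate = ScheduledFloat((0.0, 0.1), (20000.0, 0.035))
    print(skip_rate.value_at(2810556.0))  # -> 0.035, as in the bypass.skip_rate line above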
2023-10-09 16:32:40,029 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2811816.0, ans=0.125
2023-10-09 16:33:14,491 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2811909.3333333335, ans=0.125
2023-10-09 16:33:16,428 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2811956.0, ans=0.125
2023-10-09 16:33:23,667 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2811956.0, ans=0.1
2023-10-09 16:33:23,804 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.67 vs. limit=10.0
2023-10-09 16:33:41,468 INFO [train.py:1031] (3/4) Epoch 14, batch 17850, loss[loss=0.2201, simple_loss=0.2696, pruned_loss=0.06329, ctc_loss=0.1099, over 16778.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2863, pruned_loss=0.05608, ctc_loss=0.1013, over 3292639.41 frames. ], batch size: 164, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:33:41,699 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2812049.3333333335, ans=0.125
2023-10-09 16:33:48,717 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2812049.3333333335, ans=0.2
2023-10-09 16:34:00,023 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2812096.0, ans=0.125
2023-10-09 16:34:04,127 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2812096.0, ans=0.125
2023-10-09 16:34:32,668 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.983e+02 3.516e+02 4.147e+02 7.275e+02, threshold=7.033e+02, percent-clipped=1.0
2023-10-09 16:34:43,848 INFO [train.py:1031] (3/4) Epoch 14, batch 17900, loss[loss=0.2118, simple_loss=0.2464, pruned_loss=0.06538, ctc_loss=0.1161, over 16295.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2802, pruned_loss=0.0572, ctc_loss=0.1023, over 3290834.95 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:34:45,232 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2812282.6666666665, ans=0.125
2023-10-09 16:34:47,553 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0
2023-10-09 16:34:55,047 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=22.5
2023-10-09 16:34:55,744 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2812329.3333333335, ans=0.125
2023-10-09 16:35:43,154 INFO [train.py:1031] (3/4) Epoch 14, batch 17950, loss[loss=0.2294, simple_loss=0.2913, pruned_loss=0.06144, ctc_loss=0.1116, over 16740.00 frames. ], tot_loss[loss=0.2205, simple_loss=0.2795, pruned_loss=0.0596, ctc_loss=0.1059, over 3296971.73 frames. ], batch size: 271, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:35:54,332 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.42 vs. limit=15.0
2023-10-09 16:36:22,342 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2812656.0, ans=0.125
2023-10-09 16:36:22,381 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2812656.0, ans=0.0
2023-10-09 16:36:37,264 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.051e+02 3.955e+02 4.561e+02 5.519e+02 1.023e+03, threshold=9.123e+02, percent-clipped=10.0
2023-10-09 16:36:47,005 INFO [train.py:1031] (3/4) Epoch 14, batch 18000, loss[loss=0.3076, simple_loss=0.3382, pruned_loss=0.1011, ctc_loss=0.1872, over 16671.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2847, pruned_loss=0.06314, ctc_loss=0.1114, over 3300427.08 frames. ], batch size: 351, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:36:47,005 INFO [train.py:1054] (3/4) Computing validation loss
2023-10-09 16:37:05,083 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2359, simple_loss=0.3042, pruned_loss=0.06468, ctc_loss=0.09589, over 1796401.00 frames.
2023-10-09 16:37:05,084 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB
2023-10-09 16:37:13,463 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2812749.3333333335, ans=0.2
2023-10-09 16:38:10,373 INFO [train.py:1031] (3/4) Epoch 14, batch 18050, loss[loss=0.2286, simple_loss=0.3055, pruned_loss=0.05487, ctc_loss=0.1048, over 16905.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2901, pruned_loss=0.06547, ctc_loss=0.1156, over 3299943.27 frames. ], batch size: 292, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:38:10,763 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2812982.6666666665, ans=0.125
2023-10-09 16:38:53,327 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2813122.6666666665, ans=0.1
2023-10-09 16:39:06,177 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+02 3.447e+02 3.987e+02 5.015e+02 1.069e+03, threshold=7.973e+02, percent-clipped=1.0
2023-10-09 16:39:14,517 INFO [train.py:1031] (3/4) Epoch 14, batch 18100, loss[loss=0.2221, simple_loss=0.3299, pruned_loss=0.04131, ctc_loss=0.07896, over 16256.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2915, pruned_loss=0.06367, ctc_loss=0.1126, over 3285212.06 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:39:31,793 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2813262.6666666665, ans=0.125
2023-10-09 16:39:39,350 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2813309.3333333335, ans=0.125
2023-10-09 16:40:16,215 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2813449.3333333335, ans=0.0
2023-10-09 16:40:16,845 INFO [train.py:1031] (3/4) Epoch 14, batch 18150, loss[loss=0.2267, simple_loss=0.2718, pruned_loss=0.06681, ctc_loss=0.1202, over 16757.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2888, pruned_loss=0.06261, ctc_loss=0.1104, over 3292749.79 frames. ], batch size: 328, lr: 2.56e-03, grad_scale: 1.0
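Between batches 18000 and 18050 above, the trainer pauses to compute a validation loss over 1796401.00 frames and then reports peak GPU memory. A minimal sketch of such a loop, with frame-weighted averaging so the reported components follow the same "over N frames" convention; compute_loss is a hypothetical stand-in for the recipe's actual loss function:

    import torch

    def compute_validation_loss(model, valid_loader, device):
        # Frame-weighted average of the loss components over the validation
        # set; a sketch of the "Computing validation loss" block above.
        model.eval()
        totals = {"loss": 0.0, "simple_loss": 0.0, "pruned_loss": 0.0, "ctc_loss": 0.0}
        frames = 0.0
        with torch.no_grad():
            for batch in valid_loader:
                info = compute_loss(model, batch, device)  # hypothetical helper
                n = info.pop("frames")
                for k in totals:
                    totals[k] += info[k] * n
                frames += n
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mem_mb}MB")
        return {k: v / frames for k, v in totals.items()}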
2023-10-09 16:41:02,306 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.32 vs. limit=12.0
2023-10-09 16:41:12,482 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+02 3.202e+02 3.701e+02 4.396e+02 8.361e+02, threshold=7.403e+02, percent-clipped=2.0
2023-10-09 16:41:19,066 INFO [train.py:1031] (3/4) Epoch 14, batch 18200, loss[loss=0.1891, simple_loss=0.2555, pruned_loss=0.04488, ctc_loss=0.08239, over 16833.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2825, pruned_loss=0.06225, ctc_loss=0.1096, over 3295022.40 frames. ], batch size: 258, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:41:19,329 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2813682.6666666665, ans=0.125
2023-10-09 16:41:46,628 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2813776.0, ans=0.125
2023-10-09 16:41:55,288 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2813822.6666666665, ans=0.0
2023-10-09 16:41:57,265 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0
2023-10-09 16:41:59,079 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2813822.6666666665, ans=0.07
2023-10-09 16:42:02,890 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2813822.6666666665, ans=0.0
2023-10-09 16:42:14,350 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2813869.3333333335, ans=0.0
2023-10-09 16:42:14,416 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2813869.3333333335, ans=0.125
2023-10-09 16:42:21,166 INFO [train.py:1031] (3/4) Epoch 14, batch 18250, loss[loss=0.1628, simple_loss=0.2367, pruned_loss=0.03249, ctc_loss=0.05994, over 16859.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2743, pruned_loss=0.05829, ctc_loss=0.1027, over 3290774.31 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:42:29,149 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2813916.0, ans=0.125
2023-10-09 16:42:30,193 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:42:33,969 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2813962.6666666665, ans=0.0
2023-10-09 16:42:50,555 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.45 vs. limit=22.5
2023-10-09 16:42:52,232 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2814009.3333333335, ans=0.125
2023-10-09 16:42:57,096 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2814056.0, ans=0.2
2023-10-09 16:43:13,224 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2814102.6666666665, ans=0.1
2023-10-09 16:43:14,345 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2814102.6666666665, ans=0.0
2023-10-09 16:43:16,732 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.801e+02 3.276e+02 4.033e+02 6.396e+02, threshold=6.552e+02, percent-clipped=0.0
2023-10-09 16:43:22,461 INFO [train.py:1031] (3/4) Epoch 14, batch 18300, loss[loss=0.1868, simple_loss=0.2727, pruned_loss=0.03743, ctc_loss=0.06502, over 16831.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2669, pruned_loss=0.0533, ctc_loss=0.09415, over 3300211.84 frames. ], batch size: 242, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:43:28,366 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2814149.3333333335, ans=0.0
2023-10-09 16:43:30,510 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2814149.3333333335, ans=0.0
2023-10-09 16:43:36,932 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2814196.0, ans=0.125
2023-10-09 16:43:36,977 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2814196.0, ans=0.125
2023-10-09 16:43:39,672 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2814196.0, ans=0.0
2023-10-09 16:43:45,307 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2814196.0, ans=0.05
2023-10-09 16:44:02,139 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.24 vs. limit=12.0
2023-10-09 16:44:13,670 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2814336.0, ans=0.125
2023-10-09 16:44:25,851 INFO [train.py:1031] (3/4) Epoch 14, batch 18350, loss[loss=0.1726, simple_loss=0.228, pruned_loss=0.04301, ctc_loss=0.07784, over 16935.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.2727, pruned_loss=0.05425, ctc_loss=0.09606, over 3305888.50 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 4.0
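The scaling.py:979 lines compare a per-module whiteness metric against a limit (metric=3.24 vs. limit=12.0 for encoder.encoders.0.layers.1.whiten above); presumably an intervention kicks in only when the metric exceeds the limit. One self-contained statistic with the right behaviour is the normalized eigenvalue spread of the channel covariance, which is 1.0 for perfectly white features and grows as the spectrum spreads; this definition is an assumption for illustration, not necessarily the one in scaling.py:

    import torch

    def whiteness_metric(x, num_groups=1):
        # d * tr(C @ C) / tr(C)^2 == mean(eig^2) / mean(eig)^2 >= 1,
        # with equality when the covariance C is proportional to I.
        # Assumed definition for illustration; scaling.py may differ.
        x = x.reshape(-1, x.shape[-1])
        metrics = []
        for g in torch.chunk(x, num_groups, dim=1):
            g = g - g.mean(dim=0)
            cov = (g.T @ g) / g.shape[0]
            d = cov.shape[0]
            metrics.append(d * (cov * cov).sum() / cov.trace() ** 2)
        return torch.stack(metrics).max().item()

    feats = torch.randn(1000, 192)                    # white: metric near 1.0
    print(whiteness_metric(feats))
    print(whiteness_metric(feats * torch.rand(192)))  # spread spectrum: larger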
2023-10-09 16:44:37,632 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2814429.3333333335, ans=0.0
2023-10-09 16:45:00,644 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2814476.0, ans=0.0
2023-10-09 16:45:03,340 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2814522.6666666665, ans=0.125
2023-10-09 16:45:17,814 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2814569.3333333335, ans=0.125
2023-10-09 16:45:22,232 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+02 3.059e+02 3.585e+02 4.224e+02 7.359e+02, threshold=7.170e+02, percent-clipped=2.0
2023-10-09 16:45:26,920 INFO [train.py:1031] (3/4) Epoch 14, batch 18400, loss[loss=0.2272, simple_loss=0.286, pruned_loss=0.06208, ctc_loss=0.1107, over 16771.00 frames. ], tot_loss[loss=0.2147, simple_loss=0.2774, pruned_loss=0.05609, ctc_loss=0.09947, over 3312157.43 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:45:41,823 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.77 vs. limit=10.0
2023-10-09 16:46:00,973 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2814709.3333333335, ans=10.0
2023-10-09 16:46:05,918 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.71 vs. limit=10.0
2023-10-09 16:46:13,885 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2814756.0, ans=0.0
2023-10-09 16:46:20,828 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2814802.6666666665, ans=0.0
2023-10-09 16:46:27,823 INFO [train.py:1031] (3/4) Epoch 14, batch 18450, loss[loss=0.241, simple_loss=0.2863, pruned_loss=0.07282, ctc_loss=0.125, over 16447.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2788, pruned_loss=0.05925, ctc_loss=0.1042, over 3315537.19 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:46:32,126 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=22.5
2023-10-09 16:46:43,471 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2814896.0, ans=0.0
2023-10-09 16:46:55,576 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=2814942.6666666665, ans=0.5
2023-10-09 16:47:24,423 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.91 vs. limit=15.0
2023-10-09 16:47:26,510 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.678e+02 3.308e+02 3.613e+02 4.264e+02 6.985e+02, threshold=7.226e+02, percent-clipped=0.0
2023-10-09 16:47:30,895 INFO [train.py:1031] (3/4) Epoch 14, batch 18500, loss[loss=0.2375, simple_loss=0.3014, pruned_loss=0.06388, ctc_loss=0.1147, over 16928.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2817, pruned_loss=0.06162, ctc_loss=0.1082, over 3322574.92 frames. ], batch size: 258, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:47:37,988 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2815082.6666666665, ans=0.1
2023-10-09 16:47:41,480 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0
2023-10-09 16:47:49,988 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2815129.3333333335, ans=0.0
2023-10-09 16:47:52,953 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=22.5
2023-10-09 16:48:24,835 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2815269.3333333335, ans=0.0
2023-10-09 16:48:32,667 INFO [train.py:1031] (3/4) Epoch 14, batch 18550, loss[loss=0.3527, simple_loss=0.377, pruned_loss=0.1188, ctc_loss=0.227, over 16709.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.288, pruned_loss=0.06597, ctc_loss=0.1157, over 3311162.34 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:49:04,633 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2815409.3333333335, ans=0.0
2023-10-09 16:49:34,438 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.785e+02 3.368e+02 3.936e+02 4.731e+02 1.128e+03, threshold=7.872e+02, percent-clipped=2.0
2023-10-09 16:49:36,571 INFO [train.py:1031] (3/4) Epoch 14, batch 18600, loss[loss=0.2731, simple_loss=0.3791, pruned_loss=0.06184, ctc_loss=0.1084, over 16275.00 frames. ], tot_loss[loss=0.2422, simple_loss=0.2991, pruned_loss=0.06851, ctc_loss=0.1207, over 3304942.54 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:49:37,505 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2815549.3333333335, ans=0.1
2023-10-09 16:49:44,304 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:49:46,085 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2815549.3333333335, ans=0.0
2023-10-09 16:50:25,003 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2815689.3333333335, ans=0.05
2023-10-09 16:50:31,673 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2815736.0, ans=0.125
2023-10-09 16:50:32,715 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2815736.0, ans=0.2
2023-10-09 16:50:39,794 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2815782.6666666665, ans=0.125
2023-10-09 16:50:41,208 INFO [train.py:1031] (3/4) Epoch 14, batch 18650, loss[loss=0.2832, simple_loss=0.3261, pruned_loss=0.08785, ctc_loss=0.1615, over 16625.00 frames. ], tot_loss[loss=0.2467, simple_loss=0.3053, pruned_loss=0.0695, ctc_loss=0.1225, over 3301746.10 frames. ], batch size: 351, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:51:09,424 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2815876.0, ans=0.07
2023-10-09 16:51:10,614 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2815876.0, ans=0.1
2023-10-09 16:51:19,344 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2815922.6666666665, ans=0.1
2023-10-09 16:51:41,576 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.792e+02 3.349e+02 3.828e+02 4.485e+02 8.259e+02, threshold=7.655e+02, percent-clipped=2.0
2023-10-09 16:51:43,744 INFO [train.py:1031] (3/4) Epoch 14, batch 18700, loss[loss=0.2003, simple_loss=0.2719, pruned_loss=0.04826, ctc_loss=0.08074, over 16797.00 frames. ], tot_loss[loss=0.2459, simple_loss=0.3038, pruned_loss=0.06955, ctc_loss=0.1223, over 3302601.79 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 8.0
2023-10-09 16:51:51,723 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.46 vs. limit=15.0
2023-10-09 16:51:58,482 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2816062.6666666665, ans=0.0
2023-10-09 16:52:14,267 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=2816109.3333333335, ans=15.0
2023-10-09 16:52:46,841 INFO [train.py:1031] (3/4) Epoch 14, batch 18750, loss[loss=0.2478, simple_loss=0.3369, pruned_loss=0.05759, ctc_loss=0.1085, over 15036.00 frames. ], tot_loss[loss=0.2409, simple_loss=0.3013, pruned_loss=0.06667, ctc_loss=0.1177, over 3298995.54 frames. ], batch size: 527, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:52:48,869 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2816249.3333333335, ans=0.125
2023-10-09 16:52:59,768 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2816296.0, ans=0.0
2023-10-09 16:53:15,441 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2816342.6666666665, ans=0.0
2023-10-09 16:53:24,098 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.21 vs. limit=15.0
2023-10-09 16:53:35,799 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2816436.0, ans=0.07
2023-10-09 16:53:48,759 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 2.936e+02 3.595e+02 4.298e+02 1.016e+03, threshold=7.191e+02, percent-clipped=2.0
2023-10-09 16:53:48,786 INFO [train.py:1031] (3/4) Epoch 14, batch 18800, loss[loss=0.1865, simple_loss=0.2645, pruned_loss=0.03916, ctc_loss=0.07545, over 16841.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2933, pruned_loss=0.06259, ctc_loss=0.1109, over 3292543.02 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:53:55,302 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0
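Note the grad_scale field bouncing between 1.0 and 8.0 across the batch summaries (8.0 at batch 18700 above, 1.0 back at batch 18150). That pattern is characteristic of dynamic fp16 loss scaling: the scale grows while steps succeed and is cut when gradients overflow. A generic sketch using PyTorch's own scaler; the recipe may manage the scale itself, so this is an analogy rather than its code, and the init_scale/growth_interval values are illustrative:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=1000)

    def training_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)           # stand-in for the real loss computation
        scaler.scale(loss).backward()     # backprop the scaled loss
        scaler.step(optimizer)            # skips the step if grads overflowed
        scaler.update()                   # grows or shrinks the scale
        return scaler.get_scale()         # the value logged as grad_scale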
2023-10-09 16:54:00,941 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2816529.3333333335, ans=0.2
2023-10-09 16:54:23,509 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2816576.0, ans=0.025
2023-10-09 16:54:45,473 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2816669.3333333335, ans=0.125
2023-10-09 16:54:45,497 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2816669.3333333335, ans=0.125
2023-10-09 16:54:48,903 INFO [train.py:1031] (3/4) Epoch 14, batch 18850, loss[loss=0.1983, simple_loss=0.2544, pruned_loss=0.05292, ctc_loss=0.09107, over 16930.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2887, pruned_loss=0.06167, ctc_loss=0.1089, over 3297448.85 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:55:05,420 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0
2023-10-09 16:55:08,820 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2816762.6666666665, ans=0.0
2023-10-09 16:55:12,712 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=15.0
2023-10-09 16:55:17,273 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2816809.3333333335, ans=0.125
2023-10-09 16:55:24,842 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2816856.0, ans=0.0
2023-10-09 16:55:26,919 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2816856.0, ans=0.035
2023-10-09 16:55:27,041 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2816856.0, ans=0.125
2023-10-09 16:55:49,881 INFO [train.py:1031] (3/4) Epoch 14, batch 18900, loss[loss=0.2286, simple_loss=0.2888, pruned_loss=0.06222, ctc_loss=0.1099, over 16834.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2883, pruned_loss=0.06345, ctc_loss=0.112, over 3299430.16 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:55:53,222 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+02 3.158e+02 3.575e+02 4.091e+02 5.831e+02, threshold=7.150e+02, percent-clipped=0.0
2023-10-09 16:55:56,924 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2816949.3333333335, ans=0.125
2023-10-09 16:56:01,146 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2816996.0, ans=0.2
2023-10-09 16:56:03,835 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=22.5
2023-10-09 16:56:34,166 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2817089.3333333335, ans=0.0
2023-10-09 16:56:44,156 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2817136.0, ans=0.95
2023-10-09 16:56:48,784 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.75 vs. limit=15.0
2023-10-09 16:56:54,182 INFO [train.py:1031] (3/4) Epoch 14, batch 18950, loss[loss=0.211, simple_loss=0.2798, pruned_loss=0.0525, ctc_loss=0.09276, over 16680.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2922, pruned_loss=0.06611, ctc_loss=0.1168, over 3300820.20 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:56:57,919 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0
2023-10-09 16:57:14,148 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2817229.3333333335, ans=0.1
2023-10-09 16:57:21,826 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2817276.0, ans=0.0
2023-10-09 16:57:31,734 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2817322.6666666665, ans=0.0
2023-10-09 16:57:31,827 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2817322.6666666665, ans=0.5
2023-10-09 16:57:47,778 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2817369.3333333335, ans=0.125
2023-10-09 16:57:55,558 INFO [train.py:1031] (3/4) Epoch 14, batch 19000, loss[loss=0.2889, simple_loss=0.3254, pruned_loss=0.0932, ctc_loss=0.1648, over 16829.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2893, pruned_loss=0.06404, ctc_loss=0.1127, over 3303940.31 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:57:58,301 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+02 3.278e+02 3.626e+02 4.352e+02 8.941e+02, threshold=7.252e+02, percent-clipped=2.0
2023-10-09 16:58:03,899 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0
2023-10-09 16:58:04,702 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2817416.0, ans=0.1
2023-10-09 16:58:13,055 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2817462.6666666665, ans=0.2
2023-10-09 16:58:19,027 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2817509.3333333335, ans=0.125
2023-10-09 16:58:45,209 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2817602.6666666665, ans=0.1
2023-10-09 16:58:55,501 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2817602.6666666665, ans=0.0
2023-10-09 16:58:57,888 INFO [train.py:1031] (3/4) Epoch 14, batch 19050, loss[loss=0.2464, simple_loss=0.3088, pruned_loss=0.06784, ctc_loss=0.1206, over 16915.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.287, pruned_loss=0.06404, ctc_loss=0.1126, over 3295480.31 frames. ], batch size: 292, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:58:58,155 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2817649.3333333335, ans=0.125
2023-10-09 16:58:58,200 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:59:00,356 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2817649.3333333335, ans=0.0
2023-10-09 16:59:19,456 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2817696.0, ans=0.125
2023-10-09 16:59:21,856 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2817742.6666666665, ans=0.0
2023-10-09 16:59:35,702 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0
2023-10-09 16:59:35,799 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.63 vs. limit=15.0
2023-10-09 17:00:00,806 INFO [train.py:1031] (3/4) Epoch 14, batch 19100, loss[loss=0.142, simple_loss=0.1748, pruned_loss=0.04104, ctc_loss=0.0678, over 9285.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2882, pruned_loss=0.06565, ctc_loss=0.1151, over 3294902.66 frames. ], batch size: 35, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:00:04,649 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.758e+02 3.450e+02 4.008e+02 4.699e+02 1.096e+03, threshold=8.015e+02, percent-clipped=2.0
2023-10-09 17:00:27,634 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2817976.0, ans=0.0
2023-10-09 17:00:43,300 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2818022.6666666665, ans=0.04949747468305833
2023-10-09 17:01:02,250 INFO [train.py:1031] (3/4) Epoch 14, batch 19150, loss[loss=0.2139, simple_loss=0.3125, pruned_loss=0.04209, ctc_loss=0.07779, over 16232.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2901, pruned_loss=0.06354, ctc_loss=0.1121, over 3304901.74 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:01:04,641 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2818116.0, ans=0.2
2023-10-09 17:01:09,108 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:01:14,087 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2818162.6666666665, ans=0.125
2023-10-09 17:01:46,067 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2818256.0, ans=0.125
2023-10-09 17:01:47,506 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2818256.0, ans=15.0
2023-10-09 17:02:03,494 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2818302.6666666665, ans=0.125
2023-10-09 17:02:06,623 INFO [train.py:1031] (3/4) Epoch 14, batch 19200, loss[loss=0.2931, simple_loss=0.3435, pruned_loss=0.08913, ctc_loss=0.1612, over 16662.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2881, pruned_loss=0.06122, ctc_loss=0.1085, over 3307381.98 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:02:12,383 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+02 3.095e+02 3.707e+02 4.645e+02 1.379e+03, threshold=7.414e+02, percent-clipped=4.0
2023-10-09 17:02:24,117 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2818396.0, ans=0.125
2023-10-09 17:02:28,293 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.69 vs. limit=15.0
2023-10-09 17:02:36,804 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2818442.6666666665, ans=0.0
2023-10-09 17:02:39,775 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0
2023-10-09 17:03:09,812 INFO [train.py:1031] (3/4) Epoch 14, batch 19250, loss[loss=0.2612, simple_loss=0.3198, pruned_loss=0.07304, ctc_loss=0.1414, over 16693.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2873, pruned_loss=0.06031, ctc_loss=0.1077, over 3317163.22 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 2.0
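The scaling.py:1069 lines report a loss-sum for particular self_attn_weights modules, always 0.000e+00 in this stretch, which reads like a wrapper that can attach an auxiliary penalty to a module's output and logs the accumulated penalty (zero while it is inactive). The sketch below is speculative; the wrapper name, the penalty form, and the hand-off to the training loop are all assumptions:

    import torch.nn as nn

    class AuxLossWrapper(nn.Module):
        # Speculative reading of the "WithLoss: name=..., loss-sum=0.000e+00"
        # lines; loss_sum stays 0.0 whenever the penalty is disabled.
        def __init__(self, module, penalty_scale=0.0):
            super().__init__()
            self.module = module
            self.penalty_scale = penalty_scale
            self.loss_sum = 0.0          # what the log line reports
            self.pending_penalty = None  # added to the loss by the trainer

        def forward(self, *args, **kwargs):
            out = self.module(*args, **kwargs)
            if self.training and self.penalty_scale > 0.0:
                penalty = self.penalty_scale * out.float().pow(2).mean()
                self.loss_sum += penalty.item()
                self.pending_penalty = penalty
            return out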
2023-10-09 17:03:25,268 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2818629.3333333335, ans=0.0
2023-10-09 17:03:25,291 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2818629.3333333335, ans=0.0
2023-10-09 17:03:29,079 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2818629.3333333335, ans=0.0
2023-10-09 17:03:36,313 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2818676.0, ans=0.125
2023-10-09 17:03:44,094 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2818676.0, ans=0.2
2023-10-09 17:03:59,467 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2818722.6666666665, ans=0.1
2023-10-09 17:04:06,120 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 17:04:09,965 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2818769.3333333335, ans=0.0
2023-10-09 17:04:14,917 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2818816.0, ans=0.0
2023-10-09 17:04:15,645 INFO [train.py:1031] (3/4) Epoch 14, batch 19300, loss[loss=0.2386, simple_loss=0.2825, pruned_loss=0.0708, ctc_loss=0.1331, over 15304.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2874, pruned_loss=0.06146, ctc_loss=0.1099, over 3313132.53 frames. ], batch size: 527, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:04:15,955 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2818816.0, ans=0.125
2023-10-09 17:04:15,980 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2818816.0, ans=0.125
2023-10-09 17:04:18,942 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2818816.0, ans=0.04949747468305833
2023-10-09 17:04:24,139 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.286e+02 3.972e+02 4.950e+02 6.905e+02, threshold=7.944e+02, percent-clipped=0.0
2023-10-09 17:04:33,290 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2818862.6666666665, ans=0.0
2023-10-09 17:04:49,577 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2818909.3333333335, ans=0.125
2023-10-09 17:04:51,658 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2818909.3333333335, ans=0.125
2023-10-09 17:05:12,025 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2819002.6666666665, ans=0.09899494936611666
2023-10-09 17:05:18,465 INFO [train.py:1031] (3/4) Epoch 14, batch 19350, loss[loss=0.1715, simple_loss=0.2315, pruned_loss=0.04221, ctc_loss=0.06768, over 16627.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2865, pruned_loss=0.06166, ctc_loss=0.1094, over 3304499.89 frames. ], batch size: 110, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:05:21,554 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2819049.3333333335, ans=0.2
2023-10-09 17:05:30,135 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819096.0, ans=0.1
2023-10-09 17:05:37,717 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819096.0, ans=0.1
2023-10-09 17:05:44,664 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2819142.6666666665, ans=0.025
2023-10-09 17:06:05,291 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0
2023-10-09 17:06:11,593 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2819236.0, ans=0.125
2023-10-09 17:06:18,198 INFO [train.py:1031] (3/4) Epoch 14, batch 19400, loss[loss=0.2286, simple_loss=0.2962, pruned_loss=0.06142, ctc_loss=0.09548, over 16858.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2807, pruned_loss=0.05863, ctc_loss=0.104, over 3310641.42 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:06:25,712 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.971e+02 3.609e+02 4.450e+02 6.456e+02, threshold=7.218e+02, percent-clipped=0.0
2023-10-09 17:06:37,875 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.15 vs. limit=22.5
2023-10-09 17:06:40,919 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2819329.3333333335, ans=0.125
2023-10-09 17:06:46,121 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2819376.0, ans=0.2
2023-10-09 17:06:48,120 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2819376.0, ans=0.125
2023-10-09 17:06:55,336 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5
2023-10-09 17:06:56,315 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=22.5
2023-10-09 17:07:09,985 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2819469.3333333335, ans=0.0
2023-10-09 17:07:15,372 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2819469.3333333335, ans=0.125
2023-10-09 17:07:19,288 INFO [train.py:1031] (3/4) Epoch 14, batch 19450, loss[loss=0.2406, simple_loss=0.2876, pruned_loss=0.07136, ctc_loss=0.1272, over 16756.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.2806, pruned_loss=0.06064, ctc_loss=0.1071, over 3309848.48 frames. ], batch size: 111, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:07:21,072 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.83 vs. limit=22.5
2023-10-09 17:07:40,527 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2819562.6666666665, ans=0.1
2023-10-09 17:08:18,256 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2819702.6666666665, ans=0.125
2023-10-09 17:08:19,260 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2819702.6666666665, ans=0.125
2023-10-09 17:08:21,518 INFO [train.py:1031] (3/4) Epoch 14, batch 19500, loss[loss=0.2384, simple_loss=0.305, pruned_loss=0.06369, ctc_loss=0.1111, over 16699.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2841, pruned_loss=0.06057, ctc_loss=0.1072, over 3312853.77 frames. ], batch size: 271, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:08:22,986 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2819749.3333333335, ans=0.125
2023-10-09 17:08:31,251 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 3.015e+02 3.593e+02 4.173e+02 8.054e+02, threshold=7.186e+02, percent-clipped=2.0
2023-10-09 17:08:47,702 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=15.0
2023-10-09 17:09:03,561 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2819889.3333333335, ans=0.125
2023-10-09 17:09:10,613 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2819936.0, ans=0.125
2023-10-09 17:09:10,949 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0
2023-10-09 17:09:19,608 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.62 vs. limit=22.5
2023-10-09 17:09:21,220 INFO [train.py:1031] (3/4) Epoch 14, batch 19550, loss[loss=0.2429, simple_loss=0.2979, pruned_loss=0.07069, ctc_loss=0.1166, over 16789.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.287, pruned_loss=0.06292, ctc_loss=0.1111, over 3300115.91 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:09:22,658 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2819982.6666666665, ans=0.125
2023-10-09 17:09:25,014 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0
2023-10-09 17:09:31,379 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2819982.6666666665, ans=0.125
2023-10-09 17:09:35,271 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2820029.3333333335, ans=0.1
2023-10-09 17:09:41,044 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2820029.3333333335, ans=0.125
2023-10-09 17:09:54,330 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2820076.0, ans=0.125
2023-10-09 17:09:59,351 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2820122.6666666665, ans=0.125
2023-10-09 17:10:03,140 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2820122.6666666665, ans=0.2
2023-10-09 17:10:11,752 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2820169.3333333335, ans=0.125
2023-10-09 17:10:21,303 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2820169.3333333335, ans=0.0
2023-10-09 17:10:24,798 INFO [train.py:1031] (3/4) Epoch 14, batch 19600, loss[loss=0.2166, simple_loss=0.2606, pruned_loss=0.06491, ctc_loss=0.1066, over 16507.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2837, pruned_loss=0.06219, ctc_loss=0.11, over 3303926.49 frames. ], batch size: 110, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:10:35,229 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.075e+02 3.430e+02 4.007e+02 6.363e+02, threshold=6.860e+02, percent-clipped=0.0
2023-10-09 17:10:59,835 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2820309.3333333335, ans=0.05
2023-10-09 17:11:01,456 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2820309.3333333335, ans=0.1
2023-10-09 17:11:08,879 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2820356.0, ans=0.07
2023-10-09 17:11:28,205 INFO [train.py:1031] (3/4) Epoch 14, batch 19650, loss[loss=0.2437, simple_loss=0.2999, pruned_loss=0.07002, ctc_loss=0.1187, over 16603.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2862, pruned_loss=0.06396, ctc_loss=0.1132, over 3309999.38 frames. ], batch size: 110, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:11:53,652 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2820542.6666666665, ans=0.125
2023-10-09 17:11:57,825 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2820542.6666666665, ans=0.125
2023-10-09 17:12:21,799 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2820636.0, ans=0.125
2023-10-09 17:12:30,818 INFO [train.py:1031] (3/4) Epoch 14, batch 19700, loss[loss=0.2308, simple_loss=0.2829, pruned_loss=0.06621, ctc_loss=0.1159, over 16870.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2856, pruned_loss=0.06542, ctc_loss=0.1151, over 3310102.76 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 2.0
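Each summary pairs the current batch's loss[... over ~16k frames] with tot_loss[... over ~3.3e6 frames]; the frame count of the latter hovers around 3.3M instead of growing through the epoch, so tot_loss looks like a frame-weighted average over a bounded window of recent batches. A small sketch of that bookkeeping; the frame budget is inferred from this log, not taken from train.py:

    from collections import deque

    class RunningLoss:
        # Frame-weighted loss average over a bounded window of recent batches,
        # sketching the tot_loss[... over N frames.] bookkeeping above.
        def __init__(self, max_frames=3.3e6):
            self.max_frames = max_frames
            self.window = deque()   # (frames, {component: value}) per batch
            self.frames = 0.0

        def update(self, frames, components):
            self.window.append((frames, components))
            self.frames += frames
            while self.frames > self.max_frames and len(self.window) > 1:
                old_frames, _ = self.window.popleft()
                self.frames -= old_frames

        def averages(self):
            keys = self.window[0][1].keys()
            return {k: sum(f * c[k] for f, c in self.window) / self.frames
                    for k in keys}

    # e.g. feed in the batch 19700 entry above
    tracker = RunningLoss()
    tracker.update(16870.0, {"loss": 0.2308, "ctc_loss": 0.1159})
    print(tracker.averages())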
2023-10-09 17:12:42,622 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.670e+02 3.372e+02 3.843e+02 4.494e+02 9.285e+02, threshold=7.687e+02, percent-clipped=3.0
2023-10-09 17:12:46,751 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2820729.3333333335, ans=0.0
2023-10-09 17:12:48,954 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2820729.3333333335, ans=0.125
2023-10-09 17:12:49,971 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2820729.3333333335, ans=0.0
2023-10-09 17:12:54,280 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=15.0
2023-10-09 17:13:21,582 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2820869.3333333335, ans=0.125
2023-10-09 17:13:31,274 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.62 vs. limit=15.0
2023-10-09 17:13:31,684 INFO [train.py:1031] (3/4) Epoch 14, batch 19750, loss[loss=0.211, simple_loss=0.285, pruned_loss=0.04994, ctc_loss=0.09265, over 16762.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2871, pruned_loss=0.06473, ctc_loss=0.1142, over 3306389.51 frames. ], batch size: 201, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:13:50,778 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0
2023-10-09 17:13:58,021 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2821009.3333333335, ans=0.0
2023-10-09 17:13:59,977 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0
2023-10-09 17:14:02,752 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2821009.3333333335, ans=0.0
2023-10-09 17:14:19,027 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0
2023-10-09 17:14:24,632 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0
2023-10-09 17:14:34,795 INFO [train.py:1031] (3/4) Epoch 14, batch 19800, loss[loss=0.2643, simple_loss=0.3087, pruned_loss=0.0817, ctc_loss=0.141, over 16819.00 frames. ], tot_loss[loss=0.2352, simple_loss=0.2918, pruned_loss=0.06604, ctc_loss=0.1164, over 3306001.73 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 17:14:41,293 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2821149.3333333335, ans=0.0
2023-10-09 17:14:43,937 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2821149.3333333335, ans=0.125
2023-10-09 17:14:47,458 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.293e+02 3.299e+02 3.756e+02 4.593e+02 7.524e+02, threshold=7.512e+02, percent-clipped=0.0
2023-10-09 17:14:54,614 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=15.0
2023-10-09 17:14:59,414 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2821242.6666666665, ans=0.1
2023-10-09 17:15:11,780 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2821289.3333333335, ans=0.07
2023-10-09 17:15:13,430 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2821289.3333333335, ans=0.0
2023-10-09 17:15:22,823 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2821289.3333333335, ans=10.0
2023-10-09 17:15:26,828 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2821336.0, ans=0.0
2023-10-09 17:15:38,927 INFO [train.py:1031] (3/4) Epoch 14, batch 19850, loss[loss=0.2436, simple_loss=0.3023, pruned_loss=0.06801, ctc_loss=0.1224, over 16639.00 frames. ], tot_loss[loss=0.2406, simple_loss=0.2954, pruned_loss=0.06869, ctc_loss=0.1213, over 3303865.82 frames. ], batch size: 271, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 17:15:56,506 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=8.0
2023-10-09 17:16:22,607 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2821522.6666666665, ans=0.2
2023-10-09 17:16:39,918 INFO [train.py:1031] (3/4) Epoch 14, batch 19900, loss[loss=0.2309, simple_loss=0.2744, pruned_loss=0.07079, ctc_loss=0.1146, over 16680.00 frames. ], tot_loss[loss=0.243, simple_loss=0.2975, pruned_loss=0.0698, ctc_loss=0.1224, over 3291484.43 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 4.0
], batch size: 151, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:16:50,468 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:16:54,365 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.744e+02 3.692e+02 4.204e+02 4.980e+02 8.655e+02, threshold=8.408e+02, percent-clipped=2.0 2023-10-09 17:17:07,622 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2821709.3333333335, ans=0.0 2023-10-09 17:17:10,809 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2821709.3333333335, ans=0.0 2023-10-09 17:17:30,105 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2821802.6666666665, ans=0.125 2023-10-09 17:17:41,864 INFO [train.py:1031] (3/4) Epoch 14, batch 19950, loss[loss=0.2158, simple_loss=0.2771, pruned_loss=0.05742, ctc_loss=0.09921, over 17041.00 frames. ], tot_loss[loss=0.2421, simple_loss=0.295, pruned_loss=0.07006, ctc_loss=0.1224, over 3283429.91 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:17:54,079 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2821896.0, ans=10.0 2023-10-09 17:18:42,975 INFO [train.py:1031] (3/4) Epoch 14, batch 20000, loss[loss=0.213, simple_loss=0.2654, pruned_loss=0.05996, ctc_loss=0.1017, over 16821.00 frames. ], tot_loss[loss=0.2449, simple_loss=0.2968, pruned_loss=0.07153, ctc_loss=0.1248, over 3289419.10 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:18:57,613 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2822129.3333333335, ans=0.125 2023-10-09 17:18:58,233 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.512e+02 3.390e+02 3.727e+02 4.517e+02 8.491e+02, threshold=7.454e+02, percent-clipped=1.0 2023-10-09 17:19:18,242 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2822176.0, ans=0.5 2023-10-09 17:19:28,466 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=22.5 2023-10-09 17:19:36,464 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2822269.3333333335, ans=0.125 2023-10-09 17:19:46,369 INFO [train.py:1031] (3/4) Epoch 14, batch 20050, loss[loss=0.217, simple_loss=0.2919, pruned_loss=0.05349, ctc_loss=0.08775, over 16295.00 frames. ], tot_loss[loss=0.2385, simple_loss=0.2902, pruned_loss=0.0693, ctc_loss=0.1203, over 3290596.76 frames. ], batch size: 466, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:20:43,664 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2822502.6666666665, ans=0.07 2023-10-09 17:20:44,825 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2822502.6666666665, ans=0.1 2023-10-09 17:20:50,089 INFO [train.py:1031] (3/4) Epoch 14, batch 20100, loss[loss=0.2253, simple_loss=0.2753, pruned_loss=0.06493, ctc_loss=0.1136, over 16799.00 frames. 
], tot_loss[loss=0.2328, simple_loss=0.2844, pruned_loss=0.06732, ctc_loss=0.1166, over 3291561.86 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:20:50,511 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2822549.3333333335, ans=0.125 2023-10-09 17:20:50,831 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2023-10-09 17:21:07,903 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+02 3.352e+02 3.979e+02 4.568e+02 7.750e+02, threshold=7.958e+02, percent-clipped=1.0 2023-10-09 17:21:25,582 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2822642.6666666665, ans=0.125 2023-10-09 17:21:54,781 INFO [train.py:1031] (3/4) Epoch 14, batch 20150, loss[loss=0.3498, simple_loss=0.408, pruned_loss=0.1066, ctc_loss=0.1957, over 16671.00 frames. ], tot_loss[loss=0.2347, simple_loss=0.2892, pruned_loss=0.06682, ctc_loss=0.1167, over 3295126.65 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:22:09,463 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2822829.3333333335, ans=0.125 2023-10-09 17:22:13,704 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.14 vs. limit=5.0 2023-10-09 17:22:35,922 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2822922.6666666665, ans=0.1 2023-10-09 17:22:55,210 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2823016.0, ans=0.125 2023-10-09 17:22:55,239 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2823016.0, ans=0.125 2023-10-09 17:22:55,975 INFO [train.py:1031] (3/4) Epoch 14, batch 20200, loss[loss=0.2441, simple_loss=0.2833, pruned_loss=0.07549, ctc_loss=0.1349, over 16441.00 frames. ], tot_loss[loss=0.2379, simple_loss=0.294, pruned_loss=0.06729, ctc_loss=0.118, over 3305055.58 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:22:57,379 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2823016.0, ans=0.0 2023-10-09 17:23:12,687 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+02 3.410e+02 4.005e+02 4.580e+02 8.040e+02, threshold=8.011e+02, percent-clipped=1.0 2023-10-09 17:23:15,833 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2823062.6666666665, ans=0.125 2023-10-09 17:23:38,514 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:23:44,328 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2823202.6666666665, ans=0.125 2023-10-09 17:23:47,708 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.24 vs. 
limit=5.0 2023-10-09 17:23:48,268 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2823202.6666666665, ans=0.0 2023-10-09 17:23:55,824 INFO [train.py:1031] (3/4) Epoch 14, batch 20250, loss[loss=0.2004, simple_loss=0.2666, pruned_loss=0.05101, ctc_loss=0.08056, over 12022.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2914, pruned_loss=0.06669, ctc_loss=0.1168, over 3295630.21 frames. ], batch size: 35, lr: 2.55e-03, grad_scale: 1.0 2023-10-09 17:23:56,096 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2823249.3333333335, ans=0.125 2023-10-09 17:24:06,302 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2823249.3333333335, ans=0.125 2023-10-09 17:24:08,416 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2823296.0, ans=0.0 2023-10-09 17:24:48,019 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2823436.0, ans=0.0 2023-10-09 17:24:56,618 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=12.0 2023-10-09 17:24:58,239 INFO [train.py:1031] (3/4) Epoch 14, batch 20300, loss[loss=0.1797, simple_loss=0.2397, pruned_loss=0.04466, ctc_loss=0.07606, over 16836.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2871, pruned_loss=0.06318, ctc_loss=0.1114, over 3296606.29 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:25:13,558 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2823529.3333333335, ans=0.125 2023-10-09 17:25:18,637 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.404e+02 3.144e+02 3.729e+02 4.448e+02 8.440e+02, threshold=7.458e+02, percent-clipped=1.0 2023-10-09 17:25:18,938 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2823529.3333333335, ans=0.2 2023-10-09 17:25:28,654 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2823576.0, ans=0.125 2023-10-09 17:25:58,926 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2823669.3333333335, ans=0.0 2023-10-09 17:26:00,759 INFO [train.py:1031] (3/4) Epoch 14, batch 20350, loss[loss=0.2144, simple_loss=0.2532, pruned_loss=0.06537, ctc_loss=0.1122, over 16702.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2795, pruned_loss=0.0623, ctc_loss=0.1097, over 3299082.82 frames. 
], batch size: 140, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:26:12,702 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:26:14,719 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2823762.6666666665, ans=0.0 2023-10-09 17:26:28,404 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2823809.3333333335, ans=0.2 2023-10-09 17:26:29,349 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2823809.3333333335, ans=0.125 2023-10-09 17:27:02,820 INFO [train.py:1031] (3/4) Epoch 14, batch 20400, loss[loss=0.1922, simple_loss=0.2455, pruned_loss=0.05239, ctc_loss=0.0852, over 16832.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2785, pruned_loss=0.06215, ctc_loss=0.108, over 3303171.02 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:27:15,206 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2823996.0, ans=0.0 2023-10-09 17:27:23,293 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.551e+02 3.333e+02 4.109e+02 4.919e+02 1.143e+03, threshold=8.217e+02, percent-clipped=3.0 2023-10-09 17:28:03,041 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2824136.0, ans=0.0 2023-10-09 17:28:03,317 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=12.0 2023-10-09 17:28:05,999 INFO [train.py:1031] (3/4) Epoch 14, batch 20450, loss[loss=0.2047, simple_loss=0.2713, pruned_loss=0.05135, ctc_loss=0.08868, over 16770.00 frames. ], tot_loss[loss=0.2197, simple_loss=0.2756, pruned_loss=0.06097, ctc_loss=0.1047, over 3306750.04 frames. ], batch size: 272, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:28:08,106 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2824182.6666666665, ans=0.125 2023-10-09 17:28:08,109 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2824182.6666666665, ans=0.1 2023-10-09 17:28:11,716 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2824182.6666666665, ans=0.125 2023-10-09 17:28:47,907 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2824322.6666666665, ans=0.0 2023-10-09 17:29:01,513 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2824369.3333333335, ans=0.125 2023-10-09 17:29:11,360 INFO [train.py:1031] (3/4) Epoch 14, batch 20500, loss[loss=0.2394, simple_loss=0.3275, pruned_loss=0.05639, ctc_loss=0.09633, over 16923.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2746, pruned_loss=0.05884, ctc_loss=0.1012, over 3304607.39 frames. 
], batch size: 258, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:29:28,261 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2824462.6666666665, ans=0.125 2023-10-09 17:29:29,539 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-10-09 17:29:32,928 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+02 3.206e+02 4.100e+02 5.464e+02 8.452e+02, threshold=8.200e+02, percent-clipped=1.0 2023-10-09 17:29:47,375 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2824509.3333333335, ans=0.125 2023-10-09 17:29:52,808 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2824556.0, ans=0.0 2023-10-09 17:29:57,720 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2824556.0, ans=0.0 2023-10-09 17:30:15,066 INFO [train.py:1031] (3/4) Epoch 14, batch 20550, loss[loss=0.3467, simple_loss=0.3931, pruned_loss=0.1112, ctc_loss=0.1947, over 16694.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2865, pruned_loss=0.06008, ctc_loss=0.1047, over 3303396.70 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:30:28,491 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2824696.0, ans=0.0 2023-10-09 17:30:55,341 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2824789.3333333335, ans=0.0 2023-10-09 17:30:56,338 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2824789.3333333335, ans=0.2 2023-10-09 17:30:59,654 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2824789.3333333335, ans=0.125 2023-10-09 17:31:13,352 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2023-10-09 17:31:17,513 INFO [train.py:1031] (3/4) Epoch 14, batch 20600, loss[loss=0.2529, simple_loss=0.315, pruned_loss=0.0709, ctc_loss=0.1224, over 16840.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2941, pruned_loss=0.06171, ctc_loss=0.1083, over 3298280.35 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:31:21,775 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2824882.6666666665, ans=0.125 2023-10-09 17:31:30,925 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2824929.3333333335, ans=0.025 2023-10-09 17:31:31,326 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.35 vs. 
limit=22.5 2023-10-09 17:31:35,795 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2824929.3333333335, ans=0.04949747468305833 2023-10-09 17:31:40,867 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.711e+02 4.411e+02 5.380e+02 7.131e+02, threshold=8.823e+02, percent-clipped=0.0 2023-10-09 17:31:42,370 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2824976.0, ans=0.0 2023-10-09 17:31:43,479 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2824976.0, ans=0.125 2023-10-09 17:31:59,150 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2825022.6666666665, ans=0.2 2023-10-09 17:32:01,300 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2825022.6666666665, ans=0.2 2023-10-09 17:32:04,091 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2825022.6666666665, ans=0.015 2023-10-09 17:32:15,675 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:32:20,107 INFO [train.py:1031] (3/4) Epoch 14, batch 20650, loss[loss=0.2662, simple_loss=0.3032, pruned_loss=0.08393, ctc_loss=0.1532, over 15244.00 frames. ], tot_loss[loss=0.2371, simple_loss=0.2983, pruned_loss=0.06507, ctc_loss=0.1145, over 3299350.19 frames. ], batch size: 527, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:32:22,169 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2825116.0, ans=0.1 2023-10-09 17:32:26,334 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=15.0 2023-10-09 17:32:29,307 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2825116.0, ans=0.0 2023-10-09 17:32:39,976 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2825162.6666666665, ans=0.125 2023-10-09 17:32:42,205 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2825162.6666666665, ans=0.1 2023-10-09 17:32:49,505 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2825209.3333333335, ans=0.0 2023-10-09 17:32:49,863 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.99 vs. limit=6.0 2023-10-09 17:32:58,094 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2825256.0, ans=0.0 2023-10-09 17:33:10,523 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2825302.6666666665, ans=0.125 2023-10-09 17:33:19,501 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2825302.6666666665, ans=0.125 2023-10-09 17:33:21,894 INFO [train.py:1031] (3/4) Epoch 14, batch 20700, loss[loss=0.2824, simple_loss=0.3045, pruned_loss=0.09615, ctc_loss=0.1701, over 16661.00 frames. 
], tot_loss[loss=0.2388, simple_loss=0.2967, pruned_loss=0.0669, ctc_loss=0.1176, over 3300320.51 frames. ], batch size: 351, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 17:33:32,126 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=12.0 2023-10-09 17:33:45,255 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+02 3.306e+02 3.691e+02 4.281e+02 9.878e+02, threshold=7.382e+02, percent-clipped=2.0 2023-10-09 17:33:54,480 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2023-10-09 17:33:58,014 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2825489.3333333335, ans=0.125 2023-10-09 17:33:58,041 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2825489.3333333335, ans=0.0 2023-10-09 17:34:01,090 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2825489.3333333335, ans=0.2 2023-10-09 17:34:01,577 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=22.5 2023-10-09 17:34:08,721 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2825489.3333333335, ans=0.125 2023-10-09 17:34:21,648 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2825582.6666666665, ans=0.1 2023-10-09 17:34:22,949 INFO [train.py:1031] (3/4) Epoch 14, batch 20750, loss[loss=0.2271, simple_loss=0.2928, pruned_loss=0.05905, ctc_loss=0.1081, over 16923.00 frames. ], tot_loss[loss=0.2398, simple_loss=0.2955, pruned_loss=0.06808, ctc_loss=0.1195, over 3307175.66 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:34:35,460 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2023-10-09 17:34:43,393 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.85 vs. limit=15.0 2023-10-09 17:34:55,486 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2825676.0, ans=0.0 2023-10-09 17:35:18,073 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2825769.3333333335, ans=0.1 2023-10-09 17:35:23,262 INFO [train.py:1031] (3/4) Epoch 14, batch 20800, loss[loss=0.1921, simple_loss=0.2439, pruned_loss=0.05169, ctc_loss=0.09214, over 16626.00 frames. ], tot_loss[loss=0.238, simple_loss=0.2946, pruned_loss=0.06699, ctc_loss=0.1187, over 3314752.22 frames. 
], batch size: 111, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:35:46,255 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.527e+02 3.235e+02 3.640e+02 4.210e+02 8.474e+02, threshold=7.280e+02, percent-clipped=1.0 2023-10-09 17:36:00,973 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2825956.0, ans=0.1 2023-10-09 17:36:08,009 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2825956.0, ans=0.0 2023-10-09 17:36:17,628 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2826002.6666666665, ans=0.0 2023-10-09 17:36:19,868 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2826002.6666666665, ans=0.2 2023-10-09 17:36:22,180 INFO [train.py:1031] (3/4) Epoch 14, batch 20850, loss[loss=0.1876, simple_loss=0.2565, pruned_loss=0.04294, ctc_loss=0.08226, over 16724.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2894, pruned_loss=0.06314, ctc_loss=0.1128, over 3315797.48 frames. ], batch size: 140, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:36:29,381 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2826049.3333333335, ans=0.125 2023-10-09 17:36:29,858 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.05 vs. limit=15.0 2023-10-09 17:36:33,773 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2826096.0, ans=0.1 2023-10-09 17:36:38,569 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2826096.0, ans=0.025 2023-10-09 17:37:11,271 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2826236.0, ans=0.1 2023-10-09 17:37:17,090 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2826236.0, ans=0.1 2023-10-09 17:37:22,227 INFO [train.py:1031] (3/4) Epoch 14, batch 20900, loss[loss=0.1993, simple_loss=0.26, pruned_loss=0.05136, ctc_loss=0.08989, over 16825.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2859, pruned_loss=0.06031, ctc_loss=0.1083, over 3320355.85 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:37:24,304 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2826282.6666666665, ans=0.1 2023-10-09 17:37:27,485 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2826282.6666666665, ans=0.1 2023-10-09 17:37:27,525 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2826282.6666666665, ans=0.125 2023-10-09 17:37:48,716 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.791e+02 3.163e+02 3.693e+02 7.251e+02, threshold=6.327e+02, percent-clipped=0.0 2023-10-09 17:38:22,287 INFO [train.py:1031] (3/4) Epoch 14, batch 20950, loss[loss=0.1749, simple_loss=0.2283, pruned_loss=0.04435, ctc_loss=0.08201, over 16787.00 frames. 
], tot_loss[loss=0.2202, simple_loss=0.2781, pruned_loss=0.05977, ctc_loss=0.1067, over 3311890.62 frames. ], batch size: 141, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:38:33,465 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2826562.6666666665, ans=0.125 2023-10-09 17:38:40,560 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2826562.6666666665, ans=0.2 2023-10-09 17:38:48,757 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2826609.3333333335, ans=0.125 2023-10-09 17:38:53,032 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2826609.3333333335, ans=0.125 2023-10-09 17:39:09,401 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2023-10-09 17:39:15,536 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2826702.6666666665, ans=0.1 2023-10-09 17:39:16,766 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2826702.6666666665, ans=0.0 2023-10-09 17:39:17,863 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2826702.6666666665, ans=0.0 2023-10-09 17:39:23,274 INFO [train.py:1031] (3/4) Epoch 14, batch 21000, loss[loss=0.2626, simple_loss=0.314, pruned_loss=0.07948, ctc_loss=0.1306, over 16813.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2798, pruned_loss=0.06227, ctc_loss=0.1102, over 3305726.51 frames. ], batch size: 121, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:39:23,274 INFO [train.py:1054] (3/4) Computing validation loss 2023-10-09 17:39:41,356 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2348, simple_loss=0.3049, pruned_loss=0.06333, ctc_loss=0.09533, over 1796401.00 frames. 2023-10-09 17:39:41,357 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB 2023-10-09 17:39:50,146 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2826749.3333333335, ans=0.0 2023-10-09 17:39:57,055 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2826796.0, ans=0.125 2023-10-09 17:40:03,344 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.16 vs. 
limit=10.0 2023-10-09 17:40:07,007 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.638e+02 3.238e+02 3.624e+02 4.210e+02 7.239e+02, threshold=7.249e+02, percent-clipped=3.0 2023-10-09 17:40:17,129 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2826889.3333333335, ans=0.2 2023-10-09 17:40:27,127 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2826936.0, ans=0.0 2023-10-09 17:40:29,632 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2826936.0, ans=0.125 2023-10-09 17:40:30,249 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2023-10-09 17:40:36,238 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2826936.0, ans=0.125 2023-10-09 17:40:39,035 INFO [train.py:1031] (3/4) Epoch 14, batch 21050, loss[loss=0.2558, simple_loss=0.3199, pruned_loss=0.07031, ctc_loss=0.1278, over 16830.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2838, pruned_loss=0.06184, ctc_loss=0.1087, over 3294678.74 frames. ], batch size: 329, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:40:40,194 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2826982.6666666665, ans=0.015 2023-10-09 17:40:42,231 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2826982.6666666665, ans=0.125 2023-10-09 17:40:47,989 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2826982.6666666665, ans=0.0 2023-10-09 17:41:21,564 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2827122.6666666665, ans=0.0 2023-10-09 17:41:21,596 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2827122.6666666665, ans=0.1 2023-10-09 17:41:26,175 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2827169.3333333335, ans=0.125 2023-10-09 17:41:33,613 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2827169.3333333335, ans=0.1 2023-10-09 17:41:36,263 INFO [train.py:1031] (3/4) Epoch 14, batch 21100, loss[loss=0.2266, simple_loss=0.2787, pruned_loss=0.06582, ctc_loss=0.107, over 16887.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.284, pruned_loss=0.06104, ctc_loss=0.1063, over 3292802.58 frames. 
], batch size: 121, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:41:55,434 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2827262.6666666665, ans=0.125 2023-10-09 17:42:00,288 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2827309.3333333335, ans=0.0 2023-10-09 17:42:05,336 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.720e+02 3.068e+02 3.590e+02 8.081e+02, threshold=6.137e+02, percent-clipped=1.0 2023-10-09 17:42:12,512 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2827356.0, ans=0.2 2023-10-09 17:42:15,212 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2827356.0, ans=0.5 2023-10-09 17:42:37,593 INFO [train.py:1031] (3/4) Epoch 14, batch 21150, loss[loss=0.2162, simple_loss=0.2632, pruned_loss=0.06411, ctc_loss=0.1024, over 16917.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2803, pruned_loss=0.06147, ctc_loss=0.1067, over 3290969.20 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:42:40,807 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.72 vs. limit=12.0 2023-10-09 17:42:46,886 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-10-09 17:42:50,858 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2023-10-09 17:42:57,985 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2827496.0, ans=0.95 2023-10-09 17:42:58,035 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2827496.0, ans=0.125 2023-10-09 17:43:00,046 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2827542.6666666665, ans=0.125 2023-10-09 17:43:17,053 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2827589.3333333335, ans=0.125 2023-10-09 17:43:19,153 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2827589.3333333335, ans=0.125 2023-10-09 17:43:21,441 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2827589.3333333335, ans=0.125 2023-10-09 17:43:29,820 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2827636.0, ans=0.125 2023-10-09 17:43:36,527 INFO [train.py:1031] (3/4) Epoch 14, batch 21200, loss[loss=0.184, simple_loss=0.261, pruned_loss=0.03836, ctc_loss=0.07569, over 16765.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.276, pruned_loss=0.06179, ctc_loss=0.1071, over 3282543.44 frames. ], batch size: 272, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:43:39,863 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.25 vs. 
limit=15.0 2023-10-09 17:43:52,308 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2827729.3333333335, ans=0.125 2023-10-09 17:44:02,649 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0 2023-10-09 17:44:07,100 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.239e+02 3.845e+02 5.038e+02 8.843e+02, threshold=7.690e+02, percent-clipped=9.0 2023-10-09 17:44:11,317 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2827776.0, ans=0.125 2023-10-09 17:44:24,038 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2827822.6666666665, ans=0.1 2023-10-09 17:44:39,124 INFO [train.py:1031] (3/4) Epoch 14, batch 21250, loss[loss=0.2865, simple_loss=0.3513, pruned_loss=0.08217, ctc_loss=0.1433, over 16828.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2772, pruned_loss=0.0596, ctc_loss=0.1038, over 3285074.11 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:44:42,310 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2827916.0, ans=0.1 2023-10-09 17:44:42,326 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2827916.0, ans=0.125 2023-10-09 17:44:48,592 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2827916.0, ans=0.125 2023-10-09 17:44:52,222 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2827962.6666666665, ans=0.2 2023-10-09 17:44:52,265 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2827962.6666666665, ans=0.125 2023-10-09 17:45:01,838 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2827962.6666666665, ans=0.125 2023-10-09 17:45:05,936 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2828009.3333333335, ans=0.0 2023-10-09 17:45:15,824 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2828009.3333333335, ans=0.0 2023-10-09 17:45:22,148 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=12.0 2023-10-09 17:45:23,039 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2023-10-09 17:45:42,991 INFO [train.py:1031] (3/4) Epoch 14, batch 21300, loss[loss=0.2181, simple_loss=0.277, pruned_loss=0.05941, ctc_loss=0.101, over 16723.00 frames. ], tot_loss[loss=0.2348, simple_loss=0.2947, pruned_loss=0.06481, ctc_loss=0.1135, over 3293518.34 frames. 
], batch size: 151, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:45:48,262 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2828149.3333333335, ans=0.0 2023-10-09 17:45:53,876 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2023-10-09 17:46:02,144 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2828196.0, ans=0.125 2023-10-09 17:46:12,688 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2023-10-09 17:46:14,300 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.457e+02 3.461e+02 4.159e+02 5.409e+02 1.290e+03, threshold=8.318e+02, percent-clipped=7.0 2023-10-09 17:46:45,012 INFO [train.py:1031] (3/4) Epoch 14, batch 21350, loss[loss=0.2255, simple_loss=0.279, pruned_loss=0.06431, ctc_loss=0.1083, over 16966.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.2944, pruned_loss=0.0635, ctc_loss=0.1118, over 3295787.48 frames. ], batch size: 243, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:46:46,055 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2828382.6666666665, ans=0.0 2023-10-09 17:46:57,439 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2828429.3333333335, ans=0.125 2023-10-09 17:47:21,432 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.92 vs. limit=15.0 2023-10-09 17:47:23,663 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2828522.6666666665, ans=0.125 2023-10-09 17:47:25,584 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2828522.6666666665, ans=0.1 2023-10-09 17:47:41,911 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2828569.3333333335, ans=0.125 2023-10-09 17:47:47,068 INFO [train.py:1031] (3/4) Epoch 14, batch 21400, loss[loss=0.2351, simple_loss=0.2736, pruned_loss=0.07293, ctc_loss=0.1269, over 16792.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2901, pruned_loss=0.06377, ctc_loss=0.1124, over 3302529.25 frames. 
], batch size: 176, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 17:47:47,404 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2828616.0, ans=0.1 2023-10-09 17:47:48,396 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2828616.0, ans=0.2 2023-10-09 17:48:11,763 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2828709.3333333335, ans=0.125 2023-10-09 17:48:19,463 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+02 3.090e+02 3.533e+02 3.983e+02 1.095e+03, threshold=7.067e+02, percent-clipped=1.0 2023-10-09 17:48:25,439 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2828756.0, ans=0.125 2023-10-09 17:48:27,507 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2828756.0, ans=0.125 2023-10-09 17:48:41,446 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2828802.6666666665, ans=0.2 2023-10-09 17:48:48,679 INFO [train.py:1031] (3/4) Epoch 14, batch 21450, loss[loss=0.2222, simple_loss=0.2648, pruned_loss=0.06817, ctc_loss=0.1081, over 16819.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.284, pruned_loss=0.06373, ctc_loss=0.1119, over 3311032.56 frames. ], batch size: 121, lr: 2.55e-03, grad_scale: 1.0 2023-10-09 17:48:59,825 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2023-10-09 17:49:09,050 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2828896.0, ans=0.0 2023-10-09 17:49:11,414 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2828942.6666666665, ans=0.2 2023-10-09 17:49:16,444 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2828942.6666666665, ans=0.0 2023-10-09 17:49:27,512 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2828989.3333333335, ans=0.125 2023-10-09 17:49:49,288 INFO [train.py:1031] (3/4) Epoch 14, batch 21500, loss[loss=0.1989, simple_loss=0.2546, pruned_loss=0.05333, ctc_loss=0.09114, over 16835.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2788, pruned_loss=0.0636, ctc_loss=0.111, over 3298324.93 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:49:52,211 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2829082.6666666665, ans=0.125 2023-10-09 17:50:22,939 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+02 3.091e+02 3.543e+02 4.001e+02 7.738e+02, threshold=7.086e+02, percent-clipped=2.0 2023-10-09 17:50:49,221 INFO [train.py:1031] (3/4) Epoch 14, batch 21550, loss[loss=0.1971, simple_loss=0.2555, pruned_loss=0.05197, ctc_loss=0.08674, over 16837.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.2747, pruned_loss=0.06261, ctc_loss=0.1092, over 3309130.49 frames. 
], batch size: 95, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:51:10,170 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2829362.6666666665, ans=0.0 2023-10-09 17:51:31,402 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:51:45,616 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2829502.6666666665, ans=0.0 2023-10-09 17:51:52,385 INFO [train.py:1031] (3/4) Epoch 14, batch 21600, loss[loss=0.2978, simple_loss=0.3353, pruned_loss=0.09494, ctc_loss=0.1759, over 16556.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2777, pruned_loss=0.06252, ctc_loss=0.1091, over 3302793.25 frames. ], batch size: 350, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:51:54,743 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2829549.3333333335, ans=0.1 2023-10-09 17:51:58,531 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2829549.3333333335, ans=0.0 2023-10-09 17:52:00,151 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2829549.3333333335, ans=0.035 2023-10-09 17:52:00,256 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2829549.3333333335, ans=0.125 2023-10-09 17:52:09,366 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0 2023-10-09 17:52:10,898 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2829596.0, ans=0.1 2023-10-09 17:52:29,889 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+02 3.318e+02 3.916e+02 4.621e+02 6.071e+02, threshold=7.833e+02, percent-clipped=0.0 2023-10-09 17:52:31,119 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2829689.3333333335, ans=0.0 2023-10-09 17:52:55,782 INFO [train.py:1031] (3/4) Epoch 14, batch 21650, loss[loss=0.2551, simple_loss=0.3027, pruned_loss=0.07461, ctc_loss=0.1457, over 15183.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.2848, pruned_loss=0.0661, ctc_loss=0.1153, over 3307488.01 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:53:07,309 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2023-10-09 17:53:31,587 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2829876.0, ans=6.0 2023-10-09 17:53:33,070 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2829876.0, ans=0.1 2023-10-09 17:53:49,545 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-10-09 17:53:59,381 INFO [train.py:1031] (3/4) Epoch 14, batch 21700, loss[loss=0.2556, simple_loss=0.3182, pruned_loss=0.07186, ctc_loss=0.123, over 16906.00 frames. 
], tot_loss[loss=0.2383, simple_loss=0.2904, pruned_loss=0.06906, ctc_loss=0.1203, over 3306344.63 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:54:12,206 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2830062.6666666665, ans=0.125 2023-10-09 17:54:14,208 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2830062.6666666665, ans=0.125 2023-10-09 17:54:19,799 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2830062.6666666665, ans=0.1 2023-10-09 17:54:34,091 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2830109.3333333335, ans=0.0 2023-10-09 17:54:34,705 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.610e+02 3.454e+02 3.938e+02 4.640e+02 9.291e+02, threshold=7.877e+02, percent-clipped=1.0 2023-10-09 17:54:38,120 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2830156.0, ans=0.125 2023-10-09 17:54:58,955 INFO [train.py:1031] (3/4) Epoch 14, batch 21750, loss[loss=0.2205, simple_loss=0.2885, pruned_loss=0.05741, ctc_loss=0.09405, over 16834.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2936, pruned_loss=0.06817, ctc_loss=0.1186, over 3304734.44 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:55:05,551 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2830249.3333333335, ans=0.2 2023-10-09 17:55:26,598 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2023-10-09 17:55:42,510 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2830389.3333333335, ans=0.1 2023-10-09 17:55:42,517 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=2830389.3333333335, ans=0.05 2023-10-09 17:55:51,600 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2830436.0, ans=0.125 2023-10-09 17:55:59,317 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2023-10-09 17:56:00,620 INFO [train.py:1031] (3/4) Epoch 14, batch 21800, loss[loss=0.1862, simple_loss=0.2779, pruned_loss=0.03368, ctc_loss=0.06766, over 15184.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2895, pruned_loss=0.06461, ctc_loss=0.1127, over 3303890.96 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 17:56:17,206 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2830529.3333333335, ans=0.0 2023-10-09 17:56:28,105 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.22 vs. 
limit=6.0 2023-10-09 17:56:37,334 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.581e+02 3.060e+02 4.394e+02 8.007e+02, threshold=6.120e+02, percent-clipped=1.0 2023-10-09 17:56:38,112 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.89 vs. limit=15.0 2023-10-09 17:56:41,851 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2830622.6666666665, ans=0.0 2023-10-09 17:56:50,063 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2830669.3333333335, ans=0.125 2023-10-09 17:57:03,732 INFO [train.py:1031] (3/4) Epoch 14, batch 21850, loss[loss=0.2006, simple_loss=0.2754, pruned_loss=0.04577, ctc_loss=0.08567, over 16826.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2853, pruned_loss=0.06012, ctc_loss=0.1053, over 3289295.85 frames. ], batch size: 164, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 17:57:05,404 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=22.5 2023-10-09 17:57:08,724 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=2830716.0, ans=0.1 2023-10-09 17:57:36,420 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2830809.3333333335, ans=0.125 2023-10-09 17:57:43,563 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.47 vs. limit=10.0 2023-10-09 17:57:54,764 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2830902.6666666665, ans=0.2 2023-10-09 17:58:06,349 INFO [train.py:1031] (3/4) Epoch 14, batch 21900, loss[loss=0.2408, simple_loss=0.3022, pruned_loss=0.0663, ctc_loss=0.1169, over 16214.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2871, pruned_loss=0.06076, ctc_loss=0.1061, over 3288897.10 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:58:08,389 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2830949.3333333335, ans=0.0 2023-10-09 17:58:10,014 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2830949.3333333335, ans=0.0 2023-10-09 17:58:21,285 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2830996.0, ans=0.0 2023-10-09 17:58:37,016 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2831042.6666666665, ans=0.1 2023-10-09 17:58:44,601 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:58:45,820 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2023-10-09 17:58:46,017 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.09 vs. 
limit=22.5 2023-10-09 17:58:47,465 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+02 3.180e+02 3.668e+02 4.478e+02 7.065e+02, threshold=7.335e+02, percent-clipped=3.0 2023-10-09 17:58:53,829 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0 2023-10-09 17:58:54,450 WARNING [train.py:1204] (3/4) Exclude cut with ID X0000003684_17524832_S00712_sp1.1 from training. Number of frames (before subsampling): 130. Number of frames (after subsampling): 31. Text: 哒哒哒哒哒哒哒哒哒哒哒哒. Tokens: ['▁', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>']. Number of tokens: 37 2023-10-09 17:58:56,515 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2831089.3333333335, ans=0.125 2023-10-09 17:58:59,692 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2831136.0, ans=0.125 2023-10-09 17:59:10,948 INFO [train.py:1031] (3/4) Epoch 14, batch 21950, loss[loss=0.323, simple_loss=0.4038, pruned_loss=0.08818, ctc_loss=0.1645, over 16298.00 frames. ], tot_loss[loss=0.2393, simple_loss=0.2984, pruned_loss=0.06675, ctc_loss=0.1164, over 3284374.96 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:59:19,167 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=22.5 2023-10-09 17:59:36,196 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2831276.0, ans=0.125 2023-10-09 18:00:01,372 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2831369.3333333335, ans=0.1 2023-10-09 18:00:05,643 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2831369.3333333335, ans=0.015 2023-10-09 18:00:14,565 INFO [train.py:1031] (3/4) Epoch 14, batch 22000, loss[loss=0.3172, simple_loss=0.3678, pruned_loss=0.09763, ctc_loss=0.1782, over 15143.00 frames. ], tot_loss[loss=0.2523, simple_loss=0.3101, pruned_loss=0.07212, ctc_loss=0.1258, over 3280392.26 frames. ], batch size: 527, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:00:19,491 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. 
2023-10-09 18:00:26,977 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2831462.6666666665, ans=0.125
2023-10-09 18:00:55,420 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.792e+02 3.995e+02 5.154e+02 7.072e+02 9.807e+02, threshold=1.031e+03, percent-clipped=19.0
2023-10-09 18:01:04,423 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2831602.6666666665, ans=0.0
2023-10-09 18:01:05,408 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2831602.6666666665, ans=0.125
2023-10-09 18:01:15,488 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.13 vs. limit=15.0
2023-10-09 18:01:17,526 INFO [train.py:1031] (3/4) Epoch 14, batch 22050, loss[loss=0.2219, simple_loss=0.2771, pruned_loss=0.06391, ctc_loss=0.09735, over 16877.00 frames. ], tot_loss[loss=0.2465, simple_loss=0.3013, pruned_loss=0.07104, ctc_loss=0.124, over 3285135.46 frames. ], batch size: 78, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:01:28,488 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2831649.3333333335, ans=0.0
2023-10-09 18:01:38,153 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2831696.0, ans=0.0
2023-10-09 18:01:44,527 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2831742.6666666665, ans=0.125
2023-10-09 18:01:46,632 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2831742.6666666665, ans=0.125
2023-10-09 18:01:59,710 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2831789.3333333335, ans=0.125
2023-10-09 18:02:14,759 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2831836.0, ans=0.125
2023-10-09 18:02:22,388 INFO [train.py:1031] (3/4) Epoch 14, batch 22100, loss[loss=0.3193, simple_loss=0.3672, pruned_loss=0.1019, ctc_loss=0.1688, over 16593.00 frames. ], tot_loss[loss=0.2448, simple_loss=0.2997, pruned_loss=0.07051, ctc_loss=0.1221, over 3278085.82 frames. ], batch size: 351, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:02:34,375 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2831929.3333333335, ans=0.0
2023-10-09 18:02:47,147 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2831976.0, ans=0.125
2023-10-09 18:02:51,838 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=22.5
2023-10-09 18:02:55,769 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=22.5
2023-10-09 18:02:59,980 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2832022.6666666665, ans=0.125
2023-10-09 18:03:04,528 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+02 3.384e+02 3.750e+02 4.334e+02 8.202e+02, threshold=7.499e+02, percent-clipped=0.0
2023-10-09 18:03:11,873 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2832069.3333333335, ans=0.2
2023-10-09 18:03:22,963 INFO [train.py:1031] (3/4) Epoch 14, batch 22150, loss[loss=0.2268, simple_loss=0.2877, pruned_loss=0.06126, ctc_loss=0.1085, over 16859.00 frames. ], tot_loss[loss=0.2464, simple_loss=0.3014, pruned_loss=0.07115, ctc_loss=0.1227, over 3277526.80 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:03:32,824 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=22.5
2023-10-09 18:03:46,482 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2832162.6666666665, ans=0.125
2023-10-09 18:03:52,738 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2832209.3333333335, ans=0.0
2023-10-09 18:03:55,190 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2832209.3333333335, ans=0.0
2023-10-09 18:03:59,179 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2832209.3333333335, ans=0.2
2023-10-09 18:04:04,426 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2832256.0, ans=0.1
2023-10-09 18:04:04,598 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0
2023-10-09 18:04:25,076 INFO [train.py:1031] (3/4) Epoch 14, batch 22200, loss[loss=0.2122, simple_loss=0.2912, pruned_loss=0.04875, ctc_loss=0.0892, over 16908.00 frames. ], tot_loss[loss=0.2454, simple_loss=0.3008, pruned_loss=0.0706, ctc_loss=0.1221, over 3281147.38 frames. ], batch size: 215, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:04:25,441 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2832349.3333333335, ans=0.0
2023-10-09 18:04:26,600 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2832349.3333333335, ans=0.0
2023-10-09 18:04:48,346 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0
2023-10-09 18:05:06,086 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.358e+02 3.127e+02 3.515e+02 4.166e+02 8.841e+02, threshold=7.030e+02, percent-clipped=1.0
2023-10-09 18:05:13,154 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=12.0
2023-10-09 18:05:24,120 INFO [train.py:1031] (3/4) Epoch 14, batch 22250, loss[loss=0.2363, simple_loss=0.314, pruned_loss=0.05868, ctc_loss=0.1028, over 16756.00 frames. ], tot_loss[loss=0.2423, simple_loss=0.3, pruned_loss=0.06838, ctc_loss=0.1194, over 3284965.48 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 2.0
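The [optim.py:471] records summarize the recent distribution of gradient norms: five quartile values (min, 25%, median, 75%, max), a clipping threshold, and the percentage of batches whose norm exceeded it. In every record above the threshold equals Clipping_scale times the median (e.g. 2.0 * 3.515e+02 = 7.030e+02 at 18:05:06). A sketch consistent with those numbers follows; it is an illustrative reading of the logged statistics, not necessarily how optim.py computes them.

    import torch

    def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # grad_norms: float tensor of recent per-batch gradient norms.
        # Quartile summary in the same order the log prints it.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # e.g. 2.0 * 3.515e+02 = 7.030e+02
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped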
2023-10-09 18:05:27,932 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.43 vs. limit=6.0
2023-10-09 18:05:47,232 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2832676.0, ans=0.0
2023-10-09 18:05:51,826 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2832676.0, ans=0.125
2023-10-09 18:05:57,790 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2832676.0, ans=0.125
2023-10-09 18:06:25,776 INFO [train.py:1031] (3/4) Epoch 14, batch 22300, loss[loss=0.2323, simple_loss=0.2948, pruned_loss=0.06274, ctc_loss=0.111, over 16655.00 frames. ], tot_loss[loss=0.2459, simple_loss=0.3015, pruned_loss=0.07049, ctc_loss=0.1234, over 3283383.46 frames. ], batch size: 110, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:06:33,407 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0
2023-10-09 18:06:37,877 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2832862.6666666665, ans=0.0
2023-10-09 18:06:41,703 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2832862.6666666665, ans=0.1
2023-10-09 18:06:43,791 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2832862.6666666665, ans=0.125
2023-10-09 18:06:43,794 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2832862.6666666665, ans=0.1
2023-10-09 18:06:52,707 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0
2023-10-09 18:06:55,544 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-10-09 18:07:07,032 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2832956.0, ans=0.125
2023-10-09 18:07:07,706 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.458e+02 3.881e+02 4.380e+02 7.162e+02, threshold=7.762e+02, percent-clipped=2.0
2023-10-09 18:07:25,987 INFO [train.py:1031] (3/4) Epoch 14, batch 22350, loss[loss=0.2318, simple_loss=0.2995, pruned_loss=0.05998, ctc_loss=0.1106, over 16807.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.3002, pruned_loss=0.07063, ctc_loss=0.1237, over 3277731.60 frames. ], batch size: 291, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:07:29,000 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2833049.3333333335, ans=0.0
2023-10-09 18:07:32,224 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=12.0
2023-10-09 18:07:35,323 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2833049.3333333335, ans=0.0
2023-10-09 18:07:48,232 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2833096.0, ans=0.125
2023-10-09 18:07:59,559 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2833142.6666666665, ans=0.0
2023-10-09 18:08:10,656 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2833189.3333333335, ans=0.1
2023-10-09 18:08:12,538 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.99 vs. limit=12.0
2023-10-09 18:08:15,432 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2833236.0, ans=0.0
2023-10-09 18:08:27,607 INFO [train.py:1031] (3/4) Epoch 14, batch 22400, loss[loss=0.3097, simple_loss=0.3748, pruned_loss=0.08983, ctc_loss=0.1624, over 16661.00 frames. ], tot_loss[loss=0.2442, simple_loss=0.3003, pruned_loss=0.06964, ctc_loss=0.1219, over 3288379.93 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:08:53,907 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2833376.0, ans=0.125
2023-10-09 18:08:54,912 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2833376.0, ans=0.125
2023-10-09 18:09:07,919 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2833422.6666666665, ans=0.125
2023-10-09 18:09:11,652 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.558e+02 3.421e+02 3.975e+02 5.211e+02 8.186e+02, threshold=7.949e+02, percent-clipped=2.0
2023-10-09 18:09:19,724 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2833469.3333333335, ans=0.0
2023-10-09 18:09:20,898 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2833469.3333333335, ans=0.125
2023-10-09 18:09:25,988 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2833469.3333333335, ans=0.125
2023-10-09 18:09:26,385 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0
2023-10-09 18:09:29,995 INFO [train.py:1031] (3/4) Epoch 14, batch 22450, loss[loss=0.2508, simple_loss=0.3066, pruned_loss=0.07185, ctc_loss=0.1281, over 16902.00 frames. ], tot_loss[loss=0.2446, simple_loss=0.302, pruned_loss=0.0693, ctc_loss=0.1217, over 3293615.57 frames. ], batch size: 292, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:10:13,274 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.03 vs. limit=15.0
2023-10-09 18:10:31,933 INFO [train.py:1031] (3/4) Epoch 14, batch 22500, loss[loss=0.2182, simple_loss=0.2682, pruned_loss=0.06197, ctc_loss=0.1105, over 16807.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.2992, pruned_loss=0.07018, ctc_loss=0.1228, over 3296008.44 frames. ], batch size: 215, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:11:03,381 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2833842.6666666665, ans=0.0
2023-10-09 18:11:08,196 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2833889.3333333335, ans=0.125
2023-10-09 18:11:08,224 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2833889.3333333335, ans=0.125
2023-10-09 18:11:13,388 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0
2023-10-09 18:11:17,899 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+02 3.228e+02 3.590e+02 3.967e+02 7.433e+02, threshold=7.180e+02, percent-clipped=0.0
2023-10-09 18:11:25,385 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0
2023-10-09 18:11:32,582 INFO [train.py:1031] (3/4) Epoch 14, batch 22550, loss[loss=0.1833, simple_loss=0.2311, pruned_loss=0.05023, ctc_loss=0.08757, over 16706.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2901, pruned_loss=0.06836, ctc_loss=0.1197, over 3297119.44 frames. ], batch size: 130, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:12:10,606 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2834122.6666666665, ans=0.0
2023-10-09 18:12:23,364 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0
2023-10-09 18:12:33,381 INFO [train.py:1031] (3/4) Epoch 14, batch 22600, loss[loss=0.1977, simple_loss=0.271, pruned_loss=0.04565, ctc_loss=0.08264, over 16837.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.283, pruned_loss=0.06425, ctc_loss=0.1131, over 3277155.00 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:12:47,109 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2834262.6666666665, ans=0.125
2023-10-09 18:12:58,380 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2834309.3333333335, ans=0.0
2023-10-09 18:13:03,708 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2834309.3333333335, ans=0.1
2023-10-09 18:13:12,830 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2834356.0, ans=0.125
2023-10-09 18:13:20,265 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+02 2.912e+02 3.400e+02 4.128e+02 6.956e+02, threshold=6.801e+02, percent-clipped=0.0
2023-10-09 18:13:22,945 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0
2023-10-09 18:13:24,358 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.60 vs. limit=10.0
2023-10-09 18:13:34,040 INFO [train.py:1031] (3/4) Epoch 14, batch 22650, loss[loss=0.2107, simple_loss=0.2536, pruned_loss=0.06219, ctc_loss=0.1086, over 16791.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2777, pruned_loss=0.06317, ctc_loss=0.1111, over 3290542.69 frames. ], batch size: 329, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:13:42,684 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2834449.3333333335, ans=0.125
2023-10-09 18:13:55,553 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2834496.0, ans=0.125
2023-10-09 18:13:56,629 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2834542.6666666665, ans=0.0
2023-10-09 18:14:00,408 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2834542.6666666665, ans=0.125
2023-10-09 18:14:12,514 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2834589.3333333335, ans=0.125
2023-10-09 18:14:29,848 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2834636.0, ans=0.2
2023-10-09 18:14:35,173 INFO [train.py:1031] (3/4) Epoch 14, batch 22700, loss[loss=0.2678, simple_loss=0.3106, pruned_loss=0.08452, ctc_loss=0.1397, over 16627.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.278, pruned_loss=0.06466, ctc_loss=0.1134, over 3286947.60 frames. ], batch size: 110, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:15:01,722 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2834776.0, ans=0.125
2023-10-09 18:15:24,588 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+02 3.390e+02 4.032e+02 4.588e+02 8.428e+02, threshold=8.064e+02, percent-clipped=2.0
2023-10-09 18:15:26,024 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2834869.3333333335, ans=0.125
2023-10-09 18:15:28,600 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2834869.3333333335, ans=0.1
2023-10-09 18:15:28,682 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2834869.3333333335, ans=0.0
2023-10-09 18:15:35,150 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.60 vs. limit=12.0
2023-10-09 18:15:35,816 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2834869.3333333335, ans=0.125
2023-10-09 18:15:36,886 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2834916.0, ans=0.1
2023-10-09 18:15:37,585 INFO [train.py:1031] (3/4) Epoch 14, batch 22750, loss[loss=0.2389, simple_loss=0.3033, pruned_loss=0.06516, ctc_loss=0.1104, over 16700.00 frames. ], tot_loss[loss=0.233, simple_loss=0.2838, pruned_loss=0.06754, ctc_loss=0.1178, over 3296852.25 frames. ], batch size: 111, lr: 2.55e-03, grad_scale: 2.0
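The [scaling.py:199] ScheduledFloat records report module hyperparameters (skip rates, balancer probabilities, dropout) whose value "ans" is a function of the global batch_count rather than a constant. A minimal piecewise-linear version of such a schedule is sketched below; the knot values are invented for illustration and this is not the scaling.py implementation.

    import bisect

    class PiecewiseLinearSchedule:
        """Linearly interpolate between (batch_count, value) knots,
        clamping at both ends. Illustrative stand-in for the kind of
        schedule the ScheduledFloat lines are reporting."""
        def __init__(self, *knots):
            self.xs = [x for x, _ in knots]
            self.ys = [y for _, y in knots]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count) - 1
            t = (batch_count - self.xs[i]) / (self.xs[i + 1] - self.xs[i])
            return self.ys[i] + t * (self.ys[i + 1] - self.ys[i])

    # Hypothetical skip rate decaying from 0.2 to 0.0 over 20000 batches:
    conv_skip_rate = PiecewiseLinearSchedule((0.0, 0.2), (20000.0, 0.0))
    print(conv_skip_rate(5000.0))  # 0.15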
2023-10-09 18:15:54,038 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 18:15:58,351 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2834962.6666666665, ans=0.0
2023-10-09 18:16:09,848 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2835009.3333333335, ans=0.0
2023-10-09 18:16:11,964 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2835009.3333333335, ans=0.07
2023-10-09 18:16:39,367 INFO [train.py:1031] (3/4) Epoch 14, batch 22800, loss[loss=0.257, simple_loss=0.3083, pruned_loss=0.07585, ctc_loss=0.1352, over 16727.00 frames. ], tot_loss[loss=0.2397, simple_loss=0.2895, pruned_loss=0.0705, ctc_loss=0.1223, over 3295969.68 frames. ], batch size: 272, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:16:48,246 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2835149.3333333335, ans=0.125
2023-10-09 18:17:00,220 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0
2023-10-09 18:17:02,529 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2835242.6666666665, ans=0.125
2023-10-09 18:17:22,689 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.09 vs. limit=15.0
2023-10-09 18:17:25,892 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2835289.3333333335, ans=0.04949747468305833
2023-10-09 18:17:28,727 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.550e+02 3.223e+02 3.755e+02 4.885e+02 7.657e+02, threshold=7.509e+02, percent-clipped=0.0
2023-10-09 18:17:39,519 INFO [train.py:1031] (3/4) Epoch 14, batch 22850, loss[loss=0.2188, simple_loss=0.2881, pruned_loss=0.05472, ctc_loss=0.1003, over 16902.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2918, pruned_loss=0.06875, ctc_loss=0.1198, over 3295631.25 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:17:42,565 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2835382.6666666665, ans=0.125
2023-10-09 18:17:52,672 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2835429.3333333335, ans=0.0
2023-10-09 18:17:55,970 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2835429.3333333335, ans=0.0
2023-10-09 18:17:56,043 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2835429.3333333335, ans=0.125
2023-10-09 18:18:01,875 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2835429.3333333335, ans=0.0
2023-10-09 18:18:02,985 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2835476.0, ans=0.0
2023-10-09 18:18:05,126 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2835476.0, ans=0.125
2023-10-09 18:18:26,932 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 18:18:36,995 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2835569.3333333335, ans=0.1
2023-10-09 18:18:38,786 INFO [train.py:1031] (3/4) Epoch 14, batch 22900, loss[loss=0.1973, simple_loss=0.2525, pruned_loss=0.0538, ctc_loss=0.0863, over 16823.00 frames. ], tot_loss[loss=0.235, simple_loss=0.2896, pruned_loss=0.06691, ctc_loss=0.1162, over 3304612.03 frames. ], batch size: 121, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:18:42,797 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2835616.0, ans=0.125
2023-10-09 18:18:50,134 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2835662.6666666665, ans=0.0
2023-10-09 18:18:50,653 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.32 vs. limit=10.0
2023-10-09 18:18:57,763 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2835662.6666666665, ans=0.125
2023-10-09 18:19:05,644 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.02 vs. limit=15.0
2023-10-09 18:19:17,845 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2835756.0, ans=0.125
2023-10-09 18:19:18,761 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2835756.0, ans=10.0
2023-10-09 18:19:29,093 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.399e+02 3.037e+02 3.390e+02 3.855e+02 5.718e+02, threshold=6.781e+02, percent-clipped=0.0
2023-10-09 18:19:35,883 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.46 vs. limit=15.0
2023-10-09 18:19:40,762 INFO [train.py:1031] (3/4) Epoch 14, batch 22950, loss[loss=0.2522, simple_loss=0.2939, pruned_loss=0.08015, ctc_loss=0.1256, over 11351.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.2867, pruned_loss=0.06642, ctc_loss=0.1154, over 3297774.69 frames. ], batch size: 39, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:19:41,759 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0
2023-10-09 18:19:55,509 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2835896.0, ans=0.0
2023-10-09 18:20:02,281 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=12.0
2023-10-09 18:20:03,011 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2835896.0, ans=0.125
2023-10-09 18:20:12,819 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2835942.6666666665, ans=0.125
2023-10-09 18:20:16,264 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 18:20:22,669 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2835989.3333333335, ans=0.125
2023-10-09 18:20:25,810 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0
2023-10-09 18:20:42,885 INFO [train.py:1031] (3/4) Epoch 14, batch 23000, loss[loss=0.2163, simple_loss=0.2864, pruned_loss=0.05442, ctc_loss=0.09334, over 16800.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2886, pruned_loss=0.06408, ctc_loss=0.1123, over 3299145.65 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:20:43,220 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2836082.6666666665, ans=0.0
2023-10-09 18:20:43,276 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2836082.6666666665, ans=0.09899494936611666
2023-10-09 18:20:46,769 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=12.0
2023-10-09 18:21:05,674 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2836129.3333333335, ans=0.95
2023-10-09 18:21:07,580 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2836176.0, ans=0.05
2023-10-09 18:21:20,561 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0
2023-10-09 18:21:36,010 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+02 3.336e+02 3.961e+02 4.906e+02 8.428e+02, threshold=7.922e+02, percent-clipped=4.0
2023-10-09 18:21:37,966 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2836269.3333333335, ans=0.1
2023-10-09 18:21:37,992 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2836269.3333333335, ans=0.2
2023-10-09 18:21:45,236 INFO [train.py:1031] (3/4) Epoch 14, batch 23050, loss[loss=0.2208, simple_loss=0.2875, pruned_loss=0.05642, ctc_loss=0.103, over 16764.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2937, pruned_loss=0.06705, ctc_loss=0.1174, over 3299291.69 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:22:11,640 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2836409.3333333335, ans=0.125
2023-10-09 18:22:41,714 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2836502.6666666665, ans=0.125
2023-10-09 18:22:47,985 INFO [train.py:1031] (3/4) Epoch 14, batch 23100, loss[loss=0.1741, simple_loss=0.2495, pruned_loss=0.03595, ctc_loss=0.06699, over 16709.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2898, pruned_loss=0.06387, ctc_loss=0.1125, over 3287923.83 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:23:40,374 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2836736.0, ans=0.125
2023-10-09 18:23:41,038 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.931e+02 3.346e+02 4.278e+02 6.701e+02, threshold=6.692e+02, percent-clipped=0.0
2023-10-09 18:23:43,469 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 18:23:50,130 INFO [train.py:1031] (3/4) Epoch 14, batch 23150, loss[loss=0.1981, simple_loss=0.2534, pruned_loss=0.05376, ctc_loss=0.08832, over 16905.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2847, pruned_loss=0.06227, ctc_loss=0.1101, over 3287429.31 frames. ], batch size: 82, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:24:11,373 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2836829.3333333335, ans=0.07
2023-10-09 18:24:14,627 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2836876.0, ans=0.125
2023-10-09 18:24:22,887 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=22.5
2023-10-09 18:24:26,071 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0
2023-10-09 18:24:49,075 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=22.5
2023-10-09 18:24:49,876 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2837016.0, ans=0.0
2023-10-09 18:24:50,640 INFO [train.py:1031] (3/4) Epoch 14, batch 23200, loss[loss=0.2477, simple_loss=0.3253, pruned_loss=0.06254, ctc_loss=0.1123, over 16887.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2809, pruned_loss=0.06137, ctc_loss=0.1086, over 3291483.92 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 8.0
2023-10-09 18:24:57,991 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2837016.0, ans=0.1
2023-10-09 18:25:08,916 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2837062.6666666665, ans=0.2
2023-10-09 18:25:23,248 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2837109.3333333335, ans=0.0
2023-10-09 18:25:35,322 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0
2023-10-09 18:25:41,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2837202.6666666665, ans=0.125
2023-10-09 18:25:47,045 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.426e+02 3.050e+02 3.396e+02 3.920e+02 6.096e+02, threshold=6.792e+02, percent-clipped=0.0
2023-10-09 18:25:53,636 INFO [train.py:1031] (3/4) Epoch 14, batch 23250, loss[loss=0.2118, simple_loss=0.2647, pruned_loss=0.05858, ctc_loss=0.1043, over 16713.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2805, pruned_loss=0.06123, ctc_loss=0.1083, over 3291306.96 frames. ], batch size: 272, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:26:09,955 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2837296.0, ans=0.125
2023-10-09 18:26:32,455 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2837389.3333333335, ans=0.125
2023-10-09 18:26:52,713 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2837436.0, ans=0.0
2023-10-09 18:26:54,438 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2837436.0, ans=0.1
2023-10-09 18:26:59,153 INFO [train.py:1031] (3/4) Epoch 14, batch 23300, loss[loss=0.2241, simple_loss=0.3046, pruned_loss=0.05244, ctc_loss=0.09672, over 16887.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2768, pruned_loss=0.06088, ctc_loss=0.1077, over 3293668.78 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:27:01,185 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2837482.6666666665, ans=0.0
2023-10-09 18:27:13,090 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0
2023-10-09 18:27:26,663 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=12.0
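The [scaling.py:979] Whitening records track how far each module's activations are from having a white (isotropic) covariance; the "metric=... vs. limit=..." pairs compare that statistic against a whitening limit, which is itself scheduled (a whitening_limit ScheduledFloat appears later in this log). One standard anisotropy statistic with the same behavior, equal to 1.0 for perfectly whitened features and growing as variance concentrates in few directions, is sketched below; this is an illustrative proxy, not necessarily the exact formula in scaling.py.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: [num_frames, num_channels]. Returns d * trace(C^2) / trace(C)^2
        for the feature covariance C, which is 1.0 when C is a multiple of
        the identity and larger otherwise. (Illustrative stand-in.)"""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]            # [d, d] covariance estimate
        d = cov.shape[0]
        # For symmetric C, (C * C).sum() == trace(C @ C) == sum of squared
        # eigenvalues; the diagonal sum is the trace.
        return d * (cov * cov).sum() / (cov.diagonal().sum() ** 2)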
limit=12.0 2023-10-09 18:27:26,729 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=22.5 2023-10-09 18:27:35,848 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2837622.6666666665, ans=0.125 2023-10-09 18:27:57,269 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+02 3.134e+02 3.806e+02 4.608e+02 8.711e+02, threshold=7.613e+02, percent-clipped=4.0 2023-10-09 18:28:01,253 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2837716.0, ans=0.125 2023-10-09 18:28:01,936 INFO [train.py:1031] (3/4) Epoch 14, batch 23350, loss[loss=0.1943, simple_loss=0.2384, pruned_loss=0.05526, ctc_loss=0.09949, over 16100.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2751, pruned_loss=0.0598, ctc_loss=0.106, over 3296641.43 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:28:12,143 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2837716.0, ans=0.0 2023-10-09 18:28:18,747 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2837762.6666666665, ans=0.125 2023-10-09 18:28:43,025 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2837856.0, ans=0.07 2023-10-09 18:29:03,747 INFO [train.py:1031] (3/4) Epoch 14, batch 23400, loss[loss=0.2114, simple_loss=0.2635, pruned_loss=0.05942, ctc_loss=0.1013, over 16933.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2715, pruned_loss=0.05976, ctc_loss=0.1054, over 3291972.57 frames. ], batch size: 82, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:29:12,087 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.66 vs. limit=10.0 2023-10-09 18:29:25,015 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2837996.0, ans=0.0 2023-10-09 18:29:30,959 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2838042.6666666665, ans=0.125 2023-10-09 18:30:00,414 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 3.093e+02 3.637e+02 4.189e+02 1.057e+03, threshold=7.274e+02, percent-clipped=1.0 2023-10-09 18:30:04,497 INFO [train.py:1031] (3/4) Epoch 14, batch 23450, loss[loss=0.1823, simple_loss=0.2383, pruned_loss=0.04681, ctc_loss=0.08156, over 16763.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2677, pruned_loss=0.05967, ctc_loss=0.1051, over 3296687.90 frames. 
], batch size: 95, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:30:07,474 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2838182.6666666665, ans=0.1 2023-10-09 18:30:20,930 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2838229.3333333335, ans=0.035 2023-10-09 18:30:36,406 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2838276.0, ans=0.2 2023-10-09 18:30:44,140 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2023-10-09 18:30:49,919 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2838322.6666666665, ans=0.1 2023-10-09 18:31:06,571 INFO [train.py:1031] (3/4) Epoch 14, batch 23500, loss[loss=0.2241, simple_loss=0.2828, pruned_loss=0.06088, ctc_loss=0.1092, over 16746.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2678, pruned_loss=0.06056, ctc_loss=0.1062, over 3292749.78 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 18:31:10,467 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-10-09 18:31:22,140 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2838462.6666666665, ans=0.1 2023-10-09 18:31:34,763 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2838509.3333333335, ans=0.0 2023-10-09 18:31:39,706 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2838509.3333333335, ans=0.05 2023-10-09 18:31:43,138 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2838509.3333333335, ans=0.0 2023-10-09 18:31:52,178 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2838556.0, ans=0.125 2023-10-09 18:31:56,302 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2023-10-09 18:32:05,663 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.576e+02 3.415e+02 3.721e+02 4.306e+02 1.300e+03, threshold=7.442e+02, percent-clipped=1.0 2023-10-09 18:32:08,407 INFO [train.py:1031] (3/4) Epoch 14, batch 23550, loss[loss=0.2095, simple_loss=0.2648, pruned_loss=0.05785, ctc_loss=0.09631, over 16832.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2729, pruned_loss=0.06284, ctc_loss=0.1102, over 3292243.13 frames. 
], batch size: 215, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:32:13,692 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2838649.3333333335, ans=0.125 2023-10-09 18:32:17,054 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2838649.3333333335, ans=0.0 2023-10-09 18:32:24,068 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2838696.0, ans=0.0 2023-10-09 18:32:44,782 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2838789.3333333335, ans=0.125 2023-10-09 18:32:52,926 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2838789.3333333335, ans=0.2 2023-10-09 18:32:56,593 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2838836.0, ans=0.0 2023-10-09 18:33:08,875 INFO [train.py:1031] (3/4) Epoch 14, batch 23600, loss[loss=0.1915, simple_loss=0.2401, pruned_loss=0.05352, ctc_loss=0.0896, over 16739.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.2697, pruned_loss=0.06243, ctc_loss=0.1095, over 3284904.99 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:33:19,106 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2023-10-09 18:33:24,682 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2023-10-09 18:33:31,926 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2838929.3333333335, ans=0.125 2023-10-09 18:33:42,667 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2838976.0, ans=0.05 2023-10-09 18:33:45,880 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2839022.6666666665, ans=0.125 2023-10-09 18:33:47,338 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2839022.6666666665, ans=22.5 2023-10-09 18:33:48,573 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2839022.6666666665, ans=0.0 2023-10-09 18:34:01,369 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=15.0 2023-10-09 18:34:09,457 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+02 2.978e+02 3.333e+02 3.973e+02 8.640e+02, threshold=6.667e+02, percent-clipped=1.0 2023-10-09 18:34:10,552 INFO [train.py:1031] (3/4) Epoch 14, batch 23650, loss[loss=0.234, simple_loss=0.2951, pruned_loss=0.0653, ctc_loss=0.1057, over 16722.00 frames. ], tot_loss[loss=0.2203, simple_loss=0.2724, pruned_loss=0.06223, ctc_loss=0.1094, over 3273515.19 frames. 
], batch size: 121, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:34:15,806 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2839116.0, ans=0.0 2023-10-09 18:34:16,175 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0 2023-10-09 18:34:25,918 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=22.5 2023-10-09 18:34:39,074 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2839209.3333333335, ans=0.2 2023-10-09 18:34:53,710 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=12.0 2023-10-09 18:34:54,574 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2839256.0, ans=0.125 2023-10-09 18:34:57,269 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2839256.0, ans=0.125 2023-10-09 18:35:02,094 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2839302.6666666665, ans=0.125 2023-10-09 18:35:03,145 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2839302.6666666665, ans=0.0 2023-10-09 18:35:11,978 INFO [train.py:1031] (3/4) Epoch 14, batch 23700, loss[loss=0.2305, simple_loss=0.291, pruned_loss=0.06261, ctc_loss=0.112, over 16487.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.2743, pruned_loss=0.05892, ctc_loss=0.1038, over 3287882.76 frames. ], batch size: 415, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:35:14,797 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=12.0 2023-10-09 18:35:20,419 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2839349.3333333335, ans=0.0 2023-10-09 18:35:31,200 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2839396.0, ans=0.0 2023-10-09 18:35:42,271 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2839442.6666666665, ans=0.125 2023-10-09 18:35:52,411 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2839489.3333333335, ans=0.125 2023-10-09 18:35:57,129 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:36:09,128 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=15.0 2023-10-09 18:36:11,211 INFO [train.py:1031] (3/4) Epoch 14, batch 23750, loss[loss=0.2594, simple_loss=0.3261, pruned_loss=0.07091, ctc_loss=0.1272, over 16874.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2782, pruned_loss=0.059, ctc_loss=0.1045, over 3297391.91 frames. 
], batch size: 309, lr: 2.55e-03, grad_scale: 1.0 2023-10-09 18:36:12,958 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+02 2.785e+02 3.356e+02 4.379e+02 6.615e+02, threshold=6.712e+02, percent-clipped=0.0 2023-10-09 18:36:17,270 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2839582.6666666665, ans=0.125 2023-10-09 18:36:24,648 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0 2023-10-09 18:36:34,945 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2839676.0, ans=0.0 2023-10-09 18:36:40,712 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=2839676.0, ans=0.02 2023-10-09 18:36:43,400 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2839676.0, ans=0.1 2023-10-09 18:36:47,107 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.99 vs. limit=12.0 2023-10-09 18:36:54,881 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2839722.6666666665, ans=0.07 2023-10-09 18:37:04,002 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:37:11,734 INFO [train.py:1031] (3/4) Epoch 14, batch 23800, loss[loss=0.2407, simple_loss=0.3162, pruned_loss=0.05941, ctc_loss=0.1158, over 16821.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2784, pruned_loss=0.0566, ctc_loss=0.101, over 3300598.64 frames. ], batch size: 328, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:37:20,252 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2839816.0, ans=0.2 2023-10-09 18:37:28,586 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2839862.6666666665, ans=0.0 2023-10-09 18:37:36,018 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2023-10-09 18:37:49,192 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2839956.0, ans=0.125 2023-10-09 18:37:54,999 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2839956.0, ans=0.1 2023-10-09 18:38:12,197 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2840049.3333333335, ans=0.1 2023-10-09 18:38:12,934 INFO [train.py:1031] (3/4) Epoch 14, batch 23850, loss[loss=0.2403, simple_loss=0.319, pruned_loss=0.05842, ctc_loss=0.1118, over 16862.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2856, pruned_loss=0.05737, ctc_loss=0.1027, over 3306565.32 frames. 
], batch size: 242, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:38:14,611 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 3.207e+02 4.081e+02 4.991e+02 8.849e+02, threshold=8.163e+02, percent-clipped=8.0 2023-10-09 18:38:18,474 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=15.0 2023-10-09 18:38:18,890 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2840049.3333333335, ans=0.0 2023-10-09 18:38:21,821 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2840049.3333333335, ans=0.125 2023-10-09 18:38:28,502 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2840096.0, ans=0.125 2023-10-09 18:38:32,761 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2840096.0, ans=0.125 2023-10-09 18:38:32,791 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2840096.0, ans=0.5 2023-10-09 18:38:35,122 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2840096.0, ans=0.125 2023-10-09 18:38:40,113 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2840142.6666666665, ans=0.125 2023-10-09 18:38:51,695 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2023-10-09 18:38:56,072 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:38:56,162 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2840189.3333333335, ans=0.125 2023-10-09 18:38:57,510 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=22.5 2023-10-09 18:39:02,079 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2023-10-09 18:39:05,800 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=12.0 2023-10-09 18:39:07,259 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2840236.0, ans=0.0 2023-10-09 18:39:13,694 INFO [train.py:1031] (3/4) Epoch 14, batch 23900, loss[loss=0.2205, simple_loss=0.2723, pruned_loss=0.06303, ctc_loss=0.1064, over 16814.00 frames. ], tot_loss[loss=0.224, simple_loss=0.287, pruned_loss=0.0594, ctc_loss=0.1056, over 3303843.91 frames. 
], batch size: 164, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:39:16,126 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:39:24,302 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2840282.6666666665, ans=0.5 2023-10-09 18:39:29,298 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-10-09 18:39:59,339 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=22.5 2023-10-09 18:40:00,172 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2840422.6666666665, ans=0.1 2023-10-09 18:40:15,807 INFO [train.py:1031] (3/4) Epoch 14, batch 23950, loss[loss=0.2192, simple_loss=0.2758, pruned_loss=0.06021, ctc_loss=0.1053, over 16895.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2847, pruned_loss=0.06131, ctc_loss=0.1084, over 3301329.13 frames. ], batch size: 82, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:40:16,838 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.526e+02 3.283e+02 3.829e+02 4.670e+02 8.731e+02, threshold=7.659e+02, percent-clipped=1.0 2023-10-09 18:40:26,625 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2840562.6666666665, ans=0.125 2023-10-09 18:40:38,637 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2840609.3333333335, ans=0.05 2023-10-09 18:40:49,452 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2840609.3333333335, ans=0.125 2023-10-09 18:41:02,798 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2840702.6666666665, ans=0.125 2023-10-09 18:41:11,258 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2840702.6666666665, ans=0.0 2023-10-09 18:41:15,727 INFO [train.py:1031] (3/4) Epoch 14, batch 24000, loss[loss=0.2018, simple_loss=0.2638, pruned_loss=0.05284, ctc_loss=0.08555, over 16767.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2833, pruned_loss=0.06236, ctc_loss=0.1099, over 3314173.00 frames. ], batch size: 140, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:41:15,727 INFO [train.py:1054] (3/4) Computing validation loss 2023-10-09 18:41:33,417 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2354, simple_loss=0.3014, pruned_loss=0.06541, ctc_loss=0.09632, over 1796401.00 frames. 
2023-10-09 18:41:33,417 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB
2023-10-09 18:41:36,599 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2840749.3333333335, ans=0.125
2023-10-09 18:41:44,493 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2840796.0, ans=0.1
2023-10-09 18:42:03,973 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2840842.6666666665, ans=0.125
2023-10-09 18:42:04,822 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2840842.6666666665, ans=0.125
2023-10-09 18:42:07,041 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2840842.6666666665, ans=0.125
2023-10-09 18:42:20,287 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2840889.3333333335, ans=0.125
2023-10-09 18:42:36,284 INFO [train.py:1031] (3/4) Epoch 14, batch 24050, loss[loss=0.2439, simple_loss=0.307, pruned_loss=0.06704, ctc_loss=0.1169, over 16866.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2876, pruned_loss=0.06284, ctc_loss=0.1111, over 3314901.30 frames. ], batch size: 309, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:42:40,008 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.406e+02 3.196e+02 3.829e+02 4.589e+02 8.519e+02, threshold=7.658e+02, percent-clipped=2.0
2023-10-09 18:42:56,310 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0
2023-10-09 18:43:37,901 INFO [train.py:1031] (3/4) Epoch 14, batch 24100, loss[loss=0.2252, simple_loss=0.2728, pruned_loss=0.06636, ctc_loss=0.1123, over 12268.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2904, pruned_loss=0.06477, ctc_loss=0.1138, over 3304442.40 frames. ], batch size: 38, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:43:41,060 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2841216.0, ans=0.2
2023-10-09 18:44:20,156 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2841356.0, ans=0.125
2023-10-09 18:44:22,302 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2841356.0, ans=0.125
2023-10-09 18:44:26,092 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2841402.6666666665, ans=0.0
2023-10-09 18:44:39,361 INFO [train.py:1031] (3/4) Epoch 14, batch 24150, loss[loss=0.1996, simple_loss=0.2606, pruned_loss=0.05067, ctc_loss=0.09319, over 16854.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2829, pruned_loss=0.06139, ctc_loss=0.1081, over 3301143.30 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:44:43,199 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.029e+02 3.494e+02 3.950e+02 7.485e+02, threshold=6.988e+02, percent-clipped=0.0
2023-10-09 18:44:44,513 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2841449.3333333335, ans=0.125
2023-10-09 18:45:34,302 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2841636.0, ans=0.0
2023-10-09 18:45:42,157 INFO [train.py:1031] (3/4) Epoch 14, batch 24200, loss[loss=0.1796, simple_loss=0.2487, pruned_loss=0.04104, ctc_loss=0.07127, over 16692.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2791, pruned_loss=0.05802, ctc_loss=0.1028, over 3301820.96 frames. ], batch size: 140, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:45:42,546 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2841682.6666666665, ans=0.125
2023-10-09 18:45:45,728 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2841682.6666666665, ans=0.1
2023-10-09 18:45:50,623 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2841682.6666666665, ans=0.125
2023-10-09 18:46:12,593 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2841776.0, ans=0.0
2023-10-09 18:46:14,195 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2841776.0, ans=0.5
2023-10-09 18:46:17,567 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=2841822.6666666665, ans=22.5
2023-10-09 18:46:35,552 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2841869.3333333335, ans=0.1
2023-10-09 18:46:38,310 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2841869.3333333335, ans=0.125
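The ScheduledFloat entries record regularization hyperparameters (skip rates, balancer probabilities, dropout) that the Zipformer recipe anneals as a piecewise-linear function of batch_count; the log prints the current interpolated value as ans. A minimal sketch of that behavior, with an illustrative schedule (the real class lives in icefall's zipformer scaling.py and differs in detail):

```python
# Illustrative piecewise-linear scheduled value, in the spirit of the
# ScheduledFloat entries above; the schedule points here are hypothetical.
class PiecewiseLinear:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.points = list(points)

    def __call__(self, x: float) -> float:
        pts = self.points
        if x <= pts[0][0]:
            return pts[0][1]
        if x >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# A skip rate decaying from 0.5 to 0.0 over the first 20k batches reads
# ans=0.0 by batch_count ~2.84e6, as in the entries above:
skip_rate = PiecewiseLinear((0.0, 0.5), (20000.0, 0.0))
print(skip_rate(2840049.33))  # -> 0.0
```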
2023-10-09 18:46:43,489 INFO [train.py:1031] (3/4) Epoch 14, batch 24250, loss[loss=0.2706, simple_loss=0.3116, pruned_loss=0.08516, ctc_loss=0.1485, over 16585.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2763, pruned_loss=0.05779, ctc_loss=0.1019, over 3283787.61 frames. ], batch size: 351, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:46:49,469 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+02 2.981e+02 3.499e+02 4.269e+02 8.354e+02, threshold=6.999e+02, percent-clipped=3.0
2023-10-09 18:47:00,277 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2841962.6666666665, ans=0.125
2023-10-09 18:47:06,418 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2841962.6666666665, ans=0.0
2023-10-09 18:47:16,783 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2842009.3333333335, ans=0.95
2023-10-09 18:47:36,768 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2842102.6666666665, ans=0.0
2023-10-09 18:47:38,980 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2842102.6666666665, ans=0.125
2023-10-09 18:47:46,791 INFO [train.py:1031] (3/4) Epoch 14, batch 24300, loss[loss=0.2281, simple_loss=0.2787, pruned_loss=0.06538, ctc_loss=0.1168, over 15337.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.282, pruned_loss=0.06114, ctc_loss=0.1073, over 3287439.19 frames. ], batch size: 527, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:48:04,020 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2842196.0, ans=0.125
2023-10-09 18:48:34,169 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2842289.3333333335, ans=0.125
2023-10-09 18:48:48,964 INFO [train.py:1031] (3/4) Epoch 14, batch 24350, loss[loss=0.2231, simple_loss=0.2798, pruned_loss=0.06211, ctc_loss=0.1056, over 16903.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2844, pruned_loss=0.06151, ctc_loss=0.1081, over 3298546.53 frames. ], batch size: 242, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:48:55,801 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.629e+02 3.451e+02 4.035e+02 4.756e+02 1.145e+03, threshold=8.070e+02, percent-clipped=2.0
2023-10-09 18:49:03,565 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=12.0
2023-10-09 18:49:06,971 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2842429.3333333335, ans=0.1
2023-10-09 18:49:09,264 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0
2023-10-09 18:49:16,150 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2842476.0, ans=0.125
2023-10-09 18:49:31,403 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2842522.6666666665, ans=0.05
2023-10-09 18:49:46,509 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2842569.3333333335, ans=0.125
2023-10-09 18:49:49,953 INFO [train.py:1031] (3/4) Epoch 14, batch 24400, loss[loss=0.2261, simple_loss=0.2885, pruned_loss=0.06087, ctc_loss=0.105, over 16913.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2837, pruned_loss=0.0624, ctc_loss=0.1097, over 3306906.35 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:50:12,673 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2842662.6666666665, ans=0.2
2023-10-09 18:50:12,719 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2842662.6666666665, ans=0.125
2023-10-09 18:50:22,220 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2842709.3333333335, ans=0.0
2023-10-09 18:50:50,579 INFO [train.py:1031] (3/4) Epoch 14, batch 24450, loss[loss=0.1808, simple_loss=0.2343, pruned_loss=0.04793, ctc_loss=0.0788, over 16767.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2837, pruned_loss=0.06368, ctc_loss=0.112, over 3304309.25 frames. ], batch size: 130, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:50:57,483 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.474e+02 3.798e+02 4.507e+02 6.680e+02, threshold=7.596e+02, percent-clipped=0.0
2023-10-09 18:51:31,896 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2842989.3333333335, ans=0.125
2023-10-09 18:51:33,776 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2842989.3333333335, ans=0.025
2023-10-09 18:51:40,599 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2843036.0, ans=0.125
2023-10-09 18:51:51,709 INFO [train.py:1031] (3/4) Epoch 14, batch 24500, loss[loss=0.1788, simple_loss=0.2309, pruned_loss=0.04711, ctc_loss=0.0812, over 16650.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2811, pruned_loss=0.06393, ctc_loss=0.1111, over 3305852.64 frames. ], batch size: 151, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:52:18,655 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2843176.0, ans=0.0
2023-10-09 18:52:24,869 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2843176.0, ans=0.125
2023-10-09 18:52:30,130 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0
2023-10-09 18:52:39,851 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2843222.6666666665, ans=0.125
2023-10-09 18:52:42,506 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2843269.3333333335, ans=0.2
2023-10-09 18:52:48,437 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2843269.3333333335, ans=0.1
2023-10-09 18:52:48,898 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.25 vs. limit=15.0
2023-10-09 18:52:51,151 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2843269.3333333335, ans=0.1
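The Whitening entries in this stretch are diagnostics on how far a module's output covariance is from a multiple of the identity; when the metric exceeds the configured limit, the Whiten module adjusts gradients to push the features back toward white. A rough sketch of one way such a metric can be computed (the exact definition in scaling.py may differ; this one equals 1.0 for perfectly white features and grows as the covariance spectrum becomes lopsided):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels). Split channels into groups and
    # measure how non-uniform each group's covariance eigenvalues are.
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    metrics = []
    for g in range(num_groups):
        cov = x[:, g, :].t() @ x[:, g, :] / num_frames
        eigs = torch.linalg.eigvalsh(cov)
        # 1.0 when all eigenvalues are equal (cov proportional to I)
        metrics.append(((eigs ** 2).mean() / eigs.mean() ** 2).item())
    return max(metrics)
```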
2023-10-09 18:52:54,590 INFO [train.py:1031] (3/4) Epoch 14, batch 24550, loss[loss=0.2705, simple_loss=0.3416, pruned_loss=0.07277, ctc_loss=0.1346, over 16695.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2798, pruned_loss=0.06144, ctc_loss=0.1063, over 3292361.01 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:53:03,099 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.233e+02 3.410e+02 4.193e+02 5.169e+02 8.028e+02, threshold=8.385e+02, percent-clipped=3.0
2023-10-09 18:53:39,063 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2843456.0, ans=0.125
2023-10-09 18:53:45,579 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2843502.6666666665, ans=0.0
2023-10-09 18:53:57,854 INFO [train.py:1031] (3/4) Epoch 14, batch 24600, loss[loss=0.2412, simple_loss=0.2879, pruned_loss=0.07378, ctc_loss=0.1171, over 16868.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2817, pruned_loss=0.06116, ctc_loss=0.1063, over 3290228.51 frames. ], batch size: 121, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:54:02,593 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2843549.3333333335, ans=0.2
2023-10-09 18:54:17,744 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2843596.0, ans=0.125
2023-10-09 18:54:20,495 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2843596.0, ans=0.125
2023-10-09 18:54:27,719 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2843642.6666666665, ans=0.0
2023-10-09 18:54:38,966 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2843689.3333333335, ans=0.0
2023-10-09 18:54:50,404 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2843736.0, ans=0.2
2023-10-09 18:55:02,746 INFO [train.py:1031] (3/4) Epoch 14, batch 24650, loss[loss=0.2347, simple_loss=0.3149, pruned_loss=0.05803, ctc_loss=0.09582, over 16815.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2917, pruned_loss=0.0644, ctc_loss=0.1121, over 3297314.92 frames. ], batch size: 188, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:55:13,665 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+02 3.365e+02 3.995e+02 4.722e+02 9.808e+02, threshold=7.989e+02, percent-clipped=0.0
2023-10-09 18:55:22,161 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=15.0
2023-10-09 18:55:23,955 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2843829.3333333335, ans=0.0
2023-10-09 18:55:26,087 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2843829.3333333335, ans=0.125
2023-10-09 18:55:33,032 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.36 vs. limit=12.0
2023-10-09 18:55:35,627 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2843876.0, ans=0.125
2023-10-09 18:55:47,326 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2843922.6666666665, ans=0.125
2023-10-09 18:56:05,274 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2844016.0, ans=0.125
2023-10-09 18:56:05,353 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2844016.0, ans=0.0
2023-10-09 18:56:06,109 INFO [train.py:1031] (3/4) Epoch 14, batch 24700, loss[loss=0.2575, simple_loss=0.3243, pruned_loss=0.06942, ctc_loss=0.1295, over 16907.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.3002, pruned_loss=0.06527, ctc_loss=0.114, over 3298374.02 frames. ], batch size: 292, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:56:29,866 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2844062.6666666665, ans=0.0
2023-10-09 18:56:36,424 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2844109.3333333335, ans=0.1
2023-10-09 18:56:41,431 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2844109.3333333335, ans=0.125
2023-10-09 18:56:43,039 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2844109.3333333335, ans=0.0
2023-10-09 18:56:45,352 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2844156.0, ans=0.2
2023-10-09 18:57:05,946 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2844202.6666666665, ans=0.125
2023-10-09 18:57:06,989 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2844202.6666666665, ans=0.0
2023-10-09 18:57:10,472 INFO [train.py:1031] (3/4) Epoch 14, batch 24750, loss[loss=0.2374, simple_loss=0.2853, pruned_loss=0.07198, ctc_loss=0.114, over 16726.00 frames. ], tot_loss[loss=0.2445, simple_loss=0.3046, pruned_loss=0.06838, ctc_loss=0.1194, over 3302372.17 frames. ], batch size: 140, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:57:11,965 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2844249.3333333335, ans=0.1
2023-10-09 18:57:13,647 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2844249.3333333335, ans=0.125
2023-10-09 18:57:23,590 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.844e+02 3.622e+02 4.141e+02 4.992e+02 1.091e+03, threshold=8.281e+02, percent-clipped=4.0
2023-10-09 18:57:31,205 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2844296.0, ans=0.0
2023-10-09 18:57:36,245 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2844342.6666666665, ans=0.2
2023-10-09 18:57:37,654 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0
2023-10-09 18:57:38,354 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 18:57:43,817 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2844342.6666666665, ans=0.125
2023-10-09 18:58:03,018 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2844436.0, ans=0.125
2023-10-09 18:58:13,777 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2844436.0, ans=0.95
2023-10-09 18:58:17,276 INFO [train.py:1031] (3/4) Epoch 14, batch 24800, loss[loss=0.2339, simple_loss=0.3339, pruned_loss=0.04963, ctc_loss=0.08678, over 15033.00 frames. ], tot_loss[loss=0.2418, simple_loss=0.3013, pruned_loss=0.06773, ctc_loss=0.1172, over 3304660.30 frames. ], batch size: 527, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:58:19,879 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2844482.6666666665, ans=0.125
2023-10-09 18:58:46,623 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=12.0
2023-10-09 18:58:47,289 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2844576.0, ans=0.025
2023-10-09 18:59:20,852 INFO [train.py:1031] (3/4) Epoch 14, batch 24850, loss[loss=0.2471, simple_loss=0.3029, pruned_loss=0.07185, ctc_loss=0.1192, over 16572.00 frames. ], tot_loss[loss=0.2424, simple_loss=0.3006, pruned_loss=0.06848, ctc_loss=0.1179, over 3294650.50 frames. ], batch size: 110, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 18:59:26,117 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2844716.0, ans=0.1
2023-10-09 18:59:32,574 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2844762.6666666665, ans=0.0
2023-10-09 18:59:35,025 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.490e+02 3.266e+02 3.931e+02 4.617e+02 8.041e+02, threshold=7.862e+02, percent-clipped=0.0
2023-10-09 18:59:58,285 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2844809.3333333335, ans=0.125
2023-10-09 19:00:13,912 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2844902.6666666665, ans=0.0
2023-10-09 19:00:27,353 INFO [train.py:1031] (3/4) Epoch 14, batch 24900, loss[loss=0.258, simple_loss=0.3599, pruned_loss=0.05641, ctc_loss=0.108, over 15084.00 frames. ], tot_loss[loss=0.2476, simple_loss=0.3049, pruned_loss=0.07068, ctc_loss=0.1223, over 3293361.59 frames. ], batch size: 527, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:00:34,141 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2844949.3333333335, ans=0.125
2023-10-09 19:00:42,563 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2844996.0, ans=0.125
2023-10-09 19:01:26,794 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2845136.0, ans=0.125
2023-10-09 19:01:28,987 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 19:01:29,181 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0
2023-10-09 19:01:30,760 INFO [train.py:1031] (3/4) Epoch 14, batch 24950, loss[loss=0.269, simple_loss=0.327, pruned_loss=0.07682, ctc_loss=0.1434, over 16807.00 frames. ], tot_loss[loss=0.2478, simple_loss=0.3088, pruned_loss=0.06937, ctc_loss=0.1204, over 3284805.31 frames. ], batch size: 328, lr: 2.54e-03, grad_scale: 1.0
2023-10-09 19:01:33,992 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.44 vs. limit=22.5
2023-10-09 19:01:35,093 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=12.0
2023-10-09 19:01:36,917 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2845182.6666666665, ans=0.0
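The lr field drops from 2.55e-03 to 2.54e-03 between batches 24800 and 24850 above: the learning rate decays smoothly in both the batch index and the epoch. Assuming the Eden schedule used in these recipes, with this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, the decay looks like the sketch below (warmup factor omitted; the batch/epoch arguments are whatever the trainer passes in):

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden-style schedule: joint power-law decay in batches and epochs.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```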
2023-10-09 19:01:40,891 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=12.0
2023-10-09 19:01:45,694 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2845229.3333333335, ans=0.2
2023-10-09 19:01:46,518 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+02 3.532e+02 4.120e+02 4.965e+02 9.701e+02, threshold=8.240e+02, percent-clipped=4.0
2023-10-09 19:01:50,766 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2845229.3333333335, ans=0.2
2023-10-09 19:02:00,831 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2845276.0, ans=0.125
2023-10-09 19:02:06,650 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2845322.6666666665, ans=0.05
2023-10-09 19:02:09,237 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2845322.6666666665, ans=0.1
2023-10-09 19:02:09,513 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0
2023-10-09 19:02:32,822 INFO [train.py:1031] (3/4) Epoch 14, batch 25000, loss[loss=0.2306, simple_loss=0.2796, pruned_loss=0.06836, ctc_loss=0.1121, over 16879.00 frames. ], tot_loss[loss=0.2453, simple_loss=0.3035, pruned_loss=0.06946, ctc_loss=0.1207, over 3291568.85 frames. ], batch size: 141, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:02:50,488 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2845462.6666666665, ans=0.125
2023-10-09 19:02:54,282 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2845462.6666666665, ans=0.125
2023-10-09 19:03:00,995 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2023-10-09 19:03:06,683 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2845509.3333333335, ans=0.125
2023-10-09 19:03:11,518 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2845556.0, ans=0.125
2023-10-09 19:03:14,199 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2845556.0, ans=0.125
2023-10-09 19:03:18,716 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0
2023-10-09 19:03:33,089 INFO [train.py:1031] (3/4) Epoch 14, batch 25050, loss[loss=0.2257, simple_loss=0.2831, pruned_loss=0.06417, ctc_loss=0.09999, over 16801.00 frames. ], tot_loss[loss=0.2423, simple_loss=0.2987, pruned_loss=0.06899, ctc_loss=0.12, over 3300927.01 frames. ], batch size: 102, lr: 2.54e-03, grad_scale: 1.0
2023-10-09 19:03:33,757 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=15.0
2023-10-09 19:03:35,581 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2845649.3333333335, ans=0.0
2023-10-09 19:03:42,961 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2845649.3333333335, ans=0.125
2023-10-09 19:03:49,356 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2845696.0, ans=0.1
2023-10-09 19:03:50,005 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.415e+02 3.393e+02 3.859e+02 4.552e+02 1.527e+03, threshold=7.717e+02, percent-clipped=2.0
2023-10-09 19:03:52,457 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2845696.0, ans=0.125
2023-10-09 19:03:54,277 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2845696.0, ans=0.1
2023-10-09 19:04:27,034 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=12.0
2023-10-09 19:04:29,309 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2845836.0, ans=0.125
2023-10-09 19:04:34,836 INFO [train.py:1031] (3/4) Epoch 14, batch 25100, loss[loss=0.1936, simple_loss=0.26, pruned_loss=0.04689, ctc_loss=0.08343, over 16916.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.2934, pruned_loss=0.06711, ctc_loss=0.1168, over 3293120.37 frames. ], batch size: 243, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:04:43,555 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2845882.6666666665, ans=0.125
2023-10-09 19:04:45,143 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2845882.6666666665, ans=0.125
2023-10-09 19:04:45,248 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2845882.6666666665, ans=0.0
2023-10-09 19:04:47,841 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2845929.3333333335, ans=0.0
2023-10-09 19:04:53,840 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2845929.3333333335, ans=0.125
2023-10-09 19:05:35,055 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2846116.0, ans=0.2
2023-10-09 19:05:36,319 INFO [train.py:1031] (3/4) Epoch 14, batch 25150, loss[loss=0.1984, simple_loss=0.2583, pruned_loss=0.05245, ctc_loss=0.08382, over 16734.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2868, pruned_loss=0.06498, ctc_loss=0.1136, over 3291134.20 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:05:52,020 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.349e+02 2.975e+02 3.474e+02 4.105e+02 7.010e+02, threshold=6.948e+02, percent-clipped=0.0
2023-10-09 19:06:31,711 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2846302.6666666665, ans=0.1
2023-10-09 19:06:32,712 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2846302.6666666665, ans=0.0
2023-10-09 19:06:36,059 INFO [train.py:1031] (3/4) Epoch 14, batch 25200, loss[loss=0.1999, simple_loss=0.2599, pruned_loss=0.05211, ctc_loss=0.08903, over 16891.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2834, pruned_loss=0.06478, ctc_loss=0.1133, over 3305455.79 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:07:18,121 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2846489.3333333335, ans=0.125
2023-10-09 19:07:35,929 INFO [train.py:1031] (3/4) Epoch 14, batch 25250, loss[loss=0.2528, simple_loss=0.2935, pruned_loss=0.07718, ctc_loss=0.1442, over 16909.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2818, pruned_loss=0.06569, ctc_loss=0.1148, over 3296113.43 frames. ], batch size: 292, lr: 2.54e-03, grad_scale: 1.0
2023-10-09 19:07:43,912 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2846582.6666666665, ans=0.125
2023-10-09 19:07:53,656 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2846629.3333333335, ans=0.07
2023-10-09 19:07:56,504 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.603e+02 3.269e+02 3.734e+02 4.463e+02 8.122e+02, threshold=7.469e+02, percent-clipped=1.0
2023-10-09 19:08:09,044 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2846676.0, ans=0.0
2023-10-09 19:08:13,461 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 19:08:24,534 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0
2023-10-09 19:08:27,330 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=15.0
2023-10-09 19:08:29,813 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2846769.3333333335, ans=0.125
2023-10-09 19:08:32,677 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
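Each Clipping_scale line from optim.py prints five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms; the threshold is clipping_scale times the running median, which is why it is consistently about twice the middle value (e.g. 2 x 3.474e+02 = 6.948e+02 in the entry above), and percent-clipped reports how often the norm exceeded it. A condensed sketch of that bookkeeping; ScaledAdam's real logic is more involved, and the class here is ours:

```python
from collections import deque
import torch

class GradNormClipper:
    """Median-based clipping in the style of the Clipping_scale log lines."""
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent total grad norms

    def clip_(self, params) -> None:
        grads = [p.grad.flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms.append(norm)
        quartiles = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * quartiles[2].item()  # 2 x median
        if norm > threshold:  # scale gradients down to the threshold
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
```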
2023-10-09 19:08:39,563 INFO [train.py:1031] (3/4) Epoch 14, batch 25300, loss[loss=0.2421, simple_loss=0.3185, pruned_loss=0.06043, ctc_loss=0.1123, over 16895.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.2883, pruned_loss=0.06821, ctc_loss=0.1196, over 3303755.29 frames. ], batch size: 215, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:08:39,947 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2846816.0, ans=0.125
2023-10-09 19:08:58,444 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.77 vs. limit=22.5
2023-10-09 19:09:34,716 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2847002.6666666665, ans=0.125
2023-10-09 19:09:35,803 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2847002.6666666665, ans=0.0
2023-10-09 19:09:41,237 INFO [train.py:1031] (3/4) Epoch 14, batch 25350, loss[loss=0.2386, simple_loss=0.2914, pruned_loss=0.06921, ctc_loss=0.1184, over 16713.00 frames. ], tot_loss[loss=0.2425, simple_loss=0.2969, pruned_loss=0.06958, ctc_loss=0.1224, over 3301904.27 frames. ], batch size: 111, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:09:41,774 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0
2023-10-09 19:09:45,308 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2847049.3333333335, ans=0.1
2023-10-09 19:09:52,991 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2847096.0, ans=0.125
2023-10-09 19:10:00,695 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=6.0
2023-10-09 19:10:01,553 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.479e+02 4.151e+02 5.048e+02 8.470e+02, threshold=8.302e+02, percent-clipped=4.0
2023-10-09 19:10:25,480 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2847189.3333333335, ans=0.2
2023-10-09 19:10:41,702 INFO [train.py:1031] (3/4) Epoch 14, batch 25400, loss[loss=0.2138, simple_loss=0.2687, pruned_loss=0.05847, ctc_loss=0.1048, over 16940.00 frames. ], tot_loss[loss=0.24, simple_loss=0.2929, pruned_loss=0.0692, ctc_loss=0.1219, over 3305329.12 frames. ], batch size: 228, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:10:42,065 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2847282.6666666665, ans=0.125
2023-10-09 19:11:32,214 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2847469.3333333335, ans=0.05
2023-10-09 19:11:37,267 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.16 vs. limit=15.0
2023-10-09 19:11:40,313 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0
2023-10-09 19:11:40,829 INFO [train.py:1031] (3/4) Epoch 14, batch 25450, loss[loss=0.2703, simple_loss=0.2884, pruned_loss=0.09441, ctc_loss=0.1584, over 16599.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.2887, pruned_loss=0.06906, ctc_loss=0.1216, over 3311690.76 frames. ], batch size: 416, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:12:00,510 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2847562.6666666665, ans=0.125
2023-10-09 19:12:01,164 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.595e+02 3.142e+02 3.636e+02 4.300e+02 1.054e+03, threshold=7.273e+02, percent-clipped=3.0
2023-10-09 19:12:05,129 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2847609.3333333335, ans=0.125
2023-10-09 19:12:21,086 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2847656.0, ans=0.125
2023-10-09 19:12:35,346 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.75 vs. limit=15.0
2023-10-09 19:12:41,752 INFO [train.py:1031] (3/4) Epoch 14, batch 25500, loss[loss=0.2034, simple_loss=0.2658, pruned_loss=0.05219, ctc_loss=0.09174, over 16884.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2833, pruned_loss=0.06553, ctc_loss=0.1153, over 3316321.88 frames. ], batch size: 242, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:12:56,360 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2847796.0, ans=0.2
2023-10-09 19:12:57,377 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2847796.0, ans=0.0
2023-10-09 19:13:16,522 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2847842.6666666665, ans=0.09899494936611666
2023-10-09 19:13:25,777 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2847889.3333333335, ans=0.125
2023-10-09 19:13:25,813 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2847889.3333333335, ans=0.2
2023-10-09 19:13:26,907 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2847889.3333333335, ans=0.125
2023-10-09 19:13:28,567 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2847889.3333333335, ans=0.125
2023-10-09 19:13:30,733 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2847936.0, ans=0.125
2023-10-09 19:13:31,994 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=22.5
2023-10-09 19:13:44,907 INFO [train.py:1031] (3/4) Epoch 14, batch 25550, loss[loss=0.2342, simple_loss=0.2887, pruned_loss=0.0663, ctc_loss=0.1181, over 16862.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2861, pruned_loss=0.06701, ctc_loss=0.1178, over 3303609.63 frames. ], batch size: 228, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:14:00,351 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2848029.3333333335, ans=0.0
2023-10-09 19:14:07,200 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+02 3.271e+02 3.768e+02 4.486e+02 1.096e+03, threshold=7.537e+02, percent-clipped=1.0
2023-10-09 19:14:11,966 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2848076.0, ans=0.125
2023-10-09 19:14:20,161 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2848076.0, ans=0.125
2023-10-09 19:14:30,985 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2848122.6666666665, ans=0.1
2023-10-09 19:14:45,709 INFO [train.py:1031] (3/4) Epoch 14, batch 25600, loss[loss=0.2424, simple_loss=0.308, pruned_loss=0.06541, ctc_loss=0.1149, over 16761.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.2897, pruned_loss=0.06868, ctc_loss=0.1205, over 3304516.35 frames. ], batch size: 102, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:15:05,769 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2848262.6666666665, ans=0.1
2023-10-09 19:15:47,650 INFO [train.py:1031] (3/4) Epoch 14, batch 25650, loss[loss=0.2826, simple_loss=0.3371, pruned_loss=0.08486, ctc_loss=0.1458, over 16658.00 frames. ], tot_loss[loss=0.2437, simple_loss=0.2964, pruned_loss=0.07069, ctc_loss=0.1239, over 3301312.60 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:15:55,984 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.58 vs. limit=22.5
2023-10-09 19:16:06,151 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2848496.0, ans=0.125
2023-10-09 19:16:11,372 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.452e+02 3.570e+02 3.954e+02 4.505e+02 1.083e+03, threshold=7.908e+02, percent-clipped=2.0
2023-10-09 19:16:13,701 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2848542.6666666665, ans=0.0
2023-10-09 19:16:37,335 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.13 vs. limit=22.5
2023-10-09 19:16:50,496 INFO [train.py:1031] (3/4) Epoch 14, batch 25700, loss[loss=0.2885, simple_loss=0.3255, pruned_loss=0.09318, ctc_loss=0.1629, over 16523.00 frames. ], tot_loss[loss=0.2503, simple_loss=0.3027, pruned_loss=0.07332, ctc_loss=0.1282, over 3302248.05 frames. ], batch size: 416, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:16:50,778 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2848682.6666666665, ans=0.2
2023-10-09 19:17:13,082 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2848776.0, ans=0.0
2023-10-09 19:17:42,730 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2848869.3333333335, ans=0.125
2023-10-09 19:17:51,107 INFO [train.py:1031] (3/4) Epoch 14, batch 25750, loss[loss=0.2049, simple_loss=0.2858, pruned_loss=0.04602, ctc_loss=0.07971, over 16878.00 frames. ], tot_loss[loss=0.2518, simple_loss=0.304, pruned_loss=0.07402, ctc_loss=0.1291, over 3296510.02 frames. ], batch size: 228, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:17:52,555 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2848916.0, ans=0.125
2023-10-09 19:18:00,006 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2848916.0, ans=0.125
2023-10-09 19:18:06,780 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2848962.6666666665, ans=0.1
2023-10-09 19:18:07,285 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.23 vs. limit=5.0
2023-10-09 19:18:10,857 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0
2023-10-09 19:18:12,255 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2848962.6666666665, ans=0.125
2023-10-09 19:18:14,797 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2848962.6666666665, ans=0.125
2023-10-09 19:18:17,304 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+02 3.581e+02 3.886e+02 4.426e+02 7.686e+02, threshold=7.772e+02, percent-clipped=0.0
2023-10-09 19:18:26,444 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2849009.3333333335, ans=0.1
2023-10-09 19:18:41,092 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0
2023-10-09 19:18:42,211 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.99 vs. limit=22.5
2023-10-09 19:18:50,459 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2849102.6666666665, ans=0.2
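All loss values in these lines are per-frame averages: loss[...] describes the current batch (roughly 12k-17k frames here), while tot_loss[...] aggregates over about 3.3M recent frames, which is why it moves slowly between batches. A sketch of frame-weighted accounting consistent with those fields; icefall's actual MetricsTracker differs in detail, and this class is illustrative only:

```python
# Frame-weighted running average consistent with the
# "tot_loss[..., over N frames.]" lines above (illustrative sketch).
class LossTracker(dict):
    def accumulate(self, loss_sums: dict, frames: float) -> None:
        self["frames"] = self.get("frames", 0.0) + frames
        for key, value in loss_sums.items():  # value = per-frame loss * frames
            self[key] = self.get(key, 0.0) + value

    def averages(self) -> dict:
        n = self["frames"]
        return {k: v / n for k, v in self.items() if k != "frames"}

tracker = LossTracker()
tracker.accumulate({"ctc_loss": 0.1064 * 16814.0}, frames=16814.0)
tracker.accumulate({"ctc_loss": 0.1053 * 16895.0}, frames=16895.0)
print(tracker.averages())  # per-frame ctc_loss over both batches
```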
], batch size: 70, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:19:47,781 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2849336.0, ans=0.125 2023-10-09 19:19:48,864 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2849336.0, ans=0.125 2023-10-09 19:19:59,396 INFO [train.py:1031] (3/4) Epoch 14, batch 25850, loss[loss=0.3051, simple_loss=0.3483, pruned_loss=0.09773, ctc_loss=0.1663, over 16663.00 frames. ], tot_loss[loss=0.2409, simple_loss=0.2993, pruned_loss=0.06757, ctc_loss=0.1185, over 3295488.33 frames. ], batch size: 384, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:20:15,463 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2849429.3333333335, ans=0.125 2023-10-09 19:20:19,758 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2849429.3333333335, ans=0.125 2023-10-09 19:20:24,801 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.590e+02 3.413e+02 3.966e+02 4.957e+02 9.645e+02, threshold=7.933e+02, percent-clipped=3.0 2023-10-09 19:20:57,558 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.29 vs. limit=15.0 2023-10-09 19:21:00,851 INFO [train.py:1031] (3/4) Epoch 14, batch 25900, loss[loss=0.1896, simple_loss=0.2301, pruned_loss=0.05529, ctc_loss=0.09648, over 16908.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.2953, pruned_loss=0.06582, ctc_loss=0.1145, over 3300248.46 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:21:03,924 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2849616.0, ans=0.125 2023-10-09 19:21:31,108 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=22.5 2023-10-09 19:21:32,929 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2849709.3333333335, ans=0.0 2023-10-09 19:21:37,634 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2849756.0, ans=0.125 2023-10-09 19:21:38,637 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2849756.0, ans=0.1 2023-10-09 19:21:38,757 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2849756.0, ans=0.125 2023-10-09 19:21:47,344 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2849756.0, ans=0.125 2023-10-09 19:21:53,153 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2849802.6666666665, ans=0.125 2023-10-09 19:22:01,802 INFO [train.py:1031] (3/4) Epoch 14, batch 25950, loss[loss=0.2028, simple_loss=0.263, pruned_loss=0.05447, ctc_loss=0.08446, over 16753.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2898, pruned_loss=0.06194, ctc_loss=0.1079, over 3288981.66 frames. 
], batch size: 102, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:22:28,774 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+02 2.824e+02 3.535e+02 4.166e+02 1.027e+03, threshold=7.071e+02, percent-clipped=2.0 2023-10-09 19:23:00,690 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=22.5 2023-10-09 19:23:02,050 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2850082.6666666665, ans=0.0 2023-10-09 19:23:02,807 INFO [train.py:1031] (3/4) Epoch 14, batch 26000, loss[loss=0.2536, simple_loss=0.2921, pruned_loss=0.07916, ctc_loss=0.1421, over 16914.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2869, pruned_loss=0.06269, ctc_loss=0.1092, over 3298022.54 frames. ], batch size: 242, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:23:17,587 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2850129.3333333335, ans=0.1 2023-10-09 19:23:48,340 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2850222.6666666665, ans=0.125 2023-10-09 19:24:04,553 INFO [train.py:1031] (3/4) Epoch 14, batch 26050, loss[loss=0.233, simple_loss=0.3041, pruned_loss=0.06091, ctc_loss=0.1003, over 16821.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2867, pruned_loss=0.0616, ctc_loss=0.1077, over 3298366.12 frames. ], batch size: 121, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:24:04,986 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2850316.0, ans=0.125 2023-10-09 19:24:05,892 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2850316.0, ans=0.125 2023-10-09 19:24:18,465 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.86 vs. limit=12.0 2023-10-09 19:24:19,622 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2023-10-09 19:24:31,163 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.011e+02 3.542e+02 4.270e+02 6.836e+02, threshold=7.085e+02, percent-clipped=0.0 2023-10-09 19:24:31,524 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:24:58,364 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2850502.6666666665, ans=0.2 2023-10-09 19:25:04,426 INFO [train.py:1031] (3/4) Epoch 14, batch 26100, loss[loss=0.2204, simple_loss=0.285, pruned_loss=0.05984, ctc_loss=0.09035, over 12215.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2896, pruned_loss=0.06152, ctc_loss=0.1063, over 3292083.10 frames. 
], batch size: 41, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:25:06,721 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2850549.3333333335, ans=0.125 2023-10-09 19:25:53,405 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2850736.0, ans=0.04949747468305833 2023-10-09 19:25:57,727 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2850736.0, ans=0.125 2023-10-09 19:26:06,160 INFO [train.py:1031] (3/4) Epoch 14, batch 26150, loss[loss=0.235, simple_loss=0.2825, pruned_loss=0.06959, ctc_loss=0.121, over 16730.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.293, pruned_loss=0.0642, ctc_loss=0.1108, over 3294987.03 frames. ], batch size: 151, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:26:09,638 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.31 vs. limit=15.0 2023-10-09 19:26:12,991 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2850782.6666666665, ans=0.125 2023-10-09 19:26:16,863 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2850782.6666666665, ans=10.0 2023-10-09 19:26:22,991 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=2850829.3333333335, ans=10.0 2023-10-09 19:26:26,772 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2850829.3333333335, ans=0.125 2023-10-09 19:26:36,016 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+02 3.218e+02 3.789e+02 4.435e+02 6.214e+02, threshold=7.579e+02, percent-clipped=0.0 2023-10-09 19:26:36,293 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2850876.0, ans=0.025 2023-10-09 19:26:38,693 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2023-10-09 19:27:07,873 INFO [train.py:1031] (3/4) Epoch 14, batch 26200, loss[loss=0.2049, simple_loss=0.2668, pruned_loss=0.05307, ctc_loss=0.09201, over 16949.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.291, pruned_loss=0.06475, ctc_loss=0.1115, over 3296840.64 frames. ], batch size: 293, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:27:13,184 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2851016.0, ans=0.04949747468305833 2023-10-09 19:27:27,033 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2851062.6666666665, ans=0.0 2023-10-09 19:27:51,258 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2023-10-09 19:28:09,474 INFO [train.py:1031] (3/4) Epoch 14, batch 26250, loss[loss=0.193, simple_loss=0.2765, pruned_loss=0.04051, ctc_loss=0.07106, over 16840.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2799, pruned_loss=0.06122, ctc_loss=0.1048, over 3289450.34 frames. 
2023-10-09 19:28:43,498 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 3.073e+02 4.030e+02 5.136e+02 8.779e+02, threshold=8.059e+02, percent-clipped=2.0
2023-10-09 19:28:51,469 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2851389.3333333335, ans=0.1
2023-10-09 19:29:03,482 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2851436.0, ans=0.125
2023-10-09 19:29:08,152 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.54 vs. limit=15.0
2023-10-09 19:29:11,519 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 19:29:13,859 INFO [train.py:1031] (3/4) Epoch 14, batch 26300, loss[loss=0.2513, simple_loss=0.3064, pruned_loss=0.072, ctc_loss=0.1305, over 16863.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2843, pruned_loss=0.06168, ctc_loss=0.1061, over 3294008.85 frames. ], batch size: 228, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:29:18,681 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2851482.6666666665, ans=0.0
2023-10-09 19:29:36,189 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2851529.3333333335, ans=0.125
2023-10-09 19:29:46,587 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2851576.0, ans=0.125
2023-10-09 19:29:51,594 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 19:29:54,768 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.83 vs. limit=22.5
2023-10-09 19:30:05,724 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2851669.3333333335, ans=0.0
2023-10-09 19:30:11,729 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2851669.3333333335, ans=0.1
2023-10-09 19:30:17,350 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2851716.0, ans=0.2
2023-10-09 19:30:17,430 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851716.0, ans=0.1
2023-10-09 19:30:18,138 INFO [train.py:1031] (3/4) Epoch 14, batch 26350, loss[loss=0.2655, simple_loss=0.338, pruned_loss=0.07274, ctc_loss=0.1188, over 16810.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2915, pruned_loss=0.06456, ctc_loss=0.1116, over 3297388.78 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 2.0
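
The scaling.py:199 ScheduledFloat lines record module hyper-parameters (dropout probabilities, skip rates, balancer limits) that are annealed as a function of the cumulative batch count; 'ans' is the value in effect at that batch_count. A toy schedule of the same flavor (the actual ScheduledFloat in icefall's scaling.py is more general; this piecewise-linear version is only illustrative):

    def scheduled_float(batch_count, points):
        """Piecewise-linear interpolation over sorted (batch_count, value)
        breakpoints, clamped at both ends -- e.g. a skip rate annealed
        from 0.5 down to 0.0 over the first N batches."""
        pts = sorted(points)
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

    # e.g. scheduled_float(2851716.0, [(0.0, 0.5), (20000.0, 0.0)]) -> 0.0,
    # matching the fully annealed skip rates (ans=0.0) seen above.
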
2023-10-09 19:30:18,492 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2851716.0, ans=0.1
2023-10-09 19:30:26,558 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851716.0, ans=0.1
2023-10-09 19:30:39,255 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2851762.6666666665, ans=0.1
2023-10-09 19:30:49,813 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+02 3.520e+02 4.150e+02 4.845e+02 1.370e+03, threshold=8.299e+02, percent-clipped=2.0
2023-10-09 19:31:03,215 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2851856.0, ans=0.125
2023-10-09 19:31:18,520 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2851902.6666666665, ans=0.95
2023-10-09 19:31:20,351 INFO [train.py:1031] (3/4) Epoch 14, batch 26400, loss[loss=0.2003, simple_loss=0.2764, pruned_loss=0.04545, ctc_loss=0.08329, over 16758.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.2934, pruned_loss=0.06516, ctc_loss=0.1133, over 3292523.91 frames. ], batch size: 188, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:32:24,405 INFO [train.py:1031] (3/4) Epoch 14, batch 26450, loss[loss=0.2358, simple_loss=0.3232, pruned_loss=0.05568, ctc_loss=0.09256, over 16269.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2889, pruned_loss=0.0625, ctc_loss=0.1091, over 3292241.57 frames. ], batch size: 463, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:32:38,957 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2852229.3333333335, ans=0.0
2023-10-09 19:32:52,233 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.14 vs. limit=6.0
2023-10-09 19:32:54,662 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2852276.0, ans=0.125
2023-10-09 19:32:58,049 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+02 3.048e+02 3.586e+02 4.298e+02 7.757e+02, threshold=7.171e+02, percent-clipped=0.0
2023-10-09 19:33:04,187 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2852322.6666666665, ans=0.125
2023-10-09 19:33:11,583 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2852322.6666666665, ans=0.0
2023-10-09 19:33:28,751 INFO [train.py:1031] (3/4) Epoch 14, batch 26500, loss[loss=0.2362, simple_loss=0.2961, pruned_loss=0.06553, ctc_loss=0.1132, over 16858.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.2919, pruned_loss=0.06356, ctc_loss=0.1104, over 3294918.20 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 8.0
2023-10-09 19:33:31,156 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2852416.0, ans=0.125
2023-10-09 19:34:12,661 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2852556.0, ans=0.0
2023-10-09 19:34:15,944 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2852556.0, ans=0.1
2023-10-09 19:34:20,446 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2852602.6666666665, ans=10.0
2023-10-09 19:34:26,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2852602.6666666665, ans=0.0
2023-10-09 19:34:30,318 INFO [train.py:1031] (3/4) Epoch 14, batch 26550, loss[loss=0.2598, simple_loss=0.3033, pruned_loss=0.08166, ctc_loss=0.1326, over 16704.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.296, pruned_loss=0.0666, ctc_loss=0.1156, over 3303967.16 frames. ], batch size: 111, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:34:35,222 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2023-10-09 19:35:06,512 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+02 3.537e+02 4.198e+02 5.222e+02 9.143e+02, threshold=8.395e+02, percent-clipped=3.0
2023-10-09 19:35:20,464 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2852836.0, ans=0.125
2023-10-09 19:35:25,496 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2852836.0, ans=0.125
2023-10-09 19:35:32,304 INFO [train.py:1031] (3/4) Epoch 14, batch 26600, loss[loss=0.2147, simple_loss=0.297, pruned_loss=0.04894, ctc_loss=0.0862, over 16771.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2988, pruned_loss=0.06589, ctc_loss=0.1148, over 3310283.53 frames. ], batch size: 188, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:35:33,137 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=22.5
2023-10-09 19:35:52,663 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=12.0
2023-10-09 19:35:54,584 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2852929.3333333335, ans=0.2
2023-10-09 19:35:55,610 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2852976.0, ans=0.95
2023-10-09 19:36:08,039 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2853022.6666666665, ans=0.125
2023-10-09 19:36:26,599 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2853069.3333333335, ans=0.125
2023-10-09 19:36:34,511 INFO [train.py:1031] (3/4) Epoch 14, batch 26650, loss[loss=0.2045, simple_loss=0.2913, pruned_loss=0.04291, ctc_loss=0.07991, over 16853.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2965, pruned_loss=0.06235, ctc_loss=0.1098, over 3292865.85 frames. ], batch size: 309, lr: 2.54e-03, grad_scale: 1.0
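
The scaling.py:979 Whitening lines fire when a Whiten module's feature-covariance metric is checked against its limit (e.g. metric=10.71 vs. limit=22.5): the metric is 1.0 for perfectly "white" features, whose covariance is proportional to the identity, and grows as the covariance spectrum becomes lopsided. One metric with exactly these properties, sketched below, though not necessarily the exact formula icefall's scaling.py uses:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns C * sum(eig^2) / sum(eig)^2
        of the feature covariance -- 1.0 iff all eigenvalues are equal
        (white), approaching C when one direction dominates."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        c = cov.shape[0]
        # For symmetric cov, (cov * cov).sum() == sum of squared eigenvalues.
        return c * (cov * cov).sum() / cov.trace() ** 2
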
2023-10-09 19:36:49,614 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2853162.6666666665, ans=0.0
2023-10-09 19:36:54,453 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2853162.6666666665, ans=0.125
2023-10-09 19:37:01,831 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 19:37:03,875 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2853209.3333333335, ans=0.1
2023-10-09 19:37:10,949 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 3.027e+02 3.490e+02 4.414e+02 7.979e+02, threshold=6.980e+02, percent-clipped=0.0
2023-10-09 19:37:22,530 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2853302.6666666665, ans=0.0
2023-10-09 19:37:28,271 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0
2023-10-09 19:37:35,113 INFO [train.py:1031] (3/4) Epoch 14, batch 26700, loss[loss=0.2209, simple_loss=0.2653, pruned_loss=0.06619, ctc_loss=0.1101, over 17147.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.291, pruned_loss=0.06008, ctc_loss=0.1067, over 3298989.02 frames. ], batch size: 91, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:37:47,830 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2853396.0, ans=0.125
2023-10-09 19:38:08,359 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2853442.6666666665, ans=0.125
2023-10-09 19:38:15,270 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2853489.3333333335, ans=0.2
2023-10-09 19:38:36,852 INFO [train.py:1031] (3/4) Epoch 14, batch 26750, loss[loss=0.2059, simple_loss=0.2662, pruned_loss=0.05455, ctc_loss=0.09115, over 16727.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2833, pruned_loss=0.05983, ctc_loss=0.1057, over 3309258.80 frames. ], batch size: 188, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:38:47,897 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2853629.3333333335, ans=0.0
2023-10-09 19:39:14,601 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.317e+02 3.221e+02 3.735e+02 4.264e+02 6.455e+02, threshold=7.471e+02, percent-clipped=0.0
2023-10-09 19:39:38,950 INFO [train.py:1031] (3/4) Epoch 14, batch 26800, loss[loss=0.2567, simple_loss=0.2945, pruned_loss=0.07958, ctc_loss=0.1492, over 16792.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2792, pruned_loss=0.0596, ctc_loss=0.1053, over 3313973.39 frames. ], batch size: 329, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:39:39,795 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.10 vs. limit=15.0
2023-10-09 19:39:47,261 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2853816.0, ans=0.0
2023-10-09 19:39:59,677 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2853862.6666666665, ans=0.2
2023-10-09 19:40:07,996 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2853909.3333333335, ans=0.125
2023-10-09 19:40:23,614 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2853956.0, ans=0.0
2023-10-09 19:40:33,255 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=15.0
2023-10-09 19:40:40,180 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2854002.6666666665, ans=0.2
2023-10-09 19:40:41,947 INFO [train.py:1031] (3/4) Epoch 14, batch 26850, loss[loss=0.2268, simple_loss=0.28, pruned_loss=0.06402, ctc_loss=0.114, over 15230.00 frames. ], tot_loss[loss=0.2257, simple_loss=0.2828, pruned_loss=0.06232, ctc_loss=0.1097, over 3301975.26 frames. ], batch size: 527, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:40:58,720 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.40 vs. limit=15.0
2023-10-09 19:40:59,398 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2854096.0, ans=0.0
2023-10-09 19:41:21,409 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+02 3.526e+02 4.022e+02 4.797e+02 9.323e+02, threshold=8.043e+02, percent-clipped=3.0
2023-10-09 19:41:29,115 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2854189.3333333335, ans=0.1
2023-10-09 19:41:39,417 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2854236.0, ans=0.125
2023-10-09 19:41:45,196 INFO [train.py:1031] (3/4) Epoch 14, batch 26900, loss[loss=0.239, simple_loss=0.3106, pruned_loss=0.06145, ctc_loss=0.1114, over 16908.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2872, pruned_loss=0.06232, ctc_loss=0.1101, over 3289191.75 frames. ], batch size: 258, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:41:51,916 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2854282.6666666665, ans=0.125
2023-10-09 19:42:13,049 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=15.0
2023-10-09 19:42:32,960 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2854422.6666666665, ans=0.1
2023-10-09 19:42:37,014 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0
2023-10-09 19:42:38,174 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2854469.3333333335, ans=0.125
2023-10-09 19:42:47,764 INFO [train.py:1031] (3/4) Epoch 14, batch 26950, loss[loss=0.2227, simple_loss=0.2723, pruned_loss=0.06292, ctc_loss=0.1181, over 16205.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2858, pruned_loss=0.06108, ctc_loss=0.1081, over 3278594.30 frames. ], batch size: 466, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:42:51,810 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2854516.0, ans=0.0
2023-10-09 19:43:10,106 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0
2023-10-09 19:43:26,397 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.573e+02 3.110e+02 3.559e+02 4.212e+02 9.939e+02, threshold=7.118e+02, percent-clipped=2.0
2023-10-09 19:43:48,324 INFO [train.py:1031] (3/4) Epoch 14, batch 27000, loss[loss=0.2447, simple_loss=0.2656, pruned_loss=0.08328, ctc_loss=0.1432, over 16605.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2796, pruned_loss=0.06124, ctc_loss=0.1076, over 3273031.26 frames. ], batch size: 386, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:43:48,324 INFO [train.py:1054] (3/4) Computing validation loss
2023-10-09 19:44:06,710 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2336, simple_loss=0.3018, pruned_loss=0.06376, ctc_loss=0.09459, over 1796401.00 frames.
2023-10-09 19:44:06,715 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB
2023-10-09 19:44:08,839 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2854749.3333333335, ans=0.2
2023-10-09 19:44:43,746 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2854889.3333333335, ans=0.0
2023-10-09 19:45:06,467 INFO [train.py:1031] (3/4) Epoch 14, batch 27050, loss[loss=0.1856, simple_loss=0.2468, pruned_loss=0.04735, ctc_loss=0.07427, over 16887.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2744, pruned_loss=0.05971, ctc_loss=0.1044, over 3277655.65 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:45:08,075 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0
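
At batch 27000 above the trainer pauses to compute a validation loss over the held-out set (~1.8M frames here) and then reports the peak CUDA memory. A hedged sketch of that step (the loop and the model interface are illustrative; icefall's train.py structures this differently across its compute_validation_loss helper):

    import torch

    def validate(model, valid_loader, device):
        """Frame-weighted average validation loss plus peak CUDA memory (MB)."""
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = model(batch)  # assumed: returns (loss, frames)
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        return tot_loss / tot_frames, mem_mb
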
2023-10-09 19:45:15,471 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2854982.6666666665, ans=0.1
2023-10-09 19:45:25,391 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2855029.3333333335, ans=0.0
2023-10-09 19:45:41,672 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2855122.6666666665, ans=0.0
2023-10-09 19:45:44,970 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.824e+02 3.206e+02 4.209e+02 1.336e+03, threshold=6.413e+02, percent-clipped=5.0
2023-10-09 19:45:57,537 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2855169.3333333335, ans=0.0
2023-10-09 19:46:05,131 INFO [train.py:1031] (3/4) Epoch 14, batch 27100, loss[loss=0.1977, simple_loss=0.251, pruned_loss=0.05429, ctc_loss=0.08959, over 16859.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.27, pruned_loss=0.05816, ctc_loss=0.1011, over 3294242.39 frames. ], batch size: 164, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:46:15,471 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2855262.6666666665, ans=0.125
2023-10-09 19:46:33,971 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2855309.3333333335, ans=0.1
2023-10-09 19:46:38,119 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2855309.3333333335, ans=0.2
2023-10-09 19:46:46,252 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2855356.0, ans=0.125
2023-10-09 19:46:48,304 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2855356.0, ans=0.5
2023-10-09 19:46:50,761 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.37 vs. limit=15.0
2023-10-09 19:47:04,172 INFO [train.py:1031] (3/4) Epoch 14, batch 27150, loss[loss=0.2242, simple_loss=0.2787, pruned_loss=0.06437, ctc_loss=0.1021, over 16800.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2703, pruned_loss=0.05943, ctc_loss=0.1031, over 3297642.83 frames. ], batch size: 164, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:47:13,964 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0
2023-10-09 19:47:20,601 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2855496.0, ans=0.125
2023-10-09 19:47:43,340 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2855589.3333333335, ans=0.1
2023-10-09 19:47:46,371 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.370e+02 3.038e+02 3.523e+02 4.275e+02 1.319e+03, threshold=7.047e+02, percent-clipped=7.0
2023-10-09 19:47:48,340 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2855589.3333333335, ans=0.0
2023-10-09 19:48:02,061 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2855636.0, ans=0.0
2023-10-09 19:48:02,438 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0
2023-10-09 19:48:03,360 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0
2023-10-09 19:48:05,720 INFO [train.py:1031] (3/4) Epoch 14, batch 27200, loss[loss=0.3157, simple_loss=0.3806, pruned_loss=0.08974, ctc_loss=0.1781, over 16600.00 frames. ], tot_loss[loss=0.222, simple_loss=0.2806, pruned_loss=0.06052, ctc_loss=0.1057, over 3302868.21 frames. ], batch size: 351, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:48:53,591 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2855869.3333333335, ans=0.0
2023-10-09 19:49:06,309 INFO [train.py:1031] (3/4) Epoch 14, batch 27250, loss[loss=0.2429, simple_loss=0.2808, pruned_loss=0.07608, ctc_loss=0.1322, over 16728.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2826, pruned_loss=0.06042, ctc_loss=0.1057, over 3297457.38 frames. ], batch size: 328, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:49:30,838 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2855962.6666666665, ans=0.0
2023-10-09 19:49:35,857 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0
2023-10-09 19:49:49,641 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2856056.0, ans=0.07
2023-10-09 19:49:51,927 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 3.213e+02 3.949e+02 4.744e+02 1.249e+03, threshold=7.899e+02, percent-clipped=6.0
2023-10-09 19:50:10,457 INFO [train.py:1031] (3/4) Epoch 14, batch 27300, loss[loss=0.2165, simple_loss=0.284, pruned_loss=0.05398, ctc_loss=0.1026, over 16749.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2775, pruned_loss=0.06008, ctc_loss=0.1052, over 3292444.59 frames. ], batch size: 328, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:50:26,383 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.26 vs. limit=10.0
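
The lr: 2.54e-03 printed in every train.py:1031 record barely moves through this section because the Eden scheduler changes only slowly this deep into training. Eden (from icefall's optim.py) decays the learning rate in both the batch and epoch dimensions; with base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the config at the top of the log, the formula below (a sketch with warmup omitted) reproduces the logged value to within rounding:

    def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
        """Eden schedule as used by icefall's Zipformer recipes (warmup omitted)."""
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor
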
2023-10-09 19:50:33,039 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2856196.0, ans=0.2
2023-10-09 19:50:39,043 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2856242.6666666665, ans=0.1
2023-10-09 19:50:59,985 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2856336.0, ans=0.125
2023-10-09 19:51:07,636 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2856336.0, ans=0.015
2023-10-09 19:51:11,942 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0
2023-10-09 19:51:13,361 INFO [train.py:1031] (3/4) Epoch 14, batch 27350, loss[loss=0.2028, simple_loss=0.2765, pruned_loss=0.04745, ctc_loss=0.08551, over 16846.00 frames. ], tot_loss[loss=0.2161, simple_loss=0.276, pruned_loss=0.05773, ctc_loss=0.1016, over 3272816.26 frames. ], batch size: 243, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:51:18,543 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2856382.6666666665, ans=0.0
2023-10-09 19:51:31,233 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2856429.3333333335, ans=0.125
2023-10-09 19:51:33,994 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2856429.3333333335, ans=0.1
2023-10-09 19:51:49,713 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2856522.6666666665, ans=0.5
2023-10-09 19:51:52,885 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2856522.6666666665, ans=0.125
2023-10-09 19:51:58,974 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.714e+02 3.156e+02 4.138e+02 1.229e+03, threshold=6.312e+02, percent-clipped=2.0
2023-10-09 19:51:59,276 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2856522.6666666665, ans=0.1
2023-10-09 19:52:01,860 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0
2023-10-09 19:52:07,942 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0
2023-10-09 19:52:11,627 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.44 vs. limit=15.0
2023-10-09 19:52:14,607 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2856616.0, ans=0.95
2023-10-09 19:52:15,404 INFO [train.py:1031] (3/4) Epoch 14, batch 27400, loss[loss=0.1917, simple_loss=0.2464, pruned_loss=0.05059, ctc_loss=0.08937, over 16688.00 frames. ], tot_loss[loss=0.2105, simple_loss=0.2726, pruned_loss=0.05482, ctc_loss=0.09683, over 3266823.40 frames. ], batch size: 140, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:52:33,123 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2856662.6666666665, ans=0.05
2023-10-09 19:52:35,715 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2856662.6666666665, ans=0.1
2023-10-09 19:52:38,899 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 19:52:48,592 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2856709.3333333335, ans=0.125
2023-10-09 19:53:14,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2856849.3333333335, ans=0.0
2023-10-09 19:53:15,197 INFO [train.py:1031] (3/4) Epoch 14, batch 27450, loss[loss=0.2469, simple_loss=0.2856, pruned_loss=0.07833, ctc_loss=0.1289, over 16266.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2672, pruned_loss=0.05473, ctc_loss=0.09671, over 3263633.55 frames. ], batch size: 71, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:53:24,149 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. limit=10.0
2023-10-09 19:53:41,223 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2856942.6666666665, ans=0.125
2023-10-09 19:53:44,394 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=22.5
2023-10-09 19:54:00,072 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.825e+02 3.198e+02 4.029e+02 6.832e+02, threshold=6.397e+02, percent-clipped=4.0
2023-10-09 19:54:15,416 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=2857082.6666666665, ans=0.5
2023-10-09 19:54:16,220 INFO [train.py:1031] (3/4) Epoch 14, batch 27500, loss[loss=0.1833, simple_loss=0.2349, pruned_loss=0.05037, ctc_loss=0.07734, over 16681.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2663, pruned_loss=0.0541, ctc_loss=0.09577, over 3269094.72 frames. ], batch size: 102, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:54:34,902 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0
2023-10-09 19:54:40,285 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2857176.0, ans=0.125
2023-10-09 19:55:10,614 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2857269.3333333335, ans=0.125
2023-10-09 19:55:15,374 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2857269.3333333335, ans=0.125
2023-10-09 19:55:17,199 INFO [train.py:1031] (3/4) Epoch 14, batch 27550, loss[loss=0.2035, simple_loss=0.2553, pruned_loss=0.05625, ctc_loss=0.09809, over 16841.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.2655, pruned_loss=0.05496, ctc_loss=0.09706, over 3282926.83 frames. ], batch size: 243, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:55:19,901 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=22.5
2023-10-09 19:55:38,294 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2857362.6666666665, ans=0.125
2023-10-09 19:55:47,862 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2857409.3333333335, ans=0.125
2023-10-09 19:55:52,786 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2857409.3333333335, ans=0.125
2023-10-09 19:56:06,687 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.165e+02 3.747e+02 4.293e+02 1.170e+03, threshold=7.493e+02, percent-clipped=3.0
2023-10-09 19:56:15,394 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=22.5
2023-10-09 19:56:20,206 INFO [train.py:1031] (3/4) Epoch 14, batch 27600, loss[loss=0.2354, simple_loss=0.2974, pruned_loss=0.06204, ctc_loss=0.1233, over 16836.00 frames. ], tot_loss[loss=0.2086, simple_loss=0.2658, pruned_loss=0.05592, ctc_loss=0.09869, over 3293355.38 frames. ], batch size: 329, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:56:40,711 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2857596.0, ans=0.0
2023-10-09 19:56:43,469 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2857596.0, ans=0.0
2023-10-09 19:56:56,664 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2857689.3333333335, ans=0.1
2023-10-09 19:57:02,682 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2857689.3333333335, ans=0.1
2023-10-09 19:57:16,110 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2857736.0, ans=0.0
2023-10-09 19:57:22,127 INFO [train.py:1031] (3/4) Epoch 14, batch 27650, loss[loss=0.1945, simple_loss=0.2653, pruned_loss=0.045, ctc_loss=0.08416, over 16928.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.2694, pruned_loss=0.05672, ctc_loss=0.1003, over 3296233.37 frames. ], batch size: 259, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:57:22,540 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2857782.6666666665, ans=0.125
2023-10-09 19:57:53,578 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2857876.0, ans=0.0
2023-10-09 19:57:58,079 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2857876.0, ans=0.1
2023-10-09 19:58:10,217 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+02 3.178e+02 3.688e+02 4.513e+02 1.131e+03, threshold=7.375e+02, percent-clipped=1.0
2023-10-09 19:58:24,464 INFO [train.py:1031] (3/4) Epoch 14, batch 27700, loss[loss=0.2832, simple_loss=0.2929, pruned_loss=0.1001, ctc_loss=0.1832, over 16613.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2681, pruned_loss=0.05802, ctc_loss=0.1025, over 3306176.26 frames. ], batch size: 386, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:58:45,671 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2858062.6666666665, ans=0.125
2023-10-09 19:59:10,111 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2858156.0, ans=0.09899494936611666
2023-10-09 19:59:12,697 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2858202.6666666665, ans=0.1
2023-10-09 19:59:13,708 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2858202.6666666665, ans=0.0
2023-10-09 19:59:24,059 INFO [train.py:1031] (3/4) Epoch 14, batch 27750, loss[loss=0.2124, simple_loss=0.2673, pruned_loss=0.05939, ctc_loss=0.09671, over 16901.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2664, pruned_loss=0.05923, ctc_loss=0.1042, over 3313345.47 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:59:31,238 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2858249.3333333335, ans=0.035
2023-10-09 19:59:53,971 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2858342.6666666665, ans=0.2
2023-10-09 20:00:04,938 WARNING [train.py:1204] (3/4) Exclude cut with ID R0014_M0086-0174-157 from training. Number of frames (before subsampling): 147. Number of frames (after subsampling): 35. Text: 你买多少东西一会儿他就送你这么多东西啊啊三大桶那三大桶得用多少时间就啊. Tokens: ['▁你', '买', '多', '少', '东', '西', '一', '会', '儿', '他', '就', '送', '你', '这', '么', '多', '东', '西', '啊', '啊', '三', '大', '<0xE6>', '<0xA1>', '<0xB6>', '那', '三', '大', '<0xE6>', '<0xA1>', '<0xB6>', '得', '用', '多', '少', '时', '间', '就', '啊']. Number of tokens: 39
2023-10-09 20:00:14,001 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+02 3.399e+02 3.890e+02 4.499e+02 8.877e+02, threshold=7.779e+02, percent-clipped=2.0
2023-10-09 20:00:15,811 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0
2023-10-09 20:00:24,187 INFO [train.py:1031] (3/4) Epoch 14, batch 27800, loss[loss=0.2266, simple_loss=0.2855, pruned_loss=0.06155, ctc_loss=0.1114, over 16947.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.2674, pruned_loss=0.06041, ctc_loss=0.1059, over 3314079.04 frames. ], batch size: 243, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:00:29,818 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2858482.6666666665, ans=0.0
2023-10-09 20:00:30,923 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 20:00:31,431 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0
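
The WARNING above shows the data filter at work: 147 input frames shrink to 35 after the roughly 4x encoder subsampling, which is fewer than the 39 BPE tokens, so the CTC and transducer losses would be undefined and the cut is dropped. A sketch of such a predicate (the subsampled-length formula below reproduces 147 -> 35 but is an assumption about the recipe's exact arithmetic):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Drop utterances whose subsampled length cannot cover the tokens."""
        t = ((num_frames - 7) // 2 + 1) // 2  # approx. frames after ~4x subsampling
        return t >= num_tokens                # 147 -> 35 frames < 39 tokens: excluded
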
2023-10-09 20:00:47,317 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2858529.3333333335, ans=0.0
2023-10-09 20:01:07,496 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2858622.6666666665, ans=22.5
2023-10-09 20:01:18,214 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.85 vs. limit=10.0
2023-10-09 20:01:25,409 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2858669.3333333335, ans=0.125
2023-10-09 20:01:27,853 INFO [train.py:1031] (3/4) Epoch 14, batch 27850, loss[loss=0.238, simple_loss=0.2951, pruned_loss=0.06727, ctc_loss=0.1161, over 16813.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2777, pruned_loss=0.0642, ctc_loss=0.1135, over 3307621.43 frames. ], batch size: 188, lr: 2.54e-03, grad_scale: 1.0
2023-10-09 20:01:34,086 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2858716.0, ans=0.125
2023-10-09 20:01:38,286 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2858762.6666666665, ans=0.0
2023-10-09 20:01:40,795 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2858762.6666666665, ans=0.2
2023-10-09 20:01:43,576 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2858762.6666666665, ans=0.125
2023-10-09 20:01:44,003 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.45 vs. limit=22.5
2023-10-09 20:02:04,346 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5
2023-10-09 20:02:10,972 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2858856.0, ans=0.1
2023-10-09 20:02:18,205 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.539e+02 3.601e+02 4.394e+02 5.369e+02 1.444e+03, threshold=8.787e+02, percent-clipped=3.0
2023-10-09 20:02:24,395 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2858902.6666666665, ans=0.125
2023-10-09 20:02:24,444 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2858902.6666666665, ans=0.0
2023-10-09 20:02:27,388 INFO [train.py:1031] (3/4) Epoch 14, batch 27900, loss[loss=0.165, simple_loss=0.2287, pruned_loss=0.03728, ctc_loss=0.06668, over 16867.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2827, pruned_loss=0.06332, ctc_loss=0.1136, over 3304898.62 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:02:30,540 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2858949.3333333335, ans=0.0
2023-10-09 20:02:41,066 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0
2023-10-09 20:02:42,744 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2858996.0, ans=0.125
2023-10-09 20:02:52,005 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2859042.6666666665, ans=0.125
2023-10-09 20:03:09,855 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2859089.3333333335, ans=0.125
2023-10-09 20:03:11,862 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2859089.3333333335, ans=0.125
2023-10-09 20:03:29,820 INFO [train.py:1031] (3/4) Epoch 14, batch 27950, loss[loss=0.1671, simple_loss=0.2545, pruned_loss=0.02903, ctc_loss=0.05388, over 16917.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2796, pruned_loss=0.05832, ctc_loss=0.1057, over 3305893.34 frames. ], batch size: 229, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:04:20,566 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 20:04:21,887 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.805e+02 3.200e+02 4.012e+02 8.186e+02, threshold=6.399e+02, percent-clipped=0.0
2023-10-09 20:04:31,539 INFO [train.py:1031] (3/4) Epoch 14, batch 28000, loss[loss=0.1949, simple_loss=0.2418, pruned_loss=0.055, ctc_loss=0.0948, over 16773.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2735, pruned_loss=0.05649, ctc_loss=0.1023, over 3303648.11 frames. ], batch size: 188, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:04:38,454 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0
2023-10-09 20:04:39,915 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2859416.0, ans=0.0
2023-10-09 20:05:03,953 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2859509.3333333335, ans=0.0
2023-10-09 20:05:16,984 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2859556.0, ans=0.2
2023-10-09 20:05:19,055 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2859556.0, ans=0.125
2023-10-09 20:05:23,078 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0
2023-10-09 20:05:23,351 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0
2023-10-09 20:05:24,360 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.43 vs. limit=10.0
2023-10-09 20:05:33,909 INFO [train.py:1031] (3/4) Epoch 14, batch 28050, loss[loss=0.2184, simple_loss=0.2391, pruned_loss=0.07292, ctc_loss=0.1293, over 15369.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2695, pruned_loss=0.05746, ctc_loss=0.1036, over 3308086.93 frames. ], batch size: 526, lr: 2.54e-03, grad_scale: 2.0
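
grad_scale in the train.py:1031 records is the fp16 loss-scaling factor (use_fp16=True in the config): it grows when gradients stay finite for a stretch of batches (2.0 -> 4.0 at batch 28000 above) and is cut back on overflow, which is why it wanders between 1.0 and 8.0 through this section. Roughly the standard dynamic-scaling loop, sketched with torch's GradScaler (icefall wraps this with its own bookkeeping):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)        # assumed: the model returns the loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)         # skips the update if grads overflowed
        scaler.update()                # grows or shrinks the scale factor
        return loss.detach()
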
2023-10-09 20:05:59,517 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2859742.6666666665, ans=0.025
2023-10-09 20:06:09,042 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2859789.3333333335, ans=0.0
2023-10-09 20:06:20,707 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2859789.3333333335, ans=0.1
2023-10-09 20:06:20,776 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2859789.3333333335, ans=0.0
2023-10-09 20:06:25,713 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.244e+02 3.661e+02 4.395e+02 6.655e+02, threshold=7.321e+02, percent-clipped=2.0
2023-10-09 20:06:34,744 INFO [train.py:1031] (3/4) Epoch 14, batch 28100, loss[loss=0.2181, simple_loss=0.277, pruned_loss=0.05795, ctc_loss=0.1085, over 16525.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.2697, pruned_loss=0.05929, ctc_loss=0.1059, over 3315334.40 frames. ], batch size: 466, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:07:39,092 INFO [train.py:1031] (3/4) Epoch 14, batch 28150, loss[loss=0.2374, simple_loss=0.2655, pruned_loss=0.07972, ctc_loss=0.1247, over 10955.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2773, pruned_loss=0.06014, ctc_loss=0.1077, over 3307444.27 frames. ], batch size: 36, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:07:39,449 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2860116.0, ans=0.0
2023-10-09 20:07:59,251 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2860162.6666666665, ans=0.0
2023-10-09 20:08:02,956 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2860209.3333333335, ans=0.1
2023-10-09 20:08:20,698 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2860256.0, ans=0.125
2023-10-09 20:08:30,333 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.63 vs. limit=22.5
2023-10-09 20:08:34,457 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.258e+02 3.630e+02 4.315e+02 7.484e+02, threshold=7.260e+02, percent-clipped=1.0
2023-10-09 20:08:41,523 INFO [train.py:1031] (3/4) Epoch 14, batch 28200, loss[loss=0.2851, simple_loss=0.3368, pruned_loss=0.08697, ctc_loss=0.1485, over 16830.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2868, pruned_loss=0.06338, ctc_loss=0.113, over 3309769.06 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:08:50,828 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2860349.3333333335, ans=0.1
2023-10-09 20:09:30,693 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2860536.0, ans=0.125
2023-10-09 20:09:43,227 INFO [train.py:1031] (3/4) Epoch 14, batch 28250, loss[loss=0.2561, simple_loss=0.3, pruned_loss=0.07894, ctc_loss=0.1359, over 16805.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2882, pruned_loss=0.06537, ctc_loss=0.1159, over 3316294.63 frames. ], batch size: 272, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:09:54,699 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=2860629.3333333335, ans=0.02
2023-10-09 20:09:58,979 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2860629.3333333335, ans=0.125
2023-10-09 20:10:01,670 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2860629.3333333335, ans=0.0
2023-10-09 20:10:07,461 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2860676.0, ans=0.125
2023-10-09 20:10:23,039 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.86 vs. limit=22.5
2023-10-09 20:10:41,217 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+02 3.506e+02 4.003e+02 4.873e+02 1.007e+03, threshold=8.006e+02, percent-clipped=4.0
2023-10-09 20:10:43,114 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2860769.3333333335, ans=0.0
2023-10-09 20:10:46,123 INFO [train.py:1031] (3/4) Epoch 14, batch 28300, loss[loss=0.235, simple_loss=0.2882, pruned_loss=0.0694, ctc_loss=0.1074, over 16885.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.287, pruned_loss=0.06645, ctc_loss=0.1167, over 3317716.94 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:11:11,366 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2860909.3333333335, ans=0.0
2023-10-09 20:11:12,353 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2860909.3333333335, ans=0.0
2023-10-09 20:11:13,854 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2860909.3333333335, ans=0.1
2023-10-09 20:11:38,854 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2861002.6666666665, ans=0.025
2023-10-09 20:11:47,955 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0
2023-10-09 20:11:48,190 INFO [train.py:1031] (3/4) Epoch 14, batch 28350, loss[loss=0.198, simple_loss=0.2436, pruned_loss=0.05646, ctc_loss=0.0989, over 16681.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2819, pruned_loss=0.06615, ctc_loss=0.1163, over 3313784.96 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:12:00,501 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2861096.0, ans=0.0
2023-10-09 20:12:20,245 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2861142.6666666665, ans=0.2
2023-10-09 20:12:29,494 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-10-09 20:12:33,376 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2861189.3333333335, ans=0.125
2023-10-09 20:12:46,030 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.563e+02 3.327e+02 3.829e+02 4.439e+02 7.732e+02, threshold=7.659e+02, percent-clipped=0.0
2023-10-09 20:12:48,537 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2861236.0, ans=0.1
2023-10-09 20:12:49,829 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.57 vs. limit=10.0
2023-10-09 20:12:50,328 INFO [train.py:1031] (3/4) Epoch 14, batch 28400, loss[loss=0.23, simple_loss=0.2978, pruned_loss=0.05867, ctc_loss=0.1122, over 16864.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2864, pruned_loss=0.06691, ctc_loss=0.1181, over 3314957.81 frames. ], batch size: 215, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:13:55,089 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2861469.3333333335, ans=0.0
2023-10-09 20:13:56,934 INFO [train.py:1031] (3/4) Epoch 14, batch 28450, loss[loss=0.2471, simple_loss=0.3224, pruned_loss=0.06274, ctc_loss=0.1157, over 16905.00 frames. ], tot_loss[loss=0.2381, simple_loss=0.2946, pruned_loss=0.06701, ctc_loss=0.1188, over 3312155.61 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:14:07,591 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0
2023-10-09 20:14:10,174 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2861562.6666666665, ans=0.125
2023-10-09 20:14:17,867 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2861562.6666666665, ans=0.125
2023-10-09 20:14:19,611 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2861562.6666666665, ans=0.125
2023-10-09 20:14:34,068 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2861609.3333333335, ans=0.125
2023-10-09 20:14:35,127 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2861656.0, ans=0.125
2023-10-09 20:14:43,265 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2861656.0, ans=0.0
2023-10-09 20:14:44,192 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2861656.0, ans=0.125
2023-10-09 20:14:49,696 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=22.5
2023-10-09 20:14:58,087 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.469e+02 3.582e+02 4.557e+02 5.514e+02 1.079e+03, threshold=9.115e+02, percent-clipped=9.0
2023-10-09 20:15:00,626 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2861749.3333333335, ans=0.125
2023-10-09 20:15:01,353 INFO [train.py:1031] (3/4) Epoch 14, batch 28500, loss[loss=0.2321, simple_loss=0.3379, pruned_loss=0.04632, ctc_loss=0.08392, over 16278.00 frames. ], tot_loss[loss=0.2408, simple_loss=0.3015, pruned_loss=0.06647, ctc_loss=0.1181, over 3309276.19 frames. ], batch size: 463, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:15:03,764 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2861749.3333333335, ans=0.125
2023-10-09 20:15:05,266 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0
2023-10-09 20:15:47,161 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2861889.3333333335, ans=10.0
2023-10-09 20:15:48,244 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2861889.3333333335, ans=0.125
2023-10-09 20:16:03,058 INFO [train.py:1031] (3/4) Epoch 14, batch 28550, loss[loss=0.1761, simple_loss=0.2568, pruned_loss=0.03566, ctc_loss=0.06016, over 16695.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2959, pruned_loss=0.06071, ctc_loss=0.1083, over 3301752.81 frames. ], batch size: 140, lr: 2.54e-03, grad_scale: 4.0
], batch size: 140, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:16:04,707 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=2861982.6666666665, ans=15.0 2023-10-09 20:16:05,566 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2861982.6666666665, ans=0.125 2023-10-09 20:16:20,552 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2862029.3333333335, ans=0.2 2023-10-09 20:16:39,302 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2862122.6666666665, ans=0.125 2023-10-09 20:16:41,229 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=2862122.6666666665, ans=15.0 2023-10-09 20:16:41,806 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2862122.6666666665, ans=0.015 2023-10-09 20:17:00,496 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.790e+02 3.333e+02 3.902e+02 5.980e+02, threshold=6.666e+02, percent-clipped=0.0 2023-10-09 20:17:03,188 INFO [train.py:1031] (3/4) Epoch 14, batch 28600, loss[loss=0.2156, simple_loss=0.2705, pruned_loss=0.0597, ctc_loss=0.1032, over 17122.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2914, pruned_loss=0.05906, ctc_loss=0.1052, over 3311907.15 frames. ], batch size: 83, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:17:36,303 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2862309.3333333335, ans=0.125 2023-10-09 20:17:36,710 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=22.5 2023-10-09 20:17:52,646 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=12.0 2023-10-09 20:17:59,439 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2023-10-09 20:18:05,181 INFO [train.py:1031] (3/4) Epoch 14, batch 28650, loss[loss=0.1836, simple_loss=0.2508, pruned_loss=0.04306, ctc_loss=0.07557, over 16775.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2871, pruned_loss=0.05911, ctc_loss=0.1049, over 3315441.23 frames. 
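The optim.py:471 entries above summarize the recent distribution of gradient norms as quartiles (min, 25%, median, 75%, max) together with a clipping threshold and the fraction of batches clipped. In the logged numbers the threshold tracks Clipping_scale times the median (e.g. 2.0 * 3.829e+02 = 7.659e+02 above). A minimal sketch of how such statistics could be produced, with hypothetical names and not the icefall implementation:

    import torch

    def grad_norm_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # grad_norms: 1-D tensor of recent per-batch gradient norms.
        q = torch.quantile(grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        # Threshold relative to the median, mirroring the logged
        # "threshold = Clipping_scale * median" relationship.
        threshold = clipping_scale * q[2]
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped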
2023-10-09 20:18:05,181 INFO [train.py:1031] (3/4) Epoch 14, batch 28650, loss[loss=0.1836, simple_loss=0.2508, pruned_loss=0.04306, ctc_loss=0.07557, over 16775.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2871, pruned_loss=0.05911, ctc_loss=0.1049, over 3315441.23 frames. ], batch size: 164, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:18:14,119 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2862449.3333333335, ans=0.125
2023-10-09 20:18:16,462 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2862496.0, ans=0.2
2023-10-09 20:18:37,569 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2862542.6666666665, ans=0.025
2023-10-09 20:18:57,420 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2862636.0, ans=0.2
2023-10-09 20:19:05,964 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+02 2.995e+02 3.402e+02 4.215e+02 9.672e+02, threshold=6.804e+02, percent-clipped=2.0
2023-10-09 20:19:07,092 INFO [train.py:1031] (3/4) Epoch 14, batch 28700, loss[loss=0.1915, simple_loss=0.2667, pruned_loss=0.04324, ctc_loss=0.07458, over 16906.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2834, pruned_loss=0.05652, ctc_loss=0.1008, over 3313142.01 frames. ], batch size: 229, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:19:16,219 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2862682.6666666665, ans=0.125
2023-10-09 20:19:36,856 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2862776.0, ans=0.035
2023-10-09 20:19:49,467 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0
2023-10-09 20:19:57,685 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2862869.3333333335, ans=0.1
2023-10-09 20:19:59,782 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2862869.3333333335, ans=0.125
2023-10-09 20:20:03,532 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2862869.3333333335, ans=0.125
2023-10-09 20:20:07,318 INFO [train.py:1031] (3/4) Epoch 14, batch 28750, loss[loss=0.213, simple_loss=0.2633, pruned_loss=0.06095, ctc_loss=0.1022, over 16733.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.2815, pruned_loss=0.05614, ctc_loss=0.1002, over 3311248.03 frames. ], batch size: 140, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:20:19,254 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0
2023-10-09 20:20:22,755 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2862962.6666666665, ans=0.1
2023-10-09 20:20:56,577 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2863102.6666666665, ans=0.125
2023-10-09 20:21:04,001 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2863102.6666666665, ans=0.125
2023-10-09 20:21:05,639 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2863102.6666666665, ans=0.1
2023-10-09 20:21:09,038 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 3.101e+02 3.665e+02 4.221e+02 6.562e+02, threshold=7.330e+02, percent-clipped=0.0
2023-10-09 20:21:09,065 INFO [train.py:1031] (3/4) Epoch 14, batch 28800, loss[loss=0.2374, simple_loss=0.286, pruned_loss=0.06834, ctc_loss=0.1304, over 16848.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2806, pruned_loss=0.05802, ctc_loss=0.103, over 3318203.73 frames. ], batch size: 272, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:21:18,187 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.65 vs. limit=10.0
2023-10-09 20:21:21,801 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2863196.0, ans=0.125
2023-10-09 20:21:32,261 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0
2023-10-09 20:21:33,000 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2863242.6666666665, ans=0.125
2023-10-09 20:21:42,459 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2863242.6666666665, ans=0.125
2023-10-09 20:22:10,815 INFO [train.py:1031] (3/4) Epoch 14, batch 28850, loss[loss=0.2355, simple_loss=0.2664, pruned_loss=0.07762, ctc_loss=0.1233, over 16613.00 frames. ], tot_loss[loss=0.2203, simple_loss=0.2782, pruned_loss=0.05995, ctc_loss=0.1061, over 3319433.12 frames. ], batch size: 110, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:22:24,278 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2863429.3333333335, ans=0.0
2023-10-09 20:22:55,403 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2863522.6666666665, ans=0.2
2023-10-09 20:23:12,083 INFO [train.py:1031] (3/4) Epoch 14, batch 28900, loss[loss=0.2023, simple_loss=0.2564, pruned_loss=0.05579, ctc_loss=0.09127, over 16796.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2747, pruned_loss=0.06122, ctc_loss=0.1077, over 3316485.55 frames. ], batch size: 141, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:23:13,126 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.719e+02 3.415e+02 3.744e+02 4.568e+02 8.890e+02, threshold=7.488e+02, percent-clipped=1.0
2023-10-09 20:23:19,511 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2863616.0, ans=0.1
2023-10-09 20:23:25,692 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2863662.6666666665, ans=0.0
2023-10-09 20:23:28,907 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2863662.6666666665, ans=0.125
2023-10-09 20:23:54,586 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2863756.0, ans=0.2
2023-10-09 20:24:13,690 INFO [train.py:1031] (3/4) Epoch 14, batch 28950, loss[loss=0.2087, simple_loss=0.2688, pruned_loss=0.05589, ctc_loss=0.09198, over 16929.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2727, pruned_loss=0.06119, ctc_loss=0.1064, over 3299902.09 frames. ], batch size: 258, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:24:58,589 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2863989.3333333335, ans=0.2
2023-10-09 20:25:01,745 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2864036.0, ans=0.125
2023-10-09 20:25:01,756 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2864036.0, ans=0.0
2023-10-09 20:25:13,221 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2864036.0, ans=0.125
2023-10-09 20:25:15,069 INFO [train.py:1031] (3/4) Epoch 14, batch 29000, loss[loss=0.18, simple_loss=0.2494, pruned_loss=0.0403, ctc_loss=0.07522, over 16828.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2686, pruned_loss=0.05887, ctc_loss=0.1016, over 3292610.53 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:25:17,204 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+02 3.225e+02 3.785e+02 4.643e+02 9.976e+02, threshold=7.570e+02, percent-clipped=3.0
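The scaling.py:199 entries record schedule-driven hyperparameters (skip rates, dropout probabilities, balancer targets, bypass scales) whose value ("ans") is a function of batch_count. A minimal sketch of a piecewise-linear schedule of this kind, with illustrative names that are assumptions rather than the icefall source:

    import bisect

    class PiecewiseLinear:
        """Interpolate a float between (batch_count, value) breakpoints."""
        def __init__(self, *points):
            self.x = [p[0] for p in points]
            self.y = [p[1] for p in points]

        def __call__(self, batch_count):
            if batch_count <= self.x[0]:
                return self.y[0]
            if batch_count >= self.x[-1]:
                return self.y[-1]
            i = bisect.bisect_right(self.x, batch_count)
            x0, x1 = self.x[i - 1], self.x[i]
            y0, y1 = self.y[i - 1], self.y[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a skip rate that decays from 0.2 to 0.0 over the first
    # 4000 batches and then stays at 0.0 (hypothetical breakpoints):
    conv_skip_rate = PiecewiseLinear((0, 0.2), (4000, 0.0))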
2023-10-09 20:25:20,865 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0
2023-10-09 20:25:32,073 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2864129.3333333335, ans=0.5
2023-10-09 20:25:39,624 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2864176.0, ans=0.125
2023-10-09 20:25:42,416 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2864176.0, ans=0.125
2023-10-09 20:25:46,611 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2864176.0, ans=0.1
2023-10-09 20:25:52,582 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2864222.6666666665, ans=0.2
2023-10-09 20:26:15,124 INFO [train.py:1031] (3/4) Epoch 14, batch 29050, loss[loss=0.2394, simple_loss=0.2843, pruned_loss=0.07192, ctc_loss=0.1265, over 16841.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.2706, pruned_loss=0.05907, ctc_loss=0.1024, over 3297888.22 frames. ], batch size: 328, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:26:37,546 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=15.0
2023-10-09 20:27:08,827 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2864502.6666666665, ans=0.05
2023-10-09 20:27:16,985 INFO [train.py:1031] (3/4) Epoch 14, batch 29100, loss[loss=0.2312, simple_loss=0.2797, pruned_loss=0.06713, ctc_loss=0.121, over 16930.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2757, pruned_loss=0.06336, ctc_loss=0.1102, over 3302138.19 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:27:20,251 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+02 3.447e+02 3.769e+02 4.635e+02 6.729e+02, threshold=7.539e+02, percent-clipped=0.0
2023-10-09 20:27:22,748 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2864549.3333333335, ans=0.125
2023-10-09 20:27:26,673 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2864549.3333333335, ans=0.0
2023-10-09 20:27:30,472 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2864596.0, ans=0.125
2023-10-09 20:27:45,535 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2864642.6666666665, ans=0.125
2023-10-09 20:27:55,738 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 20:28:03,770 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2864689.3333333335, ans=0.0
2023-10-09 20:28:08,596 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2864736.0, ans=0.0
2023-10-09 20:28:17,454 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2864782.6666666665, ans=0.1
2023-10-09 20:28:18,188 INFO [train.py:1031] (3/4) Epoch 14, batch 29150, loss[loss=0.2224, simple_loss=0.2537, pruned_loss=0.06946, ctc_loss=0.1306, over 15484.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2793, pruned_loss=0.06617, ctc_loss=0.1153, over 3288027.06 frames. ], batch size: 530, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:28:28,868 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2864782.6666666665, ans=0.0
2023-10-09 20:28:30,931 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.16 vs. limit=10.0
2023-10-09 20:28:37,751 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2864829.3333333335, ans=0.2
2023-10-09 20:29:13,771 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0
2023-10-09 20:29:22,662 INFO [train.py:1031] (3/4) Epoch 14, batch 29200, loss[loss=0.212, simple_loss=0.2842, pruned_loss=0.05166, ctc_loss=0.0914, over 16915.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.2839, pruned_loss=0.06635, ctc_loss=0.1164, over 3296767.58 frames. ], batch size: 229, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:29:28,278 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.765e+02 3.299e+02 3.814e+02 4.330e+02 6.435e+02, threshold=7.628e+02, percent-clipped=0.0
2023-10-09 20:29:47,913 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.28 vs. limit=22.5
2023-10-09 20:29:50,545 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0
2023-10-09 20:30:19,207 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2865202.6666666665, ans=0.125
2023-10-09 20:30:27,648 INFO [train.py:1031] (3/4) Epoch 14, batch 29250, loss[loss=0.3256, simple_loss=0.369, pruned_loss=0.1019, ctc_loss=0.1961, over 16664.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2845, pruned_loss=0.06381, ctc_loss=0.1122, over 3290408.58 frames. ], batch size: 384, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:30:32,978 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0
2023-10-09 20:30:46,160 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2865296.0, ans=0.125
2023-10-09 20:30:52,994 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2865296.0, ans=0.05
2023-10-09 20:31:27,700 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2865436.0, ans=0.0
2023-10-09 20:31:32,754 INFO [train.py:1031] (3/4) Epoch 14, batch 29300, loss[loss=0.2741, simple_loss=0.3323, pruned_loss=0.07884, ctc_loss=0.1456, over 16901.00 frames. ], tot_loss[loss=0.2385, simple_loss=0.297, pruned_loss=0.06648, ctc_loss=0.1175, over 3292304.39 frames. ], batch size: 309, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:31:38,516 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.349e+02 3.153e+02 3.767e+02 4.679e+02 9.052e+02, threshold=7.535e+02, percent-clipped=4.0
2023-10-09 20:31:43,721 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2865529.3333333335, ans=0.0
2023-10-09 20:31:46,827 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 20:31:55,277 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2865529.3333333335, ans=0.0
2023-10-09 20:32:14,934 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0
2023-10-09 20:32:26,731 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2865669.3333333335, ans=0.0
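The Whitening entries above compare a measured spread of the activation covariance ("metric") against a limit, logging only when the metric exceeds or approaches it. A toy version of one plausible such metric, the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue (1.0 for perfectly isotropic features); the exact formulation here is an assumption for intuition, not the icefall code:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (N, C) activations, split into channel groups as in the log.
        metrics = []
        for g in x.chunk(num_groups, dim=1):
            g = g - g.mean(dim=0)
            cov = (g.T @ g) / g.shape[0]
            eigs = torch.linalg.eigvalsh(cov)
            metrics.append((eigs * eigs).mean() / (eigs.mean() ** 2 + 1e-20))
        return torch.stack(metrics).mean()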
2023-10-09 20:32:33,854 INFO [train.py:1031] (3/4) Epoch 14, batch 29350, loss[loss=0.2241, simple_loss=0.271, pruned_loss=0.06546, ctc_loss=0.1155, over 16853.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.2948, pruned_loss=0.06667, ctc_loss=0.1174, over 3297713.41 frames. ], batch size: 272, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:32:44,101 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2865716.0, ans=0.125
2023-10-09 20:32:47,411 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2865762.6666666665, ans=0.125
2023-10-09 20:32:54,357 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2865762.6666666665, ans=0.1
2023-10-09 20:32:54,493 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2865762.6666666665, ans=0.125
2023-10-09 20:33:00,269 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2865809.3333333335, ans=0.125
2023-10-09 20:33:00,279 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2865809.3333333335, ans=0.0
2023-10-09 20:33:00,714 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=22.5
2023-10-09 20:33:08,979 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2865809.3333333335, ans=0.0
2023-10-09 20:33:23,859 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2865902.6666666665, ans=0.125
2023-10-09 20:33:30,157 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0
2023-10-09 20:33:34,479 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2865902.6666666665, ans=0.09899494936611666
2023-10-09 20:33:36,259 INFO [train.py:1031] (3/4) Epoch 14, batch 29400, loss[loss=0.1755, simple_loss=0.2579, pruned_loss=0.03436, ctc_loss=0.06079, over 16798.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2871, pruned_loss=0.06289, ctc_loss=0.1111, over 3300985.66 frames. ], batch size: 272, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:33:44,077 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 2.925e+02 3.429e+02 4.063e+02 7.311e+02, threshold=6.858e+02, percent-clipped=0.0
2023-10-09 20:33:56,044 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2865996.0, ans=0.0
2023-10-09 20:33:58,495 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0
2023-10-09 20:34:13,119 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=22.5
2023-10-09 20:34:21,251 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0
2023-10-09 20:34:27,628 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2866136.0, ans=0.0
2023-10-09 20:34:39,999 INFO [train.py:1031] (3/4) Epoch 14, batch 29450, loss[loss=0.2447, simple_loss=0.3212, pruned_loss=0.0604, ctc_loss=0.1183, over 16431.00 frames. ], tot_loss[loss=0.2205, simple_loss=0.2816, pruned_loss=0.05876, ctc_loss=0.1048, over 3300289.46 frames. ], batch size: 415, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:34:53,291 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2866229.3333333335, ans=0.125
2023-10-09 20:35:27,068 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0
2023-10-09 20:35:43,430 INFO [train.py:1031] (3/4) Epoch 14, batch 29500, loss[loss=0.1971, simple_loss=0.2595, pruned_loss=0.04932, ctc_loss=0.08994, over 16769.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2842, pruned_loss=0.05755, ctc_loss=0.1039, over 3287914.05 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 8.0
2023-10-09 20:35:51,252 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+02 2.902e+02 3.659e+02 4.459e+02 8.520e+02, threshold=7.319e+02, percent-clipped=6.0
2023-10-09 20:36:05,627 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2866462.6666666665, ans=0.125
2023-10-09 20:36:16,874 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2866509.3333333335, ans=0.125
2023-10-09 20:36:21,612 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2866556.0, ans=0.125
2023-10-09 20:36:30,680 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2866556.0, ans=0.2
2023-10-09 20:36:34,822 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2866602.6666666665, ans=0.0
2023-10-09 20:36:44,252 INFO [train.py:1031] (3/4) Epoch 14, batch 29550, loss[loss=0.205, simple_loss=0.2273, pruned_loss=0.06675, ctc_loss=0.1229, over 15503.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.2792, pruned_loss=0.05742, ctc_loss=0.1032, over 3284675.12 frames. ], batch size: 529, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 20:36:44,507 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2866649.3333333335, ans=0.125
2023-10-09 20:37:11,673 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2866742.6666666665, ans=0.125
2023-10-09 20:37:25,797 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2866789.3333333335, ans=0.1
2023-10-09 20:37:27,328 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2866789.3333333335, ans=0.1
2023-10-09 20:37:29,758 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2866789.3333333335, ans=0.04949747468305833
2023-10-09 20:37:44,950 INFO [train.py:1031] (3/4) Epoch 14, batch 29600, loss[loss=0.2064, simple_loss=0.2747, pruned_loss=0.05066, ctc_loss=0.09224, over 16870.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.2746, pruned_loss=0.0571, ctc_loss=0.1023, over 3281729.75 frames. ], batch size: 164, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 20:37:45,353 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2866882.6666666665, ans=0.125
2023-10-09 20:37:50,283 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2866882.6666666665, ans=0.125
2023-10-09 20:37:54,764 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 3.047e+02 3.582e+02 4.028e+02 6.950e+02, threshold=7.163e+02, percent-clipped=0.0
2023-10-09 20:37:57,220 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2866929.3333333335, ans=0.125
2023-10-09 20:38:03,098 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2866929.3333333335, ans=0.05
2023-10-09 20:38:16,787 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2866976.0, ans=0.0
2023-10-09 20:38:18,517 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2866976.0, ans=10.0
2023-10-09 20:38:27,211 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2867022.6666666665, ans=0.07
2023-10-09 20:38:37,790 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2867069.3333333335, ans=0.025
2023-10-09 20:38:46,697 INFO [train.py:1031] (3/4) Epoch 14, batch 29650, loss[loss=0.1993, simple_loss=0.2648, pruned_loss=0.0493, ctc_loss=0.0881, over 16849.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2784, pruned_loss=0.05826, ctc_loss=0.1043, over 3282481.22 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:38:49,129 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2867116.0, ans=0.025
2023-10-09 20:39:15,412 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2867209.3333333335, ans=0.2
2023-10-09 20:39:26,924 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2023-10-09 20:39:41,658 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2867302.6666666665, ans=0.125
2023-10-09 20:39:45,506 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2867302.6666666665, ans=0.125
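Because the per-batch diagnostics are dense, pulling the running tot_loss curve out of a log like this one is often the first thing a reader wants. A small reader-side helper (not part of the training code; names are assumptions):

    import re

    TOT = re.compile(r"batch (\d+), .*?tot_loss\[loss=([0-9.]+)")

    def tot_loss_curve(path):
        """Return [(batch, tot_loss), ...] from train.py progress lines."""
        curve = []
        with open(path) as f:
            for line in f:
                if "[train.py" not in line:
                    continue
                m = TOT.search(line)
                if m:
                    curve.append((int(m.group(1)), float(m.group(2))))
        return curve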
2023-10-09 20:39:48,359 INFO [train.py:1031] (3/4) Epoch 14, batch 29700, loss[loss=0.2105, simple_loss=0.2637, pruned_loss=0.05837, ctc_loss=0.1013, over 16714.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2793, pruned_loss=0.05998, ctc_loss=0.1065, over 3293115.37 frames. ], batch size: 111, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:39:59,295 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+02 3.266e+02 3.794e+02 4.396e+02 1.319e+03, threshold=7.588e+02, percent-clipped=2.0
2023-10-09 20:40:26,638 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2867489.3333333335, ans=0.125
2023-10-09 20:40:50,194 INFO [train.py:1031] (3/4) Epoch 14, batch 29750, loss[loss=0.2298, simple_loss=0.2829, pruned_loss=0.06547, ctc_loss=0.1145, over 16922.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.28, pruned_loss=0.06154, ctc_loss=0.1093, over 3303356.24 frames. ], batch size: 292, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:40:52,623 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2867582.6666666665, ans=0.2
2023-10-09 20:41:07,563 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2867629.3333333335, ans=0.125
2023-10-09 20:41:19,867 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2867676.0, ans=0.2
2023-10-09 20:41:21,431 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2867676.0, ans=0.125
2023-10-09 20:41:25,882 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2867676.0, ans=0.05
2023-10-09 20:41:51,034 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2867769.3333333335, ans=0.0
2023-10-09 20:41:53,582 INFO [train.py:1031] (3/4) Epoch 14, batch 29800, loss[loss=0.2002, simple_loss=0.2741, pruned_loss=0.04691, ctc_loss=0.08134, over 16809.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2813, pruned_loss=0.063, ctc_loss=0.1118, over 3306363.66 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:41:59,871 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2867816.0, ans=0.1
2023-10-09 20:42:05,755 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.683e+02 3.252e+02 3.750e+02 4.690e+02 1.156e+03, threshold=7.500e+02, percent-clipped=2.0
2023-10-09 20:42:17,059 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2867909.3333333335, ans=10.0
2023-10-09 20:42:17,627 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.62 vs. limit=15.0
2023-10-09 20:42:27,687 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2867909.3333333335, ans=0.0
2023-10-09 20:42:53,404 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=2868002.6666666665, ans=0.1
2023-10-09 20:42:55,392 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=22.5
2023-10-09 20:42:56,942 INFO [train.py:1031] (3/4) Epoch 14, batch 29850, loss[loss=0.2142, simple_loss=0.2776, pruned_loss=0.05667, ctc_loss=0.09365, over 16847.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2906, pruned_loss=0.06519, ctc_loss=0.1156, over 3303376.38 frames. ], batch size: 141, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:42:57,567 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=22.5
2023-10-09 20:43:06,858 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.69 vs. limit=15.0
2023-10-09 20:43:23,878 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2868142.6666666665, ans=0.2
2023-10-09 20:44:02,050 INFO [train.py:1031] (3/4) Epoch 14, batch 29900, loss[loss=0.2757, simple_loss=0.3293, pruned_loss=0.0825, ctc_loss=0.1427, over 16799.00 frames. ], tot_loss[loss=0.2401, simple_loss=0.2961, pruned_loss=0.06806, ctc_loss=0.1201, over 3310258.70 frames. ], batch size: 309, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:44:05,090 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2868282.6666666665, ans=0.0
2023-10-09 20:44:15,807 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+02 3.520e+02 3.961e+02 4.963e+02 1.132e+03, threshold=7.922e+02, percent-clipped=8.0
2023-10-09 20:44:45,338 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2868422.6666666665, ans=0.0
2023-10-09 20:44:52,505 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2868469.3333333335, ans=0.125
2023-10-09 20:44:53,484 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 20:45:00,183 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2868469.3333333335, ans=0.125
2023-10-09 20:45:04,793 INFO [train.py:1031] (3/4) Epoch 14, batch 29950, loss[loss=0.1445, simple_loss=0.1796, pruned_loss=0.04204, ctc_loss=0.06343, over 11362.00 frames. ], tot_loss[loss=0.2418, simple_loss=0.2978, pruned_loss=0.06891, ctc_loss=0.1201, over 3304535.26 frames. ], batch size: 40, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:45:12,963 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2868516.0, ans=0.0
2023-10-09 20:45:13,905 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2868516.0, ans=0.2
2023-10-09 20:45:31,817 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2868609.3333333335, ans=22.5
2023-10-09 20:45:44,072 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2868656.0, ans=0.09899494936611666
2023-10-09 20:46:00,821 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0
2023-10-09 20:46:05,483 INFO [train.py:1031] (3/4) Epoch 14, batch 30000, loss[loss=0.2367, simple_loss=0.3011, pruned_loss=0.06183, ctc_loss=0.1215, over 16821.00 frames. ], batch size: 272, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 20:46:05,483 INFO [train.py:1054] (3/4) Computing validation loss
2023-10-09 20:46:22,666 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2308, simple_loss=0.3022, pruned_loss=0.06118, ctc_loss=0.09249, over 1796401.00 frames.
2023-10-09 20:46:22,667 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB
2023-10-09 20:46:28,930 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2868749.3333333335, ans=0.1
2023-10-09 20:46:34,412 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2868796.0, ans=0.0
2023-10-09 20:46:36,785 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.188e+02 3.941e+02 4.902e+02 7.309e+02, threshold=7.881e+02, percent-clipped=0.0
2023-10-09 20:47:01,081 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=15.0
2023-10-09 20:47:02,709 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.47 vs. limit=10.0
2023-10-09 20:47:24,787 INFO [train.py:1031] (3/4) Epoch 14, batch 30050, loss[loss=0.304, simple_loss=0.3539, pruned_loss=0.09334, ctc_loss=0.1682, over 16743.00 frames. ], tot_loss[loss=0.238, simple_loss=0.2973, pruned_loss=0.06616, ctc_loss=0.1159, over 3304970.59 frames. ], batch size: 384, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:47:27,300 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 20:47:31,467 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2868982.6666666665, ans=0.125
2023-10-09 20:48:08,620 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=22.5
2023-10-09 20:48:25,620 INFO [train.py:1031] (3/4) Epoch 14, batch 30100, loss[loss=0.1681, simple_loss=0.2073, pruned_loss=0.04862, ctc_loss=0.07899, over 16655.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.2957, pruned_loss=0.06464, ctc_loss=0.114, over 3296758.71 frames. ], batch size: 70, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:48:37,448 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2869262.6666666665, ans=0.0
2023-10-09 20:48:43,028 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.464e+02 3.136e+02 3.710e+02 4.656e+02 9.667e+02, threshold=7.419e+02, percent-clipped=2.0
2023-10-09 20:48:43,417 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2869262.6666666665, ans=0.2
2023-10-09 20:48:44,885 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2869262.6666666665, ans=15.0
2023-10-09 20:49:02,648 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2869356.0, ans=0.125
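The batch 30000 block above interleaves a validation pass with training: "Computing validation loss", then a "validation:" line over roughly 1.8M frames, then training resumes. A schematic of that pattern, simplified and with assumed helper names (compute_loss is hypothetical; the interval shown is an assumption consistent with validation landing on batch 30000):

    import torch

    def maybe_validate(model, valid_loader, batch_idx, valid_interval=3000):
        # Every valid_interval batches: switch to eval mode, accumulate
        # frame-weighted losses over the dev set, then resume training.
        if batch_idx % valid_interval != 0:
            return None
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch)  # assumed helper
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        return tot_loss / max(tot_frames, 1)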
2023-10-09 20:49:03,028 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0
2023-10-09 20:49:05,289 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2869356.0, ans=10.0
2023-10-09 20:49:08,722 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.01 vs. limit=10.0
2023-10-09 20:49:27,213 INFO [train.py:1031] (3/4) Epoch 14, batch 30150, loss[loss=0.1943, simple_loss=0.2684, pruned_loss=0.04512, ctc_loss=0.075, over 16905.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2943, pruned_loss=0.06323, ctc_loss=0.1117, over 3290521.97 frames. ], batch size: 82, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:49:28,606 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2869449.3333333335, ans=0.0
2023-10-09 20:49:32,367 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2869449.3333333335, ans=0.1
2023-10-09 20:49:38,228 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2869496.0, ans=0.125
2023-10-09 20:49:48,456 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2869496.0, ans=0.0
2023-10-09 20:50:21,069 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2869636.0, ans=0.0
2023-10-09 20:50:27,446 INFO [train.py:1031] (3/4) Epoch 14, batch 30200, loss[loss=0.2756, simple_loss=0.3084, pruned_loss=0.08932, ctc_loss=0.1605, over 16579.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2954, pruned_loss=0.06486, ctc_loss=0.1142, over 3275934.35 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:50:31,488 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2869682.6666666665, ans=0.1
2023-10-09 20:50:33,534 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2869682.6666666665, ans=0.0
2023-10-09 20:50:40,535 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs. limit=22.5
2023-10-09 20:50:45,495 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.358e+02 3.166e+02 3.687e+02 4.321e+02 7.960e+02, threshold=7.375e+02, percent-clipped=2.0
2023-10-09 20:50:50,559 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2869776.0, ans=0.0
2023-10-09 20:51:09,544 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.10 vs. limit=15.0
2023-10-09 20:51:20,534 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2869869.3333333335, ans=0.0
2023-10-09 20:51:27,029 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2869869.3333333335, ans=0.125
2023-10-09 20:51:28,749 INFO [train.py:1031] (3/4) Epoch 14, batch 30250, loss[loss=0.2916, simple_loss=0.3463, pruned_loss=0.086, ctc_loss=0.1624, over 15277.00 frames. ], tot_loss[loss=0.2409, simple_loss=0.2985, pruned_loss=0.06778, ctc_loss=0.1193, over 3264886.42 frames. ], batch size: 526, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:51:33,963 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2869916.0, ans=0.125
2023-10-09 20:51:54,273 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2870009.3333333335, ans=0.1
2023-10-09 20:52:03,177 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2870009.3333333335, ans=0.125
2023-10-09 20:52:13,515 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2870056.0, ans=0.2
2023-10-09 20:52:17,461 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2870056.0, ans=10.0
2023-10-09 20:52:32,397 INFO [train.py:1031] (3/4) Epoch 14, batch 30300, loss[loss=0.2359, simple_loss=0.2892, pruned_loss=0.06589, ctc_loss=0.1269, over 16247.00 frames. ], tot_loss[loss=0.2445, simple_loss=0.3004, pruned_loss=0.06982, ctc_loss=0.1225, over 3262910.76 frames. ], batch size: 463, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:52:33,808 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2870149.3333333335, ans=0.0
2023-10-09 20:52:34,869 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2870149.3333333335, ans=0.125
2023-10-09 20:52:51,951 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+02 3.410e+02 3.951e+02 4.930e+02 7.071e+02, threshold=7.902e+02, percent-clipped=0.0
2023-10-09 20:52:54,823 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2870196.0, ans=0.125
2023-10-09 20:53:06,583 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 20:53:14,013 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2870289.3333333335, ans=0.125
2023-10-09 20:53:24,649 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2870336.0, ans=0.2
2023-10-09 20:53:33,823 INFO [train.py:1031] (3/4) Epoch 14, batch 30350, loss[loss=0.221, simple_loss=0.2469, pruned_loss=0.07186, ctc_loss=0.1286, over 15518.00 frames. ], tot_loss[loss=0.2448, simple_loss=0.2986, pruned_loss=0.07072, ctc_loss=0.1238, over 3273352.17 frames. ], batch size: 526, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:53:46,685 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0
2023-10-09 20:53:52,440 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=12.0
2023-10-09 20:54:08,899 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2870522.6666666665, ans=0.2
2023-10-09 20:54:29,880 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2870569.3333333335, ans=0.015
2023-10-09 20:54:31,671 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2870569.3333333335, ans=0.125
2023-10-09 20:54:35,114 INFO [train.py:1031] (3/4) Epoch 14, batch 30400, loss[loss=0.2499, simple_loss=0.3059, pruned_loss=0.07376, ctc_loss=0.1157, over 12098.00 frames. ], tot_loss[loss=0.2419, simple_loss=0.294, pruned_loss=0.07031, ctc_loss=0.1231, over 3265645.09 frames. ], batch size: 35, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:54:54,417 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.508e+02 3.255e+02 4.087e+02 4.759e+02 9.430e+02, threshold=8.174e+02, percent-clipped=1.0
2023-10-09 20:54:57,500 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2870662.6666666665, ans=0.0
2023-10-09 20:55:30,181 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0
2023-10-09 20:55:31,050 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2870802.6666666665, ans=0.0
2023-10-09 20:55:35,536 INFO [train.py:1031] (3/4) Epoch 14, batch 30450, loss[loss=0.2104, simple_loss=0.2608, pruned_loss=0.05878, ctc_loss=0.1063, over 16770.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2866, pruned_loss=0.06835, ctc_loss=0.1196, over 3270479.80 frames. ], batch size: 228, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:55:37,539 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2870849.3333333335, ans=0.0
2023-10-09 20:55:42,363 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2870849.3333333335, ans=0.0
2023-10-09 20:55:55,802 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2870896.0, ans=0.2
2023-10-09 20:56:17,996 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2870989.3333333335, ans=0.0
2023-10-09 20:56:25,701 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0
2023-10-09 20:56:31,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2871036.0, ans=0.0
], batch size: 110, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:56:40,187 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2871082.6666666665, ans=0.1 2023-10-09 20:56:48,237 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2871082.6666666665, ans=0.0 2023-10-09 20:56:52,562 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2871129.3333333335, ans=0.0 2023-10-09 20:56:59,416 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2871129.3333333335, ans=0.125 2023-10-09 20:57:00,014 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+02 3.155e+02 3.681e+02 4.532e+02 7.008e+02, threshold=7.361e+02, percent-clipped=0.0 2023-10-09 20:57:08,377 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2871176.0, ans=0.125 2023-10-09 20:57:20,913 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2871222.6666666665, ans=0.2 2023-10-09 20:57:30,070 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2871269.3333333335, ans=0.125 2023-10-09 20:57:41,473 INFO [train.py:1031] (3/4) Epoch 14, batch 30550, loss[loss=0.2862, simple_loss=0.3224, pruned_loss=0.09225, ctc_loss=0.1638, over 16523.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.289, pruned_loss=0.06601, ctc_loss=0.1161, over 3291847.90 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:57:46,117 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2871316.0, ans=0.025 2023-10-09 20:58:09,326 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2871409.3333333335, ans=0.2 2023-10-09 20:58:29,607 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2871502.6666666665, ans=0.125 2023-10-09 20:58:40,340 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2871549.3333333335, ans=0.2 2023-10-09 20:58:41,005 INFO [train.py:1031] (3/4) Epoch 14, batch 30600, loss[loss=0.2341, simple_loss=0.2846, pruned_loss=0.06737, ctc_loss=0.1222, over 16913.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2888, pruned_loss=0.06791, ctc_loss=0.1192, over 3294905.08 frames. ], batch size: 292, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:58:57,550 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.90 vs. 
limit=15.0 2023-10-09 20:59:01,071 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+02 3.194e+02 3.624e+02 4.233e+02 1.074e+03, threshold=7.249e+02, percent-clipped=2.0 2023-10-09 20:59:15,286 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2871689.3333333335, ans=0.0 2023-10-09 20:59:22,310 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2871689.3333333335, ans=0.2 2023-10-09 20:59:28,758 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2871736.0, ans=0.2 2023-10-09 20:59:40,076 INFO [train.py:1031] (3/4) Epoch 14, batch 30650, loss[loss=0.1703, simple_loss=0.2302, pruned_loss=0.04062, ctc_loss=0.07269, over 16833.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2817, pruned_loss=0.06595, ctc_loss=0.1159, over 3302322.34 frames. ], batch size: 141, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:59:49,228 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2023-10-09 21:00:26,577 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2871922.6666666665, ans=0.125 2023-10-09 21:00:29,209 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2871969.3333333335, ans=0.125 2023-10-09 21:00:35,674 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=12.0 2023-10-09 21:00:38,489 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2871969.3333333335, ans=0.2 2023-10-09 21:00:41,981 INFO [train.py:1031] (3/4) Epoch 14, batch 30700, loss[loss=0.2667, simple_loss=0.3206, pruned_loss=0.07907, ctc_loss=0.1368, over 16457.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2761, pruned_loss=0.06291, ctc_loss=0.1105, over 3296417.89 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:00:49,362 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2872016.0, ans=0.0 2023-10-09 21:00:50,561 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2872016.0, ans=0.05 2023-10-09 21:00:50,900 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. 
limit=15.0 2023-10-09 21:01:05,615 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2872062.6666666665, ans=0.1 2023-10-09 21:01:06,276 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+02 3.086e+02 3.703e+02 4.369e+02 9.445e+02, threshold=7.405e+02, percent-clipped=1.0 2023-10-09 21:01:07,666 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2872109.3333333335, ans=0.1 2023-10-09 21:01:12,788 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2872109.3333333335, ans=0.125 2023-10-09 21:01:46,064 INFO [train.py:1031] (3/4) Epoch 14, batch 30750, loss[loss=0.2263, simple_loss=0.2995, pruned_loss=0.05713, ctc_loss=0.0973, over 16830.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2774, pruned_loss=0.06174, ctc_loss=0.1073, over 3293592.68 frames. ], batch size: 242, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:01:46,418 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2872249.3333333335, ans=0.125 2023-10-09 21:01:49,139 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2872249.3333333335, ans=0.1 2023-10-09 21:02:06,162 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2872296.0, ans=0.2 2023-10-09 21:02:07,248 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2872296.0, ans=0.125 2023-10-09 21:02:08,473 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0 2023-10-09 21:02:49,429 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2023-10-09 21:02:50,037 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2872482.6666666665, ans=0.0 2023-10-09 21:02:50,792 INFO [train.py:1031] (3/4) Epoch 14, batch 30800, loss[loss=0.3081, simple_loss=0.382, pruned_loss=0.08638, ctc_loss=0.1534, over 16890.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2888, pruned_loss=0.06401, ctc_loss=0.1115, over 3297068.17 frames. ], batch size: 309, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:03:16,583 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.601e+02 3.887e+02 4.535e+02 5.921e+02 9.056e+02, threshold=9.070e+02, percent-clipped=5.0 2023-10-09 21:03:33,171 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2872622.6666666665, ans=0.0 2023-10-09 21:03:33,321 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-10-09 21:03:44,571 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2023-10-09 21:03:54,406 INFO [train.py:1031] (3/4) Epoch 14, batch 30850, loss[loss=0.175, simple_loss=0.2019, pruned_loss=0.05575, ctc_loss=0.09129, over 10234.00 frames. 
2023-10-09 21:03:59,544 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2872716.0, ans=0.125
2023-10-09 21:04:09,461 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.99 vs. limit=15.0
2023-10-09 21:04:10,494 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0
2023-10-09 21:04:36,700 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2872856.0, ans=0.0
2023-10-09 21:04:49,341 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=12.0
2023-10-09 21:04:56,197 INFO [train.py:1031] (3/4) Epoch 14, batch 30900, loss[loss=0.1807, simple_loss=0.2417, pruned_loss=0.04428, ctc_loss=0.07784, over 16620.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2794, pruned_loss=0.06208, ctc_loss=0.1085, over 3299418.43 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:04:57,578 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2872949.3333333335, ans=0.125
2023-10-09 21:05:20,044 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+02 3.130e+02 3.635e+02 4.208e+02 6.076e+02, threshold=7.270e+02, percent-clipped=0.0
2023-10-09 21:05:24,313 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=22.5
2023-10-09 21:05:45,803 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2873136.0, ans=0.05
2023-10-09 21:05:47,023 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2023-10-09 21:05:53,643 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0
2023-10-09 21:05:56,101 INFO [train.py:1031] (3/4) Epoch 14, batch 30950, loss[loss=0.2397, simple_loss=0.2922, pruned_loss=0.07057, ctc_loss=0.1153, over 16924.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2761, pruned_loss=0.06115, ctc_loss=0.1072, over 3311017.82 frames. ], batch size: 78, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:06:35,430 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=22.5
2023-10-09 21:06:39,242 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=15.0
2023-10-09 21:06:49,574 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.51 vs. limit=10.0
2023-10-09 21:06:58,788 INFO [train.py:1031] (3/4) Epoch 14, batch 31000, loss[loss=0.1936, simple_loss=0.2543, pruned_loss=0.04937, ctc_loss=0.08519, over 16923.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2773, pruned_loss=0.06261, ctc_loss=0.1094, over 3312220.17 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:07:10,384 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2873462.6666666665, ans=0.125
2023-10-09 21:07:10,885 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0
2023-10-09 21:07:20,229 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0
2023-10-09 21:07:23,305 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0
2023-10-09 21:07:25,521 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.598e+02 3.246e+02 3.902e+02 4.981e+02 7.271e+02, threshold=7.805e+02, percent-clipped=1.0
2023-10-09 21:07:28,518 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2873509.3333333335, ans=0.125
2023-10-09 21:07:30,669 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2873509.3333333335, ans=0.125
2023-10-09 21:07:40,174 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2873556.0, ans=0.1
2023-10-09 21:07:50,390 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2873602.6666666665, ans=0.125
2023-10-09 21:07:50,397 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2873602.6666666665, ans=0.1
2023-10-09 21:07:58,591 INFO [train.py:1031] (3/4) Epoch 14, batch 31050, loss[loss=0.2383, simple_loss=0.2932, pruned_loss=0.06591, ctc_loss=0.1288, over 16696.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2737, pruned_loss=0.05966, ctc_loss=0.1041, over 3296956.37 frames. ], batch size: 384, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:08:00,985 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2873649.3333333335, ans=0.125
2023-10-09 21:08:01,439 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2023-10-09 21:08:08,945 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2873696.0, ans=0.0
2023-10-09 21:08:22,118 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2873742.6666666665, ans=0.0
2023-10-09 21:08:35,707 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2873789.3333333335, ans=0.1
2023-10-09 21:08:36,278 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.42 vs. limit=22.5
2023-10-09 21:08:37,601 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=22.5
2023-10-09 21:08:52,931 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2873836.0, ans=0.125
2023-10-09 21:08:58,996 INFO [train.py:1031] (3/4) Epoch 14, batch 31100, loss[loss=0.2263, simple_loss=0.2783, pruned_loss=0.06574, ctc_loss=0.1069, over 16842.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.2708, pruned_loss=0.0585, ctc_loss=0.1022, over 3288110.24 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:09:02,170 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2873882.6666666665, ans=0.2
2023-10-09 21:09:12,590 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2873929.3333333335, ans=0.0
2023-10-09 21:09:21,973 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.78 vs. limit=15.0
2023-10-09 21:09:26,091 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.900e+02 3.232e+02 3.684e+02 6.119e+02, threshold=6.464e+02, percent-clipped=0.0
2023-10-09 21:09:39,502 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2874022.6666666665, ans=0.0
2023-10-09 21:09:45,910 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2874069.3333333335, ans=0.125
2023-10-09 21:09:46,909 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2874069.3333333335, ans=0.1
2023-10-09 21:09:57,959 INFO [train.py:1031] (3/4) Epoch 14, batch 31150, loss[loss=0.2262, simple_loss=0.2833, pruned_loss=0.06291, ctc_loss=0.1083, over 16813.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2718, pruned_loss=0.05975, ctc_loss=0.1045, over 3290855.84 frames. ], batch size: 121, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:10:17,501 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2874162.6666666665, ans=0.1
2023-10-09 21:10:21,546 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2874209.3333333335, ans=0.125
2023-10-09 21:10:57,601 INFO [train.py:1031] (3/4) Epoch 14, batch 31200, loss[loss=0.1682, simple_loss=0.2239, pruned_loss=0.0422, ctc_loss=0.07056, over 10111.00 frames. ], tot_loss[loss=0.2151, simple_loss=0.2703, pruned_loss=0.05927, ctc_loss=0.1034, over 3286039.90 frames. ], batch size: 35, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:10:59,887 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.50 vs. limit=10.0
2023-10-09 21:11:27,521 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+02 3.224e+02 3.763e+02 4.515e+02 7.909e+02, threshold=7.526e+02, percent-clipped=5.0
2023-10-09 21:11:46,821 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2874536.0, ans=0.0
2023-10-09 21:11:48,644 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2874536.0, ans=0.1
2023-10-09 21:11:58,154 INFO [train.py:1031] (3/4) Epoch 14, batch 31250, loss[loss=0.207, simple_loss=0.2659, pruned_loss=0.05509, ctc_loss=0.0946, over 16760.00 frames. ], tot_loss[loss=0.214, simple_loss=0.268, pruned_loss=0.05932, ctc_loss=0.1036, over 3286766.52 frames. ], batch size: 141, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:12:08,440 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0
2023-10-09 21:12:14,355 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0
2023-10-09 21:12:27,176 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0
2023-10-09 21:12:28,271 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:12:31,433 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:12:41,844 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=12.0
2023-10-09 21:13:01,203 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2874816.0, ans=0.1
2023-10-09 21:13:01,917 INFO [train.py:1031] (3/4) Epoch 14, batch 31300, loss[loss=0.185, simple_loss=0.2339, pruned_loss=0.04986, ctc_loss=0.09071, over 16728.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2648, pruned_loss=0.059, ctc_loss=0.1029, over 3290509.62 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:13:22,040 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2874862.6666666665, ans=0.1
2023-10-09 21:13:28,763 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2874909.3333333335, ans=0.125
2023-10-09 21:13:32,851 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.069e+02 3.507e+02 3.932e+02 8.166e+02, threshold=7.015e+02, percent-clipped=1.0
2023-10-09 21:14:03,887 INFO [train.py:1031] (3/4) Epoch 14, batch 31350, loss[loss=0.2666, simple_loss=0.2763, pruned_loss=0.09584, ctc_loss=0.1628, over 16602.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.262, pruned_loss=0.05976, ctc_loss=0.1042, over 3299112.83 frames. ], batch size: 386, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:14:06,139 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2875049.3333333335, ans=0.125
2023-10-09 21:14:09,423 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2875049.3333333335, ans=0.125
2023-10-09 21:14:09,446 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2875049.3333333335, ans=0.1
2023-10-09 21:14:16,526 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2875096.0, ans=0.125
2023-10-09 21:14:23,622 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2875096.0, ans=0.125
2023-10-09 21:14:54,937 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2875236.0, ans=0.125
2023-10-09 21:15:02,106 INFO [train.py:1031] (3/4) Epoch 14, batch 31400, loss[loss=0.1875, simple_loss=0.2582, pruned_loss=0.04259, ctc_loss=0.07875, over 16200.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.2603, pruned_loss=0.06004, ctc_loss=0.1046, over 3302731.14 frames. ], batch size: 463, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:15:23,429 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2875329.3333333335, ans=0.2
2023-10-09 21:15:25,557 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2875376.0, ans=0.2
2023-10-09 21:15:27,720 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2875376.0, ans=0.125
2023-10-09 21:15:34,908 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+02 3.143e+02 3.682e+02 4.471e+02 1.037e+03, threshold=7.364e+02, percent-clipped=4.0
2023-10-09 21:15:36,281 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2875376.0, ans=0.2
2023-10-09 21:15:52,670 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=15.0
2023-10-09 21:16:03,129 INFO [train.py:1031] (3/4) Epoch 14, batch 31450, loss[loss=0.2035, simple_loss=0.2601, pruned_loss=0.05319, ctc_loss=0.1015, over 16775.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.2593, pruned_loss=0.05909, ctc_loss=0.1031, over 3294669.06 frames. ], batch size: 310, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:16:12,319 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0
2023-10-09 21:16:20,810 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2875562.6666666665, ans=0.09899494936611666
2023-10-09 21:16:26,023 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2875562.6666666665, ans=0.1
2023-10-09 21:16:47,858 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2875656.0, ans=0.1
2023-10-09 21:16:47,903 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:16:54,246 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=12.0
2023-10-09 21:17:06,118 INFO [train.py:1031] (3/4) Epoch 14, batch 31500, loss[loss=0.2, simple_loss=0.256, pruned_loss=0.05375, ctc_loss=0.09115, over 16631.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2582, pruned_loss=0.05939, ctc_loss=0.1035, over 3289701.70 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:17:12,988 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2875749.3333333335, ans=0.1
2023-10-09 21:17:29,508 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2875796.0, ans=0.125
2023-10-09 21:17:37,240 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2875842.6666666665, ans=0.0
2023-10-09 21:17:40,593 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.158e+02 3.690e+02 4.602e+02 7.979e+02, threshold=7.380e+02, percent-clipped=2.0
2023-10-09 21:17:50,566 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2875889.3333333335, ans=0.2
2023-10-09 21:17:51,608 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2875889.3333333335, ans=0.0
2023-10-09 21:17:57,211 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2875936.0, ans=0.04949747468305833
2023-10-09 21:18:06,926 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2875936.0, ans=0.5
2023-10-09 21:18:09,322 INFO [train.py:1031] (3/4) Epoch 14, batch 31550, loss[loss=0.2078, simple_loss=0.265, pruned_loss=0.05632, ctc_loss=0.09465, over 16725.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2666, pruned_loss=0.06163, ctc_loss=0.1071, over 3289262.68 frames. ], batch size: 140, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:18:25,169 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2876029.3333333335, ans=0.2
2023-10-09 21:18:25,449 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.23 vs. limit=6.0
2023-10-09 21:18:30,427 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.28 vs. limit=22.5
2023-10-09 21:18:46,258 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.58 vs. limit=10.0
2023-10-09 21:18:49,628 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2876122.6666666665, ans=0.125
2023-10-09 21:19:00,011 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.65 vs. limit=15.0
2023-10-09 21:19:09,419 INFO [train.py:1031] (3/4) Epoch 14, batch 31600, loss[loss=0.2371, simple_loss=0.2773, pruned_loss=0.07397, ctc_loss=0.1226, over 16720.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2717, pruned_loss=0.06363, ctc_loss=0.1103, over 3300551.04 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:19:11,320 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2876216.0, ans=0.1
2023-10-09 21:19:31,322 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2876262.6666666665, ans=0.2
2023-10-09 21:19:33,366 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2876309.3333333335, ans=10.0
2023-10-09 21:19:39,987 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2876309.3333333335, ans=0.125
2023-10-09 21:19:45,052 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.637e+02 3.252e+02 3.692e+02 4.282e+02 8.692e+02, threshold=7.384e+02, percent-clipped=4.0
2023-10-09 21:19:45,380 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2876309.3333333335, ans=0.0
2023-10-09 21:20:13,360 INFO [train.py:1031] (3/4) Epoch 14, batch 31650, loss[loss=0.1812, simple_loss=0.2666, pruned_loss=0.03543, ctc_loss=0.06228, over 16841.00 frames. ], tot_loss[loss=0.222, simple_loss=0.2745, pruned_loss=0.06293, ctc_loss=0.1091, over 3297194.33 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:20:18,072 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2876449.3333333335, ans=0.125
2023-10-09 21:20:24,866 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=22.5
2023-10-09 21:20:50,839 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2876589.3333333335, ans=0.2
2023-10-09 21:20:57,598 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2876589.3333333335, ans=0.125
2023-10-09 21:21:15,706 INFO [train.py:1031] (3/4) Epoch 14, batch 31700, loss[loss=0.2364, simple_loss=0.2934, pruned_loss=0.06568, ctc_loss=0.1201, over 16941.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2747, pruned_loss=0.06112, ctc_loss=0.1064, over 3300252.76 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:21:31,702 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0
2023-10-09 21:21:39,379 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0
2023-10-09 21:21:52,723 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+02 3.115e+02 3.904e+02 4.739e+02 1.536e+03, threshold=7.807e+02, percent-clipped=3.0
2023-10-09 21:22:15,488 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2876869.3333333335, ans=0.0
2023-10-09 21:22:18,292 INFO [train.py:1031] (3/4) Epoch 14, batch 31750, loss[loss=0.2437, simple_loss=0.3008, pruned_loss=0.06857, ctc_loss=0.1235, over 16857.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.281, pruned_loss=0.06378, ctc_loss=0.1115, over 3302576.19 frames. ], batch size: 242, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:22:20,319 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2876916.0, ans=0.2
2023-10-09 21:22:30,426 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0
2023-10-09 21:23:03,790 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2877056.0, ans=0.0
2023-10-09 21:23:10,700 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2877102.6666666665, ans=0.125
2023-10-09 21:23:20,606 INFO [train.py:1031] (3/4) Epoch 14, batch 31800, loss[loss=0.232, simple_loss=0.2845, pruned_loss=0.06784, ctc_loss=0.1093, over 16945.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2832, pruned_loss=0.06519, ctc_loss=0.1135, over 3291756.49 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:23:21,939 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2877149.3333333335, ans=0.125
2023-10-09 21:23:57,870 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+02 3.284e+02 3.680e+02 4.274e+02 9.032e+02, threshold=7.360e+02, percent-clipped=1.0
2023-10-09 21:24:04,732 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2877289.3333333335, ans=0.125
2023-10-09 21:24:12,654 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2877336.0, ans=0.125
2023-10-09 21:24:14,326 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.38 vs. limit=15.0
2023-10-09 21:24:16,382 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2877336.0, ans=0.0
2023-10-09 21:24:17,450 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2877336.0, ans=0.07
2023-10-09 21:24:22,048 INFO [train.py:1031] (3/4) Epoch 14, batch 31850, loss[loss=0.2107, simple_loss=0.2586, pruned_loss=0.06031, ctc_loss=0.1056, over 16818.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2795, pruned_loss=0.0651, ctc_loss=0.1134, over 3279486.01 frames. ], batch size: 228, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:24:22,409 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:24:28,588 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0
2023-10-09 21:24:41,874 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2877429.3333333335, ans=0.09899494936611666
2023-10-09 21:24:43,963 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2877429.3333333335, ans=0.1
2023-10-09 21:24:54,169 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=22.5
2023-10-09 21:24:54,850 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:24:59,549 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2877522.6666666665, ans=0.2
2023-10-09 21:25:23,224 INFO [train.py:1031] (3/4) Epoch 14, batch 31900, loss[loss=0.2041, simple_loss=0.2441, pruned_loss=0.06082, ctc_loss=0.1059, over 16366.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2751, pruned_loss=0.06492, ctc_loss=0.1131, over 3288702.40 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:25:27,510 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2877616.0, ans=0.125
2023-10-09 21:25:42,552 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2877662.6666666665, ans=0.125
2023-10-09 21:25:55,499 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=12.0
2023-10-09 21:26:03,164 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+02 3.194e+02 3.623e+02 4.182e+02 7.324e+02, threshold=7.246e+02, percent-clipped=0.0
2023-10-09 21:26:04,734 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2877756.0, ans=0.125
2023-10-09 21:26:11,012 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2877756.0, ans=0.0
2023-10-09 21:26:11,502 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0
2023-10-09 21:26:14,154 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0
2023-10-09 21:26:23,419 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2877802.6666666665, ans=0.1
2023-10-09 21:26:25,782 INFO [train.py:1031] (3/4) Epoch 14, batch 31950, loss[loss=0.206, simple_loss=0.2535, pruned_loss=0.05859, ctc_loss=0.1031, over 16665.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2683, pruned_loss=0.06138, ctc_loss=0.1073, over 3290436.62 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:26:32,744 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.18 vs. limit=5.0
2023-10-09 21:26:43,645 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2877896.0, ans=0.125
2023-10-09 21:26:55,817 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2877942.6666666665, ans=0.2
2023-10-09 21:26:58,647 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2877942.6666666665, ans=0.0
2023-10-09 21:26:59,677 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2877942.6666666665, ans=0.0
2023-10-09 21:27:21,268 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2878036.0, ans=0.125
2023-10-09 21:27:26,900 INFO [train.py:1031] (3/4) Epoch 14, batch 32000, loss[loss=0.2103, simple_loss=0.2607, pruned_loss=0.05942, ctc_loss=0.1029, over 16798.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2638, pruned_loss=0.06044, ctc_loss=0.106, over 3294539.60 frames. ], batch size: 329, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:27:37,858 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2878082.6666666665, ans=0.0
2023-10-09 21:27:49,260 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2023-10-09 21:27:54,464 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2878176.0, ans=0.1
2023-10-09 21:28:05,176 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:28:06,909 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+02 3.047e+02 3.544e+02 4.263e+02 6.076e+02, threshold=7.087e+02, percent-clipped=0.0
2023-10-09 21:28:13,525 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2878222.6666666665, ans=0.125
2023-10-09 21:28:20,038 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2878269.3333333335, ans=0.0
2023-10-09 21:28:30,105 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2878316.0, ans=0.1
2023-10-09 21:28:30,834 INFO [train.py:1031] (3/4) Epoch 14, batch 32050, loss[loss=0.2289, simple_loss=0.3328, pruned_loss=0.04494, ctc_loss=0.08792, over 15203.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2658, pruned_loss=0.05821, ctc_loss=0.1028, over 3299803.19 frames. ], batch size: 527, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:28:31,157 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2878316.0, ans=0.125
2023-10-09 21:28:37,400 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2878316.0, ans=0.0
2023-10-09 21:28:44,989 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=2.96 vs. limit=12.0
2023-10-09 21:28:47,914 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2878362.6666666665, ans=0.125
2023-10-09 21:28:49,341 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.78 vs. limit=15.0
2023-10-09 21:29:15,003 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2878456.0, ans=0.125
2023-10-09 21:29:27,767 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2878502.6666666665, ans=0.0
2023-10-09 21:29:33,756 INFO [train.py:1031] (3/4) Epoch 14, batch 32100, loss[loss=0.2297, simple_loss=0.3128, pruned_loss=0.05378, ctc_loss=0.09768, over 16862.00 frames. ], tot_loss[loss=0.2142, simple_loss=0.2734, pruned_loss=0.05723, ctc_loss=0.1015, over 3301303.80 frames. ], batch size: 309, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:29:41,248 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2878549.3333333335, ans=0.5
2023-10-09 21:30:12,348 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2878689.3333333335, ans=0.2
2023-10-09 21:30:13,061 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 2.944e+02 3.415e+02 4.141e+02 9.202e+02, threshold=6.830e+02, percent-clipped=4.0
2023-10-09 21:30:13,372 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2878689.3333333335, ans=0.0
2023-10-09 21:30:32,511 INFO [train.py:1031] (3/4) Epoch 14, batch 32150, loss[loss=0.1952, simple_loss=0.2517, pruned_loss=0.05096, ctc_loss=0.09179, over 16721.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.2726, pruned_loss=0.0557, ctc_loss=0.09819, over 3287590.35 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:30:46,320 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.37 vs. limit=10.0
2023-10-09 21:30:51,467 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2878829.3333333335, ans=0.0
2023-10-09 21:31:18,095 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2878922.6666666665, ans=0.125
2023-10-09 21:31:21,892 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2878969.3333333335, ans=0.0
2023-10-09 21:31:25,363 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2878969.3333333335, ans=0.0
2023-10-09 21:31:33,111 INFO [train.py:1031] (3/4) Epoch 14, batch 32200, loss[loss=0.1991, simple_loss=0.2485, pruned_loss=0.05533, ctc_loss=0.09768, over 16785.00 frames. ], tot_loss[loss=0.2103, simple_loss=0.2688, pruned_loss=0.05616, ctc_loss=0.09885, over 3296205.11 frames. ], batch size: 243, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:31:34,469 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2879016.0, ans=0.1
2023-10-09 21:31:35,831 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.38 vs. limit=15.0
2023-10-09 21:31:48,932 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2879062.6666666665, ans=0.125
2023-10-09 21:31:58,510 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2879109.3333333335, ans=0.07
2023-10-09 21:32:04,472 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2879109.3333333335, ans=0.1
2023-10-09 21:32:06,527 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2879109.3333333335, ans=0.125
2023-10-09 21:32:14,186 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+02 3.053e+02 3.349e+02 3.952e+02 6.213e+02, threshold=6.698e+02, percent-clipped=0.0
2023-10-09 21:32:29,436 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0
2023-10-09 21:32:32,674 INFO [train.py:1031] (3/4) Epoch 14, batch 32250, loss[loss=0.2011, simple_loss=0.2282, pruned_loss=0.06459, ctc_loss=0.1124, over 16013.00 frames. ], tot_loss[loss=0.2095, simple_loss=0.2648, pruned_loss=0.05707, ctc_loss=0.09986, over 3298345.22 frames. ], batch size: 463, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:32:40,633 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=22.5
2023-10-09 21:32:46,319 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2879296.0, ans=0.125
2023-10-09 21:32:53,970 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2879296.0, ans=0.125
2023-10-09 21:33:16,288 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2879389.3333333335, ans=0.125
2023-10-09 21:33:33,810 INFO [train.py:1031] (3/4) Epoch 14, batch 32300, loss[loss=0.2266, simple_loss=0.292, pruned_loss=0.05752, ctc_loss=0.1154, over 15437.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.2632, pruned_loss=0.05792, ctc_loss=0.1014, over 3292601.11 frames. ], batch size: 529, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:33:39,677 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2879482.6666666665, ans=15.0
2023-10-09 21:34:19,601 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.554e+02 3.404e+02 3.971e+02 4.753e+02 7.959e+02, threshold=7.942e+02, percent-clipped=3.0
2023-10-09 21:34:27,988 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2879669.3333333335, ans=0.1
2023-10-09 21:34:39,152 INFO [train.py:1031] (3/4) Epoch 14, batch 32350, loss[loss=0.2907, simple_loss=0.3611, pruned_loss=0.07925, ctc_loss=0.1548, over 16561.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2743, pruned_loss=0.05919, ctc_loss=0.1049, over 3296567.95 frames. ], batch size: 350, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:34:48,465 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. limit=10.0
2023-10-09 21:35:05,683 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2879809.3333333335, ans=0.2
2023-10-09 21:35:13,040 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2879809.3333333335, ans=0.1
2023-10-09 21:35:25,999 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2879856.0, ans=0.07
2023-10-09 21:35:27,330 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.19 vs. limit=15.0
2023-10-09 21:35:40,828 INFO [train.py:1031] (3/4) Epoch 14, batch 32400, loss[loss=0.2678, simple_loss=0.3105, pruned_loss=0.08271, ctc_loss=0.1494, over 16963.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.2782, pruned_loss=0.05932, ctc_loss=0.1058, over 3292936.16 frames. ], batch size: 309, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:35:47,969 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2879949.3333333335, ans=0.2
2023-10-09 21:36:03,821 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2879996.0, ans=0.0
2023-10-09 21:36:04,916 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2880042.6666666665, ans=0.1
2023-10-09 21:36:12,688 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2880042.6666666665, ans=0.0
2023-10-09 21:36:21,774 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2880089.3333333335, ans=0.125
2023-10-09 21:36:26,226 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.219e+02 3.562e+02 4.144e+02 6.944e+02, threshold=7.124e+02, percent-clipped=0.0
2023-10-09 21:36:34,017 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2880136.0, ans=0.0
2023-10-09 21:36:43,386 INFO [train.py:1031] (3/4) Epoch 14, batch 32450, loss[loss=0.1961, simple_loss=0.2485, pruned_loss=0.05364, ctc_loss=0.09104, over 16878.00 frames. ], tot_loss[loss=0.22, simple_loss=0.2757, pruned_loss=0.0606, ctc_loss=0.1078, over 3298282.08 frames. ], batch size: 243, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:36:44,941 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.36 vs. limit=15.0
2023-10-09 21:36:51,024 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2880182.6666666665, ans=0.0
2023-10-09 21:36:53,806 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2880182.6666666665, ans=0.125
2023-10-09 21:36:56,487 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2880229.3333333335, ans=0.1
2023-10-09 21:37:11,931 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2880276.0, ans=0.125
2023-10-09 21:37:44,364 INFO [train.py:1031] (3/4) Epoch 14, batch 32500, loss[loss=0.1744, simple_loss=0.2307, pruned_loss=0.04318, ctc_loss=0.079, over 16775.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.2699, pruned_loss=0.06046, ctc_loss=0.1072, over 3303127.28 frames. ], batch size: 202, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:37:58,133 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2880462.6666666665, ans=0.2
2023-10-09 21:37:58,246 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2880462.6666666665, ans=0.125
2023-10-09 21:37:58,363 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0
2023-10-09 21:38:00,883 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2880462.6666666665, ans=0.2
2023-10-09 21:38:13,502 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2880509.3333333335, ans=0.09899494936611666
2023-10-09 21:38:20,438 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.23 vs. limit=22.5
2023-10-09 21:38:29,652 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2880556.0, ans=0.04949747468305833
2023-10-09 21:38:31,981 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.513e+02 2.966e+02 3.455e+02 3.936e+02 8.435e+02, threshold=6.910e+02, percent-clipped=1.0
2023-10-09 21:38:35,439 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2880602.6666666665, ans=0.1
2023-10-09 21:38:46,475 INFO [train.py:1031] (3/4) Epoch 14, batch 32550, loss[loss=0.1367, simple_loss=0.2077, pruned_loss=0.02405, ctc_loss=0.04385, over 16701.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2624, pruned_loss=0.05584, ctc_loss=0.09922, over 3305562.97 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:39:21,596 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.22 vs. limit=15.0
2023-10-09 21:39:25,238 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2880789.3333333335, ans=0.1
2023-10-09 21:39:43,857 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2880836.0, ans=0.1
2023-10-09 21:39:46,870 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0
2023-10-09 21:39:47,248 INFO [train.py:1031] (3/4) Epoch 14, batch 32600, loss[loss=0.1773, simple_loss=0.2267, pruned_loss=0.04739, ctc_loss=0.08265, over 16817.00 frames. ], tot_loss[loss=0.2034, simple_loss=0.2585, pruned_loss=0.05475, ctc_loss=0.09704, over 3298883.79 frames. ], batch size: 141, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:40:15,831 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=22.5
2023-10-09 21:40:31,008 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0
2023-10-09 21:40:32,468 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0
2023-10-09 21:40:34,041 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.902e+02 3.409e+02 5.024e+02 1.088e+03, threshold=6.817e+02, percent-clipped=5.0
2023-10-09 21:40:38,255 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.89 vs. limit=15.0
2023-10-09 21:40:48,729 INFO [train.py:1031] (3/4) Epoch 14, batch 32650, loss[loss=0.1582, simple_loss=0.2276, pruned_loss=0.03335, ctc_loss=0.05511, over 11325.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2636, pruned_loss=0.05596, ctc_loss=0.0981, over 3295204.28 frames. ], batch size: 41, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:40:49,028 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2881116.0, ans=0.125
2023-10-09 21:41:20,709 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2881209.3333333335, ans=0.125
2023-10-09 21:41:23,924 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2881209.3333333335, ans=0.1
2023-10-09 21:41:25,313 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=22.5
2023-10-09 21:41:26,160 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2881256.0, ans=0.125
2023-10-09 21:41:39,745 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2881302.6666666665, ans=0.0
2023-10-09 21:41:48,180 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2881302.6666666665, ans=0.125
2023-10-09 21:41:52,684 INFO [train.py:1031] (3/4) Epoch 14, batch 32700, loss[loss=0.3033, simple_loss=0.3375, pruned_loss=0.0992, ctc_loss=0.177, over 16528.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2742, pruned_loss=0.05975, ctc_loss=0.1042, over 3291462.38 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:42:24,063 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2881442.6666666665, ans=0.1
2023-10-09 21:42:41,079 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2881489.3333333335, ans=0.0
2023-10-09 21:42:41,824 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.809e+02 3.539e+02 4.014e+02 5.290e+02 1.076e+03, threshold=8.028e+02, percent-clipped=8.0
2023-10-09 21:42:55,732 INFO [train.py:1031] (3/4) Epoch 14, batch 32750, loss[loss=0.3068, simple_loss=0.3263, pruned_loss=0.1064, ctc_loss=0.1868, over 16804.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2813, pruned_loss=0.06393, ctc_loss=0.1114, over 3288241.76 frames. ], batch size: 384, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:43:53,613 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:43:55,753 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2881816.0, ans=0.07
2023-10-09 21:43:57,092 INFO [train.py:1031] (3/4) Epoch 14, batch 32800, loss[loss=0.2027, simple_loss=0.28, pruned_loss=0.04694, ctc_loss=0.07887, over 16787.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2807, pruned_loss=0.06465, ctc_loss=0.1124, over 3302250.49 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:44:23,696 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:44:31,865 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=12.0
2023-10-09 21:44:46,103 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.217e+02 3.697e+02 4.305e+02 8.023e+02, threshold=7.395e+02, percent-clipped=0.0
2023-10-09 21:44:57,225 INFO [train.py:1031] (3/4) Epoch 14, batch 32850, loss[loss=0.1847, simple_loss=0.2425, pruned_loss=0.04798, ctc_loss=0.07739, over 16567.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2798, pruned_loss=0.06427, ctc_loss=0.1118, over 3310388.86 frames. ], batch size: 70, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:44:59,705 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2882049.3333333335, ans=0.125
2023-10-09 21:45:04,425 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2882049.3333333335, ans=0.0
2023-10-09 21:45:08,647 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2882096.0, ans=0.125
2023-10-09 21:45:11,927 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2882096.0, ans=0.0
2023-10-09 21:45:27,343 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2882142.6666666665, ans=0.0
2023-10-09 21:45:28,495 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2882142.6666666665, ans=0.125
2023-10-09 21:45:29,556 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2882142.6666666665, ans=0.125
2023-10-09 21:45:32,246 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=2882142.6666666665, ans=0.95
2023-10-09 21:45:51,297 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0
2023-10-09 21:45:59,355 INFO [train.py:1031] (3/4) Epoch 14, batch 32900, loss[loss=0.2269, simple_loss=0.2936, pruned_loss=0.05971, ctc_loss=0.1018, over 16701.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2812, pruned_loss=0.06447, ctc_loss=0.1123, over 3307052.61 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:46:10,846 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2882329.3333333335, ans=0.125
2023-10-09 21:46:19,143 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2882329.3333333335, ans=0.0
2023-10-09 21:46:35,132 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:46:41,672 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2882422.6666666665, ans=10.0
2023-10-09 21:46:51,627 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.541e+02 3.233e+02 3.650e+02 4.547e+02 8.623e+02, threshold=7.299e+02, percent-clipped=2.0
2023-10-09 21:47:02,671 INFO [train.py:1031] (3/4) Epoch 14, batch 32950, loss[loss=0.2488, simple_loss=0.3025, pruned_loss=0.0738, ctc_loss=0.1189, over 16669.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2877, pruned_loss=0.06532, ctc_loss=0.1141, over 3309309.08 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:47:13,342 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2882516.0, ans=0.0
2023-10-09 21:47:15,108 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2882562.6666666665, ans=0.125
2023-10-09 21:47:55,521 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5
2023-10-09 21:48:05,296 INFO [train.py:1031] (3/4) Epoch 14, batch 33000, loss[loss=0.2121, simple_loss=0.2697, pruned_loss=0.05703, ctc_loss=0.101, over 16712.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2897, pruned_loss=0.06755, ctc_loss=0.1182, over 3310664.46 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:48:05,297 INFO [train.py:1054] (3/4) Computing validation loss
2023-10-09 21:48:23,063 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2327, simple_loss=0.3031, pruned_loss=0.06268, ctc_loss=0.09218, over 1796401.00 frames.
2023-10-09 21:48:23,063 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB
2023-10-09 21:48:24,896 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=22.5
2023-10-09 21:48:30,965 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2882749.3333333335, ans=0.0
2023-10-09 21:48:39,827 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2882796.0, ans=0.2
2023-10-09 21:48:43,561 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2882796.0, ans=0.09899494936611666
2023-10-09 21:49:12,659 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2882936.0, ans=0.125
2023-10-09 21:49:13,417 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.435e+02 3.950e+02 5.096e+02 8.924e+02, threshold=7.899e+02, percent-clipped=1.0
2023-10-09 21:49:23,565 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0
2023-10-09 21:49:24,081 INFO [train.py:1031] (3/4) Epoch 14, batch 33050, loss[loss=0.2334, simple_loss=0.2821, pruned_loss=0.06915, ctc_loss=0.116, over 17049.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2877, pruned_loss=0.06799, ctc_loss=0.1187, over 3306171.27 frames. ], batch size: 259, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:49:25,877 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2882982.6666666665, ans=0.125
2023-10-09 21:50:09,037 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2883122.6666666665, ans=0.125
2023-10-09 21:50:25,684 INFO [train.py:1031] (3/4) Epoch 14, batch 33100, loss[loss=0.2448, simple_loss=0.2752, pruned_loss=0.08042, ctc_loss=0.1338, over 16646.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2848, pruned_loss=0.06772, ctc_loss=0.118, over 3295747.80 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:50:35,655 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=2883216.0, ans=0.5
2023-10-09 21:50:37,871 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2883262.6666666665, ans=0.125
2023-10-09 21:50:43,357 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2883262.6666666665, ans=0.1
2023-10-09 21:50:56,727 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2883309.3333333335, ans=0.0
2023-10-09 21:51:18,563 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.087e+02 3.637e+02 4.211e+02 8.906e+02, threshold=7.275e+02, percent-clipped=1.0
2023-10-09 21:51:28,123 INFO [train.py:1031] (3/4) Epoch 14, batch 33150, loss[loss=0.1714, simple_loss=0.2473, pruned_loss=0.03528, ctc_loss=0.06219, over 16774.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2821, pruned_loss=0.06479, ctc_loss=0.1136, over 3300965.06 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 4.0
], batch size: 188, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:51:35,145 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2883449.3333333335, ans=0.0 2023-10-09 21:51:58,061 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2883542.6666666665, ans=0.1 2023-10-09 21:52:05,801 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2883589.3333333335, ans=0.125 2023-10-09 21:52:10,454 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.73 vs. limit=10.0 2023-10-09 21:52:19,891 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2883636.0, ans=0.5 2023-10-09 21:52:19,899 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2883636.0, ans=0.125 2023-10-09 21:52:30,774 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2883682.6666666665, ans=0.125 2023-10-09 21:52:31,978 INFO [train.py:1031] (3/4) Epoch 14, batch 33200, loss[loss=0.1906, simple_loss=0.2429, pruned_loss=0.05079, ctc_loss=0.0921, over 16799.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.2826, pruned_loss=0.06389, ctc_loss=0.1128, over 3299977.12 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:52:32,371 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2883682.6666666665, ans=0.0 2023-10-09 21:52:49,100 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2883729.3333333335, ans=0.125 2023-10-09 21:52:52,925 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2883729.3333333335, ans=0.125 2023-10-09 21:52:58,817 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2883776.0, ans=0.0 2023-10-09 21:52:59,359 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.07 vs. limit=5.0 2023-10-09 21:53:24,752 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2023-10-09 21:53:25,122 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+02 3.107e+02 3.465e+02 4.067e+02 6.400e+02, threshold=6.930e+02, percent-clipped=0.0 2023-10-09 21:53:28,582 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2883869.3333333335, ans=0.1 2023-10-09 21:53:28,680 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2883869.3333333335, ans=10.0 2023-10-09 21:53:32,623 INFO [train.py:1031] (3/4) Epoch 14, batch 33250, loss[loss=0.1914, simple_loss=0.2464, pruned_loss=0.0503, ctc_loss=0.08977, over 16666.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2765, pruned_loss=0.06366, ctc_loss=0.112, over 3295963.31 frames. 
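
The scaling.py:199 entries print, for a named submodule parameter, the value ("ans") in effect at the current batch_count; these ScheduledFloat values are schedules over the global batch count rather than learned parameters. A minimal piecewise-linear re-implementation sketch (illustrative; the real scaling.py ScheduledFloat supports more than this):

    class PiecewiseLinearSchedule:
        """A value that changes piecewise-linearly with the global batch count,
        in the spirit of the ScheduledFloat values logged above."""
        def __init__(self, *points):
            # points: (batch_count, value) pairs
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    w = (batch_count - x0) / (x1 - x0)
                    return y0 + w * (y1 - y0)

    # e.g. a skip-rate that decays from 0.2 to 0.0 over the first 4000 batches
    # (endpoints assumed for illustration):
    skip_rate = PiecewiseLinearSchedule((0, 0.2), (4000, 0.0))
    print(skip_rate.value(2883449.33))  # -> 0.0, matching the flat late-training values
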
2023-10-09 21:53:40,487 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2883916.0, ans=0.1
2023-10-09 21:53:45,915 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2883962.6666666665, ans=0.0
2023-10-09 21:53:55,449 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2883962.6666666665, ans=0.0
2023-10-09 21:54:06,202 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2884009.3333333335, ans=0.125
2023-10-09 21:54:21,509 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2884102.6666666665, ans=0.0
2023-10-09 21:54:35,047 INFO [train.py:1031] (3/4) Epoch 14, batch 33300, loss[loss=0.237, simple_loss=0.272, pruned_loss=0.07498, ctc_loss=0.1303, over 16774.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2702, pruned_loss=0.06286, ctc_loss=0.1105, over 3303230.03 frames. ], batch size: 329, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:54:39,710 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2884149.3333333335, ans=0.2
2023-10-09 21:54:40,254 INFO [scaling.py:979] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.83 vs. limit=5.0
2023-10-09 21:55:00,201 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0
2023-10-09 21:55:03,572 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2884242.6666666665, ans=0.125
2023-10-09 21:55:08,364 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2884242.6666666665, ans=0.125
2023-10-09 21:55:12,939 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2884289.3333333335, ans=0.125
2023-10-09 21:55:32,062 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.141e+02 3.663e+02 4.502e+02 8.687e+02, threshold=7.326e+02, percent-clipped=2.0
2023-10-09 21:55:38,481 INFO [train.py:1031] (3/4) Epoch 14, batch 33350, loss[loss=0.2014, simple_loss=0.2623, pruned_loss=0.05217, ctc_loss=0.09041, over 16692.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.2744, pruned_loss=0.06303, ctc_loss=0.111, over 3302456.05 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:56:03,019 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2884476.0, ans=0.1
2023-10-09 21:56:04,943 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.82 vs. limit=15.0
2023-10-09 21:56:09,608 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2884476.0, ans=0.0
2023-10-09 21:56:09,662 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2884476.0, ans=0.1
2023-10-09 21:56:16,841 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=15.0
2023-10-09 21:56:23,877 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:56:39,483 INFO [train.py:1031] (3/4) Epoch 14, batch 33400, loss[loss=0.2588, simple_loss=0.3029, pruned_loss=0.08, ctc_loss=0.1367, over 16618.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2793, pruned_loss=0.0635, ctc_loss=0.1121, over 3311445.30 frames. ], batch size: 351, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:56:44,704 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2884616.0, ans=0.125
2023-10-09 21:57:22,808 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=15.0
2023-10-09 21:57:36,673 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.586e+02 3.309e+02 3.821e+02 4.723e+02 1.099e+03, threshold=7.641e+02, percent-clipped=5.0
2023-10-09 21:57:37,258 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.40 vs. limit=10.0
2023-10-09 21:57:40,278 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2884802.6666666665, ans=0.07
2023-10-09 21:57:42,137 INFO [train.py:1031] (3/4) Epoch 14, batch 33450, loss[loss=0.2186, simple_loss=0.2814, pruned_loss=0.05822, ctc_loss=0.0984, over 16893.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2816, pruned_loss=0.06419, ctc_loss=0.113, over 3313311.07 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:57:42,509 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2884849.3333333335, ans=0.05
2023-10-09 21:57:43,615 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2884849.3333333335, ans=0.125
2023-10-09 21:57:51,628 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2884849.3333333335, ans=0.0
2023-10-09 21:57:53,500 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2884849.3333333335, ans=15.0
2023-10-09 21:58:00,363 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=15.0
2023-10-09 21:58:06,297 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2884896.0, ans=0.2
2023-10-09 21:58:26,310 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:58:27,443 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2884989.3333333335, ans=0.1
2023-10-09 21:58:39,766 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2885036.0, ans=0.0
2023-10-09 21:58:47,411 INFO [train.py:1031] (3/4) Epoch 14, batch 33500, loss[loss=0.226, simple_loss=0.3017, pruned_loss=0.05561, ctc_loss=0.0977, over 16176.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2836, pruned_loss=0.06387, ctc_loss=0.1112, over 3314326.36 frames. ], batch size: 463, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:59:02,525 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0
2023-10-09 21:59:33,925 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2885222.6666666665, ans=0.125
2023-10-09 21:59:46,063 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+02 3.525e+02 4.202e+02 5.122e+02 8.777e+02, threshold=8.403e+02, percent-clipped=5.0
2023-10-09 21:59:48,884 INFO [train.py:1031] (3/4) Epoch 14, batch 33550, loss[loss=0.2283, simple_loss=0.2689, pruned_loss=0.06905, ctc_loss=0.1237, over 16734.00 frames. ], tot_loss[loss=0.2257, simple_loss=0.2798, pruned_loss=0.06364, ctc_loss=0.1107, over 3308226.38 frames. ], batch size: 242, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:59:53,612 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:59:54,568 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2885316.0, ans=0.04949747468305833
2023-10-09 22:00:31,383 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2885456.0, ans=0.07
2023-10-09 22:00:43,075 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2885502.6666666665, ans=0.0
2023-10-09 22:00:49,676 INFO [train.py:1031] (3/4) Epoch 14, batch 33600, loss[loss=0.208, simple_loss=0.252, pruned_loss=0.06091, ctc_loss=0.1055, over 16829.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2737, pruned_loss=0.06289, ctc_loss=0.1093, over 3309063.65 frames. ], batch size: 95, lr: 2.53e-03, grad_scale: 2.0
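
The grad_scale figure in the train.py:1031 entries (8.0, 4.0 and 2.0 in the surrounding lines) is the dynamic loss scale of fp16 training: it is halved when scaled gradients overflow and grown back after a run of clean steps. A minimal sketch with the standard torch.cuda.amp.GradScaler API (the growth settings are assumptions; the actual training-loop wiring may differ):

    import torch

    scaler = torch.cuda.amp.GradScaler(growth_interval=2000)  # assumed setting

    def train_step(model, optimizer, compute_loss):
        """One fp16 step; compute_loss is any callable returning a scalar loss."""
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss()
        scaler.scale(loss).backward()   # backprop with the scaled loss
        scaler.step(optimizer)          # skips the step on inf/nan gradients
        scaler.update()                 # halves the scale on overflow, grows it otherwise
        return scaler.get_scale()       # the "grad_scale" value seen in the log
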
2023-10-09 22:01:08,521 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2885596.0, ans=0.0
2023-10-09 22:01:33,851 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2885689.3333333335, ans=0.0
2023-10-09 22:01:45,622 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2885736.0, ans=0.125
2023-10-09 22:01:48,017 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 3.170e+02 3.773e+02 4.552e+02 1.576e+03, threshold=7.545e+02, percent-clipped=1.0
2023-10-09 22:01:48,407 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2885782.6666666665, ans=0.2
2023-10-09 22:01:49,718 INFO [train.py:1031] (3/4) Epoch 14, batch 33650, loss[loss=0.2165, simple_loss=0.2646, pruned_loss=0.06201, ctc_loss=0.1109, over 16820.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2691, pruned_loss=0.06301, ctc_loss=0.1098, over 3304084.43 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:01:51,038 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2885782.6666666665, ans=0.5
2023-10-09 22:02:05,680 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2885829.3333333335, ans=0.1
2023-10-09 22:02:12,631 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2885829.3333333335, ans=0.125
2023-10-09 22:02:36,843 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0
2023-10-09 22:02:51,315 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2886016.0, ans=0.125
2023-10-09 22:02:52,473 INFO [train.py:1031] (3/4) Epoch 14, batch 33700, loss[loss=0.2696, simple_loss=0.3255, pruned_loss=0.08049, ctc_loss=0.1315, over 17036.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2734, pruned_loss=0.06497, ctc_loss=0.1127, over 3307606.77 frames. ], batch size: 91, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:03:09,281 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2886062.6666666665, ans=0.0
2023-10-09 22:03:32,430 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2886156.0, ans=0.0
2023-10-09 22:03:33,430 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2886156.0, ans=0.0
2023-10-09 22:03:52,832 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+02 3.271e+02 3.899e+02 4.405e+02 9.865e+02, threshold=7.797e+02, percent-clipped=1.0
2023-10-09 22:03:52,873 INFO [train.py:1031] (3/4) Epoch 14, batch 33750, loss[loss=0.2137, simple_loss=0.2466, pruned_loss=0.0665, ctc_loss=0.1193, over 15515.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2772, pruned_loss=0.06701, ctc_loss=0.1162, over 3310758.58 frames. ], batch size: 527, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:03:53,906 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2886249.3333333335, ans=0.125
2023-10-09 22:04:07,429 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2886296.0, ans=0.2
2023-10-09 22:04:10,210 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2886296.0, ans=0.125
2023-10-09 22:04:11,262 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2886296.0, ans=0.125
2023-10-09 22:04:15,530 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2023-10-09 22:04:42,034 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2886436.0, ans=0.125
2023-10-09 22:04:54,305 INFO [train.py:1031] (3/4) Epoch 14, batch 33800, loss[loss=0.1882, simple_loss=0.2402, pruned_loss=0.05004, ctc_loss=0.09057, over 16792.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2766, pruned_loss=0.06667, ctc_loss=0.1154, over 3310105.56 frames. ], batch size: 228, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:04:57,410 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2886482.6666666665, ans=0.09899494936611666
2023-10-09 22:04:58,910 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=22.5
2023-10-09 22:05:08,198 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2886529.3333333335, ans=0.125
2023-10-09 22:05:23,441 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2886576.0, ans=0.2
2023-10-09 22:05:25,835 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.38 vs. limit=15.0
2023-10-09 22:05:41,162 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:05:41,356 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.65 vs. limit=10.0
2023-10-09 22:05:55,360 INFO [train.py:1031] (3/4) Epoch 14, batch 33850, loss[loss=0.1999, simple_loss=0.2514, pruned_loss=0.05437, ctc_loss=0.09909, over 16829.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2709, pruned_loss=0.06521, ctc_loss=0.1132, over 3301565.68 frames. ], batch size: 242, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:05:56,412 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+02 3.178e+02 3.599e+02 4.092e+02 7.716e+02, threshold=7.198e+02, percent-clipped=0.0
2023-10-09 22:06:20,380 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.22 vs. limit=22.5
2023-10-09 22:06:21,671 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0
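
In the optim.py:471 entries, the five numbers after "grad-norm quartiles" are min/25%/median/75%/max of recent gradient norms, and the threshold tracks 2.0 x the logged median (e.g. 7.198e+02 = 2 x 3.599e+02 just above), with percent-clipped reporting how often the norm exceeded it. A small sketch of median-relative clipping in that spirit (illustrative; not the actual ScaledAdam code):

    from collections import deque
    import torch

    class MedianGradClipper:
        """Clip the global grad norm at clipping_scale * median of recent norms."""
        def __init__(self, clipping_scale=2.0, window=128):
            self.clipping_scale = clipping_scale
            self.history = deque(maxlen=window)

        def __call__(self, parameters):
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
            self.history.append(float(norm))
            q = torch.quantile(torch.tensor(list(self.history)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * float(q[2])  # 2.0 x median
            if float(norm) > threshold:                    # counted in "percent-clipped"
                for p in params:
                    p.grad.mul_(threshold / float(norm))
            return q, threshold
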
2023-10-09 22:06:24,686 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2886809.3333333335, ans=0.1
2023-10-09 22:06:38,224 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.56 vs. limit=15.0
2023-10-09 22:06:56,611 INFO [train.py:1031] (3/4) Epoch 14, batch 33900, loss[loss=0.1888, simple_loss=0.2437, pruned_loss=0.05064, ctc_loss=0.08177, over 16631.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2732, pruned_loss=0.06535, ctc_loss=0.1131, over 3310157.45 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:07:09,110 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2886996.0, ans=0.125
2023-10-09 22:07:46,623 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0
2023-10-09 22:07:54,014 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2887136.0, ans=0.0
2023-10-09 22:07:59,532 INFO [train.py:1031] (3/4) Epoch 14, batch 33950, loss[loss=0.2292, simple_loss=0.2462, pruned_loss=0.0805, ctc_loss=0.1282, over 10863.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.282, pruned_loss=0.06398, ctc_loss=0.1113, over 3295765.36 frames. ], batch size: 36, lr: 2.53e-03, grad_scale: 1.0
2023-10-09 22:08:03,404 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.763e+02 3.464e+02 4.205e+02 4.959e+02 7.578e+02, threshold=8.409e+02, percent-clipped=4.0
2023-10-09 22:08:16,292 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2887229.3333333335, ans=0.0
2023-10-09 22:08:23,383 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2887276.0, ans=0.125
2023-10-09 22:08:26,580 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.09 vs. limit=15.0
2023-10-09 22:08:40,373 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2887322.6666666665, ans=0.125
2023-10-09 22:08:42,008 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2887322.6666666665, ans=0.1
2023-10-09 22:08:44,086 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2887322.6666666665, ans=0.125
2023-10-09 22:08:54,472 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2887369.3333333335, ans=0.125
2023-10-09 22:09:02,846 INFO [train.py:1031] (3/4) Epoch 14, batch 34000, loss[loss=0.2741, simple_loss=0.3651, pruned_loss=0.06598, ctc_loss=0.1279, over 16616.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2955, pruned_loss=0.06271, ctc_loss=0.1114, over 3289355.94 frames. ], batch size: 351, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:09:05,604 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=15.0
2023-10-09 22:09:15,799 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0
2023-10-09 22:09:19,676 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2887462.6666666665, ans=0.125
2023-10-09 22:09:24,111 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2887462.6666666665, ans=0.125
2023-10-09 22:09:29,805 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2887509.3333333335, ans=0.125
2023-10-09 22:09:49,570 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0
2023-10-09 22:10:03,086 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2887649.3333333335, ans=0.0
2023-10-09 22:10:03,844 INFO [train.py:1031] (3/4) Epoch 14, batch 34050, loss[loss=0.219, simple_loss=0.2894, pruned_loss=0.05438, ctc_loss=0.09961, over 16802.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2947, pruned_loss=0.06068, ctc_loss=0.1085, over 3291964.18 frames. ], batch size: 272, lr: 2.53e-03, grad_scale: 1.0
2023-10-09 22:10:05,150 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2887649.3333333335, ans=0.125
2023-10-09 22:10:08,601 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+02 3.098e+02 3.845e+02 4.884e+02 8.519e+02, threshold=7.690e+02, percent-clipped=1.0
2023-10-09 22:10:08,845 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2887649.3333333335, ans=0.125
2023-10-09 22:10:08,993 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2887649.3333333335, ans=0.1
2023-10-09 22:10:13,951 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.74 vs. limit=6.0
2023-10-09 22:10:20,474 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2887696.0, ans=0.07
2023-10-09 22:10:23,493 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.10 vs. limit=6.0
2023-10-09 22:10:40,748 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:11:04,709 INFO [train.py:1031] (3/4) Epoch 14, batch 34100, loss[loss=0.2297, simple_loss=0.2856, pruned_loss=0.06532, ctc_loss=0.1082, over 12502.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2917, pruned_loss=0.06186, ctc_loss=0.1101, over 3285989.64 frames. ], batch size: 40, lr: 2.53e-03, grad_scale: 2.0
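
Each scaling.py:979 "Whitening" entry compares a per-module statistic against a limit; the metric is smallest (1.0) when the channel covariance of the module's output is a multiple of the identity and grows as the channels become more correlated or unevenly scaled. One way to compute such a metric (an illustrative reconstruction, not the actual scaling.py code):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """1.0 when the per-group channel covariance of x is a multiple of I;
        larger values mean less 'white' features (illustrative)."""
        n, c = x.shape
        g = c // num_groups
        metrics = []
        for i in range(num_groups):
            xg = x[:, i * g:(i + 1) * g]
            xg = xg - xg.mean(dim=0)
            cov = (xg.T @ xg) / n
            # Sum of squared eigenvalues relative to its minimum possible value
            # for the same trace; equals 1.0 when cov is a multiple of I.
            metrics.append(float((cov * cov).sum() * g / cov.trace() ** 2))
        return sum(metrics) / num_groups

    x = torch.randn(1000, 384)   # nearly-white features
    print(whitening_metric(x))   # ~1.0, well under a limit like 15.0 or 22.5
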
2023-10-09 22:11:06,138 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2887882.6666666665, ans=0.125
2023-10-09 22:11:20,354 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2887929.3333333335, ans=0.1
2023-10-09 22:11:52,206 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2888022.6666666665, ans=0.125
2023-10-09 22:12:05,998 INFO [train.py:1031] (3/4) Epoch 14, batch 34150, loss[loss=0.2516, simple_loss=0.3281, pruned_loss=0.06445, ctc_loss=0.1155, over 13549.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.294, pruned_loss=0.06442, ctc_loss=0.1142, over 3287175.04 frames. ], batch size: 37, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:12:11,415 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+02 3.257e+02 3.702e+02 4.193e+02 7.598e+02, threshold=7.404e+02, percent-clipped=0.0
2023-10-09 22:12:13,919 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2888116.0, ans=0.1
2023-10-09 22:12:15,039 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2888116.0, ans=0.125
2023-10-09 22:12:24,453 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2888162.6666666665, ans=0.125
2023-10-09 22:12:35,613 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2888209.3333333335, ans=0.125
2023-10-09 22:12:46,602 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2888256.0, ans=0.125
2023-10-09 22:13:08,599 INFO [train.py:1031] (3/4) Epoch 14, batch 34200, loss[loss=0.2238, simple_loss=0.2596, pruned_loss=0.07049, ctc_loss=0.1174, over 16793.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2909, pruned_loss=0.06542, ctc_loss=0.1159, over 3293043.27 frames. ], batch size: 131, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:13:31,960 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2888442.6666666665, ans=0.125
2023-10-09 22:13:33,997 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2888442.6666666665, ans=0.0
2023-10-09 22:13:34,083 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2888442.6666666665, ans=0.125
2023-10-09 22:13:48,747 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=15.0
2023-10-09 22:13:51,990 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2888489.3333333335, ans=0.125
2023-10-09 22:13:52,005 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2888489.3333333335, ans=0.125
2023-10-09 22:13:54,304 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0
2023-10-09 22:14:09,170 INFO [train.py:1031] (3/4) Epoch 14, batch 34250, loss[loss=0.2032, simple_loss=0.2632, pruned_loss=0.05212, ctc_loss=0.09735, over 16813.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2826, pruned_loss=0.06385, ctc_loss=0.1131, over 3295025.71 frames. ], batch size: 228, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:14:13,912 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2888582.6666666665, ans=0.2
2023-10-09 22:14:15,724 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.386e+02 3.191e+02 3.616e+02 4.129e+02 7.013e+02, threshold=7.231e+02, percent-clipped=0.0
2023-10-09 22:14:18,144 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.42 vs. limit=10.0
2023-10-09 22:14:18,951 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0
2023-10-09 22:14:30,581 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2888629.3333333335, ans=0.125
2023-10-09 22:14:35,023 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2888676.0, ans=0.125
2023-10-09 22:14:54,099 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0
2023-10-09 22:15:04,359 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2888769.3333333335, ans=0.1
2023-10-09 22:15:10,738 INFO [train.py:1031] (3/4) Epoch 14, batch 34300, loss[loss=0.2285, simple_loss=0.2658, pruned_loss=0.07296, ctc_loss=0.1131, over 16711.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2804, pruned_loss=0.06387, ctc_loss=0.1126, over 3297520.15 frames. ], batch size: 111, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:15:45,852 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2888956.0, ans=0.0
2023-10-09 22:15:58,940 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0
2023-10-09 22:16:01,111 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2889002.6666666665, ans=0.0
2023-10-09 22:16:01,481 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0
2023-10-09 22:16:06,282 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.49 vs. limit=15.0
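
The train.py:1031 entries report loss alongside simple_loss, pruned_loss and ctc_loss, and the printed loss behaves like a weighted sum of the three. Weights of 0.5 on simple_loss and 0.2 on ctc_loss are consistent with the numbers here (e.g. 0.5*0.2826 + 0.06385 + 0.2*0.1131 ~ 0.2278 for the batch 34250 tot_loss entry above); a sketch of that combination (the scales are inferred, not quoted from the training code):

    def combined_loss(simple_loss, pruned_loss, ctc_loss,
                      simple_scale=0.5, ctc_scale=0.2):
        """Weighted total of the three loss terms (scales assumed here;
        they are consistent with the logged numbers)."""
        return simple_scale * simple_loss + pruned_loss + ctc_scale * ctc_loss

    # From the "batch 34250" tot_loss entry:
    print(combined_loss(0.2826, 0.06385, 0.1131))  # ~0.2278
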
2023-10-09 22:16:07,705 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2889002.6666666665, ans=0.125
2023-10-09 22:16:08,059 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.27 vs. limit=22.5
2023-10-09 22:16:09,862 INFO [train.py:1031] (3/4) Epoch 14, batch 34350, loss[loss=0.2195, simple_loss=0.2718, pruned_loss=0.06199, ctc_loss=0.1079, over 17005.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2798, pruned_loss=0.06454, ctc_loss=0.1133, over 3306681.77 frames. ], batch size: 203, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 22:16:13,470 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2889049.3333333335, ans=0.0
2023-10-09 22:16:16,844 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.449e+02 3.283e+02 3.799e+02 4.453e+02 1.021e+03, threshold=7.599e+02, percent-clipped=4.0
2023-10-09 22:16:24,081 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.47 vs. limit=15.0
2023-10-09 22:16:46,363 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2889189.3333333335, ans=0.125
2023-10-09 22:16:50,066 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2889189.3333333335, ans=0.2
2023-10-09 22:16:51,026 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2889189.3333333335, ans=0.125
2023-10-09 22:17:01,168 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2889236.0, ans=0.2
2023-10-09 22:17:08,269 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0
2023-10-09 22:17:10,486 INFO [train.py:1031] (3/4) Epoch 14, batch 34400, loss[loss=0.1935, simple_loss=0.2261, pruned_loss=0.05901, ctc_loss=0.1073, over 15567.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2783, pruned_loss=0.06421, ctc_loss=0.1126, over 3307259.37 frames. ], batch size: 526, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:17:29,682 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0
2023-10-09 22:18:11,064 INFO [train.py:1031] (3/4) Epoch 14, batch 34450, loss[loss=0.2198, simple_loss=0.2696, pruned_loss=0.06244, ctc_loss=0.1128, over 16434.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2784, pruned_loss=0.06553, ctc_loss=0.1154, over 3306584.98 frames. ], batch size: 466, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:18:19,264 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.547e+02 3.186e+02 3.591e+02 4.331e+02 7.838e+02, threshold=7.182e+02, percent-clipped=2.0
2023-10-09 22:18:28,983 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2889562.6666666665, ans=0.1
2023-10-09 22:18:45,500 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2889609.3333333335, ans=0.1
2023-10-09 22:19:02,447 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2889702.6666666665, ans=0.125
2023-10-09 22:19:14,167 INFO [train.py:1031] (3/4) Epoch 14, batch 34500, loss[loss=0.2498, simple_loss=0.3316, pruned_loss=0.06062, ctc_loss=0.1171, over 16868.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2838, pruned_loss=0.06689, ctc_loss=0.1176, over 3297563.88 frames. ], batch size: 242, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 22:19:39,927 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2889842.6666666665, ans=0.125
2023-10-09 22:19:54,711 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2889889.3333333335, ans=0.125
2023-10-09 22:19:55,734 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2889889.3333333335, ans=0.125
2023-10-09 22:20:00,871 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0
2023-10-09 22:20:20,485 INFO [train.py:1031] (3/4) Epoch 14, batch 34550, loss[loss=0.2704, simple_loss=0.3398, pruned_loss=0.0724, ctc_loss=0.1404, over 16865.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2914, pruned_loss=0.06625, ctc_loss=0.1171, over 3296566.65 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:20:21,769 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2889982.6666666665, ans=0.0
2023-10-09 22:20:27,838 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2889982.6666666665, ans=0.2
2023-10-09 22:20:28,985 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2889982.6666666665, ans=0.125
2023-10-09 22:20:30,359 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.650e+02 3.661e+02 4.529e+02 6.004e+02 9.470e+02, threshold=9.059e+02, percent-clipped=10.0
2023-10-09 22:20:32,433 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2890029.3333333335, ans=0.0
2023-10-09 22:21:03,782 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2890122.6666666665, ans=0.125
2023-10-09 22:21:03,932 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=22.5
2023-10-09 22:21:24,118 INFO [train.py:1031] (3/4) Epoch 14, batch 34600, loss[loss=0.1833, simple_loss=0.2576, pruned_loss=0.03911, ctc_loss=0.07718, over 16956.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2911, pruned_loss=0.06502, ctc_loss=0.1153, over 3290414.75 frames. ], batch size: 259, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:21:31,483 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2890216.0, ans=0.125
2023-10-09 22:21:33,056 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2890216.0, ans=0.0
2023-10-09 22:21:52,594 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=15.0
2023-10-09 22:21:55,991 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2890309.3333333335, ans=0.125
2023-10-09 22:22:02,186 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2890356.0, ans=0.04949747468305833
2023-10-09 22:22:18,619 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2890402.6666666665, ans=0.2
2023-10-09 22:22:22,797 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=22.5
2023-10-09 22:22:25,234 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2890449.3333333335, ans=0.125
2023-10-09 22:22:25,918 INFO [train.py:1031] (3/4) Epoch 14, batch 34650, loss[loss=0.2234, simple_loss=0.281, pruned_loss=0.06087, ctc_loss=0.11, over 16972.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2841, pruned_loss=0.06125, ctc_loss=0.1088, over 3292072.02 frames. ], batch size: 293, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:22:29,246 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0
2023-10-09 22:22:32,529 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2890449.3333333335, ans=0.125
2023-10-09 22:22:37,035 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+02 2.928e+02 3.445e+02 4.113e+02 6.666e+02, threshold=6.890e+02, percent-clipped=0.0
2023-10-09 22:22:43,909 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2890496.0, ans=0.0
2023-10-09 22:22:44,847 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2890496.0, ans=0.04949747468305833
2023-10-09 22:23:27,780 INFO [train.py:1031] (3/4) Epoch 14, batch 34700, loss[loss=0.2407, simple_loss=0.2895, pruned_loss=0.07006, ctc_loss=0.1294, over 16940.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2843, pruned_loss=0.06311, ctc_loss=0.1116, over 3296122.14 frames. ], batch size: 258, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:23:31,297 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.37 vs. limit=6.0
2023-10-09 22:23:43,626 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=15.0
2023-10-09 22:23:51,047 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2890729.3333333335, ans=0.0
2023-10-09 22:23:57,205 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2890776.0, ans=0.125
2023-10-09 22:24:03,526 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2890776.0, ans=0.1
2023-10-09 22:24:03,644 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2890776.0, ans=0.0
2023-10-09 22:24:05,290 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2890776.0, ans=0.125
2023-10-09 22:24:05,333 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2890776.0, ans=0.125
2023-10-09 22:24:31,584 INFO [train.py:1031] (3/4) Epoch 14, batch 34750, loss[loss=0.2623, simple_loss=0.3099, pruned_loss=0.08012, ctc_loss=0.1361, over 16748.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2884, pruned_loss=0.06626, ctc_loss=0.1163, over 3292842.65 frames. ], batch size: 272, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:24:35,375 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.40 vs. limit=15.0
2023-10-09 22:24:42,690 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.997e+02 3.549e+02 4.003e+02 4.772e+02 8.039e+02, threshold=8.005e+02, percent-clipped=2.0
2023-10-09 22:24:54,536 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2891009.3333333335, ans=0.1
2023-10-09 22:25:24,730 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2891102.6666666665, ans=0.0
2023-10-09 22:25:27,277 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0
2023-10-09 22:25:30,494 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2891149.3333333335, ans=0.0
2023-10-09 22:25:31,246 INFO [train.py:1031] (3/4) Epoch 14, batch 34800, loss[loss=0.2713, simple_loss=0.2981, pruned_loss=0.08915, ctc_loss=0.1655, over 16833.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.2871, pruned_loss=0.06719, ctc_loss=0.1175, over 3289055.70 frames. ], batch size: 384, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:25:59,528 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2891242.6666666665, ans=0.125
2023-10-09 22:26:06,059 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2891242.6666666665, ans=0.1
2023-10-09 22:26:11,958 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2891289.3333333335, ans=0.07
2023-10-09 22:26:32,579 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2891382.6666666665, ans=0.125
2023-10-09 22:26:33,344 INFO [train.py:1031] (3/4) Epoch 14, batch 34850, loss[loss=0.214, simple_loss=0.2664, pruned_loss=0.05974, ctc_loss=0.1053, over 16741.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2845, pruned_loss=0.06768, ctc_loss=0.1182, over 3298118.11 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:26:34,893 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.63 vs. limit=10.0
2023-10-09 22:26:46,834 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+02 3.209e+02 3.596e+02 4.244e+02 8.793e+02, threshold=7.192e+02, percent-clipped=1.0
2023-10-09 22:26:54,528 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0
2023-10-09 22:26:58,397 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2891476.0, ans=0.0
2023-10-09 22:27:02,871 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.85 vs. limit=15.0
2023-10-09 22:27:12,814 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2891522.6666666665, ans=0.0
2023-10-09 22:27:23,728 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:27:28,207 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2891569.3333333335, ans=0.0
2023-10-09 22:27:34,013 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2891569.3333333335, ans=0.125
2023-10-09 22:27:35,832 INFO [train.py:1031] (3/4) Epoch 14, batch 34900, loss[loss=0.205, simple_loss=0.2654, pruned_loss=0.05446, ctc_loss=0.08916, over 16654.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2788, pruned_loss=0.06676, ctc_loss=0.1165, over 3296483.25 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 4.0
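
The scaling.py:1069 "WithLoss" entries report the summed auxiliary loss attached to a module's attention weights (0.000e+00 above, i.e. the penalty did not fire on that batch). One common way to attach such a side loss to an intermediate tensor without changing its forward value is a custom autograd function; a minimal sketch of the pattern (illustrative, not the actual scaling.py code):

    import torch

    class AttachAuxLoss(torch.autograd.Function):
        """Identity in the forward pass; in backward, adds the gradient of
        aux_loss_fn(x) to x's incoming gradient, so the penalty is trained
        as a side objective without altering the module's output."""
        @staticmethod
        def forward(ctx, x, aux_loss_fn):
            ctx.aux_loss_fn = aux_loss_fn
            ctx.save_for_backward(x)
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            with torch.enable_grad():
                xd = x.detach().requires_grad_(True)
                aux = ctx.aux_loss_fn(xd)
                (g,) = torch.autograd.grad(aux, xd)
            return grad_out + g, None

    # Example: penalize attention weights whose magnitude exceeds 10.
    attn = torch.randn(4, 8, requires_grad=True)
    penalty = lambda t: torch.relu(t.abs() - 10.0).sum()
    out = AttachAuxLoss.apply(attn, penalty)
    print(float(penalty(attn)))  # the logged "loss-sum"; 0.0 when nothing exceeds the limit
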
2023-10-09 22:28:04,061 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=2891709.3333333335, ans=22.5
2023-10-09 22:28:05,959 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2891709.3333333335, ans=0.125
2023-10-09 22:28:09,479 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2891709.3333333335, ans=0.0
2023-10-09 22:28:25,431 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2891802.6666666665, ans=0.0
2023-10-09 22:28:38,935 INFO [train.py:1031] (3/4) Epoch 14, batch 34950, loss[loss=0.2198, simple_loss=0.2799, pruned_loss=0.05966, ctc_loss=0.1009, over 13246.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2803, pruned_loss=0.06653, ctc_loss=0.1163, over 3293181.05 frames. ], batch size: 38, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:28:54,411 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+02 3.348e+02 3.779e+02 4.801e+02 1.162e+03, threshold=7.559e+02, percent-clipped=3.0
2023-10-09 22:28:58,610 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2891896.0, ans=0.0
2023-10-09 22:29:07,064 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2891942.6666666665, ans=0.1
2023-10-09 22:29:21,814 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2891989.3333333335, ans=0.2
2023-10-09 22:29:30,915 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2892036.0, ans=0.09899494936611666
2023-10-09 22:29:42,586 INFO [train.py:1031] (3/4) Epoch 14, batch 35000, loss[loss=0.181, simple_loss=0.2529, pruned_loss=0.04117, ctc_loss=0.06695, over 16735.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2815, pruned_loss=0.06579, ctc_loss=0.1153, over 3290517.30 frames. ], batch size: 151, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:29:52,435 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=12.0
2023-10-09 22:30:00,392 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:30:02,579 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2892129.3333333335, ans=0.95
2023-10-09 22:30:48,022 INFO [train.py:1031] (3/4) Epoch 14, batch 35050, loss[loss=0.2534, simple_loss=0.3063, pruned_loss=0.07251, ctc_loss=0.1384, over 16707.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2831, pruned_loss=0.06481, ctc_loss=0.1141, over 3287152.05 frames. ], batch size: 272, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:30:48,433 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2892316.0, ans=0.1
2023-10-09 22:30:54,902 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2892316.0, ans=0.0
2023-10-09 22:31:04,383 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+02 3.170e+02 3.753e+02 4.510e+02 9.970e+02, threshold=7.506e+02, percent-clipped=2.0
2023-10-09 22:31:07,408 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2892362.6666666665, ans=0.2
2023-10-09 22:31:24,032 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0
2023-10-09 22:31:26,767 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2892456.0, ans=0.125
2023-10-09 22:31:27,995 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=12.0
2023-10-09 22:31:33,100 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2892456.0, ans=0.125
2023-10-09 22:31:41,848 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2892502.6666666665, ans=0.125
2023-10-09 22:31:51,701 INFO [train.py:1031] (3/4) Epoch 14, batch 35100, loss[loss=0.2298, simple_loss=0.2925, pruned_loss=0.06317, ctc_loss=0.102, over 16739.00 frames. ], tot_loss[loss=0.23, simple_loss=0.286, pruned_loss=0.06417, ctc_loss=0.1139, over 3298069.10 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:32:09,426 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=22.5
2023-10-09 22:32:14,325 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2892596.0, ans=0.1
2023-10-09 22:32:14,432 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:32:36,064 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2892689.3333333335, ans=0.125
2023-10-09 22:32:54,765 INFO [train.py:1031] (3/4) Epoch 14, batch 35150, loss[loss=0.263, simple_loss=0.3187, pruned_loss=0.07634, ctc_loss=0.1368, over 16778.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2872, pruned_loss=0.06452, ctc_loss=0.1143, over 3305394.62 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:33:05,442 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2892782.6666666665, ans=0.125
2023-10-09 22:33:09,837 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2892829.3333333335, ans=0.125
2023-10-09 22:33:12,901 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.271e+02 3.877e+02 4.489e+02 9.044e+02, threshold=7.754e+02, percent-clipped=1.0
2023-10-09 22:33:26,832 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=12.0
2023-10-09 22:33:42,077 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2892922.6666666665, ans=0.1
2023-10-09 22:33:56,335 INFO [train.py:1031] (3/4) Epoch 14, batch 35200, loss[loss=0.1796, simple_loss=0.2586, pruned_loss=0.03765, ctc_loss=0.06321, over 16859.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2869, pruned_loss=0.06233, ctc_loss=0.1103, over 3304135.65 frames. ], batch size: 202, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:33:56,671 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2893016.0, ans=0.1
2023-10-09 22:34:07,244 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2893062.6666666665, ans=0.1
2023-10-09 22:34:32,748 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2893156.0, ans=0.0
2023-10-09 22:34:37,678 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2893156.0, ans=0.0
2023-10-09 22:34:43,449 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2893156.0, ans=0.125
2023-10-09 22:34:50,267 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=2893202.6666666665, ans=0.2
2023-10-09 22:34:59,203 INFO [train.py:1031] (3/4) Epoch 14, batch 35250, loss[loss=0.1945, simple_loss=0.2532, pruned_loss=0.0501, ctc_loss=0.08917, over 16799.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2852, pruned_loss=0.06144, ctc_loss=0.1086, over 3310017.00 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:35:19,503 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+02 2.987e+02 3.599e+02 4.398e+02 6.579e+02, threshold=7.198e+02, percent-clipped=0.0
2023-10-09 22:35:19,834 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2893296.0, ans=0.025
2023-10-09 22:35:22,369 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=12.0
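
In these train.py:1031 entries, loss[...] is the current batch while tot_loss[...] aggregates recent batches, weighted by the frame counts printed after each ("over N frames"). A sketch of a frame-weighted running aggregate of that shape (the decay factor is an assumption):

    class FrameWeightedAverage:
        """Running loss average where each batch contributes in proportion to
        its frame count, with exponential forgetting of older batches."""
        def __init__(self, decay=0.999):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def average(self):
            return self.loss_sum / max(self.frames, 1.0)

    avg = FrameWeightedAverage()
    avg.update(0.1796, 16859.0)  # numbers from the "batch 35200" entry above
    print(avg.average)           # after many batches this tracks the tot_loss figure
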
limit=12.0 2023-10-09 22:35:33,369 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2893342.6666666665, ans=0.0 2023-10-09 22:35:39,521 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2893389.3333333335, ans=0.1 2023-10-09 22:35:47,425 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2893389.3333333335, ans=0.1 2023-10-09 22:35:50,229 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2893389.3333333335, ans=0.0 2023-10-09 22:35:58,612 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2893436.0, ans=0.125 2023-10-09 22:36:01,260 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2893436.0, ans=0.1 2023-10-09 22:36:03,463 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2893436.0, ans=10.0 2023-10-09 22:36:05,973 INFO [train.py:1031] (3/4) Epoch 14, batch 35300, loss[loss=0.193, simple_loss=0.2333, pruned_loss=0.05731, ctc_loss=0.09516, over 16471.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.296, pruned_loss=0.06377, ctc_loss=0.1129, over 3313298.99 frames. ], batch size: 70, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:36:34,703 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2893576.0, ans=0.125 2023-10-09 22:36:41,148 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2893576.0, ans=0.2 2023-10-09 22:36:42,260 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2893576.0, ans=0.125 2023-10-09 22:37:01,785 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2893669.3333333335, ans=0.125 2023-10-09 22:37:05,639 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2893669.3333333335, ans=0.125 2023-10-09 22:37:06,749 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2893669.3333333335, ans=0.0 2023-10-09 22:37:10,998 INFO [train.py:1031] (3/4) Epoch 14, batch 35350, loss[loss=0.2454, simple_loss=0.3072, pruned_loss=0.06804, ctc_loss=0.1188, over 16862.00 frames. ], tot_loss[loss=0.2384, simple_loss=0.2989, pruned_loss=0.06582, ctc_loss=0.1159, over 3310358.91 frames. 
], batch size: 242, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:37:31,481 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+02 3.418e+02 3.862e+02 4.842e+02 9.244e+02, threshold=7.725e+02, percent-clipped=2.0 2023-10-09 22:37:41,563 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2893809.3333333335, ans=0.0 2023-10-09 22:37:42,751 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2893809.3333333335, ans=0.0 2023-10-09 22:37:48,405 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2893856.0, ans=0.04949747468305833 2023-10-09 22:38:10,103 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=12.0 2023-10-09 22:38:11,793 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2893902.6666666665, ans=0.1 2023-10-09 22:38:14,131 INFO [train.py:1031] (3/4) Epoch 14, batch 35400, loss[loss=0.2416, simple_loss=0.3024, pruned_loss=0.06621, ctc_loss=0.1208, over 16171.00 frames. ], tot_loss[loss=0.2429, simple_loss=0.3041, pruned_loss=0.0672, ctc_loss=0.1184, over 3314007.71 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:39:12,850 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2894136.0, ans=0.125 2023-10-09 22:39:14,642 INFO [train.py:1031] (3/4) Epoch 14, batch 35450, loss[loss=0.1935, simple_loss=0.2429, pruned_loss=0.05455, ctc_loss=0.08751, over 16711.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.296, pruned_loss=0.06625, ctc_loss=0.1166, over 3307795.88 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:39:17,703 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2894182.6666666665, ans=0.2 2023-10-09 22:39:29,569 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2023-10-09 22:39:36,532 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.322e+02 3.230e+02 3.810e+02 4.860e+02 8.869e+02, threshold=7.620e+02, percent-clipped=1.0 2023-10-09 22:39:44,002 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2894276.0, ans=0.0 2023-10-09 22:40:11,941 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2894369.3333333335, ans=0.125 2023-10-09 22:40:17,256 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.73 vs. limit=22.5 2023-10-09 22:40:17,593 INFO [train.py:1031] (3/4) Epoch 14, batch 35500, loss[loss=0.27, simple_loss=0.3251, pruned_loss=0.0802, ctc_loss=0.1363, over 16825.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.2928, pruned_loss=0.0669, ctc_loss=0.1172, over 3307534.38 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:40:20,262 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.30 vs. 
limit=15.0 2023-10-09 22:40:22,722 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2894416.0, ans=0.125 2023-10-09 22:40:44,938 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2894509.3333333335, ans=0.0 2023-10-09 22:41:05,736 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2894556.0, ans=0.1 2023-10-09 22:41:17,472 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.42 vs. limit=6.0 2023-10-09 22:41:20,535 INFO [train.py:1031] (3/4) Epoch 14, batch 35550, loss[loss=0.2455, simple_loss=0.2982, pruned_loss=0.07225, ctc_loss=0.121, over 16783.00 frames. ], tot_loss[loss=0.2409, simple_loss=0.2948, pruned_loss=0.06927, ctc_loss=0.1212, over 3308706.99 frames. ], batch size: 121, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:41:34,354 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2894696.0, ans=0.125 2023-10-09 22:41:36,053 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2894696.0, ans=0.2 2023-10-09 22:41:39,639 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2894696.0, ans=0.125 2023-10-09 22:41:42,146 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.738e+02 3.683e+02 4.220e+02 5.051e+02 8.035e+02, threshold=8.441e+02, percent-clipped=1.0 2023-10-09 22:41:45,971 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2023-10-09 22:41:50,849 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2894742.6666666665, ans=0.125 2023-10-09 22:42:22,017 INFO [train.py:1031] (3/4) Epoch 14, batch 35600, loss[loss=0.2536, simple_loss=0.3129, pruned_loss=0.07172, ctc_loss=0.127, over 16759.00 frames. ], tot_loss[loss=0.2417, simple_loss=0.2956, pruned_loss=0.06964, ctc_loss=0.1214, over 3311764.85 frames. ], batch size: 328, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:42:22,530 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.95 vs. limit=15.0 2023-10-09 22:42:23,386 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2894882.6666666665, ans=0.0 2023-10-09 22:42:33,030 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2894929.3333333335, ans=0.2 2023-10-09 22:42:45,823 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=15.0 2023-10-09 22:43:02,086 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2895022.6666666665, ans=0.0 2023-10-09 22:43:23,149 INFO [train.py:1031] (3/4) Epoch 14, batch 35650, loss[loss=0.2139, simple_loss=0.2827, pruned_loss=0.05381, ctc_loss=0.09361, over 16933.00 frames. 
], tot_loss[loss=0.2345, simple_loss=0.2918, pruned_loss=0.06561, ctc_loss=0.1149, over 3316951.89 frames. ], batch size: 258, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:43:24,617 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 22:43:32,269 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2895116.0, ans=0.0 2023-10-09 22:43:39,219 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2895162.6666666665, ans=15.0 2023-10-09 22:43:47,003 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.989e+02 3.692e+02 4.285e+02 1.206e+03, threshold=7.384e+02, percent-clipped=2.0 2023-10-09 22:44:06,169 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.09 vs. limit=15.0 2023-10-09 22:44:26,166 INFO [train.py:1031] (3/4) Epoch 14, batch 35700, loss[loss=0.2169, simple_loss=0.2736, pruned_loss=0.05936, ctc_loss=0.1037, over 16723.00 frames. ], tot_loss[loss=0.2375, simple_loss=0.2959, pruned_loss=0.06628, ctc_loss=0.1164, over 3309150.88 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:44:53,634 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2895442.6666666665, ans=0.125 2023-10-09 22:45:00,694 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2023-10-09 22:45:27,082 INFO [train.py:1031] (3/4) Epoch 14, batch 35750, loss[loss=0.2548, simple_loss=0.3111, pruned_loss=0.07363, ctc_loss=0.128, over 16844.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2942, pruned_loss=0.06748, ctc_loss=0.1182, over 3312892.99 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:45:50,606 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2895676.0, ans=0.125 2023-10-09 22:45:53,024 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.764e+02 4.390e+02 5.354e+02 1.212e+03, threshold=8.781e+02, percent-clipped=8.0 2023-10-09 22:46:11,341 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2895722.6666666665, ans=0.0 2023-10-09 22:46:18,290 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2023-10-09 22:46:29,804 INFO [train.py:1031] (3/4) Epoch 14, batch 35800, loss[loss=0.2552, simple_loss=0.3063, pruned_loss=0.07661, ctc_loss=0.1269, over 16917.00 frames. ], tot_loss[loss=0.2406, simple_loss=0.2945, pruned_loss=0.06922, ctc_loss=0.1207, over 3321172.29 frames. 
], batch size: 90, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:46:59,524 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2895909.3333333335, ans=0.125 2023-10-09 22:47:11,141 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2895956.0, ans=0.0 2023-10-09 22:47:24,985 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2896002.6666666665, ans=0.125 2023-10-09 22:47:31,698 INFO [train.py:1031] (3/4) Epoch 14, batch 35850, loss[loss=0.1962, simple_loss=0.2602, pruned_loss=0.04823, ctc_loss=0.08928, over 16788.00 frames. ], tot_loss[loss=0.242, simple_loss=0.2969, pruned_loss=0.0694, ctc_loss=0.121, over 3306767.01 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:47:33,181 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2896049.3333333335, ans=0.04949747468305833 2023-10-09 22:47:36,848 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2896049.3333333335, ans=0.125 2023-10-09 22:47:44,754 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2896096.0, ans=0.0 2023-10-09 22:47:57,701 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 3.376e+02 4.105e+02 5.188e+02 8.758e+02, threshold=8.210e+02, percent-clipped=0.0 2023-10-09 22:48:06,862 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2896142.6666666665, ans=0.125 2023-10-09 22:48:19,269 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2023-10-09 22:48:27,362 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2896236.0, ans=0.0 2023-10-09 22:48:32,283 INFO [train.py:1031] (3/4) Epoch 14, batch 35900, loss[loss=0.1627, simple_loss=0.2231, pruned_loss=0.03825, ctc_loss=0.06412, over 16877.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2926, pruned_loss=0.06341, ctc_loss=0.1112, over 3308411.59 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:48:38,741 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2896282.6666666665, ans=0.125 2023-10-09 22:49:16,381 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.63 vs. limit=10.0 2023-10-09 22:49:30,160 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2896469.3333333335, ans=0.125 2023-10-09 22:49:36,317 INFO [train.py:1031] (3/4) Epoch 14, batch 35950, loss[loss=0.2525, simple_loss=0.3175, pruned_loss=0.06833, ctc_loss=0.1271, over 16802.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2923, pruned_loss=0.06013, ctc_loss=0.1065, over 3308888.96 frames. 
], batch size: 328, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:49:42,528 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2896516.0, ans=0.125 2023-10-09 22:50:04,032 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.666e+02 3.384e+02 4.357e+02 7.839e+02, threshold=6.768e+02, percent-clipped=0.0 2023-10-09 22:50:04,368 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2896609.3333333335, ans=0.125 2023-10-09 22:50:11,151 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2896609.3333333335, ans=0.125 2023-10-09 22:50:38,128 INFO [train.py:1031] (3/4) Epoch 14, batch 36000, loss[loss=0.2495, simple_loss=0.3101, pruned_loss=0.07136, ctc_loss=0.1157, over 17009.00 frames. ], tot_loss[loss=0.2158, simple_loss=0.2826, pruned_loss=0.05498, ctc_loss=0.0976, over 3314478.97 frames. ], batch size: 86, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:50:38,128 INFO [train.py:1054] (3/4) Computing validation loss 2023-10-09 22:50:58,877 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2335, simple_loss=0.304, pruned_loss=0.06295, ctc_loss=0.09275, over 1796401.00 frames. 2023-10-09 22:50:58,878 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB 2023-10-09 22:51:06,691 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2896749.3333333335, ans=0.125 2023-10-09 22:51:18,105 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2896796.0, ans=0.0 2023-10-09 22:51:31,188 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=15.0 2023-10-09 22:51:34,141 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2896842.6666666665, ans=0.125 2023-10-09 22:51:37,293 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2896889.3333333335, ans=0.2 2023-10-09 22:51:52,380 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2896936.0, ans=0.125 2023-10-09 22:51:59,941 INFO [train.py:1031] (3/4) Epoch 14, batch 36050, loss[loss=0.2185, simple_loss=0.2474, pruned_loss=0.06971, ctc_loss=0.1256, over 15475.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2805, pruned_loss=0.05675, ctc_loss=0.1001, over 3309305.60 frames. 
], batch size: 526, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:52:18,348 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2897029.3333333335, ans=0.0 2023-10-09 22:52:29,192 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.810e+02 3.555e+02 4.396e+02 7.920e+02, threshold=7.110e+02, percent-clipped=1.0 2023-10-09 22:52:33,349 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2897076.0, ans=0.0 2023-10-09 22:52:49,223 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2897169.3333333335, ans=0.0 2023-10-09 22:53:02,990 INFO [train.py:1031] (3/4) Epoch 14, batch 36100, loss[loss=0.2685, simple_loss=0.3157, pruned_loss=0.08093, ctc_loss=0.1489, over 16756.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2838, pruned_loss=0.06077, ctc_loss=0.1074, over 3308683.83 frames. ], batch size: 353, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:53:05,103 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2897216.0, ans=0.1 2023-10-09 22:53:09,952 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2897216.0, ans=0.125 2023-10-09 22:53:17,808 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2897262.6666666665, ans=0.04949747468305833 2023-10-09 22:53:19,590 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2897262.6666666665, ans=0.0 2023-10-09 22:53:19,591 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2897262.6666666665, ans=0.04949747468305833 2023-10-09 22:53:31,604 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2897309.3333333335, ans=0.125 2023-10-09 22:53:42,336 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2897356.0, ans=0.125 2023-10-09 22:53:45,731 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2897356.0, ans=0.0 2023-10-09 22:53:54,265 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2897402.6666666665, ans=0.125 2023-10-09 22:53:55,267 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2897402.6666666665, ans=0.0 2023-10-09 22:54:06,391 INFO [train.py:1031] (3/4) Epoch 14, batch 36150, loss[loss=0.1879, simple_loss=0.2361, pruned_loss=0.05233, ctc_loss=0.08775, over 16178.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2865, pruned_loss=0.06267, ctc_loss=0.1106, over 3305862.60 frames. ], batch size: 70, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:54:16,849 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.40 vs. 
limit=15.0 2023-10-09 22:54:36,550 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.766e+02 3.412e+02 4.167e+02 5.128e+02 1.236e+03, threshold=8.334e+02, percent-clipped=3.0 2023-10-09 22:54:47,479 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2023-10-09 22:54:50,253 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2897589.3333333335, ans=0.125 2023-10-09 22:55:09,652 INFO [train.py:1031] (3/4) Epoch 14, batch 36200, loss[loss=0.1972, simple_loss=0.2704, pruned_loss=0.04509, ctc_loss=0.08449, over 16670.00 frames. ], tot_loss[loss=0.2344, simple_loss=0.2922, pruned_loss=0.06519, ctc_loss=0.1156, over 3311388.12 frames. ], batch size: 151, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:55:11,704 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2897682.6666666665, ans=0.125 2023-10-09 22:55:11,747 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2897682.6666666665, ans=0.0 2023-10-09 22:55:20,577 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2897682.6666666665, ans=0.5 2023-10-09 22:55:35,896 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2897776.0, ans=0.2 2023-10-09 22:55:42,399 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2897776.0, ans=0.2 2023-10-09 22:55:52,895 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.63 vs. limit=22.5 2023-10-09 22:56:08,852 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2897869.3333333335, ans=0.125 2023-10-09 22:56:11,657 INFO [train.py:1031] (3/4) Epoch 14, batch 36250, loss[loss=0.1933, simple_loss=0.2538, pruned_loss=0.0492, ctc_loss=0.08612, over 16798.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.2974, pruned_loss=0.06533, ctc_loss=0.1176, over 3304124.29 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:56:34,207 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 22:56:42,282 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.544e+02 3.450e+02 4.069e+02 4.879e+02 1.069e+03, threshold=8.138e+02, percent-clipped=4.0 2023-10-09 22:57:09,564 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2898102.6666666665, ans=0.125 2023-10-09 22:57:13,572 INFO [train.py:1031] (3/4) Epoch 14, batch 36300, loss[loss=0.2203, simple_loss=0.278, pruned_loss=0.06165, ctc_loss=0.09854, over 16817.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.2943, pruned_loss=0.0647, ctc_loss=0.1161, over 3307202.45 frames. 
], batch size: 121, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:57:19,340 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2898149.3333333335, ans=0.125 2023-10-09 22:57:46,024 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2898242.6666666665, ans=0.1 2023-10-09 22:57:49,608 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=22.5 2023-10-09 22:57:53,659 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2898289.3333333335, ans=0.05 2023-10-09 22:57:56,326 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 22:58:02,464 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2898336.0, ans=0.2 2023-10-09 22:58:16,201 INFO [train.py:1031] (3/4) Epoch 14, batch 36350, loss[loss=0.2696, simple_loss=0.3263, pruned_loss=0.08059, ctc_loss=0.1294, over 16791.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2959, pruned_loss=0.06661, ctc_loss=0.1188, over 3306259.51 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:58:43,216 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2898476.0, ans=0.125 2023-10-09 22:58:43,224 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2898476.0, ans=0.125 2023-10-09 22:58:43,230 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 22:58:48,784 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.608e+02 3.450e+02 4.170e+02 4.968e+02 1.204e+03, threshold=8.340e+02, percent-clipped=3.0 2023-10-09 22:59:04,449 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2898522.6666666665, ans=0.0 2023-10-09 22:59:05,619 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2898569.3333333335, ans=0.2 2023-10-09 22:59:19,336 INFO [train.py:1031] (3/4) Epoch 14, batch 36400, loss[loss=0.1999, simple_loss=0.2553, pruned_loss=0.05332, ctc_loss=0.09458, over 17081.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.2947, pruned_loss=0.06805, ctc_loss=0.1205, over 3307449.85 frames. ], batch size: 91, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:59:28,871 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.43 vs. 
limit=15.0 2023-10-09 22:59:40,611 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2898662.6666666665, ans=0.1 2023-10-09 22:59:42,600 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2898662.6666666665, ans=0.125 2023-10-09 23:00:01,477 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2898756.0, ans=10.0 2023-10-09 23:00:01,927 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2023-10-09 23:00:02,493 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2898756.0, ans=0.05 2023-10-09 23:00:11,099 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2898802.6666666665, ans=0.125 2023-10-09 23:00:13,114 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2898802.6666666665, ans=0.0 2023-10-09 23:00:20,143 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2898849.3333333335, ans=0.125 2023-10-09 23:00:20,163 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2898849.3333333335, ans=0.0 2023-10-09 23:00:21,501 INFO [train.py:1031] (3/4) Epoch 14, batch 36450, loss[loss=0.1968, simple_loss=0.2495, pruned_loss=0.05358, ctc_loss=0.09251, over 16793.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.2867, pruned_loss=0.06652, ctc_loss=0.1172, over 3305792.67 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:00:54,965 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+02 3.085e+02 3.494e+02 4.091e+02 1.458e+03, threshold=6.988e+02, percent-clipped=1.0 2023-10-09 23:01:10,976 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2899036.0, ans=0.125 2023-10-09 23:01:24,211 INFO [train.py:1031] (3/4) Epoch 14, batch 36500, loss[loss=0.2039, simple_loss=0.2515, pruned_loss=0.05764, ctc_loss=0.1026, over 16759.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2794, pruned_loss=0.06521, ctc_loss=0.1145, over 3295789.58 frames. ], batch size: 242, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:01:34,495 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2899082.6666666665, ans=0.0 2023-10-09 23:02:11,201 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2899222.6666666665, ans=0.95 2023-10-09 23:02:23,057 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2899269.3333333335, ans=0.1 2023-10-09 23:02:27,713 INFO [train.py:1031] (3/4) Epoch 14, batch 36550, loss[loss=0.2263, simple_loss=0.301, pruned_loss=0.05613, ctc_loss=0.09822, over 16864.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2787, pruned_loss=0.06367, ctc_loss=0.1119, over 3293893.13 frames. 
], batch size: 292, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:02:29,029 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2899316.0, ans=0.125 2023-10-09 23:02:31,229 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2899316.0, ans=0.2 2023-10-09 23:02:34,950 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2899316.0, ans=0.0 2023-10-09 23:02:42,337 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2023-10-09 23:02:47,106 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2899362.6666666665, ans=0.2 2023-10-09 23:03:01,170 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.250e+02 3.665e+02 4.225e+02 1.129e+03, threshold=7.330e+02, percent-clipped=1.0 2023-10-09 23:03:05,695 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0 2023-10-09 23:03:07,087 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2899456.0, ans=0.125 2023-10-09 23:03:16,527 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2899502.6666666665, ans=0.125 2023-10-09 23:03:16,987 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2023-10-09 23:03:28,788 INFO [train.py:1031] (3/4) Epoch 14, batch 36600, loss[loss=0.188, simple_loss=0.2341, pruned_loss=0.05224, ctc_loss=0.09371, over 16752.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.2763, pruned_loss=0.06257, ctc_loss=0.1102, over 3291896.10 frames. ], batch size: 176, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:03:57,730 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2899642.6666666665, ans=0.125 2023-10-09 23:04:07,526 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2899689.3333333335, ans=0.125 2023-10-09 23:04:19,439 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2899736.0, ans=0.125 2023-10-09 23:04:26,064 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2899736.0, ans=0.0 2023-10-09 23:04:27,426 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.87 vs. limit=15.0 2023-10-09 23:04:27,972 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2899736.0, ans=0.0 2023-10-09 23:04:30,820 INFO [train.py:1031] (3/4) Epoch 14, batch 36650, loss[loss=0.2052, simple_loss=0.257, pruned_loss=0.05635, ctc_loss=0.1018, over 16761.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.2703, pruned_loss=0.06058, ctc_loss=0.1069, over 3291021.94 frames. 
], batch size: 102, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:04:33,849 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2899782.6666666665, ans=0.0 2023-10-09 23:04:33,868 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2899782.6666666665, ans=0.125 2023-10-09 23:04:37,884 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-10-09 23:05:06,011 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+02 3.036e+02 3.411e+02 4.060e+02 1.638e+03, threshold=6.823e+02, percent-clipped=3.0 2023-10-09 23:05:11,797 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2023-10-09 23:05:23,554 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2899969.3333333335, ans=0.95 2023-10-09 23:05:33,282 INFO [train.py:1031] (3/4) Epoch 14, batch 36700, loss[loss=0.242, simple_loss=0.2645, pruned_loss=0.08083, ctc_loss=0.1447, over 16477.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.2654, pruned_loss=0.06074, ctc_loss=0.1072, over 3291504.36 frames. ], batch size: 416, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:05:34,563 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2900016.0, ans=0.1 2023-10-09 23:05:34,590 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2900016.0, ans=0.1 2023-10-09 23:05:35,762 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2900016.0, ans=0.04949747468305833 2023-10-09 23:05:40,396 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2900016.0, ans=0.2 2023-10-09 23:06:34,448 INFO [train.py:1031] (3/4) Epoch 14, batch 36750, loss[loss=0.2466, simple_loss=0.2939, pruned_loss=0.07439, ctc_loss=0.126, over 16874.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2674, pruned_loss=0.06252, ctc_loss=0.1101, over 3294611.02 frames. ], batch size: 121, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:06:36,816 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2900249.3333333335, ans=0.0 2023-10-09 23:07:09,706 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+02 3.153e+02 3.511e+02 4.063e+02 5.415e+02, threshold=7.022e+02, percent-clipped=0.0 2023-10-09 23:07:10,006 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2900389.3333333335, ans=0.125 2023-10-09 23:07:20,300 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:07:34,283 INFO [train.py:1031] (3/4) Epoch 14, batch 36800, loss[loss=0.2222, simple_loss=0.3097, pruned_loss=0.04944, ctc_loss=0.08975, over 15216.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2687, pruned_loss=0.06295, ctc_loss=0.1102, over 3288621.84 frames. 
], batch size: 526, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:07:34,678 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2900482.6666666665, ans=0.1 2023-10-09 23:07:39,357 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2900482.6666666665, ans=0.125 2023-10-09 23:07:46,220 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2900529.3333333335, ans=0.2 2023-10-09 23:08:06,141 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2900576.0, ans=0.125 2023-10-09 23:08:06,173 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2900576.0, ans=0.125 2023-10-09 23:08:18,727 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2900622.6666666665, ans=0.0 2023-10-09 23:08:35,594 INFO [train.py:1031] (3/4) Epoch 14, batch 36850, loss[loss=0.216, simple_loss=0.2739, pruned_loss=0.0594, ctc_loss=0.09841, over 16863.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.272, pruned_loss=0.06364, ctc_loss=0.1107, over 3298657.65 frames. ], batch size: 141, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:08:36,938 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2900716.0, ans=0.1 2023-10-09 23:08:47,630 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2900762.6666666665, ans=0.0 2023-10-09 23:08:56,548 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2023-10-09 23:09:02,253 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2900809.3333333335, ans=0.125 2023-10-09 23:09:11,956 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2023-10-09 23:09:16,101 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.524e+02 3.452e+02 4.218e+02 5.067e+02 9.154e+02, threshold=8.437e+02, percent-clipped=6.0 2023-10-09 23:09:22,967 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:09:38,554 INFO [train.py:1031] (3/4) Epoch 14, batch 36900, loss[loss=0.2299, simple_loss=0.2916, pruned_loss=0.06183, ctc_loss=0.1113, over 16815.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2793, pruned_loss=0.06626, ctc_loss=0.1154, over 3300289.88 frames. 
], batch size: 176, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:09:42,757 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2900949.3333333335, ans=0.2 2023-10-09 23:09:52,206 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2900996.0, ans=0.125 2023-10-09 23:10:04,806 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2901042.6666666665, ans=0.1 2023-10-09 23:10:17,109 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.07 vs. limit=10.0 2023-10-09 23:10:21,276 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2901089.3333333335, ans=0.1 2023-10-09 23:10:34,429 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2901136.0, ans=0.125 2023-10-09 23:10:43,339 INFO [train.py:1031] (3/4) Epoch 14, batch 36950, loss[loss=0.2752, simple_loss=0.3211, pruned_loss=0.0855, ctc_loss=0.1459, over 16576.00 frames. ], tot_loss[loss=0.2359, simple_loss=0.2858, pruned_loss=0.06896, ctc_loss=0.12, over 3306757.19 frames. ], batch size: 110, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:11:06,772 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2901229.3333333335, ans=0.1 2023-10-09 23:11:12,131 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2901276.0, ans=0.125 2023-10-09 23:11:24,306 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2901322.6666666665, ans=0.125 2023-10-09 23:11:25,048 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.810e+02 3.608e+02 4.056e+02 4.983e+02 1.030e+03, threshold=8.112e+02, percent-clipped=3.0 2023-10-09 23:11:29,352 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2901322.6666666665, ans=0.09899494936611666 2023-10-09 23:11:29,372 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2901322.6666666665, ans=0.5 2023-10-09 23:11:46,857 INFO [train.py:1031] (3/4) Epoch 14, batch 37000, loss[loss=0.2531, simple_loss=0.3101, pruned_loss=0.07274, ctc_loss=0.1263, over 16706.00 frames. ], tot_loss[loss=0.2413, simple_loss=0.2933, pruned_loss=0.07022, ctc_loss=0.1222, over 3304022.69 frames. 
], batch size: 272, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:11:59,726 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2901462.6666666665, ans=0.125 2023-10-09 23:11:59,735 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2901462.6666666665, ans=0.125 2023-10-09 23:12:04,461 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2901462.6666666665, ans=0.0 2023-10-09 23:12:13,353 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2901509.3333333335, ans=0.0 2023-10-09 23:12:16,088 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:12:49,890 INFO [train.py:1031] (3/4) Epoch 14, batch 37050, loss[loss=0.1979, simple_loss=0.2479, pruned_loss=0.05489, ctc_loss=0.0956, over 16798.00 frames. ], tot_loss[loss=0.2369, simple_loss=0.2883, pruned_loss=0.06881, ctc_loss=0.1198, over 3303258.02 frames. ], batch size: 121, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:13:03,252 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2023-10-09 23:13:05,509 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2901696.0, ans=0.125 2023-10-09 23:13:31,556 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+02 3.263e+02 3.806e+02 4.315e+02 8.340e+02, threshold=7.611e+02, percent-clipped=1.0 2023-10-09 23:13:51,985 INFO [train.py:1031] (3/4) Epoch 14, batch 37100, loss[loss=0.2252, simple_loss=0.2709, pruned_loss=0.06576, ctc_loss=0.1199, over 16787.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2805, pruned_loss=0.06667, ctc_loss=0.1161, over 3311681.16 frames. ], batch size: 258, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:13:53,354 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2901882.6666666665, ans=0.0 2023-10-09 23:13:54,732 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.58 vs. limit=10.0 2023-10-09 23:14:52,736 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0 2023-10-09 23:14:53,064 INFO [train.py:1031] (3/4) Epoch 14, batch 37150, loss[loss=0.2194, simple_loss=0.2679, pruned_loss=0.06291, ctc_loss=0.1125, over 16811.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2738, pruned_loss=0.06498, ctc_loss=0.1132, over 3309982.95 frames. ], batch size: 242, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:14:53,657 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.56 vs. 
limit=6.0 2023-10-09 23:14:59,321 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2902116.0, ans=0.5 2023-10-09 23:15:07,807 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2902162.6666666665, ans=0.07 2023-10-09 23:15:12,123 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=15.0 2023-10-09 23:15:31,196 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2902256.0, ans=0.125 2023-10-09 23:15:34,534 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.232e+02 3.053e+02 3.584e+02 4.083e+02 7.481e+02, threshold=7.169e+02, percent-clipped=0.0 2023-10-09 23:15:38,538 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2902256.0, ans=0.04949747468305833 2023-10-09 23:15:41,289 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2902302.6666666665, ans=0.5 2023-10-09 23:15:43,425 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:15:49,343 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2902302.6666666665, ans=0.125 2023-10-09 23:15:54,258 INFO [train.py:1031] (3/4) Epoch 14, batch 37200, loss[loss=0.2463, simple_loss=0.3247, pruned_loss=0.06132, ctc_loss=0.1132, over 16832.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2779, pruned_loss=0.06384, ctc_loss=0.1116, over 3314357.05 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:16:11,937 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2902396.0, ans=0.0 2023-10-09 23:16:53,250 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2902582.6666666665, ans=0.0 2023-10-09 23:16:53,922 INFO [train.py:1031] (3/4) Epoch 14, batch 37250, loss[loss=0.2113, simple_loss=0.2863, pruned_loss=0.0496, ctc_loss=0.09298, over 16371.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2818, pruned_loss=0.06194, ctc_loss=0.1089, over 3308736.25 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:17:07,616 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2902629.3333333335, ans=0.0 2023-10-09 23:17:09,596 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2902629.3333333335, ans=0.2 2023-10-09 23:17:18,382 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2902676.0, ans=0.125 2023-10-09 23:17:23,677 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.02 vs. 
limit=6.0 2023-10-09 23:17:33,590 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2902722.6666666665, ans=0.1 2023-10-09 23:17:36,500 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 2.936e+02 3.384e+02 3.917e+02 6.225e+02, threshold=6.767e+02, percent-clipped=0.0 2023-10-09 23:17:41,607 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:17:46,484 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2902769.3333333335, ans=0.0 2023-10-09 23:17:46,529 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2902769.3333333335, ans=0.0 2023-10-09 23:17:52,462 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2902769.3333333335, ans=0.0 2023-10-09 23:17:54,217 INFO [train.py:1031] (3/4) Epoch 14, batch 37300, loss[loss=0.2463, simple_loss=0.2989, pruned_loss=0.07103, ctc_loss=0.1294, over 16822.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.2795, pruned_loss=0.06068, ctc_loss=0.1065, over 3314286.80 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:17:55,531 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2902816.0, ans=0.125 2023-10-09 23:18:01,199 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=22.5 2023-10-09 23:18:31,097 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0 2023-10-09 23:18:32,795 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2902956.0, ans=0.0 2023-10-09 23:18:41,571 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2902956.0, ans=0.125 2023-10-09 23:18:44,254 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2903002.6666666665, ans=0.125 2023-10-09 23:18:46,340 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2903002.6666666665, ans=0.0 2023-10-09 23:18:55,640 INFO [train.py:1031] (3/4) Epoch 14, batch 37350, loss[loss=0.2332, simple_loss=0.2851, pruned_loss=0.06835, ctc_loss=0.1118, over 16916.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2801, pruned_loss=0.05906, ctc_loss=0.1035, over 3311725.33 frames. 
], batch size: 82, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:19:06,871 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2903096.0, ans=0.0 2023-10-09 23:19:09,926 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2903096.0, ans=0.0 2023-10-09 23:19:12,682 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2903096.0, ans=0.0 2023-10-09 23:19:14,833 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2903096.0, ans=0.125 2023-10-09 23:19:32,659 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=22.5 2023-10-09 23:19:38,025 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+02 2.949e+02 3.528e+02 4.105e+02 1.147e+03, threshold=7.057e+02, percent-clipped=0.0 2023-10-09 23:19:44,264 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2903236.0, ans=0.125 2023-10-09 23:19:54,562 INFO [train.py:1031] (3/4) Epoch 14, batch 37400, loss[loss=0.2237, simple_loss=0.2721, pruned_loss=0.06624, ctc_loss=0.1071, over 16917.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.276, pruned_loss=0.05931, ctc_loss=0.1038, over 3305104.71 frames. ], batch size: 78, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:20:32,007 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2903422.6666666665, ans=0.125 2023-10-09 23:20:47,127 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2903469.3333333335, ans=0.0 2023-10-09 23:20:52,083 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2903469.3333333335, ans=0.0 2023-10-09 23:20:55,596 INFO [train.py:1031] (3/4) Epoch 14, batch 37450, loss[loss=0.2317, simple_loss=0.3012, pruned_loss=0.05859, ctc_loss=0.1125, over 16868.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.2739, pruned_loss=0.05814, ctc_loss=0.1018, over 3279031.97 frames. ], batch size: 292, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:20:59,040 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=12.0 2023-10-09 23:21:29,844 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2903609.3333333335, ans=0.125 2023-10-09 23:21:41,545 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+02 2.982e+02 3.878e+02 4.493e+02 7.805e+02, threshold=7.755e+02, percent-clipped=2.0 2023-10-09 23:21:53,416 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2903702.6666666665, ans=0.2 2023-10-09 23:21:58,569 INFO [train.py:1031] (3/4) Epoch 14, batch 37500, loss[loss=0.2403, simple_loss=0.3251, pruned_loss=0.05665, ctc_loss=0.1055, over 15198.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2793, pruned_loss=0.06044, ctc_loss=0.106, over 3273717.04 frames. 
], batch size: 526, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:22:03,076 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2903749.3333333335, ans=0.125 2023-10-09 23:22:28,190 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2903842.6666666665, ans=0.0 2023-10-09 23:22:31,407 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2903842.6666666665, ans=0.0 2023-10-09 23:22:34,661 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2903889.3333333335, ans=0.125 2023-10-09 23:22:41,836 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=2903889.3333333335, ans=0.025 2023-10-09 23:22:56,502 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2903936.0, ans=0.0 2023-10-09 23:22:59,331 INFO [train.py:1031] (3/4) Epoch 14, batch 37550, loss[loss=0.2269, simple_loss=0.274, pruned_loss=0.06574, ctc_loss=0.1207, over 16484.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2782, pruned_loss=0.05777, ctc_loss=0.1021, over 3280888.14 frames. ], batch size: 351, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:23:03,778 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2903982.6666666665, ans=0.125 2023-10-09 23:23:41,416 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2023-10-09 23:23:46,183 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 2.897e+02 3.320e+02 4.036e+02 7.809e+02, threshold=6.640e+02, percent-clipped=1.0 2023-10-09 23:23:53,114 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2023-10-09 23:24:00,647 INFO [train.py:1031] (3/4) Epoch 14, batch 37600, loss[loss=0.2186, simple_loss=0.2589, pruned_loss=0.06716, ctc_loss=0.1102, over 16802.00 frames. ], tot_loss[loss=0.2158, simple_loss=0.2735, pruned_loss=0.0585, ctc_loss=0.103, over 3285113.38 frames. ], batch size: 258, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:24:05,467 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=22.5 2023-10-09 23:24:22,802 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0 2023-10-09 23:24:22,866 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.09 vs. limit=15.0 2023-10-09 23:24:26,509 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2904309.3333333335, ans=0.0 2023-10-09 23:24:44,526 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.19 vs. limit=15.0 2023-10-09 23:24:51,989 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. 
limit=6.0 2023-10-09 23:24:59,293 INFO [train.py:1031] (3/4) Epoch 14, batch 37650, loss[loss=0.2104, simple_loss=0.2535, pruned_loss=0.06118, ctc_loss=0.112, over 16279.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2732, pruned_loss=0.06019, ctc_loss=0.1052, over 3282414.27 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:25:14,009 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0 2023-10-09 23:25:48,255 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.460e+02 3.454e+02 4.118e+02 4.727e+02 1.151e+03, threshold=8.236e+02, percent-clipped=7.0 2023-10-09 23:26:01,811 INFO [train.py:1031] (3/4) Epoch 14, batch 37700, loss[loss=0.1997, simple_loss=0.2818, pruned_loss=0.04262, ctc_loss=0.08114, over 16878.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2759, pruned_loss=0.05981, ctc_loss=0.1049, over 3288653.57 frames. ], batch size: 292, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:26:19,908 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2904729.3333333335, ans=15.0 2023-10-09 23:26:38,999 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2904822.6666666665, ans=0.1 2023-10-09 23:26:42,140 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2023-10-09 23:26:43,055 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=22.5 2023-10-09 23:26:58,924 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2904869.3333333335, ans=0.125 2023-10-09 23:27:01,539 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2904869.3333333335, ans=0.125 2023-10-09 23:27:05,161 INFO [train.py:1031] (3/4) Epoch 14, batch 37750, loss[loss=0.2307, simple_loss=0.296, pruned_loss=0.06057, ctc_loss=0.1109, over 16890.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2764, pruned_loss=0.05634, ctc_loss=0.09998, over 3294893.08 frames. ], batch size: 215, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:27:06,516 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2904916.0, ans=0.125 2023-10-09 23:27:08,019 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-10-09 23:27:11,393 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2904916.0, ans=0.1 2023-10-09 23:27:38,054 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2905009.3333333335, ans=0.0 2023-10-09 23:27:56,207 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+02 2.858e+02 3.601e+02 4.405e+02 1.102e+03, threshold=7.202e+02, percent-clipped=1.0 2023-10-09 23:28:07,549 INFO [train.py:1031] (3/4) Epoch 14, batch 37800, loss[loss=0.2408, simple_loss=0.3339, pruned_loss=0.05464, ctc_loss=0.09601, over 15171.00 frames. 
], tot_loss[loss=0.2206, simple_loss=0.2837, pruned_loss=0.05808, ctc_loss=0.1035, over 3298621.76 frames. ], batch size: 526, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:28:07,980 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2905149.3333333335, ans=0.125 2023-10-09 23:28:19,062 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2905149.3333333335, ans=0.125 2023-10-09 23:28:24,323 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2905196.0, ans=0.07 2023-10-09 23:28:29,375 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2905196.0, ans=0.125 2023-10-09 23:28:33,689 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2905242.6666666665, ans=0.125 2023-10-09 23:28:43,707 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2905289.3333333335, ans=0.125 2023-10-09 23:28:47,459 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2905289.3333333335, ans=0.1 2023-10-09 23:29:07,936 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2905382.6666666665, ans=0.0 2023-10-09 23:29:08,316 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5 2023-10-09 23:29:08,607 INFO [train.py:1031] (3/4) Epoch 14, batch 37850, loss[loss=0.1979, simple_loss=0.2832, pruned_loss=0.04164, ctc_loss=0.07313, over 16878.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.288, pruned_loss=0.05679, ctc_loss=0.1016, over 3289405.94 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:29:20,716 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2905429.3333333335, ans=0.125 2023-10-09 23:29:35,515 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2905476.0, ans=0.2 2023-10-09 23:29:39,398 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2905476.0, ans=0.125 2023-10-09 23:29:45,936 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2905522.6666666665, ans=0.2 2023-10-09 23:30:00,964 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+02 3.190e+02 3.752e+02 4.348e+02 7.334e+02, threshold=7.503e+02, percent-clipped=1.0 2023-10-09 23:30:13,341 INFO [train.py:1031] (3/4) Epoch 14, batch 37900, loss[loss=0.2739, simple_loss=0.3161, pruned_loss=0.08626, ctc_loss=0.1481, over 16511.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2904, pruned_loss=0.05978, ctc_loss=0.1065, over 3294458.26 frames. 
], batch size: 416, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:30:14,758 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2905616.0, ans=0.2 2023-10-09 23:30:29,955 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2905662.6666666665, ans=0.0 2023-10-09 23:30:32,090 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2905662.6666666665, ans=0.2 2023-10-09 23:31:03,413 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0 2023-10-09 23:31:11,759 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2905802.6666666665, ans=0.2 2023-10-09 23:31:13,597 INFO [train.py:1031] (3/4) Epoch 14, batch 37950, loss[loss=0.2078, simple_loss=0.2419, pruned_loss=0.06396, ctc_loss=0.1143, over 16162.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2905, pruned_loss=0.06228, ctc_loss=0.1102, over 3293377.12 frames. ], batch size: 466, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:31:43,954 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.73 vs. limit=15.0 2023-10-09 23:31:53,051 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2905989.3333333335, ans=0.0 2023-10-09 23:32:02,352 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2906036.0, ans=0.1 2023-10-09 23:32:05,288 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.613e+02 3.279e+02 3.863e+02 4.623e+02 8.979e+02, threshold=7.726e+02, percent-clipped=3.0 2023-10-09 23:32:13,046 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2906036.0, ans=0.0 2023-10-09 23:32:15,504 INFO [train.py:1031] (3/4) Epoch 14, batch 38000, loss[loss=0.2299, simple_loss=0.2831, pruned_loss=0.06649, ctc_loss=0.1092, over 16962.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2838, pruned_loss=0.06269, ctc_loss=0.1106, over 3296776.53 frames. ], batch size: 82, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:32:25,387 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=22.5 2023-10-09 23:32:37,868 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.86 vs. limit=15.0 2023-10-09 23:32:59,684 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2906222.6666666665, ans=0.0 2023-10-09 23:33:05,713 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:33:16,626 INFO [train.py:1031] (3/4) Epoch 14, batch 38050, loss[loss=0.2836, simple_loss=0.3185, pruned_loss=0.09123, ctc_loss=0.1654, over 16589.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2818, pruned_loss=0.06302, ctc_loss=0.1108, over 3298961.38 frames. 
], batch size: 416, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:33:16,939 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2906316.0, ans=0.0 2023-10-09 23:33:25,459 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.84 vs. limit=15.0 2023-10-09 23:33:40,252 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2906409.3333333335, ans=0.0 2023-10-09 23:33:46,584 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2906409.3333333335, ans=0.125 2023-10-09 23:33:49,644 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2023-10-09 23:34:03,870 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2906456.0, ans=0.1 2023-10-09 23:34:10,170 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+02 3.275e+02 3.696e+02 4.420e+02 6.425e+02, threshold=7.391e+02, percent-clipped=0.0 2023-10-09 23:34:18,441 INFO [train.py:1031] (3/4) Epoch 14, batch 38100, loss[loss=0.2486, simple_loss=0.3033, pruned_loss=0.07158, ctc_loss=0.1268, over 16778.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2852, pruned_loss=0.06461, ctc_loss=0.1132, over 3309345.31 frames. ], batch size: 308, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:34:21,495 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2906549.3333333335, ans=0.125 2023-10-09 23:34:22,518 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2906549.3333333335, ans=0.1 2023-10-09 23:34:25,648 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=12.0 2023-10-09 23:34:43,325 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=22.5 2023-10-09 23:34:46,381 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2906642.6666666665, ans=0.125 2023-10-09 23:34:46,389 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2906642.6666666665, ans=0.0 2023-10-09 23:35:23,913 INFO [train.py:1031] (3/4) Epoch 14, batch 38150, loss[loss=0.2265, simple_loss=0.2908, pruned_loss=0.06139, ctc_loss=0.09845, over 16741.00 frames. ], tot_loss[loss=0.2415, simple_loss=0.2984, pruned_loss=0.06821, ctc_loss=0.1207, over 3302334.30 frames. 
], batch size: 95, lr: 2.52e-03, grad_scale: 1.0 2023-10-09 23:35:48,382 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2906829.3333333335, ans=0.125 2023-10-09 23:35:52,082 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2906876.0, ans=0.0 2023-10-09 23:35:56,896 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2906876.0, ans=0.0 2023-10-09 23:35:56,900 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2906876.0, ans=0.2 2023-10-09 23:35:58,816 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.45 vs. limit=6.0 2023-10-09 23:36:18,294 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2906969.3333333335, ans=0.2 2023-10-09 23:36:22,910 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.043e+02 3.850e+02 4.496e+02 5.551e+02 1.259e+03, threshold=8.992e+02, percent-clipped=8.0 2023-10-09 23:36:24,374 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2906969.3333333335, ans=0.0 2023-10-09 23:36:29,637 INFO [train.py:1031] (3/4) Epoch 14, batch 38200, loss[loss=0.2431, simple_loss=0.3086, pruned_loss=0.0659, ctc_loss=0.1143, over 16892.00 frames. ], tot_loss[loss=0.2461, simple_loss=0.3029, pruned_loss=0.06989, ctc_loss=0.1238, over 3293908.19 frames. ], batch size: 242, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:36:30,575 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2907016.0, ans=0.125 2023-10-09 23:36:35,095 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2907016.0, ans=0.125 2023-10-09 23:36:35,268 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=22.5 2023-10-09 23:36:52,495 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0 2023-10-09 23:37:14,655 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2907156.0, ans=0.1 2023-10-09 23:37:33,239 INFO [train.py:1031] (3/4) Epoch 14, batch 38250, loss[loss=0.2805, simple_loss=0.3267, pruned_loss=0.08643, ctc_loss=0.1534, over 16754.00 frames. ], tot_loss[loss=0.2441, simple_loss=0.3031, pruned_loss=0.06825, ctc_loss=0.1214, over 3302291.16 frames. ], batch size: 384, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:38:03,296 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2907342.6666666665, ans=0.0 2023-10-09 23:38:29,117 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+02 3.325e+02 3.787e+02 4.498e+02 1.020e+03, threshold=7.574e+02, percent-clipped=1.0 2023-10-09 23:38:34,792 INFO [train.py:1031] (3/4) Epoch 14, batch 38300, loss[loss=0.2589, simple_loss=0.2986, pruned_loss=0.08025, ctc_loss=0.1466, over 16842.00 frames. ], tot_loss[loss=0.2407, simple_loss=0.2981, pruned_loss=0.06761, ctc_loss=0.12, over 3304524.11 frames. 
], batch size: 328, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:39:18,665 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=2907622.6666666665, ans=0.1 2023-10-09 23:39:22,576 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2907622.6666666665, ans=0.125 2023-10-09 23:39:27,548 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2907669.3333333335, ans=0.0 2023-10-09 23:39:36,987 INFO [train.py:1031] (3/4) Epoch 14, batch 38350, loss[loss=0.2347, simple_loss=0.3046, pruned_loss=0.0614, ctc_loss=0.1049, over 16855.00 frames. ], tot_loss[loss=0.2454, simple_loss=0.3034, pruned_loss=0.06918, ctc_loss=0.1225, over 3313732.87 frames. ], batch size: 176, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:39:41,034 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2023-10-09 23:39:43,905 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2907716.0, ans=0.0 2023-10-09 23:39:50,751 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2907762.6666666665, ans=0.0 2023-10-09 23:39:51,767 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2907762.6666666665, ans=0.0 2023-10-09 23:40:05,973 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0 2023-10-09 23:40:19,690 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2907856.0, ans=0.1 2023-10-09 23:40:26,120 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2907856.0, ans=0.0 2023-10-09 23:40:35,688 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+02 3.571e+02 4.561e+02 5.514e+02 1.040e+03, threshold=9.121e+02, percent-clipped=3.0 2023-10-09 23:40:41,066 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.35 vs. limit=10.0 2023-10-09 23:40:41,320 INFO [train.py:1031] (3/4) Epoch 14, batch 38400, loss[loss=0.2505, simple_loss=0.311, pruned_loss=0.07008, ctc_loss=0.1248, over 16848.00 frames. ], tot_loss[loss=0.2525, simple_loss=0.3099, pruned_loss=0.07207, ctc_loss=0.1276, over 3308281.58 frames. ], batch size: 202, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:40:50,910 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=22.5 2023-10-09 23:41:03,522 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.09 vs. limit=22.5 2023-10-09 23:41:04,389 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.83 vs. 
limit=15.0 2023-10-09 23:41:07,385 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2908042.6666666665, ans=0.125 2023-10-09 23:41:35,067 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2908136.0, ans=0.09899494936611666 2023-10-09 23:41:36,086 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2908136.0, ans=0.125 2023-10-09 23:41:44,679 INFO [train.py:1031] (3/4) Epoch 14, batch 38450, loss[loss=0.2131, simple_loss=0.2805, pruned_loss=0.05545, ctc_loss=0.08675, over 16713.00 frames. ], tot_loss[loss=0.2504, simple_loss=0.3078, pruned_loss=0.07134, ctc_loss=0.126, over 3307023.26 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:41:50,637 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2908182.6666666665, ans=0.05 2023-10-09 23:41:52,765 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2908182.6666666665, ans=0.0 2023-10-09 23:42:23,877 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2908322.6666666665, ans=0.125 2023-10-09 23:42:40,140 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2908369.3333333335, ans=0.0 2023-10-09 23:42:42,961 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.189e+02 3.714e+02 4.508e+02 1.225e+03, threshold=7.428e+02, percent-clipped=2.0 2023-10-09 23:42:47,065 INFO [train.py:1031] (3/4) Epoch 14, batch 38500, loss[loss=0.2605, simple_loss=0.3177, pruned_loss=0.07499, ctc_loss=0.1331, over 16737.00 frames. ], tot_loss[loss=0.2481, simple_loss=0.3065, pruned_loss=0.07008, ctc_loss=0.1239, over 3302125.86 frames. ], batch size: 328, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:42:49,052 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2908416.0, ans=0.07 2023-10-09 23:43:00,848 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2908462.6666666665, ans=0.1 2023-10-09 23:43:16,846 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2908509.3333333335, ans=0.0 2023-10-09 23:43:26,835 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2908556.0, ans=0.1 2023-10-09 23:43:33,432 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2908556.0, ans=0.125 2023-10-09 23:43:41,331 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2908602.6666666665, ans=0.125 2023-10-09 23:43:49,235 INFO [train.py:1031] (3/4) Epoch 14, batch 38550, loss[loss=0.2492, simple_loss=0.2904, pruned_loss=0.07924, ctc_loss=0.1239, over 16717.00 frames. ], tot_loss[loss=0.2484, simple_loss=0.3045, pruned_loss=0.07106, ctc_loss=0.1255, over 3303527.33 frames. 
], batch size: 111, lr: 2.52e-03, grad_scale: 1.0 2023-10-09 23:43:49,835 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.55 vs. limit=6.0 2023-10-09 23:43:52,713 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2908649.3333333335, ans=0.04949747468305833 2023-10-09 23:43:58,446 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:44:05,024 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2908696.0, ans=10.0 2023-10-09 23:44:12,087 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.72 vs. limit=15.0 2023-10-09 23:44:37,719 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.23 vs. limit=12.0 2023-10-09 23:44:48,216 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.444e+02 3.231e+02 3.757e+02 4.445e+02 8.253e+02, threshold=7.513e+02, percent-clipped=2.0 2023-10-09 23:44:49,816 INFO [train.py:1031] (3/4) Epoch 14, batch 38600, loss[loss=0.2193, simple_loss=0.2673, pruned_loss=0.06285, ctc_loss=0.1139, over 16933.00 frames. ], tot_loss[loss=0.2441, simple_loss=0.298, pruned_loss=0.07031, ctc_loss=0.124, over 3307482.83 frames. ], batch size: 215, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:45:10,839 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2908929.3333333335, ans=0.125 2023-10-09 23:45:11,834 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:45:22,824 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.33 vs. limit=10.0 2023-10-09 23:45:32,250 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2909022.6666666665, ans=0.125 2023-10-09 23:45:35,861 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:45:51,543 INFO [train.py:1031] (3/4) Epoch 14, batch 38650, loss[loss=0.2451, simple_loss=0.2889, pruned_loss=0.07302, ctc_loss=0.1381, over 16858.00 frames. ], tot_loss[loss=0.2405, simple_loss=0.2924, pruned_loss=0.0697, ctc_loss=0.1228, over 3302924.65 frames. ], batch size: 328, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:46:00,057 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2909116.0, ans=0.1 2023-10-09 23:46:25,512 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2909209.3333333335, ans=0.0 2023-10-09 23:46:39,051 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. 
limit=15.0 2023-10-09 23:46:40,933 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2909302.6666666665, ans=0.95 2023-10-09 23:46:48,386 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2909302.6666666665, ans=0.125 2023-10-09 23:46:54,839 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+02 3.287e+02 3.697e+02 4.587e+02 9.347e+02, threshold=7.394e+02, percent-clipped=1.0 2023-10-09 23:46:54,867 INFO [train.py:1031] (3/4) Epoch 14, batch 38700, loss[loss=0.2243, simple_loss=0.2812, pruned_loss=0.06248, ctc_loss=0.1061, over 16994.00 frames. ], tot_loss[loss=0.2402, simple_loss=0.2912, pruned_loss=0.06994, ctc_loss=0.1231, over 3310818.32 frames. ], batch size: 243, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:47:10,217 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2909396.0, ans=0.125 2023-10-09 23:47:10,228 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2909396.0, ans=0.025 2023-10-09 23:47:19,847 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2909442.6666666665, ans=0.2 2023-10-09 23:47:23,126 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2909442.6666666665, ans=0.125 2023-10-09 23:47:27,759 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2909442.6666666665, ans=0.0 2023-10-09 23:47:28,785 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2909442.6666666665, ans=0.125 2023-10-09 23:47:30,389 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2909442.6666666665, ans=0.125 2023-10-09 23:47:31,405 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2909442.6666666665, ans=0.125 2023-10-09 23:47:38,365 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:47:50,490 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2909536.0, ans=0.125 2023-10-09 23:47:58,600 INFO [train.py:1031] (3/4) Epoch 14, batch 38750, loss[loss=0.2142, simple_loss=0.2811, pruned_loss=0.05473, ctc_loss=0.09443, over 16844.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2912, pruned_loss=0.06877, ctc_loss=0.1212, over 3306980.51 frames. ], batch size: 188, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:48:30,237 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=15.0 2023-10-09 23:48:48,602 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.97 vs. 
limit=10.0 2023-10-09 23:48:52,134 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2909769.3333333335, ans=0.1 2023-10-09 23:48:53,355 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2909769.3333333335, ans=0.125 2023-10-09 23:48:57,209 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2909769.3333333335, ans=0.1 2023-10-09 23:49:02,386 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+02 3.418e+02 4.112e+02 5.352e+02 1.046e+03, threshold=8.225e+02, percent-clipped=4.0 2023-10-09 23:49:02,414 INFO [train.py:1031] (3/4) Epoch 14, batch 38800, loss[loss=0.258, simple_loss=0.3404, pruned_loss=0.06569, ctc_loss=0.1107, over 16885.00 frames. ], tot_loss[loss=0.2369, simple_loss=0.2954, pruned_loss=0.06584, ctc_loss=0.1166, over 3305682.49 frames. ], batch size: 243, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:49:11,357 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2909816.0, ans=0.0 2023-10-09 23:49:23,759 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2909862.6666666665, ans=0.0 2023-10-09 23:49:30,841 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2909909.3333333335, ans=0.2 2023-10-09 23:49:41,548 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2909956.0, ans=0.125 2023-10-09 23:50:04,803 INFO [train.py:1031] (3/4) Epoch 14, batch 38850, loss[loss=0.2353, simple_loss=0.2927, pruned_loss=0.06476, ctc_loss=0.1208, over 16711.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.2988, pruned_loss=0.06517, ctc_loss=0.1161, over 3307389.80 frames. ], batch size: 151, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:50:15,642 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2910049.3333333335, ans=0.125 2023-10-09 23:50:16,675 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2910096.0, ans=0.1 2023-10-09 23:50:17,058 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=22.5 2023-10-09 23:50:28,229 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2910096.0, ans=0.125 2023-10-09 23:50:45,751 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2910189.3333333335, ans=0.2 2023-10-09 23:50:55,150 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2910236.0, ans=0.125 2023-10-09 23:51:06,339 INFO [train.py:1031] (3/4) Epoch 14, batch 38900, loss[loss=0.2264, simple_loss=0.2838, pruned_loss=0.06366, ctc_loss=0.1041, over 16791.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2943, pruned_loss=0.06539, ctc_loss=0.1164, over 3309219.60 frames. 
], batch size: 102, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:51:07,979 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.806e+02 3.456e+02 4.311e+02 5.586e+02 1.002e+03, threshold=8.621e+02, percent-clipped=2.0 2023-10-09 23:51:11,016 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2910282.6666666665, ans=0.1 2023-10-09 23:51:29,646 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2910329.3333333335, ans=0.0 2023-10-09 23:51:39,345 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2910376.0, ans=0.07 2023-10-09 23:51:44,963 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2910422.6666666665, ans=0.125 2023-10-09 23:52:08,610 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2910516.0, ans=0.125 2023-10-09 23:52:09,369 INFO [train.py:1031] (3/4) Epoch 14, batch 38950, loss[loss=0.2245, simple_loss=0.2774, pruned_loss=0.06258, ctc_loss=0.1161, over 16852.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2902, pruned_loss=0.06573, ctc_loss=0.1163, over 3299683.36 frames. ], batch size: 272, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:52:37,099 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2910609.3333333335, ans=0.0 2023-10-09 23:53:01,410 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=22.5 2023-10-09 23:53:14,603 INFO [train.py:1031] (3/4) Epoch 14, batch 39000, loss[loss=0.2642, simple_loss=0.3328, pruned_loss=0.07337, ctc_loss=0.1219, over 16798.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2897, pruned_loss=0.06592, ctc_loss=0.1164, over 3301826.45 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:53:14,603 INFO [train.py:1054] (3/4) Computing validation loss 2023-10-09 23:53:27,780 INFO [zipformer.py:1853] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6064, 2.2806, 4.7695, 2.4072], device='cuda:3') 2023-10-09 23:53:33,417 INFO [train.py:1063] (3/4) Epoch 14, validation: loss=0.2363, simple_loss=0.3035, pruned_loss=0.06558, ctc_loss=0.09478, over 1796401.00 frames. 
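Note: the validation record above reports the same four components that are logged for every training batch: the combined loss alongside simple_loss, pruned_loss, and ctc_loss. Below is a minimal sketch of how an icefall-style pruned-transducer + CTC recipe typically combines these into the logged loss; the function name and the warm-up schedule are illustrative assumptions, not this run's exact code, and the actual scale values come from the run's hyperparameters.

```python
# Minimal sketch, assuming an icefall-style pruned-transducer + CTC recipe:
# the logged `loss` is a weighted sum of the transducer's cheap "simple"
# loss, its full "pruned" loss, and an auxiliary CTC loss. The warm-up
# schedule and the function name are illustrative assumptions.
import torch

def total_loss(
    simple_loss: torch.Tensor,
    pruned_loss: torch.Tensor,
    ctc_loss: torch.Tensor,
    simple_loss_scale: float,
    ctc_loss_scale: float,
    batch_idx_train: int,
    warm_step: int,
) -> torch.Tensor:
    if batch_idx_train >= warm_step:
        s, p = simple_loss_scale, 1.0
    else:
        # Early in training, lean on the stable simple loss while the
        # pruned loss ramps up from a small weight (assumed schedule).
        frac = batch_idx_train / warm_step
        s = 1.0 - frac * (1.0 - simple_loss_scale)
        p = 0.1 + 0.9 * frac
    return s * simple_loss + p * pruned_loss + ctc_loss_scale * ctc_loss
```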
2023-10-09 23:53:33,418 INFO [train.py:1064] (3/4) Maximum memory allocated so far is 14573MB 2023-10-09 23:53:34,791 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2910749.3333333335, ans=0.025 2023-10-09 23:53:35,524 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.616e+02 3.267e+02 3.662e+02 4.475e+02 7.642e+02, threshold=7.323e+02, percent-clipped=0.0 2023-10-09 23:53:51,009 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2910796.0, ans=0.0 2023-10-09 23:53:57,880 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2910842.6666666665, ans=0.0 2023-10-09 23:54:01,496 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2910842.6666666665, ans=0.025 2023-10-09 23:54:07,455 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2910842.6666666665, ans=0.125 2023-10-09 23:54:11,709 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2910889.3333333335, ans=0.125 2023-10-09 23:54:11,732 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2910889.3333333335, ans=0.1 2023-10-09 23:54:18,346 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2910889.3333333335, ans=0.0 2023-10-09 23:54:34,011 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2910982.6666666665, ans=0.0 2023-10-09 23:54:34,747 INFO [train.py:1031] (3/4) Epoch 14, batch 39050, loss[loss=0.2284, simple_loss=0.2831, pruned_loss=0.06451, ctc_loss=0.1118, over 16852.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.2898, pruned_loss=0.06779, ctc_loss=0.119, over 3298671.94 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:55:29,012 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2911169.3333333335, ans=0.125 2023-10-09 23:55:33,299 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2911169.3333333335, ans=0.125 2023-10-09 23:55:34,240 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2911216.0, ans=0.125 2023-10-09 23:55:35,626 INFO [train.py:1031] (3/4) Epoch 14, batch 39100, loss[loss=0.203, simple_loss=0.2498, pruned_loss=0.05796, ctc_loss=0.1005, over 16803.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2818, pruned_loss=0.06675, ctc_loss=0.1173, over 3297030.63 frames. 
], batch size: 176, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:55:39,886 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+02 3.243e+02 3.619e+02 4.225e+02 8.592e+02, threshold=7.239e+02, percent-clipped=2.0 2023-10-09 23:55:55,975 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2911262.6666666665, ans=0.125 2023-10-09 23:56:37,981 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2911402.6666666665, ans=0.1 2023-10-09 23:56:39,874 INFO [train.py:1031] (3/4) Epoch 14, batch 39150, loss[loss=0.2281, simple_loss=0.2677, pruned_loss=0.06983, ctc_loss=0.1222, over 16753.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.2854, pruned_loss=0.06693, ctc_loss=0.1175, over 3295552.53 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:56:47,847 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.05 vs. limit=10.0 2023-10-09 23:56:59,884 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2911496.0, ans=0.0 2023-10-09 23:57:00,921 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2911496.0, ans=0.125 2023-10-09 23:57:44,727 INFO [train.py:1031] (3/4) Epoch 14, batch 39200, loss[loss=0.2037, simple_loss=0.272, pruned_loss=0.05018, ctc_loss=0.08763, over 16684.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.2929, pruned_loss=0.06654, ctc_loss=0.1173, over 3293825.94 frames. ], batch size: 151, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:57:49,010 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.698e+02 3.956e+02 5.075e+02 6.707e+02 1.311e+03, threshold=1.015e+03, percent-clipped=19.0 2023-10-09 23:57:56,104 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=15.0 2023-10-09 23:58:03,188 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=22.5 2023-10-09 23:58:16,164 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2911776.0, ans=0.1 2023-10-09 23:58:21,370 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0 2023-10-09 23:58:23,683 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2911822.6666666665, ans=0.125 2023-10-09 23:58:44,665 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2911869.3333333335, ans=0.0 2023-10-09 23:58:47,250 INFO [train.py:1031] (3/4) Epoch 14, batch 39250, loss[loss=0.1859, simple_loss=0.2354, pruned_loss=0.05179, ctc_loss=0.08223, over 16788.00 frames. ], tot_loss[loss=0.2357, simple_loss=0.2937, pruned_loss=0.06593, ctc_loss=0.1148, over 3297635.62 frames. ], batch size: 121, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:59:03,837 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.87 vs. 
limit=22.5 2023-10-09 23:59:21,774 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2912009.3333333335, ans=0.125 2023-10-09 23:59:46,177 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2912102.6666666665, ans=0.0 2023-10-09 23:59:47,310 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2912102.6666666665, ans=0.125 2023-10-09 23:59:50,348 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2912102.6666666665, ans=0.0 2023-10-09 23:59:53,292 INFO [train.py:1031] (3/4) Epoch 14, batch 39300, loss[loss=0.2186, simple_loss=0.2929, pruned_loss=0.05298, ctc_loss=0.09591, over 16838.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2891, pruned_loss=0.06327, ctc_loss=0.1095, over 3299042.79 frames. ], batch size: 291, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:59:54,045 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=12.0 2023-10-10 00:00:00,604 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.735e+02 3.228e+02 3.774e+02 4.947e+02 8.395e+02, threshold=7.547e+02, percent-clipped=0.0 2023-10-10 00:00:04,797 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-10-10 00:00:06,956 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=12.0 2023-10-10 00:00:08,284 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.66 vs. limit=15.0 2023-10-10 00:00:12,040 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.58 vs. limit=10.0 2023-10-10 00:00:21,749 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2912242.6666666665, ans=0.125 2023-10-10 00:00:57,814 INFO [train.py:1031] (3/4) Epoch 14, batch 39350, loss[loss=0.2231, simple_loss=0.309, pruned_loss=0.04957, ctc_loss=0.09527, over 16786.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2872, pruned_loss=0.06049, ctc_loss=0.1051, over 3293969.94 frames. 
], batch size: 272, lr: 2.52e-03, grad_scale: 2.0 2023-10-10 00:00:59,753 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2912382.6666666665, ans=0.125 2023-10-10 00:01:10,103 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2912429.3333333335, ans=0.0 2023-10-10 00:01:23,676 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2912476.0, ans=0.07 2023-10-10 00:01:28,618 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2912476.0, ans=0.125 2023-10-10 00:01:45,293 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2912522.6666666665, ans=0.1 2023-10-10 00:01:59,520 INFO [train.py:1031] (3/4) Epoch 14, batch 39400, loss[loss=0.2158, simple_loss=0.2703, pruned_loss=0.05924, ctc_loss=0.1071, over 15248.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2865, pruned_loss=0.05982, ctc_loss=0.1044, over 3292170.34 frames. ], batch size: 526, lr: 2.52e-03, grad_scale: 2.0 2023-10-10 00:01:59,870 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2912616.0, ans=0.125 2023-10-10 00:02:03,069 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-10 00:02:06,854 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.092e+02 4.162e+02 5.159e+02 1.181e+03, threshold=8.323e+02, percent-clipped=5.0 2023-10-10 00:02:12,551 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2912662.6666666665, ans=0.0 2023-10-10 00:02:19,048 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2912662.6666666665, ans=0.1 2023-10-10 00:02:27,680 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2912709.3333333335, ans=0.125 2023-10-10 00:02:27,771 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2912709.3333333335, ans=0.125 2023-10-10 00:02:38,906 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2912756.0, ans=0.125 2023-10-10 00:02:42,180 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2912756.0, ans=0.05 2023-10-10 00:02:50,325 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2912802.6666666665, ans=0.125 2023-10-10 00:02:59,395 INFO [train.py:1031] (3/4) Epoch 14, batch 39450, loss[loss=0.2161, simple_loss=0.2746, pruned_loss=0.05736, ctc_loss=0.1072, over 16346.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2808, pruned_loss=0.05949, ctc_loss=0.104, over 3280334.03 frames. ], batch size: 416, lr: 2.51e-03, grad_scale: 1.0 2023-10-10 00:03:00,020 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2023-10-10 00:03:06,801 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2912849.3333333335, ans=0.1 2023-10-10 00:03:07,925 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2912849.3333333335, ans=0.1 2023-10-10 00:03:14,926 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2912896.0, ans=0.0 2023-10-10 00:03:17,277 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2912896.0, ans=0.125 2023-10-10 00:03:18,711 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-10-10 00:03:24,372 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2912942.6666666665, ans=0.125 2023-10-10 00:03:29,816 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2912942.6666666665, ans=0.0 2023-10-10 00:03:29,855 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2912942.6666666665, ans=0.0 2023-10-10 00:03:42,942 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.54 vs. limit=22.5 2023-10-10 00:04:00,452 INFO [train.py:1031] (3/4) Epoch 14, batch 39500, loss[loss=0.2042, simple_loss=0.2652, pruned_loss=0.0521, ctc_loss=0.09762, over 16901.00 frames. ], tot_loss[loss=0.2113, simple_loss=0.2731, pruned_loss=0.05536, ctc_loss=0.09716, over 3281770.53 frames. ], batch size: 309, lr: 2.51e-03, grad_scale: 2.0 2023-10-10 00:04:07,135 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2913082.6666666665, ans=0.125 2023-10-10 00:04:10,156 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. 
limit=6.0
2023-10-10 00:04:10,625 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.656e+02 3.168e+02 3.988e+02 1.383e+03, threshold=6.335e+02, percent-clipped=1.0
2023-10-10 00:04:15,698 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2913129.3333333335, ans=0.125
2023-10-10 00:04:26,884 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2913176.0, ans=0.1
2023-10-10 00:04:29,093 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2913176.0, ans=0.125
2023-10-10 00:04:29,119 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2913176.0, ans=0.0
2023-10-10 00:04:32,921 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2913176.0, ans=0.2
2023-10-10 00:04:51,697 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2913269.3333333335, ans=0.0
2023-10-10 00:04:53,893 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2913269.3333333335, ans=0.125
2023-10-10 00:04:59,828 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2913269.3333333335, ans=0.0
2023-10-10 00:05:01,683 INFO [train.py:1031] (3/4) Epoch 14, batch 39550, loss[loss=0.2126, simple_loss=0.2675, pruned_loss=0.05966, ctc_loss=0.09623, over 16715.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.2736, pruned_loss=0.05705, ctc_loss=0.09966, over 3273776.95 frames. ], batch size: 102, lr: 2.51e-03, grad_scale: 2.0
2023-10-10 00:05:48,610 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0
2023-10-10 00:06:03,879 INFO [train.py:1031] (3/4) Epoch 14, batch 39600, loss[loss=0.2237, simple_loss=0.3029, pruned_loss=0.05246, ctc_loss=0.0991, over 16248.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.2743, pruned_loss=0.05514, ctc_loss=0.09665, over 3280838.22 frames. ], batch size: 463, lr: 2.51e-03, grad_scale: 4.0
2023-10-10 00:06:08,085 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2913549.3333333335, ans=0.125
2023-10-10 00:06:13,820 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+02 3.076e+02 3.392e+02 3.895e+02 1.156e+03, threshold=6.785e+02, percent-clipped=2.0
2023-10-10 00:06:24,265 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2913596.0, ans=6.0
2023-10-10 00:07:06,380 INFO [train.py:1031] (3/4) Epoch 14, batch 39650, loss[loss=0.2535, simple_loss=0.3018, pruned_loss=0.07739, ctc_loss=0.1261, over 16772.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2811, pruned_loss=0.05949, ctc_loss=0.1038, over 3283315.30 frames. ], batch size: 121, lr: 2.51e-03, grad_scale: 2.0
2023-10-10 00:07:14,720 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0
2023-10-10 00:07:20,945 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2913829.3333333335, ans=10.0
2023-10-10 00:07:49,417 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.87 vs. limit=15.0
2023-10-10 00:07:51,354 INFO [scaling.py:1069] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-10 00:08:09,078 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2914016.0, ans=10.0
2023-10-10 00:08:09,798 INFO [train.py:1031] (3/4) Epoch 14, batch 39700, loss[loss=0.2443, simple_loss=0.3034, pruned_loss=0.06816, ctc_loss=0.1225, over 16192.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2861, pruned_loss=0.06303, ctc_loss=0.1102, over 3283381.81 frames. ], batch size: 463, lr: 2.51e-03, grad_scale: 4.0
2023-10-10 00:08:10,068 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2914016.0, ans=0.125
2023-10-10 00:08:15,174 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.64 vs. limit=22.5
2023-10-10 00:08:19,604 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2914016.0, ans=0.2
2023-10-10 00:08:21,363 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.803e+02 3.745e+02 4.250e+02 5.439e+02 1.201e+03, threshold=8.500e+02, percent-clipped=8.0
2023-10-10 00:08:22,896 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2914062.6666666665, ans=0.125
2023-10-10 00:08:25,571 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2914062.6666666665, ans=0.125
2023-10-10 00:08:30,025 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2914062.6666666665, ans=0.0
2023-10-10 00:08:31,605 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2914062.6666666665, ans=0.1
2023-10-10 00:08:37,030 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2914109.3333333335, ans=0.07
2023-10-10 00:08:43,163 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2914109.3333333335, ans=0.125
2023-10-10 00:08:50,136 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2914156.0, ans=0.0
2023-10-10 00:08:51,144 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2914156.0, ans=0.2
2023-10-10 00:08:52,861 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2914156.0, ans=0.125
2023-10-10 00:09:13,585 INFO [train.py:1031] (3/4) Epoch 14, batch 39750, loss[loss=0.2038, simple_loss=0.257, pruned_loss=0.05596, ctc_loss=0.09668, over 16720.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2863, pruned_loss=0.06529, ctc_loss=0.1138, over 3282097.58 frames. ], batch size: 130, lr: 2.51e-03, grad_scale: 2.0
2023-10-10 00:09:21,776 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0
2023-10-10 00:09:24,541 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2914296.0, ans=0.125
2023-10-10 00:09:38,692 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0
2023-10-10 00:09:39,214 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2914342.6666666665, ans=0.1
2023-10-10 00:09:56,584 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2914389.3333333335, ans=0.125
2023-10-10 00:10:13,894 INFO [train.py:1031] (3/4) Epoch 14, batch 39800, loss[loss=0.2159, simple_loss=0.2573, pruned_loss=0.06339, ctc_loss=0.1196, over 16861.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2786, pruned_loss=0.06443, ctc_loss=0.1122, over 3277158.75 frames. ], batch size: 259, lr: 2.51e-03, grad_scale: 4.0
2023-10-10 00:10:14,218 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2914482.6666666665, ans=0.125
2023-10-10 00:10:16,049 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2914482.6666666665, ans=0.0
2023-10-10 00:10:17,979 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2914482.6666666665, ans=0.125
2023-10-10 00:10:18,234 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.88 vs. limit=10.0
2023-10-10 00:10:19,991 INFO [scaling.py:979] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.68 vs. limit=10.0
2023-10-10 00:10:26,701 INFO [optim.py:471] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.361e+02 3.167e+02 3.567e+02 4.087e+02 1.118e+03, threshold=7.135e+02, percent-clipped=1.0
2023-10-10 00:10:32,379 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2914529.3333333335, ans=0.125
2023-10-10 00:10:41,565 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2914576.0, ans=0.0
2023-10-10 00:10:47,953 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2914576.0, ans=0.0
2023-10-10 00:11:07,266 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2914669.3333333335, ans=0.125
2023-10-10 00:11:12,732 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2914669.3333333335, ans=0.125
2023-10-10 00:11:15,153 INFO [train.py:1031] (3/4) Epoch 14, batch 39850, loss[loss=0.228, simple_loss=0.2654, pruned_loss=0.07015, ctc_loss=0.1259, over 16811.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2722, pruned_loss=0.0634, ctc_loss=0.1102, over 3290595.84 frames. ], batch size: 311, lr: 2.51e-03, grad_scale: 4.0
2023-10-10 00:11:20,729 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2914716.0, ans=0.0
2023-10-10 00:11:35,174 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2914762.6666666665, ans=0.125
2023-10-10 00:11:35,201 INFO [scaling.py:199] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2914762.6666666665, ans=0.125